Talk:Wireless bonding

From ArchWiki
Latest comment: 7 February 2016 by Lahwaacz in topic Style remarks

Style remarks

Bonding is presented as if it applied only to "wireless + wired" interfaces, but this is not required.

For a more thorough exploration of "channel bonding", please read the two pages linked in the first sentence, https://www.kernel.org/doc/html/latest/networking/bonding.html and https://en.wikipedia.org/wiki/Bonding_protocol and additional links therefrom.

Presents only one specific set of tools (not the most generic one) to achieve the goal, without mentioning the alternatives.

There is a netctl solution discussed at https://wiki.archlinux.org/index.php/Netctl , Tips and tricks/Bonding/Wired to wireless failover, as you know, that does not rely exclusively on systemd. That approach additionally needs the netctl and ifenslave packages, of course, and also requires customized configuration files and systemd Service Unit Files. I do not know whether it will seamlessly handle "hot plugging" wireless devices without the same customizations discussed here.

This article was originally titled "The kernel bonding module and systemd" before you titled it more generally. The article is specifically about using systemd and the kernel bonding module to automatically configure a wireless network that happens to involve a "bonded" wireless interface. Perhaps we can collect sections here to also cover other approaches to wireless bonding. Or is that what you are saying?

If you know of other "bonded" Automatic Wireless Network Configuration solutions with which you have had a successful experience, please add links.

I did also try building this system using dhcpcd, instead of dhclient and wide-dhcpv6 (AUR), and libteam, instead of the standard kernel "bonding" module, but I found them generally quite "fragile" at best, and could not get them to run at all in this specific context.

For starters, IPv6 only complicates things.

Are people still using IPv4, in this day and age? ;)

ArchWiki is not a blogging platform.

Quite. It is a wiki, and presumably a place where people come to gather ideas and concepts to implement solutions to specific problems or goals. That is the way I use it. And the ArchWiki is one of the best sources on the net for Linux solutions. While I understand that some people are only looking for specific instructions, other people are more conceptual, and are looking for abstract models, both to understand how things work, and to understand how to extend solutions to more complex circumstances. I am sure other people will appreciate the concepts discussed here.

I suspect that there are still many people new to "systemd" and its ways. Its adoption was quite contentious and presents a steep learning curve.

Thx1138 (talk) 19:32, 4 February 2016 (UTC)

this would belong to rfkill, but if the driver *had* a bug it is not relevant anymore
Such optimism! The ath5k driver appears to have been fixed in Linux 4.4.1, but it will still *have* a bug in any previous kernel. More significantly, the ath5k driver is only one of over 100 wireless drivers provided with the Linux kernel, and I have not tested them all. The recovery technique may or may not work in any particular case, but the technique is the same for all drivers. It seems to me that it is a disservice to people to hide this information, especially considering the mood they will be in, should they have reason to go searching for it.
—This unsigned comment is by Thx1138 (talk) 17:44, 7 February 2016‎. Please sign your posts with ~~~~!
It would have been reported on the main Wireless network configuration page ages ago, if it was as serious as you think. It seems that the current information in Wireless network configuration#Rfkill caveat (especially the note at the bottom of the section) is sufficient. -- Lahwaacz (talk) 20:11, 7 February 2016 (UTC)

wpa_supplicant

The wireless interface is added to the bond0 "bridge" manually, so why does wpa_supplicant need the -b switch?
If it doesn't, wpa_supplicant can be run by a separate service to reduce the bloat (systemd can handle the dependencies and ordering).

In practice, this -b bond0 switch is required when using EAPOL authentication, but it also does no harm when not using EAPOL.
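For illustration, a minimal sketch of the command line this implies. It only prints the command rather than running it. The interface names (wlan0, bond0), the nl80211 driver choice, and the configuration file path are assumptions for the example, not values taken from the article.

```shell
# Print (do not run) a wpa_supplicant command line for a wireless
# interface that is enslaved to a bond. All defaults are assumptions.
supplicant_cmd() {
    wlan_if="${1:-wlan0}"   # wireless interface (assumed name)
    bond_if="${2:-bond0}"   # bonding master, handed to -b for EAPOL reception
    conf="${3:-/etc/wpa_supplicant/wpa_supplicant-${wlan_if}.conf}"  # assumed path
    # -c config file, -D driver, -i interface, -b bridge (here: bond) interface
    printf 'wpa_supplicant -c %s -D nl80211 -i %s -b %s\n' \
        "$conf" "$wlan_if" "$bond_if"
}

supplicant_cmd wlan0 bond0
```

With PSK-only networks the -b argument could presumably be dropped, but as noted it does no harm to leave it in.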

man wpa_supplicant
If the interface is added in a Linux bridge (e.g., br0), the bridge
interface needs to be configured to wpa_supplicant in addition to the main interface:
wpa_supplicant -cw.conf -Dnl80211 -iwlan0 -bbr0
2008
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483207
Also, it is very important that -b bond0 be used, even though -b is only mentioned in TFM
as for a "bridge." It should also be mentioned as required for a bond.
2009
http://lists.shmoo.com/pipermail/hostap/2009-February/019232.html
On Tue, 03 Feb 2009 13:53:30 +0200, Jouni Malinen wrote:
Do you tell wpa_supplicant about the bonding device? I think that
bonding behaves similarly to bridging in the sense that the EAPOL frame
reception needs special handling and you will likely need to add
-b<bridge/bond ifname> to the wpa_supplicant command line.
2015
./wpa_supplicant/wpa_supplicant_i.h
       /**
        * bridge_ifname - Optional bridge interface name
        *
        * If the driver interface (ifname) is included in a Linux bridge
        * device, the bridge interface may need to be used for receiving EAPOL
        * frames. This can be enabled by setting this variable to enable
        * receiving of EAPOL frames from an additional interface.
        */
       const char *bridge_ifname;
https://en.wikipedia.org/wiki/IEEE_802.11i-2004
The initial authentication process is carried out either using a pre-shared key (PSK), or following an EAP exchange through 802.1X
(known as EAPOL, which requires the presence of an authentication server).
https://en.wikipedia.org/wiki/IEEE_802.1X
IEEE 802.1X defines the encapsulation of the Extensible Authentication Protocol (EAP) over IEEE 802,[1][2] which is known as
"EAP over LAN" or EAPOL.[3]

Yes, you could conceivably separate the Service Unit Files, though whether that is simpler or "reduces bloat" is a matter of opinion. Personally, I do not like "scattered" or "spaghetti" configurations, where the code being referenced is not present for inspection in context. But still, as you point out, with -b bond0 required for EAPOL, a second parameter would need to be passed to a custom external wpa_supplicant Service Unit File, with the value hard coded or using an Environment or EnvironmentFile variable, so there is nothing to be gained there.

NOTE - do NOT "BindTo" a device that will also be deleted in this Unit File! - then the conflicting BindsTo should be removed, since each unit should have fully automatic teardown.

This is not an issue with the BindsTo per se, which is not in conflict with anything. Rather, deleting a Device Unit File can cause the whole bonding setup to fail, in particular during a systemctl restart ..., because of idiosyncrasies with Starting, Stopping, and ReStarting the DHCP clients using systemd. It was probably not clear that the BindsTo is defined in the Wanted or Required Unit File and sets up a dependency from that Unit File back to the defining Unit File. This referred to the DHCP clients binding to the bond0 network interface. There are also BindsTo dependencies upon the bond0 and wireless interface Device Unit Files, but these are not what cause trouble with the ReStart.

If the bond0 interface is torn down, and the IP addresses are lost, in a systemctl restart ..., properly restarting the DHCP clients appears to be very problematic. In particular circumstances, systemd does not Stop and Start a "Wanted" or "Required" supporting Service Unit during a ReStart, since that Unit is assumed to be already running. So, when the bond0-supplicant@.service Unit File tears down the bond0 interface and ReStarts, there will be no DHCP clients and no IP addresses. I will argue that this appears to be a bug in systemd, and will investigate. It seems reasonable to expect that a "Start" in the "Stop-Start" sequence of a "ReStart" should start a wanted or required Service Unit File that is known to be inactive, instead of stopping the starting Unit.

What exactly does it mean to tear-down? What if the bonding module or the DHCP client processes are managing other interfaces? It is not quite a simple problem.

Be that as it may, if the BindsTo reference to the bond0 Device Unit File is removed, then, if the bond0 interface were to "disappear" - to use the systemd lingo - the DHCP client Service Units would go on running as if nothing had happened, but with their interface gone. On the other hand, if the DHCP client or DHCPv6 client should "die" or be "Stopped" or be "killed", wpa_supplicant and the wireless link and the wired link and the active backup and all of the status commands for the active-backup will be undisturbed and continue to function smoothly, but now with nonexistent DHCP clients.
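As a rough sketch of the dependency being discussed, the following prints the [Unit] lines a DHCP client service might carry to bind itself to the bond0 device. It assumes systemd's sys-subsystem-net-devices-<ifname>.device naming and an interface name that needs no escaping; the helper name bind_stanza is made up for this example.

```shell
# Print a [Unit] stanza binding a service to a network device unit.
# Assumes the interface name contains no characters that systemd
# would escape (plain names such as bond0 or wlan0).
bind_stanza() {
    ifname="${1:-bond0}"
    dev="sys-subsystem-net-devices-${ifname}.device"
    # BindsTo= stops this unit when the device disappears;
    # After= orders it behind the device's appearance.
    printf '[Unit]\nBindsTo=%s\nAfter=%s\n' "$dev" "$dev"
}

bind_stanza bond0
```

Removing the BindsTo= line from such a stanza is what would leave the DHCP clients running after the interface has gone.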

DHCP is a bit odd in that, when it disappears after an IP address has been configured, there is no immediate problem and the link remains functional. Conversely, if the DHCP clients are running, and the IP addresses disappear, the DHCP clients are oblivious and do nothing. Similarly, if the bond0 interface is torn down and then rebuilt, the DHCP clients are again oblivious and do nothing, except that now there are no IP addresses, and the link is nonfunctional. That is a problem.

Further, for the DHCP clients to exit cleanly, they must be signaled before the network link goes down, or they will hang while trying to release their IP address leases, which will ultimately fail, before timing-out.

With dhcp6c, there is the dhcp6ctl tool, which will allow the bond0 interface to be reconfigured without Stopping or killing the process, but only if a keyfile has been configured, which is an extra, not otherwise necessary step for the administrator. For dhclient, there is "dhclient -r -pf /run/dhclient-bond0", or there is /usr/bin/sh -c "kill `cat /run/dhc...-bond0`". But these seem to return immediately, rather than actually waiting for the DHCP processes to terminate, allowing the following "wpa_cli -i %I terminate" to run before the DHCP clients have actually finished. Alternatively, "systemctl stop ..." waits for the DHCP processes to exit properly before terminating wpa_supplicant, but then will not allow the DHCP clients to start again on a "systemctl restart ...", which is sort of where this thinking started.
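One possible way around the "returns immediately" problem, sketched under assumptions: signal the process named in a PID file, then poll until it has actually exited. The function name stop_and_wait, the 10 second default timeout, and the one second polling interval are all invented for this example; the PID file path is supplied by the caller.

```shell
# Signal the process named in a PID file, then block until it is gone
# or a timeout expires. Returns 0 once the process has exited,
# 1 if it is still running after the timeout.
stop_and_wait() {
    pidfile="$1"
    tries="${2:-10}"                      # timeout in seconds (assumed default)
    [ -r "$pidfile" ] || return 0         # no PID file: nothing to stop
    pid="$(cat "$pidfile")"
    kill "$pid" 2>/dev/null || return 0   # process already gone
    # kill -0 delivers no signal; it only tests whether the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$tries" -gt 0 ] || return 1
        tries=$((tries - 1))
        sleep 1
    done
    return 0
}
```

Called before "wpa_cli -i %I terminate", something like this would let the DHCP client finish releasing its lease while the link is still up. Whether the default TERM signal makes a given client release its lease is a separate question that this sketch does not answer.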

With that said, I notice that the DHCP clients should be properly Stopped before terminating wpa_supplicant, to formally release the IP address leases. I have found no effective way to both allow systemd to manage the DHCP clients and properly Start them during a systemctl restart ... event. But I found that using Requires instead of Wants on the DHCP clients converts Restart cleanly into a Toggle for bond0-supplicant@.service, and allows the set-up to be torn-down as much as desired. That seems to be a useful trade-off.

Just link to wpa_supplicant where all necessary details are.

The whole point of the article is to pull-together all the required pieces into one pile, to be able to see them as one whole, a gestalt, a formula, not to scatter them all over the wiki. This is not an Easter Egg Hunt! Besides, the section is quite small, but it does need the link to the wpa_supplicant article.

Thanks for adding the Table of Contents.

Thx1138 (talk) 07:29, 7 February 2016 (UTC)

You can have perfect restarting of the DHCP clients if you add PartOf=bond0-supplicant@<interface>.service to the services. Splitting the bonding and wpa_supplicant into their own services would also help with this problem - I guess you only need to restart bond0-supplicant@.service because of wpa_supplicant, not because of the bonding. Most of your problems, including the things you call "systemd's idiosyncrasies", seem to come from misunderstanding or ignoring the idea of systemd units: a service should do one thing, manage one resource, and not stack everything together. Use dependencies for grouping. Create configuration files (for EnvironmentFile) to avoid values hardcoded at multiple places.
I can see that you like to "pile things together", not only the configuration, but also the text on the wiki page. I'd imagine that when people configure bonding, they don't start from scratch. They first configure the connection for each interface to see that it works, and only then try to bond them together. They will see the generic pages (Network configuration and Wireless network configuration) first, then come here and see a lot of information duplicated (DHCP and wpa_supplicant). Generally, this wiki works the way that when multiple topics overlap, the intersection gets its own page which is referenced from relevant places. This page should do the same, but if you intend to keep the pile in one piece to see it as a whole, please feel free to use your userpage for this purpose.
-- Lahwaacz (talk) 12:58, 7 February 2016 (UTC)
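The PartOf= suggestion above could look roughly like this, written as a drop-in for a DHCP client service. This is only a sketch: the client unit name (dhclient@bond0.service in the usage below) and the drop-in file name partof.conf are hypothetical, and only bond0-supplicant@.service comes from the article. The target directory is a parameter so the function can be tried in a scratch directory first.

```shell
# Write a drop-in that makes a DHCP client service follow stop/restart
# of the supplicant service. Unit and file names are illustrative.
write_partof_dropin() {
    unit_dir="$1"      # e.g. /etc/systemd/system, or a scratch directory
    client_unit="$2"   # hypothetical, e.g. dhclient@bond0.service
    wlan_if="$3"       # instance name of the supplicant service, e.g. wlan0
    mkdir -p "$unit_dir/$client_unit.d" || return 1
    cat > "$unit_dir/$client_unit.d/partof.conf" <<EOF
[Unit]
# Stopping or restarting the unit named here is propagated to this unit.
PartOf=bond0-supplicant@${wlan_if}.service
EOF
}

# Usage (scratch directory, so nothing on the system is touched):
scratch="$(mktemp -d)"
write_partof_dropin "$scratch" dhclient@bond0.service wlan0
cat "$scratch/dhclient@bond0.service.d/partof.conf"
```

On a real system the drop-in would go under /etc/systemd/system and take effect after systemctl daemon-reload. Note that PartOf= only propagates stop and restart; starting the client in the first place would still need Wants= or Requires= on the supplicant side.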