network:troubleshooting_ethernet_bridging
network:troubleshooting_ethernet_bridging [2019/11/30 18:36] – removed peter | network:troubleshooting_ethernet_bridging [2020/07/23 01:10] (current) – old revision restored (2016/07/07 15:46) 158.69.243.99 | ||
---|---|---|---|
+ | ====== Network - Troubleshooting Ethernet bridging ====== | ||
+ | |||
+ | To diagnose problems arising from use of the Linux bridge module. | ||
+ | |||
+ | |||
+ | ===== Background ===== | ||
+ | |||
+ | An Ethernet bridge (or switch) is a device for forwarding packets between two or more Ethernets so that they behave in most respects as if they were a single network. | ||
+ | |||
+ | |||
+ | ===== Symptoms ===== | ||
+ | |||
+ | The most likely symptoms of a bridging problem are that: | ||
+ | |||
+ | * the bridge does not forward traffic, | ||
+ | * the bridge forwards traffic intermittently, | ||
+ | * the bridge causes a storm of duplicate traffic, or | ||
+ | * the machine hosting the bridge appears to freeze. | ||
+ | |||
+ | |||
+ | ===== Investigation ===== | ||
+ | |||
+ | ==== Strategy ==== | ||
+ | |||
+ | If the bridge is not forwarding traffic then there are at least six possibilities to consider: | ||
+ | |||
+ | * The bridge has not been created. | ||
+ | * The appropriate interfaces have not been attached to the bridge. | ||
+ | * The bridge or the attached interfaces are not in the 'up' state. | ||
+ | * The bridge ports are not in the 'forwarding' state. | ||
+ | * The traffic to be bridged is not reaching the relevant interface. | ||
+ | * The traffic is being filtered by a firewall. | ||
+ | |||
+ | Intermittent forwarding usually has some form of intermittent connectivity as its root cause. However, there are two ways in which the use of bridging can exacerbate what might otherwise have been a less serious problem: | ||
+ | |||
+ | * If STP is enabled then the spanning tree may become unstable due to the topology changing faster than the tree can converge. | ||
+ | * Even without STP, the bridge forwarding delay typically adds 15 seconds to the recovery time for even the briefest of outages. | ||
+ | |||
+ | If the problem is likely to recur frequently then it may be possible to tune the bridge parameters so that the network is more resilient to outages of this nature. | ||
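Before tuning, it can help to read back the forwarding delay that the bridge is currently using. The following is a minimal sketch, assuming a Linux system that exposes bridge parameters under sysfs and that the `forward_delay` file holds the delay in hundredths of a second. The function name and the `SYSFS_ROOT` variable are illustrative only.

```shell
# Root of the sysfs tree; overridable for testing (illustrative variable).
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the forwarding delay of the given bridge in seconds.
# Assumes the sysfs value is expressed in hundredths of a second.
forward_delay_seconds() {
    raw=$(cat "$SYSFS_ROOT/class/net/$1/bridge/forward_delay" 2>/dev/null) || {
        echo "$1: cannot read forward_delay (is it a bridge?)" >&2
        return 1
    }
    echo "$raw" | awk '{ printf "%.2f\n", $1 / 100 }'
}
```

For example, `forward_delay_seconds br0`. If the value needs changing, **brctl setfd** accepts the bridge name and a delay in seconds.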
+ | |||
+ | A storm of duplicate traffic almost certainly indicates that the network contains one or more loops. These can be removed by: | ||
+ | |||
+ | * finding the loops and breaking them manually, or | ||
+ | * enabling STP (the Spanning Tree Protocol) or an equivalent, which automatically disables any link that would cause a loop. | ||
+ | |||
+ | **WARNING**: | ||
+ | |||
+ | If the machine appears to freeze after adding a network interface to a bridge then this could be because: | ||
+ | |||
+ | * you are administering it remotely via that interface (for example using SSH), or | ||
+ | * the machine depends on that interface for vital services (for example NFS or LDAP). | ||
+ | |||
+ | Removing the interface from the bridge will solve the immediate problem. | ||
+ | |||
+ | Remember that changes made using the **brctl** or **ifconfig** commands are not persistent. | ||
+ | |||
+ | |||
+ | ===== Check that the bridge has been created and the appropriate interfaces attached to it ===== | ||
+ | |||
+ | A list of bridges can be displayed using the **brctl show** command: | ||
+ | |||
+ | <code bash> | ||
+ | brctl show | ||
+ | </code> | ||
+ | |||
+ | the output from which should be of the form: | ||
+ | |||
+ | <code> | ||
+ | bridge name | ||
+ | br0 | ||
+ | eth1 | ||
+ | </code> | ||
+ | |||
+ | Verify that the bridge exists, has the name you expect, and is attached to the appropriate interfaces. | ||
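The same check can be scripted without parsing **brctl show** output, because the kernel also exposes bridge membership through sysfs. The following is a minimal sketch, assuming a Linux system with sysfs available. The function names and the `SYSFS_ROOT` variable are illustrative and not part of any tool.

```shell
# Root of the sysfs tree; overridable for testing (illustrative variable).
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Succeeds if the named interface is a bridge
# (bridges have a "bridge" subdirectory in sysfs).
is_bridge() {
    [ -d "$SYSFS_ROOT/class/net/$1/bridge" ]
}

# Succeeds if interface $2 is attached to bridge $1
# (attached ports appear under the "brif" directory of the bridge).
is_attached() {
    [ -e "$SYSFS_ROOT/class/net/$1/brif/$2" ]
}

# Print a one-line verdict for a bridge and an interface.
check_attachment() {
    if ! is_bridge "$1"; then
        echo "$1: not a bridge (or does not exist)"
        return 1
    fi
    if is_attached "$1" "$2"; then
        echo "$2: attached to $1"
    else
        echo "$2: NOT attached to $1"
        return 1
    fi
}
```

For example, `check_attachment br0 eth1` reports whether eth1 is currently a port of br0.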
+ | |||
+ | |||
+ | ===== Check whether the bridge and attached interfaces are up or down ===== | ||
+ | |||
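+ | Bridges, like network interfaces, have an 'up' state and a 'down' state, and will only pass traffic when up. The current state can be displayed using the **ifconfig** command: | ||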
+ | |||
+ | <code bash> | ||
+ | ifconfig br0 | ||
+ | </code> | ||
+ | |||
+ | Here is an example of the output from this command for an interface that is down, with the relevant line highlighted: | ||
+ | |||
+ | <code> | ||
+ | br0 Link encap: | ||
+ | BROADCAST MULTICAST | ||
+ | RX packets:0 errors:0 dropped:0 overruns:0 frame:0 | ||
+ | TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 | ||
+ | collisions: | ||
+ | RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) | ||
+ | </code> | ||
+ | | ||
+ | and for the same interface when up: | ||
+ | |||
+ | <code> | ||
+ | br0 Link encap: | ||
+ | inet6 addr: fe80:: | ||
+ | UP BROADCAST RUNNING MULTICAST | ||
+ | RX packets:0 errors:0 dropped:0 overruns:0 frame:0 | ||
+ | TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 | ||
+ | collisions: | ||
+ | RX bytes:0 (0.0 B) TX bytes:168 (168.0 B) | ||
+ | </code> | ||
+ | | ||
+ | If the bridge needs to be brought up then this can be done using the **ifconfig** command: | ||
+ | |||
+ | <code bash> | ||
+ | ifconfig br0 up | ||
+ | </code> | ||
+ | |||
+ | The same considerations apply to each of the attached Ethernet interfaces: these can be brought up or down independently of the bridge, and they will only pass traffic if they are up. | ||
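When several interfaces need checking, the administrative state can be read from sysfs instead of inspecting **ifconfig** output by eye. This is a sketch only, assuming that the sysfs `flags` file holds the interface flags in hexadecimal and that bit 0x1 is the 'up' flag. The function names and `SYSFS_ROOT` are illustrative.

```shell
# Root of the sysfs tree; overridable for testing (illustrative variable).
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Succeeds if the interface has the "up" flag (bit 0x1) set.
is_up() {
    flags=$(cat "$SYSFS_ROOT/class/net/$1/flags" 2>/dev/null) || return 2
    [ $(( $flags & 1 )) -ne 0 ]
}

# Report the state of each interface named on the command line.
report_up_state() {
    for ifname in "$@"; do
        if is_up "$ifname"; then
            echo "$ifname: up"
        else
            echo "$ifname: not up"
        fi
    done
}
```

For example, `report_up_state br0 eth0 eth1` covers the bridge and both attached interfaces in one pass.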
+ | |||
+ | |||
+ | ===== Check whether the bridge ports are in the 'forwarding' state ===== | ||
+ | |||
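+ | At any given time, a Linux bridge port will be in one of five possible states: 'disabled', 'listening', 'learning', 'forwarding' or 'blocking'. The state of each port can be displayed using the **brctl showstp** command: | ||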
+ | |||
+ | <code bash> | ||
+ | brctl showstp br0 | ||
+ | </code> | ||
+ | |||
+ | the output from which should be of the form: | ||
+ | |||
+ | <code> | ||
+ | br0 | ||
+ | | ||
+ | | ||
+ | root port | ||
+ | max age 20.00 | ||
+ | hello time 2.00 | ||
+ | | ||
+ | | ||
+ | hello timer | ||
+ | | ||
+ | flags | ||
+ | |||
+ | |||
+ | eth0 (1) | ||
+ | port id 8001 state forwarding | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | flags | ||
+ | |||
+ | eth1 (2) | ||
+ | port id 8002 state forwarding | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | flags | ||
+ | </code> | ||
+ | |||
+ | The relevant fields have been highlighted. | ||
+ | |||
+ | ' | ||
+ | |||
+ | If you want a particular network segment to be used in preference to any other paths that might be available then there are two ways to achieve that safely: either change the network topology manually so that it becomes the only path, or adjust the STP path costs so that it becomes the cheapest path. Otherwise, be assured that the ‘blocking’ state is a normal part of the operation of STP and does not by itself indicate that there is a problem. | ||
+ | |||
+ | ' | ||
+ | |||
+ | ' | ||
+ | |||
+ | ' | ||
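On a bridge with many ports it may be easier to reduce the **brctl showstp** output to one line per port. The following sketch assumes the output layout shown above, in which each port section starts with a line such as `eth0 (1)` and the state follows the word `state`. The function names are illustrative.

```shell
# Print "<port> <state>" for each port of the given bridge,
# by parsing the output of "brctl showstp".
port_states() {
    brctl showstp "$1" | awk '
        # Port header lines look like "eth0 (1)".
        /^[^ \t]+ \([0-9]+\)/ { port = $1 }
        # Within a port section, the state follows the word "state".
        port != "" {
            for (i = 1; i < NF; i++)
                if ($i == "state") print port, $(i + 1)
        }'
}

# List only the ports that are not in the forwarding state.
non_forwarding_ports() {
    port_states "$1" | awk '$2 != "forwarding" { print $1 }'
}
```

For example, `non_forwarding_ports br0` prints nothing when every port is forwarding.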
+ | |||
+ | |||
+ | ===== Check which MAC addresses have been seen by the bridge ===== | ||
+ | |||
+ | In the course of its operation a bridge must attempt to determine which MAC addresses are reachable through each of its attached interfaces. | ||
+ | |||
+ | <code bash> | ||
+ | brctl showmacs br0 | ||
+ | </code> | ||
+ | |||
+ | The output is typically of the form: | ||
+ | |||
+ | <code> | ||
+ | port no mac addr is local? | ||
+ | 1 | ||
+ | 1 | ||
+ | 1 | ||
+ | 1 | ||
+ | 2 | ||
+ | 2 | ||
+ | 2 | ||
+ | </code> | ||
+ | |||
+ | The value of this information for troubleshooting is that it tells you whether any packets from a given machine are being processed by the bridge. If the address of the machine is missing from the table then the likely causes are that: | ||
+ | |||
+ | * packets from the machine in question are not reaching the bridge for some reason; | ||
+ | * the receiving interface is down (see above); | ||
+ | * the bridge port is disabled (see above); or | ||
+ | * the address was in the table but has since expired. | ||
+ | |||
+ | Addresses typically expire after 5 minutes, so this is unlikely to be an issue if packets are being actively sent at the time you check the table, but it is a point to bear in mind if there has been any substantial delay between sending and checking. | ||
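Searching the table for one particular address can be scripted. This is a minimal sketch, assuming that the MAC address is the second column of the **brctl showmacs** output, as in the header shown above. The function name is illustrative.

```shell
# Print the table rows for the given MAC address on the given bridge.
# Succeeds only if at least one row matches (comparison ignores case).
mac_seen() {
    bridge=$1
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    brctl showmacs "$bridge" | awk -v mac="$mac" '
        tolower($2) == mac { print; found = 1 }
        END { exit !found }'
}
```

For example, `mac_seen br0 <address> || echo "not in table"`, where the address is that of the machine under investigation.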
+ | |||
+ | |||
+ | ===== Check the Firewall ===== | ||
+ | |||
+ | ==== Check ebtables ==== | ||
+ | |||
+ | Ebtables is a packet filter that is similar in concept to iptables, except that it operates at the link layer rather than the network layer (acting on Ethernet frames as they are bridged as opposed to IP datagrams as they are routed). | ||
+ | |||
+ | If the ebtables command is available then you can view the rulebase using the **-L** option. | ||
+ | |||
+ | <code bash> | ||
+ | ebtables -t filter -L | ||
+ | </code> | ||
+ | |||
+ | Normally you would expect this to be empty: | ||
+ | |||
+ | <code> | ||
+ | Bridge table: filter | ||
+ | |||
+ | Bridge chain: INPUT, entries: 0, policy: ACCEPT | ||
+ | |||
+ | Bridge chain: FORWARD, entries: 0, policy: ACCEPT | ||
+ | |||
+ | Bridge chain: OUTPUT, entries: 0, policy: ACCEPT | ||
+ | </code> | ||
+ | |||
+ | The same applies to the nat and broute tables. | ||
+ | |||
+ | If the ebtables command is not installed then that strongly suggests ebtables is not being used, although it is conceivable that rules could have been added by some other means. | ||
+ | |||
+ | If the rulebase is non-empty then you can obtain some insight into what effect it might be having by inspecting the counters associated with each rule: | ||
+ | |||
+ | <code bash> | ||
+ | ebtables -t filter -L --Lc | ||
+ | </code> | ||
+ | |||
+ | Each rule has two counters: pcnt (the number of packets) and bcnt (the number of bytes). | ||
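Because rules may be present in any of the three tables, it can be convenient to list them all, with counters, in one pass. The sketch below simply repeats the command above for each table. The function name is illustrative, and the command will normally need to be run as root.

```shell
# List the rules and counters of each ebtables table in turn.
# Stops at the first table that cannot be listed.
ebtables_dump() {
    for table in filter nat broute; do
        echo "== $table =="
        ebtables -t "$table" -L --Lc || return 1
    done
}
```

Running `ebtables_dump` twice, a short time apart, shows which counters are increasing and therefore which rules are matching live traffic.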
+ | |||
+ | |||
+ | ===== Finding bridge loops ===== | ||
+ | |||
+ | Bridge loops are by their nature difficult to track down because the resulting packet storms will propagate throughout the entire network unless stopped. The packet source addresses are unlikely to be helpful because they say only where the traffic was originally sent from and not where it was replicated. | ||
+ | |||
+ | A more effective method is to partition the network until the symptoms disappear, then cautiously reconnect it one link at a time until they reappear. | ||
+ | |||
+ | A packet capture tool such as tcpdump or Wireshark can be used for monitoring. It does not matter greatly where this is attached, provided that you are not using bridges that have active protection against packet storms (see below). | ||
+ | |||
+ | You should also ensure that there is a source of broadcast traffic on the network, so that a packet storm will occur promptly whenever a loop is created. An ongoing attempt to ping a non-existent IP address on a local subnet will have the required effect. | ||
+ | |||
+ | If the act of reconnecting a network segment causes the symptoms to reappear then there are two possibilities to consider: | ||
+ | |||
+ | * The segment may form part of the loop that you are investigating, | ||
+ | * The loop may lie beyond that segment, in which case it existed throughout the test but was unreachable from the monitoring point while the segment was disconnected. | ||
+ | |||
+ | A characteristic of the first condition is that there will be connectivity between the two parts of the network even when the segment under test is disconnected. | ||
+ | |||
+ | If the loop lies beyond the disconnected segment then you can reconnect the remainder of the network, then repeat this diagnostic procedure for the problematic region in isolation. | ||
+ | |||
+ | A complicating factor is that some networking equipment attempts to detect packet storms and actively protect against them, typically by disabling the port receiving the traffic. In the best case, where the loop is located at the edge of the network, this can both contain the effects of the loop and greatly simplify diagnosis (as the cause may be obvious once you know which port has been disabled). | ||
+ | |||
+ | |||
+ | ===== Using STP ===== | ||
+ | |||
+ | Rather than attempting to find and break loops manually you can use the Spanning Tree Protocol (STP) to achieve the same result automatically: | ||
+ | |||
+ | <code bash> | ||
+ | brctl stp br0 yes | ||
+ | </code> | ||
+ | |||
+ | Ideally STP should be enabled on all bridges throughout the network. | ||
+ | |||
+ | For small networks STP should just work without further configuration. | ||
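After enabling STP it is worth confirming that the setting has taken effect on each bridge. The following is a sketch only, assuming that the sysfs `stp_state` file reports 0 when STP is off, 1 for the kernel implementation and 2 for a user-space implementation. The function name and `SYSFS_ROOT` are illustrative.

```shell
# Root of the sysfs tree; overridable for testing (illustrative variable).
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Report whether STP is enabled on the given bridge.
# Returns 0 if enabled, 1 if disabled, 2 if the state cannot be determined.
stp_status() {
    state=$(cat "$SYSFS_ROOT/class/net/$1/bridge/stp_state" 2>/dev/null) || {
        echo "$1: not a bridge (or does not exist)"
        return 2
    }
    case "$state" in
        0) echo "$1: STP disabled"; return 1 ;;
        1) echo "$1: STP enabled (kernel)" ;;
        2) echo "$1: STP enabled (user space)" ;;
        *) echo "$1: unknown STP state $state"; return 2 ;;
    esac
}
```

For example, `stp_status br0` after running **brctl stp br0 yes**.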
+ | |||
+ | |||
+ | ===== The machine appears to freeze ===== | ||
+ | |||
+ | As noted above, adding an interface to a bridge causes it to stop acting as an Internet Protocol endpoint. This can cause the machine to appear to freeze if: | ||
+ | |||
+ | * you are administering it remotely via that network interface, for example using SSH, or | ||
+ | * the machine depends on the network for vital services, for example NFS or LDAP. | ||
+ | |||
+ | The solution is to remove the interface from the bridge by the most graceful means possible. In order of preference: | ||
+ | |||
+ | - Log on using the console and issue a **brctl delif** command, for example **brctl delif br0 eth0**. | ||
+ | - Reboot the machine gracefully, for example by sending control-alt-delete to the console. | ||
+ | - Forcibly reboot the machine, for example by power-cycling it. | ||
+ | |||
+ | If the bridging commands have been inserted into the startup scripts then you will need to remove them. You may be able to do this by booting into a recovery mode or from a live CD. However, for a remotely hosted machine you may have to resort to reimaging it (with loss of all data). | ||
+ | |||
+ | |||
+ | ===== See also ===== | ||
+ | |||
+ | * [[Network: | ||
+ | * Persistently bridge traffic between two or more Ethernet interfaces (Debian) | ||
+ | * Persistently bridge traffic between two or more Ethernet interfaces (Red Hat) | ||
+ | * Persistently bridge traffic between two or more Ethernet interfaces (SUSE) | ||
+ | |||
+ | |||
network/troubleshooting_ethernet_bridging.1575139005.txt.gz · Last modified: 2020/07/15 09:30 (external edit)