Effect of the Spanning Tree Protocol on SOSS


The ScaleOut service can be affected by managed network switches on which the Spanning Tree Protocol (STP) is enabled, particularly while the service is starting up. Under normal conditions, a starting ScaleOut service monitors the network for other SOSS hosts. Once communication with those hosts is detected and established, the hosts form a store membership containing all active SOSS hosts.

On a switch with STP enabled, a port that has just come up does not forward traffic until STP has taken it through its listening and learning states. During this interval the ScaleOut service can bind to its own network ports but cannot yet reach the other hosts. The host may then become isolated, create its own store, and begin taking load on its own, which can also lead to data loss. A typical symptom of this scenario is multiple unexpected restarts of the ScaleOut service on the hosts after one of the physical servers is rebooted.
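
To check whether a rebooted host is in this state, you can probe TCP connectivity to the other SOSS hosts before starting the service or before joining the host manually. The Python sketch below is a minimal illustration and is not part of SOSS; the peer addresses and port are assumptions you must supply. A refused connection is treated as a working path, because the refusal itself shows that the switch forwarded the packets. Any other error, such as a timeout or an unreachable network, counts as a failure.

```python
import socket
import time


def path_is_forwarding(host, port, timeout=1.0):
    """Return True if packets reach host:port, i.e. the network path is up.

    A completed handshake or an explicit refusal (TCP RST) both show that
    the switch is forwarding; a timeout or unreachable error does not.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Peer answered with RST: the path is up, the port is just closed.
        return True
    except OSError:
        # Timeout, no route to host, name resolution failure, etc.
        return False


def all_paths_forwarding(peers, timeout=1.0):
    """Return True only if every (host, port) pair in peers is reachable."""
    return all(path_is_forwarding(host, port, timeout) for host, port in peers)


def wait_for_network(peers, deadline_s=60.0, poll_s=2.0, timeout=1.0):
    """Poll until all peers are reachable or deadline_s seconds have elapsed.

    Returns the number of seconds waited on success, or None on timeout.
    """
    start = time.monotonic()
    while True:
        if all_paths_forwarding(peers, timeout):
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(poll_s)
```

For example, wait_for_network([("10.0.0.11", 9999), ("10.0.0.12", 9999)], deadline_s=90) (hypothetical addresses and port) returns how many seconds passed before both peers became reachable. Run right after a reboot, this gives a rough idea of how long your switches take to start forwarding. Choose a port that the peers either accept or actively refuse: a firewall that silently drops the probe will look the same as a blocked switch port.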

If STP is enabled on the network switches to which the SOSS hosts are connected, you can mitigate these issues in either of two ways:
  • Set the startup_delay parameter in the SOSS host configuration file (soss_params.txt) on all hosts. This parameter is the time, in seconds, that the ScaleOut service waits before attempting to use the network. A value of 30 should normally be enough for network communication to be established: with the IEEE 802.1D default forward delay of 15 seconds, a switch port spends about 15 seconds listening and another 15 seconds learning before it begins forwarding traffic.
  • Disable automatic joining by setting the auto_join parameter to 0 in the SOSS host configuration file on all hosts. Then wait until the network is fully operational before manually joining the host to an existing store.
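
Because the same change has to be made on every host, it can help to script it. The Python sketch below assumes that soss_params.txt holds one "name value" pair per line and that lines beginning with "#" are comments; check both assumptions against your own file. The file's location depends on your installation, and set_param and update_params_file are hypothetical helpers, not ScaleOut tools.

```python
from pathlib import Path


def set_param(text, name, value):
    """Return text with parameter `name` set to `value`.

    Assumes one 'name value' pair per line and '#' comment lines (assumption
    about the soss_params.txt layout). Existing lines for `name` are rewritten
    in place; if none exist, a new line is appended.
    """
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # assumed comment syntax; leave comments untouched
        parts = stripped.split(None, 1)
        if parts and parts[0] == name:
            lines[i] = f"{name} {value}"
            found = True
    if not found:
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def update_params_file(path, updates):
    """Apply every name -> value pair in `updates` to the file at `path`."""
    p = Path(path)
    text = p.read_text()
    for name, value in updates.items():
        text = set_param(text, name, value)
    p.write_text(text)
```

For the first mitigation, call update_params_file(path, {"startup_delay": 30}); for the second, pass {"auto_join": 0} instead. Because existing lines are rewritten rather than duplicated, running the script more than once leaves the file unchanged.
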
For more information on configuring STP correctly, see: http://www.networkworld.com/community/blog/9-common-spanning-tree-mistakes