Affected: Server-side SDKs, Relay Proxy
Symptoms
The following symptoms can occur during initialization, or intermittently while the Relay Proxy or SDK is running and connected to LaunchDarkly:
- SDK serving stale variations
- Frequent log messages indicating network issues
Cause
It's possible to create so many SDK instances that your application runs out of resources and cannot open new connections to LaunchDarkly.
If you see timeouts on a host that you can't reproduce locally with our hello-world sample application or in our automated testing, one possible explanation is a bottleneck somewhere that causes requests to time out. There are three potential causes for such bottlenecks:
- Too many SDK instances due to over-instantiation of the SDK
- Too many streaming connections for the host due to containerization
- Too many streaming connections causing internal network congestion
Solution
Before setting up the Relay Proxy as a solution, identify whether you're over-instantiating the SDK. The Relay Proxy can hide the symptoms of over-instantiation without fixing the cause, which can lead to greater problems later.
Follow the troubleshooting recommendations below in the order listed:
1. Too many SDK instances due to over-instantiation of the SDK
Problems only begin when you create SDK instances excessively, and accidental over-instantiation is usually hard to detect. Typically, code creates an SDK instance and then loses its reference without closing it, so the instance leaks. Each leaked instance still tries to maintain its own streaming connection. As the number of leaked instances grows, they consume more of the available resources, and streaming connections become more likely to fail.
To address this, implement the SDK as a singleton: create one instance when your application starts and reuse it everywhere you evaluate flags.
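A minimal sketch of a thread-safe singleton in Python follows. `FlagClient` and `get_client()` are hypothetical names, not LaunchDarkly APIs; in a real application you would construct your SDK's actual client inside `get_client()`, following your SDK's documented initialization pattern. The construction counter is an illustrative way to confirm that only one instance is ever created, which also helps surface accidental over-instantiation.

```python
import threading


class FlagClient:
    """Hypothetical stand-in for a server-side SDK client.

    A real client would open a streaming connection to LaunchDarkly when
    constructed; this one only counts constructions so leaks are visible.
    """

    instances_created = 0

    def __init__(self, sdk_key: str) -> None:
        FlagClient.instances_created += 1
        self.sdk_key = sdk_key

    def close(self) -> None:
        # A real client would shut down its streaming connection here.
        pass


_client = None
_client_lock = threading.Lock()


def get_client(sdk_key: str) -> FlagClient:
    """Return the process-wide client, creating it on first use only."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock so two threads racing on first use
            # cannot both construct a client.
            if _client is None:
                _client = FlagClient(sdk_key)
    return _client


def handle_request(sdk_key: str) -> FlagClient:
    # Anti-pattern: calling FlagClient(sdk_key) here would create, and
    # eventually leak, one client (and one stream) per request.
    return get_client(sdk_key)


if __name__ == "__main__":
    threads = [
        threading.Thread(target=handle_request, args=("sdk-key-placeholder",))
        for _ in range(50)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(FlagClient.instances_created)  # 1
```

Because the key is read only on the first call, later calls with a different key return the existing client. Logging or alerting when a construction counter like this exceeds one is a cheap way to catch leaks in testing.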
2. Too many streaming connections for the host due to containerization
If you have a containerized application, you may encounter a similar issue. When many containers on the same host each run their own SDK instance, the host can accumulate so many streaming connections that it becomes a bottleneck, and some of those connections time out. The solution is to use a node/host-level Relay Proxy. The Relay Proxy serves the LaunchDarkly endpoints that the SDKs need to operate and maintains a single streaming connection back to LaunchDarkly.
For example, run the Relay Proxy in its own container and connect all of the SDKs on that node to it. Network requests between containers on the same node carry less overhead than requests that leave the node for LaunchDarkly.
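As a rough sketch, each container can read the address of its node-local Relay Proxy from the environment and use it for all three endpoints a server-side SDK talks to (flag data, streaming, and events). The `RELAY_PROXY_URL` variable, the `SDKEndpoints` class, and the fallback hostname `relay` are hypothetical; the real option names for overriding these endpoints differ between SDKs, so check your SDK's configuration reference. Port 8030 is assumed because it is the Relay Proxy's default listening port.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class SDKEndpoints:
    """Hypothetical container for the endpoints a server-side SDK uses."""

    base_uri: str    # flag data requests
    stream_uri: str  # streaming connection
    events_uri: str  # analytics events


def relay_endpoints(env: Mapping[str, str] = os.environ) -> SDKEndpoints:
    """Point all SDK traffic at the node-local Relay Proxy container.

    RELAY_PROXY_URL is a hypothetical per-node setting; "http://relay:8030"
    assumes a Relay Proxy container reachable under the hostname "relay".
    """
    url = env.get("RELAY_PROXY_URL", "http://relay:8030").rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"RELAY_PROXY_URL is not a valid http(s) URL: {url!r}")
    # The Relay Proxy serves all three endpoint families from one address,
    # so every SDK on the node shares the proxy's single upstream stream.
    return SDKEndpoints(base_uri=url, stream_uri=url, events_uri=url)
```

Keeping the address in one environment variable lets every container on a node be configured identically, with only the node's Relay Proxy holding a connection to LaunchDarkly.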
3. Too many streaming connections causing internal network congestion
Although it does happen, you're unlikely to encounter network-level bottlenecks before running into node/host-level bottlenecks.
If you still encounter timeouts after implementing the node/host-level Relay Proxy solution, you may be able to resolve them by creating a network-level Relay Proxy that all of your nodes/hosts connect to. This reduces your outgoing traffic to a single streaming connection to LaunchDarkly.
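The effect of each step on streaming connections can be worked through with a small model. It assumes each SDK instance holds exactly one streaming connection, each node-level Relay Proxy holds one upstream connection, and a network-level Relay Proxy holds the only connection to LaunchDarkly. The `Deployment` fields and topology names are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Illustrative deployment shape; all values are hypothetical inputs."""

    nodes: int                            # hosts in the network
    containers_per_node: int              # app containers on each host
    sdk_instances_per_container: int = 1  # > 1 means over-instantiation


TOPOLOGIES = ("direct", "node_relay", "network_relay")


def outbound_streams(d: Deployment, topology: str) -> int:
    """Streaming connections leaving your network for LaunchDarkly."""
    if topology == "direct":
        # Every SDK instance on every container streams from LaunchDarkly.
        return d.nodes * d.containers_per_node * d.sdk_instances_per_container
    if topology == "node_relay":
        # One Relay Proxy per node streams from LaunchDarkly.
        return d.nodes
    if topology == "network_relay":
        # A single network-level Relay Proxy streams for everyone.
        return 1
    raise ValueError(f"unknown topology: {topology!r}")


def streams_per_host(d: Deployment, topology: str) -> int:
    """Outbound streaming connections a single host must hold open."""
    if topology == "direct":
        return d.containers_per_node * d.sdk_instances_per_container
    if topology in ("node_relay", "network_relay"):
        # Only the node's Relay Proxy container streams out of the host.
        return 1
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    d = Deployment(nodes=20, containers_per_node=30)
    for t in TOPOLOGIES:
        print(t, outbound_streams(d, t), streams_per_host(d, t))
    # direct 600 30
    # node_relay 20 1
    # network_relay 1 1
```

Setting `sdk_instances_per_container=5` raises the direct count to 3000 while the node-level figure stays at 20: the leaked instances still connect to the Relay Proxy, but the outbound numbers no longer show it. This is why it's worth ruling out over-instantiation before adding a Relay Proxy.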