I always enjoy a good troubleshooting challenge, and recently my good friend and colleage Martin Jalaf came up with an interesting one.
A customer was load testing a Web application. Like many others, it consisted of farm of load-balanced servers running an ASP.NET application front-end application, which then talked to a set of WCF services running on a second server farm. In both cases, an F5 device was used to load balanced HTTP traffic.
When the customer started increasing the load as part of their load tests, they would notice that calls to the WCF services would get slower, and would sometimes take upwards of 10 minutes to complete. However, the servers running the WCF services, as well as the SQL Server used showed little CPU usage.
Reviewing a few memory dumps of the front-end servers, all that could be observed was threads stuck calling the WCF service. Given this, our first suspicion was that WCF throttling was incorrectly configured. However, dumps of the WCF service revealed this was not the case. In fact, only a single call to the WCF service was running at any given time, and throttling settings were too high to be causing any issues. This could be easily observed using the excellent netext debugging extension:
0:000> !wservice Address State EndPoints BaseAddresses Behaviors Throttled Calls/Max Sessions/Max ConfigName,.NET Type 000000e9229c78d8 Opened 0n1 0n1 0n10 True 0n1/0n2000 0n0/0n400 "WCFSecurity.SecurityService",WCFSecurity.SecurityService
The WCF service was configured with InstanceContextMode=Single, and ConcurrencyMode=Multiple. However, this did not appear to be related at all. Another unexpected symptom was that despite the fact that the WCF servers were behind a load balancer device, all requests ended up being processed by the same server most of the time. I immediately suspected that the F5 device was configured with session affinity enabled, even though the customer had initially denied it.
Since the WCF service was exposed using an HTTP-based binding, I asked Martin to check the ASP.NET HttpContext instances, just in case they were somehow relevant. We immediately found that there were hundreds of HttpContext instances in the dump that had yet to complete!
0:000> !whttp HttpContext Thread Time Out Running Status Verb Url 000000e622a12850 -- 00:00:00 00:20:52 200 POST /WCFSecurity/SecurityService.svc 000000e622a18e58 -- 00:00:00 00:20:47 200 POST /WCFSecurity/SecurityService.svc 000000e622a23c58 -- 00:00:00 00:20:39 200 POST /WCFSecurity/SecurityService.svc ....
Looking closely, we realized that all HTTP calls were stuck in the AcquireRequestState stage of the ASP.NET pipeline, which meant we were actually observing an instance of the infamous ASP.NET Session Lock issue (see the section titled Concurrent Requests and Session State)!
Session: cq2uoplja4aqu2aoobw3f1pf HttpContext Thread Poll (ms) Time Path ================ ====== ========= ==== =================================== 000000e622a12850 500 1253 /WCFSecurity/SecurityService.svc 000000e622a18e58 500 1248 /WCFSecurity/SecurityService.svc 000000e622a23c58 500 1240 /WCFSecurity/SecurityService.svc 000000e622a25d50 500 1238 /WCFSecurity/SecurityService.svc 000000e622a27b60 500 1236 /WCFSecurity/SecurityService.svc 000000e622a283f8 500 1236 /WCFSecurity/SecurityService.svc ....
Why was ASP.NET Session being used here? The service was configured with:
<system.serviceModel> <serviceHostingEnvironment aspnetCompatibilityEnabled="true" /> </system.serviceModel>
This meant that the service was enabling access to the ASP.NET services, but the service itself was not using session state. What was really unexpected in this case was that all the requests from a single front-end server were ending up with the exact same ASP.NET session cookie, particularly because, by default, the client-side of WCF will not accept and send cookies at all.
Given the issues with load-balancing not working as expected, we also checked the cookies collection of the HttpContext, and found out that, indeed, the F5 server had session affinity enabled using cookies:
0x000000e922d5ced0 System.Collections.Specialized.NameObjectCollectionBase+NameObjectEntry 0000 Key : 000000e922d5cc48 "BIGipServerpl_ATV_qa_80"  (System.String) 0008 Value : 000000e922d5cc00 (System.Web.HttpCookie)
With all this data, the light bulb went on, and we realized that the WCF binding on the ASP.NET front-end application had to have the allowCookies property enabled! We looked at it, and, sure enough it was enabled. However, this only explained why session cookies were present, but not why why the same session ID was used for all requests.
Digging a bit into the WCF source code, we realized that cookie support is implemented at the ChannelFactory level in the HTTP transport. The ASP.NET Front-End application was using standard proxy classes derived from ClientBase<T>, which meant that by default, channel factory caching was enabled. This meant that, on a W3WP.EXE process on the front-end, every single call to the WCF service would end up using the same HTTP cookies.
Once the customer configured the binding with allowCookies=”false” and fixed the other issues found, application performance significantly improved!
The moral of the story is: be careful when you enable the allowCookies option when calling WCF services over HTTP. In this case the combination of allowCookies, channel factory caching, aspnetCompabilitityEnabled, and an incorrectly configured load-balancer completely killed the application capability to scale.