Yesterday morning I was going through some client/server code I had written, mostly custom serialization/deserialization code. The code itself was working fine; I had a decent set of unit tests for it. Instead, I was focusing on some integration testing, as well as measuring how it performed when large object sets were serialized and deserialized.

The code itself is fairly simple, and although it uses XML as the serialization format, it is still fairly efficient (it's XmlReader/XmlWriter based); however, I needed to see how other factors would impact it. On the server side, I had built a TextWriter implementation on top of my networking stack, which kept the code really simple.

In my first test, I tried having it work with a small set of objects. The result? Doing real work: 110ms; serializing the results and sending them over the wire: 1100ms. Not very good.

Looking at it a bit more, I realized that the problem was that the underlying networking stack didn't do any buffering at all: if I sent 3 bytes, it would send a TCP packet with those 3 bytes right away. Since the XmlTextWriter class makes a lot of single-character Write(char) calls on the underlying TextWriter, performance was abysmal.
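To make that cost concrete, here is a minimal sketch in Python rather than .NET. It assumes a hypothetical `CountingTransport` whose every `send()` call stands in for one TCP packet, and a hypothetical `UnbufferedWriter` that passes each write straight through, the way the original TextWriter did. A serializer that writes character by character then produces one packet per character:

```python
class CountingTransport:
    """Hypothetical stand-in for the networking stack: each send() call
    represents one TCP packet going out on the wire."""

    def __init__(self):
        self.packets = []

    def send(self, data: bytes) -> None:
        self.packets.append(data)


class UnbufferedWriter:
    """Hypothetical writer that forwards every write directly to the transport."""

    def __init__(self, transport, encoding="utf-8"):
        self.transport = transport
        self.encoding = encoding

    def write(self, text: str) -> None:
        # No buffering: every call, however small, becomes its own send.
        self.transport.send(text.encode(self.encoding))


transport = CountingTransport()
writer = UnbufferedWriter(transport)
# Emulate a serializer that emits markup one character at a time.
for ch in '<item id="1"/>':
    writer.write(ch)
print(len(transport.packets))  # prints 14: one packet per character
```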

So I went ahead and implemented a simple buffering scheme inside my TextWriter object, and that worked fine. Second result? Doing real work: 85ms; serializing the results: 120ms.
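A simple buffering scheme of that kind might look roughly like this. It is a sketch in Python, not the actual TextWriter code: `BufferingWriter` and `CountingTransport` are hypothetical names, and the 4096-character threshold is an assumption. Writes accumulate in memory and only reach the transport when the threshold is crossed or when the writer is flushed or closed:

```python
class CountingTransport:
    """Hypothetical stand-in for the networking stack: each send() call
    represents one TCP packet going out on the wire."""

    def __init__(self):
        self.packets = []

    def send(self, data: bytes) -> None:
        self.packets.append(data)


class BufferingWriter:
    """Hypothetical writer that batches small writes into larger sends.

    buffer_size is measured in characters (an assumption for this sketch).
    Because the check happens after appending, a single large write can
    produce a send bigger than buffer_size."""

    def __init__(self, transport, buffer_size=4096, encoding="utf-8"):
        self.transport = transport
        self.buffer_size = buffer_size
        self.encoding = encoding
        self._chunks = []
        self._pending = 0

    def write(self, text: str) -> None:
        self._chunks.append(text)
        self._pending += len(text)
        if self._pending >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Send everything buffered so far as a single chunk.
        if self._pending:
            self.transport.send("".join(self._chunks).encode(self.encoding))
            self._chunks.clear()
            self._pending = 0

    def close(self) -> None:
        # Without this final flush the last partial buffer would never be sent.
        self.flush()


transport = CountingTransport()
writer = BufferingWriter(transport, buffer_size=4096)
for _ in range(10_000):
    writer.write("x")
writer.close()
print([len(p) for p in transport.packets])  # prints [4096, 4096, 1808]
```

Writing 10,000 characters one at a time now costs three sends instead of 10,000. The trade-off is that the writer must be flushed or closed at the end of each message, or the tail of the output stays stuck in the buffer.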

I then started looking at how it would perform with large object sets (a few thousand objects). My first result? Doing real work: 6640ms; serializing the results: 130ms. That can't be right! I tried it a few more times, always with the same results.

Indeed, it wasn't right. It turned out that while I was sending small result sets, all was well, because the serialized representation of the objects was smaller than the buffer size I was initially testing with (about 4KB). As soon as the output went over that size, however, I started sending larger packets down the network in quick succession, and the underlying network library would throw an exception after the second or third packet was sent. So failing was turning out to be faster than succeeding!

At first I didn't notice the exception because the serialization code was wrapped in a try/catch block that, if it trapped an exception, would serialize the error information and send it along the wire as well. That was pretty useless, since the wire was precisely what was failing :-).
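One possible way to avoid that trap, sketched in Python and not taken from the original code, is to treat failures of the wire itself differently from other failures: record them locally and re-raise, instead of trying to report them over the connection that just broke. This assumes transport failures surface as `ConnectionError` (the real networking library's exception type isn't named here); `FlakyTransport` and `send_results` are hypothetical names for illustration:

```python
class FlakyTransport:
    """Hypothetical transport that raises after `fail_after` successful sends,
    standing in for the network library throwing mid-stream."""

    def __init__(self, fail_after: int):
        self.fail_after = fail_after
        self.packets = []

    def send(self, data: bytes) -> None:
        if len(self.packets) >= self.fail_after:
            raise ConnectionError("transport failed")
        self.packets.append(data)


def send_results(transport, chunks, log):
    """Hypothetical helper: send serialized chunks, reporting only
    non-transport errors back to the peer."""
    try:
        for chunk in chunks:
            transport.send(chunk)
    except ConnectionError as exc:
        # The wire is what failed, so writing the error to it would fail too.
        # Record it locally and let the caller see the original exception.
        log.append(f"transport error: {exc}")
        raise
    except Exception as exc:
        # Some other failure (e.g. a serialization bug): the connection may
        # still be usable, so sending the error to the peer is reasonable.
        transport.send(f"<error>{exc}</error>".encode("utf-8"))


def chunks_with_bug():
    # Hypothetical serializer output that fails partway through.
    yield b"<a/>"
    raise ValueError("cannot serialize object")


log = []
broken = FlakyTransport(fail_after=2)
try:
    send_results(broken, [b"<a/>", b"<b/>", b"<c/>"], log)
except ConnectionError:
    pass
print(len(broken.packets), log)  # prints 2 ['transport error: transport failed']

healthy = FlakyTransport(fail_after=10)
send_results(healthy, chunks_with_bug(), log)
print(healthy.packets[-1])  # prints b'<error>cannot serialize object</error>'
```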



Tomas Restrepo

Software developer located in Colombia.