While doing some encoding manipulation lately, I noticed an odd and somewhat annoying inconsistency in the .NET Framework API, specifically in the design of the System.Text.Encoding class. As it turns out, an Encoding can be known by several names, including an internal framework name (such as UTF-8 or IBM500) and a human-readable display name (such as "Unicode (UTF-8)" or "IBM EBCDIC (International)"). This is most evident when you call Encoding.GetEncodings(), which returns an array of EncodingInfo objects, each carrying the names its encoding is known by. Great.
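To see those names side by side, a minimal sketch like the following should do. The class name ListEncodings is just for illustration, and which encodings actually show up depends on the runtime (newer .NET versions list far fewer by default than the full .NET Framework).

```csharp
using System;
using System.Text;

static class ListEncodings
{
    static void Main()
    {
        // Each EncodingInfo carries the code page, the framework name
        // and the human-readable display name of one encoding.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine($"{info.CodePage,6}  {info.Name,-20}  {info.DisplayName}");
        }
    }
}
```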

Here's where you'll notice the first discrepancy: the human-readable name for an encoding is called EncodingName on an Encoding object, but DisplayName on the corresponding EncodingInfo object. It would be nice if both had been given the same name; that would have made it obvious that they are related.
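The mismatch is easy to see with a small sketch that prints both properties for the same encoding. It assumes UTF-8 is among the encodings returned by GetEncodings(), which I'd expect on any runtime; the class name is made up.

```csharp
using System;
using System.Text;

static class DisplayNameVsEncodingName
{
    static void Main()
    {
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            // Only look at UTF-8 to keep the output short.
            if (info.CodePage != Encoding.UTF8.CodePage) continue;

            Encoding enc = info.GetEncoding();

            // Same kind of information, two different property names.
            Console.WriteLine("EncodingInfo.DisplayName: " + info.DisplayName);
            Console.WriteLine("Encoding.EncodingName:    " + enc.EncodingName);
        }
    }
}
```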

The second oddity is even more annoying: when you call Encoding.GetEncoding() with a string parameter, you need to pass the internal framework encoding name, that is, the one you get from EncodingInfo.Name. But what happens if you already got an Encoding instance from somewhere? Well, it turns out there is no Name property on the Encoding class. The closest match is WebName, which the documentation describes as the IANA-registered name and which should be the same string, but nothing in the API makes that obvious once you've been handed an Encoding instance. So why is the name exposed as Name in one direction and not in the other? One reason might be that the designers preferred that you refer to the encoding by its CodePage, but that seems like a very unintuitive limitation.
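Here is a rough sketch of what going in each direction looks like, using UTF-8 as the example. The class name is hypothetical. The WebName line is there only for comparison: the documentation describes it as the IANA-registered name, so it may well be the same string as EncodingInfo.Name, but I'd treat that as an assumption to verify for the encodings you care about.

```csharp
using System;
using System.Text;

static class NameRoundTrip
{
    static void Main()
    {
        // By name: this is the kind of string EncodingInfo.Name reports.
        Encoding byName = Encoding.GetEncoding("utf-8");

        // By code page: the instance exposes CodePage, and
        // GetEncoding(int) accepts it, so this direction always works.
        Encoding byCodePage = Encoding.GetEncoding(byName.CodePage);

        Console.WriteLine(byName.CodePage);         // 65001 for UTF-8
        Console.WriteLine(byName.EncodingName);     // human-readable display name
        Console.WriteLine(byName.WebName);          // documented as the IANA-registered name
        Console.WriteLine(byCodePage.EncodingName); // same encoding, reached via the code page
    }
}
```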

This would be a moot point, however, if the Encoding class simply allowed you to get the corresponding EncodingInfo object :)
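In the meantime, one workaround is to search the EncodingInfo array for a matching code page. FindEncodingInfo below is a hypothetical helper of my own, not something the framework provides, and it assumes the code page identifies the entry you want.

```csharp
using System;
using System.Text;

static class EncodingLookup
{
    // Hypothetical helper: returns the EncodingInfo whose code page matches
    // the given Encoding, or null if the runtime does not list one.
    public static EncodingInfo FindEncodingInfo(Encoding encoding)
    {
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            if (info.CodePage == encoding.CodePage)
            {
                return info;
            }
        }
        return null;
    }

    static void Main()
    {
        EncodingInfo info = FindEncodingInfo(Encoding.UTF8);
        if (info != null)
        {
            // Name can be passed back to Encoding.GetEncoding(string).
            Console.WriteLine(info.Name + " / " + info.DisplayName);
        }
    }
}
```

It is a linear scan on every call, so if you need this often it would probably make sense to cache the results in a dictionary keyed by code page.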




Tomas Restrepo

Software developer located in Colombia. Sr. PFE at Microsoft.