BizTalk and Single-Quoted Attributes in XML

The XML Specification allows attribute values to be wrapped with either double quotes (") or single quotes ('). Normally, BizTalk Server, as a good standard-abiding citizen, has no problem with this. However, I had been seeing messages on the BizTalk newsgroup claiming that BizTalk would refuse to process XML documents using single quotes when the XML Disassembler component tried to parse the incoming message.

I spent a few minutes researching this claim and found that there is indeed some truth to it. There appears to be a bug in the XML Disassembler when it runs into documents using single quotes, but it only affects some documents.

The first thing the XML Disassembler component does when it receives a message is probe it to see if it is indeed an XML message and try to "guess" the encoding it is in. As part of that, it looks not only for a BOM (Byte Order Mark), but also for an <?xml?> declaration containing an encoding attribute.
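To make the BOM half of that probing step concrete, here is a minimal sketch of how a byte order mark can be detected at the start of a message. This is not the disassembler's actual code; the class and method names (BomSniffer, DetectBom, HasPrefix) are made up for illustration, and only the UTF-8 and UTF-16 marks are handled.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark, or null if none is found.
    // Simplification: a UTF-32 LE mark (FF FE 00 00) would be reported as UTF-16 LE here.
    public static Encoding DetectBom(byte[] data)
    {
        if (HasPrefix(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (HasPrefix(data, 0xFF, 0xFE)) return Encoding.Unicode;          // UTF-16, little endian
        if (HasPrefix(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode; // UTF-16, big endian
        return null;
    }

    // True when data starts with exactly the given bytes.
    private static bool HasPrefix(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };
        byte[] withoutBom = Encoding.ASCII.GetBytes("<?xml version='1.0'?><a/>");

        Console.WriteLine(DetectBom(withBom) != null);    // True
        Console.WriteLine(DetectBom(withoutBom) != null); // False
    }
}
```

When no BOM is present, the declaration's encoding attribute is the only remaining hint, which is why the way it gets parsed matters so much.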

Here's the problem: if the encoding attribute in the XML declaration is wrapped in single quotes, the parsing fails. In other words, BizTalk has a real issue with a document that begins like this:

<?xml version='1.0' encoding='utf-8'?>

It doesn't have a problem with this, however:

<?xml version='1.0' encoding="utf-8"?>

In fact, as long as the encoding attribute is missing or is wrapped with double quotes, BizTalk will happily accept and correctly parse the message. If not, BizTalk will fail with the following error:

There was a failure executing the receive pipeline: "Microsoft.BizTalk.DefaultPipelines.XMLReceive, Microsoft.BizTalk.DefaultPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Source: "XML disassembler" Receive Port: "ReceivePort1" URI: "C:\temp\BizTalk\TestIn\*.xml" Reason: Length cannot be less than zero.
Parameter name: length

Looking through the XML Disassembler code with Reflector makes it very clear that the Probe() method of the XmlDasmComp class (i.e. the disassembler component) is fatally flawed: it attempts to parse the <?xml?> declaration manually and explicitly expects the encoding attribute value to be wrapped in double quotes. With single quotes, IndexOf('"') finds nothing and returns -1, so the final Substring() call is asked for a negative length, which is exactly what the error message above reports. Here's the relevant bit of code:

if (text2.Contains("encoding"))
{
    text1 = text2.Substring(text2.IndexOf("encoding"));
    text1 = text1.Substring(text1.IndexOf('"') + 1);
    text1 = text1.Substring(0, text1.IndexOf('"'));
}
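The failure is easy to reproduce outside BizTalk. The sketch below runs the same three string operations in a standalone console program; the names ProbeRepro and ExtractEncoding are mine, and I'm assuming the string being searched holds the declaration text, which the decompiled code doesn't show.

```csharp
using System;

public static class ProbeRepro
{
    // Same string operations as the decompiled snippet, with the variables renamed.
    public static string ExtractEncoding(string declaration)
    {
        string encoding = null;
        if (declaration.Contains("encoding"))
        {
            // Keep everything from the word "encoding" onwards.
            encoding = declaration.Substring(declaration.IndexOf("encoding"));
            // Skip past the opening double quote. IndexOf returns -1 if there is none,
            // so this becomes Substring(0) and silently leaves the text unchanged.
            encoding = encoding.Substring(encoding.IndexOf('"') + 1);
            // Cut at the closing double quote. With no double quote present this is
            // Substring(0, -1), which throws ArgumentOutOfRangeException.
            encoding = encoding.Substring(0, encoding.IndexOf('"'));
        }
        return encoding;
    }

    public static void Main()
    {
        // Double-quoted encoding value: parsed correctly.
        Console.WriteLine(ExtractEncoding("<?xml version='1.0' encoding=\"utf-8\"?>")); // utf-8

        // Single-quoted encoding value: the last Substring call throws.
        try
        {
            ExtractEncoding("<?xml version='1.0' encoding='utf-8'?>");
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine(ex.GetType().Name); // ArgumentOutOfRangeException
        }
    }
}
```

Note that the quotes around the version attribute never matter, because the search only starts at the word "encoding".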

Unfortunately, there's no easy workaround short of modifying the incoming message, either by hand or through a custom decoding pipeline component that fixes it.
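For what it's worth, the rewriting itself is simple. Here is a rough sketch of the transformation such a component would need to apply: switch single-quoted values to double quotes inside the XML declaration only, and leave the rest of the document alone. The names XmlDeclarationFixer and NormalizeQuotes are hypothetical, the code assumes the message has already been decoded into a string, and the plumbing needed to host it in an actual pipeline component is left out entirely.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlDeclarationFixer
{
    // An XML declaration at the very start of the text, optionally preceded by a BOM character.
    private static readonly Regex Declaration =
        new Regex(@"\A\uFEFF?<\?xml\s[^>]*\?>", RegexOptions.CultureInvariant);

    // A single-quoted value that contains no quote characters of either kind.
    private static readonly Regex SingleQuoted = new Regex(@"'([^'""]*)'");

    // Rewrites 'value' as "value" inside the declaration; everything after it is untouched.
    public static string NormalizeQuotes(string xml)
    {
        Match decl = Declaration.Match(xml);
        if (!decl.Success)
        {
            return xml; // No declaration, so there is nothing to fix.
        }

        string fixedDecl = SingleQuoted.Replace(decl.Value, "\"$1\"");
        return fixedDecl + xml.Substring(decl.Length);
    }

    public static void Main()
    {
        string input = "<?xml version='1.0' encoding='utf-8'?><root a='b'/>";
        Console.WriteLine(NormalizeQuotes(input));
        // <?xml version="1.0" encoding="utf-8"?><root a='b'/>
    }
}
```

Restricting the rewrite to the declaration keeps the change as small as possible, since single-quoted attributes elsewhere in the document are not what trips up the probing code. A real component would have to work on the message stream and cope with the message's actual byte encoding, which this sketch ignores.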


Comments (4)

Dexter Legeaspi, October 5th, 2006 at 5:52 pm

I just wrote a similar entry in my blog and wrote a decoding pipeline component:
http://www.dexterlegaspi.com/journal/?p=31

Tomas Restrepo, October 5th, 2006 at 6:01 pm

Dexter,
Cool! Just subscribed to your weblog as well, some nice articles you got there!

Jan Eliasen, October 6th, 2006 at 1:11 am

Hi
Have you reported it to Microsoft? They will need to fix that in the next service pack and/or BizTalk 2006 R2.

eliasen

Tomas Restrepo, October 6th, 2006 at 5:33 am

Jan,
Not yet, but I’ll try doing so if I have some free time when I get my support contract renewed :)
