I've just posted a new (and very simple) sample custom pipeline component for BizTalk Server: the FixEncodingComponent. It lets you tell the XmlDisassembler exactly which encoding (charset) it should use to interpret the message.
This works around the problem you sometimes hit where the XmlDisassembler barfs on an incoming XML message because it cannot figure out on its own what the message's encoding is. According to the BizTalk documentation, here are the rules the XmlDisassembler uses to figure out a message's encoding:
- If a byte order mark exists in the data, encoding information is determined from it.
- Otherwise, if the IBaseMessagePart.Charset property is set, the encoding specified there is used.
- Otherwise, if the XML declaration is present in the XML document, the encoding specified there is used, provided the XML declaration is ANSI.
- Otherwise, UTF-8 encoding is used.
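To make the order of those rules concrete, here is a minimal Python sketch of the same decision chain. It only illustrates the documented precedence; it is not the XmlDisassembler's actual code. The function name `guess_encoding` and the `charset` argument (standing in for the IBaseMessagePart.Charset property) are hypothetical.

```python
import codecs
import re
from typing import Optional

# Byte order marks, with UTF-32 listed before UTF-16 because the UTF-32 LE
# BOM starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Finds encoding="..." inside an XML declaration that is readable as ASCII.
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def guess_encoding(data: bytes, charset: Optional[str] = None) -> str:
    """Apply the four rules in order and return an encoding name."""
    # Rule 1: a byte order mark wins over everything else.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: an explicitly set charset (what FixEncodingComponent provides).
    if charset:
        return charset
    # Rule 3: the encoding named in the XML declaration, if it can be read.
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # Rule 4: fall back to UTF-8.
    return "utf-8"
```

Note that an explicitly set charset is only consulted when there is no byte order mark, so it cannot override one.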
The FixEncodingComponent relies on the second rule: it sets the Charset property on the message part. That is by far the simplest option, and it doesn't require fiddling around with the message stream, which would be both slower and more error-prone. Here's how to use it:
- Create a new custom receive pipeline.
- Add the XmlDisassembler to the pipeline and configure it to your needs.
- Add the FixEncodingComponent to the pipeline in the Decode stage, so that it runs before the XmlDisassembler.
- Set the Encoding parameter of the component in the Property Explorer to the encoding you want to use. To make this easier, the component implements a custom UITypeEditor that pops up a list of all the encodings supported by the .NET Framework.

The sample includes a test project that uses routing between ports, along with a test XML message encoded in IBM EBCDIC (International) (the File -> Advanced Save Options command in Visual Studio is pretty helpful for creating one). To see the pipeline component in action, configure the receive location first with the standard XmlReceive pipeline and then with the custom pipeline included in the sample, and notice how the first one fails and the second one works :)
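To see why an EBCDIC message trips up the default detection, here is a small Python sketch (not BizTalk code). It assumes Python's `cp500` codec as the equivalent of IBM EBCDIC (International); the sample document and the `can_decode` helper are made up for illustration.

```python
# A hypothetical test message, encoded the way the sample's test file is.
xml_text = '<?xml version="1.0"?><order id="42"/>'
ebcdic_bytes = xml_text.encode("cp500")


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# There is no BOM, and the XML declaration is not readable as ASCII,
# so nothing in the bytes themselves reveals the encoding.
starts_like_ascii_xml = ebcdic_bytes.startswith(b"<?xml")

# The UTF-8 fallback cannot make sense of the bytes...
utf8_works = can_decode(ebcdic_bytes, "utf-8")

# ...but decoding succeeds once the right encoding is named explicitly.
decoded = ebcdic_bytes.decode("cp500")
```

Naming the encoding up front is the only thing that changes between the failing and the working case; the bytes are identical.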

Hi Tomas,
Will this component transcode from one encoding to another, e.g. from Shift-JIS to EBCDIK (Katakana)?
Or will it just set the encoding type that is used for the message? Please reply.
Thanks,
Smithesh
It will not transcode the message; all it does is make sure BizTalk knows what encoding it is in so that it can decode it appropriately.
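The difference between the two can be sketched in a few lines of Python (an illustration only, unrelated to the component's source). The sample text is made up, and UTF-8 is used as the target encoding purely for the sake of the example.

```python
text = "カタカナ"
sjis_bytes = text.encode("shift_jis")

# Declaring the encoding: the bytes are left untouched; the reader is
# simply told how to interpret them.
decoded = sjis_bytes.decode("shift_jis")

# Transcoding: the bytes themselves are rewritten in another encoding.
utf8_bytes = sjis_bytes.decode("shift_jis").encode("utf-8")
```

Only the first operation corresponds to what is described above; the second would need a separate step that rewrites the message body.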
Hi Tomas,
Will this DLL work on a 64-bit machine? It works fine on a 32-bit machine, but we have found problems executing it on 64-bit.
Can you please let us know?
Thanks in advance.
Regards,
Sachin.
@Sachin: To be honest, I haven’t tried it myself (or touched that code in a while), as I don’t have a 64-bit BizTalk installation at the moment. However, if you’ve got an error message or any info about reproducing the problem, let me know (here or an email would do fine) and I’ll see what I can do to help you resolve it.
I am using your Fix Message Encoding component in the Decode stage, with the EDI disassembler in the Disassemble stage, but I keep getting the same generic error (no disassemble stage can recognize the data). If I write the same message body to a file and use FILE as the transport, it works just fine, but if I try to pick up the same message from MSMQ with the EDI disassembler pipeline, I get the same error every time. I am not sure which encoding I should pick, though I have tried almost all the ones that make sense to me.
@Jay: It might depend on how exactly you’re sending the message over MSMQ (I’ve talked about this before, see http://winterdom.com/2008/04/biztalk2006msmqandbiztalk2002 ). Also, to be honest, I don’t know if the EDI Disassembler pays attention to the same charset property that the XML and FF disassemblers do, that’s something I’d need to look into.
Tomas, will this component work with the FlatFileDisassembler as well, or only with the XmlDisassembler?
What is the difference between the “Set Message Encoding” component and the “Fix Message Encoding” component? I get both from the downloaded DLL.
Once I opened your source, your comments made my previous question redundant:
Fix – Fixes the encoding of incoming XML messages
Set – Sets the encoding to use when sending out messages
BTW, I also tested this with the FlatFileDisassembler and it works just fine.
Thanks Tomas! Great work!
Hi Tomas,
Thanks for your solution. I’m a bit of a beginner with BizTalk Server, so perhaps my question is silly.
I’ve downloaded the solution, but I don’t know how to add it to my pipeline!
Could you please tell me, step by step, how I should import your DLL into a pipeline?
Many thanks,
Farzad
@Farzad: Build the custom pipeline component; close VS; put the new DLL in the GAC and copy it to the “Pipeline Components” folder under your BizTalk installation folder; restart VS. When editing a pipeline, the new component should then appear in the toolbox (if not, just right-click on the toolbox and do a Choose Items).
Hi,
I have used your pipeline component. It is great code, thanks.
But I want to catch the exception when invalid XML is received.
Can you help me with that?
thanks again,
Vishal Sharma
Hi Tomas, how are you doing?
I am having some trouble. I am trying to use the Fix Encoding component, and in all of my tests the resulting XML has weird symbols just before the root element, so I get the following exception:
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: 7c605e4b-0eaf-4308-a026-a4258a3e2413
Shape name: ExportAssociate
ShapeId: f42d9bfe-fdb9-42a2-819c-c0a94452852c
Exception thrown from: segment 2, progress 18
Inner exception: Root element is missing.
Exception type: XmlException
Source: System.Xml
Target Site: Void Throw(System.Exception)
The following is a stack trace that identifies the location where the exception occured
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at CapGemini.Adecco.AW.Bts.SE.Wrapper.ExportAssociate.GetXmlValue(String sXmlAssociate, String sField)
at CapGemini.Adecco.AW.Bts.SE.IEM_ExportAssociate.segment2(StopConditions stopOn)
at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)
The header of the resulting XML is as follows; note the weird symbols just before the root element:

…
…
I am using it with BizTalk 2010 and VS2010. What could the problem be?
Thanks a lot, and great job!!
@Julian: Those strange characters are probably a BOM (Byte Order Mark). Can you open up the message in a hex editor and see what they are? You should be able to figure out the right encoding from that.
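If a hex editor isn't handy, a few lines of script can do the same check. The following Python sketch (the `describe_bom` helper and the sample bytes are hypothetical) names the BOM at the start of a byte string and shows how a BOM that is decoded along with the text ends up as a stray character in front of the root element.

```python
import codecs

# Known byte order marks; UTF-32 comes first because its little-endian BOM
# begins with the same bytes as the UTF-16 little-endian BOM.
KNOWN_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def describe_bom(data: bytes) -> str:
    """Name the BOM at the start of the data, with its bytes in hex."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return f"{name} BOM ({bom.hex()})"
    return "no BOM"


# Hypothetical message bytes: a UTF-8 BOM followed by a tiny document.
message = codecs.BOM_UTF8 + b"<root/>"

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character at the start
# of the string; "utf-8-sig" strips it while decoding.
with_bom = message.decode("utf-8")
without_bom = message.decode("utf-8-sig")
```

Whether a leftover U+FEFF is what breaks the XML load in this particular case is an assumption; the hex check is what confirms it.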
Nice tutorial, thanks for sharing!
Check this one for a further explanation of how this all works for flat files: http://maximelabelle.wordpress.com/2010/05/20/an-encodingtranscoder-custom-pipeline-component/