The BizTalk Server 2004 documentation is rather scant on the topic of creating schemas overall, but, specifically, it is surprisingly poor at explaining the basic concepts behind the flat file extensions to the designer.
So, let’s try to clear up a few of the concepts by creating a simple, yet interesting Flat File schema that parses CSV (Comma-Separated-Values) file.
Let’s start with the basics: What rules are there for a CSV file, in plain text?
A CSV file contains a set of records, separated by a CR/LF pair (\r\n). Each record contains a set of fields separated by a comma (,). If a field contains either a comma, or a CR/LF, then it must be escaped by enclosing said feed between double quotes.
There is certainly more to a CSV file, but let’s use this definition for now to start this tutorial.
The first thing we need to do is create a new Schema, let’s call it "CSV". Once we created the schema, we need to tell BizTalk that is we want to enable the flat file designer. To do this, selecte the schema, open the properties window; find the "Schema Editor Extensions" property and enable the appropriate one, as presented in the following image:
Now, let’s rename the root node of the schema to a more sensible name, say "CSV"; this will be the root of the XML document generated from the CSV file contents. Since the CSV file contains a set of records, let’s add a new Record Element underneath the CSV node and call it "Record", as well.
The Record node can be repeated any number of times, so let’s set the minOccurs and maxOccurs properties as appropriate:
Each record is composed by a collection of fields, so we want to add those as well:
Now comes the interesting part, which is making it deal with the Flat File side of things. Let’s start inside out.
First, we want to ensure that the schema parses correctly each field inside a record. So, we’ll go ahead and select the Record node and open up its properties. Here, we need to tell BizTalk that this is a delimited record, not a positional one, so we set the value of the Structure property to "Delimited".
Next, we need to tell it what kind of delimiters it uses (a comma). For this, we we’ll use the Child Delimiter set of properties of the Record node, and set them like this:
|Child Delimiter Type||Character|
|Preserve Delimiter for Empty Data||yes|
What this means is that each field in each record will be separated by a character (a comma), and that the position of the delimiter between the record will be between each field (that is, infix). Other possible values of the Child Order property include Postfix (which means it will always appear after the field, even after the last one) or Prefix (which means the delimiter always appears before the field, even before the first one).
To understand the difference between this three values, let’s consider an example of what they would parse:
Clearly, it is the infix case what we want. Enabling the Preserve Delimiter for Empty Data property means that fields with no value will not be ignored, but serialized as empty elements (and that they’ll have the necessary delimiters added, to enable things like "March, Robert,,Peter").
Now that we can parse each record, we need to tell BizTalk what separates each record in the file, which is a pair of CR/LF. To do this, we need to select the root element of the Schema (CSV) and configure its properties like this:
|Child Delimiter Type||Hexadecimal|
|Child Delimiter||0x0D 0x0A|
Here we’re telling BizTalk that the delimiter is not a specified as a printable character, but rather as the hexadecimal values of the characters. The way to specify the characters is not very clear in the documentation, so keep this in mind:
- You need to type a hexadecimal value that includes the 0x prefix
- If you need to specify more than one character, then put a space between them and put the 0x prefix on each of them. While the editor seems to accepts other formats, they don’t work as one might naturaly expect.
One suggestion I’d make to the BizTalk team in charge of the schema editor would be to allow us to use, in this cases, the "Character" delimiter type and instead put the escape sequences as the value (say, "\r\n"). That would make it much simpler and more intuitive, in my opinion.
Armed with this configuration, we can try the schema for the first time. Configure it so that we can validate the schema against a native instance, and put this in the input text file:
Restrepo,Tomas,27 Brown, Robert, 34 Smith, Lauren, 24
If we try that, we’ll see this result:
<CSV xmlns="urn:schemas-winterdom-com:csv"> <Record> <Field>Restrepo</Field> <Field>Tomas</Field> <Field>27</Field> </Record> <Record> <Field>Brown</Field> <Field> Robert</Field> <Field> 34</Field> </Record> <Record> <Field>Smith</Field> <Field> Lauren</Field> <Field> 24</Field> </Record> </CSV>
Great,this worked! However, it has a few minor things that don’t quite act like we want to. The first one is the empty spaces on some of the <Field> values, which we might like to avoid. To do so, we could tell BizTalk that fields might be padded with empty spaces, by selecting the Field node and configuring the following properties:
|Pad Character Type||Character|
|Pad Character||(empty space)|
However, this doesn’t quite solve the problem, since if we had trailing spaces, instead, they wouldn’t be removed. This is a side-effect of the value of the Justification property, since it tells BizTalk where the padding would go. Unfortunately, there’s no such thing as a "Justified" setting, only left or right are allowed.
A second problem this schema has is that if there are empty lines on the file (say, traling CR/LF pairs) then BizTalk will generate empty Record declarations, like this:
<Record> <Field></Field> </Record>
To be quite honest, I don’t know yet how to avoid this, so let’s leave that aside for a while. We’ll come back to it eventually.
The biggest problem with our schema right now, is that it won’t allow fields that contain embedded ";" or CR/LFs, since there is no way to escape them. In flat file schemas, there are two ways to solve this problem:
- Use escape characters: these act like the "\" character in C# or C++ string literals, giving special meaning to the next character. We set this on BizTalk using the Escape Character and Escape Character Type properties on the field node.
- Using wrap characters: These specify that fields containing problematic characters will be surrounded in special characters that mean that nothing inside it is supposed to be interpreted as having special meaning.
Number 2 is what we want in order to comply with the CSV definition we gave above, which requires us to wrap problematic fields in double quotes. To do this, we’ll select the Field node and configure the following properties:
|Wrap Character Type||Character|
|Wrap Character||" (double quote)|
With these new properties configured, we can parse the following document correctly:
Restrepo,Tomas,27 "Brown,row a lot", Robert, 34 Smith, Lauren, 24
<CSV xmlns="urn:schemas-winterdom-com:csv"> <Record> <Field>Restrepo</Field> <Field>Tomas</Field> <Field>27</Field> </Record> <Record> <Field>Brown,row a lot</Field> <Field>Robert</Field> <Field>34</Field> </Record> <Record> <Field>Smith</Field> <Field>Lauren</Field> <Field>24</Field> </Record> </CSV>
And that’s it for now. As you can see, it is not all that hard to use, although it can be a little bit confusing at times. Hope this helps anyone!