In this tutorial, I'll show you how to create a schema for a somewhat
more complex Flat File involving both delimited and positional records, and how
to use tags to distinguish between different possible records at one position
in the file.
Consider this example file:
CITYLIST
CSEATTLE WA00198776
SWASHINGTON WA
SARIZONA AZ
CTUCSON AZ89112299
.
MILKPRICES
SEATTLE 1000USD
TUCSON 19200USD
What this has is information about two different topics: A list of cities,
and milk prices, which form the two different parts of the file.
Each one is formed by a set of records, but the records in each one of
the parts is very different. Also, the first part (the list of cities) contains two
different kinds of records (one for Cities, and one for States).
Let's see the definition of the file in more detail:
The City Records are defined like this:
The State records are defined like this:
The Milk Price records are defined like this:
As you can see, this file, although simple in appearance, has a few interesting
challenges for parsing. First of all, it contains records of several kinds,
divided in two different sections. Also, it has both a delimited structure
(the sections are delimited among themselves by the '\r\n.\r\n' sequence) as
well as positional records. It also has a few interesting things, like paddings on a
few elements.
Let's start then to create a schema for it. Let's call this schema "MultiFile" ;)
For this schema, we'll configure the following properties:
Property | Value |
---|---|
TargetNamespace | urn:schemas-winterdom-com:multifile |
Case | Uppercase |
Our MultiFile schema will begin with a single root element, called MultiFile.
Since this node will contain both of our sections in the file, we want it to be
configured as a delimited record. We'll also want it to contain
an
to support the sections in any order. Here's how we'd configure it:
Property | Value |
---|---|
Structure | Delimited |
Child Order | Infix |
Child Delimiter Type | Hexadecimal |
Child Delimiter | 0x0D 0x0A 0x2E 0x0D 0x0A (\r\n.\r\n) |
Inside this root node, we'll insert our section records. Let's start with the CityList record, which we'll configure like this:
Property | Value |
---|---|
Structure | Delimited |
Tag Identifier | CITYLIST |
Group Order Type | choice |
Group Max Occurs | unbounded |
Group Min Occurs | 0 |
Preserve Delimiter for Empty Data | No |
Child Order | Infix |
Child Delimiter Type | Hexadecimal |
Child Delimiter | 0x0D 0x0A (\r\n) |
This is where things start to get interesting. Notice how we use the Tag
Identifier property to tell BizTalk that it can recognize that the record is of
type CityList because it begins with the string CITYLIST (since we don't set the
Tag Offset property, it is assumed to be 0).
Now we need to insert two child records, one for the City record,
and one for the State record. For each one, we'll set up in our schema each field
in the record as an attribute of the City Node. Each one should be configured like
described below.
City
Property | Value |
---|---|
Structure | Positional |
Tag Identifier | C |
Fields:
Name
Justification Left
Pad Character ' ' (space)
Pad Character Type Character
Data Type xs:string
Positional Length 30
Positional Offset 1
State
Data Type xs:string
Positional Length 2
NumSewers
Justification Right
Pad Character 0
Pad Character Type Character
Data Type xs:int
Positional Length 8
State
Property | Value |
---|---|
Structure | Positional |
Tag Identifier | S |
Fields:
Name
Justification Left
Pad Character ' ' (space)
Pad Character Type Character
Data Type xs:string
Positional Length 30
Positional Offset 1
Initials
Data Type xs:string
Positional Length 2
Armed with this, we'd suppose we should now be able to parse this:
CITYLIST
CSEATTLE WA00198776
SWASHINGTON WA
SARIZONA AZ
CTUCSON AZ00002299
Right? Wrong! As you soon will have noticed, it fails.
The reason, well, simple, actually. The Tag for the CityList element is defined
as CITYLIST, not CITYLIST\r\n. So what BizTalk would parse right now would actually be this:
CITYLISTCSEATTLE WA00198776
SWASHINGTON WA
SARIZONA AZ
CTUCSON AZ00002299
One would think that by setting the Child Delimiter Order property of the CityList
node to "Prefix" would be enough, however, this is not true.
The reason for this is that, since there are no actual fields in CityList
(only child records), it simple won't recognize that delimiter there (nor insert it
if you generate an instance from vs.net). If there actually was a small
field just before the City and State records, it would be recognized as appropriate,
though. This seems to me to be a small bug, though, since it is definitely not what
one would expect.
So, what can we do about this?
In fact, afaik, there's no way to tell BizTalk that there's a
CR/LF following the tag, at least directly from the UI. One way I found
that works is to open up the XSD file directly in the text editor, find
the tag_name attribute for CityList with the value CITYLIST
. It's a
fun trick in that we basically encode the CR/LF pair using character entities.
I had a pleasant surprise to find it worked, but I definitely would like something better
in the UI that allowed this. (Before anyone asks, yes, I did try putting that directly
in the Tag Identifier property, but that won't work because BizTalk will encode
the character entity into &#0D; which doesn't work :().
Now, let's be honest... this is a bit of a hack, because we're basically
telling BizTalk that the tag is actually "CITYLIST\r\n", which we know ain't
the truth, but it works, and it's good enough for now.
Defining the rest of the document schema is very similar to what we've done so far.
We need to add the MilkPrices node under the root, configured like this:
Property | Value |
---|---|
Structure | Delimited |
Tag Identifier | MILKPRICES
 |
Group Order Type | sequence |
Preserve Delimiter for Empty Data | No |
Child Order | Infix |
Child Delimiter Type | Hexadecimal |
Child Delimiter | 0x0D 0x0A (\r\n) |
Under MilkPrices, we'll insert a positional record to contain each price, as well as the necessary fields, as defined below:
MilkPrice
Property | Value |
---|---|
Structure | Positional |
Min Occurs | 0 |
Max Occurs | unbounded |
CityName (Attribute Field)
Justification Left
Pad Character ' ' (space)
Pad Character Type Character
Data Type xs:string
Positional Length 30
Price (Attribute Field)
Justification Right
Pad Character ' ' (space)
Pad Character Type Character
Data Type xs:decimal
Positional Length 6
Currency (Attribute Field)
Justification Right
Pad Character ' ' (space)
Pad Character Type Character
Data Type xs:string
Positional Length 3
Now, however, we run into a little snag. Depending on how our schema is defined,
we can get into a case where either the schema fails to parse the sample file, or
in which only a single record within the CityList section is recognized, while the
rest is ignored.
The problem lies in the fact that, when you're defining the schema, you define,
in essence, a state machine (might not be entirely obvious). In our case,
the problem lies in the fact that the delimiters for both the MultiFile root node and
the CityList node begin with \r\n. So, what happens is that the parser will process
the first record inside CityList, and then find a \r\n. How will BizTalk know
that this \r\n it found means the end of the records inside CityList, or the beginning
of the delimiter for the MultiFile record?
Answer: it doesn't, so with our current configuration, the schema will fail
to parse completely.
The simplest way to fix this that I found was to define the CityList delimiter
to be postfix, and then change the delimiter in the MultiFile record to just
be ".\r\n". This would eliminate the ambiguity, while allowing us
to keep our file syntax.
You may have noticed that, so far, I've left one thing
out: How to make sure that BizTalk Server recognizes the decimal numbers in
the Price field of the MilkPrice record correctly. As it currently is, it
will not parse the decimal digits correctly, and will just lead to a number
that's a hundred times larger than it really is (there is no decimal separator
in the file format).
Well, honestly? I haven't found a way to make this work from within
the basic parser, there's just no property I can see that controls this.
Two options come to mind:
Here's how our schema looks in the end: