BizTalk 2004 Flat File Schema Tutorial 2

In this tutorial, I'll show you how to create a schema for a somewhat
more complex Flat File involving both delimited and positional records, and how
to use tags to distinguish between different possible records at one position
in the file.

Consider this example file:



CITYLIST

CSEATTLE                       WA00198776

SWASHINGTON                    WA

SARIZONA                       AZ

CTUCSON                        AZ89112299

.

MILKPRICES

SEATTLE                         1000USD

TUCSON                         19200USD

What this has is information about two different topics: A list of cities,
and milk prices, which form the two different parts of the file.
Each one is formed by a set of records, but the records in each one of
the parts is very different. Also, the first part (the list of cities) contains two
different kinds of records (one for Cities, and one for States).

Let's see the definition of the file in more detail:

The List of cities begins with the string CITYLIST on a line of it's own. It is defined by a set of records, which can be of two kinds:
- City Records
- State Records

The Milk prices list begins with the string MILKPRICES on a line of it's own. It is defined by a set of records of a single type: MilkPrice.
Sections in the file are separated by CR/LF.CR/LF

The City Records are defined like this:

They begin with the letter C
The name of the city comes next, 30 characters long
The initials of the state of the city (2 chars) go next
The number of sewers in the city comes next, being an 8 digit number which is padded with 0s to the left.

The State records are defined like this:

They begin with the letter S
The name of the state comes next, 30 characters long
The initials of the state (2 chars) go next

The Milk Price records are defined like this:

The name of the city, 30 characters long
A number representing the price, which can be 6 digits long (4+2 decimals)
A three-letter currency abreviation.

As you can see, this file, although simple in appearance, has a few interesting
challenges for parsing. First of all, it contains records of several kinds,
divided in two different sections. Also, it has both a delimited structure
(the sections are delimited among themselves by the '\r\n.\r\n' sequence) as
well as positional records. It also has a few interesting things, like paddings on a
few elements.

Let's start then to create a schema for it. Let's call this schema "MultiFile" ;)
For this schema, we'll configure the following properties:

Property	Value
TargetNamespace	urn:schemas-winterdom-com:multifile
Case	Uppercase

Our MultiFile schema will begin with a single root element, called MultiFile.
Since this node will contain both of our sections in the file, we want it to be
configured as a delimited record. We'll also want it to contain
an instead of an , since we'd ideally like it
to support the sections in any order. Here's how we'd configure it:

Property	Value
Structure	Delimited
Child Order	Infix
Child Delimiter Type	Hexadecimal
Child Delimiter	0x0D 0x0A 0x2E 0x0D 0x0A (\r\n.\r\n)

Inside this root node, we'll insert our section records. Let's start with the CityList record, which we'll configure like this:

Property	Value
Structure	Delimited
Tag Identifier	CITYLIST
Group Order Type	choice
Group Max Occurs	unbounded
Group Min Occurs	0
Preserve Delimiter for Empty Data	No
Child Order	Infix
Child Delimiter Type	Hexadecimal
Child Delimiter	0x0D 0x0A (\r\n)

This is where things start to get interesting. Notice how we use the Tag
Identifier property to tell BizTalk that it can recognize that the record is of
type CityList because it begins with the string CITYLIST (since we don't set the
Tag Offset property, it is assumed to be 0).

Now we need to insert two child records, one for the City record,
and one for the State record. For each one, we'll set up in our schema each field
in the record as an attribute of the City Node. Each one should be configured like
described below.

City

Property	Value
Structure	Positional
Tag Identifier	C



Fields:

Name

   Justification           Left

   Pad Character           ' ' (space)

   Pad Character Type      Character

   Data Type               xs:string

   Positional Length       30

   Positional Offset       1

State

   Data Type               xs:string

   Positional Length       2

NumSewers

   Justification           Right

   Pad Character           0

   Pad Character Type      Character

   Data Type               xs:int

   Positional Length       8

State

Property	Value
Structure	Positional
Tag Identifier	S



Fields:

Name

   Justification           Left

   Pad Character           ' ' (space)

   Pad Character Type      Character

   Data Type               xs:string

   Positional Length       30

   Positional Offset       1

Initials

   Data Type               xs:string

   Positional Length       2

Armed with this, we'd suppose we should now be able to parse this:



CITYLIST

CSEATTLE                      WA00198776

SWASHINGTON                   WA

SARIZONA                      AZ

CTUCSON                       AZ00002299

Right? Wrong! As you soon will have noticed, it fails.
The reason, well, simple, actually. The Tag for the CityList element is defined
as CITYLIST, not CITYLIST\r\n. So what BizTalk would parse right now would actually be this:



CITYLISTCSEATTLE                      WA00198776

SWASHINGTON                   WA

SARIZONA                      AZ

CTUCSON                       AZ00002299

One would think that by setting the Child Delimiter Order property of the CityList
node to "Prefix" would be enough, however, this is not true.
The reason for this is that, since there are no actual fields in CityList
(only child records), it simple won't recognize that delimiter there (nor insert it
if you generate an instance from vs.net). If there actually was a small
field just before the City and State records, it would be recognized as appropriate,
though. This seems to me to be a small bug, though, since it is definitely not what
one would expect.

So, what can we do about this?

In fact, afaik, there's no way to tell BizTalk that there's a
CR/LF following the tag, at least directly from the UI. One way I found
that works is to open up the XSD file directly in the text editor, find
the tag_name attribute for CityList with the value CITYLIST
. It's a
fun trick in that we basically encode the CR/LF pair using character entities.
I had a pleasant surprise to find it worked, but I definitely would like something better
in the UI that allowed this. (Before anyone asks, yes, I did try putting that directly
in the Tag Identifier property, but that won't work because BizTalk will encode
the character entity into &#0D; which doesn't work :().

Now, let's be honest... this is a bit of a hack, because we're basically
telling BizTalk that the tag is actually "CITYLIST\r\n", which we know ain't
the truth, but it works, and it's good enough for now.

Defining the rest of the document schema is very similar to what we've done so far.
We need to add the MilkPrices node under the root, configured like this:

Property	Value
Structure	Delimited
Tag Identifier	MILKPRICES
Group Order Type	sequence
Preserve Delimiter for Empty Data	No
Child Order	Infix
Child Delimiter Type	Hexadecimal
Child Delimiter	0x0D 0x0A (\r\n)

Under MilkPrices, we'll insert a positional record to contain each price, as well as the necessary fields, as defined below:

MilkPrice

Property	Value
Structure	Positional
Min Occurs	0
Max Occurs	unbounded

CityName (Attribute Field) Justification Left Pad Character ' ' (space) Pad Character Type Character Data Type xs:string Positional Length 30 Price (Attribute Field) Justification Right Pad Character ' ' (space) Pad Character Type Character Data Type xs:decimal Positional Length 6

Currency (Attribute Field) Justification Right Pad Character ' ' (space) Pad Character Type Character Data Type xs:string Positional Length 3

Now, however, we run into a little snag. Depending on how our schema is defined,
we can get into a case where either the schema fails to parse the sample file, or
in which only a single record within the CityList section is recognized, while the
rest is ignored.

The problem lies in the fact that, when you're defining the schema, you define,
in essence, a state machine (might not be entirely obvious). In our case,
the problem lies in the fact that the delimiters for both the MultiFile root node and
the CityList node begin with \r\n. So, what happens is that the parser will process
the first record inside CityList, and then find a \r\n. How will BizTalk know
that this \r\n it found means the end of the records inside CityList, or the beginning
of the delimiter for the MultiFile record?

Answer: it doesn't, so with our current configuration, the schema will fail
to parse completely.

The simplest way to fix this that I found was to define the CityList delimiter
to be postfix, and then change the delimiter in the MultiFile record to just
be ".\r\n". This would eliminate the ambiguity, while allowing us
to keep our file syntax.

You may have noticed that, so far, I've left one thing
out: How to make sure that BizTalk Server recognizes the decimal numbers in
the Price field of the MilkPrice record correctly. As it currently is, it
will not parse the decimal digits correctly, and will just lead to a number
that's a hundred times larger than it really is (there is no decimal separator
in the file format).

Well, honestly? I haven't found a way to make this work from within
the basic parser, there's just no property I can see that controls this.
Two options come to mind:

Creating a BizTalk map that will use a functoid to change the value
as appropriate (simply divide it by 100)
Parsing the field into two actual fields (the integral part,
and the decimal part). You could then write a map that puts them
together as appropriate.

Here's how our schema looks in the end: