BizTalk 2004 Flat File Schema Tutorial 2

In this tutorial, I’ll show you how to create a schema for a somewhat
more complex Flat File involving both delimited and positional records, and how
to use tags to distinguish between different possible records at one position
in the file.

Consider this example file:


CITYLIST
CSEATTLE                       WA00198776
SWASHINGTON                    WA
SARIZONA                       AZ
CTUCSON                        AZ89112299
.
MILKPRICES
SEATTLE                         1000USD
TUCSON                         19200USD
      

What this has is information about two different topics: A list of cities,
and milk prices, which form the two different parts of the file.
Each one is formed by a set of records, but the records in each one of
the parts is very different. Also, the first part (the list of cities) contains two
different kinds of records (one for Cities, and one for States).

Let’s see the definition of the file in more detail:

  • The List of cities begins with the string CITYLIST on a line of it’s own. It is defined by a set of records, which can be of two kinds:
    • City Records
    • State Records
  • The Milk prices list begins with the string MILKPRICES on a line of it’s own. It is defined by a set of records of a single type: MilkPrice.
  • Sections in the file are separated by CR/LF.CR/LF

The City Records are defined like this:

  • They begin with the letter C
  • The name of the city comes next, 30 characters long
  • The initials of the state of the city (2 chars) go next
  • The number of sewers in the city comes next, being an 8 digit number which is padded with 0s to the left.

The State records are defined like this:

  • They begin with the letter S
  • The name of the state comes next, 30 characters long
  • The initials of the state (2 chars) go next

The Milk Price records are defined like this:

  • The name of the city, 30 characters long
  • A number representing the price, which can be 6 digits long (4+2 decimals)
  • A three-letter currency abreviation.

As you can see, this file, although simple in appearance, has a few interesting
challenges for parsing. First of all, it contains records of several kinds,
divided in two different sections. Also, it has both a delimited structure
(the sections are delimited among themselves by the ‘\r\n.\r\n’ sequence) as
well as positional records. It also has a few interesting things, like paddings on a
few elements.

Let’s start then to create a schema for it. Let’s call this schema "MultiFile" ;)
For this schema, we’ll configure the following properties:

Property Value
TargetNamespace urn:schemas-winterdom-com:multifile
Case Uppercase

Our MultiFile schema will begin with a single root element, called MultiFile.
Since this node will contain both of our sections in the file, we want it to be
configured as a delimited record. We’ll also want it to contain
an <xs:all/> instead of an <xs:sequence/>, since we’d ideally like it
to support the sections in any order. Here’s how we’d configure it:

Property Value
Structure Delimited
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A 0x2E 0x0D 0x0A (\r\n.\r\n)

Inside this root node, we’ll insert our section records. Let’s start with the CityList record, which we’ll configure like this:

Property Value
Structure Delimited
Tag Identifier CITYLIST
Group Order Type choice
Group Max Occurs unbounded
Group Min Occurs 0
Preserve Delimiter for Empty Data No
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A (\r\n)

This is where things start to get interesting. Notice how we use the Tag
Identifier property to tell BizTalk that it can recognize that the record is of
type CityList because it begins with the string CITYLIST (since we don’t set the
Tag Offset property, it is assumed to be 0).

Now we need to insert two child records, one for the City record,
and one for the State record. For each one, we’ll set up in our schema each field
in the record as an attribute of the City Node. Each one should be configured like
described below.

City

Property Value
Structure Positional
Tag Identifier C

Fields:
Name
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30
   Positional Offset       1
State
   Data Type               xs:string
   Positional Length       2
NumSewers
   Justification           Right
   Pad Character           0
   Pad Character Type      Character
   Data Type               xs:int
   Positional Length       8
      

State

Property Value
Structure Positional
Tag Identifier S

Fields:
Name
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30
   Positional Offset       1
Initials
   Data Type               xs:string
   Positional Length       2
      

Armed with this, we’d suppose we should now be able to parse this:


CITYLIST
CSEATTLE                      WA00198776
SWASHINGTON                   WA
SARIZONA                      AZ
CTUCSON                       AZ00002299
   

Right? Wrong! As you soon will have noticed, it fails.
The reason, well, simple, actually. The Tag for the CityList element is defined
as CITYLIST, not CITYLIST\r\n. So what BizTalk would parse right now would actually be this:


CITYLISTCSEATTLE                      WA00198776
SWASHINGTON                   WA
SARIZONA                      AZ
CTUCSON                       AZ00002299   
   

One would think that by setting the Child Delimiter Order property of the CityList
node to "Prefix" would be enough, however, this is not true.
The reason for this is that, since there are no actual fields in CityList
(only child records), it simple won’t recognize that delimiter there (nor insert it
if you generate an instance from vs.net). If there actually was a small
field just before the City and State records, it would be recognized as appropriate,
though. This seems to me to be a small bug, though, since it is definitely not what
one would expect.

So, what can we do about this?

In fact, afaik, there’s no way to tell BizTalk that there’s a
CR/LF following the tag, at least directly from the UI. One way I found
that works is to open up the XSD file directly in the text editor, find
the tag_name attribute for CityList with the value CITYLIST&#x0D;&#x0A;. It’s a
fun trick in that we basically encode the CR/LF pair using character entities.
I had a pleasant surprise to find it worked, but I definitely would like something better
in the UI that allowed this. (Before anyone asks, yes, I did try putting that directly
in the Tag Identifier property, but that won’t work because BizTalk will encode
the character entity into &amp;#0D; which doesn’t work :().

Now, let’s be honest… this is a bit of a hack, because we’re basically
telling BizTalk that the tag is actually "CITYLIST\r\n", which we know ain’t
the truth, but it works, and it’s good enough for now.

Defining the rest of the document schema is very similar to what we’ve done so far.
We need to add the MilkPrices node under the root, configured like this:

Property Value
Structure Delimited
Tag Identifier MILKPRICES&#x0D;&#x0A;
Group Order Type sequence
Preserve Delimiter for Empty Data No
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A (\r\n)

Under MilkPrices, we’ll insert a positional record to contain each price, as well as the necessary fields, as defined below:

MilkPrice

Property Value
Structure Positional
Min Occurs 0
Max Occurs unbounded

CityName (Attribute Field)
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30

Price (Attribute Field)
   Justification           Right
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:decimal
   Positional Length       6

Currency (Attribute Field)
   Justification           Right
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       3
      

Now, however, we run into a little snag. Depending on how our schema is defined,
we can get into a case where either the schema fails to parse the sample file, or
in which only a single record within the CityList section is recognized, while the
rest is ignored.

The problem lies in the fact that, when you’re defining the schema, you define,
in essence, a state machine (might not be entirely obvious). In our case,
the problem lies in the fact that the delimiters for both the MultiFile root node and
the CityList node begin with \r\n. So, what happens is that the parser will process
the first record inside CityList, and then find a \r\n. How will BizTalk know
that this \r\n it found means the end of the records inside CityList, or the beginning
of the delimiter for the MultiFile record?

Answer: it doesn’t, so with our current configuration, the schema will fail
to parse completely.

The simplest way to fix this that I found was to define the CityList delimiter
to be postfix, and then change the delimiter in the MultiFile record to just
be ".\r\n". This would eliminate the ambiguity, while allowing us
to keep our file syntax.

You may have noticed that, so far, I’ve left one thing
out: How to make sure that BizTalk Server recognizes the decimal numbers in
the Price field of the MilkPrice record correctly. As it currently is, it
will not parse the decimal digits correctly, and will just lead to a number
that’s a hundred times larger than it really is (there is no decimal separator
in the file format).

Well, honestly? I haven’t found a way to make this work from within
the basic parser, there’s just no property I can see that controls this.
Two options come to mind:

  • Creating a BizTalk map that will use a functoid to change the value
    as appropriate (simply divide it by 100)

  • Parsing the field into two actual fields (the integral part,
    and the decimal part). You could then write a map that puts them
    together as appropriate.

Here’s how our schema looks in the end:


<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns="urn:schemas-winterdom-com:multifile" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" elementFormDefault="qualified" targetNamespace="urn:schemas-winterdom-com:multifile" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo>
      <b:schemaInfo count_positions_by_byte="false" standard="Flat File" root_reference="MultiFile" case="upper" />
      <schemaEditorExtension:schemaInfo namespaceAlias="b" extensionClass="Microsoft.BizTalk.FlatFileExtension.FlatFileExtension" standardName="Flat File" xmlns:schemaEditorExtension="http://schemas.microsoft.com/BizTalk/2003/SchemaEditorExtensions" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="MultiFile">
    <xs:annotation>
      <xs:appinfo>
        <b:recordInfo structure="delimited" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" sequence_number="1" child_order="infix" child_delimiter_type="hex" child_delimiter="0x2E 0x0D 0x0A" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo>
            <b:groupInfo sequence_number="0" />
          </xs:appinfo>
        </xs:annotation>
        <xs:element name="CityList">
          <xs:annotation>
            <xs:appinfo>
              <b:recordInfo sequence_number="1" structure="delimited" preserve_delimiter_for_empty_data="false" suppress_trailing_delimiters="true" tag_name="CITYLIST&#xD;&#xA;" child_order="postfix" child_delimiter_type="hex" child_delimiter="0x0D 0x0A" />
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
              <xs:annotation>
                <xs:appinfo>
                  <b:groupInfo sequence_number="0" />
                </xs:appinfo>
              </xs:annotation>
              <xs:element name="City">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="1" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" tag_name="C" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" pos_offset="1" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="State" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="left" pos_length="2" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="NumSewers" type="xs:int">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="3" justification="right" pos_length="8" pad_char_type="char" pad_char="0" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
              <xs:element name="State">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="2" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" tag_name="S" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" pos_offset="1" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Initials" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="left" pos_length="2" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="MilkPrices">
          <xs:annotation>
            <xs:appinfo>
              <b:recordInfo sequence_number="2" structure="delimited" preserve_delimiter_for_empty_data="false" suppress_trailing_delimiters="false" tag_name="MILKPRICES&#xD;&#xA;" child_delimiter_type="hex" child_delimiter="0x0D 0x0A" child_order="infix" />
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:annotation>
                <xs:appinfo>
                  <b:groupInfo sequence_number="0" />
                </xs:appinfo>
              </xs:annotation>
              <xs:element minOccurs="0" maxOccurs="unbounded" name="MilkPrice">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="1" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="CityName" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Price" type="xs:decimal">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="right" pos_length="6" pad_char_type="char" pad_char="0" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Currency" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="3" justification="left" pad_char_type="char" pad_char=" " pos_length="3" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>   
   

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>