Archives for category: XML

Yossi Dahan commented here on being surprised by the implications of the elementFormDefault option in XSD schemas, and particularly in relation to BizTalk.

I no longer find the behavior surprising myself, as I was already familiar with the implications of elementFormDefault. In reality, this is something that I don't particularly like about XSD, as I find that leaving it as "unqualified" can lead, as Yossi found out, to some weird-looking schemas (at least to me). Mind you, regardless of this option, qualification can also be enabled or disabled for each element individually through its form attribute.

There are, however, 2 things that make this option more obscure than it should be:

  1. The default value defined in XSD for elementFormDefault is "unqualified".
  2. Once you install BizTalk, Visual Studio has 2 different templates for XSD schemas. The original one in Visual Studio has elementFormDefault="qualified", while the one installed by BizTalk doesn't specify elementFormDefault (thus leaving it at the default value of "unqualified"). Which one you get depends on the context in which you use the Add New Item menu in VS.

In general, I recommend always watching out for this issue and always explicitly setting elementFormDefault to the appropriate value. In my mind, that appropriate value should always be "qualified" for new schemas :-).

Also, just for the sake of completeness, it is worth saying that, under the XSD specification, all root (globally declared) elements defined in a schema are always qualified (i.e. they belong to the targetNamespace of the schema). The elementFormDefault option only controls whether locally declared child elements are qualified or unqualified. This is why setting elementFormDefault to "unqualified" doesn't really "drop namespaces" from the XML document instances; it merely restricts namespace qualification to the root elements.
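To make the difference concrete, here is a minimal sketch that validates two instance documents against the same tiny schema, once per elementFormDefault value. The namespace urn:example:orders and the order/id elements are made up for illustration, and the sketch assumes the XmlSchemaSet and XmlReaderSettings validation API is available.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ElementFormDefaultDemo
{
    // Hypothetical namespace and element names, for illustration only.
    public const string Ns = "urn:example:orders";

    // Child element carries the target namespace.
    public const string QualifiedChild =
        "<o:order xmlns:o='urn:example:orders'><o:id>1</o:id></o:order>";

    // Child element is in no namespace; only the root is qualified.
    public const string UnqualifiedChild =
        "<o:order xmlns:o='urn:example:orders'><id>1</id></o:order>";

    static string BuildSchema(string form)
    {
        return
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
            " targetNamespace='" + Ns + "' elementFormDefault='" + form + "'>" +
            "<xs:element name='order'><xs:complexType><xs:sequence>" +
            "<xs:element name='id' type='xs:string'/>" +
            "</xs:sequence></xs:complexType></xs:element>" +
            "</xs:schema>";
    }

    // Returns true when the instance validates without errors.
    public static bool IsValid(string form, string instanceXml)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(Ns, XmlReader.Create(new StringReader(BuildSchema(form))));

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;

        bool valid = true;
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(instanceXml), settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }

    public static void Main()
    {
        foreach (string form in new string[] { "qualified", "unqualified" })
        {
            Console.WriteLine(form + ", qualified child valid: " + IsValid(form, QualifiedChild));
            Console.WriteLine(form + ", unqualified child valid: " + IsValid(form, UnqualifiedChild));
        }
    }
}
```

In both cases the root element stays in the target namespace; only the expected form of the local id element changes.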

The XML Specification allows attribute values to be wrapped with either double quotes (") or single quotes ('). Normally, BizTalk Server, as a good standard-abiding citizen, has no problem with this. However, I had been seeing messages on the BizTalk newsgroup claiming that BizTalk would refuse to process XML documents using single quotes when the XML Disassembler component tried to parse the incoming message.

I spent a few minutes researching this claim and found that there is indeed some truth to it. There appears to be a bug in the XML Disassembler when it runs into documents using single quotes, but it only affects some documents.

The first thing the XML Disassembler component does when it receives a message is probe it to see if it is indeed an XML message and try to "guess" the encoding it is in. As part of that, it will look not only for a BOM (Byte Order Mark), but also for an <?xml?> declaration containing an encoding attribute.

Here's the problem: if the encoding attribute in the xml declaration is wrapped in single quotes, the parsing fails. In other words: BizTalk has a real issue with a document that begins like this:

<?xml version='1.0' encoding='utf-8'?>

It doesn't have a problem with this, however:

<?xml version='1.0' encoding="utf-8"?>

In fact, as long as the encoding attribute is missing or is wrapped with double quotes, BizTalk will happily accept and correctly parse the message. If not, BizTalk will fail with the following error:

There was a failure executing the receive pipeline: "Microsoft.BizTalk.DefaultPipelines.XMLReceive, Microsoft.BizTalk.DefaultPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Source: "XML disassembler" Receive Port: "ReceivePort1" URI: "C:\temp\BizTalk\TestIn\*.xml" Reason: Length cannot be less than zero.
Parameter name: length

Looking a bit through the XML Disassembler code using Reflector, it becomes very clear that the Probe() method of the XmlDasmComp class (i.e. the disassembler component) is fatally flawed, as it attempts to parse the <?xml?> declaration by hand and explicitly expects the encoding attribute value to be wrapped in double quotes. Here's the relevant bit of code:

if (text2.Contains("encoding"))
{
   text1 = text2.Substring(text2.IndexOf("encoding"));
   text1 = text1.Substring(text1.IndexOf('"') + 1);
   text1 = text1.Substring(0, text1.IndexOf('"'));
}

Unfortunately, there's no easy workaround short of actually modifying the incoming message, either by hand or through a custom decoding pipeline component that fixes the declaration before the disassembler sees it.
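To see where the "Length cannot be less than zero" error comes from, here is a small stand-alone sketch. ExtractEncodingStrict mirrors the logic of the decompiled snippet above; ExtractEncodingTolerant is a hypothetical variant of my own (not anything BizTalk ships) that accepts either quote character, roughly what a custom decoding component would need to do before rewriting the declaration.

```csharp
using System;

public static class ProbeSketch
{
    // Same logic as the decompiled snippet: only looks for double quotes.
    public static string ExtractEncodingStrict(string declaration)
    {
        string text1 = declaration.Substring(declaration.IndexOf("encoding"));
        // With single quotes IndexOf('"') is -1, so this keeps the whole string...
        text1 = text1.Substring(text1.IndexOf('"') + 1);
        // ...and this becomes Substring(0, -1), which throws.
        return text1.Substring(0, text1.IndexOf('"'));
    }

    // Hypothetical tolerant variant: accepts ' or " and returns null
    // when no well-formed encoding value is found.
    public static string ExtractEncodingTolerant(string declaration)
    {
        int start = declaration.IndexOf("encoding");
        if (start < 0) return null;

        int open = declaration.IndexOfAny(new char[] { '"', '\'' }, start);
        if (open < 0) return null;

        // The closing quote must match the opening one.
        int close = declaration.IndexOf(declaration[open], open + 1);
        if (close < 0) return null;

        return declaration.Substring(open + 1, close - open - 1);
    }

    public static void Main()
    {
        string single = "<?xml version='1.0' encoding='utf-8'?>";
        string mixed = "<?xml version='1.0' encoding=\"utf-8\"?>";

        Console.WriteLine(ExtractEncodingStrict(mixed));
        Console.WriteLine(ExtractEncodingTolerant(single));

        try
        {
            ExtractEncodingStrict(single);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Same exception type and parameter name as in the pipeline error.
            Console.WriteLine(ex.ParamName);
        }
    }
}
```

Note that the strict version only cares about the quotes that follow the word "encoding", which is why a declaration with a single-quoted version but a double-quoted encoding still gets through.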

Matt Winkler asks here for feedback on what we'd like to see in Windows Workflow Foundation that might help reduce the complexity and/or improve the WF experience for developers: spend $100 on what you'd like to see. Here's my take on it:

$10: Out of the Box hosting scenarios. This is a cool idea, and could be really complemented with:
$40: Guidance, Guidance, Guidance. We need to have good guidance in place on topics such as: WF Runtime Hosting Scenarios and Requirements; Long-running workflows; Workflow Versioning Strategies; Communication Options, Scalability options and more.
$20: Programming-model refinements. Things like the complexity introduced by spawned-contexts and the like cannot be 100% solved by tooling; the underlying programming model has to guide you towards writing correct code right from the start. I'll agree I don't have a good proposal as to how to fix it right now, though :-) 
$30: Better design time experience:

  • Improved designer
  • Improved design-time validation. This could either be done by providing more compile-time validation, or, perhaps even better, through a WfCop-kind of tool. One idea I think would mix well with the superb extensibility model in WF and the Guidance idea would be to make it scenario based: You select the kind of scenario you expect your workflows/activities to run under, and extensive validation is done to see if it will work. For example, you might select the "Long-Running Transactional Workflow" scenario, and the tool would validate that things like serializability requirements are met by the workflow itself and all activities used.

While taking some time to peruse issue #8 of the Architecture Journal (an excellent resource, by all means), I ran into Tim Ewald's and Kim Wolk's article on "A Flexible Model for Data Integration". Pretty interesting, certainly in line with some of the service versioning stuff Tim has talked about.

I also found intriguing the concept of tying (connecting is possibly a better word) the XML data model with the shared vs. system store model, particularly in the scenario discussed in the article - customer data - because it ties in with the common model in enterprises today of [partially] centralizing customer information in their CRM system, while recognizing the fact that some business areas (and thus the applications supporting them) will require extra data around it.

It's a pity this last topic was only briefly mentioned towards the end of the article; I certainly would've liked to see it presented in a little more detail (though the basic idea is fairly simple).

Tim has an interesting post here about dealing with semantic changes in existing interfaces on Web Services (continuing from his posts here, here, and here). I found this particularly interesting because we followed a very similar approach on our last project.

In our case, we had a set of existing operations on a service with an existing client using it, and needed to extend it with new operations. That was no problem. However, during development we noticed that, due to changes in a backend LOB application our services connected to, a new field was needed on some of the original operations. In particular, we needed the client to tell us what TimeZone (or rather, what time shift from UTC) to use when registering the time some operations occurred. However, we required that the original client not change.

What we ended up doing was basically making the new TimeZone field optional, and then introducing logic on the operations that checked if it was missing. If it was, instead of simply assuming a global default value, we actually queried our security configuration and extracted a default value for this field that was specific to the client making the call. This was possible because the services were using WS-Security for authenticating the caller. This way, we could actually set specific default values for other clients if needed. It worked very nicely overall and was remarkably easy to put in place.
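In rough terms, the fallback logic looked something like the sketch below. This is not the project's actual code: the class, the dictionary standing in for the security configuration, and the client identifiers are all hypothetical, and the authenticated caller identity is assumed to be available as a plain string.

```csharp
using System;
using System.Collections.Generic;

public static class UtcOffsetDefaults
{
    // Hypothetical stand-in for the per-client security configuration,
    // keyed by the authenticated caller identity.
    static readonly Dictionary<string, TimeSpan> ClientDefaults =
        new Dictionary<string, TimeSpan>
        {
            { "legacy-client", TimeSpan.FromHours(-5) }
        };

    // requestedOffset is null when the optional field was omitted,
    // which is what the original (unchanged) client always does.
    public static TimeSpan ResolveOffset(TimeSpan? requestedOffset, string callerId)
    {
        if (requestedOffset.HasValue)
        {
            return requestedOffset.Value;
        }

        TimeSpan configured;
        if (ClientDefaults.TryGetValue(callerId, out configured))
        {
            return configured;
        }

        // No global default on purpose: an unknown caller must send the field.
        throw new InvalidOperationException(
            "No default UTC offset configured for caller '" + callerId + "'.");
    }

    public static void Main()
    {
        Console.WriteLine(ResolveOffset(null, "legacy-client"));
        Console.WriteLine(ResolveOffset(TimeSpan.FromHours(1), "new-client"));
    }
}
```

The point of keying the default on the caller rather than using one global value is that new clients can be given their own defaults later without touching the service contract.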

Some of you might be wondering how we dealt with DST: We didn't. An explicit decision was made on the business side not to support it because of some complexity in the business side of things, and particularly because it would've required clients in other countries to make changes to existing infrastructure, and the benefits just were not worth it. One thing we did learn from that experience was that handling date and time fields in web services where consumers and service providers are in different countries and time zones can easily get quite tricky, and that the way .NET handles dates in web services does not make things any easier.

Scott comments here on a utility XmlUrlResolver he wrote to load schemas embedded as resources in .NET assemblies that contain includes between them. This is a powerful feature, and it's nice to see how the XML stack was open to extension in ways like this.

FWIW, I did something like this a few years back (2002, actually) to load XSLTs embedded as resources for the original NUnitReport task for NAnt. Here's the code I wrote back then, which is pretty similar to Scott's code (except that I used an explicit URI scheme):


// Namespaces used: System, System.IO, System.Reflection, System.Xml, System.Xml.Xsl

/// <summary>
/// Loads the XSLT Transform
/// </summary>
/// <remarks>
/// This method will load the file specified
/// through the xslfile attribute, or
/// the default transformation included
/// as a managed resource.
/// </remarks>
/// <returns>The Transformation to use</returns>
private XslTransform LoadTransform()
{
   XslTransform xslt = new XslTransform();
   if ( XslFile != null )
   {
      xslt.Load(XslFile);
   } else
   {
      // XSL_DEF_FILE is a URI using the custom "mres" scheme
      XmlResolver resolver = new LocalResXmlResolver();
      Stream stream =
         (Stream)resolver.GetEntity(new Uri(XSL_DEF_FILE), null, null);
      XmlTextReader reader = new XmlTextReader(XSL_DEF_FILE, stream);
      xslt.Load(reader, resolver);
   }
   return xslt;
}

/// <summary>
/// Custom XmlResolver used to load the
/// XSLT files out of this assembly resources.
/// </summary>
internal class LocalResXmlResolver : XmlUrlResolver
{
   const string SCHEME_MRES = "mres";

   /// <summary>
   /// Loads the XSLT file
   /// </summary>
   /// <param name="absoluteUri"></param>
   /// <param name="role"></param>
   /// <param name="objToReturn"></param>
   /// <returns></returns>
   public override object GetEntity(Uri absoluteUri, string role, Type objToReturn)
   {
      if ( absoluteUri.Scheme != SCHEME_MRES )
      {
         // we don't know how to handle this URI scheme....
         return base.GetEntity(absoluteUri, role, objToReturn);
      }
      // the last URI segment is the name of the embedded resource
      Assembly thisAssm = Assembly.GetExecutingAssembly();
      string filename = absoluteUri.Segments[absoluteUri.Segments.Length - 1];
      return thisAssm.GetManifestResourceStream(filename);
   }
}

While doing a couple of "fixes" on some schemas this past week, I rediscovered XSD named groups. For some reason, it made me inexplicably happy.

Granted, I won't take it to the extreme Don does, though; I still rely heavily on named complexTypes :)

A few weeks ago, some debate sparked regarding the IBlogThis (now IBlogExtension) interface, and, at that time, Don suggested using IXPathNavigable or XPathNavigator over XmlDocument through the interface. And so, we now have an IBlogExtension interface that uses IXPathNavigable indeed.

However, this raises a small question: Why would you prefer IXPathNavigable over XPathNavigator directly? After all, the former has only one method (CreateNavigator()) so you can pretty much only use it to get an XPathNavigator instance anyway.

While on the topic, I figure this could be a good time to make a few wishes regarding the Xml support in the .NET framework: Bridge the different abstractions you guys provide! Overall, I really like the XML support in the framework (especially compared to what we used to have before), but sometimes, you find significant gaps in how the different abstractions provided interact. So yes, I'd like things like an XPathNavigatorReader and to be provided by the framework. Is that too much to ask? I think not.

Fumiaki Yoshimatsu points out in a comment to my xs:any post that skip seems to work on .NET 1.1.

I just tried it, and indeed it works and shows no error. Lax still gives me an error, though, which I think it should not, either.

Can anyone help me out by clarifying what the correct behavior should be for an XSD validator in the presence of xs:any?

Imagine I had the following schema:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema
   targetNamespace="http://www.winterdom.com/schemas/valprb1"
   elementFormDefault="qualified"
   xmlns="http://www.winterdom.com/schemas/valprb1"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
   <xs:element name="test">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="value1" type="xs:string"/>
            <xs:any namespace="##any" processContents="skip" />
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

And the following XML:

<test xmlns="http://www.winterdom.com/schemas/valprb1">
   <value1>whatever</value1>
   <!-- Should fit in xs:any -->
   <s:other xmlns:s="http://www.winterdom.com/schemas/other">asdad</s:other>
</test>

What should the correct behavior be here regarding the validation of s:other given my specification of processContents="skip"? I'm not quite entirely sure of what the XSD specification implies. My initial understanding would've been that the validator should not try to actually validate anything fitting in the xs:any section, just verify that it is indeed valid XML. However, this seems not to be the case, at least for the .NET XmlValidatingReader, which reports:

Error: Could not find schema information for the
element 'http://www.winterdom.com/schemas/other:other'.
An error occurred at file:///C:/temp/test.xml(4, 5).

The message obviously points out that I haven't given any schema information for s:other, but my question is why I should have to, given that I was using skip.

So I'm probably misunderstanding how things are supposed to work, and I'd love it if someone would point me in the right direction...
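For anyone wanting to repeat the experiment on a newer framework, here is a minimal sketch that runs the same schema and document through XmlReaderSettings with an XmlSchemaSet instead of XmlValidatingReader. The class and method names are mine, and I'm assuming that API is available; the sketch only collects errors (not warnings), and the processContents value is a parameter so that skip, lax and strict can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XsAnyValidationSketch
{
    const string Ns = "http://www.winterdom.com/schemas/valprb1";

    // Same document as in the post.
    const string InstanceXml =
        "<test xmlns='http://www.winterdom.com/schemas/valprb1'>" +
        "<value1>whatever</value1>" +
        "<!-- Should fit in xs:any -->" +
        "<s:other xmlns:s='http://www.winterdom.com/schemas/other'>asdad</s:other>" +
        "</test>";

    // Same schema as in the post, with processContents as a parameter.
    static string BuildSchema(string processContents)
    {
        return
            "<xs:schema targetNamespace='" + Ns + "'" +
            " elementFormDefault='qualified'" +
            " xmlns='" + Ns + "'" +
            " xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='test'><xs:complexType><xs:sequence>" +
            "<xs:element name='value1' type='xs:string'/>" +
            "<xs:any namespace='##any' processContents='" + processContents + "'/>" +
            "</xs:sequence></xs:complexType></xs:element>" +
            "</xs:schema>";
    }

    // Returns the validation error messages reported for the document.
    public static List<string> Validate(string processContents)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(Ns, XmlReader.Create(new StringReader(BuildSchema(processContents))));

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;

        List<string> errors = new List<string>();
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors.Add(e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(InstanceXml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        foreach (string mode in new string[] { "skip", "lax", "strict" })
        {
            List<string> errors = Validate(mode);
            Console.WriteLine(mode + ": " + errors.Count + " error(s)");
            foreach (string message in errors) Console.WriteLine("  " + message);
        }
    }
}
```

With skip, the validator is only supposed to check that the wildcard content is well-formed XML, so I would expect no errors to be reported for s:other in that mode.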