Back to the Past

Link. August 7, 2008. Comments [0]. Posted in: Architecture | BizTalk

Every so often I see people asking in the newsgroups how to solve certain challenges they encounter while working on their BizTalk applications. One common question revolves around being able to go "back to the past" when an error happens during processing of a message.

This isn't a bad question at all, and usually revolves around how to simulate the behavior of atomic transactions in an environment where transactions can be a lot more complex and not always as natural.

The question usually goes like this: "I'm receiving a message in BizTalk, which is triggering an orchestration instance. The orchestration does this and that, and if any of those things fail, I want to put the message back where I got it from".

This-That-There

The question might seem simple, but it's not always necessarily so. In fact, sometimes you have to stop a moment and ask yourself whether this really makes sense. There are several aspects you need to consider:

  1. Handling the case where "this" causes an error is probably not a big deal. Handling the case where "this" succeeded but "that" failed, however, might not be that simple. Not all actions your orchestration might do can be undone.
  2. Most of the time you'll find that both actions can't be done as a single unit in a single atomic transaction. Fortunately, BizTalk provides very good support for long-running transactions and compensation which can help quite a bit.

    Unfortunately, long-running transactions and compensation models are often misunderstood (cue in the inevitable "How long does a transaction have to last to be a long-running transaction?" jokes/questions).

    Here are a few articles that do a great job of describing the BizTalk Transaction features and how to use them effectively:
  3. The sentence "put the message back where I got it from" can be either a very good thing, or a very problematic thing. It basically relates to leaving stuff as you found it; in particular, putting the message back at its origin (thus relating to the transactional concept of "nothing happened here, move along") so that you can try processing it again later on, when it will hopefully succeed.

The problem with number 3 is that (a) it isn't always possible, and (b) it isn't always a good idea.

It might not be possible to put the message back where you got it if someone was pushing the message to you instead of you pulling it from somewhere. If you had a SOAP/HTTP WebService exposed that received a message from someone else, then you probably can't put the message back where you got it from!

On the other hand, this is a very common model for queued messaging systems: If you run into an error processing the message, you put it back into the queue and try again later. And this works great many times and can simplify error handling a great deal.

The point where this becomes a problem is when you rely on this as your only error handling mechanism. If you blindly send the message back to its origin to retry processing for any and all errors and a message comes in that always fails, you've got yourself a poison message!
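A minimal sketch of the difference an attempt cap makes, using an in-memory queue rather than a real BizTalk adapter. The names `drain`, `MAX_ATTEMPTS` and the sample handler are assumptions made up for illustration. Without the cap, the "bad" message below would circulate forever.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed limit; the right value depends on the system


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process every message, requeueing failures until the attempt cap is hit.

    Queue items are (body, attempts_so_far) pairs.
    Returns (processed, dead_letter), two lists of message bodies.
    """
    processed, dead_letter = [], []
    while queue:
        body, attempts = queue.popleft()
        try:
            handler(body)
            processed.append(body)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Poison message: stop retrying and set it aside for inspection.
                dead_letter.append(body)
            else:
                # "Back to the past": return the message to the end of the queue.
                queue.append((body, attempts))
    return processed, dead_letter


def handler(body):
    # Hypothetical handler that can never process one particular message.
    if body == "bad":
        raise ValueError("cannot process")


queue = deque([("ok-1", 0), ("bad", 0), ("ok-2", 0)])
processed, dead_letter = drain(queue, handler)
print(processed, dead_letter)
```

The attempt count travels with the message here; in a real system it would have to be stored somewhere that survives the round trip, such as a message property.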

I've already talked about Poison Messages in the past, so I won't comment much more on them. But there are other things you can keep in mind to improve the "back to the past" error handling technique, particularly if you don't care about message processing:

  1. If you can identify and classify the source/cause of the errors, you can make your orchestration smarter about how to handle them. For example:
    • Can you distinguish transient error conditions? For example, a timeout connecting to the database might be a temporary condition because of a network fluke or a server being restarted. Sometimes retrying the operation after a short while is enough to deal with this situation effectively.
    • Can you distinguish errors that might require manual intervention to fix? Example: Validating an operation fails because some configuration data is missing. This is a case where you want to be proactive and raise an appropriate alert so that someone can get in there and fix the issue. Extra points if you can tell apart conditions that require intervention from business users and those that require it from a systems administrator.

      Notice, however, that in this case putting the message back at the start right after creating the notification is not the right thing to do. People don't react that fast. You need to set the message aside until such time as the corrective measure has been taken and it is safe to try processing it again.
  2. Can you control when the retry might happen? Can you throttle it if necessary? If the answer is no, then you might want to be very careful about using this technique. You could easily increase the system load substantially if lots of messages fail in a short time and you try reprocessing them in a tight loop.
  3. Be mindful of adapters that provide no ordering semantics. For example, if your original location used the FILE adapter and you put the message back in the original folder, it will likely get picked up very soon again for processing; which can quickly get you back to step 2.

    At least with an adapter like MSMQ you can push the message to the end of the queue, which might buy you some time.
  4. Even if you take 1, 2 and 3 into account, you still need to provide a way to deal with poison messages. Keep in mind that what started as a transient error condition can suddenly escalate to a full-blown problem you can't do anything about, like when that temporary network fluke turns into a days-long outage after some idiot digging a hole outside snaps your network fiber cable in two.

    In fact, sometimes you might need to go so far as to completely shut down processing. Sometimes being able to detect that some things that should be working keep failing after an extended period of time and alerting about it can help get things sorted out before they spiral out of control.
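The points above can be sketched as a small routing policy. This is only an illustration under stated assumptions: the two exception classes, the `Router` name and every threshold are hypothetical, and a real orchestration would express the same decisions with its own exception handling and alerting.

```python
from dataclasses import dataclass, field


class TransientError(Exception):
    """Assumed marker for conditions that may clear up on their own (e.g. a timeout)."""


class NeedsInterventionError(Exception):
    """Assumed marker for conditions someone has to fix (e.g. missing configuration)."""


@dataclass
class Router:
    max_attempts: int = 4          # attempts before giving up on a message
    base_delay: float = 2.0        # seconds to wait before the first retry
    breaker_threshold: int = 5     # consecutive failures before halting
    consecutive_failures: int = 0
    halted: bool = False
    alerts: list = field(default_factory=list)

    def on_success(self):
        self.consecutive_failures = 0

    def on_failure(self, error, attempts):
        """Decide what to do with a message that has now failed `attempts` times.

        Returns ("retry", delay_seconds), ("park", None) or ("dead_letter", None).
        """
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.breaker_threshold and not self.halted:
            # Things that should work keep failing: stop and tell someone.
            self.halted = True
            self.alerts.append("processing halted: repeated failures")
        if isinstance(error, NeedsInterventionError):
            # Set the message aside until the corrective measure has been taken.
            self.alerts.append(f"intervention needed: {error}")
            return ("park", None)
        if isinstance(error, TransientError) and attempts < self.max_attempts:
            # Exponential backoff throttles retries instead of looping tightly.
            return ("retry", self.base_delay * 2 ** (attempts - 1))
        return ("dead_letter", None)


router = Router()
print(router.on_failure(TransientError("db timeout"), attempts=1))
print(router.on_failure(NeedsInterventionError("missing config"), attempts=1))
```

The caller is expected to check `halted` before picking up more work, and to call `on_success` after each message that goes through so the failure streak resets.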

These are just some ideas that might help make your system more reliable and more manageable. Some of them do cost money; that is, you have to invest time and development/testing efforts in getting them done, and that's where you're going to have to evaluate what makes sense and what not.

The other side of the bridge

Link. July 29, 2008. Comments [0]. Posted in: Architecture

Some readers of this blog might have noticed that my posting frequency has been somewhat reduced during the past few months. Some of this is the result of working on several different projects, some of which isn't stuff I can talk about much.

Besides my usual .NET/BizTalk work, I've been increasingly spending time working on some Java stuff as well. I would say that I probably spend my time 50/50 between both technologies nowadays. There are good and bad things about this, but that's a topic for a different post.

One of the interesting things I've worked on lately on the Java side has been implementing custom Binding Components based on the Java Business Integration (JBI) specification. JBI isn't a new technology; it's been around for a while, but until now I had not had the opportunity of working with it.

Mostly, I've been working with Sun's OpenESB, which is actually not that bad, though I've also used JBoss/ServiceMix a bit.

It took me a little while to fully grok the core of the JBI model. There are many similarities in the concepts it uses to other messaging technologies, but the terminology used can be somewhat different, which can be confusing sometimes. In case you're not familiar with JBI, it defines two different kinds of components developers might wish to implement:

  • Service Engines, which are just components that do some processing and send and/or receive messages.
  • Binding Components, which are mostly transport-level adapters. These are the equivalent components in JBI to BizTalk adapters and WCF's transport channels.

I'm not going to do a comprehensive comparison of JBI to BizTalk or WCF, but did want to make a few observations about some aspects that I found interesting.

Abstraction

The JBI spec is a fairly low-level specification. Much of the spec is really concerned with how components interact with the JBI server and how class loading and deployment work, as opposed to defining a higher-level messaging model.

Actually, maybe I should say that the JBI spec doesn't really define abstraction layers. It's all a single layer that addresses a number of different concerns.

Most of the JBI messaging model is really defined to match the WSDL model, and there are advantages and disadvantages to this. On the plus side, the WSDL model is generic enough to be useful in a broad set of scenarios, and it is fairly well known. It also makes a lot of sense considering the terminology used throughout the spec (once you get familiar with it).

On the bad side, it means that components also have to get intimate knowledge of the WSDL contract model and this forces some compromises I personally don't quite agree with.

For example, in WCF and BizTalk, if I'm writing a transport-level adapter/channel, I may need to be intimately aware of the contract definition being used by the client/server. But it's not really necessary most of the time, particularly when you're working with network-level transports. Mostly, you care about the Message Exchange Pattern (MEP) that the contract demands.

In JBI, however, you have to understand the entire contract. This might be fine for Service Engines, but it can be rather inconvenient for Binding Components, as now you have to figure out how to map contract operations to your transport, and even how to marshal data back and forth based on the contract definition.

Configuration

This leads me to my second gripe about JBI: Configuration.

Endpoint configuration in JBI is done through a combination of two things:

  • The WSDL that defines the contract you'll want to receive or send messages to (extended with your binding component specific port/binding/operation configuration data)
  • The deployment descriptors which tell your component if you want to receive or send messages for the corresponding WSDL file.

In reality, WSDL permeates the entire JBI development experience, including how messages are represented internally between the JBI runtime and binding components / service engines.

A huge downside of this, though, is that the code needs to "understand" the contracts being used, so parsing the WSDL files and extracting your component's configuration data is your responsibility. Libraries like WSDL4J help, but it can still be a drag.
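To give a feel for that chore, here is a rough sketch of pulling binding-specific settings out of a WSDL port. It uses Python's standard XML parser instead of WSDL4J, and the extension namespace `urn:example:my-binding`, its `address` element and all of the attribute names are invented for the example. The WSDL itself is trimmed down to the bare minimum.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
EXT_NS = "urn:example:my-binding"  # hypothetical extension namespace

WSDL = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:orders"
             xmlns:my="urn:example:my-binding"
             name="Orders"
             targetNamespace="urn:example:orders">
  <portType name="OrdersPortType"/>
  <binding name="OrdersBinding" type="tns:OrdersPortType">
    <my:binding/>
  </binding>
  <service name="OrdersService">
    <port name="OrdersPort" binding="tns:OrdersBinding">
      <my:address host="queue.example.test" port="9000" retries="3"/>
    </port>
  </service>
</definitions>
"""


def read_port_config(wsdl_text):
    """Return {(service, port): attributes} for ports carrying our extension element."""
    root = ET.fromstring(wsdl_text)
    config = {}
    for service in root.findall(f"{{{WSDL_NS}}}service"):
        for port in service.findall(f"{{{WSDL_NS}}}port"):
            # The component-specific settings live in an extensibility element.
            address = port.find(f"{{{EXT_NS}}}address")
            if address is not None:
                config[(service.get("name"), port.get("name"))] = dict(address.attrib)
    return config


print(read_port_config(WSDL))
```

Even this toy version has to know about services, ports and namespaces before it gets to a single setting, and a real component would also need to resolve the binding and its operations.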

What does really irk me a bit about JBI though is not that you have to manually craft WSDL files (not that it's a lot of fun). It's that the JBI spec doesn't address the design time aspect of JBI components at all besides, you guessed it, manually crafting WSDL files and deployment descriptors.

So if you're creating JBI components and want a decent design time experience, then you need to target each different server/tooling specifically. For example, for OpenESB you'd craft a custom NetBeans module, while for another tool you'd do something different. It's just not nice.

What's worse is that some servers decided this model wasn't good enough either and added their own configuration models for components and contracts. For example, while ServiceMix supports the stuff mandated by the JBI spec, it also supports an alternative XML-based configuration syntax. While it's more compact, it's hard to argue it really is a significant improvement.

Encoding

This is one aspect where I thought the JBI spec really blew it. As I said above, the JBI spec uses an XML representation for messages internally, that matches the WSDL message definition (parts and all).

There's no layering here that allows binding components to worry about sending and receiving messages without really caring about message contents. They have to care about the message contents and how to map that to however that message was defined in the contract WSDL definition. This might mean looking at which parts are defined and how to break the message down to them, or how to decode complex XSD types defined in the contract.

Instead of being delegated explicitly in the spec to separate components, it's all the responsibility of the binding components themselves. The obvious benefit of having this layered would've been more reusability of said encoding/decoding components, particularly if a minimum set of encoders were mandated by the spec.

Just to take one example: basic XML encoding/decoding is such a basic part of the system, that it's just really odd each component writer gets to redo it from scratch.

I think this is one aspect that BizTalk in particular gets very right. WCF does keep encoding/decoding an explicitly separate task, even if transport channels are responsible for using the encoders directly, but it is still a much better solution than what JBI came up with.
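The kind of layering described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the BizTalk, WCF or JBI API: `Encoder`, `XmlEncoder`, `JsonEncoder` and `LoopbackTransport` are all hypothetical names. The point is that the transport only ever handles bytes, while the encoders know nothing about how those bytes travel.

```python
import json
import xml.etree.ElementTree as ET
from typing import Protocol


class Encoder(Protocol):
    def encode(self, message: dict) -> bytes: ...
    def decode(self, payload: bytes) -> dict: ...


class XmlEncoder:
    """Maps a flat dict to <message><key>value</key>...</message> and back."""

    def encode(self, message):
        root = ET.Element("message")
        for key, value in message.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="utf-8")

    def decode(self, payload):
        return {child.tag: child.text for child in ET.fromstring(payload)}


class JsonEncoder:
    def encode(self, message):
        return json.dumps(message).encode("utf-8")

    def decode(self, payload):
        return json.loads(payload.decode("utf-8"))


class LoopbackTransport:
    """Stand-in for a transport-level adapter: it only ever moves bytes."""

    def __init__(self, encoder: Encoder):
        self.encoder = encoder
        self.wire = []  # pretend network: a list of byte payloads

    def send(self, message: dict):
        self.wire.append(self.encoder.encode(message))

    def receive(self) -> dict:
        return self.encoder.decode(self.wire.pop(0))


transport = LoopbackTransport(XmlEncoder())
transport.send({"id": "42", "status": "new"})
print(transport.receive())
```

Swapping `XmlEncoder()` for `JsonEncoder()` changes the wire format without touching the transport, which is the reuse that a mandated set of standard encoders would have made possible.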

What Irks Me About Visual DSLs

Link. December 3, 2007. Comments [2]. Posted in: Architecture | Development

There's a lot of talk about Domain Specific Languages lately. The exact definition of what a DSL is, however, might change depending on who you ask. Microsoft itself tends to significantly favor Visual DSLs, that is, domain specific languages that are made of visual components (as opposed to Text-based DSLs that are made of some kind of text driven representation).

Frankly, I don't expect MS to change their direction, nor am I sure it would be the wisest decision given their target audience, but I do tend to favor text-based DSLs myself, for several reasons:

  1. Text-based DSLs work best during development. We have a significant amount of experience and a rich set of tools available to deal with text in an effective fashion: Source control and comparison tools, good editors, diff'ing and merging, and so on.
  2. A text-based DSL is illustrative in and of itself. Anyone with a text editor can look at it, so it only requires special tooling during execution, unlike their visual counterparts.
  3. If you're spending significant time using a DSL to create new things (versus, say, simply visualizing existing stuff), then a textual DSL is usually more effective.

I should say at this point that XML-based languages don't necessarily fit this description. XML can be clunky at times, and a lot of people hate having to manually crank XML to do something. For example, many people dislike manually editing NAnt or MSBuild build files.

What's not to like about Visual DSLs

Many Visual DSLs are very appealing at first to create new things when you're unfamiliar with the language, as they can be very didactic. But once you're familiar with the language, Visual DSLs, as implemented by most tooling out there, will usually get in the way instead of boosting your productivity.

Don't get me wrong; there are a lot of things to like about Visual languages. In particular, they can be great tools for visualizing things. In some cases, they are great tools for editing existing things and occasionally, even creating new things.

The last one, however, is pretty rare. I've been thinking a lot about this, and I've started to think that one of the reasons this is so is that there's a fundamental disconnect in how we usually think about Visual Languages and Tools.

The disconnect is that we tend to assume that the visual representation of the underlying domain that is best for visualizing and describing the language is actually an acceptable choice for "writing" in that language.

For example, let's consider Windows Workflow Foundation workflows or BizTalk Orchestrations. Both could be considered DSLs for building processes and workflows, and they are actually pretty effective at that. Both use a visual representation that feels more or less natural to people used to working with processes (or state machines, in the case of WF). Both of those representations are great for working with existing processes, as they allow the reader to quickly grasp the flow of it, and it even works very well when debugging a running process.

But, to be honest, both leave a lot to be desired when you're actually sitting down to create and define a new process, and both tend to get a lot in the way. I personally feel that WF is a lot worse in this respect.

XAML

I should mention that I do not consider XAML a text-based DSL (even if it is "just text"). Fundamentally, XAML is just a serialization format, and that shows in a number of places. It is built to be created and consumed by software tools, not the human developer (though it is possible to do so, as many people found out with WPF in the early days).

More importantly, these kinds of XML/XAML languages that are aimed at tools don't necessarily work great with the tooling we have for dealing with text (see the all-important point 1 above). For example, a lot of people have found out the hard way that trying to do a diff or trying to merge two copies of a tool-generated XML/XAML file can be nearly impossible at times.

It's pretty evident that Microsoft is working on a lot more tools based on XAML, so that's here to stay, but it remains to be seen yet how that is going to work out. I'm sure there's going to be good Visual tooling around it, but, as usual, the problem is that it just isn't enough.

What about Oslo?

A lot of my fellow MVPs and a bunch of people that attended the recent SOA and Business Process Conference have mentioned Microsoft's Oslo initiative that was announced at the conference.

From what little I know of it, it is a far reaching initiative, touching multiple key products in the Microsoft development platform. A significant component of this effort is investment in models and, you guessed it, modeling tools around them. I think it's obvious to everyone by now that a substantial set of those tools will be built around visual DSLs and visual designer tools (that XAML's in there somewhere is probably also a safe bet). Some people will think this is a key advantage, others will probably hate it.

The one conclusion I've reached so far regarding Oslo is that it will likely mean a significant shift in how we do development on the MS platform (at least for those of us involved in connected systems). However, I'm holding my thoughts on what will be good or bad about those changes until we know more precisely what the Oslo technologies will be delivering and we have a clearer picture of how we will interact with them. Also, it's clear that this is an initiative that will be gradually rolled out, so there will probably be a long transition period around it (which is both good and bad in itself).

As customers, and users of those technologies, however, we have a big opportunity, and a big responsibility in letting Microsoft know what kind of tooling we want/need to have around the modeling tools and other technologies. Like I told someone at MS recently: "I don't expect MS to shift its position on visual tooling and Visual DSLs, but I do wish the hooks and infrastructure were there for us in the community to create our own, non-visual DSL tools around it that allow us to work more effectively with the technology". Hopefully, that little thought will not be forgotten.

System.Transactions and Workflows

Link. July 15, 2007. Comments [3]. Posted in: .NET | Architecture | Workflow

Scott Bellware discusses some interesting things here regarding transactions and extensibility hooks and even considering Ruby on Rails as a web front end. I'll ignore the Ruby thing, but I do want to talk about his comments on transactions and workflows.

If I understand right, Scott is basically providing a way to extend his application through user-defined workflows that are executed at specific and controlled extensibility points. This is a very cool scenario for WF that enables very interesting possibilities.

His concern, however, seems to center around controlling what happens when one of those user-defined workflows fails, to ensure the entire system is not left in an inconsistent state. This is certainly a valid concern, but it is one that I feel isn't quite as simple to solve as simply "use transactions".

Let's take WF out of the picture for a moment and assume we were using a good old code-based extensibility model (or even a script-based one). Even with the help of System.Transactions and distributed transactions, there's really no way to guarantee that whatever code users of the system put into the extensibility hooks would still work correctly in the event of failure. This revolves around the fundamental fact that there's no guarantee that whatever is put in there is even transaction-aware. Granted, there's a big chance that code put in there is indeed transaction-aware, but that's only because a lot of what comes directly into the core .NET framework is tightly integrated in itself. Indeed, for all you know, the user might have even explicitly taken his operations out of the transactional context (for example, using "Enlist=false" in his connection strings).

So even if you wrapped your extensibility hooks with a transaction scope built with System.Transactions, there's no guarantee that a transaction rollback will really bring the system to a consistent state. That's a constant danger with extensible systems, and one that is not easily addressed. If you've noticed, most framework facilities focused on extensibility only concern themselves with the security problems, but not with extension behavior in the general sense.
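A toy model makes the gap visible. The `TransactionScope` class below is a hand-rolled stand-in written in Python, not the System.Transactions API, and the stores and function names are made up for the example. It only rolls back work that was enlisted with it, so whatever the extension does on the side survives the rollback.

```python
class TransactionScope:
    """Toy stand-in for an ambient transaction; not the System.Transactions API."""

    def __init__(self):
        self._undo = []

    def enlist(self, undo_action):
        self._undo.append(undo_action)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back only what was enlisted, most recent first.
            for undo in reversed(self._undo):
                undo()
        return False  # let the original error propagate


orders = []     # transaction-aware store
audit_log = []  # store that is written to outside the transaction


def save_order(scope, order):
    orders.append(order)
    scope.enlist(lambda: orders.remove(order))


def extension_hook(scope, order):
    # Hypothetical user extension: it never enlists, so rollback cannot see it.
    audit_log.append(f"shipped {order}")
    raise RuntimeError("extension failed")


try:
    with TransactionScope() as scope:
        save_order(scope, "order-1")
        extension_hook(scope, "order-1")
except RuntimeError:
    pass

print(orders, audit_log)
```

After the failure the order is gone but the audit entry claiming it shipped is still there, which is exactly the kind of inconsistency a rollback was supposed to prevent.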

This fundamental issue doesn't really get any better just because your extensions are now written as workflows. Indeed, it can become a bit more complicated. To me, using System.Transactions with WF Workflows used in this manner doesn't really make much sense, for several reasons:

  1. Yes, WF supports System.Transactions (indeed this is what a TransactionScope activity uses underneath), but the same danger exists of someone using an activity that's not transaction-aware.
  2. Wrapping the entire workflow in a System.Transactions transaction limits substantially what the user can do with the extension workflows. Are you willing to restrict your users to running only short-lived, simple workflows?

The second one is to me the most significant one, because putting that restriction in place limits substantially the power of WF for these scenarios, particularly when you allow workflow persistence and tracking as part of your application extensibility.

The one benefit that WF gives you over code-based extensibility for handling errors is indeed mentioned by Scott: Compensations. This is a very powerful mechanism, particularly for long running workflows, and can certainly be leveraged by extension workflows to put the system back into a consistent state.
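The idea behind compensation can be sketched without any workflow engine at all. The helper below is a plain Python illustration, not WF's compensation mechanism, and the step names are invented. Each step is paired with an action that undoes it, and when a later step fails the completed ones are compensated in reverse order.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, most recent first, and then re-raise the error.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise


log = []


def fail_shipping():
    raise RuntimeError("carrier unavailable")


steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipment")),
]

try:
    run_with_compensation(steps)
except RuntimeError:
    pass

print(log)
```

Unlike a rollback, nothing here pretends the earlier steps never happened. The card was charged and then refunded, which is why this model suits long-running work where holding a transaction open is not an option.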

In the end, just like with code-based extensibility, you still need to put guidelines in place to make sure users extending your system do so in such a way that they don't jeopardize the behavior and consistency of the entire system, but even with WF, it is still hard to put those guidelines in executable form.

In some cases, however, WF can indeed make it easier, if you're willing to live with some restrictions. For example, here are some ideas to think about to improve your chances:

  1. Restrict the vocabulary: WF can be extended through the use of custom Activities. Ideally, you will have devised a set of custom activities specific to your problem domain that the users can use to compose their extension workflows. If you're willing to restrict users a bit, and your activity set is complex enough, consider putting restrictions in place so that users can only compose extension workflows using a set of activities you've "approved" (those on your own set and some basic flow control ones, for example). This won't be a perfect model, but can improve the chances that users don't extend the system incorrectly, and can easily be done.
  2. Use validators: Take advantage of WF's ActivityValidators to enforce rules on your extension workflows. If you've also restricted the vocabulary, you can certainly take advantage of domain-specific knowledge to write the validation rules. For example, if you've got an UpdateXXX activity, you could validate that it always is used within a TransactionScope activity.
  3. Create templates your users can use to get started writing extension workflows. Ideally, make these into top-level workflow activities your users can derive from. So, for example, you could have users create a workflow derived from InsuranceExtensionWorkflowActivity instead of a simple SequentialWorkflowActivity. Your base workflow could also have custom validators enforcing, for example, that a global compensation flow is implemented by the users (if necessary).
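The first two ideas can be sketched as a walk over the activity tree. This is a plain Python model for illustration only, not WF's ActivityValidator API: the `Activity` class, the `APPROVED` set and the activity names in it are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    kind: str
    children: list = field(default_factory=list)


# Hypothetical approved vocabulary: domain activities plus basic flow control.
APPROVED = {"Sequence", "IfElse", "TransactionScope", "UpdatePolicy", "NotifyAgent"}


def validate(activity, inside_transaction=False, path=""):
    """Return a list of rule violations found in the activity tree."""
    here = f"{path}/{activity.kind}"
    errors = []
    if activity.kind not in APPROVED:
        errors.append(f"{here}: activity is not in the approved set")
    if activity.kind.startswith("Update") and not inside_transaction:
        errors.append(f"{here}: must run inside a TransactionScope")
    # Children of a TransactionScope (at any depth) count as being inside it.
    inside = inside_transaction or activity.kind == "TransactionScope"
    for child in activity.children:
        errors.extend(validate(child, inside, here))
    return errors


good = Activity("Sequence", [Activity("TransactionScope", [Activity("UpdatePolicy")])])
bad = Activity("Sequence", [Activity("UpdatePolicy"), Activity("CodeActivity")])

print(validate(good))
print(validate(bad))
```

Running checks like these at design time or load time turns part of the guidelines into something executable, although it can only enforce the rules someone thought to write down.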

None of these are foolproof, but they can improve the odds in your favor when used correctly.

Visual DSLs

Link. July 13, 2007. Comments [2]. Posted in: .NET | Architecture

Scott Bellware wrote yesterday on disambiguating fluent interfaces (aka APIs) from domain-specific languages. Rant aside, it's an interesting read. One thing, however, gave me a chuckle. Scott says:

"At some point in the near future, Microsoft will tell the .NET mainstream that DSL's are visual, that they are drag-and-drop, and that they require a new plug-in to Visual Studio in order to make them and use them."

Hummmm..... sorry to say this Scott, but they already did a couple of years ago :-). I think it started with the whole Software Factories [1] idea that Microsoft pushed some time ago.

I must say, though, that I don't hold any grudges against the idea of Visual DSLs. In fact, they can be very useful tools both to create and visualize software. The problem appears when they are not the most productive way of working, either a) because of limitations in the visual language itself, or b) because of poor implementation. The Windows Workflow Foundation Designer strikes me as an example of a concept where the visual thing works very well, though it has an unfortunate implementation (it's slow as molasses, almost unbearable to use at times!).

For many of these tools, what I'd really like to have is the best of both worlds: Having a visual experience can be very useful, but having a productive, effective and efficient textual DSL is, to me, a core requirement. Despite what some people might think, writing text is still a far more efficient means of writing software if you have the right language at your disposal.

[1] I think the Software Factories idea has some good merits as a concept and vision, but it was unfortunately named (and I believe the term has done substantial damage to our software industry, by the way).

About

Tomas Restrepo is a software developer located in Colombia, South America. His interests include .NET, Connected Systems, PowerShell and lately dynamic programming languages. More...

Copyright © 2002-2008, Tomas Restrepo.