I've talked in past entries about event-driven activities (those implementing IEventActivity) in Windows Workflow Foundation and how to implement them. I discovered most of this stuff by going over all the samples available and a lot of spelunking around using Reflector while trying to create my MsmqActivities for WF. However, a recent question on the WF forums made me aware of a significant bug in my implementation.

Let me try to describe from the beginning what the problem is:

Normally, an event activity when used in an event context (say in one branch of a ListenActivity) works roughly like this:

  1. Right at the beginning, the ListenActivity calls the IEventActivity.Subscribe() method on your event activity, forcing it to initialize any subscriptions it needs. Normally this involves notifying a runtime service that the activity is interested in a given event and creating a WorkflowQueue for the notification.
  2. When the event occurs, the runtime service notifies the activities involved through the corresponding WorkflowQueue. This causes the IActivityEventListener.OnEvent() method of the ListenActivity to be called (since it is the one that subscribed to the event), which in turn asks our activity to execute (because it is the one that fired the event first).
  3. After this, the ListenActivity will call our IEventActivity.Unsubscribe() method to unsubscribe from our event (regardless of whether we triggered the event in the first place). Usually, this means we'll go out to the underlying runtime service to ask it to remove the subscription and delete the WorkflowQueue we were using.
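The three steps above can be modeled in a few lines. This is only a rough sketch in Python, used so it runs without the WF assemblies; it is not WF code. The class and method names (EventActivity, RuntimeService, Listen, raise_event) are made up for illustration and only mirror the roles of IEventActivity, the runtime service and the ListenActivity.

```python
from collections import deque


class RuntimeService:
    """Hypothetical stand-in for a runtime service that raises events."""

    def __init__(self):
        self.subscriptions = {}  # queue name -> listener to notify

    def raise_event(self, queues, queue_name, payload):
        # Step 2: deliver the data through the queue, then notify the listener.
        queues[queue_name].append(payload)
        self.subscriptions[queue_name].on_event(queue_name)


class EventActivity:
    """Hypothetical stand-in for an activity implementing IEventActivity."""

    def __init__(self, name, service):
        self.name = name
        self.service = service
        self.executed = False
        self.payload = None

    def subscribe(self, queues, listener):
        # Step 1: create the queue and register interest with the service.
        queues[self.name] = deque()
        self.service.subscriptions[self.name] = listener

    def execute(self, queues):
        self.payload = queues[self.name].popleft()
        self.executed = True

    def unsubscribe(self, queues):
        # Step 3: remove the subscription and delete the queue.
        self.service.subscriptions.pop(self.name, None)
        queues.pop(self.name, None)


class Listen:
    """Hypothetical stand-in for the ListenActivity acting as event context."""

    def __init__(self, branches):
        self.branches = branches
        self.queues = {}
        self.completed = False

    def start(self):
        for branch in self.branches:
            branch.subscribe(self.queues, self)

    def on_event(self, queue_name):
        if self.completed:
            return
        self.completed = True
        winner = next(b for b in self.branches if b.name == queue_name)
        winner.execute(self.queues)
        # Every branch is unsubscribed, whether or not it fired the event.
        for branch in self.branches:
            branch.unsubscribe(self.queues)
```

Note that in this model all the bookkeeping lives in two places: the queues owned by the workflow side, and the subscriptions dictionary owned by the service. That split is what matters for the rest of this entry.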

At any point during this process, the workflow engine might decide to unload our workflow instance from memory and persist it using a configured persistence service (such as the built-in SqlWorkflowPersistenceService). In a lot of scenarios this will work correctly, because the WorkflowRuntime handles the persistence of the most important items: it will persist the workflow state and the state of each of the created activities (including our own). As part of this it will also "remember" which WorkflowQueues had been created and will even persist any contents of those queues, which is why anything you put on a WorkflowQueue should be serializable.

Looking at this with my own MsmqActivities, I noticed that if the process hosting the workflow engine was kept alive, everything worked just fine, even if the individual workflow instances were unloaded and reloaded. However, if the hosting process was terminated after the workflow instances were unloaded from memory and was then started again (thus loading any persisted workflow instances), things might not work right. Specifically, if the hosting process went down after IEventActivity.Subscribe() had been called but before the MsmqListenerService had received the message on the queue it was subscribed to, then when the workflow instance was loaded again by the runtime it wouldn't work, because the message would never arrive.

Looking at this problem in detail, here's what I think is happening:

  • The workflow persistence services are aware of workflow instances, activities and workflow queues, but they really have no clue about any other runtime services such activities might depend on. In other words, they are not aware of any persistence requirements of the runtime services a given workflow instance is actively using.
  • Under normal circumstances, the IEventActivity.Subscribe() method will only be called once by the event context for your activity instance. So, even if it had been called prior to the workflow instance being unloaded, it won't be called again when it is loaded into memory again.
    This actually makes a lot of sense if you think about it, because the runtime is already aware of the WorkflowQueues created and event handlers attached as a result of the original subscription (since it is taken care of by the persistence services).

The biggest issue by far, though, is how the MsmqListenerService itself works. The MsmqListenerService actually takes proactive action when it is notified of a subscription being created by the MsmqReceiveActivity, by creating a listener for the MSMQ Queue the activity wants to receive a message from. This is not something that is required by all event activities (the built-in activities are pretty passive in this respect), though I believe it will be a common requirement.

When the hosting process is shut down, all MSMQ listeners are removed from memory alongside everything else. Because the MsmqListenerService didn't have any built-in persistence mechanism, when the WorkflowRuntime was spawned again, all persisted workflow instances were loaded again but the MsmqListenerService was a "clean slate", with no knowledge of any subscriptions that had been created in the previous incarnation!
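To make the failure concrete, here is a small simulation of it. Again this is a Python model rather than WF code, and everything in it is an assumption made for illustration: ListenerService stands in for a service like MsmqListenerService, a pickled dictionary stands in for what the persistence service stores, and the queue path is a made-up example.

```python
import pickle

QUEUE_PATH = r".\private$\orders"  # hypothetical MSMQ queue path


class ListenerService:
    """Stand-in for a listener service whose subscriptions live only in memory."""

    def __init__(self):
        self.listeners = {}  # queue path -> workflow instance id

    def subscribe(self, queue_path, instance_id):
        self.listeners[queue_path] = instance_id

    def message_arrived(self, queue_path, body, instances):
        instance_id = self.listeners.get(queue_path)
        if instance_id is None:
            return False  # nobody is listening, so the workflow never hears of it
        instances[instance_id]["workflow_queue"].append(body)
        return True


def simulate(restart_host):
    service = ListenerService()
    instance = {"id": "wf-1", "subscribed": False, "workflow_queue": []}

    # Subscribe() runs once, when the event context starts.
    service.subscribe(QUEUE_PATH, instance["id"])
    instance["subscribed"] = True

    # Unload: the instance and its workflow queue are persisted.
    stored = pickle.dumps(instance)

    if restart_host:
        service = ListenerService()  # new process, clean slate

    # Reload: the instance is already marked as subscribed,
    # so Subscribe() is not called a second time.
    instance = pickle.loads(stored)
    if not instance["subscribed"]:
        service.subscribe(QUEUE_PATH, instance["id"])

    delivered = service.message_arrived(
        QUEUE_PATH, "order 42", {instance["id"]: instance}
    )
    return delivered, instance["workflow_queue"]
```

Calling simulate(False) models an unload and reload inside a host that stays alive, and the message is delivered. Calling simulate(True) models the host restart, and the message is dropped even though the workflow instance itself was restored intact.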

Based on these facts, it is my conclusion that if your event activities interact with their runtime services in such a way that the services need to keep track of subscriptions created on their end, you will need to provide your own persistence mechanism and store so that they "survive" across host restarts. I have already implemented a basic persistence mechanism for my MsmqActivities to keep track of which subscriptions have been created and which MSMQ Queues it needs to listen to, and to use that information to respawn the queue listeners when the workflow engine is started. So far, it seems to be working correctly.
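As a rough illustration of the idea (and not the actual MsmqActivities implementation, which is not described here), a service can write its subscriptions to a durable store every time they change and read them back on startup. The sketch below is Python with a JSON file as the store; the class name, the file format and the one-subscription-per-queue simplification are all assumptions.

```python
import json
import os
import tempfile


class PersistentListenerService:
    """Hypothetical listener service that remembers subscriptions across restarts."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.listeners = {}  # queue path -> workflow instance id
        # On startup, respawn a listener for every subscription recorded earlier.
        for queue_path, instance_id in self._load().items():
            self._start_listener(queue_path, instance_id)

    def _load(self):
        if not os.path.exists(self.store_path):
            return {}
        with open(self.store_path, encoding="utf-8") as f:
            return json.load(f)

    def _save(self):
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self.listeners, f)

    def _start_listener(self, queue_path, instance_id):
        # A real service would begin receiving from the message queue here.
        self.listeners[queue_path] = instance_id

    def subscribe(self, queue_path, instance_id):
        self._start_listener(queue_path, instance_id)
        self._save()

    def unsubscribe(self, queue_path):
        self.listeners.pop(queue_path, None)
        self._save()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "subscriptions.json")
        first = PersistentListenerService(path)
        first.subscribe("orders", "wf-1")
        # Simulate a host restart by creating a fresh service over the same store.
        second = PersistentListenerService(path)
        print(second.listeners)
```

A real store would also need to stay consistent with the workflow persistence store (for example, not keeping a subscription for an instance that was never persisted), which this sketch does not attempt.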

I'll discuss the implementation in more detail in a few days when I'm ready to release the updated package.

Tomas Restrepo

Software developer located in Colombia.