I've been playing for some time now with the Dynamic Language Runtime. At first I started simply playing around with its hosting interfaces, but later I realized I really wanted to check more closely how it worked, so I started building a very simple language on top of it.

My intention isn't to build a full-fledged language out of it, but simply to use it to explore interesting aspects of the DLR. It's been a bit complex and frustrating at times because there really isn't much documentation yet [1] on how the DLR works (and how to build on it), but it's been a lot of fun, too!

Fortunately, there's plenty of source code to study when you're looking at the DLR. On the small scale, there's the DLR version of LOLCODE! as well as the ToyScript sample included with the DLR in the IronPython 2.0 alpha bits.

Much larger samples are obviously both the IronPython 2.0 and IronRuby source code, but they are complex enough that it can be difficult to understand how they are using the DLR sometimes (I'm starting to get a feel for how IronRuby does it, but that probably will only last until the next big changeset is merged into the SVN repository :-)).

Note: I don't know if the LOLCODE! sample has been updated recently, but the DLR has changed quite a bit since it was written, so it might need a few changes to get it to build.

A most interesting source of current information on the DLR can be found in Martin Maly's blog, and it is definitely recommended if you're interested in targeting the DLR. It took me a bit to finally grok what he was talking about but after that it helped quite a bit!

Note: Do take everything I mention here with a large grain of salt. After all, I'm just a hobbyist playing with the DLR, not someone implementing a real language on it.

Targeting the DLR

There are several reasons why building an interpreter/compiler on top of the DLR makes sense, and obviously the most significant one is that it takes an awful lot of work off your hands. The DLR provides you with:

  • Most of the hosting infrastructure needed to build a script interpreter. It also provides you with a common API (the hosting API) you can use to host your language in different applications.
  • An MSIL code generator: You hand it an Abstract Syntax Tree and it will take care of generating the necessary MSIL.
  • Dynamic behaviors: To a large degree, you can build a language on top of the DLR where you don't have to care about what the expressions you're manipulating evaluate to at runtime. This makes building a bunch of stuff fairly trivial for you, and actually makes it a lot easier to get up and running with something that works (for some definition of "works").
  • A bunch of other stuff

Indeed, the DLR takes care of a lot of the grunt work of building a working compiler/interpreter.

So how do you actually target the DLR?

Turns out that getting started with the DLR isn't all that hard. First you need to get your hands on a copy of the DLR, which you can do either from the IronPython 2.0 alpha bits or from the IronRuby SVN repository. Personally, I've been going with the IronRuby bits since it's more convenient.

Then you need to write the code. There are a few things you need to get this up and running:

Connect: You need to provide a way for the DLR to know about your language: How to parse it and how to generate code for it. In the current incarnation of the DLR [2] this is done by implementing a LanguageContext derived class.

You can start with a pretty simple implementation of this, one that just creates the ActionBinder (more on this in a minute) and implements the ParseSourceCode() method.

It may be that at some point in time you'll want to make your language known to the DLR so that you can use conveniences like the Script class to quickly parse/compile/evaluate scripts in your language. This can be easily done by using the RegisterLanguageContext() method of the ScriptDomainManager class.

Binding: You'll also need to implement an ActionBinder derived class and override the methods from it that are relevant to your language. As I understand it, the binder is responsible for making a bunch of language-specific decisions about how the DLR should generate code for your language, such as which type conversions are supported (and how they should be implemented). Again, you can start with a pretty minimal binder class and then grow it as needed.
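
To make the "which conversions are supported" idea a bit more concrete, here is a minimal sketch in Python. It is not the DLR's ActionBinder API: ToyBinder, allow(), can_convert(), convert() and make_minimal_binder() are all hypothetical names I made up for illustration. It only shows the general shape of a binder that starts with a small table of language-specific conversions and can grow later.

```python
from typing import Any, Callable, Dict, Tuple


class ToyBinder:
    """Hypothetical stand-in for a language binder; not the DLR ActionBinder API."""

    def __init__(self) -> None:
        # Maps (source type, target type) to the function that performs the conversion.
        self._conversions: Dict[Tuple[type, type], Callable[[Any], Any]] = {}

    def allow(self, source: type, target: type, convert: Callable[[Any], Any]) -> None:
        """Declare that the language supports converting source to target."""
        self._conversions[(source, target)] = convert

    def can_convert(self, source: type, target: type) -> bool:
        """Identity conversions are always allowed; others must be registered."""
        return source is target or (source, target) in self._conversions

    def convert(self, value: Any, target: type) -> Any:
        """Convert value to target, or fail with a message naming both types."""
        source = type(value)
        if source is target:
            return value
        converter = self._conversions.get((source, target))
        if converter is None:
            raise TypeError(f"no conversion from {source.__name__} to {target.__name__}")
        return converter(value)


def make_minimal_binder() -> ToyBinder:
    """Start small: only the conversions this toy language needs today."""
    binder = ToyBinder()
    binder.allow(int, float, float)  # widening numeric conversion
    binder.allow(int, str, str)      # numbers can be shown as text
    return binder
```

With this, make_minimal_binder().convert(3, float) goes through the registered int-to-float entry, while an unregistered pair such as str to int raises a TypeError. Adding a new conversion to the language is then a single allow() call.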

Generating Code: With this in place you can actually get to the good stuff. You're responsible for actually parsing scripts in your language. Normally, you'd write your lexer and parser using whatever compiler writing tool you like (or by hand), and then use them to build an Abstract Syntax Tree that represents constructs in your language.

This means that you can build a high-level AST that's close to the concepts and semantics your language needs, but even then it doesn't need to be very complex to start with (and in some cases, it can be fairly thin). Then, you translate it to a DLR AST, which is basically a tree of nodes in the Microsoft.Scripting.Ast namespace.

There are tons of lower-level constructs in the core DLR AST, ranging from creating type and member definitions to operators and method calls. Most of what you actually deal with here are expressions (almost everything is, naturally, an expression), which you can create using the methods of the Ast class (yes, there's both an Ast namespace and an Ast class), like Ast.Call() or Ast.Assign().

This brings me back to the ParseSourceCode() method: What you return here is, in fact, a DLR tree of what should be executed, as a node of type CodeBlock (basically, a function). As long as you build the right AST, things should mostly work.
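
The two-step idea (a language-level AST that gets translated into a lower-level tree wrapped in a code block) can be sketched roughly like this in Python. To be clear, none of this is the DLR API: the node classes, lower(), build_code_block() and run() are hypothetical stand-ins, and the tiny evaluator at the end only plays the role that the DLR's code generator plays in the real thing.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# High-level AST: close to the toy language's own concepts (hypothetical).
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Let:
    name: str
    value: "LangNode"

@dataclass
class Add:
    left: "LangNode"
    right: "LangNode"

LangNode = Union[Num, Var, Let, Add]


# Lower-level tree: generic expressions, standing in for the host's node types.
@dataclass
class Constant:
    value: Any

@dataclass
class Read:
    name: str

@dataclass
class Assign:
    name: str
    value: "Expr"

@dataclass
class Call:
    function: str
    args: List["Expr"]

@dataclass
class CodeBlock:
    name: str
    body: List["Expr"]

Expr = Union[Constant, Read, Assign, Call]


def lower(node: LangNode) -> Expr:
    """Translate one language-level node into the lower-level tree."""
    if isinstance(node, Num):
        return Constant(node.value)
    if isinstance(node, Var):
        return Read(node.name)
    if isinstance(node, Let):
        return Assign(node.name, lower(node.value))
    if isinstance(node, Add):
        # The language's "+" becomes a call to a runtime helper named "add".
        return Call("add", [lower(node.left), lower(node.right)])
    raise TypeError(f"unknown node: {node!r}")


def build_code_block(statements: List[LangNode]) -> CodeBlock:
    """What a parse step would hand back: a single block holding the lowered tree."""
    return CodeBlock("<toplevel>", [lower(s) for s in statements])


BUILTINS = {"add": lambda a, b: a + b}


def evaluate(expr: Expr, env: Dict[str, Any]) -> Any:
    """Toy evaluator for the lowered tree (a stand-in for real code generation)."""
    if isinstance(expr, Constant):
        return expr.value
    if isinstance(expr, Read):
        return env[expr.name]
    if isinstance(expr, Assign):
        env[expr.name] = evaluate(expr.value, env)
        return env[expr.name]
    if isinstance(expr, Call):
        return BUILTINS[expr.function](*[evaluate(a, env) for a in expr.args])
    raise TypeError(f"unknown expression: {expr!r}")


def run(block: CodeBlock) -> Any:
    """Evaluate each expression in order and return the last result."""
    env: Dict[str, Any] = {}
    result = None
    for expr in block.body:
        result = evaluate(expr, env)
    return result
```

For example, run(build_code_block([Let("x", Num(2)), Add(Var("x"), Num(40))])) assigns 2 to x and then evaluates the call to "add", returning 42. The point of the split is that the language-level nodes can stay small and close to the syntax, while everything after lower() only needs to understand the generic nodes.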

How I'm using the DLR

I started writing my simple language using the Garden Points Parser Generator (GPPG) as it's fairly simple to use and my syntax is also pretty simple. I haven't had much trouble with this yet (took me a bit to remember the lex/yacc-like syntax from my college days), but then again, my needs are simple.

Right now, I've used it to explore very specific parts of the DLR, mostly code blocks and method invocation [3], so naturally the functionality currently implemented is very limited:

  • I can define both named and unnamed functions, and both global and local functions are supported.
  • It can create instances of arbitrary .NET classes (haven't bothered to really support value types yet).
  • It can invoke the defined functions (duh!) as well as instance methods/properties in .NET objects.
  • Closures are mostly supported (they are fairly trivial to support with the DLR)

Surprisingly enough, I haven't bothered to implement any built-in functions or operators, so there's not really much you can do with variables yet, but that's high on my TODO list.

Some things I've learned about the DLR

Working with the DLR has been pretty fun, but one thing I've realized is that the current implementation doesn't really provide very good tooling for language developers.

One of the tools the DLR gives you to diagnose issues is the ability to dump ASTs or Rules, and this is very helpful (though it occasionally breaks down if you generate incorrect code). Sometimes, however, it isn't enough to diagnose what's going on when the DLR is doing what you don't expect.
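
For a rough idea of what such a dump looks like, here is a small Python sketch that prints any tree of dataclass nodes as indented text. It has nothing to do with how the DLR implements its own dump; dump() and the two sample node classes (Constant and Call) are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List


@dataclass
class Constant:
    value: Any

@dataclass
class Call:
    function: str
    args: List[Any]


def dump(node: Any, indent: int = 0) -> List[str]:
    """Return one text line per node or field, indented by nesting depth."""
    pad = "  " * indent
    if is_dataclass(node) and not isinstance(node, type):
        lines = [f"{pad}{type(node).__name__}"]
        for f in fields(node):
            value = getattr(node, f.name)
            if is_dataclass(value) or isinstance(value, list):
                # Nested nodes go on their own lines, one level deeper.
                lines.append(f"{pad}  {f.name}:")
                lines.extend(dump(value, indent + 2))
            else:
                lines.append(f"{pad}  {f.name} = {value!r}")
        return lines
    if isinstance(node, list):
        lines = []
        for item in node:
            lines.extend(dump(item, indent))
        return lines
    return [f"{pad}{node!r}"]
```

Calling print("\n".join(dump(Call("add", [Constant(1), Constant(2)])))) prints the Call node first, followed by its function field and then each argument nested under args.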

In particular, substantial pieces of the code that actually gets executed are generated from AST fragments that are not created directly by your ParseSourceCode() method, and it can be pretty hard to tell whether that generated code actually makes sense or not.

Yes, the DLR has a bunch of internal asserts on debug builds, but I've found most of them useless from the outside. Mostly, they tell you that something in your AST isn't valid, but rarely "why", unless you're very familiar with the internals of the DLR itself.

Another issue I've run into is that the DLR makes a bunch of assumptions about what you give it, and despite the fact that most of the DLR has extensive sanity checks throughout the code, sometimes some of these assumptions are not validated, leaving you to deal with IndexOutOfBounds or NullReferenceExceptions from deep inside the DLR.

I've been able to work around these things, sometimes with the help of the DLR samples, and sometimes with the help of the occasional comment in the DLR source, but finding those isn't always easy. I do expect most of these issues will get sorted out with time and documentation; meanwhile, expect to spend some time on them.

I'll cover a few specific things I've run into in a follow-up post.

[1] Completely understandable at this early stage :-)
[2] Current as in "what's in the IronRuby SVN repository today".
[3] I've actually rewritten method invocation 3 times at least, exploring some of the options the DLR gives you for that.


Tomas Restrepo

Software developer located in Colombia. Sr. PFE at Microsoft.