Monday, November 24, 2008

Dynamic Compilation and Loading of Modules in Haskell

The Haskell system GHC has libraries that provide the ability to compile Haskell code and dynamically load it into a running Haskell program. A library that provides this functionality is hs-plugins. Unfortunately hs-plugins doesn't work with the latest GHC release, 6.10.1.

There is an API to the internals of GHC that allows implementing runtime compilation and loading. The API is documented, but there aren't many usage examples that work with the latest GHC release. I stumbled across a blog post describing an older version of the API. The post no longer exists, but I managed to retrieve it from archive.org, and it was a lot of help since the APIs are similar.

As a means to learn how to use the API I decided on a very simple use case of compiling and loading a very simple Haskell module containing a single function 'print' that outputs a string. The module looks like this:
module Test (Test.print) where

print :: String -> IO ()
print x =
  putStrLn x
To compile and load this code dynamically we need a compilation target. This is created using the function guessTarget. It's passed a string that refers to the name of a module, or the name of a source file. By using a source file we can tell the Haskell system to compile and load it. The phase argument I set to Nothing as I have no idea what that is.

Once we have the target it needs to be added to the current compilation session. This is done via the addTarget function. When all the targets are added, a call to load will do the equivalent of a ghc --make to build the module if needed. So all the relevant target code looks like:
do
  target <- guessTarget "Test.hs" Nothing
  addTarget target
  load LoadAllTargets
Notice this is run within the 'do' syntax. This is because all these compiler functions must be called within a special monad, called GhcMonad. The way this works is described later, but for now just imagine that the GhcMonad state is passed implicitly to each of the GHC functions we call.

'load' returns a success flag that can be pattern matched on to determine whether the build succeeded or failed. If it succeeded, we can start using the new module. To be able to access exported functions from the module we need to get a reference to the module using findModule and use setContext on it. 'setContext' takes two lists of modules. All modules in the first list will have their entire top-level scope available. All modules in the second list will have only their exports available. In this example we want to access the 'print' function of 'Test', which is exported.

The function compileExpr is used to compile a string containing a Haskell expression. The result of this is an HValue which we can do things with. The expression that is compiled can access any of the bindings available in the context we set up via 'setContext'. So passing "Test.print" to 'compileExpr' will return the function 'print' from the 'Test' module that we dynamically built and loaded. Code similar to this:
do
  r <- load LoadAllTargets
  case r of
    Failed -> error "Compilation Failed"
    Succeeded -> do
      m <- findModule (mkModuleName "Test") Nothing
      setContext [] [m]
      value <- compileExpr "Test.print"
Unfortunately, an HValue is of no use to us as it stands: to call the function we need a value of the actual function type. Haskell is strongly typed, so we have to 'cheat' by telling Haskell 'yes, I know you think this is an HValue, but I can assure you, it's really a function'. This is done with Haskell's super dangerous 'unsafeCoerce', from the Unsafe.Coerce module. It allows us to coerce from one type to another and is, as the name implies, completely unsafe. If the value you are coercing isn't really of the type you are coercing to, then you'll very likely crash your program and/or corrupt data. It's the reinterpret_cast<> of the Haskell world. But for handling dynamic loading of code it seems to be exactly what we want. I'm not sure of any other way to convert the HValue type.
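For comparison, the standard Data.Dynamic module offers a checked conversion, where a wrong guess about the type gives 'Nothing' instead of a crash. The sketch below does not use the GHC API at all: 'loaded' is a hypothetical stand-in for a value obtained at runtime, and 'asPrinter' and 'asCounter' are names made up for illustration. The GHC API does export a 'dynCompileExpr' function that returns a Dynamic rather than an HValue, but I have not tried it with this example, so treat that as a pointer rather than a recipe.

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- Hypothetical stand-in for a value produced at runtime. The type
-- annotation is recorded inside the Dynamic when it is created.
loaded :: Dynamic
loaded = toDyn (putStrLn :: String -> IO ())

-- Checked conversion: Just the function if the recorded type matches,
-- Nothing otherwise.
asPrinter :: Dynamic -> Maybe (String -> IO ())
asPrinter = fromDynamic

-- Asking for the wrong type is detected rather than crashing.
asCounter :: Dynamic -> Maybe (Int -> Int)
asCounter = fromDynamic

main :: IO ()
main = do
  case asPrinter loaded of
    Just f  -> f "Hello"
    Nothing -> putStrLn "not a String -> IO ()"
  case asCounter loaded of
    Just _  -> putStrLn "unexpectedly an Int -> Int"
    Nothing -> putStrLn "type mismatch detected"
```

The trade-off is that the value has to be wrapped with 'toDyn' on the producing side, which 'unsafeCoerce' does not require.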

The type of 'print' in the 'Test' module is 'String -> IO ()'. 'unsafeCoerce' can be called to convert the 'HValue' to this:
value <- compileExpr "Test.print"
do let value' = (unsafeCoerce value) :: String -> IO ()
   return value'
The receiver of the result, value', can now call the function as it knows its type. The complete code for this example is:
import GHC
import GHC.Paths
import DynFlags
import Unsafe.Coerce

main :: IO ()
main =
  defaultErrorHandler defaultDynFlags $ do
    func <- runGhc (Just libdir) $ do
      -- Initialise the session with the default flags
      dflags <- getSessionDynFlags
      setSessionDynFlags dflags
      -- Register Test.hs as a target and build it, like ghc --make
      target <- guessTarget "Test.hs" Nothing
      addTarget target
      r <- load LoadAllTargets
      case r of
        Failed -> error "Compilation failed"
        Succeeded -> do
          -- Bring the exports of Test into scope and look up the function
          m <- findModule (mkModuleName "Test") Nothing
          setContext [] [m]
          value <- compileExpr ("Test.print")
          do let value' = (unsafeCoerce value) :: String -> IO ()
             return value'
    func "Hello"
    return ()
Build using:
$ ghc -package ghc --make Api.hs
[1 of 1] Compiling Main ( Api.hs, Api.o )
Linking Api ...
$ ./Api
Hello
$
You can test that it is dynamically compiling and loading by modifying 'Test.hs' so the output is different. Change the 'putStrLn' to putStrLn $ x ++ " World!" and rerun 'Api' without recompiling it:
$ ./Api
Hello World!
$
I mentioned previously the implicit 'GhcMonad' that is passed to each GHC API function. This is provided by the 'runGhc' function. Its first argument is the directory where GHC's library files reside. In this example I use the GHC.Paths module (from the ghc-paths package, available on Hackage), which provides this directory as 'libdir'. The second argument is the action to perform in the GhcMonad, built using the 'do' syntax. Haskell's 'do' syntax is syntactic sugar that does the magic of threading the GhcMonad state, holding the targets, modules, contexts, etc., through the various function calls. Find a favourite monad tutorial/explanation if you want to know more.
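To make the 'threading' idea a bit more concrete, here is a toy model of a monad that carries session state behind the scenes. This is a sketch only and is not GHC's definition: 'Toy', 'addTargetToy', 'getTargetsToy' and 'runToy' are all made-up names, and the 'session' is just a mutable list of file names.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef)

-- A toy session monad: every action receives a reference to the
-- session state (here just a list of target names).
newtype Toy a = Toy { unToy :: IORef [String] -> IO a }

instance Functor Toy where
  fmap f (Toy g) = Toy (fmap f . g)

instance Applicative Toy where
  pure x = Toy (\_ -> pure x)
  Toy f <*> Toy g = Toy (\r -> f r <*> g r)

instance Monad Toy where
  -- The bind operator hands the same reference to both actions, so
  -- code written with 'do' never mentions the session explicitly.
  Toy g >>= k = Toy (\r -> g r >>= \a -> unToy (k a) r)

-- Append a target name to the session.
addTargetToy :: String -> Toy ()
addTargetToy t = Toy (\r -> modifyIORef r (++ [t]))

-- Read back the targets added so far.
getTargetsToy :: Toy [String]
getTargetsToy = Toy readIORef

-- Create a fresh, empty session and run the action against it.
runToy :: Toy a -> IO a
runToy (Toy g) = newIORef [] >>= g

main :: IO ()
main = do
  ts <- runToy $ do
    addTargetToy "Test.hs"
    addTargetToy "Other.hs"
    getTargetsToy
  print ts
```

Here 'runToy' plays the role that 'runGhc' plays in the real program: it creates the state, and the 'do' block passes it along to each call.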

'defaultErrorHandler' sets up the environment to handle compilation errors in a standard way. The 'DynFlags' related calls get and set standard flags that can be used to configure the compilation. You can set the equivalent of GHC arguments to control extra features that GHC can use, or use the interpreter (as used by GHCi) rather than the compiler.
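As a rough illustration of the interpreter option, the two 'DynFlags' lines in the program could be replaced with a call to something like the helper below. This is a sketch under assumptions: 'useInterpreter' is a made-up name, and the 'hscTarget' and 'ghcLink' fields with their 'HscInterpreted' and 'LinkInMemory' values are the names I believe the 6.10-era DynFlags uses. They may well differ in other GHC releases, so check the documentation for your version.

```haskell
import GHC
import DynFlags

-- Assumption: field and constructor names as in the GHC 6.10-era API.
-- Ask for bytecode interpreted in memory (as GHCi does) instead of
-- compiling the target to an object file.
useInterpreter :: Ghc ()
useInterpreter = do
  dflags <- getSessionDynFlags
  _ <- setSessionDynFlags dflags { hscTarget = HscInterpreted
                                 , ghcLink   = LinkInMemory }
  return ()
```

It would be called inside the 'runGhc' block before the target is added.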

Hopefully that helps explain some of what is going on and how to use the current GHC API. I'd appreciate comments on better ways of doing things, and other examples of the API.

1 Comments:

Blogger Chris Double said...

There's a good description of the GHC API here too:

http://sneezy.cs.nott.ac.uk/fplunch/weblog/?p=168

The referenced slides in that post are annotated and easy to follow.

11:54 AM  
