Friday 6 March 2009

Dynamic Languages: Python

Over the past few years I've thought a lot about how S compares with other programming languages. As discussed in a previous post, S is head and shoulders above general purpose programming languages in terms of the built-in statistical and graphics routines.

Having said that, S is a bit threadbare in terms of general-purpose infrastructure routines. I've spent entirely too much of my life writing S wrappers of Java routines to do things like:

  • Zip and unzip files

  • Create and parse XML

  • HTTP client operations


Various packages to do these sorts of things are around, but it's fair to say that the implementations are typically not as full-featured as one would have in Java, C#, or Python. Also, some poor soul probably had to do the work of writing C code to interface R to a C library with those capabilities.

In a recent product development cycle we wanted to have a modern Windows GUI written in C# talking to a server written in Java that called a graphics system written in S. Oh, and the graphics device used C to create a graph file as well as Java to generate things like PDF from that file. It sounds a little crazy, but was driven by the desire to have a Windows rich client with a cross-platform server plus extensibility in S.

This meant jumping back and forth between multiple IDEs and some tricky debugging. It also meant essentially implementing some things twice: once in Java and once in C#. It left me longing for a system that could meet the following requirements without so many languages:
  • Cross-platform in terms of both operating systems and virtual machines. Runs on Windows, UNIX, Linux, perhaps Mac. Runs in the .NET CLR and the Java VM.

  • Has a rich standard library for string manipulation, network operations, etc. That is, it can do the bread-and-butter stuff needed by programs.

  • Uses an interpreted, loosely typed language that provides the same productivity benefits as S. Approachable and quick to learn for S programmers so that perhaps they could use it rather than S for extensibility.

Basically I wanted to have a single language that I could use to do .NET programming for the client, Java programming for the server, and ideally the type of programming one typically does in S.

The language that rose to the top was Python. The standard Python implementation is CPython, which has been ported to oodles of platforms. There's also IronPython, which runs on the .NET CLR, and Jython, which runs on the Java VM. With Mono and the Dynamic Language Runtime (DLR) it even runs on Android (Google's OS for cell phones).
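To make the "bread-and-butter" point concrete, here is a minimal sketch of two of the chores listed earlier (zipping files and creating/parsing XML) using only modules that ship with CPython. The function names and the little "report" XML layout are made up for illustration; they aren't from any real product. HTTP client work is covered by the standard library too (urllib), but I've left it out so the sketch runs without a network connection.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_report_xml(title, values):
    """Create a small XML document (hypothetical layout) as a string."""
    root = ET.Element("report", attrib={"title": title})
    for v in values:
        item = ET.SubElement(root, "value")
        item.text = str(v)
    return ET.tostring(root, encoding="unicode")


def parse_report_xml(xml_text):
    """Parse the XML produced above back into a title and a list of floats."""
    root = ET.fromstring(xml_text)
    return root.get("title"), [float(e.text) for e in root.findall("value")]


def zip_text(name, text):
    """Zip a single text entry in memory and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def unzip_text(data, name):
    """Read one text entry back out of in-memory archive bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    xml_text = build_report_xml("demo", [1.5, 2, 3])
    archive = zip_text("report.xml", xml_text)
    print(parse_report_xml(unzip_text(archive, "report.xml")))
```

No wrappers, no C glue: the whole round trip is a few dozen lines against the standard library.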

I looked at bunches of other dynamic languages including Ruby, F#, Groovy, etc. None really had the right "feel" in terms of being something approachable to S programmers. Python has a sensible syntax that's easy to learn and follow. The most common complaint against it is the use of indentation rather than braces to delimit blocks. The other, of course, is that an interpreted language typically isn't as quick as well-written C code.

The main limitation of IronPython and Jython is that while they can run pure Python modules, they can't handle modules that rely on C extensions, with the most requested one being numpy. The guys at Resolver Systems are working to remedy this for IronPython via the IronClad project.

The big downside of Python by itself for S users is that it doesn't have the rich statistics and graphics available in S. So the thought of using Python alone rather than S is a non-starter. But I do think there's potential for using Python as a primary general-purpose language together with R as a statistics engine.
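As a rough sketch of what "Python as the general-purpose language, R as the statistics engine" could look like at its simplest, the snippet below shells out to R's Rscript command-line front end and captures whatever R prints. This is only one possible arrangement and not something I've built into a product. The helper names are hypothetical, and it assumes Rscript is installed and on the PATH.

```python
import shutil
import subprocess


def build_rscript_command(expression, executable="Rscript"):
    """Build the argument list for evaluating one R expression via Rscript -e."""
    return [executable, "-e", expression]


def run_r(expression, executable="Rscript"):
    """Evaluate an R expression in a separate R process and return its stdout.

    Assumes the Rscript executable is installed; raises RuntimeError if not.
    """
    if shutil.which(executable) is None:
        raise RuntimeError("%s was not found on the PATH" % executable)
    result = subprocess.run(
        build_rscript_command(expression, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if R exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Python handles the plumbing; R does the statistics.
    print(run_r("cat(mean(c(1, 2, 3)))"))
```

Starting a new R process per call is slow and only passes text back and forth, so a real system would want a tighter bridge, but it shows the division of labor.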

Not that I expect I'll be doing Python rather than C# and Java anytime soon, but perhaps as the DLR matures it'll become a viable option.
