Sunday 10 January 2010

In (Re-)Development: SPIFF

Shipping is a Feature. A feature that, to be fair, most (all) of my personal software projects have dropped from scope at some point along the way.

One such project was SPIFF. Originally the less pronounceable "SPADF" (Simple Parser for Arbitrary Data Formats), and developed about 4 years ago, its aim was to provide a simple way to get data out of those nasty binary file formats that employ obscure things like, y'know, bytes. I mean, why store your integer in 4 bytes when you could wrap it in an XML file?
<?xml version="1.0" encoding="utf-8"?>
<root>
<data type="integer">1</data>
</root>

At the time, I was trying to parse ID3 tags out of MP3 files and compare them to data in an ITunesDB file, both of which employ such ruthless, egregious efficiency. In trying to implement parsers for both these formats, the alarm bells told me that there had to be an easier way. What I wanted to do was define the data format and its rules in a simple way, and then have some standard bit of code do the hard work. At that time, my desire was to have something like a SAX parser, to which I could listen for events and do work appropriately when the data I needed popped out.

At the heart of it was a JavaCC generated parser, which would read a file that defined a data format:
int headerSize
short version
byte flags
int stringLength
string(stringLength) description

and spit out a sequence of Instruction objects that knew how to parse each of these things, and fire events as necessary. Of course, file formats aren't that simple. Even in the basic example above, there's a need to evaluate one element based on the value of another - the description field has a length defined in the int that comes before it. Likewise, in any non-trivial file format, there's a need for conditionals, loops and jumps, almost always based on a value from elsewhere in the file. SPIFF lets users define these using a dot in front of keywords:
.if(version==1.0) {
    byte flags
} .else {
    short biggerFlags
}
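
To make the idea of "instructions that parse and fire events" a little more concrete, here is a minimal Java sketch. It is not SPIFF's actual code: every name in it (InstructionSketch, Listener, Instruction, ifElse and so on) is hypothetical, the format is hard-coded rather than produced by a JavaCC parser, and it assumes big-endian integers and ASCII strings. It combines the two snippets above, so the flags field depends on the version and the description length comes from stringLength.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical names throughout; this is an illustration, not SPIFF's real API.
class InstructionSketch {

    /** SAX-style callback: invoked once for every value read from the data. */
    interface Listener {
        void onValue(String name, Object value);
    }

    /** One step of a format: reads from the buffer, records the value, fires an event. */
    interface Instruction {
        void execute(ByteBuffer in, Map<String, Object> seen, Listener out);
    }

    static void emit(String name, Object value, Map<String, Object> seen, Listener out) {
        seen.put(name, value);    // remember it so later instructions can refer to it
        out.onValue(name, value); // tell the listener
    }

    static Instruction intField(String name) {
        return (in, seen, out) -> emit(name, in.getInt(), seen, out);
    }

    static Instruction shortField(String name) {
        return (in, seen, out) -> emit(name, in.getShort(), seen, out);
    }

    static Instruction byteField(String name) {
        return (in, seen, out) -> emit(name, in.get(), seen, out);
    }

    /** A string whose length is the value of a previously parsed field. */
    static Instruction stringField(String name, String lengthField) {
        return (in, seen, out) -> {
            byte[] raw = new byte[((Number) seen.get(lengthField)).intValue()];
            in.get(raw);
            emit(name, new String(raw, StandardCharsets.US_ASCII), seen, out);
        };
    }

    /** Chooses between two instructions based on the values parsed so far. */
    static Instruction ifElse(Predicate<Map<String, Object>> test,
                              Instruction then, Instruction otherwise) {
        return (in, seen, out) -> (test.test(seen) ? then : otherwise).execute(in, seen, out);
    }

    /** The example format from the text, written out by hand. */
    static List<Instruction> exampleFormat() {
        return Arrays.asList(
            intField("headerSize"),
            shortField("version"),
            ifElse(seen -> ((Number) seen.get("version")).intValue() == 1,
                   byteField("flags"),
                   shortField("biggerFlags")),
            intField("stringLength"),
            stringField("description", "stringLength"));
    }

    static void run(List<Instruction> format, ByteBuffer in, Listener out) {
        Map<String, Object> seen = new HashMap<>();
        for (Instruction step : format) {
            step.execute(in, seen, out);
        }
    }

    /** Sample bytes: version 1, flags 0, description "SPIFF" (big-endian by default). */
    static ByteBuffer sample() {
        byte[] text = "SPIFF".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(11 + text.length);
        buf.putInt(11 + text.length).putShort((short) 1).put((byte) 0)
           .putInt(text.length).put(text);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        run(exampleFormat(), sample(),
            (name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one line per field, ending with "description = SPIFF". The map of previously seen values is what lets one field's value drive how a later field is read.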

JEL is employed under the covers to evaluate expressions. Any value that has been defined previously in the file can be used in an expression, and an ampersand in front of a name refers to the position of that value in the file. For instance, in a 24-bit bitmap, each "row" of pixels (each pixel being 3 bytes) is padded to a multiple of 4 bytes. So it's necessary to do things like:
.repeat(pixelHeight) {
    .mark(startOfRow)
    .repeat(pixelWidth) {
        ubyte rgbBlue
        ubyte rgbGreen
        ubyte rgbRed
    }
    .skip (&rgbRed - &startOfRow - 1) % 4
}
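
For reference, the padding rule itself can be worked out from the pixel width alone. The Java sketch below uses the usual rule for 24-bit bitmaps (rows rounded up to a multiple of 4 bytes); the class and method names are hypothetical, and it computes the padding from the width rather than from file positions, so it is not a translation of how SPIFF evaluates the .skip expression above.

```java
// Hypothetical helper; the names are not part of SPIFF.
class RowPaddingSketch {

    static final int BYTES_PER_PIXEL = 3; // blue, green, red

    /** Bytes to skip after a row so that the next row starts on a 4-byte boundary. */
    static int paddingFor(int pixelWidth) {
        int rowBytes = pixelWidth * BYTES_PER_PIXEL;
        return (4 - rowBytes % 4) % 4;
    }

    /** Total stored size of one row, padding included. */
    static int strideFor(int pixelWidth) {
        return pixelWidth * BYTES_PER_PIXEL + paddingFor(pixelWidth);
    }

    public static void main(String[] args) {
        for (int width = 1; width <= 4; width++) {
            System.out.println(width + " px: " + paddingFor(width)
                    + " padding bytes, " + strideFor(width) + " bytes per row");
        }
    }
}
```

A row of 2 pixels, for example, holds 6 bytes of colour data and needs 2 bytes of padding to reach 8, while a row of 4 pixels is already 12 bytes and needs none.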

You could also insert arbitrary groupings into the format, which would fire an event in the parser that you could use to change state, pretty much like nested elements in an XML file.

One feature that didn't make it into SPIFF was that old "shipping". The code sat in my svn repo doing precisely nothing (I never did get back to that project with the ITunesDB either). Spurred on by the realisation that I never actually finish anything, I recently picked up SPIFF again, with the aim of turning it into something more like JAXB, where the parser can populate (or even generate) an annotated object graph.

Hence, consider this the first in a series of posts covering aspects of the ongoing development of SPIFF, and my attempt to at least get it in the public domain.