Friday 12 November 2010

Please release me

Ship it, ship it, ship it! Spiff v0.1.0 (yes, I'm being cautious) is out now. You can grab the jar from https://github.com/revbingo/SPIFF/downloads.

Tuesday 21 September 2010

Spiff: The competition

In the intervening years between the first inception of Spiff (yes, I'm going camel case, upper case everywhere just won't do) and its subsequent revival (and re-revival), a couple of competitors have appeared in the same space.

Closest in spirit to Spiff is Preon. It even states its aim "to be to binary encoded data what Hibernate is to relational databases, and JAXB to XML", which pretty much sums up what I want with Spiff. Where Preon differs is in its extensive use of annotations to do what Spiff does in its format definition file (which Spiff calls an .adf file - Arbitrary Data Format). Preon will examine your classes and derive the data format from the order and types of annotated fields in the class. It also uses annotations to derive looping and conditional logic.

I'll admit that I've not used Preon - partly through fear of polluting my ideas about what Spiff could and should do, and partly in case I decided it was better than Spiff and just decided to call the whole thing off. So any discussion of its merits and drawbacks is truly superficial. My general impression is that its reliance on annotations is a little hairy - when you're expressing logic in annotations, things look a little awkward. Spiff trades off compactness (having everything described in the code) for readability and also portability. The event dispatching and class binding mechanism means that one .adf file can be used to populate classes of any shape without needing to respecify the file format. This also highlights the fact that it looks like Preon expects the classes to fully describe the file format, which is rarely what you want in the code. One of the use cases that led to me starting to write Spiff was wanting to get little pieces of the data without having to worry about the rest of the file format. And thus were .jump and .skip begat.

On the other side, the Google lads are also in the frame with protobuf. Protobuf is interesting in that it uses something analogous to the .adf file to describe the format. Where it differs is that it will generate classes for you to serialize and deserialize the format. That's something that Spiff might be capable of one day, but I like the idea of being able to write arbitrary POJOs and map the data onto them, rather than having objects in my code whose sole purpose is as marshallers. Also, it's largely oriented towards message-passing, that is, describing a message that will be passed between two systems, such as in RPC, where the user is in control of both ends of the transaction. To that end, the .proto definitions are reliant on using the underlying protobuf grammar for the message, for instance to recognise repeated blocks of data, and don't have some of the flexibility to express more complicated relationships between parts of the file.

I see a couple of strong points in Spiff from this. Portability of .adf files is possibly the biggest. Once someone has defined an .adf for, say, a .bmp file, or an ItunesDB file, anyone else can take that and use it to bind all or part of that data to their own classes. The other is flexibility, in hopefully being able to express all the things that can make binary file formats tricky to work with. I guess first step is to have a working product...

Spiff!

Ok, I'm back on Spiff. I mean it this time. Repeat after me - "I will ship software, I will ship software, I will ship software".

Coming back to code after a little time is an interesting experience. Almost every time you can guarantee a few nuggets of insight that hadn't occurred previously.

Today's lesson: if it's difficult to write a unit test for something (the usual suspects being FileNotFoundExceptions and if (x == null) conditions), it's probably not worth having in the code.

I'm not normally one for striving for 100% code coverage with tests. Like all things that fall under the agile/XP umbrella, if you're doing it by the book, you're doing it wrong. There isn't a book that tells you how you should be working on your projects.

However, in this instance, I thought it would be an interesting exercise to try and get up to 100%. By getting OCD on the unit tests, I found at least two conditions in my code that couldn't actually occur:
  • a null check on an object right after its constructor was called, and
  • an exception thrown from code where I was using dynamic assignment where static assignment was sufficient.
In the latter case, I had
try {
    lib = Class.forName("java.lang.Math");
} catch (ClassNotFoundException e) {
    // how do I get here?
}
Can you write a test that exercises the catch block? Nope. This was a remnant of old code that hadn't been cleaned up. What I should have been doing was
lib = java.lang.Math.class;
which doesn't throw any exception.
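
To make the contrast concrete, here is a minimal sketch (ClassLiteralDemo and its method names are made up for illustration). The string-based lookup forces a catch block that no test can reach, while the class literal is checked by the compiler and leaves nothing to catch:

```java
// Hypothetical demo class; the names are for illustration only.
class ClassLiteralDemo {

    // Dynamic lookup: the class name is just a string, so the compiler insists
    // on handling ClassNotFoundException even though java.lang.Math always exists.
    static Class<?> dynamicLookup() {
        try {
            return Class.forName("java.lang.Math");
        } catch (ClassNotFoundException e) {
            // No unit test can reach this branch on a working JVM.
            throw new IllegalStateException(e);
        }
    }

    // Static lookup: a class literal is resolved by the compiler, so there is
    // no checked exception and no untestable branch.
    static Class<?> staticLookup() {
        return java.lang.Math.class;
    }

    public static void main(String[] args) {
        // Both routes yield the same Class object.
        System.out.println(dynamicLookup() == staticLookup());
    }
}
```

Both methods return the same Class object, so swapping one for the other changes nothing except the amount of code left to cover.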

It's good to remember that adding test cases is not the only way to get closer to 100% code coverage - deleting code does just as well.

How final is final?

One interesting tidbit from Spiff development. How final is
final int x = 1
? Answer: Not very, if you're using reflection. Field.setAccessible(true) will soon get you round any awkward encapsulation issues.

So, newly armed with that knowledge, what's printed out here?
import java.lang.reflect.Field;

public class HowFinal {
    private final int x = 1;

    public static void main(String[] args) throws Exception {
        HowFinal howFinal = new HowFinal();
        // Look up the private field and switch off access checks
        Field f = howFinal.getClass().getDeclaredField("x");
        f.setAccessible(true);
        f.set(howFinal, 2);
        System.out.println(howFinal.getX());
        System.out.println(f.get(howFinal));
    }

    public int getX() {
        return x;
    }
}

The answer, unexpectedly, is
1
2

Er, so x was final after all? Sort of. Because x is a final primitive initialised with a constant expression, the compiler inlines it at compile time, so as far as the runtime JVM is concerned, getX() contains the code return 1;. Querying the field via reflection shows its true value of 2.
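
One way to see that the inlining is the culprit is to compare it with a final field whose value is not a compile-time constant. The sketch below is an illustration only (FinalInliningDemo and its members are invented names): one field is initialised with a literal, the other is assigned in the constructor, and both are then overwritten reflectively.

```java
import java.lang.reflect.Field;

// Hypothetical demo class contrasting a compile-time constant with a final
// field whose value is only known at run time.
class FinalInliningDemo {
    private final int constant = 1; // constant expression: javac inlines reads of it
    private final int computed;     // set in the constructor: reads go to the field

    FinalInliningDemo(int value) {
        this.computed = value;
    }

    int getConstant() { return constant; }

    int getComputed() { return computed; }

    // Overwrites a final int instance field via reflection.
    static void overwrite(Object target, String name, int value) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            f.setInt(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FinalInliningDemo demo = new FinalInliningDemo(1);
        overwrite(demo, "constant", 2);
        overwrite(demo, "computed", 2);
        System.out.println(demo.getConstant()); // getter compiled as "return 1"
        System.out.println(demo.getComputed()); // getter reads the field itself
    }
}
```

On a typical HotSpot JVM this should print 1 and then 2, since only the first getter had its value baked in by the compiler. Newer JDKs are moving to restrict reflective writes to final fields, so treat this as a sketch of the behaviour rather than a guarantee.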

Is there any question whose answer doesn't start with "it depends"?

Spiff on Github

Spiff is now available to download/fork/whatever at GitHub. The GitHub site also has a wiki with some instructions for getting started.

Current state is pre-pre-pre-pre-alpha. That is, it doesn't really work, I'm rebuilding some of the core parts, but it's there for anyone who wants to snoop. Ship early, ship often!

Sunday 10 January 2010

In (Re-)Development: SPIFF

Shipping is a Feature. A feature that, to be fair, most (all) of my personal software projects have dropped from scope at some point along the way.

One such project was SPIFF. Originally the less pronounceable "SPADF" (Simple Parser for Arbitrary Data Formats), and developed about 4 years ago, the aim was to have a simple way to get data out of those nasty binary file formats that employ obscure things like, y'know, bytes. I mean, why store your integer in 4 bytes when you could wrap it in an XML file?
<?xml version="1.0" encoding="utf-8"?>
<root>
<data type="integer">1</data>
</root>

At the time, I was trying to parse out ID3 tags from MP3 files, and compare them to data in an ITunesDB file, both of which employ such ruthless, egregious efficiency. In trying to implement parsers for both these formats, the alarm bells told me that there had to be an easier way. What I wanted to do was define the data format and its rules in a simple way, and then have some standard bit of code do the hard work. At that time, my desire was to have something like a SAX parser, to which I could listen for events and do work appropriately when the data I needed popped out.

At the heart of it was a JavaCC generated parser, which would read a file that defined a data format:
int headerSize
short version
byte flags
int stringLength
string(stringLength) description

and spit out a sequence of Instruction objects that knew how to parse each of these things, and fire events as necessary. Of course, file formats aren't that simple. Even in the basic example above, there's a need to evaluate one element based on the value of another - the description field has a length defined in the int that comes before it. Likewise, in any non-trivial file format, there's a need for conditionals, loops and jumps, almost always based on a value from elsewhere in the file. SPIFF lets users define these using a dot in front of keywords:
.if(version==1.0) {
    byte flags
} .else {
    short biggerFlags
}
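
For comparison, this is roughly what reading the five-field layout above (headerSize through description) looks like when written by hand. It is a sketch under assumptions, not SPIFF code: HeaderReaderDemo is an invented name, integers are assumed big-endian (DataInputStream's convention) and the description is assumed to be ASCII. The point is the dependency the format definition expresses in one line: stringLength has to be read before the description can be.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written reader. Big-endian integers and an ASCII
// description are assumptions made for this sketch.
class HeaderReaderDemo {
    final int headerSize;
    final short version;
    final byte flags;
    final String description;

    HeaderReaderDemo(int headerSize, short version, byte flags, String description) {
        this.headerSize = headerSize;
        this.version = version;
        this.flags = flags;
        this.description = description;
    }

    // Reads the fields in declaration order; the description's length comes
    // from the int read immediately before it.
    static HeaderReaderDemo read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int headerSize = in.readInt();
            short version = in.readShort();
            byte flags = in.readByte();
            int stringLength = in.readInt();
            byte[] raw = new byte[stringLength];
            in.readFully(raw);
            return new HeaderReaderDemo(headerSize, version, flags,
                    new String(raw, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small sample buffer in the same layout (values are arbitrary).
    static byte[] sample() {
        byte[] text = "hello".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(11 + text.length); // headerSize: 4 + 2 + 1 + 4 bytes plus the text
            out.writeShort(1);              // version
            out.writeByte(0);               // flags
            out.writeInt(text.length);      // stringLength
            out.write(text);                // description
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        HeaderReaderDemo header = read(sample());
        System.out.println(header.version + " " + header.description);
    }
}
```

Every new format means another class like this one, which is exactly the repetition a declarative format file is meant to remove.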

Jel is employed under the covers to deal with evaluation of expressions. Any values that have been defined previously in the file can be used in an expression, and the ampersand can be used to reference the position of that value in the file. For instance, in a 24-bit bitmap, each "row" of pixels (each pixel defined by 3 bytes) is padded to a multiple of 4 bytes. So it's necessary to do things like:
.repeat(pixelHeight) {
    .mark(startOfRow)
    .repeat(pixelWidth) {
        ubyte rgbBlue
        ubyte rgbGreen
        ubyte rgbRed
    }
    .skip (&rgbRed - &startOfRow - 1) % 4
}
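
The padding rule itself is easy to check with a little arithmetic. The sketch below (BmpRowPaddingDemo is an invented name) works the padding out directly from the pixel width using the standard 4-byte row alignment; it is not a transcription of the .skip expression above.

```java
// Row padding for an uncompressed 24-bit bitmap: each row is padded so that
// its length in bytes is a multiple of 4. Hypothetical helper for illustration.
class BmpRowPaddingDemo {
    static final int BYTES_PER_PIXEL = 3;

    // Number of padding bytes that follow the pixel data of one row.
    static int paddingBytes(int pixelWidth) {
        int rowBytes = pixelWidth * BYTES_PER_PIXEL;
        // The outer % 4 turns a "pad by 4" into "pad by 0" for aligned rows.
        return (4 - rowBytes % 4) % 4;
    }

    public static void main(String[] args) {
        for (int width = 1; width <= 4; width++) {
            System.out.println(width + " px -> " + paddingBytes(width) + " padding bytes");
        }
    }
}
```

Widths of 1, 2, 3 and 4 pixels give 1, 2, 3 and 0 padding bytes respectively, so the row lengths come out as 4, 8, 12 and 12 bytes.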

You could also insert arbitrary groupings into the format, which would fire an event in the parser that you could use to change state, pretty much like nested elements in an XML file.

One feature that didn't make it into SPIFF was that old "shipping". The code sat in my svn repo doing precisely nothing (I never did get back to that project with the ITunesDB either). Spurred on by the realisation that I never actually finish anything, I recently picked up SPIFF again, with the aim of turning it into something more like JAXB, where the parser can populate (or even generate) an annotated object graph.

Hence, consider this the first in a series of posts covering aspects of the ongoing development of SPIFF, and my attempt to at least get it in the public domain.