Thursday 15 May 2014

Mass Observation

This blog post comes to you courtesy of Mass Observation

08:15
An abnormal day starts out more abnormal than expected when the builders turn up on the doorstep to start some renovation work on the new house.  Some communication had obviously gone AWOL somewhere, so we had no idea they were coming.  It's lucky for them that anyone is home - I'm travelling today, so I was still pottering round getting everything ready to leave just before 9am.  Their arrival puts me in a bit of a spin as I hastily sort them out with keys and coffee and generally make sure they know what they're doing.  Then I have only a few minutes to get myself sorted out and make sure I have everything.  Finding a European travel plug adapter proves a bit of a challenge.  They used to live in the drawer under the bed, but we've just had a new bed, sans drawers, so now it's anyone's guess where they are.  I eventually track them down in the bottom of a bag in a wardrobe in Ella's bedroom.

09:15
By 9.15am I'm waiting at Pokesdown (For Boscombe) train station.  I'm heading for Grenoble with work, to catch up with the researchers in our Innovation group. As the head of our software development teams, I'm there to talk about the technical bits, but also to work out the plan for getting some of our developers living and working alongside them in Grenoble for a period of time.  The journey down to Grenoble is a little arduous - Pokesdown to Brockenhurst on the stopping train, then hopping to the fast train to Waterloo, where I'll join up with my colleagues Greg and Tim.  From Waterloo it's a tube across London to St Pancras, on to the Eurostar to Paris, the metro across Paris to catch the TGV that will carry us down to Grenoble.  All in all it'll take about 10 hours.  On the plus side, Grenoble is a thoroughly nice place to spend some time, a bustling city against a beautiful backdrop of the French Alps.

13:30
An uneventful journey to Waterloo, mostly spent talking shop about the trip.  As we get off the train, Tim insists that taking the bus to St Pancras is far better than taking the tube, which requires a change of line mid-journey.  A debate ensues as Greg asserts that we'll need Oyster cards and won't be able to pay cash for the bus journey, but Tim thinks we can, and won't take no for an answer.  We wait in spring sunshine for the bus.  When the 59 to Kings Cross arrives, cash is indeed acceptable, but Tim offers the bus driver a £20 note.  Unsurprisingly, the bus driver won't take it.  He says the trip is £7 for the 3 of us.  I offer £6 in change, but the bus driver clearly doesn't have time for this and just waves us on without paying.  I'm not sure how Tim does it.  As we take our seats, Greg points out that £7 isn't evenly divisible by 3.

St Pancras is busy, but the Eurostar terminal is actually a reasonable place to be, clean and well lit.  It even has a long bank of plug sockets, which is heartening to see - I seem to spend most of my time at airports and train stations hunting round for power for laptops and phones.  We sit at Caffe Nero; Greg asks Tim to get him a “skinny wet latte”.  The barista has no idea what this means.  Nor do we.  I suspect Greg is pulling Tim's leg, but it turns out it's a latte with no foam.  First world problems.  Everyone agrees that the brie & bacon ciabatta is remarkably delicious.

17:35 (CET)
The Eurostar is efficient, and we arrive at Gare du Nord bang on time.  It's just as busy as St Pancras, and we weave through crowds to get to the metro.  We are a slightly shambolic crew, no-one really knowing where we need to go, but somehow we make it without trouble to Gare de Lyon, where the TGV is waiting to take us to Grenoble.  Within minutes we're at full speed trundling through the French countryside.

18:30 (CET)
I have to admit, France is a lovely place.  A lot like England, but unmistakably not so.  We career down towards Grenoble, the rolling hills slowly becoming more dramatic, glimpses of proper mountains on the horizon as we skirt the Massif Central.  At Lyon the train takes on a somewhat more sedate pace.  We console ourselves with a couple of cans of Kronenbourg.  My French is terrible; the lady in the buffet car is a little put out at having to repeat her questions in English (Bottle or can? Small or large?).  I suspect she's used to it.  On the way down, we talk about the perils of agile software development: how what was once cutting edge has become the norm, almost becoming the very thing it set out to replace.  In good news, a national newspaper has published a story about how our company is using machine learning algorithms - some of the software that my teams produce - to look through Twitter and predict who should be included in the World Cup squad on the basis of sentiment.  On the down side, that newspaper is the Daily Mail, and they get our company name wrong at least once.

23:00 (CET)
We arrive in Grenoble.  Cash, once again, is an issue - we only have 12 euros between us.  Tim asks the taxi driver, in what sounds to Greg and me like surprisingly fluent French, whether he can pay by card.  The driver actually speaks pretty good English; he says no, but we agree that 12 euros is enough to get us to our destination.  Ten euros later, we're at the Park Hotel, a strange anachronism, where room keys are still actually real keys, and every surface that stays still long enough is mirrored.  It's either outdated or a carefully cultivated image - either way, it's comfortable enough.  We find the closest place to eat, a roomy burger joint above the multiplex cinema just over the road.  It's decent food, although I'm quickly reminded of just how rare the French like their meat.

End of a long day, settling down with the laptop.  I answer some work emails - between holiday, meetings and travel, I've only spent one day of the last 2 weeks at work, so I have a backlog that I feel obliged to battle - and peruse Facebook.  Most of the Facebook posts are just reposts of memes.  Wherefore art thou, original thought?  I decide that Facebook has jumped the shark.

I want to get an early start in the morning - I need to run off the beer and burgers.  Alarm set for 6.30am.  Lights out.


I donate my 12th May diary to the Mass Observation Archive. I consent to it being made publicly available as part of the Archive and assign my copyright in the diary to the Mass Observation Archive Trustees so that it can be reproduced in full or in part on websites, in publications and in broadcasts as approved by the Trustees.

Monday 9 December 2013

The Two Queues

As a manager of several development teams, I have two queues at my desk.

The first consists of all the people complaining "the Dev estimate on this work is far too high!"

The second consists of all the people complaining "Dev have taken far longer than they estimated!"

It is not unknown for members of the first queue, having had their query swatted away, to immediately join the second queue, or vice versa, without a hint of irony.

Let's address the first one here. Why might someone think an estimate is too high?  Conversely, why might a Dev team give an estimate that others consider "high"?
  • You don't agree on the scope - estimation is often done off the back of a ticket in a workflow system that has little detail on it.  Even if the Dev team engage the product owner in a conversation when estimating (and you do, right?), there are often lots of implicit assumptions made by both sides, not least the assumption that the "other side" are making the same assumptions.  In the absence of knowing the right questions to ask, it's natural for developers to err on the side of caution and fill in the gaps with their own assumptions about what's needed.  There's a simple solution to this: keep talking.  Find out from the Dev team what assumptions they've made, or what tasks they're planning, and help them turn those into concrete requirements.
  • The team doesn't know the domain - perhaps they've been asked to take over a system they haven't worked on before.  It is unrealistic to expect a team that's never worked on something to make changes to it at the same rate as a team that may have built it from scratch.  Don't be surprised if it takes 2 or even 3 times longer. Developing software is not like following a recipe.  Either quit moving systems between teams, or consider the initial effort an investment to make things quicker next time.
  • The system has technical debt - Whisper it quietly, but sometimes software is not that well written.  Or at least, once upon a time it was, but over time things have gotten a little messy.  Perhaps the Dev team have been less than careful about the changes they made.  Perhaps they did it to get the work done quicker and make your estimates smaller last time.  Either way, now you're paying the price, and the price is more effort, which means a higher estimate.  It is the righteous and courageous Dev team that says "no, we're fixing this now, not later".  If my builder didn't do that, I'd probably be writing to Watchdog.
  • You think you know something the team doesn't - And so we come to the most common cause for "that estimate is too high!".  It's just a one-liner, right? 
  • You actually do know something the team doesn't - If you do, congratulations!  Share it with the Dev team.  Share your experience or your knowledge, or connect them with other people who have it.  The Dev team will almost certainly be thankful for the input.  However, just telling them what you know doesn't necessarily mean they can do it right now as quickly as they will once they've gained that experience for themselves.
  • The team know something you don't - Yes, it's a one-liner.  At least, it is once we've refactored the code into the right shape.  And written the unit test. And functional test. And integration test.  And tested it across all browsers. And deployed it.  And listened to your feedback about how it's slightly the wrong shade of blue and needs to blink. And attended the meeting with the customer to explain to them how to use the new feature. 
There is a third queue at my desk.  It's the people complaining "the Dev team have done the work far quicker than they estimated!".  It's currently empty.  Feel free to join it sometime.

Sunday 7 April 2013

MD5 tests

Consider this test:
   @Test
   public void testMapRowWhenDocIdNotInSolr() throws Exception {
      idToScore.put(DOC_ID101, 0L);
      context.checking(new Expectations() {{
         oneOf(rs).getString(1);
         will(returnValue(DOC_ID101));
         oneOf(rs).getLong(2);
         will(returnValue(DOC_SCORE));

         oneOf(solr).query(with(solrQueryHolder));
         will(returnValue(solrResponse));

         oneOf(solrResponse).getResults();
         will(returnValue(new SolrDocumentList()));

         never(solr).getBinder();
         will(returnValue(new DocumentObjectBinder()));
         never(solr).addBean(with(baseDocumentHolder));
      }});
      unit.mapRow(rs, 101);
      assertEquals("id:docId101", solrQueryHolder.getFirstHeldObject().getFilterQueries()[0]);
      assertEquals(Integer.valueOf(1), solrQueryHolder.getFirstHeldObject().getRows());
      assertEquals("*", solrQueryHolder.getFirstHeldObject().getQuery());
   }
And what exactly is this testing? Alas, the name of the test doesn't really tell us - it tells us what's going to be called (unit.mapRow), and what the pre-conditions are (the docId is not in Solr), but doesn't give any clue as to what the expected outcome is. That's a first smell of a bad test. Nor does it get any better. The test proceeds to make a bunch of mock expectations about what's going to get called, but none of this helps us understand what is really expected to happen.  

The problem here is that I want to refactor this class. I want to keep the same behaviour, but I don't want it in mapRow(). This is unfortunate, because all the tests in this project look like this, and if I move functionality elsewhere I don't have any tests at a functional level that will show that I'm still achieving the same end goal.  

The colloquial name for these tests in our team is an "MD5 test". That is, the test just tests that the method is exactly what I've written. I may as well test the MD5 hash of the method body. 

Mocking frameworks make these kinds of tests very easy to introduce by accident.  The unfortunate coder will stray into the wilderness of mocking up every single method call, making sure to examine and verify every method signature along the way.  The alert coder will consider what the intent of the mocking is.  In some cases, verifying a specific method call is exactly what you intend, but I'd suggest that the majority of the time mocking is simply a way to return fake values, in which case you probably want a stub.
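To make the stub/mock distinction concrete, here's a minimal hand-rolled sketch - no framework, and all the names are invented for this example, not taken from the codebase above.  The stub-style test asserts on the outcome and survives refactoring; the mock-style test pins down the exact interaction, and breaks the moment you restructure the method, even if the result is still correct.

```java
import java.util.ArrayList;
import java.util.List;

interface ScoreSource {
    long scoreFor(String docId);
}

// A stub just supplies a canned value so the code under test can run.
class StubScoreSource implements ScoreSource {
    public long scoreFor(String docId) {
        return 42L;
    }
}

// A mock records the interaction so the test can verify exactly what
// was called, and with what.
class MockScoreSource implements ScoreSource {
    final List<String> calls = new ArrayList<>();

    public long scoreFor(String docId) {
        calls.add(docId);
        return 42L;
    }
}

class Scorer {
    private final ScoreSource source;

    Scorer(ScoreSource source) {
        this.source = source;
    }

    // The behaviour we actually care about: doubling the raw score.
    long doubledScore(String docId) {
        return source.scoreFor(docId) * 2;
    }
}

public class StubVsMock {
    public static void main(String[] args) {
        // Stub style: assert on the outcome. This still passes if
        // doubledScore() is refactored to fetch or cache differently.
        if (new Scorer(new StubScoreSource()).doubledScore("docId101") != 84L) {
            throw new AssertionError("stub-style test failed");
        }

        // Mock style: assert on the interaction. This breaks as soon as
        // the implementation changes how it talks to ScoreSource, even
        // if the result is still correct - an MD5 test in miniature.
        MockScoreSource mock = new MockScoreSource();
        new Scorer(mock).doubledScore("docId101");
        if (!mock.calls.equals(List.of("docId101"))) {
            throw new AssertionError("mock-style test failed");
        }
        System.out.println("ok");
    }
}
```

In jMock terms, the difference is roughly `allowing(...)` (stubbing a return value) versus `oneOf(...)`/`never(...)` (verifying the interaction); the test above uses the latter everywhere, which is what makes it brittle.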

Tuesday 12 February 2013

On the Joy Of Less Code

There's only one thing I really enjoy more than coding, and that's not coding. Or rather, anti-coding.  Ctrl + D is my favourite key combination.  Watching lines disappear into bit-history fills me with a sense of excitement and cosiness that far exceeds the dread and fear with which I create them.

It's pretty simple.  Every time I delete a line, there are a handful fewer troublesome bytecodes in the world that can cause a bug, and that can surely only be a good thing.  There are, of course, perils, but I'm all about high-octane thrills.  I'm not talking about mere refactoring here - taking 3 lines of code and coming up with the sort of one-liner that über-geeks love to use for oneupmanship.  I'm talking about taking a good look at your code and thinking about what really matters.  Is this code actually doing anything useful?

There are many reasons to be deleting code.  Perhaps you found a better way of doing things and have refactored.  Perhaps requirements have changed and some functionality is no longer needed.  Perhaps you just went too far in the first place and made up a requirement in your head.  Occasionally, you have the headbanging moment when the compiler forces you to add code against your will to deal with a situation that will never occur.
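That last headbanging moment has a classic Java instance (the class and method names here are mine, for illustration): constructing a String from bytes with a charset *name* forces you to handle an UnsupportedEncodingException that can never fire for "UTF-8", which every JVM is required to support.  Since Java 7, StandardCharsets lets you delete the whole catch block:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DeleteTheCatch {
    // Before: the compiler insists on a handler for an exception that
    // cannot happen - "UTF-8" is guaranteed present on every JVM.
    static String before(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("can never happen", e);
        }
    }

    // After: pass the Charset constant instead of its name, and delete
    // the whole try/catch. Same behaviour, five fewer lines.
    static String after(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```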

But, remember the rules:
  1. Delete, don't comment - Commenting out unused code is a bit like putting dog poo in a bag and then leaving the bag behind.  If you're using version control (and if you're not, you're on your own), then all this history is freely available to you.  With a modern DVCS like git, it's a no-brainer to start a repo anywhere you like, and to search back through history if you ever want to dig out that code again.
  2. YAGNI - it's far too tempting to think "but this might come in handy one day...".  If it's not being used now, get rid of it.  Again, if you want it in the future, it's safe and warm in the repo.  But not today.
  3. Delete mercilessly and thoroughly - Don't just remove the obvious chunk of code.  Inevitably there will be tests that call that code, configuration parameters, text strings - all sorts of other cruft that will be rendered redundant when you delete that code.  If it's not used right here, right now, delete it.
I'm going to stick my neck out and suggest that most codebases could probably lose 10% of bug-generating, coder-baffling, readability-stifling code, and still do exactly the same job.  So, grab your Ctrl, grab your D, and get deleting.  Watch your test coverage stats soar! 

Monday 28 November 2011

Insert Post Here

Insert shame-faced regret at not updating blog more here

Friday 12 November 2010

Please release me

Ship it, ship it, ship it! Spiff v0.1.0 (yes, I'm being cautious) out now, you can grab the jar from https://github.com/revbingo/SPIFF/downloads.

Tuesday 21 September 2010

Spiff: The competition

In the intervening years between the inception of Spiff (yes, I'm going camel case, upper case everywhere just won't do) and its subsequent revival (and re-revival), a couple of competitors have appeared in the same space.

Closest in spirit to Spiff is Preon. It even states its aim "to be to binary encoded data what Hibernate is to relational databases, and JAXB to XML", which pretty much sums up what I want for Spiff. Where Preon differs is in its extensive use of annotations to do what Spiff does in its format definition file (which Spiff calls an .adf file - Arbitrary Data Format). Preon will examine your classes and derive the data format from the order and types of annotated fields in the class. It also uses annotations to derive looping and conditional logic.

I'll admit that I've not used Preon - partly through fear of polluting my ideas about what Spiff could and should do, and partly in case I decided it was better than Spiff and called the whole thing off. So any discussion of its merits and drawbacks is truly superficial. My general impression is that its reliance on annotations is a little hairy - when you're expressing logic in annotations, things look a little awkward. Spiff trades off compactness (having everything described in the code) for readability and portability. The event dispatching and class binding mechanism means that one .adf file can be used to populate classes of any shape without needing to respecify the file format. It also looks like Preon expects the classes to fully describe the file format, which is rarely what you want in the code. One of the use cases that led me to start writing Spiff was wanting to get at little pieces of the data without having to worry about the rest of the file format. And thus were .jump and .skip begotten.
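To make that use case concrete, here's what pulling a couple of fields out of a file looks like when hand-rolled with ByteBuffer - the boilerplate that a .jump in an .adf file is meant to replace. The class name and style are mine; the offsets are the real BITMAPINFOHEADER offsets in a .bmp file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BmpDimensions {
    // Jump straight to offset 18 of a BMP header and read the two
    // fields we care about (biWidth, biHeight), ignoring everything
    // else in the file - the hand-rolled equivalent of a .jump.
    static int[] widthAndHeight(byte[] bmp) {
        ByteBuffer buf = ByteBuffer.wrap(bmp).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(18);            // .jump 18
        int width = buf.getInt();    // biWidth, 4 bytes, little-endian
        int height = buf.getInt();   // biHeight
        return new int[] { width, height };
    }
}
```

Fine for one field; the moment you want the whole header bound onto a class, you're hand-maintaining offsets, and that's the tedium a declarative format definition is meant to remove.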

On the other side, the Google lads are also in the frame with protobuf. Protobuf is interesting in that it uses something analogous to the .adf file to describe the format. Where it differs is that it will generate classes for you to serialize and deserialize the format. That's something that Spiff might be capable of one day, but I like the idea of being able to write arbitrary POJOs and map the data onto them, rather than having objects in my code whose sole purpose is as marshallers. Also, it's largely oriented towards message-passing, that is, describing a message that will be passed between two systems, such as in RPC, where the user is in control of both ends of the transaction. To that end, the .proto definitions are reliant on using the underlying protobuf grammar for the message, for instance to recognise repeated blocks of data, and don't have some of the flexibility to express more complicated relationships between parts of the file.

I see a couple of strong points for Spiff in this. Portability of .adf files is possibly the biggest. Once someone has defined an .adf for, say, a .bmp file or an iTunesDB file, anyone else can take that and use it to bind all or part of that data to their own classes. The other is flexibility, in hopefully being able to express all the things that can make binary file formats tricky to work with. I guess the first step is to have a working product...