20 June 2008

Testing your server log

Just a quick entry, because I’ve been on vacation for the past week, and my head is mainly full of swimming pools, sun, and too much rosé wine.

I’ve long been a fanatic for monitoring as many interfaces as possible when testing, and over the past year I’ve worked quite closely with our operations team and our internal users, which has just confirmed my fanaticism.

Introducing new testers to the project I’ve been working on, I give them a minimum of three locations to check for each test they execute: the GUI, the database, and the server log. What I absolutely don’t want is a failure in production to which we testers can only answer “it worked on our test servers”. So the advantages of monitoring the database and the server log are twofold: 1) you are much more likely to catch intermittent problems with persistence, strange error messages, or buffer overflows, and 2) when something does go wrong in production, you already know what “normal” usage looks like.

I’ll talk more about database testing some other time, but here are some of the things I have tested for in the server log (with log4j):

• Errors are flagged with ERROR, and warnings are flagged with WARN

• Nothing is flagged as ERROR when it is not an error, or you know about every single exception to this (one of the third party components we use always writes to ERROR, for instance, so I pass this information on to the operations team)

• With log levels kept at INFO, it is possible to reconstruct what a user has been doing

• Operations that execute continuously are logged as DEBUG, or can be filtered out of the log some other way (and the production servers are configured to filter them out)

• Passwords are not logged in plain text, but user names are (this depends a lot on context: privacy laws may apply, or you may have so many users that logging system log-ins will cost more in disk space than it could possibly be worth)

• Errors either define the exact problem, or else point you to more information. “The file cannot be read” isn’t good enough unless the file name is also printed with the ERROR flag. On the other hand, “SQL3096N, varchar(50), USER_ADDRESS” gives the knowledgeable debugger exactly enough information to pinpoint the error.
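Most of these checks lend themselves to a quick script that scans the log after a test run. Here is a minimal sketch of the idea in Python; the log line format, the component name, and the password list are invented for illustration, and a real log4j pattern layout would need its own matching regex:

```python
import re

# Assumed log4j-style line: "2008-06-20 10:15:00 ERROR com.example.Foo - message"
LOG_LINE = re.compile(
    r"^\S+\s+\S+\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)

# Hypothetical third-party component known to misuse the ERROR level
KNOWN_ERROR_EXCEPTIONS = {"com.thirdparty.NoisyComponent"}

def check_log(lines, plaintext_passwords=()):
    """Scan log lines and return (line_number, complaint) findings."""
    findings = []
    for no, line in enumerate(lines, 1):
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a log4j line we recognise; a stricter check could flag this
        level, logger, msg = m.group("level", "logger", "msg")
        # Flag ERRORs from components we know always cry wolf
        if level == "ERROR" and logger in KNOWN_ERROR_EXCEPTIONS:
            findings.append((no, "ERROR from a component known to misuse the level"))
        # Flag any test password that leaked into the log in plain text
        for pw in plaintext_passwords:
            if pw in msg:
                findings.append((no, "plain-text password in log message"))
    return findings
```

The same skeleton extends naturally to the other checks: counting DEBUG lines to catch continuous operations at the wrong level, or grepping the INFO lines for one user name to see whether their session can be reconstructed.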

There’s more to it than this; I can spin out several scenarios to test that logging is adequate. But from the above I hope that a couple of things become clear: the end user is not necessarily the only user, and the users of the server log have radically different definitions of what is “user friendly”.

12 June 2008

I slink back to this blog, looking guilty. I hadn't forgotten about it, I hadn't abandoned it, and it isn't even that I didn't have anything to write about. I've been discovering a whole pile of interesting things over the past year, fodder for hundreds of blog entries, and I think that the very learning process I went through would have made interesting reading, but the plain truth is that I was so busy learning and working that I literally did not have any time for writing. Or for a life at all, really, although somehow I squeezed in the time to get married. (Two days before my wedding, I ran my last test of the day at 1.20am. Just to give some perspective.)


Now, I'm about to change jobs, to a company that promises me time outside of work too, and I'm going to go out on a limb and promise to blog here once a week. Even if it is only to talk about what I am doing on my holidays. I write slowly, and form my ideas slowly: testing them as I go, changing my mind, getting worried about how what I say will be interpreted, even when I am only talking about what I'm doing on my holidays. So my apologies in advance if I have over-committed myself.


So, what have I been doing? Here's a clue to what I will be talking about:


  • I have been dragged kicking and screaming into the Agile Development world, and though I am not passionate about it I can now admit that it works, and that testing has a place in it. In fact, given that the initial writings of the Agile gurus are so wrong-headed about testing, it's something that I can contribute to, can improve. What you won't hear about is how it has solved all my problems, how it has solved all the business's problems, and how the software we now deliver is bug free and all tests are fully automated.

  • I have had my first positive experiences with automating more than just the smoke and load tests. All those negative past experiences have stood me in good stead, and the project that I have been working on had some areas that were ideally suited for automation. The solution that I will leave behind me does what it is supposed to do, it isn't over-ambitious, but what I'm most interested in is that it meets the needs of more than just the testers, and those additional stakeholders have meant that the system will keep running. What you won't hear about is how the testers now have nothing to do because automation has solved all our problems.

  • I have been interviewing, hiring, and mentoring a lot of junior and not so junior testers. This has been a really good experience for me: I've moved from only hiring people based on their technical skills, people like me, to hiring testers who will give a different perspective, who I will argue with until we reach a compromise. A few weeks back, I had an epiphany about someone who I had done nothing but argue with, both of us always seeming to be heading in different directions. I started asking him different questions, deeper questions, until we found what exactly the source of our disagreement was, and then together we agreed on a solution that would meet both his needs and my needs. It was a wonderful moment, and since then we've been working much better together. What you won't hear about is how my epiphany has solved all my problems, and will solve yours too if you just ask the same questions.

  • I have been introduced to a lot of new tools and methods, particularly for manual testing, and not a one of them will solve all my problems. More tools is good, new tools are also good, but sometimes, my old-fashioned Casio stopwatch is precisely the tool that I need for a particular test. People look at me strangely as I sit there hitting the lap button, then they borrow the stopwatch, and time how long they can hold their breath. A few days later, they are back again, borrowing the stopwatch for a test they want to run. I call that a success.

  • I've been thinking a lot about test data, creating it and using it, and been amazed at how little discussion there is about it. The industry that I work in has highly standardised data formats, and I need a lot of data, with different requirements for each test: sometimes one field should be restricted to just a few values, usually it should be random within a particular range; one field should be left out in this test, or have a very long value in another test; and in addition to these requirements, all the other fields should contain values, and if I'm running a very narrowly focused test those values should not affect the result. I usually use Perl for generating all this, sometimes Excel (although Excel has a nasty bug as far as CSV-formatted data is concerned), but how come nobody talks about what they do?
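To make the idea concrete, here is a sketch of the kind of generator I mean, in Python rather than Perl; the field names, specs, and ranges are invented for illustration, and a real one would of course follow whatever standardised format the industry dictates:

```python
import csv
import io
import random

def make_rows(n, fields, seed=None):
    """Generate n test-data rows.

    fields maps a field name to a spec: a list (pick one of these values),
    a (low, high) tuple (random integer in that range), or None (leave blank,
    i.e. the field is deliberately left out of this test).
    """
    rng = random.Random(seed)  # seeded so a failing test can be reproduced
    rows = []
    for _ in range(n):
        row = {}
        for name, spec in fields.items():
            if spec is None:
                row[name] = ""
            elif isinstance(spec, tuple):
                row[name] = rng.randint(*spec)
            else:
                row[name] = rng.choice(spec)
        rows.append(row)
    return rows

def to_csv(rows, fieldnames):
    """Render the rows as CSV text, header included."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `make_rows(500, {"status": ["A", "B"], "amount": (1, 100), "note": None}, seed=1)` gives five hundred rows where status is restricted to two values, amount is random within a range, and note is left out entirely, which is exactly the mix of constraints described above.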

And then I've been doing a lot of other things, trying to see how I can do things better, make things better, not just in testing but for the product as a whole, for my colleagues. Looking for solutions that are better than putting people in a meeting room and hoping that they sort the problem out, better than fixating on a magic bean and complaining until it is bought and found to be a dud. You know how, if you've been in a really fast car or train, when you get out of it you kind of stagger as your velocity returns to normal? That's pretty much how my life has been. With normal velocity, I'm going to talk about what I've learned.