Testing your server log

Just a quick entry, because I’ve been on vacation for the past week, and my head is mainly full of swimming pools, sun, and too much rosé wine.

I’ve long been a fanatic for monitoring as many interfaces as possible when testing, and over the past year I’ve worked quite closely with our operations team and our internal users, which has just confirmed my fanaticism.

Introducing new testers to the project I’ve been working on, I give them a minimum of three locations to check for each test they execute: the GUI, the database, and the server log. What I absolutely don’t want happening is a failure in production to which we the testers can only answer “it worked on our test servers”. So the advantages to monitoring the database and the server log are actually twofold: 1) you are much more likely to catch intermittent problems with persistence, strange error messages, or buffer overflows, and 2) when something does go wrong in production, you already know what “normal” usage looks like.

I’ll talk more about database testing some other time, but here are some of the things I have tested for in the server log (with log4j):

• Errors are flagged with ERROR, and warnings are flagged with WARN

• Nothing is flagged as ERROR when it is not an error, or you know about every single exception to this (one of the third party components we use always writes to ERROR, for instance, so I pass this information on to the operations team)

• Keeping log levels at INFO, it is possible to reconstruct what a user has been doing

• Operations that execute continuously are logged as DEBUG, or can be filtered out of the log another way (and the production servers are configured to filter them out)

• Passwords are not logged in plain text, but user names are (this depends a lot on context: privacy laws may apply, or you may have so many users that logging system log-ins will cost more in disk space than it is worth)

• Errors either define the exact problem, or else point you to more information. “The file cannot be read” isn’t good enough if the file name isn’t also printed with an ERROR flag. On the other hand, “SQL3096N, varchar(50), USER_ADDRESS” gives the knowledgeable debugger exactly enough information to pinpoint the error.
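
A couple of the checks above can be partly automated. Here is a minimal sketch in Python, not the tooling I actually used: it assumes a log4j pattern layout along the lines of "%d{ISO8601} %-5p [%c] %m%n", and the logger names, the noisy third-party logger, the password heuristic, and the sample lines are all made up for illustration. It flags any ERROR that does not come from a known noisy logger, and any message that looks like it contains a plain-text password.

```python
import re

# Assumed line shape (hypothetical layout "%d{ISO8601} %-5p [%c] %m%n"):
# 2008-06-20 10:15:02,113 INFO  [com.example.app.Login] User jsmith logged in
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)

# Hypothetical: loggers known to write to ERROR when nothing is wrong.
KNOWN_NOISY_LOGGERS = {"com.example.thirdparty.Scheduler"}

# Hypothetical heuristic for a plain-text password in a message.
PASSWORD_RE = re.compile(r"password\s*[=:]\s*\S+", re.IGNORECASE)


def check_log(lines):
    """Return (line number, finding, message) tuples for suspicious lines."""
    findings = []
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that don't fit the layout, e.g. stack traces
        level, logger, msg = match["level"], match["logger"], match["msg"]
        if level == "ERROR" and logger not in KNOWN_NOISY_LOGGERS:
            findings.append((number, "unexpected ERROR", msg))
        if PASSWORD_RE.search(msg):
            findings.append((number, "possible plain-text password", msg))
    return findings


# Invented sample lines, for illustration only.
SAMPLE = [
    "2008-06-20 10:15:02,113 INFO  [com.example.app.Login] User jsmith logged in",
    "2008-06-20 10:15:03,500 ERROR [com.example.thirdparty.Scheduler] heartbeat",
    "2008-06-20 10:15:04,020 DEBUG [com.example.app.Login] login attempt password=hunter2",
    "2008-06-20 10:15:05,742 ERROR [com.example.app.Files] The file cannot be read: /tmp/report.csv",
]

for finding in check_log(SAMPLE):
    print(finding)
```

A script like this only narrows down where to look. Whether an ERROR really is an error, or whether a message tells the operations team enough, still needs a person reading the log.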

There’s more to it than this, and I can spin out several scenarios to test that logging is adequate. But from the above I hope that a couple of things become clear: the end user is not necessarily the only user, and the users of the server log have radically different definitions of what is “user friendly”.

Oh look - I broke it

20 June 2008
