Monthly Archives: March 2013

A behaviour driven testing framework for batch processing systems

Recently I’ve been working on a testing framework to support testing of batch systems such as data warehouses.

The framework is called ‘posh-gwen‘ due to the three behaviour driven methods Given, When, and thEN. The first version is on github at: https://github.com/jsnape/posh-gwen. Comments, suggestions and pull requests are welcome.

So why should you care about using this framework?

It is difficult to test batch systems using modern test frameworks such as Specflow or FitNesse because of the simple rule that good tests should be isolated from one another. All these frameworks run tests in sequence:

  • do something
  • check something
  • clean up
  • move on to the next test

For this to be successful each test has to run very fast. Most batch processing systems are optimised for bulk processing of data. They may take tens of seconds to run end to end even with a single row of data so running hundreds of tests independently can take hours.

This framework is designed to break the rule of sequential test execution. All tests are run in parallel by phase.

The best way to test batch processing is for a known input data to contain many test cases. The batch is run loading all data at once. Finally a number of queries are executed against the resulting system. So for example a data warehouse might load a number of source files using an ETL framework such as SQL Server Integration Services. Once loaded the data warehouse can be queried to check that expected values exist in the final system.

It is still important to make sure that each test is isolated from others or else changes in one might cause a number of others to fail or become invalid.

We can do this for batch processing by data isolation – that is to carve up data domains in a way that only a single test uses data from that domain. Then verification of the results force the query to execute against that test specific sub-domain.

There are a number of suitable domains to use but any with high cardinality are best:

  • Dates – each day is a single test (or blocks of days, weeks, years etc. for those tests that need to span days).
  • Transaction identifiers – use a map of IDs to test cases or in the case of strings prefix the transaction id with the test case number.
  • Business keys – for entities such as customer or product there is usually an ID field used as the business key; use the same methods as transaction identifiers.
  • Custom attributes – if none of the above will work then you might consider adding an extra attribute to the source data which is passed through the batch system. Obviously this is not a preferred solution single you will have to change your system.
  • Combinations of the above – sometimes depending on where you need to validate you might need multiple solutions.

Go try it out and let me know how it goes. I plan on adding more features over the coming months.

Classes of stakeholder

bournemouthlibrarySo this is the second attempt at writing this post. The first was very complete but read like a textbook and it makes more sense to read a real book – I recommend the stakeholder section in this book: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives.

All stakeholders have requirements, needs and interests which collectively I call “concerns”. I even track them in the same work item tracking system used for requirements since it means they never get lost and you can ensure traceability from concerns to requirements and architectural decisions.

The classes can be broadly split into three groups. The first being “receivers”:

  • Acquirers
  • Assessors
  • Users
  • Administrators
  • Support staff

Acquirers may be wanting your software for a number of reasons. They might not actually use it so their concerns are likely to be related to cost, need and satisfying other concerns indirectly, for example purchasing reporting software because a regulator has demanded accurate reporting of patient outcomes with possible penalties for failure.

In the mental health arena there are many additional “assessor” agencies and regulators such as the Care Quality Commission, Monitor, professional bodies such as the General Medical Council and watchdogs like Healthwatch. Fortunately they are mainly concerned with the quality of healthcare, patient outcomes and overall costs; less so about software.

Administrators and support staff want to spend as little time, money and effort as possible with your solution so anything that eases deployment, automatically heals or diagnoses issues and keeps the software running will help. Its worth looking further though – what if an administrator has a review objective of reducing disk costs over the year and you turn up asking for terabytes of clustered storage?

The second group of stakeholders are the “producers”:

  • Developers
  • Testers
  • Maintainers
  • Suppliers

Their concerns should be easy to list and meet since they want the same things as you. In this sample project the only producer is me and anyone who supplies me with coffee.

The remaining category are “communicators”. The book defines them as those who “…explain the system to other stakeholders…” but I think is should also include anyone who will discuss, promote, detract, educate, deny, network, rally and gossip about your project and are not in either of the other two groups. They may only by interested in your project for their own political reasons.

Do you have stakeholders from all these categories? If so look at things from each stakeholder’s angle and try to imagine what their needs are. Then go talk to them and confirm it.

Mental health stakeholder analysis – power and interest

So its great that we know a little about how mental health services work but as an architect we need to cover some more ground before the project can be considered up and running. This post is about the slightly dirty subject of stakeholders and politics.

Why do this? Well getting the politics right can mean the difference between a success and failure. All projects have stakeholders with specific concerns. Some can be ignored but in general it is the architect’s job satisfy those concerns. Some stakeholders  have more power than others; they will also have varying levels of interest. The diagram below shows the best approach for each of the categories.

  • StakeholderEngagement Low Interest – Low Power: the easiest category to deal with since they are not too concerned with the project and don’t have the ability to derail you. Just monitor them for feedback  and other  information.
  • High Interest – Low Power: this class of stakeholders want to know what is happening and still don’t have the ability to disrupt the project. The best tactic is to ensure they are well informed.
  • Low Interest – High Power: these stakeholders can be the most difficult to deal with since they have the power to influence not only your project but many of the other stakeholders. The main issue being that they are not that interested so it is difficult to have meaningful discussions and  negotiations. The primary tactic is to ensure their concerns are met with as little fuss as possible.
  • High Interest – High Power: not only do they have the power to effect change but are interested in the outcomes. Personally I find this group a lot of fun since they can contribute to some very interactive sessions. The best tactic is to engage them which shouldn’t be too difficult given their level of interest.

In the next post I’ll cover classes of stakeholder and why it some of the non-obvious ones might be the most important.

Mental Healthcare Workflow

As I mentioned in my last post I wanted to setup the scenario I’m going to use for my examples. My lovely wife is a child psychiatrist so I have learned a fair amount about the way psychiatric services work in the UK. Lets dive right in.

There are four main workflow steps to all patient care:

  1. Referral – prospective patients are referred to the service with behaviours that indicate a mental health problem.
  2. Assessment – a file is opened for the patient and they are assessed. If required, tests are performed, then where appropriate a diagnosis is given and the patient is referred for treatment or discharged.
  3. Treatment – depending on the diagnosis, one or more treatment approaches may be required. Each approach has a outcome: either further assessment and treatment or discharge.
  4. Discharge – the patient is released from mental health services.

There are two queues in the workflow:

  1. Assessment queue – those that have been referred but are waiting assessment. There is a government target of 4 weeks for this queue.
  2. Treatment queue – those that have a diagnosis requiring treatment. There is a desirable target of 6 weeks for this queue for the patient to begin their first treatment.

Some additional notes:

  • Whilst a patient may be seen by many professionals during treatment there is usually a single mental health professional that oversees the overall care.
  • There are multiple levels of discharge – each treatment approach and then final discharge from the service by the lead professional.
  • Diagnoses in the UK are made according to the International Classification of Diseases (WHO) which covers all diseases and includes mental health disorders. Other countries such as the USA diagnose using the Diagnostic and Statistical Manual of Mental Disorders.
  • Services are divided into tiers from Tier 1 (GP treatment) to Tier 4 (intensive, possibly in-patient, treatment).
  • There are other business processes that don’t contribute directly to the value stream but are essential to the overall efficiency of the service and quality of patient care such as treatment reviews.
  • Treatments are conducted over a number of sessions. Even if the treatment is medication there are still sessions required to monitor and evaluate the drug effects and side-effects.

I’m sure this isn’t a complete picture but its enough to make a start. First we need to look at some requirements; then review the stakeholders and begin to outline the architecture.

Agile Business Intelligence Architecture

I used to regularly write a blog over at http://jamessnape.me.uk based on my experiences with a number of topics at Microsoft. Since moving to BNP Paribas I haven’t had much to say publically beyond posting a few photos and I’d like to change that.

Over the past couple of years I’ve been practising agile business intelligence in my role as a solution architect. My team work on user stories from a prioritised backlog in two week iterations, practise test driven development, continuous integration and are beginning to look at acceptance test driven development.

Even with all these techniques it is still very difficult to be truly agile with business intelligence solutions –interrelated requirements, complex database schemas, poor agile tool support and the skill set of most business intelligence developers are all forces trying to turn your project back towards the waterfall dark side.

All the time there is an ever louder call from the business for instant answers to their business questions.

So I’ve begun  to wonder, how agile can you get? Can you create traditional and modern business intelligence solutions using the same agile processes that normal software teams use? Can solution architecture be lean and risk driven?

I’m influenced by quite a varied set of development jobs – broadcast and media at Sony, telecoms and call centres at Exony, all sorts at Microsoft and now banking. I’ve learned that clean code, good domain modelling and testing is the only way to be successful.

Obviously I can’t talk about my day job as I’m bound by employee confidentiality but I do want  to try out some ideas in a public arena where I can get feedback. I’m going to make up a set of requirements for a completely different industry to banking – healthcare; which I’ll setup in a future article.

So comments always welcome as are suggestions for future posts and I look forward to publishing some useful content.