Mental Health Referrals – First Feature Acceptance Test

I’ve been working on this for a few weeks now, half an hour at a time in the evenings and I can safely say it’s pretty hard to maintain a train of thought in thirty minute intervals. However a bare minimum implementation is complete and ready to discuss.

We start with an acceptance test:

The first part of the feature describes the user story and the second part tells us that when we load three patient referrals then the total count should be 3 with 1 on the 1st January.

I’m using SpecFlow for acceptance tests since it is very easy to define tables and there are some useful binding utilities as we will see. After entering the test we can immediately compile the application and run the tests without writing anything else. The test will obviously fail since we haven’t written any code. In fact the acceptance test will stay broken for some time as we write code and unit tests. When it passes we know the feature is done.

So thinking about this functionally we effectively want to write a function that transforms an enumerable of source referral records into an enumerable of referral facts; then pipe this iterator into a SqlBulkCopy instance. Effectively this code needs to work:

referralrepository.BulkCopy(referrals.Select(x => mapper.Map(x)));

This is a Linq transform with a mapping function applied to each item in the source list. In the next few posts I’m going to break it into bite size chunks to implement.

Tableau European Customer Conference 2013

This week has been dominated by the Tableau Customer Conference. I was fortunate to get a ticket since it was sold out but one of our architects couldn’t go so I filled in. I’m glad I did.

It’s been a while since I got to learn about a completely new technology so it is a refreshing change to be a bit of a novice. After a number of Microsoft conferences this one felt quite different too – less geeky with a more mixed crowd. It was interesting to be able to talk with non-technical types such as data analysts, business managers and statisticians.

I mainly went to the technical sessions but a couple of the keynote sessions were really interesting. Firstly ‘Creating a culture of data at Facebook’ gave some useful ideas about creating communities and getting more staff comfortable with visualizations. It was also nice to listen to a blogger I’ve read for a while (but only just discovered worked for Facebook). The second was Prof. Hans Rosling. I’ve seen his TED talk but in person was completely different – probably because he was talking to a room full of data visualisation professionals. He had plenty of anecdotes about how his famous visualizations came about. Ellie Fields gives a good description of his talk.

So back to the day job now but with some new ideas about business intelligence and data visualization.

Mental Health Project Automated Builds

Last time we were looking at the mental health project I was discussing the dimensional model. I think its time to have a crack at some code now. But this first session is just about setting up my project.

There are some key things every agile project should do:

  • Automated build with acceptance and unit tests
  • Automated code analysis
  • Automated deployment with integration tests

Note everything is automated – it has to be repeatable and not need human intervention or it won’t get done. I’m a big fan of continuous integration and continuous deployment so I’m going to use Team City as a build service since its free for a single agent.

Team City is a very configurable and powerful tool but I want to make sure that I can build and deploy from my local command line in exactly the same way that the Team City agent will since it makes debugging issues easier and allows developers to check the build works before committing.

There are lots of build script tools around such as FinalBuilder but I prefer MSBuild since its readily available and a text format. Visual Studio uses MSBuild internally but we are not going to change project files; we are going to create a higher level script to tie everything together. Since this is a simple start it’s all going in one build file.

The build script is split into 2 main parts. At the top are property and item definitions – this is the build metadata controlling what and how the build will happen. Below that are Imports and Targets which deal with the mechanics of building. This split makes it easy to add new projects and settings without having to change your overall build script.

There are four main targets listed which are Clean, SourceAnalysis, Compile and Test. The last three of which make a build. It’s fairly self-explanatory but if you don’t know MSBuild script imagine anything in a $() is a single value or variable, @() is a list of items. Each target has a list of tasks which are executed in order to complete the target.

So, this script is very simple; it just runs StyleCop over a set of source files, builds a Visual Studio solution and runs Xunit against a set of assemblies. Not much but it gives us a single command line action to build and test the solution as we add features:

C:\ > msbuild draco.proj

This is then setup as a single step in TeamCity. Every check-in causes the build to run and tests to execute.

The complete set of source for this project is available at https://github.com/jsnape/draco.

Dimensional Models are like Tattoos

tattoo work by Keith KillingsworthSo in the comments on a recent post on Risk Driven Architecture, Jamie Thomson asked whether the problems associated with change can be mitigated by using views. I firmly believe that views can help but unfortunately not enough to save you from clients that connect directly with Analysis Services cubes.

So it got me thinking about a similar mitigation for cubes. Unfortunately nothing came to mind apart from an analogy:

Dimensional models are like tattoos – you have to live with them for a long time

Why you might ask? Well you can add to them, maybe fill in some extra colour but basically once you’ve committed to you are stuck with them because every spread sheet and report using your model will need fixing if you try to remove something. Like tattoos, you can remove them but its going to be painful and cost a lot of money.

I don’t have any tattoos (not because I don’t like them, I just can’t decide on one that I’d have to live with for so long). However I’ve heard plenty of guidance about taking your time before committing – one of the best techniques is to simply draw your new tattoo with a Sharpie and try it on for size for a while.

How does this help with dimensional models? Well the same techniques apply. Try a new model on for size, especially if you can arrange it for the new model to fade like the Sharpie as time passes which automatically limits client usage. Maybe process the cube manually for a while – your users will soon tell you if the data is useful. This fits with an agile approach too – only put measures and attributes in the cube if you need them and don’t add stuff in the hope that it will be used productively.

Mental Health Dimensional Model

For this week’s post I want to continue the sample solution. Even though I’m going to be as agile as possible we still need to have a rough idea of a roadmap and the best way to do that is with a dimensional model.

Each business process we want to model is represented as a fact on columns. They are all to be stored at the transactional grain except possibly admissions. The conformed dimensions are listed on rows with the facts they are related to.

Referral Assessment Treatment Discharge Complaint Incident Admission
Date

Diagnosis

Health Professional

Patient

Referrer

Service (Tier)

Time

Treatment Outcome

Clinic

It is interesting to note that this is a very patient focused model since that dimension is related to every fact. There are some unanswered questions within the model though:

  • How do we represent treatment outcomes? Is there a standard method? Can this be represented in a single dimension?
  • What grain are admissions? Given the goal of calculating ‘bed days‘ we might need to model them as events in progress.

I think we have enough to make a start and I don’t think we will deliver faster if we stop to resolve these issues first. Initially I’m going to concentrate on referrals, assessments and discharges since the number of patients in the system is one of the most useful metrics to monitor.

Link

Whilst researching the previous article I came across this link on Acronyms and Ubiquitous Language. It is well worth reading as everything discussed also applies to dimensional models. There is one quote that I want to reprint from the .NET Framework General Naming Conventions:

Do not use any acronyms that are not widely accepted, and then only when necessary.

Your business users should be able to point Excel (or whatever tool you are using) at a dimensional model and intuitively know what the measures, dimensions and attributes are because they describe the business your users work in. Since acronyms obfuscate meaning they don’t belong in dimensional models.

The only time I generally relax this rule is when both the following are true:

  • All business users know the meaning of the acronym
  • The expanded version is so long that it becomes unwieldy

Applying Domain Driven Design to Data Warehouses

Managing complexity graphic showing D...My commute is around two and a half hours each way so I read a lot on the train. One of the subjects I’ve recently become interested in Domain Driven Design or DDD. I’ve found it isn’t really a new topic for me but more like someone has documented many of the techniques I’ve always used.

DDD discusses data warehouses primarily as an output or reporting function within a larger application. The book Implementing Domain-Driven Design by Vaughn Vernon mentions reporting repeatedly as a by-product of DDD (particularly when used with Event Sourcing) but not directly as a possible use case.

I would agree that not all concepts can be reused in data warehouse solutions since the only interface available is often one that transfers a set of mutations (property value changes) without the accompanying reasons (it is a key idea in DDD that you need to design your model not by changes in attributes but by operations performed on entities). For example an order count has decreased and the reason is missing – was the order returned, cancelled, an error etc. So where can it be applied? Are any of the concepts useful when designing data warehouses?

Ubiquitous Language

Lets start with one of the core concepts – Ubiquitous Language is a rigorous shared language used between developers and users. It is used to make sure that conversations are accurate and productive. It should evolve as the team’s understanding of a domain changes. The ubiquitous language is what forms the domain model at the heart of a software solution.

I find this description very similar to an equivalent concept in data warehousing – the Dimensional Model. This model, and its associated dimension bus matrix, is based on real business processes and terminology. The dimensional model is the public face of the data warehouse. It needs to be precise, reflect the terms used by business users and form a common vocabulary between users and the development team. For example when browsing an Analysis Services cube in Excel, the dimensions and facts defined in the dimensional model are directly visible to end users – if they don’t automatically understand what is on-screen then the model doesn’t describe the business.

Entities and Value Objects

There are two types of object in domain driven design – entities and value objects. Value objects are immutable and identified via their attributes. For example $100 USD in one object is the interchangeable with $100 USD in another. Entities on the other had cannot be identified purely by their attributes – there must be some sort of unique identifier (in data warehouse terms this is a business key) to differentiate similar entities. For example, one John Smith may not be the same as another and need a Customer-Id to differentiate the two.

With respect to dimensional models, value objects should not be implemented as top-level dimensions but instead be added at sets of attributes to the entities which own them. For example ‘Product Colour’ is a value type (colour) and should belong in the ‘Product’ dimension. This seems obvious when written this way but happens a lot.

Entities and Aggregate Roots

DDD groups sets of closely related entities under the control of a single ‘Aggregate root’. Some entities make no sense unless the parent entity is also within context; order lines and parent orders are the typical example.

So value objects shouldn’t be dimensions and I don’t think ordinary entities should be either. True dimensions are the aggregate roots and the one thing that seals it for me is that an aggregate root (according to DDD) defines a transactional boundary – you should not update multiple aggregate roots within a single transaction; instead sagas keep your data warehouse in sync (eventually).

Finally

Domain driven design and business intelligence share a number of common concepts yet the two philosophies are rarely seen as related. I think there is a lot to be gained by applying software concepts from different viewpoints which may not ordinarily be considered.

Risk Driven Architecture

Architecture shares something with testing in that resources are limited so effort is best directed toward maximising risk reduction.

The amount of ‘architecture’ in a solution should also reflect the risk associated with a project. For example the sample solution I’m creating carries almost no risk apart from my pride so a light touch is warranted.

However in a real solution what are the major risks? Where should we concentrate our efforts? Below are some of the common risks associated with business intelligence projects:

  • Unclean data e.g. Key pathologies – see later post.
  • Unreliable sources – how are failed connections, retries and duplicates handled?
  • Data volumes – what are the expected peak volumes? What will happen if these peaks are exceeded?
  • Latency requirements – can data be supplied to users fast enough? What is the business cost of delays?
  • Testability – how testable is the solution? How long can you keep going before technical debt catches up with you?
  • History and archive – in my experience most source systems don’t keep a full fidelity history so it ends up being the data warehouse’s responsibility.
  • Staying agile – unfortunately many problems with business intelligence solutions are due to an inability to change things; once users, reports, spread-sheets, and ETL code depend on data warehouse schemas or cube designs the whole thing becomes very difficult to change.
  • Disaster recovery – what happens when your server dies? Network fails? Data centre fails?
  • Scalability – what are your expected user loads? what happens if they are exceeded? are there any events that could cause your user load to be drastically exceeded?
  • Usability – how will your users interact with the system? how much training will they need? what if they need help? how can you make the solution easier to use?

“Agile architecture is the art of constraining a solution in order to optimise competing stakeholder concerns whilst maximising the number of options for future design decisions.” – James Snape (just now)

So to be agile lets just concentrate on the risks and try to not be too prescriptive over the final solution. Everything else can generally be quickly changed if it doesn’t work out.

Mental Health Functional Architecture

Context is everything with architecture. I’ve often had conversations which started with the phrase “how come you didn’t…” – once the context is explained the decision is usually obvious.

The context for this architecture is purely my own since there are no real customers. I  want to satisfy the concerns and requirements as simply as possible but leave room to swap out parts of the architecture to investigate new approaches and technologies.

The diagram below is a pretty standard set of data warehouse components. If you are a traditional Microsoft guy then, from left to right, the components would be Integration Services, SQL Server, Integration Services again, SQL Server, Analysis Services and Excel or Reporting Services respectively. Alternatively you might be using Hadoop as the source mirror and Tableau for the data mart and consume components or some other combination.

architecture

I always try to set firewalls within an architecture so that any problems can be isolated and replaced without too much disruption. In this instance I’m going to use those firewalls so that I can try out new ideas and technologies.

The main synchronisation points are the three data stores – Source Mirror, Data Warehouse and Data Mart. (In this instance I am using the term data mart to mean a prepared, subject area specific store optimised for analytical/aggregate queries.)

The responsibilities for each stage are as follows:

  • Acquire > Source Mirror: receive data from source, ensure minimal load on source via high watermarks or another strategy, archive data for historical accuracy.  A key point here is that the source mirror has the same metadata as the source itself. No joins or transforms on the way. Sometimes simple data conversions are useful but less is more.
  • Load > Data Warehouse: apply business logic and transform the source data into dimensional model, update data warehouse.
  • Summarise > Data Mart: aggregate generation or cube processing.
  • Consume & Act: covers any output tool such as Reporting Services, Excel, Tableau Dashboard, R, F# etc.

I consider the SQL/relational store to be “the data warehouse” and not Analysis Services which is better suited to a data mart role.

It’s quite hard to be succinct when talking about architecture and this post is quite lengthy so I’ll split it and talk about risk driven architecture in the next post.

Initial Mental Health Requirements

Torbay Hospital In-patient wards and ...

Torbay Hospital In-patient wards and treatment centres for mental health patients on the west side of the complex. (Photo credit: Wikipedia)

The previous couple of posts in this category haven’t exactly been exciting but they are important for context. Now things can concentrate more on the solution we are going to create.

There are two main classes of business intelligence required – clinical and operational. The most important user stories are listed in italic text below.

Clinical Requirements

Clinicians want to know how effective treatments are; patient outcomes; diagnosis statistics and critical incident analysis:

As a doctor I want to see treatment counts by patient and outcome so that I can determine the most effective treatments.

As a doctor I want to see critical incident counts by patient and mental health professional.

Operational Requirements

Operational business intelligence is primarily concerned with service costs, efficiency and capacity planning:

As an operational manager I want to see counts of assessments, treatments and discharges so I can plan capacity and monitor the number of patients in the system.

As an operational manager I want monitor prescribing costs to budget effectively and look for unusual prescribing patterns.

As an operational manager I want to see the number of bed days available and used so that I can monitor capacity and make sure suitable out of area options are available if needed.

As an operational manager I want to see complaints by patient and mental health professional so I can make sure the service has a good customer focus.

Other Requirements

Finally, there are also IT requirements which must be satisfied but since they are not value-add for patients and doctors I’ll look into them later.

This list is not exhaustive but the plan is to be agile – list the most important requirements, stack rank them and work down the list in iterations and re-plan often. Requirements will change, new ones will become apparent and some may even disappear before implementation starts. We will embrace this and not worry about the future too much.