Category Archives: Sample Solution

Thinking functionally about surrogate key mapping

I’m currently trying to learn F# because I’m keen to learn new programming styles as well as languages. It turns out that many of the concepts we C# programmers know and love such as Linq (monads), generics and async workflows originated in either F# or other functional languages. Thinking ‘functionally’ is a great skill to have too. How does this apply to surrogate key mapping? Well to borrow a notation from F# we are looking for a function like this:

string –> int

That is, a function that takes a string (the business key) and returns an integer (the surrogate key). Surrogate key lookup is a perfect fit for the functional view where “functions have no side effects”. Pass the same string to our lookup function any number of times and it should return the same integer value. The poorly performing version of this function might run off to the database every call and retrieve the value but there is a familiar functional technique called Memoization that can help. C# programmers might call this technique “store the values in a hashtable and only call the database if the value is missing”. A few other optimisations are necessary. Firstly, memoization will only cache the result of a single call so if we have a few hundred thousand dimension members in the database it will still take a lot of calls to populate the cache. Secondly, my lookup function doesn’t really care about the mechanics for the real database call so it would be nice if we could abstract that away. Finally, because I intend this class to be used a part of a multithreaded pipeline it needs to make sure that the internal data structures are protected. Piecing these requirements together we can start to flesh out the code. The main map function as we mentioned takes a string and returns an int:

public int Map(string businessKey)

Since we want to prime the cache with a set of values and abstract the real lookup functionality the best place to configure this is in the constructor:

public DimensionMapper(IDictionary<string, int>initialCache, Func<string, int> lookup)

Assuming the constructor just saves these parameters for later we can create a first cut version of the Map function:

public int Map(string businessKey)
    int surrogateKey;

    if (, out surrogateKey))
        return surrogateKey;

    surrogateKey = this.lookup(businessKey);, surrogateKey);

    return surrogateKey;

This works but it isn’t thread safe. For that we need a ReaderWriterLockSlim since only writes need to be synchronised. If you look at the code above there are two parts to it – the first few lines check the cache and return a value if it exists (the majority path); the last three lines are concerned with calling the real lookup function and populating the cache with the result when it doesn’t exist. Splitting on this boundary allows us to wrap the first part in a read lock and the second in a write lock – turning the write part into a separate function is a little cleaner:

public int Map(string businessKey)

        int surrogateKey;

        if (, out surrogateKey))
            return surrogateKey;

        return this.Lookup(businessKey);

private int Lookup(string businessKey)

        int surrogateKey = this.lookup(businessKey);, surrogateKey);

        return surrogateKey;

So we have most of the class written now and I haven’t discussed anything to do with databases or how we get a real surrogate key because…well its not relevant here since a function is passed to the constructor. I like this ability to concentrate on just a single algorithm and not worry about the wider solution. From what I’ve learned so far F# is better as this than C#.

For the full class definition see the full source file in context and associated unit tests.

Mental Health Referrals – First Feature Acceptance Test

I’ve been working on this for a few weeks now, half an hour at a time in the evenings and I can safely say it’s pretty hard to maintain a train of thought in thirty minute intervals. However a bare minimum implementation is complete and ready to discuss.

We start with an acceptance test:

The first part of the feature describes the user story and the second part tells us that when we load three patient referrals then the total count should be 3 with 1 on the 1st January.

I’m using SpecFlow for acceptance tests since it is very easy to define tables and there are some useful binding utilities as we will see. After entering the test we can immediately compile the application and run the tests without writing anything else. The test will obviously fail since we haven’t written any code. In fact the acceptance test will stay broken for some time as we write code and unit tests. When it passes we know the feature is done.

So thinking about this functionally we effectively want to write a function that transforms an enumerable of source referral records into an enumerable of referral facts; then pipe this iterator into a SqlBulkCopy instance. Effectively this code needs to work:

referralrepository.BulkCopy(referrals.Select(x => mapper.Map(x)));

This is a Linq transform with a mapping function applied to each item in the source list. In the next few posts I’m going to break it into bite size chunks to implement.

Mental Health Project Automated Builds

Last time we were looking at the mental health project I was discussing the dimensional model. I think its time to have a crack at some code now. But this first session is just about setting up my project.

There are some key things every agile project should do:

  • Automated build with acceptance and unit tests
  • Automated code analysis
  • Automated deployment with integration tests

Note everything is automated – it has to be repeatable and not need human intervention or it won’t get done. I’m a big fan of continuous integration and continuous deployment so I’m going to use Team City as a build service since its free for a single agent.

Team City is a very configurable and powerful tool but I want to make sure that I can build and deploy from my local command line in exactly the same way that the Team City agent will since it makes debugging issues easier and allows developers to check the build works before committing.

There are lots of build script tools around such as FinalBuilder but I prefer MSBuild since its readily available and a text format. Visual Studio uses MSBuild internally but we are not going to change project files; we are going to create a higher level script to tie everything together. Since this is a simple start it’s all going in one build file.

The build script is split into 2 main parts. At the top are property and item definitions – this is the build metadata controlling what and how the build will happen. Below that are Imports and Targets which deal with the mechanics of building. This split makes it easy to add new projects and settings without having to change your overall build script.

There are four main targets listed which are Clean, SourceAnalysis, Compile and Test. The last three of which make a build. It’s fairly self-explanatory but if you don’t know MSBuild script imagine anything in a $() is a single value or variable, @() is a list of items. Each target has a list of tasks which are executed in order to complete the target.

So, this script is very simple; it just runs StyleCop over a set of source files, builds a Visual Studio solution and runs Xunit against a set of assemblies. Not much but it gives us a single command line action to build and test the solution as we add features:

C:\ > msbuild draco.proj

This is then setup as a single step in TeamCity. Every check-in causes the build to run and tests to execute.

The complete set of source for this project is available at

Mental Health Dimensional Model

For this week’s post I want to continue the sample solution. Even though I’m going to be as agile as possible we still need to have a rough idea of a roadmap and the best way to do that is with a dimensional model.

Each business process we want to model is represented as a fact on columns. They are all to be stored at the transactional grain except possibly admissions. The conformed dimensions are listed on rows with the facts they are related to.

Referral Assessment Treatment Discharge Complaint Incident Admission


Health Professional



Service (Tier)


Treatment Outcome


It is interesting to note that this is a very patient focused model since that dimension is related to every fact. There are some unanswered questions within the model though:

  • How do we represent treatment outcomes? Is there a standard method? Can this be represented in a single dimension?
  • What grain are admissions? Given the goal of calculating ‘bed days‘ we might need to model them as events in progress.

I think we have enough to make a start and I don’t think we will deliver faster if we stop to resolve these issues first. Initially I’m going to concentrate on referrals, assessments and discharges since the number of patients in the system is one of the most useful metrics to monitor.

Risk Driven Architecture

Architecture shares something with testing in that resources are limited so effort is best directed toward maximising risk reduction.

The amount of ‘architecture’ in a solution should also reflect the risk associated with a project. For example the sample solution I’m creating carries almost no risk apart from my pride so a light touch is warranted.

However in a real solution what are the major risks? Where should we concentrate our efforts? Below are some of the common risks associated with business intelligence projects:

  • Unclean data e.g. Key pathologies – see later post.
  • Unreliable sources – how are failed connections, retries and duplicates handled?
  • Data volumes – what are the expected peak volumes? What will happen if these peaks are exceeded?
  • Latency requirements – can data be supplied to users fast enough? What is the business cost of delays?
  • Testability – how testable is the solution? How long can you keep going before technical debt catches up with you?
  • History and archive – in my experience most source systems don’t keep a full fidelity history so it ends up being the data warehouse’s responsibility.
  • Staying agile – unfortunately many problems with business intelligence solutions are due to an inability to change things; once users, reports, spread-sheets, and ETL code depend on data warehouse schemas or cube designs the whole thing becomes very difficult to change.
  • Disaster recovery – what happens when your server dies? Network fails? Data centre fails?
  • Scalability – what are your expected user loads? what happens if they are exceeded? are there any events that could cause your user load to be drastically exceeded?
  • Usability – how will your users interact with the system? how much training will they need? what if they need help? how can you make the solution easier to use?

“Agile architecture is the art of constraining a solution in order to optimise competing stakeholder concerns whilst maximising the number of options for future design decisions.” – James Snape (just now)

So to be agile lets just concentrate on the risks and try to not be too prescriptive over the final solution. Everything else can generally be quickly changed if it doesn’t work out.

Mental Health Functional Architecture

Context is everything with architecture. I’ve often had conversations which started with the phrase “how come you didn’t…” – once the context is explained the decision is usually obvious.

The context for this architecture is purely my own since there are no real customers. I  want to satisfy the concerns and requirements as simply as possible but leave room to swap out parts of the architecture to investigate new approaches and technologies.

The diagram below is a pretty standard set of data warehouse components. If you are a traditional Microsoft guy then, from left to right, the components would be Integration Services, SQL Server, Integration Services again, SQL Server, Analysis Services and Excel or Reporting Services respectively. Alternatively you might be using Hadoop as the source mirror and Tableau for the data mart and consume components or some other combination.


I always try to set firewalls within an architecture so that any problems can be isolated and replaced without too much disruption. In this instance I’m going to use those firewalls so that I can try out new ideas and technologies.

The main synchronisation points are the three data stores – Source Mirror, Data Warehouse and Data Mart. (In this instance I am using the term data mart to mean a prepared, subject area specific store optimised for analytical/aggregate queries.)

The responsibilities for each stage are as follows:

  • Acquire > Source Mirror: receive data from source, ensure minimal load on source via high watermarks or another strategy, archive data for historical accuracy.  A key point here is that the source mirror has the same metadata as the source itself. No joins or transforms on the way. Sometimes simple data conversions are useful but less is more.
  • Load > Data Warehouse: apply business logic and transform the source data into dimensional model, update data warehouse.
  • Summarise > Data Mart: aggregate generation or cube processing.
  • Consume & Act: covers any output tool such as Reporting Services, Excel, Tableau Dashboard, R, F# etc.

I consider the SQL/relational store to be “the data warehouse” and not Analysis Services which is better suited to a data mart role.

It’s quite hard to be succinct when talking about architecture and this post is quite lengthy so I’ll split it and talk about risk driven architecture in the next post.

Initial Mental Health Requirements

Torbay Hospital In-patient wards and ...

Torbay Hospital In-patient wards and treatment centres for mental health patients on the west side of the complex. (Photo credit: Wikipedia)

The previous couple of posts in this category haven’t exactly been exciting but they are important for context. Now things can concentrate more on the solution we are going to create.

There are two main classes of business intelligence required – clinical and operational. The most important user stories are listed in italic text below.

Clinical Requirements

Clinicians want to know how effective treatments are; patient outcomes; diagnosis statistics and critical incident analysis:

As a doctor I want to see treatment counts by patient and outcome so that I can determine the most effective treatments.

As a doctor I want to see critical incident counts by patient and mental health professional.

Operational Requirements

Operational business intelligence is primarily concerned with service costs, efficiency and capacity planning:

As an operational manager I want to see counts of assessments, treatments and discharges so I can plan capacity and monitor the number of patients in the system.

As an operational manager I want monitor prescribing costs to budget effectively and look for unusual prescribing patterns.

As an operational manager I want to see the number of bed days available and used so that I can monitor capacity and make sure suitable out of area options are available if needed.

As an operational manager I want to see complaints by patient and mental health professional so I can make sure the service has a good customer focus.

Other Requirements

Finally, there are also IT requirements which must be satisfied but since they are not value-add for patients and doctors I’ll look into them later.

This list is not exhaustive but the plan is to be agile – list the most important requirements, stack rank them and work down the list in iterations and re-plan often. Requirements will change, new ones will become apparent and some may even disappear before implementation starts. We will embrace this and not worry about the future too much.

Mental Health Stakeholders

mental-health-orgchartI want to briefly wrap up the section on stakeholders specific to mental health services because this is where we will get our requirements from.

At the top of the organisation is the chief executive and the trust board. Below her are the IT director, medical director, nursing director and three operational directors who are responsible for Child and Adolescent Mental Health Services (CAMHS), Adult Mental Health Services and Geriatric Mental Health Services respectively.

The IT director is responsible for IT staff such as support staff and system administrators; IT systems and hardware; and IT projects.

The medical director manages all the doctors in the organisation and the nursing director similarly the nursing staff.

The operational directors focus on their individual services with a mix of staff including psychologists, counsellors, therapists, social workers and administration staff.

Collecting all the information we have so far with an estimate on the individual’s needs:

Who Interest/Power Class Concerns
Chief Exec Low/High Acquirer Low costs
IT Director High/High Acquirer/Assessor Low costs
Ease of deployment
Medical/Nursing Directors Low/High User Minimal training, or time taken away from duties
Services Directors Low/High User/Communicator Functionality
IT administrators High/Low Administrator Automated maintenance
Simple troubleshooting
Secure implementation
Zero friction installs and upgrades
User support staff Low/Low Support staff Ease of use
Training material
Team members Depends/Low Users Functionality
Ease of use

The last three are generalisations – if this were reality I would be looking for specific people since the Interest/Power level is unique to a person and not the role.

Mental health stakeholder analysis – power and interest

So its great that we know a little about how mental health services work but as an architect we need to cover some more ground before the project can be considered up and running. This post is about the slightly dirty subject of stakeholders and politics.

Why do this? Well getting the politics right can mean the difference between a success and failure. All projects have stakeholders with specific concerns. Some can be ignored but in general it is the architect’s job satisfy those concerns. Some stakeholders  have more power than others; they will also have varying levels of interest. The diagram below shows the best approach for each of the categories.

  • StakeholderEngagement Low Interest – Low Power: the easiest category to deal with since they are not too concerned with the project and don’t have the ability to derail you. Just monitor them for feedback  and other  information.
  • High Interest – Low Power: this class of stakeholders want to know what is happening and still don’t have the ability to disrupt the project. The best tactic is to ensure they are well informed.
  • Low Interest – High Power: these stakeholders can be the most difficult to deal with since they have the power to influence not only your project but many of the other stakeholders. The main issue being that they are not that interested so it is difficult to have meaningful discussions and  negotiations. The primary tactic is to ensure their concerns are met with as little fuss as possible.
  • High Interest – High Power: not only do they have the power to effect change but are interested in the outcomes. Personally I find this group a lot of fun since they can contribute to some very interactive sessions. The best tactic is to engage them which shouldn’t be too difficult given their level of interest.

In the next post I’ll cover classes of stakeholder and why it some of the non-obvious ones might be the most important.

Mental Healthcare Workflow

As I mentioned in my last post I wanted to setup the scenario I’m going to use for my examples. My lovely wife is a child psychiatrist so I have learned a fair amount about the way psychiatric services work in the UK. Lets dive right in.

There are four main workflow steps to all patient care:

  1. Referral – prospective patients are referred to the service with behaviours that indicate a mental health problem.
  2. Assessment – a file is opened for the patient and they are assessed. If required, tests are performed, then where appropriate a diagnosis is given and the patient is referred for treatment or discharged.
  3. Treatment – depending on the diagnosis, one or more treatment approaches may be required. Each approach has a outcome: either further assessment and treatment or discharge.
  4. Discharge – the patient is released from mental health services.

There are two queues in the workflow:

  1. Assessment queue – those that have been referred but are waiting assessment. There is a government target of 4 weeks for this queue.
  2. Treatment queue – those that have a diagnosis requiring treatment. There is a desirable target of 6 weeks for this queue for the patient to begin their first treatment.

Some additional notes:

  • Whilst a patient may be seen by many professionals during treatment there is usually a single mental health professional that oversees the overall care.
  • There are multiple levels of discharge – each treatment approach and then final discharge from the service by the lead professional.
  • Diagnoses in the UK are made according to the International Classification of Diseases (WHO) which covers all diseases and includes mental health disorders. Other countries such as the USA diagnose using the Diagnostic and Statistical Manual of Mental Disorders.
  • Services are divided into tiers from Tier 1 (GP treatment) to Tier 4 (intensive, possibly in-patient, treatment).
  • There are other business processes that don’t contribute directly to the value stream but are essential to the overall efficiency of the service and quality of patient care such as treatment reviews.
  • Treatments are conducted over a number of sessions. Even if the treatment is medication there are still sessions required to monitor and evaluate the drug effects and side-effects.

I’m sure this isn’t a complete picture but its enough to make a start. First we need to look at some requirements; then review the stakeholders and begin to outline the architecture.