Better software through software architecture and devops

@jamessnape

Posts

  • Managing complexity graphic showing DDD patterns to apply from Domain Driven Design source https://commons.wikimedia.org/wiki/File:Maintaining_Model_Integrity.png

    My commute is around two and a half hours each way so I read a lot on the train. One of the subjects I’ve recently become interested in is Domain Driven Design, or DDD. I’ve found it isn’t really a new topic for me but more like someone has documented many of the techniques I’ve always used.

    DDD discusses data warehouses primarily as an output or reporting function within a larger application. The book Implementing Domain-Driven Design by Vaughn Vernon mentions reporting repeatedly as a by-product of DDD (particularly when used with Event Sourcing) but not directly as a possible use case.

    I would agree that not all concepts can be reused in data warehouse solutions, since the only interface available is often one that transfers a set of mutations (property value changes) without the accompanying reasons. It is a key idea in DDD that you design your model not around changes in attributes but around operations performed on entities. For example, an order count has decreased and the reason is missing – was the order returned, cancelled, or entered in error? So where can DDD be applied? Are any of the concepts useful when designing data warehouses?
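    The difference between mutations and operations can be sketched in a few lines of Python. This is only an illustration: `diff_snapshots`, `OrderCancelled`, `OrderReturned` and `order_count` are hypothetical names, not part of any DDD library. A snapshot diff tells you that a value changed; a list of domain events tells you why.

```python
from dataclasses import dataclass


def diff_snapshots(before: dict, after: dict) -> dict:
    """Return only the properties whose values changed (a set of mutations)."""
    return {
        key: (before.get(key), after.get(key))
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }


@dataclass(frozen=True)
class OrderCancelled:
    """Hypothetical domain event: the operation itself carries the reason."""
    order_id: str


@dataclass(frozen=True)
class OrderReturned:
    """Hypothetical domain event for a returned order."""
    order_id: str


def order_count(initial: int, events: list) -> int:
    """Replay the operations; each one removes an order from the count."""
    return initial - len(events)


# What a typical warehouse feed sees: the value changed, reason unknown.
mutations = diff_snapshots({"order_count": 10}, {"order_count": 8})

# What an operation-based model records: the same change, with reasons.
events = [OrderCancelled("A-1"), OrderReturned("A-2")]
```

    Both views agree on the final count, but only the event list can answer whether the drop came from cancellations or returns.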

    Ubiquitous Language

    Let’s start with one of the core concepts – Ubiquitous Language is a rigorous shared language used between developers and users. It is used to make sure that conversations are accurate and productive. It should evolve as the team’s understanding of a domain changes. The ubiquitous language is what forms the domain model at the heart of a software solution.

    I find this description very similar to an equivalent concept in data warehousing – the Dimensional Model. This model, and its associated dimension bus matrix, is based on real business processes and terminology. The dimensional model is the public face of the data warehouse. It needs to be precise, reflect the terms used by business users and form a common vocabulary between users and the development team. For example when browsing an Analysis Services cube in Excel, the dimensions and facts defined in the dimensional model are directly visible to end users – if they don’t automatically understand what is on-screen then the model doesn’t describe the business.

    Entities and Value Objects

    There are two types of object in domain driven design – entities and value objects. Value objects are immutable and identified via their attributes. For example, $100 USD in one object is interchangeable with $100 USD in another. Entities, on the other hand, cannot be identified purely by their attributes – there must be some sort of unique identifier (in data warehouse terms this is a business key) to differentiate similar entities. For example, one John Smith may not be the same as another, so each needs a Customer-Id to differentiate the two.
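    As a minimal sketch of the distinction, assuming Python dataclasses as the modelling tool (the `Money` and `Customer` classes and their fields are illustrative names only): the value object compares by its attributes, while the entity compares by its business key.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: immutable and compared by its attributes."""
    amount: Decimal
    currency: str


@dataclass(eq=False)
class Customer:
    """Entity: identity comes from the business key, not the attributes."""
    customer_id: str
    name: str

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Customer) and self.customer_id == other.customer_id

    def __hash__(self) -> int:
        return hash(self.customer_id)


a = Money(Decimal("100"), "USD")
b = Money(Decimal("100"), "USD")
john_1 = Customer("C-001", "John Smith")
john_2 = Customer("C-002", "John Smith")

print(a == b)            # True: same attributes, interchangeable
print(john_1 == john_2)  # False: same name, different business key
```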

    With respect to dimensional models, value objects should not be implemented as top-level dimensions but instead be added as sets of attributes to the entities which own them. For example ‘Product Colour’ is a value type (colour) and should belong in the ‘Product’ dimension. This seems obvious when written this way, but value objects modelled as separate dimensions turn up a lot.

    Entities and Aggregate Roots

    DDD groups sets of closely related entities under the control of a single ‘Aggregate root’. Some entities make no sense unless the parent entity is also within context; order lines and parent orders are the typical example.

    So value objects shouldn’t be dimensions and I don’t think ordinary entities should be either. True dimensions are the aggregate roots and the one thing that seals it for me is that an aggregate root (according to DDD) defines a transactional boundary – you should not update multiple aggregate roots within a single transaction; instead sagas keep your data warehouse in sync (eventually).
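    A small sketch of an aggregate root may help here. It assumes the order and order line example from above; the class and method names are hypothetical. All changes to order lines go through the parent `Order`, which is what makes it the consistency boundary. Under the argument in this post, `Order` would be the candidate dimension and `OrderLine` would not.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderLine:
    """Child entity: makes no sense outside its parent order."""
    line_no: int
    product_id: str
    quantity: int


@dataclass
class Order:
    """Aggregate root: the only entry point for changing its order lines."""
    order_id: str
    _lines: list = field(default_factory=list)

    def add_line(self, product_id: str, quantity: int) -> OrderLine:
        # The root enforces the rules for everything inside the aggregate.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        line = OrderLine(len(self._lines) + 1, product_id, quantity)
        self._lines.append(line)
        return line

    @property
    def lines(self) -> tuple:
        return tuple(self._lines)  # read-only view for callers

    @property
    def total_quantity(self) -> int:
        return sum(line.quantity for line in self._lines)


order = Order("O-1")
order.add_line("P-1", 2)
order.add_line("P-2", 3)
```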

    Finally

    Domain driven design and business intelligence share a number of common concepts yet the two philosophies are rarely seen as related. I think there is a lot to be gained by applying software concepts from different viewpoints which may not ordinarily be considered.

    This entry was posted in business-intelligence  and tagged #data-warehouse #domain-driven-design  on .
  • Architecture shares something with testing in that resources are limited so effort is best directed toward maximising risk reduction.

    The amount of ‘architecture’ in a solution should also reflect the risk associated with a project. For example the sample solution I’m creating carries almost no risk apart from my pride so a light touch is warranted.

    However in a real solution what are the major risks? Where should we concentrate our efforts? Below are some of the common risks associated with business intelligence projects:

    • Unclean data e.g. Key pathologies – see later post.
    • Unreliable sources – how are failed connections, retries and duplicates handled?
    • Data volumes – what are the expected peak volumes? What will happen if these peaks are exceeded?
    • Latency requirements – can data be supplied to users fast enough? What is the business cost of delays?
    • Testability – how testable is the solution? How long can you keep going before technical debt catches up with you?
    • History and archive – in my experience most source systems don’t keep a full fidelity history so it ends up being the data warehouse’s responsibility.
    • Staying agile – unfortunately many problems with business intelligence solutions are due to an inability to change things; once users, reports, spreadsheets, and ETL code depend on data warehouse schemas or cube designs the whole thing becomes very difficult to change.
    • Disaster recovery – what happens when your server dies? Network fails? Data centre fails?
    • Scalability – what are your expected user loads? What happens if they are exceeded? Are there any events that could cause your user load to be drastically exceeded?
    • Usability – how will your users interact with the system? How much training will they need? What if they need help? How can you make the solution easier to use?

    “Agile architecture is the art of constraining a solution in order to optimise competing stakeholder concerns whilst maximising the number of options for future design decisions.” – James Snape (just now)

    So to be agile let’s just concentrate on the risks and try not to be too prescriptive over the final solution. Everything else can generally be quickly changed if it doesn’t work out.

    This entry was posted in sample-solution  and tagged #architecture #business-intelligence #data-warehouse #risk #stakeholder-concerns  on .
  • Context is everything with architecture. I’ve often had conversations which started with the phrase “how come you didn’t…” – once the context is explained the decision is usually obvious.

    The context for this architecture is purely my own since there are no real customers. I want to satisfy the concerns and requirements as simply as possible but leave room to swap out parts of the architecture to investigate new approaches and technologies.

    The diagram below is a pretty standard set of data warehouse components. If you are a traditional Microsoft guy then, from left to right, the components would be Integration Services, SQL Server, Integration Services again, SQL Server, Analysis Services and Excel or Reporting Services respectively. Alternatively you might be using Hadoop as the source mirror and Tableau for the data mart and consume components or some other combination.

    [Diagram: architecture]

    I always try to set firewalls within an architecture so that any problems can be isolated and replaced without too much disruption. In this instance I’m going to use those firewalls so that I can try out new ideas and technologies.

    The main synchronisation points are the three data stores – Source Mirror, Data Warehouse and Data Mart. (In this instance I am using the term data mart to mean a prepared, subject area specific store optimised for analytical/aggregate queries.)

    The responsibilities for each stage are as follows:

    • Acquire > Source Mirror: receive data from source, ensure minimal load on source via high watermarks or another strategy, archive data for historical accuracy. A key point here is that the source mirror has the same metadata as the source itself. No joins or transforms on the way. Sometimes simple data conversions are useful but less is more.
    • Load > Data Warehouse: apply business logic and transform the source data into the dimensional model, update data warehouse.
    • Summarise > Data Mart: aggregate generation or cube processing.
    • Consume & Act: covers any output tool such as Reporting Services, Excel, Tableau Dashboard, R, F# etc.
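    The high watermark strategy mentioned for the Acquire stage can be sketched as follows. This is a simplified illustration using in-memory lists; the `acquire` function, the `modified_at` column and the sample dates are assumptions, and a real implementation would push the filter down to the source (for example a `WHERE modified_at > ?` query) rather than scanning every row.

```python
from datetime import datetime


def acquire(source_rows, mirror, watermark):
    """Copy rows modified after `watermark` into the mirror, untransformed.

    Returns the new high watermark to persist for the next run.
    """
    new_rows = [row for row in source_rows if row["modified_at"] > watermark]
    mirror.extend(dict(row) for row in new_rows)  # same columns as the source
    return max((row["modified_at"] for row in new_rows), default=watermark)


source = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 1, 2)},
]
mirror = []

# First run: everything is new, so both rows are copied.
watermark = acquire(source, mirror, datetime.min)

# Second run: only the row added since the last watermark is copied.
source.append({"id": 3, "modified_at": datetime(2024, 1, 3)})
watermark = acquire(source, mirror, watermark)
```

    Because only rows newer than the stored watermark are requested, repeated runs put minimal load on the source, and the mirror keeps the source's shape with no joins or transforms.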

    I consider the SQL/relational store to be “the data warehouse” and not Analysis Services which is better suited to a data mart role.

    It’s quite hard to be succinct when talking about architecture and this post is quite lengthy so I’ll split it and talk about risk driven architecture in the next post.

  • Torbay Hospital In-patient wards and treatment centres for mental health patients on the west side of the complex. source https://commons.wikimedia.org/wiki/File:Torbay_Hospital_-_geograph.org.uk_-_1416979.jpg

    The previous couple of posts in this category haven’t exactly been exciting but they are important for context. Now we can concentrate more on the solution we are going to create.

    There are two main classes of business intelligence required – clinical and operational. The most important user stories are listed below.

    Clinical Requirements

    Clinicians want to know how effective treatments are; patient outcomes; diagnosis statistics and critical incident analysis:

    As a doctor I want to see treatment counts by patient and outcome so that I can determine the most effective treatments.

    As a doctor I want to see critical incident counts by patient and mental health professional.

    Operational Requirements

    Operational business intelligence is primarily concerned with service costs, efficiency and capacity planning:

    As an operational manager I want to see counts of assessments, treatments and discharges so I can plan capacity and monitor the number of patients in the system.

    As an operational manager I want to monitor prescribing costs so that I can budget effectively and look for unusual prescribing patterns.

    As an operational manager I want to see the number of bed days available and used so that I can monitor capacity and make sure suitable out of area options are available if needed.

    As an operational manager I want to see complaints by patient and mental health professional so I can make sure the service has a good customer focus.

    Other Requirements

    Finally, there are also IT requirements which must be satisfied but since they are not value-add for patients and doctors I’ll look into them later.

    This list is not exhaustive but the plan is to be agile – list the most important requirements, stack rank them and work down the list in iterations and re-plan often. Requirements will change, new ones will become apparent and some may even disappear before implementation starts. We will embrace this and not worry about the future too much.

    This entry was posted in sample-solution  and tagged #mental-health #mental-health-project #requirements #user-story  on .
  • mental-health-orgchart

    I want to briefly wrap up the section on stakeholders specific to mental health services because this is where we will get our requirements from.

    At the top of the organisation are the chief executive and the trust board. Below her are the IT director, medical director, nursing director and three operational directors who are responsible for Child and Adolescent Mental Health Services (CAMHS), Adult Mental Health Services and Geriatric Mental Health Services respectively.

    The IT director is responsible for IT staff such as support staff and system administrators; IT systems and hardware; and IT projects.

    The medical director manages all the doctors in the organisation and the nursing director similarly manages the nursing staff.

    The operational directors focus on their individual services with a mix of staff including psychologists, counsellors, therapists, social workers and administration staff.

    Collecting all the information we have so far, along with an estimate of each individual’s needs:

    Who | Interest/Power | Class | Concerns
    Chief Exec | Low/High | Acquirer | Low costs
    IT Director | High/High | Acquirer/Assessor | Low costs; ease of deployment
    Medical/Nursing Directors | Low/High | User | Minimal training, or time taken away from duties
    Services Directors | Low/High | User/Communicator | Functionality
    IT administrators | High/Low | Administrator | Automated maintenance; simple troubleshooting; secure implementation; zero friction installs and upgrades
    User support staff | Low/Low | Support staff | Ease of use; training material
    Team members | Depends/Low | Users | Functionality; ease of use

    The last three are generalisations – if this were reality I would be looking for specific people since the Interest/Power level is unique to a person and not the role.

    This entry was posted in sample-solution  and tagged #mental-health #stakeholder #stakeholders  on .