My commute is around two and a half hours each way, so I read a lot on the train. One of the subjects I’ve recently become interested in is Domain Driven Design, or DDD. I’ve found it isn’t really a new topic for me; it is more like someone has documented many of the techniques I’ve always used.
DDD discusses data warehouses primarily as an output or reporting function within a larger application. The book Implementing Domain-Driven Design by Vaughn Vernon mentions reporting repeatedly as a by-product of DDD (particularly when used with Event Sourcing) but not directly as a possible use case.
I would agree that not all concepts can be reused in data warehouse solutions, since the only interface available is often one that transfers a set of mutations (property value changes) without the accompanying reasons (it is a key idea in DDD that you design your model not around changes in attributes but around operations performed on entities). For example, an order count has decreased and the reason is missing – was the order returned, cancelled, entered in error? So where can DDD be applied? Are any of its concepts useful when designing data warehouses?
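The order count example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names (`AttributeChange`, `OrderCancelled`, `OrderReturned`) and the `to_mutation` helper are hypothetical and do not come from any DDD library. It shows two different business operations collapsing into the same attribute-level change by the time they reach the warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeChange:
    """What a typical warehouse feed delivers: a property changed, no reason given."""
    entity: str
    attribute: str
    old_value: int
    new_value: int


@dataclass(frozen=True)
class OrderCancelled:
    """Hypothetical domain event: names the operation that was performed."""
    order_id: str


@dataclass(frozen=True)
class OrderReturned:
    """Hypothetical domain event for a different operation with the same net effect."""
    order_id: str


def to_mutation(event, order_count: int) -> AttributeChange:
    """Project a domain event onto the attribute change the warehouse receives.

    The event type (the reason) is deliberately discarded here.
    """
    return AttributeChange("customer", "order_count", order_count, order_count - 1)


cancelled = to_mutation(OrderCancelled("A-1"), order_count=5)
returned = to_mutation(OrderReturned("A-2"), order_count=5)

# Both operations look identical once reduced to a mutation.
print(cancelled == returned)
```

If the source system can publish the events themselves rather than the resulting mutations, the reason survives and can be modelled in the warehouse.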
Let’s start with one of the core concepts. The Ubiquitous Language is a rigorous shared language used between developers and users. It is used to make sure that conversations are accurate and productive. It should evolve as the team’s understanding of a domain changes. The ubiquitous language is what forms the domain model at the heart of a software solution.
I find this description very similar to an equivalent concept in data warehousing – the Dimensional Model. This model, and its associated dimension bus matrix, is based on real business processes and terminology. The dimensional model is the public face of the data warehouse. It needs to be precise, reflect the terms used by business users and form a common vocabulary between users and the development team. For example when browsing an Analysis Services cube in Excel, the dimensions and facts defined in the dimensional model are directly visible to end users – if they don’t automatically understand what is on-screen then the model doesn’t describe the business.
Entities and Value Objects
There are two types of object in domain driven design – entities and value objects. Value objects are immutable and identified via their attributes. For example, $100 USD in one object is interchangeable with $100 USD in another. Entities, on the other hand, cannot be identified purely by their attributes – there must be some sort of unique identifier (in data warehouse terms this is a business key) to differentiate similar entities. For example, one John Smith may not be the same as another, so each needs a Customer-Id to differentiate the two.
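As a minimal sketch of the difference, assuming Python dataclasses and hypothetical `Money` and `Customer` classes of my own naming: the value object compares by its attributes and cannot be modified, while the entity compares only by its business key.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: immutable, equality is based on all attributes."""
    amount: Decimal
    currency: str


@dataclass(eq=False)
class Customer:
    """Entity: equality is based on the business key only."""
    customer_id: str
    name: str

    def __eq__(self, other):
        if not isinstance(other, Customer):
            return NotImplemented
        return self.customer_id == other.customer_id

    def __hash__(self):
        return hash(self.customer_id)


# Two separate $100 USD objects are interchangeable.
print(Money(Decimal("100"), "USD") == Money(Decimal("100"), "USD"))

# Two customers called John Smith are different people if their keys differ.
print(Customer("C-1", "John Smith") == Customer("C-2", "John Smith"))
```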
With respect to dimensional models, value objects should not be implemented as top-level dimensions but instead be added as sets of attributes to the entities which own them. For example, ‘Product Colour’ is a value type (colour) and should belong in the ‘Product’ dimension. This seems obvious when written this way, but standalone dimensions for value objects turn up a lot in practice.
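One way to picture this, as a rough sketch with hypothetical names (`Colour`, `Product`, `to_dimension_row`, and the column names are my own, not a prescribed schema): the colour value object is flattened into attributes on the Product dimension row rather than receiving its own dimension and key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colour:
    """Value object owned by a product."""
    name: str
    hex_code: str


@dataclass
class Product:
    """Entity identified by its business key."""
    product_id: str
    name: str
    colour: Colour


def to_dimension_row(product: Product) -> dict:
    """Flatten the product and its value objects into one 'Product' dimension row."""
    return {
        "product_key": product.product_id,
        "product_name": product.name,
        "product_colour": product.colour.name,
        "product_colour_hex": product.colour.hex_code,
    }


row = to_dimension_row(Product("P-1", "Kettle", Colour("Red", "#FF0000")))
print(row)
```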
Entities and Aggregate Roots
DDD groups sets of closely related entities under the control of a single ‘Aggregate root’. Some entities make no sense unless the parent entity is also within context; order lines and parent orders are the typical example.
So value objects shouldn’t be dimensions and I don’t think ordinary entities should be either. True dimensions are the aggregate roots and the one thing that seals it for me is that an aggregate root (according to DDD) defines a transactional boundary – you should not update multiple aggregate roots within a single transaction; instead sagas keep your data warehouse in sync (eventually).
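The order and order line example can be sketched as follows. This is an illustration under my own assumptions rather than a reference implementation: `Order`, `OrderLine` and `add_line` are hypothetical names. The point is that lines are only created and reached through their parent order, which is what makes the order the natural unit to load (and the natural dimension candidate).

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class OrderLine:
    """Entity that only has meaning within its parent order."""
    line_no: int  # unique only inside one order
    product_id: str
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    """Aggregate root: the single entry point for changes to its lines."""
    order_id: str
    _lines: list = field(default_factory=list)

    def add_line(self, product_id: str, quantity: int, unit_price: Decimal) -> OrderLine:
        # The root enforces the rules for the whole aggregate.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        line = OrderLine(len(self._lines) + 1, product_id, quantity, unit_price)
        self._lines.append(line)
        return line

    @property
    def lines(self) -> tuple:
        # Expose a read-only view so callers cannot bypass the root.
        return tuple(self._lines)

    @property
    def total(self) -> Decimal:
        return sum((l.quantity * l.unit_price for l in self._lines), Decimal("0"))


order = Order("O-1")
order.add_line("P-1", 2, Decimal("9.50"))
order.add_line("P-2", 1, Decimal("1.00"))
print(order.total)
```

A change to an order and its lines is saved as one unit; anything that also touches another aggregate (a customer, say) would be handled separately and brought into line later.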
Domain driven design and business intelligence share a number of common concepts yet the two philosophies are rarely seen as related. I think there is a lot to be gained by applying software concepts from different viewpoints which may not ordinarily be considered.
I see one big difference, though. DDD specifically stays away from a single “Enterprise-wide” model, where Dimensional Modeling strives for it, especially with the concept of Conformed Dimensions. Any thoughts on how to bridge these?
One topic I’ve found interesting is how to design integrations from a DDD application to a Data Warehouse. I have tried this before in Near Real-Time Data Warehousing and it gets interesting, and full of pitfalls. Anything to say about this?
I think the concept of a single enterprise warehouse bus may work for smaller companies, but I haven’t done an architecture with a single set of conformed dimensions for a while now. I mostly follow a staging/warehouse/presentation layer mechanism with multiple stars existing in the presentation layer. Funnily enough, the stars tend to map to “bounded contexts”, so yet more similarity I guess. There are common enterprise-wide dimensions/entities that exist in multiple presentation cubes, but they rarely have the exact same set of attributes.
For your second point – yes, full of pitfalls. I would look at Lambda Architecture or Data Vault depending on whether your real-time is required at the presentation layer or not.
Good to see some analogies with DDD concepts such as bounded context being made with DW patterns. I’ve found a lot of friction can exist between the producers of data in an enterprise where DDD has been adopted and the consumers of that data. Typically the producers, e.g. a team responsible for a bounded context, will publish data for public consumption onto a bus or message queue that is a reflection of their domain model, and won’t care much for any consumers who need to join data from other producers’ bounded contexts. Consumers, such as a data warehouse team, can have their work cut out aggregating data that is produced in this way when most of the data they are interested in comes from multiple bounded contexts. In some cases, I’ve heard DW team members talk about the need for an “enterprise wide data model” that upstream producers adhere to, so that data can flow into their data warehouse already pre-aggregated and fit hand in glove with their data model. My response to them has centered around coupling. Coupling every team in a non-trivial enterprise to a single data model hampers productivity and has never been shown to work. I emphasise to them that models exist to solve a problem and that a model should be tailored to solving one context’s problem only. So, referring to a data warehouse as part of the reporting and analysis context within an enterprise is a good way to get the data warehouse team to start thinking about models within their own bounded context. It also means they start to get aligned with the way that their sibling teams (the producers) are thinking. Getting all of the developers in your business aligned with this common theme of bounded contexts and context-specific models is probably the one thing that warrants being “enterprise wide”, a lot more so than an enterprise wide data model.