Better software through software architecture and devops

@jamessnape

Posts

  • I’m currently trying to learn F# because I’m keen to learn new programming styles as well as languages. It turns out that many of the concepts we C# programmers know and love such as Linq (monads), generics and async workflows originated in either F# or other functional languages. Thinking ‘functionally’ is a great skill to have too. How does this apply to surrogate key mapping? Well to borrow a notation from F# we are looking for a function like this:

    string -> int

    That is, a function that takes a string (the business key) and returns an integer (the surrogate key). Surrogate key lookup is a perfect fit for the functional view where “functions have no side effects”. Pass the same string to our lookup function any number of times and it should return the same integer value.

    The poorly performing version of this function might run off to the database on every call to retrieve the value, but there is a familiar functional technique called memoization that can help. C# programmers might call this technique “store the values in a hashtable and only call the database if the value is missing”.

    A few other optimisations are necessary. Firstly, memoization only caches the result of a single call, so if we have a few hundred thousand dimension members in the database it will still take a lot of calls to populate the cache. Secondly, my lookup function doesn’t really care about the mechanics of the real database call, so it would be nice if we could abstract that away. Finally, because I intend this class to be used as part of a multithreaded pipeline, it needs to make sure that the internal data structures are protected.

    Piecing these requirements together we can start to flesh out the code. The main map function, as we mentioned, takes a string and returns an int:

    public int Map(string businessKey) { }

    Since we want to prime the cache with a set of values and abstract the real lookup functionality the best place to configure this is in the constructor:

    public DimensionMapper(IDictionary<string, int> initialCache, Func<string, int> lookup) { }

    Assuming the constructor just saves these parameters for later we can create a first cut version of the Map function:

    public int Map(string businessKey)
    {
        int surrogateKey;
    
        if (this.map.TryGetValue(businessKey, out surrogateKey))
        {
            return surrogateKey;
        }
    
        surrogateKey = this.lookup(businessKey);
        this.map.Add(businessKey, surrogateKey);
    
        return surrogateKey;
    }
    

    This works but it isn’t thread safe. For that we need a ReaderWriterLockSlim since only writes need to be synchronised. The code above has two parts: the first few lines check the cache and return a value if it exists (the majority path); the last three lines call the real lookup function and populate the cache with the result when the value doesn’t exist. Splitting on this boundary allows us to wrap the first part in an upgradeable read lock (one that can be promoted to a write lock without being released first) and the second in a write lock. Turning the write part into a separate function is a little cleaner:

    public int Map(string businessKey)
    {
        this.updateLock.EnterUpgradeableReadLock();
    
        try
        {
            int surrogateKey;
    
            if (this.map.TryGetValue(businessKey, out surrogateKey))
            {
                return surrogateKey;
            }
    
            return this.Lookup(businessKey);
        }
        finally
        {
            this.updateLock.ExitUpgradeableReadLock();
        }
    }
    
    private int Lookup(string businessKey)
    {
        this.updateLock.EnterWriteLock();
    
        try
        {
            int surrogateKey = this.lookup(businessKey);
            this.map.Add(businessKey, surrogateKey);
            return surrogateKey;
        }
        finally
        {
            this.updateLock.ExitWriteLock();
        }
    }
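
    The pieces above can be assembled into one class. The following is a minimal sketch, not the full source file: the field names map, lookup and updateLock come from the snippets above, while the constructor body, the IDisposable support and the locking variant are assumptions. The variant takes a plain read lock for the cache check and re-checks under the write lock, because ReaderWriterLockSlim allows only one thread at a time in upgradeable mode, so concurrent cache hits would otherwise queue behind each other.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Threading;

    public sealed class DimensionMapper : IDisposable
    {
        private readonly Dictionary<string, int> map;
        private readonly Func<string, int> lookup;
        private readonly ReaderWriterLockSlim updateLock = new ReaderWriterLockSlim();

        public DimensionMapper(IDictionary<string, int> initialCache, Func<string, int> lookup)
        {
            if (initialCache == null) throw new ArgumentNullException("initialCache");
            if (lookup == null) throw new ArgumentNullException("lookup");

            // Copy the primed values so later changes by the caller cannot affect the cache.
            this.map = new Dictionary<string, int>(initialCache);
            this.lookup = lookup;
        }

        public int Map(string businessKey)
        {
            // Majority path: many threads may hold the read lock at the same time.
            this.updateLock.EnterReadLock();
            try
            {
                int surrogateKey;
                if (this.map.TryGetValue(businessKey, out surrogateKey))
                {
                    return surrogateKey;
                }
            }
            finally
            {
                this.updateLock.ExitReadLock();
            }

            return this.Lookup(businessKey);
        }

        private int Lookup(string businessKey)
        {
            this.updateLock.EnterWriteLock();
            try
            {
                int surrogateKey;

                // Re-check: another thread may have added the key after we released the read lock.
                if (!this.map.TryGetValue(businessKey, out surrogateKey))
                {
                    surrogateKey = this.lookup(businessKey);
                    this.map.Add(businessKey, surrogateKey);
                }

                return surrogateKey;
            }
            finally
            {
                this.updateLock.ExitWriteLock();
            }
        }

        public void Dispose()
        {
            this.updateLock.Dispose();
        }
    }
    ```

    The re-check is what keeps the second Add from throwing when two threads miss on the same key. The real lookup still runs while the write lock is held, just as in the version above, so a slow database call blocks other callers for its duration.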
    

    So we have most of the class written now and I haven’t discussed anything to do with databases or how we get a real surrogate key because…well, it’s not relevant here since a function is passed to the constructor. I like this ability to concentrate on just a single algorithm and not worry about the wider solution. From what I’ve learned so far F# is better at this than C#.

    For the full class definition see the full source file in context and associated unit tests.

    This entry was posted in sample-solution and tagged #f #functional #surrogate-key.
  • I’ve been working on this for a few weeks now, half an hour at a time in the evenings, and I can safely say it’s pretty hard to maintain a train of thought in thirty-minute intervals. However, a bare minimum implementation is complete and ready to discuss.

    We start with an acceptance test:

    https://gist.github.com/jsnape/5887988

    The first part of the feature describes the user story and the second part tells us that when we load three patient referrals then the total count should be 3 with 1 on the 1st January.

    I’m using SpecFlow for acceptance tests since it is very easy to define tables and there are some useful binding utilities as we will see. After entering the test we can immediately compile the application and run the tests without writing anything else. The test will obviously fail since we haven’t written any code. In fact the acceptance test will stay broken for some time as we write code and unit tests. When it passes we know the feature is done.

    Thinking about this functionally, we want a function that transforms an enumerable of source referral records into an enumerable of referral facts, and then to pipe that iterator into a SqlBulkCopy instance. In other words, this code needs to work:

    referralrepository.BulkCopy(referrals.Select(x => mapper.Map(x)));

    This is a Linq transform with a mapping function applied to each item in the source list. In the next few posts I’m going to break it into bite-sized chunks to implement.
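
    The transform itself can be sketched in isolation. This is a minimal illustration under assumptions: SourceReferral, ReferralFact, ReferralMapper and ReferralPipeline are hypothetical names and shapes, not the project’s real types, and the patient lookup is passed in as a Func<string, int> in the style of a surrogate key mapper.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical source record: one row per referral as it arrives from the source system.
    public sealed class SourceReferral
    {
        public string PatientKey { get; set; }
        public DateTime ReferralDate { get; set; }
    }

    // Hypothetical fact row: business values replaced by surrogate keys.
    public sealed class ReferralFact
    {
        public int PatientId { get; set; }
        public int DateId { get; set; }
    }

    public sealed class ReferralMapper
    {
        private readonly Func<string, int> patientLookup;

        public ReferralMapper(Func<string, int> patientLookup)
        {
            if (patientLookup == null) throw new ArgumentNullException("patientLookup");
            this.patientLookup = patientLookup;
        }

        public ReferralFact Map(SourceReferral source)
        {
            DateTime date = source.ReferralDate;

            return new ReferralFact
            {
                PatientId = this.patientLookup(source.PatientKey),
                // Assumed date key convention: yyyyMMdd as an integer.
                DateId = (date.Year * 10000) + (date.Month * 100) + date.Day
            };
        }
    }

    public static class ReferralPipeline
    {
        // Select is lazy: no record is mapped until the consumer enumerates the result.
        public static IEnumerable<ReferralFact> Transform(
            IEnumerable<SourceReferral> referrals, ReferralMapper mapper)
        {
            return referrals.Select(x => mapper.Map(x));
        }
    }
    ```

    Because the sequence is lazy, records stream through the mapper one at a time rather than being materialised as a list first. SqlBulkCopy.WriteToServer accepts a DataTable, a DataRow array or an IDataReader rather than an IEnumerable, so a BulkCopy method like the one above would need to adapt the sequence; that adapter is not shown here.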

    This entry was posted in sample-solution and tagged #linq #mental-health-project #specflow.
  • Tableau Visualization

    This week has been dominated by the Tableau Customer Conference. I was fortunate to get a ticket since it was sold out but one of our architects couldn’t go so I filled in. I’m glad I did.

    It’s been a while since I got to learn about a completely new technology so it is a refreshing change to be a bit of a novice. After a number of Microsoft conferences this one felt quite different too – less geeky with a more mixed crowd. It was interesting to be able to talk with non-technical types such as data analysts, business managers and statisticians.

    I mainly went to the technical sessions but a couple of the keynote sessions were really interesting. Firstly, ‘Creating a culture of data at Facebook’ gave some useful ideas about creating communities and getting more staff comfortable with visualizations. It was also nice to listen to a blogger I’ve read for a while (but only just discovered worked for Facebook). The second was Prof. Hans Rosling. I’ve seen his TED talk but in person he was completely different, probably because he was talking to a room full of data visualisation professionals. He had plenty of anecdotes about how his famous visualizations came about. Ellie Fields gives a good description of his talk.

    So back to the day job now but with some new ideas about business intelligence and data visualization.

    This entry was posted in data-visualization and tagged #conference #data-visualization #hans-rosling #tableau #tableau-customer-conference.
  • Last time we were looking at the mental health project I was discussing the dimensional model. I think it’s time to have a crack at some code now. But this first session is just about setting up my project.

    There are some key things every agile project should do:

    • Automated build with acceptance and unit tests
    • Automated code analysis
    • Automated deployment with integration tests

    Note everything is automated - it has to be repeatable and not need human intervention or it won’t get done. I’m a big fan of continuous integration and continuous deployment so I’m going to use TeamCity as a build service since it’s free for a single agent.

    TeamCity is a very configurable and powerful tool but I want to make sure that I can build and deploy from my local command line in exactly the same way that the TeamCity agent will, since it makes debugging issues easier and allows developers to check the build works before committing.

    There are lots of build script tools around such as FinalBuilder but I prefer MSBuild since it’s readily available and uses a text format. Visual Studio uses MSBuild internally but we are not going to change project files; we are going to create a higher level script to tie everything together. Since this is a simple start it’s all going in one build file.

    https://gist.github.com/jsnape/5730292

    The build script is split into 2 main parts. At the top are property and item definitions – this is the build metadata controlling what and how the build will happen. Below that are Imports and Targets which deal with the mechanics of building. This split makes it easy to add new projects and settings without having to change your overall build script.

    Four main targets are listed: Clean, SourceAnalysis, Compile and Test. The last three make up a build. It’s fairly self-explanatory, but if you don’t know MSBuild script, read anything in $() as a single value or variable and anything in @() as a list of items. Each target has a list of tasks which are executed in order to complete the target.

    So, this script is very simple; it just runs StyleCop over a set of source files, builds a Visual Studio solution and runs xUnit against a set of assemblies. Not much but it gives us a single command line action to build and test the solution as we add features:

    PS> msbuild draco.proj
    

    This is then set up as a single step in TeamCity. Every check-in causes the build to run and tests to execute.

    The complete set of source for this project is available at https://github.com/jsnape/draco.

    This entry was posted in sample-solution and tagged #build #build-automation #build-management #ci #continuous-integration #deployment.
  • tattoo work by Keith Killingsworth source http://commons.wikipedia.org/wiki/File:Tattoos.jpg

    So in the comments on a recent post on Risk Driven Architecture, Jamie Thomson asked whether the problems associated with change can be mitigated by using views. I firmly believe that views can help but unfortunately not enough to save you from clients that connect directly with Analysis Services cubes.

    So it got me thinking about a similar mitigation for cubes. Unfortunately nothing came to mind apart from an analogy:

    Dimensional models are like tattoos – you have to live with them for a long time

    Why, you might ask? Well, you can add to them, maybe fill in some extra colour, but basically once you’ve committed you are stuck with them because every spreadsheet and report using your model will need fixing if you try to remove something. Like tattoos, you can remove them but it’s going to be painful and cost a lot of money.

    I don’t have any tattoos (not because I don’t like them, I just can’t decide on one that I’d have to live with for so long). However I’ve heard plenty of guidance about taking your time before committing – one of the best techniques is to simply draw your new tattoo with a Sharpie and try it on for size for a while.

    How does this help with dimensional models? Well, the same techniques apply. Try a new model on for size, especially if you can arrange for the new model to fade like the Sharpie as time passes, which automatically limits client usage. Maybe process the cube manually for a while; your users will soon tell you if the data is useful. This fits with an agile approach too: only put measures and attributes in the cube if you need them and don’t add stuff in the hope that it will be used productively.

    This entry was posted in business-intelligence and tagged #dimensional-model.