Thinking functionally about surrogate key mapping

I’m currently trying to learn F# because I’m keen to learn new programming styles as well as languages. It turns out that many of the concepts we C# programmers know and love such as Linq (monads), generics and async workflows originated in either F# or other functional languages. Thinking ‘functionally’ is a great skill to have too. How does this apply to surrogate key mapping? Well to borrow a notation from F# we are looking for a function like this:

string –> int

That is, a function that takes a string (the business key) and returns an integer (the surrogate key). Surrogate key lookup is a perfect fit for the functional view where “functions have no side effects”. Pass the same string to our lookup function any number of times and it should return the same integer value. The poorly performing version of this function might run off to the database every call and retrieve the value but there is a familiar functional technique called Memoization that can help. C# programmers might call this technique “store the values in a hashtable and only call the database if the value is missing”. A few other optimisations are necessary. Firstly, memoization will only cache the result of a single call so if we have a few hundred thousand dimension members in the database it will still take a lot of calls to populate the cache. Secondly, my lookup function doesn’t really care about the mechanics for the real database call so it would be nice if we could abstract that away. Finally, because I intend this class to be used a part of a multithreaded pipeline it needs to make sure that the internal data structures are protected. Piecing these requirements together we can start to flesh out the code. The main map function as we mentioned takes a string and returns an int:

public int Map(string businessKey)
{
}

Since we want to prime the cache with a set of values and abstract the real lookup functionality the best place to configure this is in the constructor:

public DimensionMapper(IDictionary<string, int>initialCache, Func<string, int> lookup)
{
}

Assuming the constructor just saves these parameters for later we can create a first cut version of the Map function:

public int Map(string businessKey)
{
    int surrogateKey;

    if (this.map.TryGetValue(businessKey, out surrogateKey))
    {
        return surrogateKey;
    }

    surrogateKey = this.lookup(businessKey);

    this.map.Add(businessKey, surrogateKey);

    return surrogateKey;
}

This works but it isn’t thread safe. For that we need a ReaderWriterLockSlim since only writes need to be synchronised. If you look at the code above there are two parts to it – the first few lines check the cache and return a value if it exists (the majority path); the last three lines are concerned with calling the real lookup function and populating the cache with the result when it doesn’t exist. Splitting on this boundary allows us to wrap the first part in a read lock and the second in a write lock – turning the write part into a separate function is a little cleaner:

public int Map(string businessKey)
{
    this.updateLock.EnterUpgradeableReadLock();

    try
    {
        int surrogateKey;

        if (this.map.TryGetValue(businessKey, out surrogateKey))
        {
            return surrogateKey;
        }

        return this.Lookup(businessKey);
    }
    finally
    {
        this.updateLock.ExitUpgradeableReadLock();
    }
}

private int Lookup(string businessKey)
{
    this.updateLock.EnterWriteLock();

    try
    {
        int surrogateKey = this.lookup(businessKey);

        this.map.Add(businessKey, surrogateKey);

        return surrogateKey;
    }
    finally
    {
        this.updateLock.ExitWriteLock();
    }
}

So we have most of the class written now and I haven’t discussed anything to do with databases or how we get a real surrogate key because…well its not relevant here since a function is passed to the constructor. I like this ability to concentrate on just a single algorithm and not worry about the wider solution. From what I’ve learned so far F# is better as this than C#.

For the full class definition see the full source file in context and associated unit tests.

What do you think?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s