Interesting Links #4

These seem to get longer and longer. A whole pile of links for you.

Management and Organisational Behaviour

How Serving Is Your Leadership? – Who is working for who here?

Be a Manager – “The only reason there’s so many awful managers is that good people like you refuse to do the job.”

I’m the Boss! Why Should I Care If You Like Me? – Because your team will be more productive… Here are some pointers.

Software Development

Technical debt 101 – Do you think you know what technical debt is and how to tackle it? Even so I’m sure this article has more you can discover and learn. A must read.

Heisenberg Developers – So true. In fact this hits a little close to home since we use JIRA, the bug tracking tool mentioned in the article.

What is Defensive Coding? – Many think that defensive coding is just making sure you handle errors correctly but that is a small part of the process.

Need to Learn More about the Work You’re Doing? Spike It! – So you are an agile shop, your boss is demanding some story estimates and you have no idea how complex the piece of work is because it’s completely new. What do you do?

Software Development with Feature Toggles – Don’t branch, toggle instead.

Agile practices roundup – here are a number of articles I’ve found useful recently:

How to review a merge commit – Phil dives into the misunderstood world of merge commits and reviews. Also see this list of things to look out for during code reviews.

Functional Programming

Don’t Be Scared Of Functional Programming – A good introduction to functional programming concepts using JavaScript as the demonstration language.

Seamlessly integrating T-SQL and F# in the same code – The latest version of FSharp.Data allows you to write syntax checked SQL directly in your F# source and it executes as fast as Dapper.

Railway Oriented Programming – This is a functional technique but I’ve recently been using it in C# when I needed to process many items in a sequence, any of which could fail and I want to collect all the errors up for reporting back to ops. It is harder to do in C# since there are no discriminated unions but a custom wrapper class is enough.
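As a rough illustration of the custom wrapper class idea, here is a minimal sketch. The names Result<T>, Bind and Railway.ProcessAll are hypothetical and invented for this example; they are not taken from the linked article or from any library. The wrapper holds either a value or an error, and the helper runs a step over every item in a sequence while collecting all the errors.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper standing in for a discriminated union:
// an instance holds either a value (success) or an error message (failure).
public sealed class Result<T>
{
    private Result(T value, string error, bool isSuccess)
    {
        Value = value;
        Error = error;
        IsSuccess = isSuccess;
    }

    public T Value { get; }
    public string Error { get; }
    public bool IsSuccess { get; }

    public static Result<T> Ok(T value) => new Result<T>(value, null, true);

    public static Result<T> Fail(string error) => new Result<T>(default(T), error, false);

    // Run the next step only when still on the success track;
    // a failure skips the step and is passed along unchanged.
    public Result<TOut> Bind<TOut>(Func<T, Result<TOut>> next) =>
        IsSuccess ? next(Value) : Result<TOut>.Fail(Error);
}

public static class Railway
{
    // Apply the step to every item, returning the successes and
    // appending every failure message to the supplied error list.
    public static List<TOut> ProcessAll<TIn, TOut>(
        IEnumerable<TIn> items,
        Func<TIn, Result<TOut>> step,
        List<string> errors)
    {
        var successes = new List<TOut>();

        foreach (var item in items)
        {
            var result = step(item);

            if (result.IsSuccess)
            {
                successes.Add(result.Value);
            }
            else
            {
                errors.Add(result.Error);
            }
        }

        return successes;
    }
}
```

A failing item does not stop the run in this sketch; it simply ends up in the error list, which can then be reported back in one go.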

Erlang and code style – A different language this time, Erlang. How easy is programming when you don’t have to code defensively and crashing is the preferred way of handling errors.

Twenty six low-risk ways to use F# at work – Some great ways to get into F# programming without risking your current project.

A proposal for a new C# syntax – A lovely way to look at writing C# using a familiar but lighter weight syntax. C# 6 has some of these features planned but this goes further. Do check out the link at the end of the final proposal.

Excel-DNA: Three Stories – Integrating F# into Excel – a data analyst’s dream…

Data Warehousing

Signs your Data Warehouse is Heading for the Boneyard – Some interesting things to look out for if you hold the purse strings to a data warehouse project. How many have you seen before?

The 3 Big Lies of Data – I’ve heard these three lies over and over from business users and technology vendors alike. Who is kidding who?

Six things I wish we had known about scaling – Not specifically about data warehouses but these are all issues we see on a regular basis.

Why Hadoop Only Solves a Third of the Growing Pains for Big Data – You can’t just go and install a Hadoop cluster. There is more to it than that.

Microsoft Azure Machine Learning – Finally it looks like we can have a simple way of doing cloud scale data mining.

Data Visualization

5 Tips to Good Vizzin’ – So many visualizations break these rules.

Five indicators you aren’t using Tableau to its full potential – I’ve seen a few of these recently – tables anyone?

Create a default Tableau Template – Should save some time when you have a pile of dashboards to create.

Building a Tableau Center of Excellence – It is so easy to misunderstand Tableau which is not helped by a very effective sales team. This article has some great advice for introducing Tableau into your organisation.

Beginner’s guide to R: Painless data visualization – Some simple R data visualization tips.

Visualizing Data with D3 – If you need complete control over your visualization then D3 is just what you need. It can be pretty low-level but it’s easy to produce some amazing stuff with a bit of JavaScript programming.

Testing

I Don’t Have Time for Unit Testing – I’ve recently been guilty of this myself so I like to keep a reminder around – you will go faster if you write tests.

Property Based Testing with FsCheck – FsCheck is a fantastic tool primarily used in testing F# code but there is no reason it can’t be used with C# too. It generates automated test cases to explore test boundaries. I love the concise nature of F# test code too especially with proper sentences for test names.

Analysis Services

I’ve collected a lot of useful links for Analysis Services, both tabular and multidimensional:

DAX Patterns website – This website is my go-to resource for writing DAX calculations. These two are particularly useful:

Using Tabular Models in a Large-scale Commercial Solution – Experiences of SSAS tabular in a large solution. Some tips, tricks and things to avoid.

Also:

Interesting Links #3

Latest links for easy consumption over the May long weekends – I missed out on March so have dropped some of the less interesting ones to keep the list short.

Organisational Behaviour

Programmers, Teach Non-Geeks The True Cost of Interruptions – a simple way to show your boss how drive-by management kills programmer productivity. Also worth reading is Maker’s Schedule, Manager’s Schedule, which highlights the differences. If this is still a problem then this notice might be your only solution.

The Death Of Expertise – The Dunning-Kruger effect is often strong in semi-technical managers especially in industries where confidence plays a large part in success such as finance. This article discusses some of the problems related to treating all opinions as equal and ignoring experts.

Save Your Software from the Start: Overcoming Skewed Thinking in the Project Planning Stage – Very simply, why we always underestimate the true complexity and cost of a project plus some tools to help overcome these psychological effects.

Why Good Managers Are So Rare – Gallup finds that companies fail to choose the candidate with the right talent for the job 82% of the time. Managers account for at least 70% of variance in employee engagement scores across business units.

I Give Up: Extroverted Barbarians at the Gates – Anyone remember the “perpendicular transparent red lines” video doing the rounds? This is an on-the-nail deconstruction of what is happening and why it happens. If you are an introvert then this other post might feel very familiar to you.

Agile

Coconut Headphones: Why Agile Has Failed – A rant about how modern agile methodologies seem to consist only of management practices. Take note of the closing points on how to be successful.

The death of agile? – Additional comment on the above. 

Writing User Stories for Back-end Systems – The real functionality a user sees in a business intelligence project is quite small and can easily be described in a few words. This makes breaking up user stories into sprint sized chunks hard. This article gives some great advice that can be translated to BI projects. 

Design Your Agile Project, Part 1 – So how do you pick the right kind of agile project? When should you use Kanban and when should you use Scrum? How is the business side of the equation handled? Also Part 2, Part 3, and Part 4.

Large Agile Framework Appropriate for Big, Lumbering Enterprises – A perfect solution to doing agile in finance organisations (wink). Love the concept of ‘Pair Managing’.

Metrics that matter with evidence-based management – It’s long but Martin does a great job looking at lots of the metrics in use today, why their use is limited and a far better approach to designing metrics that really help.

Databases

Is ETL Development doomed? – “Long term, the demand for ETL skills will decline”. The demand will mutate into one for more abstract ETL capabilities.

Testing

Intro to Unit Testing 9: Tips and Tricks – A handy list of tips that can make maintaining unit test code a little easier.

FsCheck + XUnit = The Bomb – Even if you write code in C# it may be wise to think about writing unit tests in F# since the code is more concise and easier to read, and with FsCheck you can find things you might otherwise miss.

Data Visualization

5 Tips to Good Vizzin’ – Should be required reading for anyone who is thinking about creating dashboards in Tableau.

A Natural Approach to Analytics – This explains why using tools such as Tableau for largely static dashboards is a waste of time. Users need to interact with the data in a way they cannot do when relegated to dashboard consumers.

Big Data/Hadoop

Modern Financial Services Architectures Built with Hadoop – Hortonworks looks at big data in financial services.

Beyond hadoop: fast queries from big data – I think Hadoop might be catching up here but it is still a bit of an elephant compared to SQL Server/Oracle etc when it comes to raw query performance.

Don’t understand Big Data? Blame your genes! – 5 common errors for dealing with big data.

The Parable of Google Flu: Traps in Big Data Analysis – Big data answers are not always correct. This paper looks at some of the pitfalls.

No, Hadoop Isn’t Going To Replace Your Data Warehouse – More thoughts on modern data architectures and hybrid transactional/analytical processing.

 

Taking on dependencies with Deeply

Generally I try hard to avoid adding dependencies to a library project designed for reuse. Since Deeply is a Nuget package I have no idea how it might end up being used and for that reason I’m unwilling to add dependencies that might not fit with a user’s design. As a user of Deeply however, I’m finding that I have to add the same patterns repeatedly and would rather just use a pre-existing package.

How to reconcile these opposing arguments? I’ve decided to add a new package to Nuget – Deeply.Extras. This assembly is free to take on whatever dependencies make sense. Initially this is going to be Autofac for its CommonServiceLocator implementation and CsvHelper to provide a CsvBulkRepository.

Interesting Links #2

January was a long month so I’ve got quite a list for you. I may consider doing these more often if readers think there are too many items for a single list.

Governance

Self-Service Business Intelligence Governance – Essential reading/watching for anyone planning to deliver self-service business intelligence.

Five Stages of Data Grief – we’ve all been through this, “If you don’t think you have a quality problem with your data you haven’t looked at it yet”.

Functional Programming

Maybe that shouldn’t be settable – Bringing some of the F# Option type goodness into a C# world.

Software Process

Five Tips to Get Your Organisation Releasing Software Frequently – my team score well on these but culturally I can see some being quite difficult to implement, particularly around the devops style organisation of teams.

Pairing vs. Code Review: Comparing Developer Cultures – pros and cons for each style of quality culture. Which, if any, is best?

Is Agile BI Really a Better Mousetrap? – A great article on the benefits of agile BI. This really appeals due to its use of development process business intelligence – measure and optimise just like we preach to our customers.

Using Vertical Slicing and Estimation to make Business Decisions at Adobe – A good look at the release planning process at Adobe with some nice techniques discussed.

Personal Development

Of Orcs and Software Craftsmanship – Best quote of the month if you are a parent: “These are the types of error messages that make debugging a software like debugging a 2 month old baby.”

Yak Shaving Defined – Sometimes it feels like this all day long in software.

Organisational Behaviour

Performance Reviews Are Not Useful; Feedback Is – Personally I think performance reviews are something that human resources departments mandate; feedback is something that leaders give.

If Managers Don’t Give Performance Reviews, What Happens? – Well, as it turns out, a lot of good things start to happen.

Top 10 ways to ensure your best people will quit – some common mistakes; how many have you come across?

Testing and Test Driven Development

These next three links are related and if you read the first you should also read the second and third.

The Failures of “Intro to TDD” – Justin Searls rips into the current way of teaching test driven development.

The Domain Discontinuity – Bob Martin responds comprehensively but ends with why the issue is not about test driven development but wider issues such as architecture and domain design.

Commentary on ‘Roman Numerals Kata with Commentary’ – Ultimately you must understand your domain before trying to do test driven development.

Databases

Default Configuration of SQL Server – Like most software, out of the box SQL is configured for the most general case and may need extra tuning for specific workloads. Thomas gives a simple set of extra configuration changes and reasons why. Also love the quote “If you are working in a bank, they may not apply to you.”

Data Visualization

Announcing Power BI for Office 365 – In case you missed it, all the fancy new BI capabilities in the Microsoft cloud are publicly available now. Shame we are stuck using corporate infrastructure.

Famous Movie Quotes as Charts – A fun look at communication in chart form.

Ten Tips and Tricks for New Tableau users – A rather nausea inducing format but useful tips for making great Tableau dashboards.

Power Tools for Tableau – Desperate for some sort of an API with Tableau? This may be the answer.

Statistics and Data Analysis

Revolution Analytics – Want to run ‘R’ statistics against your Hadoop data? This seems to be the way to do it…

Learn R interactively with the swirl package – It looks like R is going to be an important tool for us so anything that makes it easier to learn is a bonus.

Learn Data Science Online with DataCamp – Similarly, learning data science online and interactively.

Analysis of Health Inspection Data using F# – Another great example of using F# (and D3) to analyse data quickly and easily.

Big Data

Big Data: The organizational challenge – Some interesting stats comparing companies with the best analytic capabilities vs. those that don’t.

Update on Stinger: the view from a Microsoft Committer – Stinger is the Hortonworks initiative for faster SQL queries against Hadoop. This article describes some of the recent performance gains.

How To Install Hadoop on Windows with HDP 2.0 – Get Hadoop running on Windows with a minimum of fuss. However, our local Hadoop expert recommends you only do this at home; in the enterprise just set up a proper development cluster.

How To Use Microsoft Excel to Visualize Hadoop Data – Tutorial for visualizing Hadoop data in Excel/PowerView, this one is for stock quotes.

How to Visualize Website Clickstream Data – Another Hadoop tutorial this time on web click-stream data.

50+ Open Source Tools for Big Data – I think one of the problems with open source is that it is littered with cute names that do little to describe software function, so here is a useful list to help you distinguish the likes of Orient, Flock, Storm and others.

Building your own web analytics system using Big Data tools – Should you build these things yourself? What are the choices? Are there any risks?

Master Data the noun in Big Data sentences – I often talk about master data and spend more time worrying about dimension design than facts. It is useful to see how this applies to big data too.

You don’t have big data… – With all this talk of big data it is worth remembering that most use cases do not qualify as big. Most likely you have ‘hot data’.

 

Using an open source committer policy in the enterprise

How would you change your behaviour if the person reviewing your code had the final say on whether it makes it into the source repository?

I often see code reviews done in principle but not practice. The workflow goes something like this:

  • Developer assigned a new feature
  • Developer designs and codes up the feature
  • Developer checks the code into source control
  • A reviewer is found/assigned
  • The reviewer reviews code and finds a bunch of issues – a lot to do with the design
  • The developer fixes a few of the easy code related issues
  • Time is running out so the remaining issues are left
  • Developer moves to the next feature

Contrast this with the usual open source workflow where the reviewer is the committer and not the developer; the reviewer is responsible for the final code quality and it is easy for them to refuse your code.

If you were the developer in this situation what would you do? Would you discuss your design with the reviewer before you started work? Would you keep the reviewer updated with your progress and any decisions you made? Would you make sure there were no surprises for the reviewer when they finally saw your code?

If you did then the reviewer feedback should be minimal and relatively easy to fix. More importantly, code quality is maintained without an expensive rewrite.

Interesting Links #1

I manage to read so much on the train, and I think readers will find some of the articles useful, so I plan on listing the best ones each month.

Business Intelligence

Databases

Code

Testing

Development Process

Personal Development

Organisational Behaviour

  • The Open-Office Trap – New Yorker article rounding up all the research done on open-space workplace productivity. Some interesting results among the expected ones.
  • Can-Do vs. Can’t-Do Culture – “The trouble with innovation is that truly innovative ideas often look like bad ideas at the time.” Next time you are thinking why something won’t work, take a moment to consider if you are stopping innovation.
  • Don’t interrupt developers – Absolutely nails why you should not interrupt developers.
  • Are Your Programmers Working Hard, Or Are They Lazy? – “the appearance of hard work is often an indication of failure” – a must read for both developers and managers.

Deeply 0.2.0 Alpha

I’ve just pushed a new version of Deeply to nuget.org. This version provides just enough functionality to write some basic ETL jobs:

  • Parallel and Sequential Tasks
  • Execute SQL Task
  • Execute Process Task
  • Simple Dataflow Task

The tasks are pretty self-explanatory. The key part is that nearly all the setup is done in the constructor; once the structure is created it is executed asynchronously.

Data flows are a little harder to configure. You need a source, a target and a mapping function. A source is anything conforming to IEnumerable<T>, a target is a class that accepts an IEnumerable<T> by implementing IBulkRepository<T>, and finally a mapping function maps the source type to the target type.

The code for using a simple data flow looks a little like the pseudo-csharp below:

// Verbatim strings (@"...") stop the backslashes being read as escape sequences.
var source = new CsvReader(@"C:\sourcefile.csv");

var connectionFactory = new SqlConnectionFactory(@"Data Source=(localdb)\v11.0;");

// Source column name -> target column name.
var columnMappings = new Dictionary<string, string>()
            {
                { "Id", "Id" },
                { "Name", "Name" },
                { "Created", "Created" }
            };

var target = new SqlBulkRepository("dbo.FactTable", connectionFactory, columnMappings);

// SourceType and TargetType are placeholders for your own row types.
var dataflow = new SimpleDataflowTask<SourceType, TargetType>(
    source, MappingFunctions.Identity, target);

var context = new TaskContext();
await dataflow.ExecuteAsync(context);

If anyone would like to help write some examples and documentation I’d be immensely grateful but otherwise please let me know of your experiences using this package.

I’m fed* up with SQL Server Integration Services

* I mean this in the most British sense of the phrase

I remember how painful the original Data Transformation Services tool was to use (loops anyone?) and when Integration Services was shipped with SQL Server 2005 it was a breath of fresh air for anyone trying to build data warehouses. For SQL Server developers the choice was simple: use the free tool that was easy to use and fast to execute, or try to code your own. In 2005 the code-your-own option was hard. You had to write your own threading model, workflow, streaming primitives, event logging etc. since the managed APIs of the time were not that mature.

In contrast Integration Services was easy. Easy to create, even if you don’t know anything about code; easy to run, even if you are not a SQL Server administrator; and blisteringly fast, provided you follow some best practice guidelines. This may be one of the reasons I dislike it so much – it sets unrealistic expectations for project managers on how long it should take to develop real software solutions. Bear with me on this one as the point of the post is not about what’s wrong with SSIS but how it limits you in the wider solution.

I do remember an e-mail conversation with Kirk Haselden before I joined Microsoft about how COM, which SSIS is built on, leaks abstractions all over the .NET API. He maintained it was the right thing to do; I wasn’t so sure but it was his product so I didn’t put up much of a fight.

I believe that SSIS is designed as a tool for DBAs or those database developers that only know SQL, MDX and Reporting Services. It is the best it can possibly be without introducing code concepts to these people.

A few months back I read a post by Teo Lachev called When Developers and BI Collide. I agree with large parts of it – primarily that you must have BI specialists if you want to develop BI solutions – and disagree with others – that maintaining SSIS is easier than maintaining custom code, and that coders are not BI pros. I consider myself a coder AND a BI pro and there are a number of similar people working in the team I architect for at the moment. Actually when hiring I have found it is often easier and more productive to find and teach a coder about BI than the reverse.

So anyway I digress, I joined Microsoft in 2006 and did a lot of consulting around Integration Services. It was a popular tool for customers to have problems with. We used it as part of a solution for a large UK bank’s faster payments implementation. It was a hard project for me – the rest of the team were designing C# code and publishing object based APIs. I had to use a shared encryption routine so when BizTalk unwrapped a transaction at the other end it would be able to decrypt it. This API meant I had to spend a good proportion of my time writing boring and error prone code to convert data sets (SSIS data flow) to objects and back to data sets again. This data mapping code was the interesting part though – I hate ‘programming’ by mouse; click, click, type, drag, click…and this is what the SSIS experience is. That was the first time I regretted using SSIS on a project.

There are plenty of posts about what is wrong with SSIS and some equally passionate responses. My main issues with it are all related to real world usage. I have never been involved in a project where SSIS was the whole solution. It is always just a component of something bigger, an architecture, a project, a development team and a process. I work exclusively in agile teams now and every step of the way SSIS slows things down:

  • Unit testing is not possible (SSISUnit is really a component or integration test)
  • Agile team development requires code branching and merging which is not possible
  • Distributed source control (such as Git) can’t be used at all since there is no way to lock a file whilst you are working on it
  • Code reviews are difficult – you have to open up every package and click every box to check
  • It is hard to enforce project standards – StyleCop and FxCop do not work on VSA code
  • There is no way to share code – copy/paste coding is prolific
  • Everyone uses a template package to try and ensure some standards – unfortunately you can’t make changes to that template though since it was copied
  • COM leaks abstractions everywhere from the C# APIs to the type system
  • The type system is too tightly bound to metadata – need to change a column length? Shame, now you have to open all the relevant packages and fix the issues; ANSI <-> Unicode conversions must be explicit
  • There is no way to stub out data sources or destinations i.e. temporarily replace a SQL table with a local file for testing
  • Mouse based programming

The net result of all this is yes, it is very quick to get something running but you must forever pay interest on the technical debt you just created. SSIS is not friction free in a team development environment.

There are two situations where I believe you should use SSIS:

  1. If you are a DBA or database developer, you just need to get something done quickly and don’t care about the maintenance issues I’ve described above
  2. You need to take advantage of some of the more complex to code components such as CDC or fuzzy matching (remember that it is very easy to call packages from code anyway so no need to base the entire solution on SSIS)

What are the alternatives? The simplest one is to make use of SSIS in an abstract way – code up something that can load your configuration, pass it to packages, execute them and capture the events for logging. We use something like this on my current project and to an extent it has helped a little. We still have to manage lots of packages though.

Next up the ladder is to create abstractions of the tasks that packages are performing and generate them. Biml is a perfect example of this.

Finally, I mentioned that APIs back in 2005 were not that mature. Here in 2013 though we have some fantastic APIs to work with – Task Parallel Library, asynchronous workflows, Linq, functional programming concepts and the rich ecosystem of Nuget packages. Coding isn’t as hard as it used to be.

I started out this summer to produce an end to end BI solution in an agile way but quickly found out I needed to update some of my skills with respect to the C# tools and techniques available. So whilst I haven’t been blogging I have coded, learned and tried out ideas. Some of these are ready to show: look for Deeply on Github or on Nuget. It is early days but try it out and let me know what you think.

Thinking functionally about surrogate key mapping

I’m currently trying to learn F# because I’m keen to learn new programming styles as well as languages. It turns out that many of the concepts we C# programmers know and love such as Linq (monads), generics and async workflows originated in either F# or other functional languages. Thinking ‘functionally’ is a great skill to have too. How does this apply to surrogate key mapping? Well to borrow a notation from F# we are looking for a function like this:

string -> int

That is, a function that takes a string (the business key) and returns an integer (the surrogate key). Surrogate key lookup is a perfect fit for the functional view where “functions have no side effects”. Pass the same string to our lookup function any number of times and it should return the same integer value. The poorly performing version of this function might run off to the database on every call and retrieve the value but there is a familiar functional technique called memoization that can help. C# programmers might call this technique “store the values in a hashtable and only call the database if the value is missing”.

A few other optimisations are necessary. Firstly, memoization will only cache the result of a single call so if we have a few hundred thousand dimension members in the database it will still take a lot of calls to populate the cache. Secondly, my lookup function doesn’t really care about the mechanics of the real database call so it would be nice if we could abstract that away. Finally, because I intend this class to be used as part of a multithreaded pipeline it needs to make sure that the internal data structures are protected.

Piecing these requirements together we can start to flesh out the code. The main map function, as we mentioned, takes a string and returns an int:

public int Map(string businessKey)
{
}
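Before adding cache priming and locking, the plain memoization technique mentioned above can be sketched on its own. This is a minimal illustration only: the Memoizer class and its Memoize method are hypothetical names made up for this example, and the sketch is deliberately not thread safe.

```csharp
using System;
using System.Collections.Generic;

public static class Memoizer
{
    // Wraps a function so that each distinct argument is only computed once.
    // Assumption: single-threaded use, since a plain Dictionary is not thread safe.
    public static Func<TKey, TValue> Memoize<TKey, TValue>(Func<TKey, TValue> inner)
    {
        var cache = new Dictionary<TKey, TValue>();

        return key =>
        {
            TValue value;

            if (!cache.TryGetValue(key, out value))
            {
                // Cache miss: call the real (expensive) function and remember the result.
                value = inner(key);
                cache.Add(key, value);
            }

            return value;
        };
    }
}
```

Wrapping a database lookup this way means a repeated business key only costs one round trip, which is the behaviour the mapper class builds on.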

Since we want to prime the cache with a set of values and abstract the real lookup functionality the best place to configure this is in the constructor:

public DimensionMapper(IDictionary<string, int> initialCache, Func<string, int> lookup)
{
}

Assuming the constructor just saves these parameters for later we can create a first cut version of the Map function:

public int Map(string businessKey)
{
    int surrogateKey;

    if (this.map.TryGetValue(businessKey, out surrogateKey))
    {
        return surrogateKey;
    }

    surrogateKey = this.lookup(businessKey);

    this.map.Add(businessKey, surrogateKey);

    return surrogateKey;
}

This works but it isn’t thread safe. For that we need a ReaderWriterLockSlim since only writes need to be synchronised. If you look at the code above there are two parts to it – the first few lines check the cache and return a value if it exists (the majority path); the last three lines are concerned with calling the real lookup function and populating the cache with the result when it doesn’t exist. Splitting on this boundary allows us to wrap the first part in a read lock and the second in a write lock – turning the write part into a separate function is a little cleaner:

public int Map(string businessKey)
{
    this.updateLock.EnterUpgradeableReadLock();

    try
    {
        int surrogateKey;

        if (this.map.TryGetValue(businessKey, out surrogateKey))
        {
            return surrogateKey;
        }

        return this.Lookup(businessKey);
    }
    finally
    {
        this.updateLock.ExitUpgradeableReadLock();
    }
}

private int Lookup(string businessKey)
{
    this.updateLock.EnterWriteLock();

    try
    {
        int surrogateKey = this.lookup(businessKey);

        this.map.Add(businessKey, surrogateKey);

        return surrogateKey;
    }
    finally
    {
        this.updateLock.ExitWriteLock();
    }
}

So we have most of the class written now and I haven’t discussed anything to do with databases or how we get a real surrogate key because…well it’s not relevant here since a function is passed to the constructor. I like this ability to concentrate on just a single algorithm and not worry about the wider solution. From what I’ve learned so far F# is better at this than C#.
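To see the pieces working together, here is a minimal assembled sketch. It is not the published class: the name DimensionMapperSketch is made up, the field names are taken from the snippets above, and copying the initial cache in the constructor is an assumption about how the parameters are saved.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class DimensionMapperSketch
{
    private readonly Dictionary<string, int> map;
    private readonly Func<string, int> lookup;
    private readonly ReaderWriterLockSlim updateLock = new ReaderWriterLockSlim();

    public DimensionMapperSketch(IDictionary<string, int> initialCache, Func<string, int> lookup)
    {
        // Assumption: copy the primed values so later changes made by the
        // caller to initialCache cannot affect the mapper's own cache.
        this.map = new Dictionary<string, int>(initialCache);
        this.lookup = lookup;
    }

    public int Map(string businessKey)
    {
        this.updateLock.EnterUpgradeableReadLock();

        try
        {
            int surrogateKey;

            // Majority path: the key is already cached.
            if (this.map.TryGetValue(businessKey, out surrogateKey))
            {
                return surrogateKey;
            }

            return this.Lookup(businessKey);
        }
        finally
        {
            this.updateLock.ExitUpgradeableReadLock();
        }
    }

    private int Lookup(string businessKey)
    {
        // Upgrade to a write lock before touching the dictionary.
        this.updateLock.EnterWriteLock();

        try
        {
            int surrogateKey = this.lookup(businessKey);
            this.map.Add(businessKey, surrogateKey);
            return surrogateKey;
        }
        finally
        {
            this.updateLock.ExitWriteLock();
        }
    }
}
```

One design point worth knowing: ReaderWriterLockSlim allows only one thread at a time in upgradeable mode, so calls to Map in this sketch do not overlap with each other. That keeps the check-then-add sequence safe at the cost of some read concurrency.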

For the full class definition see the full source file in context and associated unit tests.

Mental Health Referrals – First Feature Acceptance Test

I’ve been working on this for a few weeks now, half an hour at a time in the evenings and I can safely say it’s pretty hard to maintain a train of thought in thirty minute intervals. However a bare minimum implementation is complete and ready to discuss.

We start with an acceptance test:

The first part of the feature describes the user story and the second part tells us that when we load three patient referrals then the total count should be 3 with 1 on the 1st January.

I’m using SpecFlow for acceptance tests since it is very easy to define tables and there are some useful binding utilities as we will see. After entering the test we can immediately compile the application and run the tests without writing anything else. The test will obviously fail since we haven’t written any code. In fact the acceptance test will stay broken for some time as we write code and unit tests. When it passes we know the feature is done.

So thinking about this functionally we effectively want to write a function that transforms an enumerable of source referral records into an enumerable of referral facts; then pipe this iterator into a SqlBulkCopy instance. Effectively this code needs to work:

referralrepository.BulkCopy(referrals.Select(x => mapper.Map(x)));

This is a Linq transform with a mapping function applied to each item in the source list. In the next few posts I’m going to break it into bite size chunks to implement.
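As a rough sketch of the shape that line implies, the example below uses stand-in types. SourceReferral, ReferralFact, ReferralMapper and InMemoryReferralRepository are all hypothetical names, the yyyyMMdd date key is an assumption, and the repository simply collects rows in memory rather than calling SqlBulkCopy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source record; the real referral fields are not shown in this post.
public sealed class SourceReferral
{
    public string PatientId { get; set; }
    public DateTime ReferredOn { get; set; }
}

// Hypothetical fact row produced by the mapper.
public sealed class ReferralFact
{
    public int DateKey { get; set; }
    public int ReferralCount { get; set; }
}

public sealed class ReferralMapper
{
    // Maps one source record to one fact row.
    // Assumption: the date key is an integer in yyyyMMdd form.
    public ReferralFact Map(SourceReferral source)
    {
        DateTime date = source.ReferredOn;

        return new ReferralFact
        {
            DateKey = (date.Year * 10000) + (date.Month * 100) + date.Day,
            ReferralCount = 1
        };
    }
}

// Stand-in for the bulk copy target: it only collects the rows in memory.
public sealed class InMemoryReferralRepository
{
    public List<ReferralFact> Rows { get; } = new List<ReferralFact>();

    public void BulkCopy(IEnumerable<ReferralFact> facts)
    {
        // Select is lazy, so enumerating here is what pulls
        // each source record through the mapper.
        this.Rows.AddRange(facts);
    }
}
```

Because Select is deferred, no mapping happens until the repository enumerates the sequence, so records stream through one at a time instead of being materialised as a full list first.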