Better software through software architecture and devops


Tag Archives: #dataflow

  • I’ve just pushed a new version of Deeply to This version provides just enough functionality to write some basic ETL jobs:

    • Parallel and Sequential Tasks
    • Execute SQL Task
    • Execute Process Task
    • Simple Dataflow Task

    The tasks are pretty self-explanatory. The key part it nearly all the setup is done in the constructor; once the structure is created then it is executed asynchronously.

    Data flows are a little harder to configure. You need a source, a target and a mapping function. A source is anything conforming to IEnumerable<T>, a target is class that accepts and IEnumerable<T> implemented in IBulkRepository<T> and finally a mapping function that maps the source to the target.

    The code for using a simple data flow looks a little like the pseudo-csharp below:

    var source = new CsvReader("C:\\sourcefile.csv");
    var connectionFactory = new SqlConnectionFactory("Data Source=(localdb)\\v11.0;");
    var columnMappings = new Dictionary<string, string>()
        { "Id", "Id" },
        { "Name", "Name" },
        { "Created", "Created" }
    var target = new SqlBulkRepository(
        "dbo.FactTable", connectionFactory, columnMappings);
    var dataflow = new SimpleDataflowTask<TSource, TTarget>(
        this.source, MappingFunctions.Identity, target);
    var context = new TaskContext();
    await dataflow.ExecuteAsync(context);

    If anyone would like to help write some examples and documentation I’d be immensely grateful but otherwise please let me know of your experiences using this package.

    This entry was posted in data-warehousing  and tagged #dataflow #deeply #etl #nuget  on .
    Discuss this on Twitter or LinkedIn