Tag Archives: Microsoft Excel

Interesting Links #2

January was a long month so I’ve got quite a list for you. I may consider doing these more often if readers think there are too many items for a single list.


Self-Service Business Intelligence Governance – Essential reading/watching for anyone planning to deliver self-service business intelligence.

Five Stages of Data Grief – we’ve all been through this, “If you don’t think you have a quality problem with your data you haven’t looked at it yet”.

Functional Programming

Maybe that shouldn’t be settable – Bringing some of the F# Option type goodness into a C# world.

Software Process

Five Tips to Get Your Organisation Releasing Software Frequently – my team score well on these but culturally I can see some being quite difficult to implement, particularly around the devops style organisation of teams.

Pairing vs. Code Review: Comparing Developer Cultures – pros and cons for each style of quality culture. Which, if any, is best?

Is Agile BI Really a Better Mousetrap? – A great article on the benefits of agile BI. This really appeals due to its use of development process business intelligence – measure and optimise just like we preach to our customers.

Using Vertical Slicing and Estimation to make Business Decisions at Adobe – A good look at the release planning process at Adobe with some nice techniques discussed.

Personal Development

Of Orcs and Software Craftsmanship – Best quote of the month if you are a parent: “These are the types of error messages that make debugging a software like debugging a 2 month old baby.”

Yak Shaving Defined – Sometimes if feels like this all day long in software.

Organisational Behaviour

Performance Reviews Are Not Useful; Feedback Is – Personally I think performance reviews are something that human resources departments mandate; feedback is something that leaders give.

If Managers Don’t Give Performance Reviews, What Happens? – Well, as it turns out, a lot of good things start to happen.

Top 10 ways to ensure your best people will quit – some common mistakes; how many have you come across?

Testing and Test Driven Development

These next three links are related and if you read the first you should also read the second and third.

The Failures of “Intro to TDD” – Justin Searls rips into the current way of teaching test driven development.

The Domain Discontinuity – Bob Martin responds comprehensively but ends with why the issue is not about test driven development but wider issues such as architecture and domain design.

Commentary on ‘Roman Numerals Kata with Commentary’ – Ultimately you must understand your domain before trying to do test driven development.


Default Configuration of SQL Server – Like most software, out of the box SQL is configured for the most general case and may need extra tuning for specific workloads. Thomas gives a simple set of extra configuration changes and reasons why. Also love the quote “If you are working in a bank, they may not apply to you.”

Data Visualization

Announcing Power BI for Office 365 – In case you missed it, all the fancy new BI capabilities in the Microsoft cloud are publicly available now. Shame we are stuck using corporate infrastructure.

Famous Movie Quotes as Charts – A fun look at communication in chart form.

Ten Tips and Tricks for New Tableau users – A rather nausea inducing format but useful tips for making great Tableau dashboards.

Power Tools for Tableau – Desperate for some sort of an API with Tableau? This may be the answer.

Statistics and Data Analysis

Revolution Analytics – Want to run ‘R’ statistics against your Hadoop data? This seems to be the way to do it…

Learn R interactively with the swirl package – It looks like R is going to be an important tool for us so anything that makes it easier to learn is a bonus.

Learn Data Science Online with DataCamp – Similarly, learning data science online and interactively.

Analysis of Health Inspection Data using F# – Another great example of using F# (and D3) to analyse data quickly and easily.

Big Data

Big Data: The organizational challenge – Some interesting stats comparing companies with the best analytic capabilities vs. those that don’t.

Update on Stinger: the view from a Microsoft Committer – Stinger is the Hortonworks initiative for faster SQL queries against Hadoop. This article describes some of the recent performance gains.

How To Install Hadoop on Windows with HDP 2.0 – Get Hadoop running on Windows with a minimum of fuss. However, our local Hadoop expert recommends you only do this at home; in the enterprise just setup a proper development cluster.

How To Use Microsoft Excel to Visualize Hadoop Data – Tutorial for visualizing Hadoop data in Excel/PowerView, this one is for stock quotes.

How to Visualize Website Clickstream Data – Another Hadoop tutorial this time on web click-stream data.

50+ Open Source Tools for Big Data – I think one of the problems with open source is it littered with cute names that do little to describe software function so here is a useful list to help you distinguish the likes of Orient, Flock, Storm and others.

Building your own web analytics system using Big Data tools – Should you build these things yourself? What are the choices? Are there any risks?

Master Data the noun in Big Data sentences – I often talk about master data and spend more time worrying about dimension design than facts. It is useful to see how this applies to big data too.

You don’t have big data… – With all this talk of big data it is worth remembering that most use cases do not quality at big. Most likely you have ‘hot data’.


Mental Health Functional Architecture

Context is everything with architecture. I’ve often had conversations which started with the phrase “how come you didn’t…” – once the context is explained the decision is usually obvious.

The context for this architecture is purely my own since there are no real customers. I  want to satisfy the concerns and requirements as simply as possible but leave room to swap out parts of the architecture to investigate new approaches and technologies.

The diagram below is a pretty standard set of data warehouse components. If you are a traditional Microsoft guy then, from left to right, the components would be Integration Services, SQL Server, Integration Services again, SQL Server, Analysis Services and Excel or Reporting Services respectively. Alternatively you might be using Hadoop as the source mirror and Tableau for the data mart and consume components or some other combination.


I always try to set firewalls within an architecture so that any problems can be isolated and replaced without too much disruption. In this instance I’m going to use those firewalls so that I can try out new ideas and technologies.

The main synchronisation points are the three data stores – Source Mirror, Data Warehouse and Data Mart. (In this instance I am using the term data mart to mean a prepared, subject area specific store optimised for analytical/aggregate queries.)

The responsibilities for each stage are as follows:

  • Acquire > Source Mirror: receive data from source, ensure minimal load on source via high watermarks or another strategy, archive data for historical accuracy.  A key point here is that the source mirror has the same metadata as the source itself. No joins or transforms on the way. Sometimes simple data conversions are useful but less is more.
  • Load > Data Warehouse: apply business logic and transform the source data into dimensional model, update data warehouse.
  • Summarise > Data Mart: aggregate generation or cube processing.
  • Consume & Act: covers any output tool such as Reporting Services, Excel, Tableau Dashboard, R, F# etc.

I consider the SQL/relational store to be “the data warehouse” and not Analysis Services which is better suited to a data mart role.

It’s quite hard to be succinct when talking about architecture and this post is quite lengthy so I’ll split it and talk about risk driven architecture in the next post.