Tag Archives: testing

Interesting Links #4

These seem to get longer and longer. A whole pile of links for you.

Management and Organisational Behaviour

How Serving Is Your Leadership? – Who is working for who here?

Be a Manager – “The only reason there’s so many awful managers is that good people like you refuse to do the job.”

I’m the Boss! Why Should I Care If You Like Me? – Because your team will be more productive… Here are some pointers.

Software Development

Technical debt 101 – Do you think you know what technical debt is and how to tackle it? Even so I’m sure this article has more you can discover and learn. A must read.

Heisenberg Developers – So true. In fact this hits a little close to home since we use JIRA, the bug tracking tool mentioned in the article.

What is Defensive Coding? – Many think that defensive coding is just making sure you handle errors correctly but that is a small part of the process.

Need to Learn More about the Work You’re Doing? Spike It! – So you are an agile shop, your boss is demanding some story estimates and you have no idea how complex the piece of work is because it’s completely new. What do you do?

Software Development with Feature Toggles – Don’t branch, toggle instead.

Agile practices roundup – here are a number of articles I’ve found useful recently:

How to review a merge commit– Phil dives into the misunderstood world of merge commits and reviews. Also see this list of things to look out for during code reviews.

Functional Programming

Don’t Be Scared Of Functional Programming – A good introduction to functional programming concepts using JavaScript as the demonstration language.

Seamlessly integrating T-SQL and F# in the same code – The latest version of FSharp.Data allows you to write syntax checked SQL directly in your F# source and it executes as fast as Dapper.

Railway Oriented Programming – This is a functional technique but I’ve recently been using it in C# when I needed to process many items in a sequence, any of which could fail and I want to collect all the errors up for reporting back to ops. It is harder to do in C# since there are no discriminated unions but a custom wrapper class is enough.

Erlang and code style – A different language this time, Erlang. How easy is programming when you don’t have to code defensively and crashing is the preferred way of handling errors.

Twenty six low-risk ways to use F# at work – Some great ways to get into F# programming without risking your current project.

A proposal for a new C# syntax – A lovely way to look at writing C# using a familiar but lighter weight syntax. C#6 have some of these features planned but this goes further. Do check out the link at the end of the final proposal.

Excel-DNA: Three Stories – Integrating F# into Excel – a data analysts dream…

Data Warehousing

Signs your Data Warehouse is Heading for the Boneyard – Some interesting things to look out for if you hold the purse strings to a data warehouse project. How many have you seen before?

The 3 Big Lies of Data – I’ve heard these three lies over and over from business users and technology vendors alike. Who is kidding who?

Six things I wish we had known about scaling – Not specifically about data warehouses but these are all issues we see on a regular basis.

Why Hadoop Only Solves a Third of the Growing Pains for Big Data – You can’t just go and install a Hadoop cluster. There is more to it than that.

Microsoft Azure Machine Learning – Finally it looks like we can have a simple way of doing cloud scale data mining.

Data Visualization

5 Tips to Good Vizzin’ – So many visualizations break these rules.

Five indicators you aren’t using Tableau to its full potential – I’ve seen a few of these recently – tables anyone?

Create a default Tableau Template – Should save some time when you have a pile of dashboards to create.

Building a Tableau Center of Excellence – It is so easy to misunderstand Tableau which is not helped by a very effective sales team. This article has some great advice for introducing Tableau into your organisation.

Beginner’s guide to R: Painless data visualization – Some simple R data visualization tips.

Visualizing Data with D3 – If you need complete control over your visualization then D3 is just what you need. It can be pretty low-level but its easy to produce some amazing stuff with a bit of JavaScript programming.

Testing

I Don’t Have Time for Unit Testing – I’ve recently been guilt of this myself so I like to keep a reminder around – you will go faster if you write tests.

Property Based Testing with FsCheck – FsCheck is a fantastic tool primarily used in testing F# code but there is no reason it can’t be used with C# too. It generates automated test cases to explore test boundaries. I love the concise nature of F# test code too especially with proper sentences for test names.

Analysis Services

I’ve collected a lot of useful links for Analysis Services, both tabular and multidimensional:

DAX Patterns website – This website is my go-to resource for writing DAX calculations. These two are particularly useful:

Using Tabular Models in a Large-scale Commercial Solution – Experiences of SSAS tabular in a large solution. Some tips, tricks and things to avoid.

Also:

Interesting Links #2

January was a long month so I’ve got quite a list for you. I may consider doing these more often if readers think there are too many items for a single list.

Governance

Self-Service Business Intelligence Governance – Essential reading/watching for anyone planning to deliver self-service business intelligence.

Five Stages of Data Grief – we’ve all been through this, “If you don’t think you have a quality problem with your data you haven’t looked at it yet”.

Functional Programming

Maybe that shouldn’t be settable – Bringing some of the F# Option type goodness into a C# world.

Software Process

Five Tips to Get Your Organisation Releasing Software Frequently – my team score well on these but culturally I can see some being quite difficult to implement, particularly around the devops style organisation of teams.

Pairing vs. Code Review: Comparing Developer Cultures – pros and cons for each style of quality culture. Which, if any, is best?

Is Agile BI Really a Better Mousetrap? – A great article on the benefits of agile BI. This really appeals due to its use of development process business intelligence – measure and optimise just like we preach to our customers.

Using Vertical Slicing and Estimation to make Business Decisions at Adobe – A good look at the release planning process at Adobe with some nice techniques discussed.

Personal Development

Of Orcs and Software Craftsmanship – Best quote of the month if you are a parent: “These are the types of error messages that make debugging a software like debugging a 2 month old baby.”

Yak Shaving Defined – Sometimes if feels like this all day long in software.

Organisational Behaviour

Performance Reviews Are Not Useful; Feedback Is – Personally I think performance reviews are something that human resources departments mandate; feedback is something that leaders give.

If Managers Don’t Give Performance Reviews, What Happens? – Well, as it turns out, a lot of good things start to happen.

Top 10 ways to ensure your best people will quit – some common mistakes; how many have you come across?

Testing and Test Driven Development

These next three links are related and if you read the first you should also read the second and third.

The Failures of “Intro to TDD” – Justin Searls rips into the current way of teaching test driven development.

The Domain Discontinuity – Bob Martin responds comprehensively but ends with why the issue is not about test driven development but wider issues such as architecture and domain design.

Commentary on ‘Roman Numerals Kata with Commentary’ – Ultimately you must understand your domain before trying to do test driven development.

Databases

Default Configuration of SQL Server – Like most software, out of the box SQL is configured for the most general case and may need extra tuning for specific workloads. Thomas gives a simple set of extra configuration changes and reasons why. Also love the quote “If you are working in a bank, they may not apply to you.”

Data Visualization

Announcing Power BI for Office 365 – In case you missed it, all the fancy new BI capabilities in the Microsoft cloud are publicly available now. Shame we are stuck using corporate infrastructure.

Famous Movie Quotes as Charts – A fun look at communication in chart form.

Ten Tips and Tricks for New Tableau users – A rather nausea inducing format but useful tips for making great Tableau dashboards.

Power Tools for Tableau – Desperate for some sort of an API with Tableau? This may be the answer.

Statistics and Data Analysis

Revolution Analytics – Want to run ‘R’ statistics against your Hadoop data? This seems to be the way to do it…

Learn R interactively with the swirl package – It looks like R is going to be an important tool for us so anything that makes it easier to learn is a bonus.

Learn Data Science Online with DataCamp – Similarly, learning data science online and interactively.

Analysis of Health Inspection Data using F# – Another great example of using F# (and D3) to analyse data quickly and easily.

Big Data

Big Data: The organizational challenge – Some interesting stats comparing companies with the best analytic capabilities vs. those that don’t.

Update on Stinger: the view from a Microsoft Committer – Stinger is the Hortonworks initiative for faster SQL queries against Hadoop. This article describes some of the recent performance gains.

How To Install Hadoop on Windows with HDP 2.0 – Get Hadoop running on Windows with a minimum of fuss. However, our local Hadoop expert recommends you only do this at home; in the enterprise just setup a proper development cluster.

How To Use Microsoft Excel to Visualize Hadoop Data – Tutorial for visualizing Hadoop data in Excel/PowerView, this one is for stock quotes.

How to Visualize Website Clickstream Data – Another Hadoop tutorial this time on web click-stream data.

50+ Open Source Tools for Big Data – I think one of the problems with open source is it littered with cute names that do little to describe software function so here is a useful list to help you distinguish the likes of Orient, Flock, Storm and others.

Building your own web analytics system using Big Data tools – Should you build these things yourself? What are the choices? Are there any risks?

Master Data the noun in Big Data sentences – I often talk about master data and spend more time worrying about dimension design than facts. It is useful to see how this applies to big data too.

You don’t have big data… – With all this talk of big data it is worth remembering that most use cases do not quality at big. Most likely you have ‘hot data’.

 

Interesting Links #1

Since I manage to read so much on the train I think readers will find some of the articles useful so I plan on listing up the best ones each month.

Business Intelligence

Databases

Code

Testing

Development Process

Personal Development

Organisational Behaviour

  • The Open-Office Trap – New Yorker article rounding up all the research done one open space workplace productivity. Some interesting results among the expected ones.
  • Can-Do vs. Can’t-Do Culture – “The trouble with innovation is that truly innovative ideas often look like bad ideas at the time.” Next time you are thinking why something won’t work, take a moment to consider if you are stopping innovation.
  • Don’t interrupt developers – Absolutely nails why you should not interrupt developers.
  • Are Your Programmers Working Hard, Or Are They Lazy? – “the appearance of hard work is often an indication of failure” – a must read for both developers and managers.

A behaviour driven testing framework for batch processing systems

Recently I’ve been working on a testing framework to support testing of batch systems such as data warehouses.

The framework is called ‘posh-gwen‘ due to the three behaviour driven methods Given, When, and thEN. The first version is on github at: https://github.com/jsnape/posh-gwen. Comments, suggestions and pull requests are welcome.

So why should you care about using this framework?

It is difficult to test batch systems using modern test frameworks such as Specflow or FitNesse because of the simple rule that good tests should be isolated from one another. All these frameworks run tests in sequence:

  • do something
  • check something
  • clean up
  • move on to the next test

For this to be successful each test has to run very fast. Most batch processing systems are optimised for bulk processing of data. They may take tens of seconds to run end to end even with a single row of data so running hundreds of tests independently can take hours.

This framework is designed to break the rule of sequential test execution. All tests are run in parallel by phase.

The best way to test batch processing is for a known input data to contain many test cases. The batch is run loading all data at once. Finally a number of queries are executed against the resulting system. So for example a data warehouse might load a number of source files using an ETL framework such as SQL Server Integration Services. Once loaded the data warehouse can be queried to check that expected values exist in the final system.

It is still important to make sure that each test is isolated from others or else changes in one might cause a number of others to fail or become invalid.

We can do this for batch processing by data isolation – that is to carve up data domains in a way that only a single test uses data from that domain. Then verification of the results force the query to execute against that test specific sub-domain.

There are a number of suitable domains to use but any with high cardinality are best:

  • Dates – each day is a single test (or blocks of days, weeks, years etc. for those tests that need to span days).
  • Transaction identifiers – use a map of IDs to test cases or in the case of strings prefix the transaction id with the test case number.
  • Business keys – for entities such as customer or product there is usually an ID field used as the business key; use the same methods as transaction identifiers.
  • Custom attributes – if none of the above will work then you might consider adding an extra attribute to the source data which is passed through the batch system. Obviously this is not a preferred solution single you will have to change your system.
  • Combinations of the above – sometimes depending on where you need to validate you might need multiple solutions.

Go try it out and let me know how it goes. I plan on adding more features over the coming months.