Better software through software architecture and devops

@jamessnape

Tag Archives: #metrics

  • "Queue for Coffee". You know they serve good coffee when you see this. A line of people queuing for coffee at a street vendor. Rusty's Market, Cairns Australia.

    Queues are used to load-balance work between teams, but it’s easy to overlook the decrease in efficiency they bring.

    It’s fairly common knowledge that batching work causes inefficiencies. This is why a core technique for lean software delivery is to map your value stream and then limit the amount of work in progress. I wanted to see just how much efficiency can be gained by removing queues, so I started to map this with an ideal example.

    Traditional Batched Delivery

    Imagine we have a fairly standard software delivery pipeline: specify, build, verify, and release. I’m being purposefully vague here as the actual work doesn’t matter. In a traditional waterfall-style delivery you would generally work on the specify part first to create a big requirements document and chuck it over the wall to the next team to build. Once they have done that, the built code gets chucked at the next team to verify, and finally the finished product is released to live.

    It’s not that obvious, but the requirements document, the built code and the release package are queues. They contain a batch of work for the next stage.

    Let’s visualize this by creating an ideal delivery pipeline. Imagine we have 10 work items, all exactly the same size, so each takes exactly one day to complete at every stage. (I know this isn’t the case in reality but that would only make things worse.) The delivery plan as work flows through the process is shown below:

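    | Day | Backlog (Waiting) | Specify (Doing) | Specify (Done) | Build (Doing) | Build (Done) | Verify (Doing) | Verify (Done) | Release (Doing) | Done |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 10 | | | | | | | | |
    | 2 | 9 | 1 | | | | | | | |
    | 3 | 8 | 1 | 1 | | | | | | |
    | 4 | 7 | 1 | 2 | | | | | | |
    | 5 | 6 | 1 | 3 | | | | | | |
    | 6 | 5 | 1 | 4 | | | | | | |
    | 7 | 4 | 1 | 5 | | | | | | |
    | 8 | 3 | 1 | 6 | | | | | | |
    | 9 | 2 | 1 | 7 | | | | | | |
    | 10 | 1 | 1 | 8 | | | | | | |
    | 11 | | 1 | 9 | | | | | | |
    | 12 | | | 10 | | | | | | |
    | 13 | | | 9 | 1 | | | | | |
    | 14 | | | 8 | 1 | 1 | | | | |
    | 15 | | | 7 | 1 | 2 | | | | |
    | 16 | | | 6 | 1 | 3 | | | | |
    | 17 | | | 5 | 1 | 4 | | | | |
    | 18 | | | 4 | 1 | 5 | | | | |
    | 19 | | | 3 | 1 | 6 | | | | |
    | 20 | | | 2 | 1 | 7 | | | | |
    | 21 | | | 1 | 1 | 8 | | | | |
    | 22 | | | | 1 | 9 | | | | |
    | 23 | | | | | 10 | | | | |
    | 24 | | | | | 9 | 1 | | | |
    | 25 | | | | | 8 | 1 | 1 | | |
    | 26 | | | | | 7 | 1 | 2 | | |
    | 27 | | | | | 6 | 1 | 3 | | |
    | 28 | | | | | 5 | 1 | 4 | | |
    | 29 | | | | | 4 | 1 | 5 | | |
    | 30 | | | | | 3 | 1 | 6 | | |
    | 31 | | | | | 2 | 1 | 7 | | |
    | 32 | | | | | 1 | 1 | 8 | | |
    | 33 | | | | | | 1 | 9 | | |
    | 34 | | | | | | | 10 | | |
    | 35 | | | | | | | | 10 | |
    | 36 | | | | | | | | | 10 |
    | Total | | 10 | 100 | 10 | 100 | 10 | 55 | 1 | |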

    Each step in the above process has an active ‘Doing’ column and a waiting ‘Done’ queue. Since the output queue for one stage is the input queue for the next I leave the work items in the done queue until they can actively be worked on. This helps calculate the active time and waiting time metrics for each process.

    Since the release stage doesn’t involve processing items at the individual level you can see the whole package is released in one go, so the release counts as a single day of active time.

    Adding up the Doing columns gives the active time and adding up the Done columns gives the waiting time. For this configuration, with a maximum work in progress of 10, the timings and the average cycle time are:

    | Metric | Days |
    |---|---|
    | Cycle Time | 29.5 |
    | Active Time | 31 |
    | Waiting Time | 255 |
    | Total Time | 286 |
    | Calendar Time | 36 |
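
    The arithmetic behind these figures can be sketched in a few lines of C#. This is only one reading of the table that happens to reproduce its numbers: active time is the sum of the Doing totals, waiting time is the sum of the Done totals, and the average cycle time is taken as the item-days each work item spends between starting Specify and being released. The `FlowMetrics` class and its parameter names are illustrative, not part of the original exercise.

    ```csharp
    using System;
    using System.Linq;

    public static class FlowMetrics
    {
        // doingTotals: item-days in Specify, Build and Verify (Doing columns).
        // queueTotals: item-days in the Done columns, i.e. time spent queuing.
        // releases:    number of releases, each occupying the release team for one day.
        public static (int Active, int Waiting, int Total, double AvgCycle) Summarise(
            int[] doingTotals, int[] queueTotals, int releases, int items)
        {
            int active = doingTotals.Sum() + releases;
            int waiting = queueTotals.Sum();

            // Assumption: every item spends one day inside a release, so the
            // per-item view counts `items` release days rather than `releases`.
            double avgCycle = (doingTotals.Sum() + items + waiting) / (double)items;

            return (active, waiting, active + waiting, avgCycle);
        }

        public static void Main()
        {
            // Totals row of the batched delivery table.
            var m = Summarise(new[] { 10, 10, 10 }, new[] { 100, 100, 55 }, releases: 1, items: 10);
            Console.WriteLine($"Active {m.Active}, waiting {m.Waiting}, total {m.Total}, average cycle {m.AvgCycle}");
        }
    }
    ```

    Fed with the totals row above, this gives 31 active days, 255 waiting days, 286 in total and an average cycle time of 29.5 days.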

    So what if we don’t batch the work into a document?

    Limiting WIP by Sequencing

    This time the only thing we will change is to pass work to the next queue as soon as we have completed it. Everything else stays the same.

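    | Day | Backlog (Waiting) | Specify (Doing) | Specify (Done) | Build (Doing) | Build (Done) | Verify (Doing) | Verify (Done) | Release (Doing) | Done |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 10 | | | | | | | | |
    | 2 | 9 | 1 | | | | | | | |
    | 3 | 8 | 1 | 1 | | | | | | |
    | 4 | 7 | 1 | 1 | 1 | | | | | |
    | 5 | 6 | 1 | 1 | 1 | 1 | | | | |
    | 6 | 5 | 1 | 1 | 1 | 1 | 1 | | | |
    | 7 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | | |
    | 8 | 3 | 1 | 1 | 1 | 1 | 1 | 2 | | |
    | 9 | 2 | 1 | 1 | 1 | 1 | 1 | 3 | | |
    | 10 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | | |
    | 11 | | 1 | 1 | 1 | 1 | 1 | 5 | | |
    | 12 | | | 1 | 1 | 1 | 1 | 6 | | |
    | 13 | | | | 1 | 1 | 1 | 7 | | |
    | 14 | | | | | 1 | 1 | 8 | | |
    | 15 | | | | | | 1 | 9 | | |
    | 16 | | | | | | | 10 | | |
    | 17 | | | | | | | | 10 | |
    | 18 | | | | | | | | | 10 |
    | Total | | 10 | 10 | 10 | 10 | 10 | 55 | 1 | |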

    Immediately we can see the whole thing is far shorter. Calculating the metrics for a WIP of 1 we have:

    | Metric | Days |
    |---|---|
    | Cycle Time | 11.5 |
    | Active Time | 31 |
    | Waiting Time | 75 |
    | Total Time | 106 |
    | Calendar Time | 18 |

    Note the active time is still 31 days. It’s the same amount of effort as before, but the whole thing is done in half the calendar time because we removed 180 days of waiting time (255 down to 75). Consequently the average cycle time has dropped by over half too. Remember, nothing else changed apart from removing the batch.
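
    To see where the waiting time goes, the two hand-off styles can be compared with a small schedule calculation. This is a sketch under the same idealised assumptions as the tables (equal items, one day per stage, work starting on day 2); the `HandOffComparison` class and the closed-form day formulas are my own shorthand for the table layout rather than anything from a library.

    ```csharp
    using System;

    public static class HandOffComparison
    {
        // Returns the total waiting days and the release day for `items` equal work
        // items passing through `stages` one-day stages, then one single release.
        // batched = true : a stage starts only once the previous stage has finished every item.
        // batched = false: each item moves on as soon as it is done, resting one day in the queue.
        public static (int Waiting, int ReleaseDay) Simulate(int items, int stages, bool batched)
        {
            // Day on which item i (1-based) is worked on in stage s (0-based).
            int DoingDay(int i, int s) => batched
                ? 1 + s * (items + 1) + i   // a whole batch plus one hand-off day per earlier stage
                : 1 + i + 2 * s;            // one day doing and one day queued per earlier stage

            // The package is released after the last item has spent a day in the final queue.
            int releaseDay = DoingDay(items, stages - 1) + 2;

            int waiting = 0;
            for (int i = 1; i <= items; i++)
            {
                int daysBeforeRelease = releaseDay - DoingDay(i, 0); // from starting Specify up to the release day
                waiting += daysBeforeRelease - stages;               // whatever was not active stage work
            }
            return (waiting, releaseDay);
        }

        public static void Main()
        {
            var batch = Simulate(items: 10, stages: 3, batched: true);
            var flow = Simulate(items: 10, stages: 3, batched: false);
            Console.WriteLine($"Batched:   {batch.Waiting} waiting days, released on day {batch.ReleaseDay}");
            Console.WriteLine($"Sequenced: {flow.Waiting} waiting days, released on day {flow.ReleaseDay}");
        }
    }
    ```

    With 10 items and 3 stages this gives 255 waiting days and a release on day 35 for the batched hand-off, against 75 waiting days and a release on day 17 for the sequenced one, matching the two tables.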

    I could stop here but this is a good point to look at further optimization.

    Remove the Queues

    The next simplest optimization is to remove the queues between stages. This is harder in the real world due to uneven work size and arrival rate but we can look at the theoretical flow.

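    | Day | Backlog (Waiting) | Specify (Doing) | Build (Doing) | Verify (Doing) | Release (Waiting) | Release (Doing) | Done |
    |---|---|---|---|---|---|---|---|
    | 1 | 10 | | | | | | |
    | 2 | 9 | 1 | | | | | |
    | 3 | 8 | 1 | 1 | | | | |
    | 4 | 7 | 1 | 1 | 1 | | | |
    | 5 | 6 | 1 | 1 | 1 | 1 | | |
    | 6 | 5 | 1 | 1 | 1 | 2 | | |
    | 7 | 4 | 1 | 1 | 1 | 3 | | |
    | 8 | 3 | 1 | 1 | 1 | 4 | | |
    | 9 | 2 | 1 | 1 | 1 | 5 | | |
    | 10 | 1 | 1 | 1 | 1 | 6 | | |
    | 11 | | 1 | 1 | 1 | 7 | | |
    | 12 | | | 1 | 1 | 8 | | |
    | 13 | | | | 1 | 9 | | |
    | 14 | | | | | 10 | | |
    | 15 | | | | | | 10 | |
    | 16 | | | | | | | 10 |
    | Total | | 10 | 10 | 10 | 55 | 1 | |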

    The stats are better still. Active time is still 31 days, but the only waiting is at the end of the process because the release team still needs to deliver a bunch of work into live in one go. Counted the same way as in the other configurations, the average cycle time is only 9.5 days.

    | Metric | Days |
    |---|---|
    | Cycle Time | 9.5 |
    | Active Time | 31 |
    | Waiting Time | 55 |
    | Total Time | 86 |
    | Calendar Time | 16 |

    At this point the only optimization left is to ask if the release team can increase their deployment frequency. It could be possible since items are queuing at release.

    Increase Release Frequency

    Whilst we could release every day like the very best DevOps practitioners, let’s assume there is a limit in the release process which means every other day is the best possible.

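    | Day | Backlog (Waiting) | Specify (Doing) | Build (Doing) | Verify (Doing) | Release (Waiting) | Release (Doing) | Done |
    |---|---|---|---|---|---|---|---|
    | 1 | 10 | | | | | | |
    | 2 | 9 | 1 | | | | | |
    | 3 | 8 | 1 | 1 | | | | |
    | 4 | 7 | 1 | 1 | 1 | | | |
    | 5 | 6 | 1 | 1 | 1 | 1 | | |
    | 6 | 5 | 1 | 1 | 1 | 2 | | |
    | 7 | 4 | 1 | 1 | 1 | 1 | 2 | |
    | 8 | 3 | 1 | 1 | 1 | 2 | | 2 |
    | 9 | 2 | 1 | 1 | 1 | 1 | 2 | |
    | 10 | 1 | 1 | 1 | 1 | 2 | | 4 |
    | 11 | | 1 | 1 | 1 | 1 | 2 | |
    | 12 | | | 1 | 1 | 2 | | 6 |
    | 13 | | | | 1 | 1 | 2 | |
    | 14 | | | | | 2 | | 8 |
    | 15 | | | | | | 2 | |
    | 16 | | | | | | | 10 |
    | Total | | 10 | 10 | 10 | 15 | 5 | |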

    The stats for this configuration are interesting - for the first time the active time has gone up because we are asking the release team to do 4 more releases. Similarly, the calendar time remains the same because we are still limited by the time it takes to get the last work items released but there is far less waiting and the overall cycle time has dropped to just 5.5 days.

    | Metric | Days |
    |---|---|
    | Cycle Time | 5.5 |
    | Active Time | 35 |
    | Waiting Time | 15 |
    | Total Time | 50 |
    | Calendar Time | 16 |
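
    The effect of release frequency can be explored with a short calculation. The sketch below assumes the queue-free flow from the table (item i is specified on day i + 1 and verified on day i + 3) and models "every other day" as releasing whenever two items are waiting, with the release taking place the day after the batch is complete. `ReleaseCadence` and `batchSize` are illustrative names, and the model is a simplification of the table rather than a general scheduler.

    ```csharp
    using System;

    public static class ReleaseCadence
    {
        // Items flow one per day through Specify, Build and Verify with no queues
        // between them, then wait until `batchSize` items are ready for a one-day release.
        public static (int Releases, int Waiting, double AvgCycle) Simulate(int items, int batchSize)
        {
            int releases = 0, waiting = 0, cycleDays = 0;
            for (int first = 1; first <= items; first += batchSize)
            {
                int last = Math.Min(first + batchSize - 1, items);

                // The last item of the batch is verified on day last + 3, spends the
                // next day in Release Waiting, and the batch is released the day after.
                int releaseDay = (last + 3) + 2;
                releases++;

                for (int i = first; i <= last; i++)
                {
                    int verifyDay = i + 3;
                    waiting += releaseDay - verifyDay - 1;   // days spent in Release Waiting
                    cycleDays += releaseDay - (i + 1) + 1;   // Specify through Release, inclusive
                }
            }
            return (releases, waiting, cycleDays / (double)items);
        }

        public static void Main()
        {
            foreach (int size in new[] { 10, 2 })
            {
                var r = Simulate(items: 10, batchSize: size);
                Console.WriteLine($"Batch of {size}: {r.Releases} release(s), {r.Waiting} waiting days, active time {30 + r.Releases}");
            }
        }
    }
    ```

    For 10 items, a release batch of 2 gives 5 releases, 15 waiting days and an average cycle time of 5.5 days, as in the table above, while a single batch of 10 gives 1 release and the 55 waiting days of the previous section.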

    Summary

    This simple exercise demonstrates how hidden waiting time delays deliveries and how smaller batch sizes can rectify that. In reality you will need some queuing between stages to balance out demand in this configuration. Look out for hidden queues:

    • Documents handed from one team to another - the entire document is a batch.
    • Pipelined iterations in scrum teams - where the requirements are written in the sprint before dev and the testing in the sprint after; the iteration backlog becomes the batch.
    • Done states between Doing states
    • Approval steps

    If you practice Scrum then aim to complete work in a sprint (specify, develop, and verify) or you are leaving waiting time on the table.

    Photo by David Clode on Unsplash

    This entry was posted in agile and tagged #metrics #lean #devops.
  • Code on a laptop screen with dark theme

    Metrics are only useful if they help you improve. Code coverage KPIs are most often circumvented. Liberal use of the ExcludeFromCodeCoverage attribute is to be avoided.

    Abuse of the ExcludeFromCodeCoverage attribute

    I once worked on a project that had a mandatory code coverage target. If your commit didn’t maintain the overall coverage ratio of 75% then it was rejected. The reasoning came from good intentions: high code coverage is good, therefore we will mandate that it must be high. It had some unfortunate side effects though. The first unintended consequence was that developers only wrote enough tests to keep the value above the target instead of considering how they needed to test their code. The second consequence was the proliferation of [ExcludeFromCodeCoverage] attributes adorning all sorts of classes.

    The attribute was originally designed for generated code but more recently I’ve seen it applied to code that is hard to test or too simple to test. Ultimately this hides code from testing metrics. Your code coverage metric is no longer accurate nor useful.

    “Yeah, we are at 85% code coverage ignoring the code we excluded because it was hard to test.”

    My immediate response to this is “How come it isn’t 100% then?”

    I would much rather have a lower, but accurate, code coverage metric so I consider the use of this attribute harmful on any code that isn’t generated.

    I wish code analysis tools like SonarQube or the Roslyn analyzers treated this attribute like SuppressMessageAttribute - the Justification property should be filled out when applied. Better still, just fix the issue and avoid the need to use either of them.
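
    Until a tool enforces this, a rough version of the check can be sketched with reflection. This assumes .NET 5 or later, where ExcludeFromCodeCoverageAttribute has its own Justification property; the `ExclusionAudit` class and the two sample types are hypothetical, and the sketch only looks at types, not at individual methods or properties.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;
    using System.Linq;
    using System.Reflection;

    public static class ExclusionAudit
    {
        // Returns the names of types in `assembly` that carry [ExcludeFromCodeCoverage]
        // without a non-empty Justification.
        public static IReadOnlyList<string> FindUnjustified(Assembly assembly) =>
            assembly.GetTypes()
                .Where(t => t.GetCustomAttribute<ExcludeFromCodeCoverageAttribute>() is { } attr
                            && string.IsNullOrWhiteSpace(attr.Justification))
                .Select(t => t.FullName ?? t.Name)
                .OrderBy(name => name)
                .ToList();

        public static void Main()
        {
            foreach (var name in FindUnjustified(typeof(ExclusionAudit).Assembly))
                Console.WriteLine($"Missing justification: {name}");
        }
    }

    // Hypothetical sample types for the demonstration.
    [ExcludeFromCodeCoverage(Justification = "Generated by a tool")]
    public class GeneratedClient { }

    [ExcludeFromCodeCoverage]
    public class HardToTestService { }
    ```

    Run against its own assembly, the sketch reports HardToTestService and leaves GeneratedClient alone. A check like this could sit in a unit test so that unexplained exclusions fail the build, although that placement is a suggestion rather than an established practice.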

    Why should I test properties?

    The consensus on “Should you Unit Test simple properties?” seems pretty much in favour of testing, and you can use the examples here to test yours. I prefer property-based testing though because it can find edge cases you didn’t think of. There is a great intro at Property-Based Testing with C# using FsCheck. Your code will effectively look like:

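    // Assumes FsCheck 2.x with the FsCheck.Xunit package; the wrapper class name is illustrative.
    using FsCheck;
    using FsCheck.Xunit;

    public class ClassYouWantToTestProperties
    {
        [Property]
        public Property Set_Then_Get_Returns_Same(string exampleValue)
        {
            var target = new ClassYouWantToTest();
            target.PropertyToTest = exampleValue;
            // The property holds when the getter returns exactly what the setter stored.
            return (target.PropertyToTest == exampleValue).ToProperty();
        }
    }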

    FsCheck will generate a bunch of random values to try this test with (100 by default), so these few lines of code exercise the property against many distinct inputs including, for this example: blank string, null value, very short, very long, non-printing, accented characters, etc.

    Why should I test … something else?

    I will come back and add more examples as I encounter them.

    Summary

    • Code coverage metrics are only useful if they help improve the software quality.
    • Mandatory targets can lead to harmful practices such as excluding code from testing or writing superficial tests.
    • ExcludeFromCodeCoverage attributes should be treated in the same way as SuppressMessage - they hide warnings that should really be fixed.

    Photo by Luca Bravo on Unsplash

    This entry was posted in code and tagged #metrics #testing #csharp.