Better software through software architecture and devops

@jamessnape

The hidden effects of queuing on cycle time

"Queue for Coffee". You know they serve good coffee when you see this. A line of people queuing for coffee at a street vendor. Rusty's Market, Cairns Australia.

Queues are used to load-balance work between teams, but it's easy to overlook the decrease in efficiency they bring.

It's fairly common knowledge that batching work causes inefficiencies. This is why a core technique for lean software delivery is to map your value stream and then limit the amount of work in progress. I wanted to see just how much efficiency can be gained by removing queues, so I started to map this with an idealized example.

Traditional Batched Delivery

Imagine we have a fairly standard software delivery pipeline: specify, build, verify, and release. I’m being purposefully vague here as the actual work doesn’t matter. In a traditional waterfall-style delivery you would generally work on the specify part first to create a big requirements document; chuck it over the wall to the next team to build; once they have done that, the built code gets chucked at the next team to verify; and finally the finished product is released to live.

It's not that obvious, but the requirements document, the built code, and the release package are all queues. Each contains a batch of work for the next stage.

Let's visualize this by creating an ideal delivery pipeline. Imagine we have 10 work items, all exactly the same size, so they all take exactly one day to complete at every stage. (I know this isn’t the case in reality, but that would only make things worse.) The delivery plan as work flows through the process is shown below:

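| Day | Backlog (Waiting) | Specify (Doing) | Specify (Done) | Build (Doing) | Build (Done) | Verify (Doing) | Verify (Done) | Release (Doing) | Done |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | | | | | | | | |
| 2 | 9 | 1 | | | | | | | |
| 3 | 8 | 1 | 1 | | | | | | |
| 4 | 7 | 1 | 2 | | | | | | |
| 5 | 6 | 1 | 3 | | | | | | |
| 6 | 5 | 1 | 4 | | | | | | |
| 7 | 4 | 1 | 5 | | | | | | |
| 8 | 3 | 1 | 6 | | | | | | |
| 9 | 2 | 1 | 7 | | | | | | |
| 10 | 1 | 1 | 8 | | | | | | |
| 11 | | 1 | 9 | | | | | | |
| 12 | | | 10 | | | | | | |
| 13 | | | 9 | 1 | | | | | |
| 14 | | | 8 | 1 | 1 | | | | |
| 15 | | | 7 | 1 | 2 | | | | |
| 16 | | | 6 | 1 | 3 | | | | |
| 17 | | | 5 | 1 | 4 | | | | |
| 18 | | | 4 | 1 | 5 | | | | |
| 19 | | | 3 | 1 | 6 | | | | |
| 20 | | | 2 | 1 | 7 | | | | |
| 21 | | | 1 | 1 | 8 | | | | |
| 22 | | | | 1 | 9 | | | | |
| 23 | | | | | 10 | | | | |
| 24 | | | | | 9 | 1 | | | |
| 25 | | | | | 8 | 1 | 1 | | |
| 26 | | | | | 7 | 1 | 2 | | |
| 27 | | | | | 6 | 1 | 3 | | |
| 28 | | | | | 5 | 1 | 4 | | |
| 29 | | | | | 4 | 1 | 5 | | |
| 30 | | | | | 3 | 1 | 6 | | |
| 31 | | | | | 2 | 1 | 7 | | |
| 32 | | | | | 1 | 1 | 8 | | |
| 33 | | | | | | 1 | 9 | | |
| 34 | | | | | | | 10 | | |
| 35 | | | | | | | | 10 | |
| 36 | | | | | | | | | 10 |
| Total | | 10 | 100 | 10 | 100 | 10 | 55 | 1 | |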

Each step in the above process has an active ‘Doing’ column and a waiting ‘Done’ queue. Since the output queue for one stage is the input queue for the next, I leave the work items in the done queue until they can actively be worked on. This helps calculate the active time and waiting time metrics for each stage.

Since the release stage doesn’t involve processing items at the individual level you can see the whole package is released in one go.

If we add up the values in the columns to work out the active and waiting times, and also calculate the average cycle time, this configuration has a maximum work in progress of 10 and the following timings:

| Metric | Days |
|---|---|
| Cycle Time | 29.5 |
| Active Time | 31 |
| Waiting Time | 255 |
| Total Time | 286 |
| Calendar Time | 36 |
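These figures can be reproduced from the column totals. The sketch below is illustrative only, not the spreadsheet behind the table: the variable names are invented here, and it assumes cycle time is counted from the day an item enters Specify through the day it is released, inclusive, which is the reading that matches the 29.5 figure.

```python
# Column totals from the batched table, in item-days (names are illustrative).
doing = {"specify": 10, "build": 10, "verify": 10, "release": 1}
done_queues = {"specify": 100, "build": 100, "verify": 55}

active_time = sum(doing.values())
waiting_time = sum(done_queues.values())
total_time = active_time + waiting_time

# Assumption: item i (1..10) enters Specify on day i + 1 and the whole
# batch is released on day 35; both days count towards its cycle time.
RELEASE_DAY = 35
cycle_times = [RELEASE_DAY - (i + 1) + 1 for i in range(1, 11)]
average_cycle_time = sum(cycle_times) / len(cycle_times)

print(active_time, waiting_time, total_time, average_cycle_time)
# prints: 31 255 286 29.5
```

Note that the release counts as a single active day even though it moves all 10 items, which is why the Release total is 1 rather than 10.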

So what if we don’t batch the work into a document?

Limiting WIP by Sequencing

This time the only thing we will change is to pass work to the next queue as soon as we have completed it. Everything else stays the same.

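| Day | Backlog (Waiting) | Specify (Doing) | Specify (Done) | Build (Doing) | Build (Done) | Verify (Doing) | Verify (Done) | Release (Doing) | Done |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | | | | | | | | |
| 2 | 9 | 1 | | | | | | | |
| 3 | 8 | 1 | 1 | | | | | | |
| 4 | 7 | 1 | 1 | 1 | | | | | |
| 5 | 6 | 1 | 1 | 1 | 1 | | | | |
| 6 | 5 | 1 | 1 | 1 | 1 | 1 | | | |
| 7 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | | |
| 8 | 3 | 1 | 1 | 1 | 1 | 1 | 2 | | |
| 9 | 2 | 1 | 1 | 1 | 1 | 1 | 3 | | |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | | |
| 11 | | 1 | 1 | 1 | 1 | 1 | 5 | | |
| 12 | | | 1 | 1 | 1 | 1 | 6 | | |
| 13 | | | | 1 | 1 | 1 | 7 | | |
| 14 | | | | | 1 | 1 | 8 | | |
| 15 | | | | | | 1 | 9 | | |
| 16 | | | | | | | 10 | | |
| 17 | | | | | | | | 10 | |
| 18 | | | | | | | | | 10 |
| Total | | 10 | 10 | 10 | 10 | 10 | 55 | 1 | |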

Immediately we can see the whole thing is far shorter. Calculating the metrics with each stage limited to a WIP of 1, we have:

| Metric | Days |
|---|---|
| Cycle Time | 11.5 |
| Active Time | 31 |
| Waiting Time | 75 |
| Total Time | 106 |
| Calendar Time | 18 |

Note the active time is still 31 - it's the same amount of effort as before, but the whole thing is done in half the time because we removed 180 days of waiting time (255 down to 75)! Consequently the average cycle time has dropped by over half too. Remember, nothing else changed apart from removing the batch.
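The waiting and cycle time figures for the sequenced flow can also be derived by following each item through the pipeline. This is a rough sketch under the assumptions visible in the table: one item leaves the backlog per day, each item spends one day in the done queue after Specify and after Build, and the whole package is released on day 17. The variable names are made up for the illustration, and cycle time is assumed to count both the start day and the release day.

```python
ITEMS = 10
RELEASE_DAY = 17  # read off the table: the package goes live on day 17

waiting_time = 0
cycle_times = []
for i in range(1, ITEMS + 1):
    specify_day = i + 1            # one item leaves the backlog per day
    build_day = specify_day + 2    # after one day in the Specify done queue
    verify_day = build_day + 2     # after one day in the Build done queue
    first_day_verified = verify_day + 1
    # Two single-day hand-off queues, then the wait for the batched release.
    waiting_time += 2 + (RELEASE_DAY - first_day_verified)
    # Assumption: start day and release day both count towards cycle time.
    cycle_times.append(RELEASE_DAY - specify_day + 1)

average_cycle_time = sum(cycle_times) / ITEMS
print(waiting_time, average_cycle_time)  # prints: 75 11.5
```

Most of the remaining waiting (55 of the 75 days) is verified work sitting in front of the single release.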

I could stop here but this is a good point to look at further optimization.

Remove the Queues

The next simplest optimization is to remove the queues between stages. This is harder in the real world due to uneven work size and arrival rate but we can look at the theoretical flow.

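| Day | Backlog (Waiting) | Specify (Doing) | Build (Doing) | Verify (Doing) | Release (Waiting) | Release (Doing) | Done |
|---|---|---|---|---|---|---|---|
| 1 | 10 | | | | | | |
| 2 | 9 | 1 | | | | | |
| 3 | 8 | 1 | 1 | | | | |
| 4 | 7 | 1 | 1 | 1 | | | |
| 5 | 6 | 1 | 1 | 1 | 1 | | |
| 6 | 5 | 1 | 1 | 1 | 2 | | |
| 7 | 4 | 1 | 1 | 1 | 3 | | |
| 8 | 3 | 1 | 1 | 1 | 4 | | |
| 9 | 2 | 1 | 1 | 1 | 5 | | |
| 10 | 1 | 1 | 1 | 1 | 6 | | |
| 11 | | 1 | 1 | 1 | 7 | | |
| 12 | | | 1 | 1 | 8 | | |
| 13 | | | | 1 | 9 | | |
| 14 | | | | | 10 | | |
| 15 | | | | | | 10 | |
| 16 | | | | | | | 10 |
| Total | | 10 | 10 | 10 | 55 | 1 | |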

The stats are better still - active time is still 31 days, but there is only waiting at the end of the process because the release team still needs to deliver a bunch of work into live in one go. Counted the same way as in the previous configurations, the average cycle time is only 9.5 days.

| Metric | Days |
|---|---|
| Cycle Time | 9.5 |
| Active Time | 31 |
| Waiting Time | 55 |
| Total Time | 86 |
| Calendar Time | 16 |
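The Release waiting column can be rebuilt day by day, which shows that all of the remaining waiting comes from work piling up in front of the single release. The snippet is a sketch that assumes, as in the table, that item i is in Verify on day i + 3 and that the release happens on day 15; the names are hypothetical.

```python
ITEMS = 10
RELEASE_DAY = 15    # read off the table above
CALENDAR_DAYS = 16

# Item i is in Verify on day i + 3, so it first waits for release on day i + 4.
ready_days = [i + 4 for i in range(1, ITEMS + 1)]

# Length of the release queue on each day: items that are verified
# but have not yet reached the release day.
queue_by_day = {
    day: sum(1 for ready in ready_days if ready <= day < RELEASE_DAY)
    for day in range(1, CALENDAR_DAYS + 1)
}

waiting_time = sum(queue_by_day.values())
print(queue_by_day[5], queue_by_day[14], waiting_time)  # prints: 1 10 55
```

The queue grows by one item per day from day 5 to day 14, so the only way to shrink the total further is to release more often.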

At this point the only optimization left is to ask if the release team can increase their deployment frequency. It could be possible since items are queuing at release.

Increase Release Frequency

Whilst we could release every day like the very best DevOps practitioners, let's assume there is a limit in the release process which means every other day is the best possible.

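| Day | Backlog (Waiting) | Specify (Doing) | Build (Doing) | Verify (Doing) | Release (Waiting) | Release (Doing) | Done |
|---|---|---|---|---|---|---|---|
| 1 | 10 | | | | | | |
| 2 | 9 | 1 | | | | | |
| 3 | 8 | 1 | 1 | | | | |
| 4 | 7 | 1 | 1 | 1 | | | |
| 5 | 6 | 1 | 1 | 1 | 1 | | |
| 6 | 5 | 1 | 1 | 1 | 2 | | |
| 7 | 4 | 1 | 1 | 1 | 1 | 2 | |
| 8 | 3 | 1 | 1 | 1 | 2 | | 2 |
| 9 | 2 | 1 | 1 | 1 | 1 | 2 | |
| 10 | 1 | 1 | 1 | 1 | 2 | | 4 |
| 11 | | 1 | 1 | 1 | 1 | 2 | |
| 12 | | | 1 | 1 | 2 | | 6 |
| 13 | | | | 1 | 1 | 2 | |
| 14 | | | | | 2 | | 8 |
| 15 | | | | | | 2 | |
| 16 | | | | | | | 10 |
| Total | | 10 | 10 | 10 | 15 | 5 | |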

The stats for this configuration are interesting - for the first time the active time has gone up because we are asking the release team to do 4 more releases. Similarly, the calendar time remains the same because we are still limited by the time it takes to get the last work items released but there is far less waiting and the overall cycle time has dropped to just 5.5 days.

| Metric | Days |
|---|---|
| Cycle Time | 5.5 |
| Active Time | 35 |
| Waiting Time | 15 |
| Total Time | 50 |
| Calendar Time | 16 |
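The effect of the release cadence can be checked by assigning each verified item to the next available release slot. This is a minimal sketch, assuming the release days shown in the table (7, 9, 11, 13 and 15) and that an item which becomes ready on a release day has to wait for the following slot, as the table implies. The names are illustrative.

```python
from bisect import bisect_right

ITEMS = 10
RELEASE_DAYS = [7, 9, 11, 13, 15]  # release slots read off the table above

waiting_time = 0
cycle_times = []
for i in range(1, ITEMS + 1):
    specify_day = i + 1
    ready_day = specify_day + 3  # first day the item waits for release
    # First release slot strictly after the day the item becomes ready.
    release_day = RELEASE_DAYS[bisect_right(RELEASE_DAYS, ready_day)]
    waiting_time += release_day - ready_day
    # Assumption: start day and release day both count towards cycle time.
    cycle_times.append(release_day - specify_day + 1)

# Each release counts as one active day, however many items it carries.
active_time = 3 * ITEMS + len(RELEASE_DAYS)
average_cycle_time = sum(cycle_times) / ITEMS
print(active_time, waiting_time, average_cycle_time)  # prints: 35 15 5.5
```

Changing `RELEASE_DAYS` is a quick way to explore other cadences, as long as the last slot falls after the final item is ready.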

Summary

This simple exercise demonstrates how hidden waiting time delays deliveries and how smaller batch sizes can rectify that. In reality you will need some queuing between stages to balance out demand in this configuration. Look out for hidden queues:

  • Documents handed from one team to another - the entire document is a batch.
  • Pipelined iterations in scrum teams - where the requirements are written in the sprint before dev and the testing in the sprint after; the iteration backlog becomes the batch.
  • Done states between Doing states.
  • Approval steps.

If you practice Scrum then aim to complete work in a sprint (specify, develop, and verify) or you are leaving waiting time on the table.
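To put the four configurations side by side, the active and total times from the tables above can be turned into a single ratio: the share of item-days spent on active work. Lean practitioners often call this flow efficiency; the term and the comparison are an addition here rather than something the exercise itself reports, and the dictionary below is just a convenient way to hold the published totals.

```python
# (active days, total days) per configuration, copied from the tables above.
configurations = {
    "batched": (31, 286),
    "sequenced": (31, 106),
    "no queues": (31, 86),
    "release every other day": (35, 50),
}

# Share of all item-days that were spent actively working rather than waiting.
efficiencies = {
    name: active / total for name, (active, total) in configurations.items()
}

for name, efficiency in efficiencies.items():
    print(f"{name}: {efficiency:.1%} of item-days are active work")
# batched: 10.8%, sequenced: 29.2%, no queues: 36.0%,
# release every other day: 70.0%
```

Even with slightly more active effort, the last configuration spends most of its time doing work rather than waiting for it.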

Photo by David Clode on Unsplash
