The build is broken! How much time do we waste?

Continuous Integration (CI) encourages frequent commits to the trunk instead of running an extensive QA process before each merge. I like Continuous Integration because code changes are visible to everyone immediately. If someone makes a mistake, we can detect it before the problem is buried deep in the source code history.

However, one of my colleagues has a concern.

When someone breaks the build, developers who happen to download the broken source tree are blocked from working. It’s a waste of engineering resources.

That’s right, but I wonder how much time we actually waste.

Suppose each developer, on average, breaks the build for 30 minutes per week and we have five developers on the team. The average downtime of the build is then 0.5 \times 5 = 2.5 (hours/week). Also suppose each developer downloads the source tree 4 times a day, so the team performs 4 \times 5 (developers) \times 5 (days) = 100 (gets/week).

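Treat each download as an independent Bernoulli trial that hits a broken-build period with probability p. The probability that exactly k of the n downloads are blocked then follows the binomial distribution:

\binom{n}{k}p^k(1-p)^{(n-k)}

For the case where no download hits a broken-build period (= nobody is blocked from working), k = 0, with p = \frac{2.5 (hours)}{40 (hours)} and n = 100, thus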

P_0 = \binom{100}{0}(\frac{2.5}{40})^0(1-\frac{2.5}{40})^{(100-0)} = 0.00157

Similarly, the probability that only one download operation hits the broken build period is:

P_1 = \binom{100}{1}(\frac{2.5}{40})^1(1-\frac{2.5}{40})^{(100-1)} = 0.01050

And so on.
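
These probabilities can be checked with R's built-in binomial density `dbinom`. The snippet below is only a sketch under the parameters stated above; the variable names `n`, `p`, and `probs` are chosen here for illustration.

```r
n <- 100        # downloads per week: 4 per day x 5 developers x 5 days
p <- 2.5 / 40   # fraction of the 40-hour week during which the build is broken

# P(exactly k downloads hit a broken-build period) for k = 0 and k = 1
probs <- dbinom(0:1, size = n, prob = p)

# Compare with P_0 = 0.00157 and P_1 = 0.01050 in the text
print(round(probs, 5))
```

The same call with `0:n` as its first argument returns the whole distribution at once, which is what the remaining steps rely on.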

Now, suppose a developer who hits a broken build has to wait 30 minutes before it is fixed. The case where exactly one download operation hits the broken build period contributes P_1 \times 0.5 (hours) \times 1 (time) to the expected blocking period, which is

\binom{100}{1}(\frac{2.5}{40})^1(1-\frac{2.5}{40})^{(100-1)} \times 0.5 \times 1 = 0.00525 (hours)

Similarly, the expected blocking period when just two download operations hit the broken build period is:

\binom{100}{2}(\frac{2.5}{40})^2(1-\frac{2.5}{40})^{(100-2)} \times 0.5 \times 2 = 0.0346 (hours)

And so on.

The total expected blocking period per week should be

\sum_{i=0}^{100}{\left(\binom{100}{i}(\frac{2.5}{40})^i(1-\frac{2.5}{40})^{(100-i)} \times 0.5 \times i\right)}=3.13 (hours)
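
The individual terms and the total can be reproduced in a few lines of R. This is a minimal sketch with the same parameters as above; `wait`, `contribution`, and `total` are names introduced here. Because the mean of a binomial distribution is n \times p, the sum should also equal 0.5 \times n \times p, which gives a quick sanity check.

```r
n    <- 100       # downloads per week
p    <- 2.5 / 40  # probability that a download hits a broken-build period
wait <- 0.5       # hours lost per blocked download

k <- 0:n
# Contribution of "exactly k blocked downloads" to the expected blocking time
contribution <- dbinom(k, size = n, prob = p) * wait * k

print(contribution[k == 1])  # about 0.00525 hours
print(contribution[k == 2])  # about 0.0346 hours

total <- sum(contribution)
print(total)                 # about 3.125 hours (3.13 in the text)

# Closed form from the binomial mean n * p
print(wait * n * p)
```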

Therefore, with the given parameters, we waste 3.13 hours per week. As our total engineering time is 5 (developers) \times 40 (hours) = 200 (hours), the waste ratio is about 3.13 / 200 = 0.0156, which is 1.56%.

The following graph shows the expected waste time ratio for various team sizes. Interestingly, the relationship is linear.
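
The linearity can be seen directly from the model. As a sketch, write N for the number of developers, b = 0.5 hours for both the weekly broken time per developer and the wait per blocked download, d = 20 for the downloads per developer per week, and H = 40 for the working hours per week. These symbols are introduced here for illustration and are not part of the original calculation. Using the fact that a binomial distribution has mean np:

```latex
n = dN, \qquad p = \frac{bN}{H}

\text{waste} = b \cdot np = \frac{b^2 d N^2}{H}

\text{ratio} = \frac{\text{waste}}{N H} = \frac{b^2 d}{H^2} N = \frac{0.5^2 \times 20}{40^2} N = \frac{N}{320}
```

With N = 5 this gives 5 / 320 = 0.0156, matching the 1.56% computed for the five-developer team.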

Considering the benefits of CI, I would say a blocking time of less than 5% is acceptable. According to the graph, therefore, it’s not something we should worry about until the team grows beyond 16 developers.

I used the following R script for the calculation.

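brokenPeriod <- 0.5       # Build-broken period per developer = 30 min per week (in hours)
devCount <- 1:50          # Team sizes to evaluate (number of developers)
downloadPerWeek <- 4 * 5  # Each developer refreshes the source tree four times a day, five days a week

# Expected hours per week lost to blocked downloads for a team of numOfDevs
wasteTimeF <- function(numOfDevs){
  totalDownload <- numOfDevs * downloadPerWeek   # downloads per week across the team
  blockedDownload <- 1:totalDownload             # possible counts of blocked downloads (0 contributes nothing)
  p <- (brokenPeriod * numOfDevs) / 40           # fraction of the 40-hour week the build is broken

  # Each blocked download costs brokenPeriod hours of waiting
  sum(dbinom(blockedDownload, totalDownload, p) * (blockedDownload * brokenPeriod))
}

# Expected waste as a fraction of the team's total engineering hours
wasteRatioF <- function(numOfDevs){
  wasteTimeF(numOfDevs) / (numOfDevs * 40)
}

wasteRatio <- sapply(devCount, wasteRatioF)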
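
To draw the graph and read off the 5% threshold, something like the following could be used. It is a self-contained sketch that recomputes the ratio in compact form: `hoursPerWeek` is a name introduced here for the 40 that the script hard-codes, and the `1e-9` tolerance is an added assumption to absorb floating-point rounding at the exact 5% point.

```r
brokenPeriod    <- 0.5    # hours the build is broken per developer per week
downloadPerWeek <- 4 * 5  # downloads per developer per week
hoursPerWeek    <- 40     # working hours per developer per week (assumed name)
devCount        <- 1:50   # team sizes to evaluate

wasteRatio <- sapply(devCount, function(numOfDevs) {
  totalDownload <- numOfDevs * downloadPerWeek
  p <- brokenPeriod * numOfDevs / hoursPerWeek
  k <- 0:totalDownload
  wasteTime <- sum(dbinom(k, totalDownload, p) * k * brokenPeriod)
  wasteTime / (numOfDevs * hoursPerWeek)
})

# Waste ratio in percent against team size, with the 5% line for reference
plot(devCount, wasteRatio * 100, type = "l",
     xlab = "Number of developers",
     ylab = "Expected blocked time (% of engineering hours)")
abline(h = 5, lty = 2)

# Largest team size whose expected waste ratio does not exceed 5%
maxTeam <- max(devCount[wasteRatio <= 0.05 + 1e-9])
print(maxTeam)
```

Under these parameters `maxTeam` comes out as 16, consistent with the reading of the graph above.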

About Moto

Engineer who likes coding