Jan 11, 2019

In search of a lightning-fast feedback loop in a large codebase

Hongbo Zhang
Compiler & Build System

Important: This is an archived blog post, kept for historical reasons. Please note that this information might be outdated.

Feedback loops

As software developers, when we touch a codebase, we want the edit-compile cycle to be as short as possible. Studies show that a feedback loop under 1s, or better yet 100ms, keeps developers from getting distracted.

This is achievable in a small project, but as a project grows to 10k files, or even 100k files in a large company like Google or Facebook, it becomes extremely challenging.

In this article, we talk about how BuckleScript, which supports both OCaml and ReasonML syntax, tries to solve this issue.

The edit-build cycle in BuckleScript involves two components: the compiler, which does type checking and code generation, and the scheduler, which figures out what to rebuild and how to do it concurrently.

The importance of the compiler's cold performance

In order to reduce edit-build latency, some languages adopt an in-memory compiler plus watch mode. We think this is neither a scalable nor a reliable approach.

A compiler is a complex piece of software, and the chance that it has a memory leak is not low. Such leaks are rarely observed in the real world because the compiler is mostly used in a short-lived setting: it starts up fast and dies off quickly.

However, this assumption no longer holds when the compiler is put in server mode. When 10k or 100k files are held in memory, it is very easy to run into OOM (out-of-memory) issues.

We figured that to deliver a scalable and reliable system, it is better to decouple the compiler's complexity from the scheduler. When the compiler cold-starts and dies off quickly, the operating system's process mechanism serves as an obviously correct garbage collector, which increases the reliability of the whole system.

To reduce the latency of a single compiler invocation, we spent lots of time tweaking the performance of the compiler itself, for example, rewriting hot paths in C code; most of the BuckleScript compiler's source code is written in an imperative, C-like style to avoid allocation.

To have a general idea of how fast BuckleScript's compiler runs:

```sh
test>cat fib.ml
let rec fib = function
  | 0 | 1 -> 1
  | n -> fib (n - 1) + fib (n - 2)
test>time /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-cmi -bs-cmj -c fib.ml

real    0m0.008s
user    0m0.004s
sys     0m0.003s
```

Having a compiler that runs fast from a cold start lays the groundwork so that it won't be the bottleneck in the whole process. This matters because, when the architecture decouples the compiler from the scheduler, the compiler may be invoked by the scheduler hundreds of thousands of times in a build cycle, so the latency of compiling each single compilation unit adds up.

The art of being incremental

Having a compiler that runs fast from a cold start does not solve the scalability issue on its own. Take a codebase containing 100K source files: at 100ms per file, a full rebuild would take 10,000s, which is not acceptable. To solve this, we need to reduce the workload as much as possible during each edit-build cycle.

In a statically typed language, suppose we have two compilation units A and B, where B depends on A. Whenever A changes, B gets recompiled; to make it worse, the recompilation propagates, and all units that depend on B get recompiled, resulting in a snowball effect. In this model, whenever we touch a non-leaf compilation unit, the latency is large.

The key observation for BuckleScript is that B does not really depend on the last-modified time of A; it depends on the intermediate output of A (e.g. the .cmj and .cmi files), which may not change even if we modify A. What we work on is reducing the probability that changing A changes A's intermediate output. With a scheduler that understands this, the propagation can be stopped as early as possible.

In BuckleScript, each compilation unit is composed of two files, the implementation file and the interface file, which are compiled to the intermediate outputs .cmj and .cmi respectively.

Interface builds are completely separate from implementation builds: they do not depend on .cmj (the implementation's intermediate output) at all.

Suppose the interface is unchanged. Whenever we touch the implementation file, the data structure of .cmj is designed so that its content seldom changes. (Let's say the probability of a change is 0.05; it happens in rare cases, for example when the arity of a function changes.)
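To make this concrete, here is a small, hypothetical example (the module and value names are invented for illustration, and the single-file module stands in for a separate .ml/.mli pair); the comments sketch which kinds of implementation edits tend to leave the .cmj output stable and which do not:

```ocaml
(* [Demo] plays the role of demo.ml, and its signature plays the role of
   demo.mli. *)
module Demo : sig
  val area : int -> int -> int
end = struct
  (* Edits confined to the body (renaming locals, changing the algorithm,
     adding unexported helpers) normally leave the cross-module information
     recorded in the unit's .cmj untouched, so dependents are not recompiled. *)
  let area w h = w * h

  (* Rewriting the definition as
       let area w = fun h -> w * h
     keeps the same type, so the interface output (.cmi) is unchanged, but
     the syntactic arity recorded in the .cmj drops from 2 to 1; this is the
     rarer kind of edit (the "0.05" case above) that forces direct dependents
     to recompile. *)
end
```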

If neither .cmj nor .cmi changes, the scheduler stops the propagation there.

Suppose the .cmj file does change (P = 0.05); then A's dependent B gets recompiled. Note that B.cmi only depends on A.cmi, so it will not be recompiled; only B.cmj will be, and the chance that B.cmj's content changes is even lower. In practice, the probability that the propagation chain is longer than two is less than

0.05 * 0.05 = 0.0025

This means that in an edit-build cycle, whenever A changes, it may have many direct dependents, but the longest rebuild chain almost always settles within two compilation units. Since the scheduler runs tasks in parallel and the longest sequence is bounded by two, the rebuild cycle costs very close to compiling two compilation units.
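Put differently, under the simplifying assumption (ours, for the sake of the estimate) that each level of the chain changes its .cmj independently with probability p = 0.05, the expected depth of a rebuild chain rooted at A is roughly

1 + p + p^2 + ... = 1 / (1 - p) ≈ 1.05

so in expectation an edit costs little more than recompiling the unit that was actually touched.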

In the implementation, we also generalized the idea to stop the propagation of .cmi changes, so that edits like adding comments to an interface file settle quickly.

The worst case is that the interface of a root module changes. Even then, it does not mean all of its dependents get recompiled; it depends on how many of their interfaces depend on the root's interface. The cool thing is that the interface dependency chain is completely decoupled from the implementation dependency chain, and in practice it is a subset of it, so only a subset of the dependents gets recompiled. We will see a concrete example later.
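As a hypothetical illustration (the module names below are invented, and each module stands in for a separate .ml/.mli pair), a unit can depend on the root's implementation without its interface depending on the root at all, which is exactly what keeps the interface propagation short:

```ocaml
(* Root plays the role of the root unit whose interface we edit. *)
module Root : sig
  type t = { id : int }
  val make : int -> t
end = struct
  type t = { id : int }
  let make id = { id }
end

(* User_a's interface mentions Root.t, so as separate files user_a.cmi
   would depend on root.cmi and be rebuilt when Root's interface changes,
   propagating further to whoever mentions User_a in turn. *)
module User_a : sig
  val wrap : Root.t -> string
end = struct
  let wrap r = "#" ^ string_of_int r.Root.id
end

(* User_b's interface never mentions Root, so user_b.cmi would be
   unaffected by a Root interface change; only user_b.cmj would be
   recompiled because the implementation uses Root, and by the earlier
   argument that recompilation usually changes nothing further downstream. *)
module User_b : sig
  val describe : int -> string
end = struct
  let describe n = string_of_int (Root.make n).Root.id
end
```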

A fast scheduler

As we said, the time spent in an edit-build cycle is mostly composed of two parts: invoking the compiler, and scheduling.

We are reusing the very fast scheduler provided by Ninja:

Where other build systems are high-level languages, Ninja aims to be an assembler.

BuckleScript outputs assembler-style instructions consumed by Ninja, which does scheduling very fast and provides good parallelism. By integrating with Ninja's restat attribute, we are able to implement the idea of a specialized content-based build system.
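The gist of that combination can be sketched as follows. This is a toy model, not BuckleScript's or Ninja's actual code: here change detection uses content digests, whereas in practice the compiler avoids rewriting an unchanged output and Ninja's restat notices that its timestamp did not move. The point is the same: dependents are only revisited when an output really changed.

```ocaml
(* Toy content-based rebuild propagation. [outputs_of], the output naming
   and the dependency map are hypothetical stand-ins. *)

let digest_opt path =
  if Sys.file_exists path then Some (Digest.file path) else None

(* Each compilation unit produces an interface artifact and an
   implementation artifact, mirroring .cmi / .cmj. *)
let outputs_of name = [ name ^ ".cmi"; name ^ ".cmj" ]

(* [rebuild dependents compile name] recompiles [name] and only recurses
   into its direct dependents when some output's content actually changed;
   an edit that leaves .cmi and .cmj byte-identical stops the propagation
   right here. A real scheduler would also deduplicate shared dependents
   and run them in parallel. *)
let rec rebuild (dependents : string -> string list) compile name =
  let before = List.map digest_opt (outputs_of name) in
  compile name;
  let after = List.map digest_opt (outputs_of name) in
  if before <> after then
    List.iter (rebuild dependents compile) (dependents name)
```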

This setup is fast enough for 99% of use cases. For a project with fewer than 1k files, the time spent in an edit-build cycle is dominated by the compiler; as the project grows to 10K files or more, the time spent in the compiler stays stable and bounded thanks to our incremental design, but the cycle starts to be dominated by the scheduler.

For a no-op build of around 10k files, it takes Ninja around 700ms to figure out that nothing needs to be rebuilt.

The current Ninja model is simple: every time it is invoked, it re-reads the build.ninja instructions, stats the artifacts, and does the scheduling.

Instead of making the compiler long-lived, we propose to make the scheduler long-lived. The complexity of a scheduler is significantly lower than that of a compiler. Keeping the scheduler in memory helps avoid redundant work such as re-parsing the build.ninja instructions, which are around 2MB in size for 10k files. Integrated with a watch mode, it would not need to stat all artifacts each time, which should help scale the scheduler to 100K files or even more.
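A hypothetical shape for such a long-lived scheduler is sketched below; none of this is an existing API, the graph type, watcher and compiler hook are stubs, and [rebuild] is the content-based propagation sketch from above. The structural point is to parse the build description and stat artifacts once, then serve watcher events from memory instead of redoing that work on every edit.

```ocaml
type graph = {
  direct_dependents : string -> string list;  (* reverse dependency edges *)
  compile : string -> unit;                    (* invoke the cold-start compiler *)
}

(* Stub: in reality this would parse the ~2MB build.ninja once and stat artifacts. *)
let parse_build_graph (_build_file : string) : graph =
  { direct_dependents = (fun _ -> []); compile = (fun _ -> ()) }

(* Stub: in reality this would block on a file watcher and return edited units. *)
let wait_for_changes () : string list = []

let serve build_file =
  let g = parse_build_graph build_file in      (* done once, not per edit-build cycle *)
  let rec loop () =
    let changed = wait_for_changes () in
    List.iter (rebuild g.direct_dependents g.compile) changed;
    loop ()
  in
  loop ()
```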

Tests on a synthetic benchmark

Our synthetic benchmark is borrowed from OMake and is publicly available here.

The benchmark has these characteristics: The task is to build n^2 libraries with n^2 modules each (for a given small number n), and the dependencies between the modules are created in a way so that we can stress both the dependency analyzer of the build utility and the ability to run commands in parallel.

We modified the benchmark to add an interface file for each implementation file.

The benchmark is run on a MacBook Pro (2018) with a 2.6 GHz Intel Core i7 CPU and 32 GB of 2400 MHz DDR4 memory.

The test is run with n = 3, 5, 7, 9, so the number of source files is 2*3^4 = 162, 2*5^4 = 1250, 2*7^4 = 4802, and 2*9^4 = 13122 respectively.

Below is what we get for a cold build from scratch:

Source size | Clean build (ms)
162         | 684
1250        | 5,100
4802        | 24,112
13122       | 125,248
Source size | Nop build (ms) | Touching root module m_1_1_1_1.ml (ms) | Touching m_1_1_1.mli (ms)
162         | 16             | 59                                      | 54
1250        | 79             | 120                                     | 133
4802        | 266            | 369                                     | 367
13122       | 728            | 963                                     | 962

We can see from the tables that as the size grows, the time spent in the edit-build cycle shifts from compilation to the scheduler; this is because the Ninja scheduler does not persist any of its work between runs.

Source size | Adding a value to root module m_1_1_1_1.ml (ms) | Changing root interface (ms)
162         | 56                                               | 70
1250        | 131                                              | 155
4802        | 370                                              | 428
13122       | 969                                              | 991

The verbose build log for adding a value to the root module, at source size 13122, is as below:

```sh
test9>ninja -C lib/bs -v
ninja: Entering directory `lib/bs'
[1/2] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -c -o src/dir_1_1/m_1_1_1_1.mlast -bs-syntax-only -bs-binary-ast /Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.ml
[2/2] /usr/local/lib/node_modules/bs-platform/lib/bsb_helper.exe -g 0 -MD src/dir_1_1/m_1_1_1_1.mlast
[1/5905] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I src/dir_4_6 -I src/dir_4_1 -I src/dir_4_8 -I src/dir_6_5 -I src/dir_6_2 -I src/dir_8_3 -I src/dir_8_4 -I src/dir_2_3 -I src/dir_2_4 -I src/dir_6_3 -I src/dir_4_9 -I src/dir_6_4 -I src/dir_4_7 -I src/dir_2_5 -I src/dir_2_2 -I src/dir_8_5 -I src/dir_8_2 -I src/dir_9_9 -I src/dir_3_7 -I src/dir_1_3 -I src/dir_9_7 -I src/dir_3_9 -I src/dir_1_4 -I src/dir_7_6 -I src/dir_7_1 -I src/dir_5_5 -I src/dir_7_8 -I src/dir_5_2 -I src/dir_3_8 -I src/dir_9_6 -I src/dir_1_5 -I src/dir_9_1 -I src/dir_1_2 -I src/dir_3_6 -I src/dir_9_8 -I src/dir_3_1 -I src/dir_5_3 -I src/dir_5_4 -I src/dir_7_9 -I src/dir_7_7 -I src/dir_8_7 -I src/dir_2_9 -I src/dir_8_9 -I src/dir_2_7 -I src/dir_4_2 -I src/dir_6_8 -I src/dir_4_5 -I src/dir_6_1 -I src/dir_6_6 -I src/dir_2_1 -I src/dir_2_6 -I src/dir_8_8 -I src/dir_8_1 -I src/dir_2_8 -I src/dir_8_6 -I src/dir_6_7 -I src/dir_6_9 -I src/dir_4_4 -I src/dir_4_3 -I src/dir_7_2 -I src/dir_7_5 -I src/dir_5_8 -I src/dir_5_1 -I src/dir_5_6 -I src/dir_1_9 -I src/dir_3_4 -I src/dir_3_3 -I src/dir_9_4 -I src/dir_1_7 -I src/dir_9_3 -I src/dir_5_7 -I src/dir_7_4 -I src/dir_5_9 -I src/dir_7_3 -I src/dir_9_2 -I src/dir_1_1 -I src/dir_9_5 -I src/dir_1_6 -I src/dir_3_2 -I src/dir_1_8 -I src/dir_3_5 -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_1_1.cmj -c src/dir_1_1/m_1_1_1_1.mlast
File "/Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.ml", line 4, characters 4-5:
Warning 32: unused value a.
```

Ignoring the preprocessing stage, we can see that there are 5905 jobs to be scheduled but only one job is processed. This is because adding a single value does not change m_1_1_1_1.cmj, so the propagation of changes is stopped immediately.

The result is surprisingly good even when we change the interface of the root file, as the verbose build log shows:

```sh
test9>ninja -C lib/bs -v
ninja: Entering directory `lib/bs'
[1/2] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -c -o src/dir_1_1/m_1_1_1_1.mliast -bs-syntax-only -bs-binary-ast /Users/hongbozhang/git/bsb-bench/test9/src/dir_1_1/m_1_1_1_1.mli
[2/2] /usr/local/lib/node_modules/bs-platform/lib/bsb_helper.exe -g 0 -MD src/dir_1_1/m_1_1_1_1.mliast
[1/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-no-builtin-ppx-mli -bs-no-implicit-include -I src/dir_4_6 -I src/dir_4_1 -I src/dir_4_8 -I src/dir_6_5 -I src/dir_6_2 -I src/dir_8_3 -I src/dir_8_4 -I src/dir_2_3 -I src/dir_2_4 -I src/dir_6_3 -I src/dir_4_9 -I src/dir_6_4 -I src/dir_4_7 -I src/dir_2_5 -I src/dir_2_2 -I src/dir_8_5 -I src/dir_8_2 -I src/dir_9_9 -I src/dir_3_7 -I src/dir_1_3 -I src/dir_9_7 -I src/dir_3_9 -I src/dir_1_4 -I src/dir_7_6 -I src/dir_7_1 -I src/dir_5_5 -I src/dir_7_8 -I src/dir_5_2 -I src/dir_3_8 -I src/dir_9_6 -I src/dir_1_5 -I src/dir_9_1 -I src/dir_1_2 -I src/dir_3_6 -I src/dir_9_8 -I src/dir_3_1 -I src/dir_5_3 -I src/dir_5_4 -I src/dir_7_9 -I src/dir_7_7 -I src/dir_8_7 -I src/dir_2_9 -I src/dir_8_9 -I src/dir_2_7 -I src/dir_4_2 -I src/dir_6_8 -I src/dir_4_5 -I src/dir_6_1 -I src/dir_6_6 -I src/dir_2_1 -I src/dir_2_6 -I src/dir_8_8 -I src/dir_8_1 -I src/dir_2_8 -I src/dir_8_6 -I src/dir_6_7 -I src/dir_6_9 -I src/dir_4_4 -I src/dir_4_3 -I src/dir_7_2 -I src/dir_7_5 -I src/dir_5_8 -I src/dir_5_1 -I src/dir_5_6 -I src/dir_1_9 -I src/dir_3_4 -I src/dir_3_3 -I src/dir_9_4 -I src/dir_1_7 -I src/dir_9_3 -I src/dir_5_7 -I src/dir_7_4 -I src/dir_5_9 -I src/dir_7_3 -I src/dir_9_2 -I src/dir_1_1 -I src/dir_9_5 -I src/dir_1_6 -I src/dir_3_2 -I src/dir_1_8 -I src/dir_3_5 -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_1_1.cmi -c src/dir_1_1/m_1_1_1_1.mliast
[2/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_1_1.cmj -c src/dir_1_1/m_1_1_1_1.mlast
[3/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -w -30-40+6+7+27+32..39+44+45+101 -nostdlib -I '/Users/hongbozhang/git/bsb-bench/test9/node_modules/bs-platform/lib/ocaml' -color always -o src/dir_1_1/m_1_1_2_2.cmj -c src/dir_1_1/m_1_1_2_2.mlast
[4/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_1.cmj -c src/dir_1_1/m_1_1_2_1.mlast
[5/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_3.cmj -c src/dir_1_1/m_1_1_2_3.mlast
[6/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_9.cmj -c src/dir_1_1/m_1_1_2_9.mlast
[7/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_4.cmj -c src/dir_1_1/m_1_1_2_4.mlast
[8/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_5.cmj -c src/dir_1_1/m_1_1_2_5.mlast
[9/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_6.cmj -c src/dir_1_1/m_1_1_2_6.mlast
[10/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_7.cmj -c src/dir_1_1/m_1_1_2_7.mlast
[11/5906] /usr/local/lib/node_modules/bs-platform/lib/bsc.exe -bs-package-name test -bs-package-output commonjs:lib/js/src/dir_1_1 -bs-assume-has-mli -bs-no-builtin-ppx-ml -bs-no-implicit-include -I ... -color always -o src/dir_1_1/m_1_1_2_8.cmj -c src/dir_1_1/m_1_1_2_8.mlast
```

The log shows that there are 5906 jobs to be scheduled but only 11 jobs processed. This is because there are only 9 direct dependents: the propagation of .cmj/.cmi changes stops at the first level of dependents, and indirect dependents are not touched since the intermediate output stays stable after that. There are 9 more jobs to do compared with changing the implementation, but the latency only increased by about 30ms, which means the scheduler parallelizes the work well.

A real-world scenario may be a bit worse than our synthetic benchmark, since the benchmark only imposes dependencies across implementation files; the dependencies across interface files are flat.

To conclude, BuckleScript currently scales very well to projects with around 10k files, and with some work on the scheduler, we believe it is not too difficult to scale it to 100K files or more. To reach such scale reliably, we propose a long-lived scheduler combined with a fast cold-start compiler. The language's interface design and the design of the implementation's intermediate data structure keep the longest rebuild propagation chain bounded by two in most cases.
