Only test changes
We are living in the ‘survival of the fastest’ era. We don’t have time for anything. We prefer reading blogs to books, and we look for tweets rather than lengthy press releases. So when it comes to testing a release that has only a few changes, we don’t have time to run all the tests.
But the question is: which subset of tests should we run?
I have touched on this subject in Test small vs. all, but looking at build change logs and picking tests to run is a task that requires decision making. What if we could detect the changes automatically and run tests based on them?
That is possible through TIAMaps. No, the term is not entirely mine, but part of it is. It originates from Microsoft’s concept of ‘Test Impact Analysis’, which I got to know from this Martin Fowler blog post. I’d recommend reading it first.
If you are lazier than me and couldn’t finish the whole post, below is a summary along with a picture copied from there:
First you determine which pieces of your source code are touched by your tests, and you store this information in some sort of map. Then, when your source code changes, you look up the affected tests in the map and run just those tests.
Below is a summary of the TIAMap implementation in our project.
Why we needed it:
We didn’t do it for fun or because “let’s do something shiny and new”. We were running out of time. Our unit test suite has around six thousand tests, and a complete run (yes, they run in parallel) takes about 20 minutes. Hmmm… even a little change that needs to go out has to wait through 20 minutes of unit test execution. That’s bad. Let’s see what others are doing. Oh yeah, Test Impact Analysis is the solution.
Generating TIA Maps
Code coverage comes to the rescue. If we already have a tool that finds out which lines of code are touched by all tests, can’t we have a list of source files that are touched by a single test?
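To make the idea concrete, here is a minimal sketch of recording which source files a single test touches. It is not our actual job: it uses Python's built-in `sys.settrace` as a stand-in for a real coverage tool, and `files_touched_by`, `add`, and `test_add` are hypothetical names made up for illustration.

```python
import sys


def files_touched_by(test_func):
    """Run test_func and return the set of source files it executed code from."""
    touched = set()

    def tracer(frame, event, arg):
        # The global trace function is called for every new Python frame.
        if event == "call":
            touched.add(frame.f_code.co_filename)
        return None  # file-level granularity is enough, so skip line tracing

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test_func()
    finally:
        # Always restore whatever tracer (debugger, coverage) was active before.
        sys.settrace(previous)
    return touched


# Hypothetical production code and test, for illustration only.
def add(a, b):
    return a + b


def test_add():
    assert add(2, 3) == 5


# One map entry: test name -> source file names.
tia_entry = {test_add.__name__: sorted(files_touched_by(test_add))}
```

A real coverage tool does this far more efficiently, but the output we care about is the same: one entry per test listing the files it reached.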
So we configured a job that runs the tests and saves this simple map: test name -> source file names. We learned two lessons:
- Initially, we had a job that would run for all 6,000 tests, and it was taking days. We became smarter: after generating the first TIA Map for all tests, we only update the maps for the tests that changed. We don’t have a way to find the names of the tests that changed, so our job relies on the timestamps of the files that contain test code.
- We were storing the Map in a SQLite Db. As the Db had to be pushed to our repository again and again, it was difficult to find the deltas between versions. We switched to a simple text file to store the Map. Changes can be seen in our source control tools, and anyone can open those text files to inspect them.
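Both lessons can be sketched in a few lines of Python. This is an illustration under assumptions, not our real job: the one-line-per-test text format, the `test_*.py` file pattern, and the function names are all hypothetical.

```python
from pathlib import Path


def dump_map(tia_map):
    """Serialise {test name: source files} as sorted 'test: file, file' lines."""
    lines = []
    for test in sorted(tia_map):
        lines.append(f"{test}: {', '.join(sorted(tia_map[test]))}")
    return "\n".join(lines) + "\n"


def load_map(text):
    """Parse the text produced by dump_map back into a dict of sets."""
    tia_map = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        test, _, files = line.partition(": ")
        tia_map[test] = {f for f in files.split(", ") if f}
    return tia_map


def stale_test_files(test_dir, map_path):
    """Return test files modified more recently than the stored map."""
    map_mtime = Path(map_path).stat().st_mtime
    return sorted(
        p for p in Path(test_dir).rglob("test_*.py")  # assumed naming pattern
        if p.stat().st_mtime > map_mtime
    )
```

Sorting both the tests and the file names keeps the text file stable between runs, so a source control diff shows only the entries that really changed.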
As you can imagine, the hard part is getting those TIAMaps. Once we have them, we do the following:
- When there is a need to run tests, we determine which source files have changed since the last run.
- We have a Python script that does the magic of consulting the maps and getting a list of tests to be executed.
- We feed that list of tests to our existing test execution program.
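The lookup step above can be sketched roughly like this. It is not our actual script. The map contents and file names are made up, and `invert` and `select_tests` are hypothetical helpers.

```python
def invert(tia_map):
    """Turn {test: source files} into {source file: tests that touch it}."""
    source_to_tests = {}
    for test, sources in tia_map.items():
        for source in sources:
            source_to_tests.setdefault(source, set()).add(test)
    return source_to_tests


def select_tests(tia_map, changed_files):
    """Return the sorted names of all tests that touch any changed file."""
    source_to_tests = invert(tia_map)
    selected = set()
    for path in changed_files:
        # Files missing from the map (e.g. brand-new files) select nothing here.
        selected |= source_to_tests.get(path, set())
    return sorted(selected)


# Made-up map and change list, for illustration only.
tia_map = {
    "test_login": {"auth.py", "session.py"},
    "test_logout": {"session.py"},
    "test_report": {"report.py"},
}
print(select_tests(tia_map, ["session.py"]))  # ['test_login', 'test_logout']
```

In this sketch a changed file that appears in no map entry selects no tests, so a real script would probably want a fallback for that case, such as running the full suite.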
How is it going?
It is too early to say, as we have rolled this out as a pilot, and I may have more insights into the results in a few months. But the initial feedback indicates that we are on the right path. A lot of time is being saved, and we are watching for any issues that may arise from faulty maps or execution logic.
Have you ever tried anything similar? Or would you like to try it out?