To keep a constant heartbeat of releases, testing has to take center stage. It can no longer be an activity performed at the end of the release cycle; it has to happen during all phases of a release. And mind you, the release cycle has shrunk from months to days.
In the book “Effective DevOps”, the authors lay out many ideas for moving towards a DevOps culture. On automation, they suggest that automation tools belong to one of the following three types:
- Scheduled Automation: This tool runs on a predefined schedule.
- Triggered Automation: This tool runs when a specific event happens…
- On-Demand Automation: This tool is run by the user, often on the command line…
(Page 185 under Test and Build Automation section)
The way we took up this advice to ramp up our Continuous Testing efforts is that each kind of testing we perform should be available in all three forms:
- Scheduled Testing: Daily or hourly tests to ensure that the changes made during that time were merged successfully and that recent changes caused no disruptions.
- Triggered Testing: Testing that gets triggered by some action. For example, a CI job that runs tests because a Developer pushed a change.
- On-Demand Testing: Testing that is executed on an as-needed basis. A quick run of tests to find out how things are on a certain front.
Take Performance testing, for example. It should be scheduled to find issues on a daily or weekly basis, but it could also be triggered as part of the release criteria, or it could be run On-Demand by an individual Developer on her box.
In order to achieve this, we redefined our testing jobs to allow all three options at once. As the idea was to use tools in this way, we picked Jenkins.
There are other options too, like GoCD, and Microsoft Team Foundation Server (TFS) is also catching up, but Jenkins has the largest set of available plugins for a variety of tasks. Also, our prime use case was to use Jenkins as an Automation Server; we have yet to define delivery pipelines.
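One way to picture a job that supports all three forms is a single entry point that builds the same test command no matter who or what started it. The sketch below is only an illustration: the suite names, the script names and the `plan_run` helper are made up for this example and are not our actual Jenkins configuration.

```python
# Hypothetical suites and commands, for illustration only.
SUITES = {
    "unit": ["python", "run_unit_tests.py"],
    "performance": ["python", "run_performance_tests.py"],
}

TRIGGERS = ("scheduled", "triggered", "on-demand")


def plan_run(suite, trigger):
    """Return the command to run and a label recording how it was started."""
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite}")
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    # The command depends only on the suite, never on the trigger,
    # so every form of testing exercises exactly the same tests.
    return {"command": list(SUITES[suite]), "label": f"{suite} ({trigger})"}
```

A nightly schedule would ask for `plan_run("performance", "scheduled")`, a push would ask for `plan_run("performance", "triggered")`, and a Developer at her box would ask for `plan_run("performance", "on-demand")`. All three get the same command back; only the label differs.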
(the original icon is at: http://www.iconarchive.com/show/plump-icons-by-zerode/Document-scheduled-tasks-icon.html )
I’ll write separately on Triggered and On-Demand testing soon. For now, here are some details on how we accomplished Scheduled Testing.
We had a few physical and virtual machines on which we were using Windows Task Scheduler to run tasks. A task would kick off on a given day and time and trigger a Python script. The Python scripts were in a local Mercurial repository on one of these boxes.
The testing jobs were scheduled perfectly, but the schedule and outcome of these jobs were not known to the rest of the team. Only the testing team knew when these jobs ran and whether the last job was successful or not.
We made one of the boxes the Jenkins master and the others slaves. We configured Jenkins jobs and defined the schedules there. We also moved all our Python scripts to a shared Mercurial repository on a server that anyone could access. We also created custom parts in our home-grown build system that allow running pieces in series or in parallel.
Given that Jenkins gives you a web view that can be accessed by all, the testing schedule became visible to everyone. We did have a “Testing Dashboard”, but it was an effort to keep it up to date. Now anyone in the team could also see how the last few jobs of, say, Performance testing went and what the results were.
Moving to Jenkins and making our scripts public also helped us make the same set of tests Triggered and On-Demand. More details on how will follow in coming posts.
I wish I could show “Before” and “After” pictures, as many marketing campaigns do, to show how beautiful it looks now.
Do you have Scheduled testing in place? What tools do you use and what policies do you apply?
We are living in the ‘survival of the fastest’ era. We don’t have time for anything. We prefer reading blogs instead of books and we look for tweets rather than lengthy press releases. So when it comes to testing a release that has only a few changes, we don’t have time to run all the tests.
The question is: which subset of tests should we be running?
I have touched on this subject in Test small vs. all, but looking at build change logs and picking tests to run is a task that requires decision making. What if we could detect the changes automatically and run tests based upon that?
That is possible through TIAMaps. No, this term is not mine, but part of it is. It originates from Microsoft’s concept of ‘Test Impact Analysis’, which I got to know from this blog post by Martin Fowler. I’d recommend reading it first.
If you are lazier than me and couldn’t finish the whole post, below is a summary along with a picture copied from there:
First you determine which pieces of your source code are touched by your tests, and you store this information in some sort of map. Then, when your source code changes, you look up the affected tests in the map and run just those tests.
Below is a summary of the TIAMap implementation in our project.
Why we needed it:
We didn’t do it for fun or out of “let’s do something shiny and new”. We are running out of time. Our unit test suite has around six thousand tests, and a complete run (yes, they run in parallel) takes about 20 minutes. Hmmm… a little change that needs to go out has to sit through 20 minutes of unit test execution; that’s bad. Let’s see what others are doing. Oh yeah, Test Impact Analysis is the solution.
Generating TIA Maps
Code coverage comes to the rescue. If we already have a tool that finds out which lines of code are touched by all tests, can’t we have a list of source files that are touched by a single test?
So we configured a job that runs the tests and saves this simple map: test name -> source file names. There were two lessons that we learned:
- Initially, we had a job that would run for all 6,000 tests, and it was taking days. We became smarter: after generating the first TIA Map for all tests, we only update the maps for tests that changed. We don’t have a way to find the names of the tests that changed, so our job relies on the timestamps of the files that contain test code.
- We were storing the Map in a SQLite database. As the database had to be pushed to our repository again and again, it was difficult to see what had changed between versions. We switched to a simple text file to store the Map. Changes can now be seen in our source control tools, and anyone can inspect those text files.
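To make the two lessons concrete, here is a minimal sketch of a text-based map and a timestamp check. The line format (`test name: file1, file2`), the function names and the idea of passing modification times in as a dictionary are all assumptions made for this illustration; our real job differs in its details.

```python
def dump_map(tia_map):
    """Serialize {test name: set of source files} as sorted text lines."""
    lines = []
    for test in sorted(tia_map):
        files = ", ".join(sorted(tia_map[test]))
        lines.append(f"{test}: {files}")
    return "\n".join(lines) + "\n"


def load_map(text):
    """Parse the text produced by dump_map back into a dictionary."""
    tia_map = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        test, _, files = line.partition(": ")
        tia_map[test] = {name for name in files.split(", ") if name}
    return tia_map


def stale_test_files(test_file_mtimes, last_map_update):
    """Return test files modified after the map was last generated."""
    return sorted(path for path, mtime in test_file_mtimes.items()
                  if mtime > last_map_update)
```

Sorting both the tests and the file names keeps the output stable, so a change to one test shows up as a one-line difference in source control. That is exactly what was hard to get from a binary database file.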
As you can imagine, the hard part is to get those TIAMaps. Once we have them, we do the following:
- When there is a need to run tests, we determine which source files have changed since the last run.
- We have a Python script that does the magic of consulting the maps and getting a list of tests to be executed.
- We feed that list of tests to our existing test execution program.
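The three steps above can be sketched roughly as follows. This is not our actual script: the `select_tests` name and the sample data are hypothetical, and the map is assumed to be a dictionary from test name to the set of source files that test touches.

```python
def select_tests(tia_map, changed_files):
    """Return the tests whose recorded source files overlap the change set."""
    changed = set(changed_files)
    return sorted(test for test, sources in tia_map.items()
                  if changed & set(sources))


# Hypothetical sample data.
tia_map = {
    "Geometry.Area": {"shape.cpp", "math_utils.cpp"},
    "Geometry.Perimeter": {"shape.cpp"},
    "Parser.ReadsHeader": {"parser.cpp"},
}

selected = select_tests(tia_map, ["shape.cpp"])
print(selected)  # ['Geometry.Area', 'Geometry.Perimeter']
```

Note that a changed file which appears in no map selects no tests in this sketch. A real pipeline would probably want a fallback for that case, such as running the full suite.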
How is it going?
It is too early to say, as we have rolled this out as a pilot, and I may have more insights into the results in a few months. But the initial feedback indicates that we are on the right path. A lot of time is being saved, and we are watching for any issues that may arise due to faulty maps or execution logic.
Have you ever tried anything similar? Or would you like to try it out?
The only way to improve yourself and your craft is to reflect on the state of affairs. The #StateOfTesting Survey gives us exactly that opportunity, where testers from across the globe give their valuable feedback and then gain value from this collective wisdom.
I have been taking part in the survey for some time and I see that not many testers from Pakistan are doing so. Given that our community is on the rise and we have had our first ever testing conference, it’s time to get in touch with the testing community of the world!
The link to the survey is: http://qablog.practitest.com/state-of-testing/
Thanks for the help and let’s make the (testing) world better than it is today!
Code Coverage is a good indicator of how well your tests cover your code base. It is measured with tools that run your existing set of tests and then report coverage at the file level, the statement or line level, the class and function level, and the branch level.
Since the code coverage report gives you a number, the game of numbers kicks in, just like with any other metric. If you set hard targets, people will chase them, and at times the number means nothing. Here are my opinions, based upon experience, on how to best use Code Coverage in your project:
Do run code coverage every now and then to guide new unit test development
It’s worth running coverage tools every so often and looking at these bits of untested code. Do they worry you that they aren’t being tested?
The question is not how to get the list of untested code; the question is whether we should write tests for that untested code.
In our project, we measure code coverage for functions and spit out a list of functions that are not tested. The testing team does not order the developers to write tests against them. The testing team simply suggests writing tests, and the owner of that code prioritizes the task based upon how critical the piece is and how often it is requested by the Users.
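The list itself is simple to produce once a coverage tool reports which functions were hit. Here is a minimal sketch, assuming two plain collections of function names; the names and the `untested_functions` helper are invented for this example and do not reflect the output format of any particular tool.

```python
def untested_functions(all_functions, covered_functions):
    """Return functions that no test executed, in sorted order."""
    return sorted(set(all_functions) - set(covered_functions))


# Hypothetical function names.
all_functions = ["Open", "Close", "Read", "Write"]
covered = ["Open", "Read"]
print(untested_functions(all_functions, covered))  # ['Close', 'Write']
```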
Dorothy Graham suggests in this excellent talk that coverage can be either like “butter” or like “strawberry jam” on your bread. You decide if you want “butter” coverage, i.e. cover all areas, or “strawberry jam” coverage, i.e. cover some areas in more depth.
Do not set a target of 100% code coverage
Setting a coverage goal is in itself disputed and is often misused, as Brian Marick notes in this paper, which has been the foundation of Code Coverage work ever since. Also, anything that claims 100% is suspicious. Consider the following statements:
- We can’t ship unless we have 100% code coverage
- We want 100% reported defects to be addressed in this release
- We want 100% tests to be executed in each build.
You can easily see that a 100% code coverage target leads to the “Test all Fallacy”, implying that we can test it all. Brian suggests in the same paper that 90 or 95% coverage is good enough.
We have set a target of 90% function coverage, but it is not mandatory for release. We put this information on the table along with other testing data, like test results, occurrence of bugs per area etc., and then leave the decision to ship to the person who is responsible. Remember, the job of testing is to provide information, not to make release decisions.
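In code terms, the target becomes one field in a report rather than a gate. The sketch below assumes a hypothetical `coverage_summary` helper and report layout; only the 90% figure comes from our project.

```python
def coverage_summary(total_functions, covered_functions, target=90.0):
    """Report function coverage against a target without blocking anything."""
    if total_functions <= 0:
        raise ValueError("total_functions must be positive")
    percent = 100.0 * covered_functions / total_functions
    return {
        "percent": round(percent, 1),
        "target": target,
        "meets_target": percent >= target,
        # Advisory only: the ship decision stays with the responsible person.
        "blocks_release": False,
    }
```

For example, `coverage_summary(200, 171)` reports 85.5% against a target of 90%, with `meets_target` set to `False` and `blocks_release` still `False`.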
Yes, there is no simple answer to how much code coverage we need. Read this for amusement and know why we get different answers to this question.
Do some analysis on the code coverage numbers
As numbers can mean different things to different people, we need to ask stakeholders why they need code coverage numbers and what they mean when they ask for something to be covered.
We asked this question and got the answer, which is to do a test heat analysis on our code coverage numbers. It gives us the following information:
- Which pieces are hard to automate, and which are easy?
- Which pieces are to be tested next? (as stated in first Do)
- Which pieces need more manual testing?
- How much effort is needed for Unit testing?
Do use tools
There are language- and technology-specific tools. For our C++ API, we have successfully used Coverage Validator (licensed, but reasonably priced) and OpenCppCoverage (a free tool), which extract coverage information by executing GoogleTest tests.
Do not assume that covered means well tested
You can easily write a test to cover each function or each statement, without testing it well or even without testing it at all.
Along with the function-wise code coverage that I mentioned above, we have a strong code review policy, which includes reviewing the test code. We also write many scenario-level tests that do not add to the coverage but cover the workflows (or the order in which functions will be called) that are more important to our users.
Brian summarizes it nicely in the aforementioned paper:
I wouldn’t have written four coverage tools if I didn’t think they’re helpful. But they’re only helpful if they’re used to enhance thought, not replace it.
How have you used Code Coverage in your projects? What Dos and Don’ts would you like to share?
It happens to all of us. We are used to doing a process in a particular way for years and never think of other ways of doing it. Then someday someone says something. That serves as an eye opener and we start seeing other ways of doing the same thing.
This happened to us with our rule: “Break a build if a single unit test fails”
Sounds very simple and rational. We have had this rule for maybe 10 years, and I have repeated it over and over in all the new projects that we took up in these years. Our structure follows Plan A as shown below, i.e. run tests as part of the build process, and if a single test fails, the build fails.
What changed in 2017 was a quest to find ways to release faster, you know, the DevOps-like stuff. So we started to look at the reasons for the build being broken. We build our source 4-6 times a day, so we had enough data to look into. One of the reasons was always a failing test.
Now we thought, as you must be thinking by now, that this is a good thing: we should not ship a build for which a given test breaks. But our data (and wisdom) suggested that failing tests fell into the following three categories:
- The underlying code changed but the test code was not changed to reflect this. Thus the test fails but the code doesn’t.
- The test is flaky (expect a full blog post on what flaky tests are and what we are doing about them). For now, a flaky test is one that passes and fails with the same code on different occasions.
- The test genuinely fails, i.e. the feature is indeed broken.
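The second category can be detected mechanically: rerun the test several times against the same build and see whether the outcomes disagree. The sketch below assumes a hypothetical `run_test` callable that returns `True` for a pass. It illustrates the idea and is not our tooling.

```python
def classify(run_test, attempts=5):
    """Classify a test as 'pass', 'fail' or 'flaky' from repeated runs."""
    if attempts < 2:
        raise ValueError("need at least two attempts to detect flakiness")
    outcomes = {bool(run_test()) for _ in range(attempts)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"  # both outcomes were seen on identical code
```

A consistent "fail" here still cannot tell the first category from the third. Deciding whether the test or the feature is wrong needs a human look at the change.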
Now the first two are important, and the Developer who wrote the test needs to pay attention. But do they stop the build from being used? Of course not.
The third is a serious issue, but with the notion of ‘Testing in Production’ combined with the fact that a fix is just a few hours away, we figured out a new rule, as shown in Plan B.
Yes, when a build would have failed due to a failing test, we now report bugs for the failing tests and declare the build a Pass. Thus the wheel keeps rolling, and if it needs to be stopped or rolled back, it can be.
A few weeks into this strategy, while all looked good, what we had always feared happened. A build in which 20% of our tests were failing was declared a pass, and our bug tracker saw around 100 new bugs that night. Let’s call that spamming.
That pushed us to move from a binary (fail or pass) implementation to a slightly fuzzy one. We came up with a new rule: if 10 or more tests fail (where 10 is an arbitrary number), we declare the build as failed. Otherwise we follow the same path as before. This is now our current plan, called Plan C.
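The three plans can be compared side by side in a small sketch. The `build_verdict` function and its return shape are hypothetical, and one detail is my assumption rather than something stated above: when Plan C fails the build, no per-test bugs are filed, since avoiding the spam was the point.

```python
def build_verdict(failed_tests, plan="C", threshold=10):
    """Return (verdict, tests to file bugs for) under Plan A, B or C."""
    failed = list(failed_tests)
    if plan == "A":
        # Any failing test breaks the build.
        return ("fail" if failed else "pass", [])
    if plan == "B":
        # Never break on tests; file a bug for each failure.
        return ("pass", failed)
    if plan == "C":
        # Break only when the number of failures reaches the threshold.
        if len(failed) >= threshold:
            return ("fail", [])  # assumption: no bug spam on a failed build
        return ("pass", failed)
    raise ValueError(f"unknown plan: {plan}")
```

With nine failing tests, Plan C passes the build and files nine bugs. With ten, the build fails.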
I know what you are thinking: a single important test could matter much more than 10 relatively unimportant tests. But we don’t have a way to weigh tests for now. When we have that, we can improve our plans further.
Does this sound familiar to your situation? What are your criteria for a passed build in the context of test results?
The modern notion of software development requires working in a team. We work as individual contributors, but it’s the team that delivers the final outcome. From development through production, it is a team effort that enables quality at speed.
I look around and most teams are not as productive as they could be. The individuals I talk to always tell me that they are giving their best, but somehow the net result is not what they want. Those who have such feelings include Project Managers, Scrum Masters, Development Managers, Testing Managers, Developers, Testers and the list goes on.
I think I have a fix to suggest. In fact, a very simple one: “Show Respect”.
You might be thinking: oh, now we’ll get some sermons on the old philosophies of respecting people, and we are in the 21st century. But humans are humans; they only work at their best when they are respected.
Google did a famous study just a couple of years ago, which was summarized in this beautiful (though lengthy) article in the New York Times. It suggests:
In the best teams, members listen to each other and show sensitivity to feelings and needs
More details of the study are here where “Psychological Safety” is defined.
Now you might think or claim that you already do that as a leader or team member. Your organization may have “Respect at the workplace” as one of its business values. But how do you know if you are practicing what you preach? I’m suggesting the following 3 tests to help you answer questions like these: “Am I respectful to my team members?”, “Is my manager respectful to all?”, “Who is not showing respect in the team?”
Respect the presence
I learned this from my grandfather when I was about 8 or 10 years old: whenever anyone in the family visited him, he greeted them by standing up from his seat. He was in his eighties at that time, but he’d stand up even for his 2-year-old grandson or granddaughter.
So here is the test: when someone approaches you at work, how do you respect them? Do you stand up to greet them? Do you offer that person the time they are looking for? Do you respond as if you want them to go away from you?
(the picture is taken from: https://www.practicaletiquette.com/how-to-show-respect.html )
Respect the opinion
When two people have the same opinion, one of them is redundant
So here is the test: when someone offers you an opinion at work, how do you respect them? Do you only want people to offer opinions that fit nicely into your frame of things? Do you listen to any opinion coming from any member of the team? Do you follow the advice given in the opinion?
Respect the feelings
Respecting someone’s presence and opinion gets you to a position where you start respecting people’s feelings. Just like we have different skin tones, our reactions to the same incident can be very different. Respect that difference and try to understand that not everyone thinks the same way about anything in this world.
So here is the test: when someone feels differently than you think, how do you respect them? Do you empathize with them to understand more? Do you give them space to share their feelings? (This is what the Google study above calls Psychological Safety.)
Before I go, let me tell you that I’ve personally seen this respect trick work. In teams where everyone was respectful, team members were more influential and they cooperated with their best efforts to do wonders.
How has your experience been? Do you also believe that Respect is the root of team productivity? You can have a different opinion and I respect that.
If there was ever a time to test the passion of the Software Testing community in Pakistan, it was last Saturday, when resilient members attended the 6th Islamabad Testers Meetup in big numbers despite all the traffic challenges. The event, hosted by MTBC at their campus in collaboration with PSTB, saw professionals of various backgrounds who discussed and learned a lot of new ideas on the day.
The event was a much-awaited one, as after the successful PSQC’17, activities restarted to gear up towards PSQC’18. Umer Saleem welcomed everyone in the morning as Host of the day, introduced the event agenda and handed over the stage for the first talk.
Kalim Ahmed Riaz, Senior SQA Engineer from Global Share, presented his thoughts on “Ways to Improve Software Quality”. Kalim, who maintains his own blog on testing, quoted several everyday examples to explain his views, including buying vegetables and the knowledge needed to define requirements. He then suggested multiple ways to cure common Quality diseases, including “treating Testers as Clients”. By this he meant that not only do Testers need to step up to think and act like real Clients, but management also needs to acknowledge that issues reported by the Testing team are real ones and not postpone them citing that testers are not real users. His slides are here: IsbTestersMeetup_Ways_improve_Quality_Kalim
The next presentation was handled well by storyteller Farrukh Sheikh, who is Lead QA Engineer at Ciklum. Farrukh shared his thoughts on the skill of “Adaptation”, which he believes is a must for all testers. He urged everyone to get out of the “deceptive illusion” in which they believe themselves to be strong and well-built testers, and instead learn the new skills of the trade. He suggested reading a lot, practicing new methodologies and learning to code as a few ways to build muscles for tackling the challenges of tomorrow. Full slide deck: IsbTestersMeetup_Adaptation_Farrukh
(more photos are here)
After the talks, we did an experiment of “Open House Discussion”, which was very well received. I was handed the task of moderating it around the selected topic (based upon registered participants’ feedback): “Challenges in Implementing Test automation”. The format was that anyone from the audience would share a challenge he or she was facing, and then everyone in the audience was free to give input on how to address that challenge. The discussion revolved around the following challenges shared by the members present:
- Not finding time to do automation
- How to stabilize automation when everything is changing?
- Automation failing on configurations, or keeping them up-to-date
- What’s next once the first round of automation is done?
There were many suggestions offered by experienced and uniquely qualified testers in the hall, ranging from “talking to management to buy time”, “involving Programmers in the team to be part of the automation project”, “including automation tasks in Sprint planning”, “keeping an open eye on the changes within and outside your organization”, “being selective in what to automate” and so on. The discussion was well worth it, and I plan to write a full blog post on these topics. It was heart-warming to see that we as a community are now moving forward to discuss such challenges; hopefully next time we will have solved these problems and be ready for the next set.
Dr. Zohaib Iqbal, President of PSTB, was then invited to give updates from PSTB. He shared the lessons learned so far in holding such events and highlighted various programs, including certifications around CTFL, the ISTQB partner program and the accredited training program. He invited all to be part of the upcoming PSQC’18, to be held in Lahore in spring next year.
The stage was then handed over to Adeel Sarwar, CTO of MTBC, for a closing note. Adeel briefly talked about MTBC’s vision, including their wonderful initiatives to incorporate fresh college graduates into their workforce and how this model has worked really well for them. He raised his concern about people in IT not reading enough, which reminded me of my campaign of “Certified Book Reading Tester”. He made it clear to everyone that we need to raise the bar to improve the quality of Software Quality professionals. He thanked the audience and shared some ideas on future collaborations in which MTBC can take part.
Shields were presented to the speakers and certificates were given to the wonderful MTBC organizing team, who received a well-deserved round of applause. We then moved outside for open-air tea with snacks. The food was as delicious as the discussions, and a lot of new ideas were shared by participants. New friends were made and old friendships were revived. The conversations were never-ending, but as with every beautiful meeting, it ended too soon for all of us.
Thanks again to the hosts, to PSTB for their support and above all to the “cheetah” testers who always respond to such events. Together, let’s make the Pakistan testing community even stronger!