In an earlier post, I explained how DevOps testing consists of three main types: Scheduled Testing, On-Demand Testing and Triggered Testing. I covered how we schedule different types of testing in that article, and now let me dig into the details of On-Demand Testing.
Keeping with the notion that “testing is an activity” performed continuously rather than a “phase” towards the end, it is necessary to configure Continuous Testing. That requires setting up automated testing jobs in a way that they can be called whenever needed. This is what we call On-Demand Testing. It is contrary to the common notion of having testers ready to do testing “On Demand” as features become ready to test.
For example, we have a set of tests that we call “Data Conversion tests”, as they convert data from our old file format to our new file format. As you can imagine, a lot of the defects uncovered by such tests are data-dependent, and a typical bug reads like “Data conversion fails for this particular file due to this type of data”. Now, as a Developer takes on that defect and fixes this particular case, she’d like to be sure that the change hasn’t affected the other datasets in the conversion suite of tests. The conversion testing job is set up in a way that a Developer can run it at any given time.
I shared that we are using Jenkins for Scheduled Testing. So one way for an individual team member to run any of the jobs on-demand is to push a set of changes, log into the Jenkins system and start a build of the relevant automated testing job, say Conversion Testing for the above example. This works, but it might be too late since the changes have already been pushed. Secondly, the Jenkins VMs are set up in a separate location from where the Developer sits, and feedback time can be anywhere between 2-3 hours.
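For what it’s worth, logging into the web UI isn’t the only way to start a job: Jenkins exposes a remote build-trigger endpoint. Here is a minimal Python sketch; the server URL and job name are made-up placeholders, not our actual setup:

```python
# Hedged sketch: starting a Jenkins job remotely instead of via the web UI.
# Jenkins queues a build on POST /job/<name>/build (optionally with a token).
from urllib.parse import quote
from urllib.request import Request, urlopen

def build_trigger_url(server, job, token=None):
    """Construct the standard Jenkins build-trigger URL for a job."""
    url = "%s/job/%s/build" % (server.rstrip("/"), quote(job))
    if token:
        url += "?token=" + token
    return url

def trigger(server, job, token=None):
    """POST to the trigger URL; Jenkins responds once the build is queued."""
    req = Request(build_trigger_url(server, job, token), method="POST")
    return urlopen(req)

# e.g. trigger("http://jenkins.internal:8080", "Conversion Testing")
```

Depending on your Jenkins security settings, authentication (a user API token) may also be required.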
Remember, tightening the feedback loop is a main goal of DevOps. The quicker we can let a Developer know that the changes work (or don’t work), the better we are positioned to release more often with confidence.
So in this case, we exploited our existing build system: Python scripts driven by a set of XML files that define parts which can be executed by any team member. We added a shortened dataset that has enough diversity yet is small enough to run within 15-20 minutes. Then we added a part in the XML file that can do this conversion job on any box, at any time, by anyone in the team.
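To give a feel for the idea (the XML schema, part name and command below are illustrative, not our actual build files), a part runner could look like:

```python
# Sketch of XML-defined build "parts" runnable by any team member.
# The schema and the convert.py command are hypothetical examples.
import subprocess
import xml.etree.ElementTree as ET

PART_XML = """
<parts>
  <part name="conversion-short">
    <step>python convert.py --dataset short --verify</step>
  </part>
</parts>
"""

def steps_for(xml_text, name):
    """Return the command lines defined for a named part."""
    root = ET.fromstring(xml_text)
    part = root.find(".//part[@name='%s']" % name)
    if part is None:
        raise KeyError(name)
    return [step.text.strip() for step in part.findall("step")]

def run_part(xml_text, name):
    """Execute each step of the part, failing fast on a non-zero exit."""
    for cmd in steps_for(xml_text, name):
        subprocess.run(cmd.split(), check=True)
```

Any team member could then invoke `run_part(PART_XML, "conversion-short")` on her own box.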
Coming back to the Developer fixing a Conversion defect: after fixing the bug, she can now run the above part on her own system. Within half an hour she’ll have results, and if they look good, she’d push her changes with confidence that the next round of Scheduled Testing with the larger dataset would also pass.
Please note that we have made most of our testing jobs On-Demand, but we are having a hard time with a few. One of them is Performance Testing, because that is done on a dedicated machine in a controlled environment for consistency of results. Let’s see what we can do about it.
Do you have automated testing that can be executed On-Demand? How did you implement it?
Doing something for the first time is very hard. But doing it again and again with the same energy and passion is even harder. That’s why, as we witnessed a very successful Pakistan Software Quality Conference (PSQC’18) on April 7th, we felt even more accomplished than at the first edition, PSQC’17.
Last year it was held in Islamabad, and this year the biggest IT event in terms of Quality Professionals’ attendance moved to the cultural capital of Pakistan: Lahore. Beautifully decorated in a national color theme, the main auditorium of the FAST NU Campus saw 200+ amazing people join us from various cities across Pakistan.
After recitation of the Holy Quran, event host Sumara Farooq welcomed the audience and invited PSTB President Dr. Muhammad Zohaib Iqbal for the opening note. Dr. Zohaib recapped the journey of this community-building event and shared demographics of the audience. He emphasized the need for everyone to act upon the Conference theme, “Adapting to Change”.
We had two quick Keynote sessions in the first half. The first one was on “Software Security Transformations” by Nahil Mahmood (CEO, Delta Tech), who spoke about the grave realities of the current software security situation in Pakistan. He then urged everyone to take part in making every piece of software secure enough in accordance with various industry standards. Read more about those in the PDF: PSQC18_NahilMahmood_SoftwareSecurityTransformation.
The second talk was on “Quality Engineering in Industry 4.0” by Dr. Muhammad Uzair Khan (Managing Director, Quest Lab), who first explained the notion of Industry 4.0. He envisioned a future where systems are tested in increasingly automated ways, with Exploratory manual testing going on in the background. He also rightly cautioned that any prediction of the future is a tricky business.
A few honorable guests then spoke at the event, including Professor Italo, Honorary Consul of Portugal Feroz Iftikhar, and the HOD of the FAST NU CS Department. Shields were then presented to the Sponsors of the event by Dr. Zohaib and myself. Contour Software was represented by Moin Uddin Sohail and Stewart Pakistan (formerly CTO24/7) by their HR Head, Afsheen Iftikhar.
A tea break was now needed to refresh participants for the more technical stuff which was coming their way. This time was well utilized by all to meet strangers who quickly became friends.
(More photos covering the event are coming soon at our facebook page)
The second session had five back-to-back talks:
- “Performance Testing… sailing on unknown waters” by Qaiser Munir, Performance Test Manager from Stewart, in which, after giving some definitions, he shared a case study of a specific client that was happy with the insights it got from Performance Testing. Full slides here: PSQC18_QaiserMuneer_PerformanceTesting
- “Agile Test Manager – A shift in perspective” by Ahmad Bilal Khalid, Test Manager from 10Pearls, who travelled from Karachi for the event. ABK, as he likes to be called, recalled his own transformation from a traditional Test Manager to a Test Coach who is more of an enabler. His theme of experienced Testers becoming Dinosaurs and not helping newcomers learn new stuff hit home and resulted in quite a fruitful discussion. Read more here: PSQC18_AhmadBilalKhalid_TestManager-ChangingTimes
- “Agile Regression Testing” by Saba Tauqir, Regression Team Lead from Vroozi, who shared her current work experience, where they have a dedicated team for Regression Testing. This also sparked a debate within the audience as to how much Regression Testing can be sustained in Agile environments. See her talk here: PSQC18_SabaTquqir_RegressionTesting
- “To be Technical or not, that is the question” by Ali Khalid, Automation Lead at Contour Software, which was perhaps the star talk of the day. He took the story of a hypothetical tester, “Asim”, and showed how he became a Technical Tester through four lessons. Easing up the learning with some funny clips and GIFs, Ali gripped the audience and conveyed his message strongly, which included developing an attitude towards designing algorithms and enjoying problem solving. Full slides here: PSQC18_AliKhalid_ToBeTechnical
- “Power of Cucumber” by Salman Saeed, Automation Lead from Digitify Pakistan, who talked about his journey into automation with that very tool. He explained its different features, the Gherkin language, and the tools needed to run it, and shared a piece of code showing a sample Google search test case. He urged all to use powerful tools like Cucumber to begin their automation journeys. He also promised to share the code with whoever contacts him, so feel free to bug him. His slides are here: PSQC18_SalmanSaeed_PowerofCucumber
A delicious lunch was waiting in the Cafeteria, which was basically an excuse to learn from each other while enjoying the food. I could see many people catching up with Speakers to ask follow-up questions, and some healthy conversations around them.
The audience was welcomed back with three more talks in the afternoon session:
- “Distributed Teams” by Farah Gul, SQA Analyst at Contour Software, another speaker from Karachi. She first explained how different locations and time zones create the challenge of working together as a team. She shared some real examples of how marketing campaigns failed in a foreign country due to language barriers. At the end she suggested some ways to curb these challenges, which included understanding the culture, spending more time in face-to-face communication and asking for clarity. Slides are here: PSQC18_FarahGul_DistributedTeams
- “Backend Testing using Mocha” by Amir Shahzad, Software QA Manager at Stella Technology, who started off his talk with the ever-rising need for testing backends. He explained how RESTful APIs can be tested using Mocha, with some sample code. He also mentioned other libraries that can be used for better assertions and for publishing HTML reports. His talk is here: PSQC18_AmirShahzad_Mocha
- “ETL Testing” by Arsala Dilshad, Senior SQA Engineer at Solitan Technologies, who shared her first-hand experience of testing ETL solutions. After providing an overview of her company’s processes, she explained how data quality, system integration and other kinds of testing are needed to deliver a Quality solution. Read more details here: PSQC18_ArsalaDilshad_ETL Testing
Then came the best part of the day. We experimented with a new segment called “My 2 cents in 2 minutes”, which invited participants to come on stage and share any challenge they are facing in their profession. Inspired by the 99-second talks at TestBash, this proved to be a marvelous way to engage the audience. Around 20 awesome thoughts were presented by Quality Professionals on a variety of topics. I plan to write some follow-up posts on the ideas brought up there, as it would be unjust to sum them up here in a few lines.
Another tea break was needed to defeat the afternoon nap, and seeing Samosas (and other snacks) served with tea resulted in many happy faces.
We were then back for the final and perhaps best talk of the day: “Melting pot of Emotional and Behavioral Intelligence” by Muhammad Bilal Anjum, Practice Head QA & Testing at Teradata, who has more than a decade of experience in Analytics. Bilal gave some examples of how the current situation of a potential customer can be predicted from available data. For example, Telco data combined with Healthcare and other sources can be used to predict how likely a person is to buy some health solution. He then explained how culture plays a key role in human behavior and why Industry Consultants are in demand for jobs like the above. At the end he threw out some ideas on how such solutions can be tested.
With all talks finished, I was asked to close the day as the General Chair of PSQC’18. I took the opportunity to thank sponsors, partners, organizers (Qamber Rizvi, Salman Saeed, Adeel Shoukat, Ali Khalid, Mubashir Rashid, Muhammad Ahsan, Amir Shahzad, Ayesha Waseem, Salman Sherin, Ovyce Ashraph), speakers and the audience, and made a point of mentioning how collaboration can produce results that can never be surpassed by individuals. To spark some motivation for participants to try out the wonderful ideas presented during the day, I quoted Munnu Bhai’s famous Punjabi poem with the punch line “Ho naheen jaanda, karna painda aye” (It never happens automatically, you have to do it).
Long live Pakistan Software Quality community and we’ll be back with more events through the year and yes PSQC’19 in the Spring of next year!
To have a constant heartbeat of releases, testing has to take center stage. It can no longer be an activity performed at the end of the release cycle; rather, it has to happen during all phases of a release. And mind you, release cycles have shrunk from months to days.
In the book “Effective DevOps”, the authors lay out many plans for moving towards a DevOps culture. On automation, the book suggests that automation tools belong to one of the following three types:
- Scheduled Automation: This tool runs on a predefined schedule.
- Triggered Automation: This tool runs when a specific event happens…
- On-Demand Automation: This tool is run by the user, often on the command line…
(Page 185 under Test and Build Automation section)
The way we took this advice to ramp up our efforts for Continuous Testing is that each kind of Testing we perform should be available in all three forms:
- Scheduled Testing: Daily or hourly tests to ensure that changes during that time were merged successfully and that nothing was disrupted by the recent changes.
- Triggered Testing: Testing that gets triggered by some action. For example, a CI job that runs tests whenever a Developer pushes changes.
- On-Demand Testing: Testing that is executed on an as-needed basis. A quick run of tests to find out how things stand on a certain front.
Take Performance testing for example. It should be scheduled to find issues on daily or weekly basis, but it could also be triggered as part of release criteria or it could be run On-Demand by an individual Developer on her box.
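As a rough illustration of how one testing job can serve all three forms, here is a hedged Python sketch with made-up module and suite names; the scheduler, the CI hook and the Developer would all call the same entry point:

```python
# Sketch: a single test-run entry point shared by all three forms of testing.
# The suite names and the --dataset flag are illustrative assumptions.
import argparse

def run_suite(suite, dataset="full"):
    # ...invoke the real test runner here; we just echo the request...
    return {"suite": suite, "dataset": dataset}

def main(argv=None):
    p = argparse.ArgumentParser()
    p.add_argument("suite")                      # e.g. "performance"
    p.add_argument("--dataset", default="full")  # "short" for quick runs
    args = p.parse_args(argv)
    return run_suite(args.suite, args.dataset)

# Scheduled: a Jenkins cron schedule invokes `python run_tests.py performance`
# Triggered: a CI hook calls main(["performance"]) after a push
# On-Demand: a Developer runs it with --dataset short on her own box
```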
In order to achieve this, we re-defined our testing jobs to allow all three options at once. As the idea was to use tools this way, we picked Jenkins.
There are other options too, like GoCD, and Microsoft Team Foundation Server (TFS) is also catching up, but Jenkins has the largest set of available plugins for a variety of tasks. Also, our prime use case was to use Jenkins as an Automation Server, and we have yet to define delivery pipelines.
(the original icon is at: http://www.iconarchive.com/show/plump-icons-by-zerode/Document-scheduled-tasks-icon.html )
I’ll write separately on Triggered and On-Demand testing soon; for now, let me get into some details of how we accomplished Scheduled Testing.
We had a few physical and virtual machines on which we were using Windows Task Scheduler to run tasks. A task would kick off on a given day and time and trigger a Python script. The Python scripts were in a local Mercurial repository hosted on one of these boxes.
The testing jobs were scheduled perfectly, but the schedule and outcome of these jobs were not known to the rest of the team. Only the testing team knew when these jobs ran and whether the last run was successful or not.
We made one of the boxes the Jenkins Master and the others slaves. We configured Jenkins jobs and defined the schedules there. We also moved all our Python scripts to a shared Mercurial repository on a server that anyone could access. We also created custom parts in our home-grown Build system that allow running pieces in series or in parallel.
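The series/parallel idea can be sketched in a few lines; the piece functions below are stand-ins for our real build-system parts:

```python
# Hedged sketch of running build "pieces" in series or in parallel.
# Each piece is just a callable here; real pieces would run test jobs.
from concurrent.futures import ThreadPoolExecutor

def run_series(pieces):
    """Run each piece one after another, collecting results in order."""
    return [piece() for piece in pieces]

def run_parallel(pieces):
    """Run pieces concurrently; map() still yields results in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: p(), pieces))
```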
Given that Jenkins gives you a web view accessible to all, the testing schedule became public to everyone. We previously had a “Testing Dashboard”, but it took effort to keep up-to-date. Now anyone in the team could see how the last few jobs of, say, Performance testing went and what the results were.
Moving to Jenkins and making our scripts public also helped us make the same set of tests Triggered and On-Demand. More details on how in coming posts.
I wish I could show the “Before” and “After” pictures that many marketing campaigns use to show how beautiful it now looks.
Do you have Scheduled testing in place? What tools do you use and what policies do you apply?
We are living in a ‘survival of the fastest’ era. We don’t have time for anything. We prefer reading blogs instead of books, and we look for tweets rather than lengthy press releases. So when it comes to testing a release that has only a few changes, we don’t have time to run all the tests.
The question, though, is: which subset of tests should we be running?
I have touched on this subject in Test small vs. all, but looking at build change logs and picking tests to run is a task that requires decision making. What if we could know the changes automatically and run tests based on that?
That is possible through TIAMaps. No, this term is not mine, but part of it is. It originates from Microsoft’s concept of ‘Test Impact Analysis’, which I got to know from this blog post on Martin Fowler’s site. I’d recommend reading it first.
If you are lazier than me and can’t finish the whole post, below is a summary along with a picture copied from there:
First you determine which pieces of your source code are touched by your tests, and you store this information in some sort of map. Then, when your source code changes, you get the tests to run from the map and run just those tests.
Below is a summary of TIAMap implementation in our project.
Why we needed it:
We didn’t do it for fun or out of a “let’s do something shiny and new” urge. We are running out of time. Our unit test suite has around six thousand tests, and a complete run (yes, they run in parallel) takes about 20 minutes. Hmmm… a little change that needs to go out has to go through 20 minutes of unit test execution; that’s bad. Let’s see what others are doing. Oh yeah, Test Impact Analysis is the solution.
Generating TIA Maps
Code coverage comes to the rescue. If we already have a tool that finds out which lines of code are touched by all tests, can’t we have a list of source files that are touched by a single test?
So we configured a job that would run the tests and save this simple map: test name -> source file names. There were two lessons we learned:
- Initially, we had a job that would run for all 6,000 tests, and it was taking days. We became smarter: after generating the first TIA Map for all tests, we only update maps for the tests that changed. We don’t have a way to find the names of tests that changed, so our job is based on the timestamps of files that contain test code.
- We were storing the Map in a SQLite Db. As the Db had to be pushed to our repository again and again, it was difficult to find deltas of change. We switched to a simple text file to store the Map. Changes can now be seen in our source control tools, and anyone can look at those text files for inspection.
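As a sketch of what the text-file Map and the timestamp-based refresh might look like (the one-line-per-test format and helper names here are assumptions for illustration, not our exact scripts):

```python
# Hedged sketch: a plain-text TIA Map plus a timestamp-based refresh check.
import os

def load_map(path):
    """Parse lines like 'test_convert_a: src/a.cpp,src/b.cpp'."""
    tia = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            test, _, files = line.partition(":")
            tia[test.strip()] = files.strip().split(",")
    return tia

def stale_test_files(test_files, last_run_ts):
    """Only rebuild map entries for test files changed since the last run."""
    return [f for f in test_files if os.path.getmtime(f) > last_run_ts]
```

Because the map is plain text, a diff in source control shows exactly which test-to-file entries changed.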
As you can imagine, the hard part is getting those TIAMaps. Once we have them, we do the following:
- When there is a need to run tests, we determine which source files have changed since the last run.
- We have a Python script that does the magic of consulting the maps and getting a list of tests to be executed.
- We feed that list of tests to our existing test execution program.
How is it going?
It is too early to say, as we have rolled this out as a pilot, and I may have more insights into the results in a few months. But the initial feedback indicates we are on the right path. Time is being saved big time, and we are watching for any issues that may arise due to faulty maps or execution logic.
Have you ever tried anything similar? Or would you like to try it out?
The only way to improve yourself and your craft is to reflect on the state of affairs. The #StateOfTesting Survey gives us exactly that opportunity: testers from across the globe give their valuable feedback and then gain value from this collective wisdom.
I have been taking part in the survey for some time, and I see that not many testers from Pakistan are doing so. Given that our community is on the rise and we had our first-ever testing conference, it’s time to get in touch with the testing community of the world!
The link to the survey is: http://qablog.practitest.com/state-of-testing/
Thanks for the help and let’s make (testing) world better than today!
Code Coverage is a good indicator of how well your tests are covering your code base. It is measured through tools: you run your existing set of tests and then get coverage reports at file level, statement or line level, class and function level, and branch level.
Since the code coverage report gives you a number, the game of numbers kicks in just like any other numbers game. If you set hard targets, people will try to hit them, and at times the number means nothing. Here are my opinions, based on experience, on how to best use Code Coverage in your project:
Do run code coverage every now and then to guide new unit test development
It’s worth running coverage tools every so often and looking at these bits of untested code. Do they worry you that they aren’t being tested?
The point is not just to get the list of untested code; the question is whether we should write tests for that untested code.
In our project, we measure code coverage for functions and spit out a list of functions that are not tested. The testing team does not then order developers to write tests against them. The testing team simply suggests writing tests, and the owner of that code prioritizes the task based on how critical the piece is and how often it is requested by Users.
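As an illustration (the per-function hit counts below are a made-up input format, not our tool’s actual output), producing that suggestion list is straightforward:

```python
# Sketch: turn per-function coverage data into a suggestion list of
# untested functions. The input format is a hypothetical simplification.
def untested_functions(coverage):
    """coverage: {function_name: number_of_times_entered_by_tests}"""
    return sorted(name for name, hits in coverage.items() if hits == 0)

report = {"ConvertFile": 12, "ParseHeader": 3, "LegacyExport": 0}
# untested_functions(report) -> ["LegacyExport"]: a suggestion for the owner,
# not an order; the owner decides whether that code is worth testing.
```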
Dorothy Graham suggests in this excellent talk that coverage can be either like “butter” or like “strawberry jam” on your bread. You decide if you want “butter”-like coverage, i.e. cover all areas, or “strawberry jam” coverage, i.e. cover some areas in more depth.
Do not set a target of 100% code coverage
Setting a coverage goal is in itself disputed and often misused, as Brian Marick notes in this paper, which has been the foundation of much Code Coverage work thereafter. Also, anything that claims 100% is suspicious; consider the following statements:
- We can’t ship unless we have 100% code coverage
- We want 100% reported defects to be addressed in this release
- We want 100% tests to be executed in each build.
You can easily see that a 100% code coverage target feeds the “Test-all Fallacy”, implying that we can test it all. Brian suggests in the same paper that 90 or 95% coverage is good enough.
We have set a target of 90% function coverage, but it is not mandatory for release. We put this information on the table along with other testing data, like test results and occurrence of bugs per area, and then leave the ship decision to the person who is responsible. Remember, the job of testing is to provide information, not to make release decisions.
Yes, there is no simple answer to how much code coverage we need. Read this for amusement to see why we get different answers to this question.
Do some analysis on the code coverage numbers
As numbers can mean different things to different people, we need to ask stakeholders why they need code coverage numbers and what they mean when they say they want code to be covered.
We asked this question and got the answer: to do a test heat analysis on our code coverage numbers. It gives us the following information:
- Which pieces are hard to automate, and which are easy?
- Which pieces are to be tested next? (as stated in first Do)
- Which pieces need more manual testing?
- How much effort is needed for Unit testing?
Do use tools
There are language- and technology-specific tools. For our C++ API, we have successfully used Coverage Validator (licensed but very reasonably priced) and OpenCppCoverage (a free tool), which extract info by executing GoogleTest tests.
Do not assume covered code is tested well
You can easily write a test to cover each function or each statement, without testing it well or even without testing it at all.
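A toy example makes the trap concrete: the first test below gives `discount` full line coverage yet would pass even if the formula were wrong:

```python
# Toy example: coverage without testing. Both tests "cover" discount(),
# but only the second one actually verifies its behavior.
def discount(price, percent):
    return price - price * percent / 100

def test_discount_covered_but_untested():
    discount(100, 10)   # every line executes, so coverage says 100%...
    # ...but there is no assertion: a broken formula would still pass

def test_discount_actually_tested():
    assert discount(100, 10) == 90.0   # this one fails if the logic breaks
```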
Along with the function-wise code coverage I mentioned above, we have a strong code review policy which includes reviewing test code. We also write many scenario-level tests that do not add to coverage but cover the workflows (the orders in which functions are called), which matter more to our users.
Brian summarizes it nicely in the aforementioned paper:
I wouldn’t have written four coverage tools if I didn’t think they’re helpful. But they’re only helpful if they’re used to enhance thought, not replace it.
How have you used Code Coverage in your projects? What Dos and Don’ts would you like to share?
It happens to all of us. We are used to doing a process in a particular way for years and never think of other ways of doing it. Then someday someone says something. That serves as an eye opener and we start seeing other ways of doing the same thing.
This happened to us for our rule: “Break a build if a single unit test fails”
Sounds very simple and rational. We have had this rule for maybe 10 years, and I have repeated it over and over in all the new projects we took up in these years. Our setup follows Plan A as shown below, i.e. run tests as part of the build process, and if a single test fails, the build fails.
What changed in 2017 was a quest to find ways to release faster, you know, the DevOps-like stuff. So we started to look at the reasons builds were broken. We build our source 4-6 times a day, so we had enough data to look into. One of the recurring reasons was a failing test.
Now we thought, as you must be thinking by now, that this is a good thing. We should not ship a build for which a test breaks. But our data (and wisdom) suggested that failing tests fall into three categories:
- The underlying code changed but the test code was not updated to reflect this. Thus the test fails, but the code isn’t broken.
- The test is flaky (expect a full blog post on what flaky tests are and what we are doing about them). For now: a flaky test is one that passes and fails with the same code on different occasions.
- The test genuinely fails i.e. the feature is indeed broken.
Now, 1 and 2 are important, and the Developer who wrote the test needs to pay attention. But should this stop the build from being used? Of course not.
3 is a serious issue, but with the notion of ‘Testing in Production’, combined with the fact that a fix is just a few hours away, we figured out a new rule, shown as Plan B.
Yes, when a build fails due to a failing test, we actually report bugs for the failing tests and declare the build as Passed. Thus the wheel keeps rolling, and if it needs to be stopped or rolled back, it can be.
A few weeks into this strategy, while all looked good, the thing we always feared happened. A build in which 20% of our tests were failing was declared a pass, and our bug tracker saw around 100 new bugs that night. Let’s call that spamming.
That pushed us to move from a binary (fail or pass) rule to a somewhat fuzzy implementation. We came up with a new rule: if 10 or more tests fail (where 10 is an arbitrary number), we declare the build as failed; otherwise we follow the same path. This is our current plan, Plan C.
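The Plan C rule boils down to a few lines. In this sketch the threshold of 10 is the arbitrary number from above, and `report_bug` stands in for our bug tracker’s API:

```python
# Sketch of the Plan C build-status rule: too many failing tests fails the
# build; a few failures become bug reports and the build still passes.
FAIL_THRESHOLD = 10  # arbitrary cutoff, tune to taste

def build_status(failed_tests, report_bug=lambda t: None):
    if len(failed_tests) >= FAIL_THRESHOLD:
        return "FAILED"            # too many failures: stop the wheel
    for test in failed_tests:
        report_bug(test)           # file a bug for each failing test...
    return "PASSED"                # ...and let the build through
```

This also shows why the rule is still crude: every failing test carries the same weight, regardless of importance.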
I know what you are thinking: a single important test could matter much more than 10 relatively unimportant ones. But we don’t have a way to weigh tests for now. When we do, we can improve our plans further.
Does this sound familiar to your situation? What are your criteria for a passed build in the context of test results?