Mutation testing

9/4/2024

Imagine: you worked day and night to get your new, revolutionary feature ready for your customer. You've used TDD, you've tested, the code has been reviewed, and your dashboard shows an amazing 99% line, method, and branch coverage. You deploy your feature after acceptance and go home feeling good. The next day, you arrive at work and, out of the blue, your mailbox is filled with customer test findings... What happened? After analyzing these findings, it turns out that 99% line and branch coverage did not give a complete picture of the quality of the code. How is this possible?

What went wrong?

Why can't we rely on our old friends, line coverage and branch coverage? Well, the problem is that high line or branch coverage is no real indication of the quality of your code: it only shows which code your tests execute, not whether they actually verify its behavior. Let's look at three issues with line and branch coverage:

What is missing?

Well, nothing in the test checks what calling the perform method actually does, so any side effects of this method (extra parameters being set, context being changed, ...) are not covered by this test.
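To make this concrete, here is a minimal, hypothetical Java sketch in plain Java without a test framework, where each "test" returns true when it passes. All names (`Context`, `RealProcessor`, `MutantProcessor`, `setStatus`) are made up for illustration. The weak test executes every line of `perform`, so it yields full line coverage, yet only the stronger test notices when a mutation removes the call to the void method `setStatus`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; all names are invented for this sketch.
class SideEffectDemo {

    // Mutable context that perform() is expected to update.
    static class Context {
        private String status = "NEW";
        private final List<String> auditLog = new ArrayList<>();

        void setStatus(String status) { this.status = status; }
        String getStatus() { return status; }
        List<String> getAuditLog() { return auditLog; }
    }

    interface Processor {
        void perform(Context ctx);
    }

    // Original implementation with two side effects on the context.
    static class RealProcessor implements Processor {
        public void perform(Context ctx) {
            ctx.setStatus("DONE");
            ctx.getAuditLog().add("performed");
        }
    }

    // Mutant: the call to the void method setStatus(...) has been removed.
    static class MutantProcessor implements Processor {
        public void perform(Context ctx) {
            ctx.getAuditLog().add("performed");
        }
    }

    // Weak test: runs every line of perform() but never inspects the context.
    static boolean weakTest(Processor p) {
        Context ctx = new Context();
        p.perform(ctx);
        return true; // passes whenever no exception is thrown
    }

    // Stronger test: verifies the side effects on the context.
    static boolean strongTest(Processor p) {
        Context ctx = new Context();
        p.perform(ctx);
        return "DONE".equals(ctx.getStatus()) && ctx.getAuditLog().contains("performed");
    }

    public static void main(String[] args) {
        Processor mutant = new MutantProcessor();
        System.out.println("weak test passes on mutant:   " + weakTest(mutant));   // true: mutant survives
        System.out.println("strong test passes on mutant: " + strongTest(mutant)); // false: mutant killed
    }
}
```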

Is this test complete?

No, the boundary value 0 for the variable “i” is never checked, even though boundary values often have a special meaning, can trigger different behavior, or can even cause unwanted exceptions (division by zero, etc.).
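A minimal sketch of why this matters for mutation testing, assuming a hypothetical `averagePerItem(total, i)` that guards against division by zero (plain Java, each "test" returns true when it passes). A conditional-boundary mutation that turns `>` into `>=` survives a test that only uses a typical value such as `i = 5`, and it is killed as soon as a test checks the boundary `i = 0`.

```java
// Hypothetical example; averagePerItem and its mutant are illustrative only.
class BoundaryDemo {

    interface Averager {
        int averagePerItem(int total, int i);
    }

    // Original: only divides when i is strictly positive.
    static final Averager ORIGINAL = (total, i) -> i > 0 ? total / i : 0;

    // Mutant from a conditional-boundary mutation: '>' becomes '>='.
    static final Averager MUTANT = (total, i) -> i >= 0 ? total / i : 0;

    // Only a "typical" value: both versions return 2, so the mutant survives.
    static boolean typicalValueTest(Averager a) {
        return a.averagePerItem(10, 5) == 2;
    }

    // Boundary value i == 0: the mutant divides by zero and throws.
    static boolean boundaryTest(Averager a) {
        try {
            return a.averagePerItem(10, 0) == 0;
        } catch (ArithmeticException e) {
            return false; // the test fails, so the mutant is killed
        }
    }

    public static void main(String[] args) {
        System.out.println("typical test on mutant:  " + typicalValueTest(MUTANT)); // true: survives
        System.out.println("boundary test on mutant: " + boundaryTest(MUTANT));     // false: killed
    }
}
```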

What's wrong with the two tests above?

The return value of method foo is completely ignored, even though a return value has an explicit meaning (otherwise the method would not return it) and can directly affect the outcome of the process flow. To make it even more concrete, check out the test below, a test for the power function. What can go wrong if this is your only test?

Answer: power() could add, multiply, compute y to the power x instead of x to the power y, etc., and the test would still pass.
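The following sketch assumes the only test is `power(2, 2) == 4`, which is consistent with the answer but is an assumption here. It uses plain Java, with each "test" returning true when it passes. The wrong implementations from the answer all pass that weak test, while a test with asymmetric arguments such as `power(2, 3) == 8` (plus the edge case `power(5, 0) == 1`) fails for each of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class PowerDemo {

    // Intended behavior: x to the power y (non-negative y), by repeated multiplication.
    static int power(int x, int y) {
        int result = 1;
        for (int k = 0; k < y; k++) {
            result *= x;
        }
        return result;
    }

    // Assumed weak test: the only check is power(2, 2) == 4.
    static boolean weakTest(IntBinaryOperator pow) {
        return pow.applyAsInt(2, 2) == 4;
    }

    // Stronger test: asymmetric inputs where add, multiply and swapped arguments all differ.
    static boolean strongTest(IntBinaryOperator pow) {
        return pow.applyAsInt(2, 3) == 8 && pow.applyAsInt(5, 0) == 1;
    }

    // Wrong implementations mentioned in the answer above.
    static Map<String, IntBinaryOperator> wrongVersions() {
        Map<String, IntBinaryOperator> versions = new LinkedHashMap<>();
        versions.put("add", (x, y) -> x + y);
        versions.put("multiply", (x, y) -> x * y);
        versions.put("swapped", (x, y) -> power(y, x));
        return versions;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, IntBinaryOperator> e : wrongVersions().entrySet()) {
            System.out.println(e.getKey() + ": weak=" + weakTest(e.getValue())
                    + ", strong=" + strongTest(e.getValue()));
        }
    }
}
```

Running `main` prints `weak=true, strong=false` for the add, multiply, and swapped versions.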

What do we need - is there a better way?

A solution to these problems can be found in mutation testing. Mutation testing is about testing your tests: checking their quality. In mutation testing, you run your tests against slightly modified versions of your source code. Such a modified version of your code is called a mutant. The end game is to get all mutants (or as many as possible) killed. Killing a mutant means that at least one of your tests fails for this mutation.

Mutation testing is a fairly old idea: it was invented in the 1970s, but it did not become mainstream until recently because a huge number of mutants can be generated from your code, and the test suite has to run against each of them. This makes mutation testing very resource-intensive. Now that computers are much more powerful, the concept is becoming increasingly popular.

How does it work?

How does a mutation test work? A tree structure is built from your source code, and a mutation can be applied to every node that contains a condition, a constant value, an operator, etc. Such a mutation can, for example, delete a condition or ignore it.
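A toy sketch of this idea in Java, assuming a tiny hand-built expression tree rather than a real parser (PIT itself mutates compiled bytecode rather than a source tree). `mutants` walks the tree and produces one new tree per mutated node. It uses a simplified mutation table that is an assumption of this sketch: arithmetic operators are swapped, conditions are negated, and constants are incremented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of tree-based mutation; not how PIT works internally.
class TreeMutationDemo {

    interface Node {
        int eval(Map<String, Integer> env);
    }

    record Const(int value) implements Node {
        public int eval(Map<String, Integer> env) { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    record Var(String name) implements Node {
        public int eval(Map<String, Integer> env) { return env.get(name); }
        @Override public String toString() { return name; }
    }

    record BinOp(String op, Node left, Node right) implements Node {
        public int eval(Map<String, Integer> env) {
            int l = left.eval(env);
            int r = right.eval(env);
            return switch (op) {
                case "+" -> l + r;
                case "-" -> l - r;
                case "*" -> l * r;
                case "/" -> l / r;
                case ">" -> l > r ? 1 : 0;   // conditions evaluate to 1 (true) or 0 (false)
                case "<=" -> l <= r ? 1 : 0;
                default -> throw new IllegalStateException("unknown operator " + op);
            };
        }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Simplified mutation table: swap arithmetic operators, negate conditions.
    static String mutateOperator(String op) {
        return switch (op) {
            case "+" -> "-";
            case "-" -> "+";
            case "*" -> "/";
            case "/" -> "*";
            case ">" -> "<=";
            case "<=" -> ">";
            default -> throw new IllegalArgumentException(op);
        };
    }

    // Returns every tree that differs from n in exactly one node.
    static List<Node> mutants(Node n) {
        List<Node> out = new ArrayList<>();
        if (n instanceof Const c) {
            out.add(new Const(c.value() + 1)); // mutate a constant value
        } else if (n instanceof BinOp b) {
            out.add(new BinOp(mutateOperator(b.op()), b.left(), b.right())); // mutate this node's operator
            for (Node m : mutants(b.left())) out.add(new BinOp(b.op(), m, b.right()));
            for (Node m : mutants(b.right())) out.add(new BinOp(b.op(), b.left(), m));
        }
        // Var nodes are left unchanged in this sketch.
        return out;
    }

    public static void main(String[] args) {
        // (x + 1) * 2
        Node expr = new BinOp("*", new BinOp("+", new Var("x"), new Const(1)), new Const(2));
        Map<String, Integer> env = Map.of("x", 3);
        System.out.println(expr + " = " + expr.eval(env));
        for (Node m : mutants(expr)) {
            System.out.println("mutant " + m + " = " + m.eval(env));
        }
    }
}
```

For `(x + 1) * 2` this yields four mutants: `((x + 1) / 2)`, `((x - 1) * 2)`, `((x + 2) * 2)`, and `((x + 1) * 3)`. With `x = 3`, none of them evaluates to 8, so a single test asserting 8 kills all four.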

The unit tests are run against every generated mutant. As soon as a test fails, the mutant is killed and testing of that mutant stops. This way, all mutants are processed, and at the end you get an overview of the percentage of killed mutants (the mutation score).
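The loop described above can be sketched as follows. This is a simplified model, not how PIT is implemented: the code under test is a hypothetical `max(a, b)`, the mutants are written by hand, and each "test" is a predicate that returns true when it passes. Each mutant runs against the tests until the first failure, and the score is the percentage of killed mutants.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Simplified model of a mutation-testing run (not PIT's actual implementation).
class MutationRunDemo {

    record Mutant(String description, IntBinaryOperator code) {}

    // Hypothetical code under test: returns the larger of two ints.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Hand-written mutants of max().
    static List<Mutant> sampleMutants() {
        return List.of(
                new Mutant("negated condition (> to <=)", (a, b) -> a <= b ? a : b),
                new Mutant("boundary (> to >=)", (a, b) -> a >= b ? a : b),
                new Mutant("always return first argument", (a, b) -> a),
                new Mutant("return 0", (a, b) -> 0));
    }

    // The "test suite": each test returns true when it passes.
    static List<Predicate<IntBinaryOperator>> sampleTests() {
        return List.of(
                f -> f.applyAsInt(3, 1) == 3,
                f -> f.applyAsInt(1, 3) == 3);
    }

    // Runs the tests against one mutant and stops at the first failure.
    static boolean isKilled(Mutant mutant, List<Predicate<IntBinaryOperator>> tests) {
        for (Predicate<IntBinaryOperator> test : tests) {
            if (!test.test(mutant.code())) {
                return true; // killed: the remaining tests are skipped
            }
        }
        return false; // every test passed: the mutant survived
    }

    // Percentage of killed mutants (the mutation score).
    static double mutationScore(List<Mutant> mutants, List<Predicate<IntBinaryOperator>> tests) {
        long killed = mutants.stream().filter(m -> isKilled(m, tests)).count();
        return 100.0 * killed / mutants.size();
    }

    public static void main(String[] args) {
        List<Predicate<IntBinaryOperator>> tests = sampleTests();
        // The suite must pass on the original code before mutants are tried.
        boolean greenOnOriginal = tests.stream().allMatch(t -> t.test(MutationRunDemo::max));
        System.out.println("tests pass on original: " + greenOnOriginal);
        for (Mutant m : sampleMutants()) {
            System.out.println(m.description() + ": " + (isKilled(m, tests) ? "killed" : "survived"));
        }
        System.out.println("mutation score: " + mutationScore(sampleMutants(), tests) + "%");
    }
}
```

With these two tests, three of the four mutants are killed, for a score of 75.0%. The surviving `>=` mutant is an equivalent mutant: when `a == b`, both versions return the same value, so no test can ever kill it.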

A tool like PIT can use this result to generate a detailed report on the quality of your tests per class, package, etc.

Types of mutators

What are the different types of mutators? Here are the main types:

As already mentioned, there are mutations on conditions, and there are also mutations on mathematical and logical operators. For return values, a mutator can replace a boolean true with false, replace a numeric return value (with 0, for example), etc., and, in the case of collections, return an empty collection instead of the intended one.
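Here are a few of these mutator types applied by hand to a hypothetical `positives` method. The mutations roughly correspond to PIT mutators such as NEGATE_CONDITIONALS, CONDITIONALS_BOUNDARY, and EMPTY_RETURNS. The sketch uses plain Java, where each "test" returns true when it passes. A test that calls the method but ignores its return value, like the `foo` tests earlier, lets every mutant survive. A test that checks the returned list kills all three, partly because its input includes the boundary value 0.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical example: mutants are written by hand for illustration.
class ReturnValueMutatorDemo {

    // Original method: keeps only the strictly positive numbers.
    static List<Integer> positives(List<Integer> in) {
        return in.stream().filter(n -> n > 0).toList();
    }

    // Negated condition: n > 0 becomes n <= 0.
    static List<Integer> negatedConditionMutant(List<Integer> in) {
        return in.stream().filter(n -> n <= 0).toList();
    }

    // Conditional boundary: n > 0 becomes n >= 0.
    static List<Integer> boundaryMutant(List<Integer> in) {
        return in.stream().filter(n -> n >= 0).toList();
    }

    // Empty return: the result is replaced by an empty collection.
    static List<Integer> emptyReturnMutant(List<Integer> in) {
        return Collections.emptyList();
    }

    // Calls the method but ignores its return value: passes for every mutant.
    static boolean ignoresResult(Function<List<Integer>, List<Integer>> f) {
        f.apply(List.of(-1, 0, 2));
        return true;
    }

    // Checks the returned list; the input includes the boundary value 0.
    static boolean checksResult(Function<List<Integer>, List<Integer>> f) {
        return f.apply(List.of(-1, 0, 2)).equals(List.of(2));
    }

    public static void main(String[] args) {
        List<Function<List<Integer>, List<Integer>>> mutants = List.of(
                ReturnValueMutatorDemo::negatedConditionMutant,
                ReturnValueMutatorDemo::boundaryMutant,
                ReturnValueMutatorDemo::emptyReturnMutant);
        System.out.println("original passes checking test: " + checksResult(ReturnValueMutatorDemo::positives));
        for (Function<List<Integer>, List<Integer>> m : mutants) {
            System.out.println("ignoring test: " + ignoresResult(m) + ", checking test: " + checksResult(m));
        }
    }
}
```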

Do we need perfection?

Do we need perfection? No, not in every case. As you can see in the screenshot, which shows the generation of an informational message, mutants survive because there is no test for the constructed message. Is this a test that we absolutely need? Maybe not...
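A sketch of that situation, assuming a hypothetical `process` method that logs an informational message built by `infoMessage`. It uses plain Java, where each "test" returns true when it passes. Replacing the message with an empty string (an empty-return style mutation) does not affect the result that the existing test checks, so the mutant survives. Only an extra test on the message text would kill it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical example; all names are invented for this sketch.
class MessageDemo {

    // Builds an informational message for the log.
    static String infoMessage(String user, int count) {
        return "User " + user + " processed " + count + " items";
    }

    // Mutant: the String return value is replaced by "".
    static String emptyMessageMutant(String user, int count) {
        return "";
    }

    // Does the real work and logs an informational message.
    static int process(List<String> items, String user, List<String> log,
                       BiFunction<String, Integer, String> messageBuilder) {
        int count = items.size();
        log.add(messageBuilder.apply(user, count));
        return count;
    }

    // Existing style of test: checks the result of the work, not the message.
    static boolean resultOnlyTest(BiFunction<String, Integer, String> builder) {
        List<String> log = new ArrayList<>();
        return process(List.of("a", "b"), "alice", log, builder) == 2;
    }

    // Extra test that would kill the mutant: also checks the message text.
    static boolean messageTest(BiFunction<String, Integer, String> builder) {
        List<String> log = new ArrayList<>();
        process(List.of("a", "b"), "alice", log, builder);
        return log.equals(List.of("User alice processed 2 items"));
    }

    public static void main(String[] args) {
        System.out.println("result-only test on mutant: " + resultOnlyTest(MessageDemo::emptyMessageMutant)); // true: survives
        System.out.println("message test on mutant:     " + messageTest(MessageDemo::emptyMessageMutant));    // false: killed
    }
}
```

Whether that extra test is worth writing is exactly the judgment call described above.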

Extra benefits

What did improving mutation coverage give us, besides being able to leave work without dreading the next day 😊?

Well, we discovered a few bugs that had not been reported yet, and we added tests that covered unforeseen but likely scenarios. We also removed dead code, unnecessary checks, and so on. So just going through the process already provided value. In addition, our tests became more robust and complete. So I'd say it's a no-brainer to add mutation testing to the development process.