Quality Testing

Fast failure recovery lets you take more risk and increase speed

By Jim Grey (about)

I can just imagine the “oh no” moment that rippled through Feedly last week when they learned that their big new release for iOS 9 crashed on launch for iOS 7 and 8 users. Their first response: add a hastily-written warning to the App Store description.


Their second response: fix the bug, fast. It was ready the next day.


Apparently, Apple has a way for developers to rush the very occasional critical fix into the App Store, sidestepping the normal, weeks-long approval queue.

Apple’s App Store fast track provides a safety net, and I don’t blame Feedly for using it. Because Feedly could recover fast, hardly anybody will remember this gaffe, and it won’t cost the company very much.

I actually feel slightly bad talking about it, because it keeps the memory of this bug alive. But it illustrates such a simple equation: the faster you can recover from failure, the less perfect your product has to be when you ship it, and therefore the less expensive it is to build and support, and the faster your company can move.

I’m not advocating for delivering buggy product on purpose. Follow good development practices and test for important risks to deliver the best software you can as fast as you can. But when you inevitably deliver a bad bug, being set up to recover fast means you can deliver without worry.

I remember when the best way to get software to users was to mail it to them on CDs. A bug of this magnitude was a much bigger deal then. It was harder to warn users of the problem, and lots slower and more expensive to correct it. In that world, Feedly’s bug could have damaged their reputation for a long time. So back then it made sense to test more thoroughly — and therefore spend more time and money before releasing.

But today, in a world of Web software and 24-hour emergency App Store turnaround, you’ll deliver faster and with less expense when you set yourself up for fast failure recovery. Continuous integration and continuous delivery are usually a part of that strategy.


You can’t test it all

By Jim Grey (about)

That new build of software you are about to test? It’s a haystack with some unknown number of needles (bugs) in it.

[Photo: hay rolls. Caption: Have fun finding all the needles!]

As a tester, you might think your job is to find all the needles. But how do you do that when you don’t know how many needles are in there? What if there are a lot of needles in there? You’ll never have time to find them all.

You need a plan. You want to find the showstopper bugs right away, and then find as many of the other bugs people will care about as you can in the time you have. And then when they come breathing down your neck to stop already and ship, you want to be able to tell them just what badness still might lurk in the code. Give them a reason to think it over.

You do that through assessing risk and targeting test coverage. To assess risk, ask yourself some questions:

  • How stable is the code that was changed? What interactions within the software might these changes break? You’re trying to figure out how likely it is you’ll find bugs.
  • If stuff around these code changes is broken, how much could it hurt the user? How much could it hurt your company? You’re trying to figure out the impact of any bugs you might find.

Risk is the product of likelihood and impact. Test for the highest risk bugs first, working down through the risks. Test more deeply for the bigger risks, more lightly for the smaller ones.
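As a rough illustration of "risk is the product of likelihood and impact," here is a minimal sketch. The 1-to-5 scales, the `Area` class, and the example areas and scores are all assumptions made up for this example; use whatever scale your team can agree on.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """One part of the product you could test (hypothetical example type)."""
    name: str
    likelihood: int  # assumed scale: 1 (unlikely to be broken) to 5 (very likely)
    impact: int      # assumed scale: 1 (barely noticeable) to 5 (severe)

    @property
    def risk(self) -> int:
        # Risk is the product of likelihood and impact.
        return self.likelihood * self.impact


def rank_by_risk(areas):
    """Return the areas ordered from highest risk to lowest."""
    return sorted(areas, key=lambda area: area.risk, reverse=True)


# Made-up areas and scores, purely for illustration.
areas = [
    Area("report export", likelihood=2, impact=2),
    Area("login", likelihood=2, impact=5),
    Area("checkout", likelihood=4, impact=5),
]

ranked = rank_by_risk(areas)
for area in ranked:
    print(f"{area.name}: risk {area.risk}")
```

The numbers matter less than the ordering they produce: the top of the list gets your deepest testing, the bottom gets a light pass.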

Let’s say they want to ship before you’ve tested through the risks you think people will care about. You can then talk about the risks you haven’t checked for yet, and ask if they’re okay with shipping like that. Do you see the mind shift here? You’re not saying you haven’t run all of your tests yet, which sounds an awful lot like you can’t keep up. Instead, you’re saying that the code might not be ready yet, and here are the specific things you’d like to still check for. It puts you in a much stronger position to get that extra time — and makes it the boss’s decision about what to do next.
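To make that conversation concrete, here is a small sketch of how you might list what fits in the time you have and what would ship unchecked. The function name, the tuple layout, the hour estimates, and the risk scores are hypothetical, invented for this example.

```python
def plan_within_budget(risks, hours_available):
    """Walk the risks from highest to lowest and stop when time runs out.

    `risks` is a list of (name, risk_score, estimated_hours) tuples.
    Returns (tested, untested): the names you have time to cover, and the
    (name, risk_score) pairs that would ship unchecked.
    """
    ordered = sorted(risks, key=lambda r: r[1], reverse=True)
    tested, untested = [], []
    remaining = hours_available
    out_of_time = False
    for name, risk, hours in ordered:
        if not out_of_time and hours <= remaining:
            tested.append(name)
            remaining -= hours
        else:
            # Once one risk no longer fits, everything below it stays untested,
            # so testing always proceeds in strict priority order.
            out_of_time = True
            untested.append((name, risk))
    return tested, untested


# Made-up risks: (name, risk score, estimated hours to test).
risks = [
    ("checkout", 20, 6),
    ("login", 10, 3),
    ("report export", 4, 2),
    ("profile photo upload", 2, 2),
]

tested, untested = plan_within_budget(risks, hours_available=10)
print("Covered:", tested)
print("Not yet checked:", untested)
```

The second list is what you bring to the boss: not "I have tests left to run," but "here is what we have not checked, and here is how risky each item is."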

Ultimately, it’s best if your developers can and will take great care to not deliver so many needles. That’s always the best case.


The tester’s three mental models

By Jim Grey (about)

It’s a common mistake among new testers: test it all, every time. But the weight of all that checking soon crushes the tester, and s/he starts looking for ways to test less without missing anything important.

And so begins the journey of understanding risk likelihood and impact: how likely is a thing to be broken, and how bad is it when it is. Smart testers prioritize likelihood and impact, and test in priority order. That way, should time run out, only low risk and low impact areas of the product remain untested. Heck, you might even skip tests that are ranked low enough. Maybe you should skip those tests, as they’re likely to find bugs nobody cares about. A radical thought!

But how to rank risk and impact? This reminds me of an old joke.

[Photo: ice cream board. Caption: I wish all chalk marks could be about ice cream.]

There was an engineer who kept the big machine on the shop floor running faithfully for 30 years. After he retired, the machine promptly broke down. Nobody could get it running again. In desperation, the company called the engineer and implored him to come back and fix it.

The retired engineer returned, albeit reluctantly. He spent a day looking the machine over. Then he called everybody together and marked an X in chalk on a particular component. “Replace this, and the machine will work again.” Glory be, he was right! “Send us an invoice,” the boss said.

And the engineer did: for $10,000. “Ten thousand dollars!” the boss cried. “You need to justify that!” The engineer said he’d send an itemized invoice. Here’s how it read:

One chalk mark: $1

Knowing where to put it: $9,999

Testing for risk and impact means knowing where to put it — that is, knowing where to go to find the most serious bugs. You get good at that by building these three mental models:


The code mental model

What impact on the rest of the software will these code changes have? In other words, what is likely not to work as desired after these code changes are made?

This means you have to learn how the product is designed and built. That doesn’t mean you necessarily have to be able to read code, although it doesn’t hurt. You just have to pay attention as you test the product and listen to the developers’ explanations of the product’s technical details. You will know you’re building this model when you articulate how you think the product is built to a developer and they say something like, “Yeah. Those aren’t exactly the words I’d use, but they’re accurate enough.”

The code mental model helps you assess risk likelihood. “That part of the product is a little brittle, and every time something interacts with it, things are broken,” or, “I know we designed that function to handle a certain throughput, but what we’re contemplating is 10 times that, and so I’m concerned it’ll fold under the pressure.”


The customer mental model

What parts of the product, when not working as desired, will be a problem for the customer or user? How severe a problem will it be?

To build this model, form good relationships with your support and implementation teams. You might even rotate through support from time to time, review customer-reported problems, and ask support how difficult those problems were for customers.

The customer mental model helps you assess impact. “If we ship this bug, customers are going to scream,” or, “I think support can talk customers around this bug,” or, “Customers are probably not going to even notice this bug.”


The business mental model

What parts of the product, when not working as desired, put the company’s revenue or reputation at risk, or interrupt smooth and efficient company operations? How severe a problem will it be?

The business mental model gets at how your company makes money and grows the business. This is often the hardest mental model to build, but to the extent you build it, you can make much more nuanced test coverage decisions. To start, you can form an understanding of the kinds of product issues that get customers to call the CEO and threaten to cancel or sue. You can come to understand the kinds of problems that place a heavy burden on the support and implementation teams, or that cost the company money, whether through time taken away from revenue-generating activity or through services given away free to regain an angry customer’s trust.

Come to understand which customers, especially the most lucrative ones, are up for renewal soon, and which are unhappy with your company and why.

The business mental model helps you further assess impact. “If this doesn’t perform well, customers are going to quit us,” or, “Bugs in this part of the product always flood us with calls and disrupt our ability to deliver more software.”
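One way to picture how the three models feed a single ranking: the code model supplies likelihood, and the customer and business models each supply a view of impact. The sketch below is only an assumption about how you might combine them. The function name, the 1-to-5 scales, and the choice to take the larger of the two impact scores are all mine, not a standard formula.

```python
def risk_score(code_likelihood, customer_impact, business_impact):
    """Combine the three mental models into one number (illustrative only).

    All inputs use an assumed 1 (low) to 5 (high) scale.
    """
    # Assumption: a bug is as bad as its worse consequence, so use whichever
    # impact is larger rather than averaging the two.
    impact = max(customer_impact, business_impact)
    return code_likelihood * impact


# A brittle area that customers would barely notice, but that could
# cost a renewal: the business model pulls it up the list.
brittle_billing = risk_score(code_likelihood=4, customer_impact=1, business_impact=5)

# A stable area with a moderately annoying failure mode.
stable_search = risk_score(code_likelihood=2, customer_impact=3, business_impact=1)

print(brittle_billing, stable_search)
```

Taking the maximum keeps a severe business impact from being diluted by a mild customer impact, or the other way around. If your team prefers a weighted blend, swap that one line.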