Quality Testing

You can’t test it all

By Jim Grey (about)

That new build of software you are about to test? It’s a haystack with some unknown number of needles (bugs) in it.

Hay rolls 3
Have fun finding all the needles!

As a tester, you might think your job is to find all the needles. But how do you do that when you don’t know how many needles are in there? What if there are a lot of needles in there? You’ll never have time to find them all.

You need a plan. You want to find the showstopper bugs right away, and then find as many other bugs that people will care about within the time you have. And then when they come breathing down your neck to stop already and ship, you want to be able to tell them just what badness still might lurk in the code. Give them a reason to think it over.

You do that through assessing risk and targeting test coverage. To assess risk, ask yourself some questions:

  • How stable is the code that was changed? What interactions within the software might these changes break? You’re trying to figure out how likely it is you’ll find bugs.
  • If stuff around these code changes is broken, how much could it hurt the user? How much could it hurt your company? You’re trying to figure out the impact of any bugs you might find.

Risk is the product of likelihood and impact. Test for the highest risk bugs first, working down through the risks. Test more deeply for the bigger risks, more lightly for the smaller ones.

Let’s say they want to ship before you’ve tested through the risks you think people will care about. You can then talk about the risks you haven’t checked for yet, and ask if they’re okay with shipping like that. Do you see the mind shift here? You’re not saying you haven’t run all of your tests yet, which sounds an awful lot like you can’t keep up. Instead, you’re saying that the code might not be ready yet, and here are the specific things you’d like to still check for. It puts you in a much stronger position to get that extra time — and makes it the boss’s decision about what to do next.

Ultimately, it’s best if your developers can and will take great care to not deliver so many needles. That’s always the best case. Click here to read more about it.

Quality Testing

The tester’s three mental models

By Jim Grey (about)

It’s a common mistake among new testers: test it all, every time. But the weight of all that checking soon crushes the tester, and s/he starts looking for ways to test less without missing anything important.

And so begins the journey of understanding risk likelihood and impact: how likely is a thing to be broken, and how bad is it when it is. Smart testers prioritize likelihood and impact, and test in priority order. That way, should time run out, only low risk and low impact areas of the product remain untested. Heck, you might even skip tests that are ranked low enough. Maybe you should skip those tests, as they’re likely to find bugs nobody cares about. A radical thought!

But how to rank risk and impact? This reminds me of an old joke.

Ice cream board
I wish all chalk marks could be about ice cream.

There was an engineer who kept the big machine on the shop floor running faithfully for 30 years. After he retired, the machine promptly broke down. Nobody could get it running again. In desperation, the company called the engineer and implored him to come back and fix it.

The retired engineer returned, albeit reluctantly. He spent a day looking the machine over. Then he called everybody together and marked an X in chalk on a particular component. “Replace this, and the machine will work again.” Glory be, he was right! “Send us an invoice,” the boss said.

And the engineer did: for $10,000. “Ten thousand dollars!” the boss cried. “You need to justify that!” The engineer said he’d send an itemized invoice. Here’s how it read:

One chalk mark: $1

Knowing where to put it: $9,999

Testing for risk and impact means knowing where to put it — that is, knowing where to go to find the most serious bugs. You get good at that by building these three mental models:


What impact on the rest of the software will these code changes have? In other words, what is likely not to work as desired after these code changes are made?

This means you have to learn how is the product is designed and built. That doesn’t mean you necessarily have to be able to read code, although it doesn’t hurt. You just have to pay attention as you test the product and listen to the developers’ explanations of the product’s technical details. You will know you’re building this model when you articulate how you think the product is built to a developer and they say something like, “Yeah. Those aren’t exactly the words I’d use, but they’re accurate enough.”

The code mental model helps you assess risk likelihood. “That part of the product is a little brittle, and every time something interacts with it, things are broken,” or, “I know we designed that function to handle a certain throughput, but what we’re contemplating is 10 times that, and so I’m concerned it’ll fold under the pressure.”


What parts of the product, when not working as desired, will be a problem for the customer or user? How severe a problem will it be?

To build this model, form good relationships with your support and implementation teams. You might even do rotations through support from time to time, and review customer-reported problems and seek clarity from support on how difficult they were for customers.

The customer mental model helps you assess impact. “If we ship this bug, customers are going to scream,” or, “I think support can talk customers around this bug,” or, “Customers are probably not going to even notice this bug.”


What parts of the product, when not working as desired, put the company’s revenue or reputation at risk, or interrupts smooth and efficient company operations? How severe a problem will it be?

The business mental model gets at how your company makes money and grows the business. This is often the hardest mental model to build, but to the extent you build it, you can make much more nuanced test coverage decisions. To start, you can form an understanding of the kinds of product issues that get customers to call the CEO and threaten to cancel or sue. You can come to understand the kinds of problems that place heavy burden on the support and implementation teams, or would cost the company money in terms of time taken away from revenue-generating activity or services given for free to help regain an angry customer’s trust.

Come to understand which customers, especially the most lucrative ones, are up for renewal soon, and which are unhappy with your company and why.

The business mental model helps you further assess impact. “If this doesn’t perform well, customers are going to quit us,” or, “Bugs in this part of the product always flood us with calls and disrupt our ability to deliver more software.”

Quality The Business of Software

It’s remarkable that the “Great Glitch of July 8” doesn’t happen more often

By Jim Grey (about)

It was never a coordinated “cyber-attack,” as several news outlets speculated.


It was simple coincidence that several separate systems failed on the same day, last Wednesday, July 8: the trading system at the New York Stock Exchange, many systems at United Airlines, and the Web site of The Wall Street Journal.

Technology fails all the time. You just don’t usually recognize it. Have you ever noticed a page on a site loading unusually slowly? Or have you ever been unceremoniously logged out? I’m sure that as long as the screen finished loading, or that you were able to successfully log right back in, you shrugged it off and moved on. It might have been random Internet gremlins or lousy Wi-Fi. But it could also have been a failure in the service. Perhaps monitoring software noticed it and quietly performed a restart. Or maybe a few minutes of high drama unfolded in some technical operations center somewhere as technicians righted the situation.

But why do such systems fail? Several reasons:


Legacy systems patched and updated for so many years that the code has become sclerotic. Big, old companies like United Airlines are bursting with old systems. I wouldn’t be surprised if some part of their reservation system involves a mainframe! Systems like these have been repaired and extended for years upon years, and by now none of the original programmers and technicians still work there. The code has become difficult to restabilize after any change. It’s prohibitively expensive to build a new system from scratch, and even if you could afford it, you’d just introduce a whole host of new problems anyway.

System integrations and data migrations gone wrong. Company A buys Company B. There’s a lot of overlap in the technologies they use, so they integrate them or migrate the data from one to the other. In any such project, a thousand edge cases lurk that, when triggered, can cause failure. Even the most crack project team will miss some. There’s never time and money to find them all anyway. Missed edge cases are just ticking time bombs.


Poor original engineering. Because software engineering is still a nascent discipline, we’re still figuring out how best to do it. Every methodology has challenges and limitations. Smart engineers do the best they can to design a system that will work well, but are always limited by time and money. Sometimes revenue pressure leads engineers to favor fast over good. And even then, it’s very hard to imagine all the demands that will be placed on a system over time.

One of my past employers had a Web service that pumped customers’ backend-system data into our database. It was fast and reliable until we sold the product to a customer that wanted to blast in 10 years of historic data. We’d never done that before, nobody checked with the engineers first, and sure enough it made the Web service fall right onto its face. All of our customers experienced an outage.

Good old-fashioned hardware failure. United blamed its July 8 outage on a failed router. Some years ago, squirrels brought down NASDAQ by chewing through some power lines. These things happen, and most companies hedge against it with redundant hardware. But even then, sometimes a failure gets through.

Imperfect failure planning. Almost every company has failure plans in place. Most of them use as much automated failure recovery as they can. But there are just situations that evade even the best plans and the best automation.

Perfect technology is a myth. Occasional failure is certain.

Managing People Quality

Four critical tips for getting your ideas implemented at work

By Jim Grey (about)

After eight years writing and editing software documentation, I itched to make software again, like I did in college. So I took a job with a software company as a tester.

My corporate mug shot from those days - yes, long hair and a beard
My corporate mug shot from those days – the Grizzly Adams years

The company made a sprawling product for an industry I knew nothing about, so I had lots to learn. Given my background, the first thing I did was reach for the manuals. They were incomplete, inaccurate, and poorly organized. There was online help, but it was unnavigable. Nobody was ever going to use the documentation to successfully learn the product. My boss managed the technical writers too, so I marched into his office to complain. I wasn’t delicate about it. “This stuff is terrible! I can’t believe you ship this to customers! It’s an embarrassment.”

He leaned back in his chair and calmly said, “What would you do to fix it?”

“I would throw it out and start over,” I began. And then over the next ten minutes, off the top of my head I outlined a project that would create new manuals and online help that would actually help users not just use the product, but get the best value from it.

Three days later, he called me back into his office. “Remember that thing you said you’d do with the documentation? You are now manager of the Documentation Department. Make it so.”

It was a bold move for him to take a gamble on me. I’d never managed people, and my project management experience was limited. What I didn’t know was that every year the company surveyed its users about product quality – and every year the documentation got the most complaints. My boss had been told to fix this problem, but had no idea how. Then I walked in with a solution that sounded like it just might work.

Most of this story is just the nuts and bolts of the project – hiring and coaching staff, creating plans and schedules, doing visual and information design for the new manuals and online help, managing the project, reporting to management, and even doing some of the writing myself. The details would be interesting only to another technical writer. Much of this was new to me, but I had excellent support from a boss who needed to see his gamble pay off. He also helped me navigate the inevitable office politics, including another manager who kept trying to torpedo my efforts. Also, the program manager helped me master the project management tools we used, none of which I had ever even seen before. My team and I worked on the project for a year and a half. It’s not often a technical writing team gets an opportunity to do a clean-sheet rewrite like this, and they were all enthusiastic about it. I worked hard to clear their roadblocks, respond quickly to their concerns, and generally be a good guy to work for, and it paid off in the excellent work they delivered. When we were done, we had written over 3,000 pages and had created a seven-megabyte context-sensitive online help system.

I was invited to demonstrate the new online help at the annual user conference. 600 people flew in from all over the United States, and there I was before them on the opening session’s main stage. My presentation was the last of a series about new features in the product. When I finished, to my astonishment the online help received enthusiastic applause – and then one person stood, and a few more, and several more, and soon the whole room was standing and applauding. That moment remains the pinnacle of my career; I can’t imagine anything else ever overtaking it. The icing on the cake was when I overheard the VP of Sales say to my boss, “All the blankety-blank new features we pushed you to put into the product, and everybody liked the blankety-blank online help the best! The online help! You’ve got to be blankety-blank kidding me!”

I used to think I was just a grunt paid to trade the words I wrote for a paycheck. Through this project I learned just how interdependent everyone is at a company, and how everybody is important. Specifically, I learned:

If you want to see your great ideas implemented, they need to solve a big problem the company thinks it has. The problems your company thinks it has may very well be different from the problems your company actually has. Frame your ideas in terms of solving the problems the company thinks it has.

When you’re doing something you’ve never done before, find people who can coach you through it. I don’t care how far down the ladder you are at your company, your success helps determine other peoples’. Look for someone who both knows how to do the thing you need to learn and whose success depends in part on yours – that last bit motivates them to help you. In my case, it was my boss and the program manager.

Work for people who clear roadblocks out of your way so you can be most effective. I now leave situations where the boss doesn’t help me in this way. It’s that critical.

Your success always depends on other people, so treat them well. In giving my team an exciting assignment and creating an environment in which they could focus, they happily turned out huge quantities of good work. Also, after we shipped the new documentation, I promoted every writer. They deserved it.

A footnote: That company went through tough times a few years later and so we all moved on, some for better positions and others (like me) because they couldn’t afford to pay us anymore. One of the writers who had worked for me called me one day seven years later, by which time I really had moved into software testing. She said, “We have an opening here for a test manager. I’d love to work with you again, and this is a good place to work. You really should apply.” I did, and I got the job. I found out later that just before my interview, she went to the VP and said, “He’s a great boss. You don’t want to let him get away.”

Sometimes the good things you do come back to you!

The Business of Software

Much ado about Flickr

By Jim Grey (about)

I was shocked when I logged into Flickr last week and found an entirely new interface.

I'm staying right here at Flickr
My Flickr homepage.

My shock turned to disappointment and sadness that some of my contacts were super angry about the change, left strongly worded comments on their photostreams, and immediately moved their photos to other services.

make software products for a living; I’ve seen firsthand how interface changes can alienate users. They become comfortable with a product’s features and usage, even when they’re flawed. They don’t want to learn anything new (which often masks a fear that they can’t learn something new).

At the same time, Flickr (and Facebook and any other thing you do on the Web) is a product, built by a company that is trying to make money in an ever-changing landscape.

I’ve seen it often, and it’s happened at companies where I’ve worked: A company builds a good product that takes off. Success causes the company to grow or to be sold to a larger company. And then some scrappy startup company builds a product in an overlapping market that becomes a new darling. By then, the big company is so invested in what it’s always done that it struggles to adapt to the shifting market.

From where I sit, it looks like all of this happened to Flickr. Founded in 2004, Flickr quickly became arguably the king of the hill among photo-sharing sites. Web giant Yahoo! quickly noticed and, in 2006, bought the fledgling company. Success!

But consider all that’s happened in photography and on the Web since 2006. Most people had just discarded their film cameras for digital cameras. Soon cameras in phones became good enough for casual, everyday use; many of them are now very good. Users found it easy to share their photos across any number of the social networks that had emerged – primarily Facebook, which was founded in 2004, too, but also on upstart Instagram. Today, the three cameras that take the most photos uploaded to Flickr are all iPhones.

The market has shifted. It was a matter of time before Flickr either responded or became a niche product of ever decreasing importance. This new interface is its bid to stay relevant. I’m impressed with Yahoo! for moving Flickr so boldly.

I think that if people give the new interface a chance, it will work for most of them. I’ve heard complaints about slowness; I advise patience as Yahoo! would be foolish not to address legitimate performance problems. I’ve heard complaints about how crowded the interface feels; I’m also sure Yahoo! will tweak the new interface over time for better usability.

Another source of uproar is that advertising now adorns Flickr pages. I hate Web ads too, but really, they are the major way many Web products make money.

I sympathize a little with one complaint: all of us who bought Flickr Pro accounts for unlimited photo uploads now feel kind of let down, given that everybody gets a terabyte of storage now. That much storage might as well be unlimited; you could upload one photo a day for the rest of your life and never run out of space. But Flickr is letting us cancel our Pro accounts with a pro-rated refund, or keep Pro at its rate of $25 per year and never see an ad. Anybody who doesn’t have Pro already will have to pay $50 per year for that same privilege. I think this is a reasonable trade.

Flickr’s real mistake might be in underestimating how attached its users were to the old interface. But if my experience is any indication, perhaps that mistake won’t be fatal. Of my contacts, about five percent of them have moved to other services. I’ll miss seeing their photos. I wonder if they’ll soon miss the rest of the Flickr community.

Cross-posted from my personal blog, Down the Road.