Quality Testing

Myths of test automation – debunked!

By Jim Grey (about)

I wrote a post last year criticizing test automation when it’s used to cover for piles of technical debt and poor development practices. But I still think there’s a place for automation in post-development testing. There are two keys to using it well: knowing what it’s good at, and counting the costs. Without those keys it’s easy to fall prey to several myths of test automation. I aim to debunk them here.

Myth: Automation is cheap and easy

It is seductive to think that just by recording your manual tests you can build a comprehensive regression-test suite. But it never seems to really work that way. Every time I’ve used record and playback, the resulting scripts wouldn’t perfectly execute the test, and I’ve had to write custom code to make it work.

What I’ve found is that it takes 3 to 10 times longer to automate one test than to execute it manually. And then, especially for automation that exercises the UI, the tests can be brittle: you have to keep modifying scripts to keep them running as the system under test changes.

I’ve done straight record and playback. I’ve created automated modules that can be arranged into specific checks. I’ve led a team that created tests on a keyword-driven framework. And I currently lead a team that writes code that directly exercises a product’s API. The amount of maintenance has decreased with each successive approach.

A side note: given the cost of automating one test, can you see that you want to automate only what you are going to run over and over again, because otherwise the investment doesn’t pay?
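A rough sketch of that arithmetic follows. Only the 3-to-10-times multiplier comes from the text above; the function name, the upkeep parameter, and the sample numbers are assumptions made up for illustration, and the sketch treats an automated run as costing no human time at all.

```python
import math


def break_even_runs(manual_minutes, automation_multiplier, upkeep_minutes_per_run=0.0):
    """Return how many runs it takes before automating one test pays for itself.

    manual_minutes: time to execute the test once by hand.
    automation_multiplier: how many times longer automating takes (3 to 10 above).
    upkeep_minutes_per_run: assumed average script maintenance per run (hypothetical).
    """
    build_cost = manual_minutes * automation_multiplier
    saved_per_run = manual_minutes - upkeep_minutes_per_run
    if saved_per_run <= 0:
        return None  # upkeep eats all the savings, so automating never pays
    return math.ceil(build_cost / saved_per_run)


print(break_even_runs(15, 3))      # 3
print(break_even_runs(15, 10))     # 10
print(break_even_runs(15, 10, 5))  # 15
```

With any upkeep at all, the break-even point moves further out, which is why a test that will run only a handful of times is rarely worth automating.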

Myth: Automation can test anything, and is as good as human testing

Automation is really good at repeating sets of actions, performing calculations, iterating over many data sets, addressing APIs, and doing database reads and writes. I love to automate these things, because humans executing them over and over is a waste of their potential.

This gets at a whole philosophical discussion about what testing is. I think that running predetermined scripts, whether automated or not, is just checking, as in, “Let me check whether clicking Save actually saves the record.” This subset of testing just evaluates the software based on predefined criteria that were determined in the past, presumably based on the state of the software and/or its specification or set of user stories as they were then.
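Here is a minimal sketch of what such a check looks like when it iterates over data sets. The `order_total` function is a hypothetical stand-in for product code, and the rows in `CASES` are invented; the point is only that every expected result was decided in advance.

```python
from decimal import Decimal


def order_total(unit_price, quantity, tax_rate):
    """Hypothetical stand-in for the product code under test."""
    subtotal = Decimal(unit_price) * quantity
    return (subtotal * (1 + Decimal(tax_rate))).quantize(Decimal("0.01"))


# Each row is one data set: the inputs plus the result decided in advance.
CASES = [
    ("10.00", 1, "0.07", "10.70"),
    ("19.99", 3, "0.07", "64.17"),
    ("0.00", 5, "0.07", "0.00"),
]


def run_checks(cases):
    """Run every data set and return the rows whose result did not match."""
    failures = []
    for unit_price, quantity, tax_rate, expected in cases:
        actual = order_total(unit_price, quantity, tax_rate)
        if actual != Decimal(expected):
            failures.append((unit_price, quantity, tax_rate, expected, str(actual)))
    return failures


print(run_checks(CASES))  # []
```

The check can only confirm or contradict what somebody already wrote down in `CASES`. It has no way to notice anything nobody thought to ask about.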

The rest of testing involves human testers experimenting and learning, evaluating the software in its context now. This is critical work if for no other reason than that the software and its context (environment, hardware, related software, customer needs, business needs, and so on) change. An exploring human can find critical problems that no automated test can.

I want human testers to be free to test creatively and deeply. I love automated checks because they take this boring, repetitive work away from humans so they have more time to explore.

Myth: When the automation passes, you can ship!

It’s seductive to think that if testing is automated, passing automation is some sort of Seal of Approval that takes out all the risk. It’s as if “tested” is a final destination, an assurance that all bets are covered, a promise that nothing will go wrong with the software.

But automation is only as good as its coverage. And if nobody outside your automation team understands what the automation covers, saying “the automation passed” has no fixed meaning.

It’s hard to overcome this myth, but to the extent I have, it’s because as an automation lead and manager I’ve required engineers to write detailed coverage statements into each test. I’ve then aggregated them into broad, brief coverage statements over all of the parts of the software under test. Then I’ve shared that information — sometimes in meetings with PowerPoint decks, always in a central repository that others can access and to which I can link in an email when I inevitably need to explain why passing automation isn’t enough. Keeping this myth at bay takes constant upkeep and frequent reminders.
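One way this could look in code is sketched below. It is not the tooling described above: the `covers` decorator, the product areas, and the test names are all hypothetical, chosen only to show a statement attached to each test and then rolled up by area.

```python
from collections import defaultdict


def covers(area, statement):
    """Decorator that attaches a coverage statement to a test function."""
    def attach(test_func):
        test_func.coverage = (area, statement)
        return test_func
    return attach


@covers("Invoicing", "Saving an invoice persists all line items")
def test_save_invoice():
    pass  # test body omitted in this sketch


@covers("Invoicing", "Tax is recalculated when a line item changes")
def test_recalculate_tax():
    pass  # test body omitted in this sketch


@covers("Login", "Locked accounts cannot sign in")
def test_locked_account():
    pass  # test body omitted in this sketch


def coverage_report(tests):
    """Group coverage statements by product area into a plain-text summary."""
    by_area = defaultdict(list)
    for test in tests:
        area, statement = test.coverage
        by_area[area].append(statement)
    lines = []
    for area in sorted(by_area):
        lines.append(f"{area}: {len(by_area[area])} check(s)")
        lines.extend(f"  - {statement}" for statement in by_area[area])
    return "\n".join(lines)


print(coverage_report([test_save_invoice, test_recalculate_tax, test_locked_account]))
```

Because the statements live next to the tests, the summary can be regenerated whenever the suite changes instead of being maintained by hand.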

Myth: Automation is always ready to go

“Hey, we want to upgrade to the next version of the database in the sandbox environment. Can you run the automation against that and see what happens?”

My answer: “Let’s assume I can even run the automation in sandbox. If it passes, what do you think you will know about the software?” The answer almost always involves feelings: “Well, I’ll feel like things are basically okay.” See “When the automation passes, you can ship!” above.

Automation is software, full of tradeoffs aimed at meeting a set of implicit and explicit goals. Unless one of those goals was “must be able to run against any environment,” it probably won’t run in sandbox. The automation might count on particular test data existing (or not existing). It might not clean up after itself, leaving lots of data behind, and that might not be welcome in the target environment. It might depend on a particular configuration of the product and its environment that isn’t present.

Even in the environment the automation usually runs in, it might not be ready to go at a moment’s notice. Another goal would need to be, “must be able to run at any time.” There are often setup tasks to perform before the automation can run: a reset of the database the automation uses, or the execution of scripts that seed data that the automation needs.
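A small sketch of why those setup tasks matter, using an in-memory SQLite database as a stand-in for the automation’s own database. The table, the seed rows, and the function names are assumptions invented for illustration.

```python
import sqlite3


def reset_and_seed(conn):
    """Put the test database into the known state the checks depend on."""
    conn.execute("DROP TABLE IF EXISTS customers")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Acme Corp"), (2, "Globex")],
    )
    conn.commit()


def check_customer_count(conn, expected):
    """A check that only means something if the seed data is in place."""
    (count,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()
    return count == expected


conn = sqlite3.connect(":memory:")  # stand-in for the automation's database
reset_and_seed(conn)
conn.execute("INSERT INTO customers (name) VALUES ('left over from a previous run')")
print(check_customer_count(conn, 2))  # False: stale data breaks the check
reset_and_seed(conn)
print(check_customer_count(conn, 2))  # True: the reset restores the known state
conn.close()
```

The failure in the first check says nothing about the product. It only says the environment was not in the state the automation assumed.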

Myth: Just running the automation is enough

When I run automated tests, part of me secretly hopes they all pass. That’s because when there’s a failure, I have to comb through the automation logs to find what happened, figure out what the automation was doing when it failed, and log into the software myself and try to recreate the problem manually. Sometimes the automation finds just the tip of a bug iceberg and I spend hours exploring to fully understand the problem. Some portion of the time, the failure is a bug in the automation that must be fixed. When it’s a legitimate product bug, then I have to write the bug in the bug tracker.

I am endlessly amused by how often I’ve had to explain that just running the automation isn’t the end of it: that if there are any failures, the automation doesn’t automatically generate bug reports. The standard response is some variation of “What? …ohhhhhh,” as it dawns on them. So far, thankfully, it has always dawned on them.

Myth: Automated tests can make up for years of bad development practices

I’ve just got to restate my point from my older post on this subject. If your development team doesn’t follow good practices such as writing lots of automated unit tests (to achieve about 80% code coverage), code reviews, paired testing, or test-driven development, automation from QA is not going to fix it. You can’t test in quality — you have to build it in.

If you’re sitting on a messy legacy codebase, one where your test team plays whack-a-mole with bugs every time you make changes to it, you are far, far better served investing in the code itself. Refactor, and write piles of automated unit tests.

You want on the order of thousands of automated unit tests, hundreds of automated business-rule tests (which hopefully directly exercise an API, rather than exercising a UI, for resiliency and maintainability), and tens of automated checks to make sure the UI is functioning.
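As a quick sanity check on a suite’s shape, something like the following sketch could flag a lopsided mix. The function name and the factor-of-five threshold are arbitrary assumptions; the text above gives only rough orders of magnitude.

```python
def pyramid_problems(unit, business_rule, ui):
    """Return warnings when a suite strays from the thousands/hundreds/tens shape.

    The factor-of-five threshold is an arbitrary assumption for this sketch.
    """
    problems = []
    if business_rule * 5 > unit:
        problems.append("too few unit tests relative to business-rule tests")
    if ui * 5 > business_rule:
        problems.append("too many UI checks relative to business-rule tests")
    return problems


print(pyramid_problems(unit=3000, business_rule=400, ui=40))  # []
print(pyramid_problems(unit=200, business_rule=150, ui=600))  # both warnings
```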

I’ll belabor this point: Invest in better code and better development practices first. When you deliver better quality to QA, you’ll keep the cost of testing as low as possible and more easily and reliably deliver better quality to your customers and users.

Quality The Business of Software

Flickr has smartly repositioned itself to remain vital in photo sharing

By Jim Grey (about)
My Flickr camera roll.

When I started my personal blog, which is largely about film photography using vintage cameras, I found a great use for my languishing Flickr account: hosting most of the photos for my blog. Flickr has been a great tool for sharing my photography everywhere on the Internet.

The other day, I uploaded my 10,000th photo to Flickr. That’s a lot of photos! It’s so many that finding one particular photo on my computer is nigh onto impossible. From the beginning, I should have used the photo organizer that came with my copy of Photoshop Elements. But I’ve let too much water pass under the bridge: years and years of photos remain unindexed in folders on my hard drive. It would be a big, unpleasant job to organize them now.

It turns out that the easiest way for me to find one of my photographs is to search for it on Flickr. I’ve left enough bread crumbs in the titles, descriptions, and tags that with a few words in Flickr’s search box I can find anything I’ve uploaded.

It also turns out that I was inadvertently leading the way. Flickr recently made some changes to the site that make it easier than ever to store all of your photos and find any of them in an instant. I think these smart improvements reposition Flickr well in the new world of photo storage and sharing, and give it a solid chance at remaining relevant and vital.

And it’s not a moment too soon. Flickr had been geared toward people interested in photography who wanted to share and talk about their work. Many users appeared to carefully curate their photostreams, sharing only their best photos. It remained wonderful for this purpose. But in the meantime not only have digital cameras almost entirely supplanted film cameras, but camera phones have also largely supplanted dedicated digital cameras. People were taking pictures on their phones just so they could share them on Facebook and Instagram — and Flickr was getting none of that action. It was falling behind.

Flickr finally awoke from its slumber in 2013 with a new, more modern user interface, plus one terabyte of free storage — upwards of a half million photos — for anyone, for free. Flickr’s mission had shifted: please do dump all of your photos here. And then last month Flickr rolled out yet another new user interface, and has added several powerful new features meant to make the site the only photo storage and sharing site you’ll ever need:

Automatic photo uploading. Flickr can now automatically upload every photo from your computer and your phone — every past photo and every new photo you take. Flickr marks them all as private, so only you can see them, until you choose to make them public. To enable this, you have to download the new Flickr app to your phone and download a new “Uploadr” application for your computer. But after you do, you may never again lose a photograph to a crashed hard drive or to a lost or stolen phone. And if you do have such a mishap, Flickr now lets you download any or all of your photos en masse.


Image recognition and automatic tagging. Flickr now uses image-recognition technology to guess what’s in each of your photos, and adds descriptive tags to them. You’ve always been able to tag your photos manually; those tags appear with a gray background. Flickr’s automatic tags have a white background. These tags make photos easier to find in search. It’s not perfect — a photo I took of a construction site was mistakenly tagged with “seaside” and “shore.” But it works remarkably well overall, and Flickr promises that they will keep improving the technology.

Camera roll and Magic View. Flickr has introduced an iOS-style camera roll as the main way you interact with your own photos now. Flickr is criticized for stealing this concept from Apple. But they’ve gone Apple one better by adding Magic View, which organizes photos by their tags — including the automatically generated ones. It gives you astonishing views into your photos, grouping them smartly. Finally, all of my bridge photos are in one place, and I didn’t have to lift a finger!

Flickr found 105 photos of bridges in my photostream.

Improved searchability. All these new tags make Flickr even more searchable. You can find any of your photos in seconds on Flickr.

All of this makes Flickr a compelling place to store all of your photographs, and be able to easily find them. They’re stored on Yahoo! servers and are always backed up. With a couple clicks or taps, you can share them from there to most of the popular social media sites, including Facebook, Instagram (but only on your phone), and Twitter.

The best thing: You can still use Flickr for everything you could before. You can share your best photographs and have conversations about them. You can explore the beautiful photographs others have taken. You can geotag your photos and save them to albums and groups. And if you want nothing to do with Flickr’s new features, you can just ignore them.

I’m astonished by how well Flickr has shifted to its new mission without leaving legacy users behind. As someone who has made software for more than a quarter century, I can tell you: it is enormously difficult to do this.

Still, many of Flickr’s longtime users feel alienated. They’re expressing far less paint-peeling rage than they did after the 2013 changes, thank goodness, but they’re still quite upset. The leading complaint: there’s no way to opt out of automatic tagging, and no way to delete at once all the tags already generated. Longtime users who have carefully chosen their tags find Flickr’s automatic tags to be an unwelcome intrusion.

Flickr should probably address that. But first, they should congratulate themselves. They’ve done journeyman work.

A slightly revised version of this is cross-posted today to my personal blog, Down the Road.

Quality Testing

When test automation is nothing more than turdpolishing

By Jim Grey (about)

I used to think that writing a fat suite of automated regression tests was the way to hold the line on software quality release over release. But after 12 years of pursuing that goal at various companies, I’ve given up. It was always doomed to fail.

In part, it’s because I’ve always had to automate tests through a UI. When I did straight record-and-playback automation, the tests were enormously fragile. Even when I designed the tests as reusable modules, and even when I worked with a keyword-driven framework, the tests were still pretty fragile. My automation teams always ended up spending more time maintaining the test suite than building new tests. It’s tedious and expensive to keep UI-level test automation running.

But the bigger reason is that I’ve made a fundamental shift in how I think about software quality. Namely, you can’t test in quality – you have to build it in. Once code reaches the test team, it’s garbage in, garbage out. The test team can’t polish a turd.

Writing an enormous pile of automated tests through the UI? Turdpolishing.

I’ve worked in some places where turdpolishing was the best that could be done. Company leadership couldn’t bear the thought of spending the time and money necessary to pay down years of technical debt, and hoped that building out a big pile of automated tests would hold the line on quality well enough. I’ve led the effort at a couple companies to do just that. We never developed the breadth and depth of coverage necessary to prevent every critical bug from reaching customers, but the automation did find some bugs and that made company leadership feel better. So I guess the automation had some value.

But if you want to deliver real value, you have to improve the quality of the code that reaches your test team. Even if the software you’re building is sitting on a mountain of technical debt, better new code can be delivered to the test team starting today. I’m a big believer in unit testing. If your software development team writes meaningful unit tests for all new code that cover 60, 70, 80 percent of the code, you will see initial code quality skyrocket. Other practices such as continuous integration, pair programming, test-driven development, and even good old code reviews can really help, too.

But whatever you do, don’t expect your software test team to be a magic filter through which working software passes. You will always be disappointed.

Process Project Management Quality

If you want to ship software, stay in touch with how much you suck

By Jim Grey (about)

My colleague Matt Block recently posted on his blog a link to an article about how a software shop’s business model affects how well agile scrum works for them. It breaks business models down into emergent, essentially meaning that the company builds product to meet goals such as selling ads or driving traffic, and convergent, essentially meaning that the company builds product that directly serves a target market. The article argues that agile is made for emergent and is a poor fit for convergent. That’s just a sketch of the article; go read it to get the full flavor.

Eminence says: Monrovia Sucks!
Graffiti found in the town neighboring Monrovia

I’ve always worked for companies following convergent business models. We’ve made our money by selling the software we created, which made it always important to deliver a certain scope by a certain time. When those companies implemented agile scrum, they could never fully adopt a key principle of it: when it’s time to ship, you ship whatever is built. In a convergent world, scope is king; you ship when everything specified is built.

I e-mailed my brother, Rick Grey, a link to this article. It’s great to have a brother who does the same thing I do for a living as we can talk endlessly about it. I thought we’d have a conversation about how to scope an agile project, but instead he had a brilliant insight: What if agile is good for convergent-model companies because it tells you sooner how much your project is off track? He gave me permission to share his e-mailed reply, which I’ve edited.

– – –

What if the companies we’ve worked for and all the other convergent-model teams of the world are doing agile just fine? By “just fine” I mean “as good as they do waterfall,” which may not be “just fine,” but we’ll get to that in a minute. Meanwhile, consider:

Long waterfall project:

  • No one pays real attention to progress (there’s always next month to catch up)
  • Engineers go dark, checking out huge sections of the codebase and not merging them back for long periods
  • Engineers (who are notoriously poor estimators) claim 50% done when it’s really about 25% – and then, as the code-complete milestone nears, they (usually innocently) claim 90% done when it’s really 70%
  • A couple of days before the code-complete milestone, engineering finally acknowledges they won’t hit the milestone and delays delivery to QA – “but we’re 95% done, for sure”
  • Under the pressure of already having missed a deadline, developers quietly take shortcuts to make it possible to hit the new QA delivery date
  • Weeks and months of unmerged changes come crashing in, creating conflicts and compile/deploy problems, further delaying delivery to QA
  • QA, now starting with a multiple-week handicap on an already-too-aggressive schedule, quietly takes its own shortcuts
  • QA finds hairy showstopper bugs and so the ship date gets moved
  • Management is livid, so QA goes into confirmatory testing mode just to get it out the door

Agile project of the same size:

  • Much of the above happens at a smaller scale, one iteration at a time
  • You fail to deliver everything planned starting with the first sprint
  • Instead of spending 80% of the project thinking you don’t suck as an organization and the last 20% realizing that you do, agile lets you feel like you suck every step of the way
  • Takeaway for management: “agile sucks” and/or “we suck at agile”

I assert that most teams are bad at delivering under a convergent business model. The hallmark pathologies of software delivery under a convergent model are too numerous and powerful for most teams to overcome, but their struggles are masked by waterfall until the end. Agile surfaces the problems every iteration. You feel like a loser by week 4 instead of week 40.

But this is actually a win. You get better project visibility and a tighter feedback loop, meaning you’ve got a better chance to make adjustments earlier to get the most out of the team you have. Embrace the feedback loop as a chance to make things better, and learn not to view it as proof of how much you (collectively) suck.

– – –

I will add that agile also helps you keep resetting expectations within your organization, because it makes it standard practice to keep reestimating what it will take to finish everything. This is just what I was talking about in my last post (read it here).
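To illustrate what that reestimating can look like, here is a minimal sketch that forecasts remaining sprints from what the team has actually finished. Story points, the function name, and the sample numbers are assumptions for illustration, not something the posts above prescribe.

```python
import math


def sprints_remaining(points_left, completed_per_sprint):
    """Forecast remaining sprints from what the team has actually finished.

    completed_per_sprint: story points finished in each past sprint.
    """
    if not completed_per_sprint:
        raise ValueError("need at least one finished sprint to forecast")
    velocity = sum(completed_per_sprint) / len(completed_per_sprint)
    if velocity <= 0:
        return None  # nothing is getting finished, so no forecast is possible
    return math.ceil(points_left / velocity)


# Hypothetical plan: 200 points in 10 sprints, or 20 points per sprint.
# Hypothetical reality: the first two sprints finished 12 and 14 points.
done = [12, 14]
print(sprints_remaining(200 - sum(done), done))  # 14
```

Two sprints in, the forecast is already 16 sprints in total rather than the planned 10. That is unwelcome news, but it arrives early enough to act on.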

Quality The Business of Software

Obamacare, healthcare.gov, and how government software gets made

By Jim Grey (about)

I was not surprised when I heard that the Obamacare Web site, healthcare.gov, crashed and burned right out of the gate.

But I was disappointed. Regardless of what I think of the Affordable Care Act, it’s the law. I wanted its implementation, including healthcare.gov, to go well.

Still, I wasn’t surprised because I know how government software gets made.


Several years ago I worked in middle management for a company that built a government Web application related to health-care customer service. I was in charge of testing it to make sure it worked. It is probably not going out on a limb to say that the people who built healthcare.gov experienced many of the same kinds of things I experienced on that project.

Let me be plain up front: I was a poor fit for government software development. I was too free-wheeling and entrepreneurial for the control-and-compliance environment that government contracting encourages. I find it difficult to write about the experience without showing my frustrations with its realities. But I think I understand those realities well and objectively.

The government doesn’t know how to do anything. They hire it all out, and then they manage and administer the process. As a result, on this project they relied heavily on compliance with “best practices,” as if those practices contained some sort of magic that delivered quality software. They don’t, of course; the government was shocked when Version 1.0 of our software had typical quality problems right out of the gate. Those practices served primarily to leave an audit trail the government could follow.

In the end, the project was a success. Despite Version 1.0’s glitches, which we quickly fixed, the software was immediately put to use and led to productivity improvements over an older, green-screen system. I spoke with many of the software’s users, and despite a few grumbles most of them liked using it.

But this was one mighty expensive piece of software to build, from winning the contract to defining what the software should do to building and maintaining the software. Here’s why.

The bid process

I was hired after we won the contract, but I heard stories about the bid process. We had no experience building software on this scale, but we wanted into the lucrative cost-plus government contracting business for its guaranteed profit margins. So we offered a lowball bid aimed at getting the government’s attention, not at what it actually was going to take to build the software. And then to our surprise we won the business. After the elation wore off, we were left with an “oh shit” feeling – we needed to actually build the software for that amount. How the heck would we pull that off?

We finished Version 1.0 on my watch, but I don’t know whether we delivered it within budget. It seemed to me, however, that the bid process encouraged underbidding and overspending.

The requirements process

When you make something for the government, they want to know exactly what they’re getting, in excruciating detail. So we started by writing the biggest, thickest requirements document I’ve ever seen. We weren’t building this software from scratch – we bought what was then the leading customer-relationship-management software platform and used its software-development toolkit to heavily customize it for our needs. But we had to write highly detailed specifications anyway.


Requirements gathering was more about navigating choppy political waters and brokering compromise than about specifying usable, stable, and scalable software. To develop the requirements, we flew in representatives from every company that would use the software and put them into a big room to hash out how it would work. But all of these companies were themselves government contractors. Their people all knew each other – and, frequently, competed against each other for contracts. Some of them competed against us trying to win this contract. The room was thick with mistrust and agenda,

The building process

The government lives in constant fear of being screwed by its contractors. It goes back to Abraham Lincoln’s time, when rampant fraud among suppliers threatened the Civil War effort. (Seriously. Gunpowder cut with sawdust. Uniforms that dissolved in the rain. Read about it here.)

So not only did the government hire us to build the software, they hired another firm to watch us do it. This is called independent verification and validation, or IV&V. Their job was to make sure that we followed software-development “best practices” and that we built what we said we were going to build. But making matters worse, the company that won the IV&V contract, I’m told, also had bid on the project to build the software in the first place. It always seemed clear to me that they wanted to show us to be fools so that they could take over the project. They ran us ragged over every last minor detail.

The level of perfectionism in terms of “best practice” adherence was intense. Yet when we delivered the software, it had several usability challenges and outright bugs. Worse, it struggled to keep up with the load users placed on it. If you’ve ever built software, you know that these are typical challenges with Version 1.0 of anything. But the government was shocked, dismayed, and appalled. We spent the next several months issuing update releases to make it perform as it needed to. Of course, IV&V ran roughshod over us the whole way – but they were in the hot seat too because their “best practices” had failed to prevent these problems.

The process overhead

Process is tricky to apply well. Too little leads to chaos, too much adds needless cost and delay. I’m not anti-process – rather, I’ve built a career on bringing just the right level of process into a software development environment to make it effective. But most of the process we had to follow involved documenting our work to prove to the government that we had actually done it. This frequently hindered our ability to deliver software cost-effectively, and sometimes stood in the way of quality.


We bought a well-known software product that stored requirements and linked them to the code and the test cases so we could prove that we built and tested each requirement. This involved tracing every requirement to every line of code and every test case, an enormous task in and of itself. I personally created a traceability report each quarter and sent it to the government. All of this required a lot of work from skilled technical people, but in my judgment did not materially help us better build or test the software.
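For readers who have never seen one, the core of a traceability report can be sketched in a few lines. This is not the commercial product mentioned above: the requirement IDs, test cases, and function names are invented for illustration, and the sketch covers only the requirement-to-test-case links, not the links to code.

```python
# Hypothetical requirement IDs and test cases, invented for illustration.
REQUIREMENTS = {
    "REQ-001": "User can log in",
    "REQ-002": "User can search for a customer record",
    "REQ-003": "User can update a customer's address",
}

TEST_CASES = [
    {"id": "TC-01", "requirements": ["REQ-001"]},
    {"id": "TC-02", "requirements": ["REQ-001", "REQ-002"]},
    {"id": "TC-03", "requirements": []},
]


def untested_requirements(requirements, test_cases):
    """Return requirement IDs that no test case traces back to."""
    traced = {req_id for case in test_cases for req_id in case["requirements"]}
    return sorted(set(requirements) - traced)


def untraced_tests(test_cases):
    """Return IDs of test cases that are not linked to any requirement."""
    return [case["id"] for case in test_cases if not case["requirements"]]


print(untested_requirements(REQUIREMENTS, TEST_CASES))  # ['REQ-003']
print(untraced_tests(TEST_CASES))                       # ['TC-03']
```

Producing the report is the easy part. The expensive part is the human work of keeping every link accurate as requirements, code, and tests change.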

Our test cases were contractually required to be documented in such detail that a trained monkey could execute them. They were at the level of “Step 1. Type your username into the Login box. Expected result: Your username appears in the Login box. Step 2. Type your password into the Password box. Expected result: A row of asterisks appears in the Password box.” A test case that took fifteen minutes to execute could have taken two hours to write and could have been a dozen pages long. We had hundreds of test cases. Many test cases were not appropriate to be added to the regression test suite and be executed every release, so we spent a lot of time writing them to execute them a small handful of times.

It was supposed to be against the rules to write a bug report that had no associated test case. Testers would often stumble upon a bug by accident or find one while doing ad-hoc testing – and then find themselves in a conundrum. Writing the test case that led to the bug and tracing it back to requirements took time we frequently lacked at that point in the game. When the bug was serious enough, everybody looked the other way when it wasn’t associated with a test case. I wonder whether any of the testers avoided writing test cases by falsely associating the bug with an existing test case.

We did get one big break. We lobbied for, and to our astonishment successfully won, an exception to a standard practice: we did not have to print screen shots of the results of every test step. Other projects for which we had contracts had to do this. As you can imagine, managing all that paper slowed progress considerably. Those projects collected those screen shots into boxes, which were sent to offsite storage.

The mounting costs

All of these process steps meant spending more money, mostly in the form of human effort. There were other ways in which the government’s way of making software added costs to the project. Here’s a short, incomplete list:

  • Frequent, ongoing training about compliance with standards, which, amusingly, is where I learned about the Civil War fraud.
  • Entering time worked on three separate software systems – one for the project-management tool, one for government accounting, and one my employer used to manage time off. I spent an hour a week entering time.
  • A prohibition on open-source software. The government wanted all software used to be “supported,” meaning that there had to be a phone number to call for help. So we spent money on commercial tools that sometimes weren’t as capable as open-source versions. In a couple cases, the only tool or component available for a task was open source, and we couldn’t build the application without it. We did get the government to bend the rule for us in those cases, but it took heavily documented justifications and layers of approvals to make it happen.
  • Strict separation of duties to protect the government against a rogue contract employee sabotaging the system. This meant, for example, that I couldn’t restart the computers we used for testing when they needed it. I knew how to do it, but I was not allowed. I had to write a request for an infrastructure engineer to do it, and then wait sometimes for days for it to reach the top of his priority list.

As you can see, there was nothing easy or inexpensive about this project. Yet we got it done and the software worked. It’s still in use today. We showed that it’s possible – just slow and expensive – to build software the government’s way.

So I have great empathy for those who built healthcare.gov. No doubt about it: the site failed, and they built it. But they must feel tremendous pressure right now as they scramble both to handle the heat they’re getting from the government and to rush fixes to the site so that it works well enough. But if their experience building that site was anything like my experience building government software, it’s hardly shocking that it launched with challenges.

Quality Testing

Giving testers less to do

By Jim Grey (about)

I hear legends of companies who hire nothing but programmers in their test departments, and rely almost entirely on code-based tests in their software development methodologies. I’ve never seen such a shop in person. Out here in the Midwest, most testing involves humans directly exercising user interfaces.

Eric Jacobson recently wondered aloud on his blog whether testers are simply too busy finding bugs through the UI to move into more technical or programmatic testing. My typical experience has been that Development doesn’t deliver software to QA that is solid enough to free testers from spending the bulk of their time making sure the UI and the immediate interface with the database are working.

For years my schtick was to join a company when it was transitioning from small to mid-sized, when it was feeling crushed by quality problems caused by mounting technical and defect debt. They had focused on getting to market fast but had grown a tangled mess of code. My response was always to grow the QA team, primarily by hiring automation engineers to build out large automated regression suites.

After doing that at a couple companies, I lost interest in the strategy. It was expensive, it took too much time, and it never moved the quality needle enough. It just seems absurd to me now to prop up years of thin development practices with more post-development testing, especially given that automated tests in QA generally work through the UI and are therefore brittle and slow.

The view from my deck after a particularly heavy rain. You’d better believe the sump pump was running.

It’s like when my home’s crawl space used to flood after each heavy rain. A company that specialized in drying out crawl spaces recommended $6,000 in a French drain, multiple sump pumps, and encapsulation to move the water out and keep the moisture from seeping up into the house. But a buddy of mine who builds houses said, “You’ve got a negative grade around your foundation. Buy $300 in topsoil and a couple cases of beer. Invite all your friends over and issue them shovels. Fix the grading and you’ll keep the water from getting in.” I went with the topsoil and the friends. I also put in one sump pump, just in case. It runs pretty much only when the rain is torrential.

In case my admittedly imperfect metaphor isn’t obvious, the graded topsoil is the unit testing, and the sump pump is the lightweight QA automation solution. Let’s try preventing the bugs from getting in as much as we can, shall we? But let’s still check for the odd and extreme cases that are bound to get by.

This is a hierarchy of testing similar to the one Mike Kelly recommends in this blog post. (He’s building on the work of Brian Marick, by the way, who gives a framework for testing in this blog post.) Mike recommends building on the order of thousands of automated unit and component tests, hundreds of automated business-logic tests, and tens of UI-level automated tests. The unit tests run fast and frequently. The inherently slow UI automated tests run far less often.

That’s what I resolved to try after I lost my will to build huge automated regression suites in QA. I deliberately took a QA leadership role with a company transitioning out of its startup phase. The product wasn’t yet too large and too saddled with technical and defect debt; I felt like we could make up the lost ground. After some encouragement from me, the fellow who ran engineering began insisting that developers write automated unit tests for all new code. We started building each release into an environment where developers could perform rudimentary testing on it themselves with realistic data. Their goal was to make sure all the happy paths worked, so that when my testers got in there they were not immediately stymied by obvious critical bugs.

Before we started to make this transition, I quietly started tracking a simple little metric. I counted the defects QA found, plus the defects created in the release that were found in production, and divided by the number of development hours in the release. The metric was a little mushy because I was working with estimated and not actual hours, and because I was having to make judgment calls about which defects in production were caused by the release and which were latent bugs. But it’s hard to ignore the three- to sixfold improvement we got on this metric. We were tracking to about 0.3 defects per development hour in the couple releases before we made these changes, and within two releases we dropped to about 0.05 to 0.09 defects per development hour and held steady.
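
For anyone who wants to track the same thing, here is a minimal sketch of the metric in Python. The function name and the sample counts are made up for illustration; only the resulting ratios (about 0.3 before, and something in the 0.05 to 0.09 range after) echo the releases described above.

```python
def defects_per_dev_hour(qa_defects, production_defects, dev_hours):
    """Defects attributed to a release, divided by its development hours.

    qa_defects: defects QA found in the release
    production_defects: defects found in production that the release introduced
    dev_hours: development hours in the release (estimates are fine)
    """
    if dev_hours <= 0:
        raise ValueError("dev_hours must be positive")
    return (qa_defects + production_defects) / dev_hours


# Hypothetical releases, each with 1,000 estimated development hours.
before = defects_per_dev_hour(qa_defects=270, production_defects=30, dev_hours=1000)
after = defects_per_dev_hour(qa_defects=60, production_defects=10, dev_hours=1000)

print(f"before: {before:.2f} defects per development hour")
print(f"after:  {after:.2f} defects per development hour")
print(f"improvement: {before / after:.1f}x")
```

The judgment calls still happen outside the code: somebody has to decide which production defects the release actually caused before they are counted.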

This had incredible impact. Initial quality went way up in QA, meaning initial quality went way up in production. Just adding these two steps was like flipping a switch not only on many of the quality challenges we faced, but also on the amount of chaos and churn we experienced as an overall engineering team.

A side benefit was that developers seemed happier. It wasn’t that writing tests made them happy – it didn’t. They would rather have built more new stuff. But delivering better code into QA meant that they spent less time in the fix-test cycle and were interrupted by way fewer production crises. They told me that finally they could focus.

I titled this post as I did – and it’s meant to be tongue in cheek, by the way – because my strategy means hiring more developers and fewer testers. But the benefit to testers is that they get to do far more interesting work, going deeper, thinking more creatively, and exploring more technical kinds of testing.

I can’t imagine ever moving to an all-code testing strategy. All automated testing can do is repeat series of actions. Skilled human testers can cope with complexity and adapt to change, gain and synthesize knowledge and apply it to their testing, know from experience where the product is likely to be broken, and explore the system creatively. The kinds of products I’ve always delivered and am likely to keep delivering will always need that.

Process Quality The Business of Software

“Software engineering” might be an oxymoron

By Jim Grey (about)

I’ve said it to my test teams many times: Making software isn’t quite engineering. Building a bridge – now that’s engineering. You determine how long the bridge needs to be, how much load it needs to carry, and what kind of bridge to build (steel truss, concrete arch, etc.), and from there it’s mostly mathematics and physics. Just run the calculations and you’re good.

We have bridge-building down. With a couple of notable exceptions, such as the Tacoma Narrows Bridge, which heaved and twisted and finally collapsed, new bridges seldom fail. Old bridges fail sometimes, but it’s reliably due to accident or neglect.

The S Bridge at Blaine

My apologies to any civil engineers who stumble upon this post. I’m sure you’re cringing that I’m overlooking many subtleties of your discipline.

There’s nothing subtle, however, about how often software fails. Our users aren’t happy about it, but they aren’t surprised by it, either.

For anything you ask a software developer to build, there will be a whole bunch of valid ways to do it, each with its own unique ways of creating failures. This is especially true when that developer enhances existing software that he or she didn’t make in the first place. It’s tough to predict exactly how the enhancements will affect the rest of the software. The more lines of legacy code, the more time and analysis it takes to think that through.

If a developer had unlimited time and money, it might be possible to deliver perfect software. Ah, a developer can dream! But here’s where bridge-building and making software have an important thing in common: time and money are never unlimited.

I sympathize with the folks who call software a craft. People who make software use tools and knowledge in its design and construction. These are hallmarks of craft.

Another way that software is like craft is that it’s difficult to fully separate the design from the making. Even when one person designs the software and another writes the code, the coder has to make a bunch of lower-level design decisions along the way.

The software craftsmanship movement meets corporate resistance because revenue and profit ride on what we build. Our companies need to sell features to meet revenue projections, or deliver bug fixes to retain customers. That’s why timed delivery is so important: if you wait too long to deliver, the opportunity to grow or retain revenue begins to shrink.

Feeling pressure to deliver, yet knowing that if we deliver junk we’ll be in an even worse pickle, we tend to manage software-development projects like engineering projects. I think we feel like we have better control when we manage them that way. But that feeling of control can’t mask it: no matter how tightly you plan a software project, no matter how you shape your development and delivery processes to mitigate risk, no matter how much you try to predict the troubles you’ll encounter, you will discover things along the way that can seriously derail those plans. It happens in two-week scrum sprints just as it does in ten-month waterfall projects. Discovery is simply endemic to software development.

As a software project manager, I try to build in buffers for the unknown. I also steer projects daily based on what we discover, adjusting plans and communicating impacts to whoever needs to know. I try to make sure our development practices deliver the best possible code to test, and then I try to arrange testing to find the worst bugs first so that near the hoped-for end, only minor bugs remain. Despite all that, important bugs still sometimes reach the user.

We ship when the software is good enough. What “good enough” means varies from context to context, but it is unfailingly short of perfect. Shipping at good enough means you succeeded.

If I delivered bridges that way, I’d never drive over one I built.

Managing People Quality

Four critical tips for getting your ideas implemented at work

By Jim Grey (about)

After eight years writing and editing software documentation, I itched to make software again, like I did in college. So I took a job with a software company as a tester.

My corporate mug shot from those days – the Grizzly Adams years

The company made a sprawling product for an industry I knew nothing about, so I had lots to learn. Given my background, the first thing I did was reach for the manuals. They were incomplete, inaccurate, and poorly organized. There was online help, but it was unnavigable. Nobody was ever going to use the documentation to successfully learn the product. My boss managed the technical writers too, so I marched into his office to complain. I wasn’t delicate about it. “This stuff is terrible! I can’t believe you ship this to customers! It’s an embarrassment.”

He leaned back in his chair and calmly said, “What would you do to fix it?”

“I would throw it out and start over,” I began. And then over the next ten minutes, off the top of my head I outlined a project that would create new manuals and online help that would actually help users not just use the product, but get the best value from it.

Three days later, he called me back into his office. “Remember that thing you said you’d do with the documentation? You are now manager of the Documentation Department. Make it so.”

It was a bold move for him to take a gamble on me. I’d never managed people, and my project management experience was limited. What I didn’t know was that every year the company surveyed its users about product quality – and every year the documentation got the most complaints. My boss had been told to fix this problem, but had no idea how. Then I walked in with a solution that sounded like it just might work.

Most of this story is just the nuts and bolts of the project – hiring and coaching staff, creating plans and schedules, doing visual and information design for the new manuals and online help, managing the project, reporting to management, and even doing some of the writing myself. The details would be interesting only to another technical writer. Much of this was new to me, but I had excellent support from a boss who needed to see his gamble pay off. He also helped me navigate the inevitable office politics, including another manager who kept trying to torpedo my efforts. Also, the program manager helped me master the project management tools we used, none of which I had ever even seen before.

My team and I worked on the project for a year and a half. It’s not often a technical writing team gets an opportunity to do a clean-sheet rewrite like this, and they were all enthusiastic about it. I worked hard to clear their roadblocks, respond quickly to their concerns, and generally be a good guy to work for, and it paid off in the excellent work they delivered. When we were done, we had written over 3,000 pages and had created a seven-megabyte context-sensitive online help system.

I was invited to demonstrate the new online help at the annual user conference. 600 people flew in from all over the United States, and there I was before them on the opening session’s main stage. My presentation was the last of a series about new features in the product. When I finished, to my astonishment the online help received enthusiastic applause – and then one person stood, and a few more, and several more, and soon the whole room was standing and applauding. That moment remains the pinnacle of my career; I can’t imagine anything else ever overtaking it. The icing on the cake was when I overheard the VP of Sales say to my boss, “All the blankety-blank new features we pushed you to put into the product, and everybody liked the blankety-blank online help the best! The online help! You’ve got to be blankety-blank kidding me!”

I used to think I was just a grunt paid to trade the words I wrote for a paycheck. Through this project I learned just how interdependent everyone is at a company, and how everybody is important. Specifically, I learned:

If you want to see your great ideas implemented, they need to solve a big problem the company thinks it has. The problems your company thinks it has may very well be different from the problems your company actually has. Frame your ideas in terms of solving the problems the company thinks it has.

When you’re doing something you’ve never done before, find people who can coach you through it. I don’t care how far down the ladder you are at your company, your success helps determine other people’s. Look for someone who both knows how to do the thing you need to learn and whose success depends in part on yours – that last bit motivates them to help you. In my case, it was my boss and the program manager.

Work for people who clear roadblocks out of your way so you can be most effective. I now leave situations where the boss doesn’t help me in this way. It’s that critical.

Your success always depends on other people, so treat them well. Because I gave my team an exciting assignment and created an environment in which they could focus, they happily turned out huge quantities of good work. Also, after we shipped the new documentation, I promoted every writer. They deserved it.

A footnote: That company went through tough times a few years later and so we all moved on, some for better positions and others (like me) because the company couldn’t afford to pay us anymore. One of the writers who had worked for me called me one day seven years later, by which time I really had moved into software testing. She said, “We have an opening here for a test manager. I’d love to work with you again, and this is a good place to work. You really should apply.” I did, and I got the job. I found out later that just before my interview, she went to the VP and said, “He’s a great boss. You don’t want to let him get away.”

Sometimes the good things you do come back to you!

Managing People Quality

Multitasking hurts productivity

By Jim Grey (about)

“You multitask like a madman,” my boss said to me.

She meant it as a compliment, but it brought me down. I was exhausted, teetering on the edge of burnout precisely because I had been multitasking an enormous workload.

Not a skill to be praised

I managed 15 people across four teams: testers that delivered monthly bug-fix releases, test automation developers, performance testers, and technical writers. My teams were solid and I had great leads in place, which freed me to work with a security-testing vendor to start doing regular penetration tests of our product, and with a translation company to translate our product user interface into five languages each release. I had a lot going on, but I was handling it.

But then the company decided to lean hard into more international markets. The executive team asked me to gather quotes to translate our product UI into even more languages, and also to translate our giant online help system, which we had only ever offered in English. The costs were an order of magnitude more than the executive team imagined, and so I was called into endless meetings and hallway discussions to provide more data as the executives squabbled with each other over strategy. This all sucked down more than a third of my time – but I could never focus on this work for more than ten or twenty minutes because I was still managing four teams with leads who had questions and needed me to remove roadblocks.

My performance began to suffer. While I had my eye on one ball, another would drop. I started making silly mistakes. It all wore me down to a nub. To keep sane, I ended up not asking, but telling my boss to take things off my plate so I could survive. I really wanted the translation stuff to go, as I didn’t enjoy it very much. But instead she gave the technical writing and bug fix teams to other managers.

I really mean task switching, not multitasking

Everybody calls what I was doing multitasking, but it really wasn’t. Real multitasking is when we do more than one thing at a time, such as driving and talking, or walking and chewing gum. But when two things come along that require focused attention, most of us can’t do them simultaneously. We work on one task, and then we work on the next. That’s really called task switching, and it happens every time we stop testing a feature release to test an emergency hotfix, or even get interrupted to answer a question. I’ve just called it multitasking here so far so that Google’s sweet, sweet searches can find this post.

Task switching makes tasks take longer overall. It also hinders learning – all that switching from one task to another keeps things from sticking in our brains. It really is better to work on, and finish, one thing at a time. We are so much more productive that way.

The hidden costs of task switching

Say you’re working on task A when task B arrives. If you want to avoid task switching, you finish task A and then work on task B.


But say task B is hot, and your boss needs you to work on it right now. Task B goes out sooner, at the cost of delaying task A.


But there’s a hidden cost: unless a task is automatic or menial, it takes time to get oriented to it, even when you’re returning to it after only a brief interruption. You must at least try to remember where you left off. That orientation time delays overall completion.


This cost mounts the more you switch tasks. If you switch repeatedly among tasks A, B, and C, not only do all tasks finish later, but tasks A and B finish much later.
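
To put rough numbers on it, here is a small Python sketch of the scenarios above. Everything in it is an assumption made for illustration: the function name, the task sizes (four units of work each), and the half-unit reorientation cost charged each time you return to a task you had already started.

```python
def finish_times(schedule, reorient_cost=0.0):
    """Return the time at which each task finishes.

    schedule: list of (task_name, work_units) slices, in the order worked.
    reorient_cost: time lost whenever you resume a task you had set aside.
    """
    clock = 0.0
    started = set()
    previous = None
    finished = {}
    for task, units in schedule:
        # Coming back to a task after working on something else costs time.
        if task in started and task != previous:
            clock += reorient_cost
        clock += units
        started.add(task)
        previous = task
        finished[task] = clock  # the last slice of a task sets its finish time
    return finished


# One task at a time: A, then B, then C.
sequential = [("A", 4), ("B", 4), ("C", 4)]

# Task A interrupted halfway through by a hot task B.
interrupted = [("A", 2), ("B", 4), ("A", 2)]

# Switching among A, B, and C after every unit of work.
round_robin = [(task, 1) for _ in range(4) for task in "ABC"]

print(finish_times(sequential))
# {'A': 4.0, 'B': 8.0, 'C': 12.0}
print(finish_times(interrupted, reorient_cost=0.5))
# {'A': 8.5, 'B': 6.0}
print(finish_times(round_robin, reorient_cost=0.5))
# {'A': 13.5, 'B': 15.0, 'C': 16.5}
```

With these made-up numbers, constant switching pushes the last task from 12 to 16.5, while task A, which could have been done at 4, doesn’t finish until 13.5.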

Task switching hinders learning

Dairy Queen photo

One of my first jobs was working the counter at a Dairy Queen. It took me a couple weeks to learn the technique for creating their soft serve’s signature shape, but then I could do it without even thinking about it. It had become a habit.

Making software isn’t the same as making ice-cream cones. Being effective and productive is much more about deepening skills and knowledge than about building habits.

Single-tasking helps deepen skills and knowledge because it stimulates the hippocampus, which is the part of the brain that puts information in long-term memory. Task switching hurts this because it stimulates the basal ganglia, which is the part of the brain that is good at building habits.

In software development, you simply get better at what you do faster when you single-task.

What to do then?

It’s impossible to entirely eliminate task switching – emergencies will arise, questions will need to be answered. And I do believe in the power of collaboration to deliver better software. But it’s a good investment to minimize task switching as much as you can.

If you’re in management, make single-tasking a value. Demonstrate it by removing obstacles so people can focus on one thing for long periods:

  • Can you have “do not disturb” periods or let people work from home when they need to complete a critical task?
  • Can you give people private offices?
  • Can you schedule meetings that involve your team members so they don’t interrupt work as much, such as first thing in the morning, just before lunch, or at the end of the day?
  • Can you organize your teams so that you have people dedicated to handling customer emergencies (in a prioritized way, so they can focus on one at a time) and people dedicated to building new product?

If you’re not in management, you can still do a lot to clear your decks so you can concentrate:

  • Adjust your work schedule so that you arrive earlier or stay later than most others. (My normal work hours have been 7:30 to 4:30 for more than 20 years. The first 60-90 minutes of the day are my most productive because few others are in the office.)
  • Stop responding to e-mail as it arrives; instead, set aside specific times when you read and respond to it.
  • Work out a do-not-disturb convention with your team. In one group I worked with, we placed a little blue flag on our desk when we needed to go heads down.
  • Ask if you can work from home when you really need to concentrate.
  • At least put on your headphones, which many people interpret as a sign that you don’t want to be disturbed.

And of course, when your boss pulls you in too many directions, be sure to ask him or her to help you prioritize your work so you can focus on one thing at a time, or to assign work across your team in ways that balance the load. Multitasking alone doesn’t usually lead to burnout, but it absolutely brings you there faster when you have too much on your plate.


The difference between quality and excellence

By Jim Grey (about)

My long career in software development was briefly interrupted in the mid 1990s when I took a job editing technology books. My first project was editing a new edition of one of the publisher’s biggest sellers. I drew this plum assignment not for my l33t editorial skills, but for being the new guy. The author had a reputation for running his editors ragged, and the other editors were glad to scrape this book onto me.

I never understood why, because editing the author’s work was a pleasure. His writing was clear, engaging, and funny. When I made suggestions for improvement, he gladly took most of them. He even called me to discuss and improve on a few of them. He did require a lot of attention, all of it for the good of his book, as he sweated every detail. For example, I spent hours on the phone with him poring over proofs, which are draft printouts of the book after it’s been laid out. It’s the last stage before the book is printed, and he used this time to polish his work further. He sometimes rewrote entire paragraphs to make them funnier (as humor was his book’s hallmark) or reworked graphics to make them clearer, all of which never ceased to thrill the overworked layout department.

When we were done, we had a book to be proud of. I displayed my copy prominently on my bookshelf. It then sold a bazillion copies.

My next assignment was to edit a thick book about a communications technology that was still popular then. This author handed in cumbersome and clumsy text full of basic writing errors. His humor was lame and sometimes offensive. His technical explanations were incorrect and incomplete. I spent hours hammering his work into something marginally usable. He ignored most of my suggestions and avoided taking my calls.

After he had handed in 100 of the book’s 800 pages, he announced that he was done writing. I was incredulous as he explained that the remaining 700 pages would be reprinted (and poorly written) documentation from shareware related to this technology. What laziness! What gall! I accosted the acquisitions editor – that’s the guy who hired this author – and raised an unholy ruckus. I said, “This book will be useful to nobody!” He shrugged. “It’s his book. Is it on schedule?”

I spent the next several weeks with my stomach knotted from anger and disgust as I edited those 700 pages. I pinched my nostrils shut as I sent the chapters to layout. I suppressed my gag reflex as I reviewed the proofs. I rolled my eyes when my copy of the finished book arrived. I hid it in a dusty and forgotten corner of my bookshelf. Then I succeeded for several weeks at forgetting the whole sordid ordeal until I received a letter from somebody who actually bought the book. He wrote something that knocked me out of my chair:

“Dear Sir. I was trying to figure out this communications technology when I found your book. I wanted to tell you that it was exactly what I needed. I played with a couple of the programs the book described and, with the book’s help, got one of them running. Thank you for publishing this book. Sincerely, Some Reader.”

I was humbled. No, I was shamed. Mr. High-and-Mighty Editor thought that the author created a steaming pile of feces while giggling at the teller’s window as he cashed his advance check. Yet somebody found the book to be exactly what he needed.

I started to see that maybe I wasn’t the final arbiter of quality, that maybe quality is what meets the customer’s needs. I’ve carried this critical lesson into every job I’ve had since.

But now, many years hence, I have learned another lesson from these two books.


That first book was Macs For Dummies, Third Edition, by David Pogue, a keystone of the juggernaut Dummies franchise. More recently, you might have seen David’s technology column in the New York Times, or his acclaimed The Missing Manual series of books, or maybe the stories he does for CNBC and for CBS News Sunday Morning, or the four-part series he did for NOVA on PBS. David has done very well for himself since his Dummies days. He has worked very hard for it, leveraging every opportunity with his characteristic energy, wit, and grace. He could have gone a long way on those traits alone. But his ability to do top-flight work truly distinguishes him.

I haven’t been very kind to the other author here so I won’t reveal his name or the title of his book, which sold poorly despite the one fan letter. I’ve encountered him here and there over the years and he has always seemed very happy. But he has not achieved a hundredth of what David Pogue has.

The new lesson? Something modest may meet a customer’s need. But it sure is satisfying – and the hard work sure worth it – when you can really delight the customer. And David Pogue’s case shows that talent and hard work can still really pay off.