Categories: Quality, The Business of Software

Obamacare, healthcare.gov, and how government software gets made

By Jim Grey (about)

I was not surprised when I heard that the Obamacare Web site, healthcare.gov, crashed and burned right out of the gate.

But I was disappointed. Regardless of what I think of the Affordable Care Act, it’s the law. I wanted its implementation, including healthcare.gov, to go well.

Still, I wasn’t surprised because I know how government software gets made.

Several years ago I worked in middle management for a company that built a government Web application related to health-care customer service. I was in charge of testing it to make sure it worked. It is probably not going out on a limb to say that the people who built healthcare.gov experienced many of the same kinds of things I experienced on that project.

Let me be plain up front: I was a poor fit for government software development. I was too free-wheeling and entrepreneurial for the control-and-compliance environment that government contracting encourages. I find it difficult to write about the experience without showing my frustrations with its realities. But I think I understand those realities well and objectively.

The government doesn’t know how to do anything. They hire it all out, and then they manage and administer the process. As a result, on this project they relied heavily on compliance with “best practices,” as if those practices contained some sort of magic that delivered quality software. They don’t, of course; the government was shocked when Version 1.0 of our software had typical quality problems right out of the gate. Those practices served primarily to leave an audit trail the government could follow.

In the end, the project was a success. Despite Version 1.0’s glitches, which we quickly fixed, the software was immediately put to use and led to productivity improvements over an older, green-screen system. I spoke with many of the software’s users, and despite a few grumbles most of them liked using it.

But this was one mighty expensive piece of software to build, from winning the contract to defining what the software should do to building and maintaining the software. Here’s why.

The bid process

I was hired after we won the contract, but I heard stories about the bid process. We had no experience building software on this scale, but we wanted into the lucrative cost-plus government contracting business for its guaranteed profit margins. So we offered a lowball bid aimed at getting the government’s attention, not at what it actually was going to take to build the software. And then to our surprise we won the business. After the elation wore off, we were left with an “oh shit” feeling – we needed to actually build the software for that amount. How the heck would we pull that off?

We finished Version 1.0 on my watch, but I don’t know whether we delivered it within budget. It seemed to me, however, that the bid process encouraged underbidding and overspending.

The requirements process

When you make something for the government, they want to know exactly what they’re getting, in excruciating detail. So we started by writing the biggest, thickest requirements document I’ve ever seen. We weren’t building this software from scratch – we bought what was then the leading customer-relationship-management software platform and used its software-development toolkit to heavily customize it for our needs. But we had to write highly detailed specifications anyway.

Requirements gathering was more about navigating choppy political waters and brokering compromise than about specifying usable, stable, and scalable software. To develop the requirements, we flew in representatives from every company that would use the software and put them into a big room to hash out how it would work. But all of these companies were themselves government contractors. Their people all knew each other – and, frequently, competed against each other for contracts. Some of them had competed against us trying to win this contract. The room was thick with mistrust and agendas.

The building process

The government lives in constant fear of being screwed by its contractors. It goes back to Abraham Lincoln’s time, when rampant fraud among suppliers threatened the Civil War effort. (Seriously. Gunpowder cut with sawdust. Uniforms that dissolved in the rain. Read about it here.)

So not only did the government hire us to build the software, they hired another firm to watch us do it. This is called independent verification and validation, or IV&V. Their job was to make sure that we followed software-development “best practices” and that we built what we said we were going to build. But making matters worse, the company that won the IV&V contract, I’m told, also had bid on the project to build the software in the first place. It always seemed clear to me that they wanted to show us to be fools so that they could take over the project. They ran us ragged over every last minor detail.

The level of perfectionism in terms of “best practice” adherence was intense. Yet when we delivered the software, it had several usability challenges and outright bugs. Worse, it struggled to keep up with the load users placed on it. If you’ve ever built software, you know that these are typical challenges with Version 1.0 of anything. But the government was shocked, dismayed, and appalled. We spent the next several months issuing update releases to make it perform as it needed to. Of course, IV&V ran roughshod over us the whole way – but they were in the hot seat too because their “best practices” had failed to prevent these problems.

The process overhead

Process is tricky to apply well. Too little leads to chaos, too much adds needless cost and delay. I’m not anti-process – rather, I’ve built a career on bringing just the right level of process into a software development environment to make it effective. But most of the process we had to follow involved documenting our work to prove to the government that we had actually done it. This frequently hindered our ability to deliver software cost-effectively, and sometimes stood in the way of quality.

We bought a well-known software product that stored requirements and linked them to the code and the test cases so we could prove that we built and tested each requirement. This involved tracing every requirement to every line of code and every test case, an enormous task in and of itself. I personally created a traceability report each quarter and sent it to the government. All of this required a lot of work from skilled technical people, but in my judgment did not materially help us better build or test the software.
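
To give a feel for what all that tracing amounted to, here’s a toy sketch in Python. The requirement and test-case IDs are invented, and the real tool tracked thousands of items and linked code modules as well; the quarterly report boiled down to something like this:

    # Invented requirement-to-test-case mappings, far smaller than the real thing.
    REQUIREMENTS = {
        "REQ-101": "Agent can search for a member by ID",
        "REQ-102": "Agent can update a member's mailing address",
        "REQ-103": "System logs every change to a member record",
    }

    TEST_CASES = {
        "TC-0441": ["REQ-101"],
        "TC-0442": ["REQ-101", "REQ-102"],
        "TC-0517": ["REQ-102"],
    }

    def traceability_report():
        """Print each requirement and whether any test case traces to it."""
        covered = {req for reqs in TEST_CASES.values() for req in reqs}
        for req_id, text in REQUIREMENTS.items():
            status = "covered" if req_id in covered else "NOT COVERED"
            print(f"{req_id}  {status:11}  {text}")

    traceability_report()
    # REQ-103 prints as NOT COVERED -- exactly the kind of gap the report existed to surface.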

Our test cases were contractually required to be documented in such detail that a trained monkey could execute them. They were at the level of “Step 1. Type your username into the Login box. Expected result: Your username appears in the Login box. Step 2. Type your password into the Password box. Expected result: A row of asterisks appears in the Password box.” A test case that took fifteen minutes to execute could take two hours to write and run to a dozen pages. We had hundreds of test cases. Many of them weren’t appropriate to add to the regression test suite and execute every release, so we spent a lot of time writing test cases that we would execute only a small handful of times.

It was supposed to be against the rules to write a bug report that had no associated test case. Testers would often stumble upon a bug by accident or find one while doing ad-hoc testing – and then find themselves in a conundrum. Writing the test case that led to the bug and tracing it back to requirements took time we frequently lacked at that point in the game. When the bug was serious enough, everybody looked the other way when it wasn’t associated with a test case. I wonder whether any of the testers avoided writing test cases by falsely associating the bug with an existing test case.

We did get one big break. We lobbied for, and to our astonishment successfully won, an exception to a standard practice: we did not have to print screen shots of the results of every test step. Other projects for which we had contracts had to do this. As you can imagine, managing all that paper slowed progress considerably. Those projects collected those screen shots into boxes, which were sent to offsite storage.

The mounting costs

All of these process steps meant spending more money, mostly in the form of human effort. There were other ways in which the government’s way of making software added costs to the project. Here’s a short, incomplete list:

  • Frequent, ongoing training about compliance with standards, which, amusingly, is where I learned about the Civil War fraud.
  • Entering time worked in three separate software systems – the project-management tool, the government accounting system, and the system my employer used to manage time off. I spent an hour a week just entering time.
  • A prohibition on open-source software. The government wanted all software used to be “supported,” meaning that there had to be a phone number to call for help. So we spent money on commercial tools that sometimes weren’t as capable as open-source versions. In a couple cases, the only tool or component available for a task was open source, and we couldn’t build the application without it. We did get the government to bend the rule for us in those cases, but it took heavily documented justifications and layers of approvals to make it happen.
  • Strict separation of duties to keep a rogue contract employee from sabotaging the system. This meant, for example, that I couldn’t restart the computers we used for testing when they needed it. I knew how to do it, but I wasn’t allowed to. I had to write a request for an infrastructure engineer to do it, and then sometimes wait days for it to reach the top of his priority list.

As you can see, there was nothing easy or inexpensive about this project. Yet we got it done and the software worked. It’s still in use today. We showed that it’s possible – just slow and expensive – to build software the government’s way.

So I have great empathy for those who built healthcare.gov. No doubt about it: the site failed, and they built it. They must feel tremendous pressure right now as they scramble both to handle the heat they’re getting from the government and to rush fixes so that the site works well enough. But if their experience building that site was anything like my experience building government software, it’s hardly shocking that it launched with challenges.

Categories: Quality, Testing

Giving testers less to do

By Jim Grey (about)

I hear legends of companies who hire nothing but programmers in their test departments, and rely almost entirely on code-based tests in their software development methodologies. I’ve never seen such a shop in person. Out here in the Midwest, most testing involves humans directly exercising user interfaces.

Eric Jacobson recently wondered aloud on his blog whether testers are simply too busy finding bugs through the UI to move into more technical or programmatic testing. My typical experience has been that Development doesn’t deliver software to QA solid enough for testers to avoid spending the bulk of their time making sure the UI and the immediate interface with the database are working.

For years my schtick was to join a company as it transitioned from small to mid-sized, when it was feeling crushed by quality problems caused by mounting technical and defect debt. These companies had focused on getting to market fast but had grown a tangled mess of code. My response was always to grow the QA team, primarily by hiring automation engineers to build out large automated regression suites.

After doing that at a couple companies, I lost interest in the strategy. It was expensive, it took too much time, and it never moved the quality needle enough. It just seems absurd to me now to prop up years of thin development practices with more post-development testing, especially given that automated tests in QA generally work through the UI and are therefore brittle and slow.

The view from my deck after a particularly heavy rain. You’d better believe the sump pump was running.

It’s like when my home’s crawl space used to flood after each heavy rain. A company that specialized in drying out crawl spaces recommended a $6,000 fix: a French drain, multiple sump pumps, and encapsulation to move the water out and keep the moisture from seeping up into the house. But a buddy of mine who builds houses said, “You’ve got a negative grade around your foundation. Buy $300 in topsoil and a couple cases of beer. Invite all your friends over and issue them shovels. Fix the grading and you’ll keep the water from getting in.” I went with the topsoil and the friends. I also put in one sump pump, just in case. It runs pretty much only when the rain is torrential.

In case my admittedly imperfect metaphor isn’t obvious, the graded topsoil is the unit testing, and the sump pump is the lightweight QA automation solution. Let’s try preventing the bugs from getting in as much as we can, shall we? But let’s still check for the odd and extreme cases that are bound to get by.

This is a testing hierarchy similar to the one Mike Kelly recommends in this blog post. (He’s building on the work of Brian Marick, by the way, who gives a framework for testing in this blog post.) Mike recommends building on the order of thousands of automated unit and component tests, hundreds of automated business-logic tests, and tens of UI-level automated tests. The unit tests run fast and frequently. The inherently slow UI automated tests run far less often.
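
To make the bottom of that pyramid concrete, here’s a minimal sketch of the kind of check that belongs there. The business rule and function are invented for illustration; the point is that tests like these run in milliseconds, so you can have thousands of them and run them on every commit.

    # A hypothetical business rule and two pytest-style unit tests that pin it down.
    def late_fee(days_overdue: int, balance: float) -> float:
        """Charge 1.5% of the balance, but only after a 10-day grace period."""
        if days_overdue <= 10:
            return 0.0
        return round(balance * 0.015, 2)

    def test_no_fee_inside_grace_period():
        assert late_fee(days_overdue=10, balance=200.00) == 0.0

    def test_fee_after_grace_period():
        assert late_fee(days_overdue=11, balance=200.00) == 3.00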

That’s what I resolved to try after I lost my will to build huge automated regression suites in QA. I deliberately took a QA leadership role with a company transitioning out of its startup phase. The product wasn’t yet too large and too saddled with technical and defect debt; I felt like we could make up the lost ground. After some encouragement from me, the fellow who ran engineering began insisting that developers write automated unit tests for all new code. We started building each release into an environment where developers could perform rudimentary testing on it themselves with realistic data. Their goal was to make sure all the happy paths worked, so that when my testers got in there they weren’t immediately stymied by obvious critical bugs.

Before we started to make this transition, I quietly started tracking a simple little metric. I counted the defects QA found, plus the defects created in the release that were found in production, and divided by the number of development hours in the release. The metric was a little mushy because I was working with estimated and not actual hours, and because I was having to make judgment calls about which defects in production were caused by the release and which were latent bugs. But it’s hard to ignore the order-of-magnitude improvement we got on this metric. We were tracking at about 0.3 defects per development hour in the couple of releases before we made these changes, and within two releases we dropped to about 0.05–0.09 defects per development hour and held steady.
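
The arithmetic behind the metric was nothing fancy. Here it is as a sketch with invented numbers, not the actual figures from those releases:

    # Hypothetical release data; the real inputs were estimated hours and judgment calls.
    defects_found_in_qa = 110
    release_defects_found_in_production = 25
    estimated_development_hours = 450

    total_release_defects = defects_found_in_qa + release_defects_found_in_production
    defects_per_dev_hour = total_release_defects / estimated_development_hours
    print(round(defects_per_dev_hour, 2))  # 0.3 -- roughly where we started before the changes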

This had incredible impact. Initial quality went way up in QA, meaning initial quality went way up in production. Just adding these two steps was like flipping a switch not only on many of the quality challenges we faced, but also on the amount of chaos and churn we experienced as an overall engineering team.

A side benefit was that developers seemed happier. It wasn’t that writing tests made them happy – it didn’t. They would rather have built more new stuff. But delivering better code into QA meant that they spent less time in the fix-test cycle and were interrupted by way fewer production crises. They told me that finally they could focus.

The reason I titled this post as I did – and it’s meant to be tongue in cheek, by the way – is that my strategy means hiring more developers and fewer testers. But the benefit to testers is that they get to do far more interesting work: going deeper, thinking more creatively, and exploring more technical kinds of testing.

I can’t imagine ever moving to an all-code testing strategy. All automated testing can do is repeat series of actions. Skilled human testers can cope with complexity and adapt to change, gain and synthesize knowledge and apply it to their testing, know from experience where the product is likely to be broken, and explore the system creatively. The kinds of products I’ve always delivered and am likely to keep delivering will always need that.

Categories: Process, Quality, The Business of Software

“Software engineering” might be an oxymoron

By Jim Grey (about)

I’ve said it to my test teams many times: Making software isn’t quite engineering. Building a bridge – now that’s engineering. You determine how long the bridge needs to be, how much load it needs to carry, and what kind of bridge to build (steel truss, concrete arch, etc.), and from there it’s mostly mathematics and physics. Just run the calculations and you’re good.

We have bridge-building down. With a couple of notable exceptions, such as the Tacoma Narrows bridge, which heaved and twisted and finally collapsed (video here), new bridges seldom fail. Old bridges fail sometimes, but that’s usually due to accident or neglect.

The S Bridge at Blaine

My apologies to any civil engineers who stumble upon this post. I’m sure you’re cringing that I’m overlooking many subtleties of your discipline.

There’s nothing subtle, however, about how often software fails. Our users aren’t happy about it, but they aren’t surprised by it, either.

For anything you ask a software developer to build, there will be a whole bunch of valid ways to do it, each with its own unique ways of creating failures. This is especially true when that developer enhances existing software that he or she didn’t make in the first place. It’s tough to predict exactly how the enhancements will affect the rest of the software. The more lines of legacy code, the more time and analysis it takes to think that through.

If a developer had unlimited time and money, it might be possible to deliver perfect software. Ah, a developer can dream! But here’s where bridge-building and making software have an important thing in common: time and money are never unlimited.

I sympathize with the folks who call software a craft. People who make software use tools and knowledge in its design and construction. These are hallmarks of craft.

Another way that software is like craft is that it’s difficult to fully separate the design from the making. Even when one person designs the software and another writes the code, the coder has to make a bunch of lower-level design decisions along the way.

The software craftsmanship movement meets corporate resistance because revenue and profit ride on what we build. Our companies need to sell features to meet revenue projections, or deliver bug fixes to retain customers. That’s why timed delivery is so important: if you wait too long to deliver, the opportunity to grow or retain revenue begins to shrink.

Feeling pressure to deliver, yet knowing that if we deliver junk we’ll be in an even worse pickle, we tend to manage software-development projects like engineering projects. I think we feel like we have better control when we manage them that way. But that feeling of control can’t mask it: no matter how tightly you plan a software project, no matter how you shape your development and delivery processes to mitigate risk, no matter how much you try to predict the troubles you’ll encounter, you will discover things along the way that can seriously derail those plans. It happens in two-week scrum sprints just as it does in ten-month waterfall projects. Discovery is simply endemic to software development.

As a software project manager, I try to build in buffers for the unknown. I also steer projects daily based on what we discover, adjusting plans and communicating impacts to whoever needs to know. I try to make sure our development practices deliver the best possible code to test, and then I try to arrange testing to find the worst bugs first so that near the hoped-for end, only minor bugs remain. Despite all that, important bugs still sometimes reach the user.

We ship when the software is good enough. What “good enough” means varies from context to context, but it is unfailingly short of perfect. Shipping at good enough means you succeeded.

If I delivered bridges that way, I’d never drive over one I built.

Categories: Managing People, Quality

Four critical tips for getting your ideas implemented at work

By Jim Grey (about)

After eight years writing and editing software documentation, I itched to make software again, like I did in college. So I took a job with a software company as a tester.

My corporate mug shot from those days – the Grizzly Adams years

The company made a sprawling product for an industry I knew nothing about, so I had lots to learn. Given my background, the first thing I did was reach for the manuals. They were incomplete, inaccurate, and poorly organized. There was online help, but it was unnavigable. Nobody was ever going to use the documentation to successfully learn the product. My boss managed the technical writers too, so I marched into his office to complain. I wasn’t delicate about it. “This stuff is terrible! I can’t believe you ship this to customers! It’s an embarrassment.”

He leaned back in his chair and calmly said, “What would you do to fix it?”

“I would throw it out and start over,” I began. And then over the next ten minutes, off the top of my head I outlined a project that would create new manuals and online help that would actually help users not just use the product, but get the best value from it.

Three days later, he called me back into his office. “Remember that thing you said you’d do with the documentation? You are now manager of the Documentation Department. Make it so.”

It was a bold move for him to take a gamble on me. I’d never managed people, and my project management experience was limited. What I didn’t know was that every year the company surveyed its users about product quality – and every year the documentation got the most complaints. My boss had been told to fix this problem, but had no idea how. Then I walked in with a solution that sounded like it just might work.

Most of this story is just the nuts and bolts of the project – hiring and coaching staff, creating plans and schedules, doing visual and information design for the new manuals and online help, managing the project, reporting to management, and even doing some of the writing myself. The details would be interesting only to another technical writer. Much of this was new to me, but I had excellent support from a boss who needed to see his gamble pay off. He also helped me navigate the inevitable office politics, including another manager who kept trying to torpedo my efforts. Also, the program manager helped me master the project management tools we used, none of which I had ever even seen before.

My team and I worked on the project for a year and a half. It’s not often a technical writing team gets an opportunity to do a clean-sheet rewrite like this, and they were all enthusiastic about it. I worked hard to clear their roadblocks, respond quickly to their concerns, and generally be a good guy to work for, and it paid off in the excellent work they delivered. When we were done, we had written over 3,000 pages and had created a seven-megabyte context-sensitive online help system.

I was invited to demonstrate the new online help at the annual user conference. 600 people flew in from all over the United States, and there I was before them on the opening session’s main stage. My presentation was the last of a series about new features in the product. When I finished, to my astonishment the online help received enthusiastic applause – and then one person stood, and a few more, and several more, and soon the whole room was standing and applauding. That moment remains the pinnacle of my career; I can’t imagine anything else ever overtaking it. The icing on the cake was when I overheard the VP of Sales say to my boss, “All the blankety-blank new features we pushed you to put into the product, and everybody liked the blankety-blank online help the best! The online help! You’ve got to be blankety-blank kidding me!”

I used to think I was just a grunt paid to trade the words I wrote for a paycheck. Through this project I learned just how interdependent everyone is at a company, and how everybody is important. Specifically, I learned:

If you want to see your great ideas implemented, they need to solve a big problem the company thinks it has. The problems your company thinks it has may very well be different from the problems your company actually has. Frame your ideas in terms of solving the problems the company thinks it has.

When you’re doing something you’ve never done before, find people who can coach you through it. I don’t care how far down the ladder you are at your company, your success helps determine other people’s. Look for someone who both knows how to do the thing you need to learn and whose success depends in part on yours – that last bit motivates them to help you. In my case, it was my boss and the program manager.

Work for people who clear roadblocks out of your way so you can be most effective. I now leave situations where the boss doesn’t help me in this way. It’s that critical.

Your success always depends on other people, so treat them well. Because I gave my team an exciting assignment and created an environment in which they could focus, they happily turned out huge quantities of good work. Also, after we shipped the new documentation, I promoted every writer. They deserved it.

A footnote: That company went through tough times a few years later and so we all moved on, some for better positions and others (like me) because they couldn’t afford to pay us anymore. One of the writers who had worked for me called me one day seven years later, by which time I really had moved into software testing. She said, “We have an opening here for a test manager. I’d love to work with you again, and this is a good place to work. You really should apply.” I did, and I got the job. I found out later that just before my interview, she went to the VP and said, “He’s a great boss. You don’t want to let him get away.”

Sometimes the good things you do come back to you!

Categories: Managing People, Quality

Multitasking hurts productivity

By Jim Grey (about)

“You multitask like a madman,” my boss said to me.

She meant it as a compliment, but it brought me down. I was exhausted, teetering on the edge of burnout precisely because I had been multitasking an enormous workload.

Not a skill to be praised

I managed 15 people across four teams: testers that delivered monthly bug-fix releases, test automation developers, performance testers, and technical writers. My teams were solid and I had great leads in place, which freed me to work with a security-testing vendor to start doing regular penetration tests of our product, and with a translation company to translate our product user interface into five languages each release. I had a lot going on, but I was handling it.

But then the company decided to lean hard into more international markets. The executive team asked me to gather quotes to translate our product UI into even more languages, and also to translate our giant online help system, which we had only ever offered in English. The costs were an order of magnitude more than the executive team imagined, and so I was called into endless meetings and hallway discussions to provide more data as the executives squabbled with each other over strategy. This all sucked down more than a third of my time – but I could never focus on this work for more than ten or twenty minutes because I was still managing four teams with leads who had questions and needed me to remove roadblocks.

My performance began to suffer. While I had my eye on one ball, another would drop. I started making silly mistakes. It all wore me down to a nub. To keep sane, I ended up not asking, but telling my boss to take things off my plate so I could survive. I really wanted the translation stuff to go, as I didn’t enjoy it very much. But instead she gave the technical writing and bug fix teams to other managers.

I really mean task switching, not multitasking

Everybody calls what I was doing multitasking, but it really wasn’t. Real multitasking is when we do more than one thing at a time, such as driving and talking, or walking and chewing gum. But when two things come along that require focused attention, most of us can’t do them simultaneously. We work on one task, and then we work on the next. That’s really called task switching, and it happens every time we stop testing a feature release to test an emergency hotfix, or even get interrupted to answer a question. I’ve just called it multitasking here so far so that Google’s sweet, sweet searches can find this post.

Task switching makes tasks take longer overall. It also hinders learning – all that switching from one task to another keeps things from sticking in our brains. It really is better to work on, and finish, one thing at a time. We are so much more productive that way.

The hidden costs of task switching

Say you’re working on task A when task B arrives. If you want to avoid task switching, you finish task A and then work on task B.

But say task B is hot, and your boss needs you to work on it right now. Task B goes out sooner, at the cost of delaying task A.

But there’s a hidden cost: unless a task is automatic or menial, it takes time to get oriented to it, even when you’re returning to it after only a brief interruption. You must at least try to remember where you left off. That orientation time delays overall completion.

This cost mounts the more you switch tasks. If you switch repeatedly among tasks A, B, and C, not only do all tasks finish later, but tasks A and B finish much later.
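
Here’s a rough simulation of that effect. The hours, slice size, and reorientation cost are all invented, but the shape of the result holds: every switch adds reorientation time, so the whole pile of work finishes later, and the task you started first drags on the longest.

    # Compare finishing tasks one at a time with round-robin switching among them.
    TASKS = {"A": 4.0, "B": 2.0, "C": 3.0}    # hours of real work per task (made up)
    SLICE = 1.0                               # hours worked before switching away
    REORIENT = 0.25                           # hours lost re-finding your place on each return

    def one_at_a_time(tasks):
        clock, finish = 0.0, {}
        for name, work in tasks.items():
            clock += work
            finish[name] = clock
        return finish

    def round_robin(tasks, slice_hours, reorient):
        remaining, started = dict(tasks), set()
        clock, finish = 0.0, {}
        while remaining:
            for name in list(remaining):
                if name in started:
                    clock += reorient         # pay the reorientation cost on every return
                started.add(name)
                work = min(slice_hours, remaining[name])
                clock += work
                remaining[name] -= work
                if remaining[name] <= 0:
                    finish[name] = clock
                    del remaining[name]
        return finish

    print(one_at_a_time(TASKS))                 # {'A': 4.0, 'B': 6.0, 'C': 9.0}
    print(round_robin(TASKS, SLICE, REORIENT))  # everything done at 10.5 hours; A, started first, finishes last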

Task switching hinders learning

Dairy Queen photo

One of my first jobs was working the counter at a Dairy Queen. It took me a couple weeks to learn the technique for creating their soft serve’s signature shape, but then I could do it without even thinking about it. It had become a habit.

Making software isn’t the same as making ice-cream cones. Being effective and productive is much more about deepening skills and knowledge than about building habits.

Single-tasking helps deepen skills and knowledge because it stimulates the hippocampus, the part of the brain that moves information into long-term memory. Task switching undercuts this because it instead engages the basal ganglia, the part of the brain that is good at building habits.

In software development, you simply get better at what you do, faster, when you single-task.

What to do then?

It’s impossible to entirely eliminate task switching – emergencies will arise, questions will need to be answered. And I do believe in the power of collaboration to deliver better software. But it’s a good investment to minimize task switching as much as you can.

If you’re in management, make single-tasking a value. Demonstrate it by removing obstacles so people can focus on one thing for long periods:

  • Can you have “do not disturb” periods or let people work from home when they need to complete a critical task?
  • Can you give people private offices?
  • Can you schedule meetings that involve your team members so they don’t interrupt work as much, such as first thing in the morning, just before lunch, or at the end of the day?
  • Can you organize your teams so that you have people dedicated to handling customer emergencies (in a prioritized way, so they can focus on one at a time) and people dedicated to building new product?

If you’re not in management, you can still do a lot to clear your decks so you can concentrate:

  • Adjust your work schedule so that you arrive earlier or stay later than most others. (My normal work hours have been 7:30 to 4:30 for more than 20 years. The first 60-90 minutes of the day are my most productive because few others are in the office.)
  • Stop responding to e-mail as it arrives; instead, set aside specific times when you read and respond to it.
  • Work out a do-not-disturb convention with your team. In one group I worked with, we placed a little blue flag on our desk when we needed to go heads down.
  • Ask if you can work from home when you really need to concentrate.
  • At least put on your headphones, which many people interpret as a sign that you don’t want to be disturbed.

And of course, when your boss pulls you in too many directions, be sure to ask him or her to help you prioritize your work so you can focus on one thing at a time, or to assign work across your team in ways that balance the load. Multitasking alone doesn’t usually lead to burnout, but it absolutely brings you there faster when you have too much on your plate.

Categories: Quality

The difference between quality and excellence

By Jim Grey (about)

My long career in software development was briefly interrupted in the mid 1990s when I took a job editing technology books. My first project was editing a new edition of one of the publisher’s biggest sellers. I drew this plum assignment not for my l33t editorial skills, but for being the new guy. The author had a reputation for running his editors ragged, and the other editors were glad to scrape this book onto me.

I never understood why, because editing the author’s work was a pleasure. His writing was clear, engaging, and funny. When I made suggestions for improvement, he gladly took most of them. He even called me to discuss and improve on a few of them. He did require a lot of attention, all of it for the good of his book, as he sweated every detail. For example, I spent hours on the phone with him poring over proofs, which are draft printouts of the book after it’s been laid out. It’s the last stage before the book is printed, and he used this time to polish his work further. He sometimes rewrote entire paragraphs to make them funnier (as humor was his book’s hallmark) or reworked graphics to make them clearer, all of which never ceased to thrill the overworked layout department.

When we were done, we had a book to be proud of. I displayed my copy prominently on my bookshelf. It then sold a bazillion copies.

My next assignment was to edit a thick book about a communications technology that was still popular then. This author handed in cumbersome and clumsy text full of basic writing errors. His humor was lame and sometimes offensive. His technical explanations were incorrect and incomplete. I spent hours hammering his work into something marginally usable. He ignored most of my suggestions and avoided taking my calls.

After he had handed in 100 of the book’s 800 pages, he announced that he was done writing. I was incredulous as he explained that the remaining 700 pages would be reprinted (and poorly written) documentation from shareware related to this technology. What laziness! What gall! I accosted the acquisitions editor – that’s the guy who hired this author – and raised an unholy ruckus. I said, “This book will be useful to nobody!” He shrugged. “It’s his book. Is it on schedule?”

I spent the next several weeks with my stomach knotted from anger and disgust as I edited those 700 pages. I pinched my nostrils shut as I sent the chapters to layout. I suppressed my gag reflex as I reviewed the proofs. I rolled my eyes when my copy of the finished book arrived. I hid it in a dusty and forgotten corner of my bookshelf. Then I succeeded for several weeks at forgetting the whole sordid ordeal until I received a letter from somebody who actually bought the book. He wrote something that knocked me out of my chair:

“Dear Sir. I was trying to figure out this communications technology when I found your book. I wanted to tell you that it was exactly what I needed. I played with a couple of the programs the book described and, with the book’s help, got one of them running. Thank you for publishing this book. Sincerely, Some Reader.”

I was humbled. No, I was shamed. Mr. High-and-Mighty Editor thought that the author created a steaming pile of feces while giggling at the teller’s window as he cashed his advance check. Yet somebody found the book to be exactly what he needed.

I started to see that maybe I wasn’t the final arbiter of quality, that maybe quality is what meets the customer’s needs. I’ve carried this critical lesson into every job I’ve had since.

But now, many years hence, I have learned another lesson from these two books.

That first book was Macs For Dummies, Third Edition, by David Pogue, a keystone of the juggernaut Dummies franchise. More recently, you might have seen David’s technology column in the New York Times, or his acclaimed The Missing Manual series of books, or maybe the stories he does for CNBC and for CBS News Sunday Morning, or the four-part series he did for NOVA on PBS. David has done very well for himself since his Dummies days. He has worked very hard for it, leveraging every opportunity with his characteristic energy, wit, and grace. He could have gone a long way on those traits alone. But his ability to do top-flight work truly distinguishes him.

I haven’t been very kind to the other author here so I won’t reveal his name or the title of his book, which sold poorly despite the one fan letter. I’ve encountered him here and there over the years and he has always seemed very happy. But he has not achieved a hundredth of what David Pogue has.

The new lesson? Something modest may meet a customer’s need. But it sure is satisfying – and the hard work sure is worth it – when you can really delight the customer. And David Pogue’s case shows that talent and hard work can still really pay off.

Categories: Career, Managing People, Teambuilding

I believe in “A teams” over “A players”

By Jim Grey (about)

I’ve heard it again and again at work. “We need to hire a real A player for this job, a total rock star.”

A test team I once belonged to, dressed as crayons for Halloween. I’m the gray crayon, natch. Our colorful dunce caps do not mean we weren’t A players!

This statement usually comes at a time some critical task or function isn’t being done well (or at all) and it’s causing projects to fail. “If we can just bring in a super-skilled specialist,” the thinking goes, “it would solve all of our problems!”

Sometimes this gets stretched into a one-size-fits-all approach to hiring. “Let’s hire only A players,” someone proclaims, “and then get out of their way and let them perform.”

No doubt about it: A players are extremely talented and deeply experienced. They are heavily self-motivated and especially hardworking. They are creative problem solvers who focus on getting the job done.

But don’t assume that putting A players on the job is like sprinkling magic fairy dust that makes problems go away. That’s setting them up to fail – and setting your company up to fail, too. Companies are much better served building high-performing teams.

A players are no substitute for leadership. The most important step in that leadership is to help your people form solid teams. I’ve been in software-company leadership roles for more than 15 years now. I’ve delivered many, many successful software projects with teams made mostly of B players. But those successes came after company leadership:

  • Created a shared, common vision that everybody rallied around and focused on
  • Built a process framework within which team members worked, which set standards for workflow, quality, and completion
  • Praised and rewarded team members for jobs well done
  • Hired for fit within the company culture, as well as for skill

A players are hard to find. One reason I often hire B players is that most people aren’t A players. I’d say fewer than one in ten people I’ve ever worked with are that good. I make software in Indianapolis, which we sometimes call the Silicon Cornfield. Many of the truly outstanding geeks move to California, North Carolina, or Texas, where the opportunities are greater. But even in those places, there are only so many A players to go around. Sooner or later you have to hire B players too. Those B players will work best under strong leadership and in highly functioning teams.

A players often have the biggest egos. A little swagger is part of the A-player territory. If you don’t lead well and help them gel into a team, conflicting egos will put your projects at risk.

A long time ago I followed rec.music, a once-popular Internet forum about music. In a recurring discussion thread, members wrote about which musicians they’d put in the best supergroup ever. The debate raged – Eric Clapton on guitar, and Neil Peart on the drums, and Paul McCartney on bass, … no no, Phil Collins on drums and Jeff Beck on guitar! …no! It must be John Paul Jones on bass!

It was fun to fantasize about such things. But do you really think a band with some of the biggest egos in music would gel? I’m reminded of We Are the World, the 1985 charity song recorded by a supergroup of pretty much every popular musician of the time. The famous story goes that someone taped a sign that read, “Check Your Egos At the Door” on the recording-studio entrance – but that didn’t stop arguments over many of the recording’s details, with at least one musician walking out and not returning.

Don’t let that be your team.

Still, A players can be mighty useful, and sometimes it’s right to hire one. Here are the roles where I’ve settled for no less than an A player:

  • Lead roles – I needed someone to figure out some thorny problems, and to set the pace and point the way for the team.
  • Lone wolves – I needed someone for a highly specialized job where I was unlikely to need more people in that role for a long time, especially a role where I lacked the skills to do it myself and therefore would have a hard time managing its details.

Really, I’ve never not hired an A player just because he or she was an A player. Who wouldn’t want their skill and determination on the team? I’ve only passed on A players when they would be a poor cultural fit in my company and in my team.

Categories: The Business of Software

Much ado about Flickr

By Jim Grey (about)

I was shocked when I logged into Flickr last week and found an entirely new interface.

My Flickr homepage.

My shock turned to disappointment and sadness that some of my contacts were super angry about the change, left strongly worded comments on their photostreams, and immediately moved their photos to other services.

I make software products for a living; I’ve seen firsthand how interface changes can alienate users. They become comfortable with a product’s features and usage, even when they’re flawed. They don’t want to learn anything new (which often masks a fear that they can’t learn something new).

At the same time, Flickr (and Facebook and any other thing you do on the Web) is a product, built by a company that is trying to make money in an ever-changing landscape.

I’ve seen it often, and it’s happened at companies where I’ve worked: A company builds a good product that takes off. Success causes the company to grow or to be sold to a larger company. And then some scrappy startup company builds a product in an overlapping market that becomes a new darling. By then, the big company is so invested in what it’s always done that it struggles to adapt to the shifting market.

From where I sit, it looks like all of this happened to Flickr. Founded in 2004, Flickr quickly became arguably the king of the hill among photo-sharing sites. Web giant Yahoo! quickly noticed and, in 2005, bought the fledgling company. Success!

But consider all that’s happened in photography and on the Web since then. Most people had just discarded their film cameras for digital cameras. Soon cameras in phones became good enough for casual, everyday use; many of them are now very good. Users found it easy to share their photos across any number of the social networks that had emerged – primarily Facebook, which was founded in 2004 too, but also upstart Instagram. Today, the three cameras that take the most photos uploaded to Flickr are all iPhones.

The market has shifted. It was a matter of time before Flickr either responded or became a niche product of ever decreasing importance. This new interface is its bid to stay relevant. I’m impressed with Yahoo! for moving Flickr so boldly.

I think that if people give the new interface a chance, it will work for most of them. I’ve heard complaints about slowness; I advise patience as Yahoo! would be foolish not to address legitimate performance problems. I’ve heard complaints about how crowded the interface feels; I’m also sure Yahoo! will tweak the new interface over time for better usability.

Another source of uproar is that advertising now adorns Flickr pages. I hate Web ads too, but really, they are the major way many Web products make money.

I sympathize a little with one complaint: all of us who bought Flickr Pro accounts for unlimited photo uploads now feel kind of let down, given that everybody gets a terabyte of storage now. That much storage might as well be unlimited; you could upload one photo a day for the rest of your life and never run out of space. But Flickr is letting us cancel our Pro accounts with a pro-rated refund, or keep Pro at its rate of $25 per year and never see an ad. Anybody who doesn’t have Pro already will have to pay $50 per year for that same privilege. I think this is a reasonable trade.

Flickr’s real mistake might be in underestimating how attached its users were to the old interface. But if my experience is any indication, perhaps that mistake won’t be fatal. Of my contacts, about five percent of them have moved to other services. I’ll miss seeing their photos. I wonder if they’ll soon miss the rest of the Flickr community.

Cross-posted from my personal blog, Down the Road.

Categories: Process

A thing called platform support and why Android makes me nuts

By Jim Grey (about)

“Our product is a Web app,” the CEO said. “Why wouldn’t it work flawlessly on an iPad?”

The answer, of course, was that we didn’t build it with the iPad in mind. When we got it to work on our PCs on Chrome, we declared it done. But some guy at one customer site decided to run it on his iPad, and he was upset because it didn’t work “at all.” Support escalated the case to Engineering, where one of the testers poked around the product on her personal iPad. She found that she could log in, navigate, and enter data on any screen. “It was a pain. I’d rather do it on my laptop any day. But it did work,” she said.

So we were puzzled by our user’s bad experience. And then we learned that he had an original iPad. Our tester had a fourth-generation iPad – much more modern hardware running the latest version of iOS. Unless we wanted to shop for an old iPad on eBay, we were never going to get to the bottom of this complaint. The user wasn’t thrilled when we told him that we couldn’t support his old tablet.

Crushed under the weight of platform support

If you don’t tell your users what they can use to access and run your product, they will run it on whatever they want. When it doesn’t work, they’ll demand you support it. This will crush you. You need to set some limits.

This isn’t meant to be done by fiat. You need to think it through: What are your customers likely to want to use? You need to be there. How will they access your product – devices, operating systems, browsers? If your product will be installed on their equipment, what hardware (or virtual servers), operating systems, and databases are they likely to use? If your product interfaces with other products, what versions will your users want to use?

What you’re doing is limiting the edge cases, like that user’s original iPad. You’re also limiting the list to what you can meaningfully test.

Creating a matrix of supported platforms

Create a spreadsheet that says your product runs on and with these versions of these other products. If you support several versions of your product, the list of supported platforms will probably vary a little version to version.
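
Here’s a toy sketch of what such a matrix captures, expressed as a small Python structure instead of a spreadsheet. Every product version, browser, device, and database named here is invented; your real matrix will reflect your own product and customers.

    # Hypothetical supported-platforms matrix for two product versions.
    SUPPORTED_PLATFORMS = {
        "4.2": {
            "browsers":  ["Chrome 27", "Firefox 21", "Internet Explorer 9", "Internet Explorer 10"],
            "devices":   ["Windows 7 PC", "iPad (4th generation, iOS 6)"],
            "databases": ["Oracle 11.2.0.x", "SQL Server 2008 R2"],
        },
        "4.3": {
            "browsers":  ["Chrome 28", "Firefox 22", "Internet Explorer 10"],
            "devices":   ["Windows 7/8 PC", "iPad (4th generation, iOS 6)"],
            "databases": ["Oracle 11.2.0.x", "SQL Server 2012"],
        },
    }

    def is_supported(product_version: str, category: str, platform: str) -> bool:
        """Answer the question support and sales get asked every day."""
        return platform in SUPPORTED_PLATFORMS.get(product_version, {}).get(category, [])

    print(is_supported("4.2", "devices", "iPad (4th generation, iOS 6)"))  # True
    print(is_supported("4.2", "devices", "original iPad"))                 # False -- set expectations accordingly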

Give the spreadsheet to support and sales and say, “Use this to set expectations. If they want to use something that’s not on this list, try to talk them into sticking to the matrix. If they insist on going their own way, tell them we might not help them if they have problems.”

This isn’t a once-and-done job. New versions of your supported platforms will be released. Your users will want to use entirely new products with your product. You’ll have to decide when and whether to add them to your matrix. You’ll need to remove old platforms from the matrix sometimes. You need to keep gathering information and keep updating the matrix.

You can’t support it all: why I hate Android
I used to have a Palm Pre (right). I loved it, but it was on nobody’s platform matrix! My iPhone 5 (left) is on everybody’s matrix.

Still stinging from the customer who had the bad iPad experience, the CEO asked, “Do we have any idea how the mobile Web site behaves on Android? It needs to run on Android, too. It needs to run on any device they have in their hands.”

The problem with Android is that there are eleventeen base versions in active use, each of which a vendor can customize for their devices. And the devices themselves run the processing-power and screen-resolution gamut. When I explained that I’d need to buy a hundred devices and hire twenty testers to test them all, the CEO became less excited about Android.

You can limit the scope of Android, and any other platform with a lot of permutations, by deciding which permutations you’ll support and testing only those. As I write this, the 2.3.x (Gingerbread) versions of Android are the most widely used. You can buy a couple of popular devices on that version and test just those.

The challenge is that where there are lots of permutations, plenty of users will end up on unsupported permutations, and they aren’t going to be happy.

This reminds me of the browser interoperability problems from several years ago, and why companies usually limited their Web apps to work only on Internet Explorer. Browsers are better now, and it’s more common for Web apps to work on any modern browser. Here’s hoping Android settles down as browsers have.

You can’t test it all: Supported vs. Certified

The items on your supported platforms matrix need to be things with which your product actually works. As much as possible, design your product up front to work with them. But if you’ve just created a supported platforms matrix for a product that’s been around a while, the best you can do is test your product with those supported platforms to find out where the bugs are. The knee-jerk approach is to do a “full” (whatever that means) regression test on every platform.

One place I worked, our supported platforms matrix grew huge. When we printed it, we needed a jeweler’s loupe to read it. For each version of our product, there were fifty or more supported platforms. I couldn’t afford the army of testers I’d need to do “full” regression passes on them all.

But all that testing really wasn’t necessary. You can look at what’s new in the latest version of a supported platform and make a pretty good assessment of the risk to your product, and size your testing accordingly. For example, a product I used to work on ran on the Oracle and SQL Server databases. When Oracle 11g Release 2 (11.2.0.1) came out, it offered enough new stuff that we weren’t sure how it would run with our product. So in our next release, we did our normal release testing, including regression, on that database. We also ran some special tests that exercised our most complicated SQL queries on it. We found and fixed some bugs that way.

When Oracle issued 11.2.0.2, we looked at what was new in that release and saw no reason to believe that any of it would create new bugs in our product. Our experience was that if a database version was going to throw bugs in our product, those bugs would be foundational and would appear right away under even light testing. We had a bug-fix release coming up, so we installed 11.2.0.2 on that test server and tested the bug fixes on it. The release sailed through testing, so we shipped it and added 11.2.0.2 to the supported platforms matrix.

Platforms that got the so-called “full” test treatment were listed as “Certified” on the matrix. What we told our customers was that we had “performed a certification test” on the platform, had fixed the bugs we found, and were highly confident of a good experience on that platform.

Platforms that got light testing were listed as “Supported” on the matrix. Sometimes our analysis showed low enough risk that we did no testing and still called the platform Supported. We told our customers that we expected that they would have a good experience on that platform.

The bottom line for both Certified and Supported was that if the customer experienced a problem, we’d take their call and resolve their issue.

The bigger bottom line is that you want a manageable list of platforms with and on which you believe your users will have good experiences using your product, a list that won’t kill you to keep up with. Through understanding what your customers want, limiting the list to the most common and important platforms, and varying the depth of testing based on risk, you can keep up.

Categories: Process

Even a mediocre plan will work if everybody follows it

By Jim Grey (about)

I have been astonished in my life by how few problems are truly unsolvable. I have also noticed that, most of the time, when a problem ends up not being solved it is for one of two reasons: people deny the problem, or they won’t work together on the solution.

There are any number of ways to make software. All of them involve programming, of course; it’s the core activity. But there are other jobs to perform, and the bigger the software being developed, the more people you need to perform them. It becomes necessary to organize the people and form work processes. The ways to do that range from loose and chaotic to structured but flexible to ponderous and formal. They all have their strengths and weaknesses; they all can work. I’ve made software in all of these situations.

Even a gravel road will get you there.

One of my past employers sequestered its programmers in a room to hack code together as fast as they could. When they were done, a couple people would play with the product for a week or two in hopes of finding some bugs, which the programmers might or might not fix. Then we shipped it to five customers, who upon installing it invariably found that it failed spectacularly. When we fixed all the problems they found, we chose five more customers. We repeated this process until the software stopped failing, and then announced to the rest of our customers that it was ready.

At the other end of the scale, another past employer had such a heavy software development process that following it felt like slipping into a straitjacket. There were reams of documentation to write and keep with layers of approval at each step. I led the test team. We had to take a screen shot of every test step, print it, and put it in a folder. At the end of a project, we had boxes and boxes and boxes of printouts, hundreds of pounds worth, that we would send to an offsite storage facility. This was in case our only customer, which happened to be the U.S. government, wanted to audit us.

Both companies are still in business.

A third past employer carefully hired smart people and trusted them to know what to do. Because they hired well, this worked for a long time. But as the company grew, this approach became more and more difficult to manage. Product quality became a problem. A big part of the software was an accounting package, and in one fateful release the general ledger simply would not balance. This affected every customer, and to make a long story short, it took the company almost a year to fix it. Several customers quit us.

Too often it takes great pain to drive important change. This pain made us face that we needed more structure to deliver successfully. So we hired a consultant to guide us toward a better software development process. He showed us a way with just enough checks and balances that we could have greater confidence in our work without being too bogged down. We also hired a consultant to help the management team (of which I was a part) work more collaboratively. She taught us how to lead the company through this transition.

It worked. In the very first product release under the new process, quality problems fell off dramatically and we delivered on time for the first time anybody could remember.

The more important hire was the second consultant. The new methodology was good, but it wasn’t magic, and it might not have worked for us if we had not done a good job helping the team through the changes. You see, a few people didn’t understand why the old way couldn’t still work and still others thought our new process wasn’t going far enough. Our task was to overcome the resistance and get everybody singing from the same sheet of music, and it was the hardest thing we did. But we pulled it off.

It’s a rare organization that faces its failings and works to change. It’s a rarer organization still that drives change collaboratively.