Obamacare, healthcare.gov, and how government software gets made

A harrowing tale.

By Jim Grey

I was not surprised when I heard that the Obamacare Web site, healthcare.gov, crashed and burned right out of the gate.

But I was disappointed. Regardless of what I think of the Affordable Care Act, it’s the law. I wanted its implementation, including healthcare.gov, to go well.

Still, I wasn’t surprised because I know how government software gets made.

Several years ago I worked in middle management for a company that built a government Web application related to health-care customer service. I was in charge of testing it to make sure it worked. It is probably not going out on a limb to say that the people who built healthcare.gov experienced many of the same kinds of things I experienced on that project.

Let me be plain up front: I was a poor fit for government software development. I was too free-wheeling and entrepreneurial for the control-and-compliance environment that government contracting encourages. I find it difficult to write about the experience without showing my frustrations with its realities. But I think I understand those realities well and objectively.

The government doesn’t know how to do anything. They hire it all out, and then they manage and administer the process. As a result, on this project they relied heavily on compliance with “best practices,” as if those practices contained some sort of magic that delivered quality software. They don’t, of course; the government was shocked when Version 1.0 of our software had typical quality problems right out of the gate. Those practices served primarily to leave an audit trail the government could follow.

In the end, the project was a success. Despite Version 1.0’s glitches, which we quickly fixed, the software was immediately put to use and led to productivity improvements over an older, green-screen system. I spoke with many of the software’s users, and despite a few grumbles most of them liked using it.

But this was one mighty expensive piece of software to build, from winning the contract to defining what the software should do to building and maintaining the software. Here’s why.

The bid process

I was hired after we won the contract, but I heard stories about the bid process. We had no experience building software on this scale, but we wanted into the lucrative cost-plus government contracting business for its guaranteed profit margins. So we offered a lowball bid aimed at getting the government’s attention, not at what it actually was going to take to build the software. And then to our surprise we won the business. After the elation wore off, we were left with an “oh shit” feeling – we needed to actually build the software for that amount. How the heck would we pull that off?

We finished Version 1.0 on my watch, but I don’t know whether we delivered it within budget. It seemed to me, however, that the bid process encouraged underbidding and overspending.

The requirements process

When you make something for the government, they want to know exactly what they’re getting, in excruciating detail. So we started by writing the biggest, thickest requirements document I’ve ever seen. We weren’t building this software from scratch – we bought what was then the leading customer-relationship-management software platform and used its software-development toolkit to heavily customize it for our needs. But we had to write highly detailed specifications anyway.

Requirements gathering was more about navigating choppy political waters and brokering compromise than about specifying usable, stable, and scalable software. To develop the requirements, we flew in representatives from every company that would use the software and put them into a big room to hash out how it would work. But all of these companies were themselves government contractors. Their people all knew each other – and, frequently, competed against each other for contracts. Some of them had competed against us to win this contract. The room was thick with mistrust and agenda.

The building process

The government lives in constant fear of being screwed by its contractors. It goes back to Abraham Lincoln’s time, when rampant fraud among suppliers threatened the Civil War effort. (Seriously. Gunpowder cut with sawdust. Uniforms that dissolved in the rain.)

So not only did the government hire us to build the software, they hired another firm to watch us do it. This is called independent verification and validation, or IV&V. Their job was to make sure that we followed software-development “best practices” and that we built what we said we were going to build. To make matters worse, the company that won the IV&V contract had, I’m told, also bid on the project to build the software in the first place. It always seemed clear to me that they wanted to show us to be fools so that they could take over the project. They ran us ragged over every last minor detail.

The level of perfectionism in terms of “best practice” adherence was intense. Yet when we delivered the software, it had several usability challenges and outright bugs. Worse, it struggled to keep up with the load users placed on it. If you’ve ever built software, you know that these are typical challenges with Version 1.0 of anything. But the government was shocked, dismayed, and appalled. We spent the next several months issuing update releases to make it perform as it needed to. Of course, IV&V ran roughshod over us the whole way – but they were in the hot seat too because their “best practices” had failed to prevent these problems.

The process overhead

Process is tricky to apply well. Too little leads to chaos; too much adds needless cost and delay. I’m not anti-process – rather, I’ve built a career on bringing just the right level of process into a software development environment to make it effective. But most of the process we had to follow involved documenting our work to prove to the government that we had actually done it. This frequently hindered our ability to deliver software cost-effectively, and sometimes stood in the way of quality.

We bought a well-known software product that stored requirements and linked them to the code and the test cases so we could prove that we built and tested each requirement. This involved tracing every requirement to every line of code and every test case, an enormous task in and of itself. I personally created a traceability report each quarter and sent it to the government. All of this required a lot of work from skilled technical people, but in my judgment did not materially help us better build or test the software.
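
For readers who have never seen one, a traceability report boils down to something like the sketch below. This is only an illustration in Python, not the product we used: the requirement IDs, file names, and test-case names are all hypothetical, and the real tool was far more elaborate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    """One requirement plus the artifacts traced to it (all names hypothetical)."""
    req_id: str
    summary: str
    code_refs: List[str] = field(default_factory=list)   # source files that implement it
    test_cases: List[str] = field(default_factory=list)  # test cases that verify it


def traceability_report(requirements):
    """Return one row per requirement, flagging gaps in code or test coverage."""
    rows = []
    for req in requirements:
        if not req.code_refs:
            status = "NOT BUILT"
        elif not req.test_cases:
            status = "NOT TESTED"
        else:
            status = "TRACED"
        rows.append((req.req_id, len(req.code_refs), len(req.test_cases), status))
    return rows


# Hypothetical sample data, for illustration only.
requirements = [
    Requirement("REQ-001", "User can log in", ["login.py"], ["TC-001", "TC-002"]),
    Requirement("REQ-002", "User can search cases", ["search.py"], []),
    Requirement("REQ-003", "User can export a case", [], []),
]

for req_id, n_code, n_tests, status in traceability_report(requirements):
    print(f"{req_id}: {n_code} code ref(s), {n_tests} test case(s) -> {status}")
```

The report itself is trivial to generate. The expense was in keeping every one of those links accurate by hand as the requirements, code, and test cases changed.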

Our test cases were contractually required to be documented in such detail that a trained monkey could execute them. They were at the level of “Step 1. Type your username into the Login box. Expected result: Your username appears in the Login box. Step 2. Type your password into the Password box. Expected result: A row of asterisks appears in the Password box.” A test case that took fifteen minutes to execute could have taken two hours to write and could have been a dozen pages long. We had hundreds of test cases. Many test cases were not appropriate to be added to the regression test suite and be executed every release, so we spent a lot of time writing them to execute them a small handful of times.

It was supposed to be against the rules to write a bug report that had no associated test case. Testers would often stumble upon a bug by accident or find one while doing ad-hoc testing – and then find themselves in a conundrum. Writing the test case that led to the bug and tracing it back to requirements took time we frequently lacked at that point in the game. When the bug was serious enough, everybody looked the other way when it wasn’t associated with a test case. I wonder whether any of the testers avoided writing test cases by falsely associating the bug with an existing test case.

We did get one big break. We lobbied for, and to our astonishment won, an exception to a standard practice: we did not have to print screen shots of the results of every test step. Other projects for which we had contracts had to do this. As you can imagine, managing all that paper slowed progress considerably. Those projects collected those screen shots into boxes, which were sent to offsite storage.

The mounting costs

All of these process steps meant spending more money, mostly in the form of human effort. There were other ways in which the government’s way of making software added costs to the project. Here’s a short, incomplete list:

  • Frequent, ongoing training about compliance with standards, which, amusingly, is where I learned about the Civil War fraud.
  • Entering time worked on three separate software systems – one for the project-management tool, one for government accounting, and one my employer used to manage time off. I spent an hour a week entering time.
  • A prohibition on open-source software. The government wanted all software used to be “supported,” meaning that there had to be a phone number to call for help. So we spent money on commercial tools that sometimes weren’t as capable as open-source versions. In a couple of cases, the only tool or component available for a task was open source, and we couldn’t build the application without it. We did get the government to bend the rule for us in those cases, but it took heavily documented justifications and layers of approvals to make it happen.
  • Strict separation of duties to protect the government against a rogue contract employee sabotaging the system. This meant, for example, that I couldn’t restart the computers we used for testing when they needed it. I knew how to do it, but I was not allowed. I had to write a request for an infrastructure engineer to do it, and then wait, sometimes for days, for it to reach the top of his priority list.

As you can see, there was nothing easy or inexpensive about this project. Yet we got it done and the software worked. It’s still in use today. We showed that it’s possible – just slow and expensive – to build software the government’s way.

So I have great empathy for those who built healthcare.gov. No doubt about it: the site failed, and they built it. They must feel tremendous pressure right now as they scramble both to handle the heat they’re getting from the government and to rush fixes to the site so that it works well enough. But if their experience building that site was anything like my experience building government software, it’s hardly shocking that it launched with challenges.

15 replies on “Obamacare, healthcare.gov, and how government software gets made”

I listened to a news piece on NPR where a government rep was brought in to discuss the problems with the Affordable Care Act site. Even though the show’s host kept coming back to variations on the common accusation “Wasn’t this tested?” the rep calmly explained that, while there were certainly software glitches, a big part of the problem came from the way they modeled the anticipated load. They looked at other Medicare websites and assumed that the number of visits and timing of those visits would mirror Medicare enrollment periods. Instead, visits adopted a very un-Medicare profile and the site was quickly swamped.

My favorite part of all that (even more than how testing wasn’t the scapegoat) was that there was a legitimate attempt to come up with a best-guess analog to use for modeling load. Even if that guess was wrong, it was something to base decisions on. The next thing I’d want to know is if the site was *designed* and *engineered* with that load in mind, or if it was just built, and then the load testing team used Medicare site traffic to come up with a test approach.

Thanks for that perspective, Rick. The average person out there just can’t know that testing is not a magic filter that catches every possible problem. I would be very surprised if healthcare.gov was designed with a particular load in mind.

It’s like they brainstormed the most ineffective way that it was still possible to write software. I think I’d rather build a great wall or a pyramid. This explains the DMV too. I’ll never look at a .gov the same way again.

Reblogged this on Down the Road and commented:

It’s been in the news lately: the healthcare.gov site, a component of the Affordable Care Act, fell flat on its face at launch. It was unable to handle the crush of people seeking health-insurance information. I have empathy for the people who built the site, because I’ve built government software before and it was way harder than it needed to be. Here’s my story, at my software-development blog.

I follow quite a number of software testing bloggers and you’re the first who has referenced this project. I’m surprised nobody else has mentioned it yet, although considering what I’ve been reading in the traditional media I’m skeptical of a lot of what’s been published so far, so maybe they’re waiting for the dust to settle.

One media story said it wasn’t tested. Another said no testing was done until 5 days before. Another said no load testing. Another said HHS didn’t do end-to-end testing. Others are blaming it on hiring a Canadian company or the contractors they hired. It cost $93 million or $600 million. They used 10-year-old technology. The web site sends 56 JavaScript files. The Spanish site has never worked. At least one true thing mentioned in the media is that the phone number for the site is 1-800-F1UCKYO.

It really seems like a classic FUBAR software project: late requirements from too many different sources, ham-strung project management, architecture issues, overworked contractors, too many questions that don’t get answered, pointless testing, backlog of defects, fix one defect and uncover ten more, people quitting, late night screaming phone calls, political egomaniacs who don’t understand that software is difficult to build, nervous breakdowns, threats of lawsuits, etc. We know the drill, right?

Worst of all is that nobody could give on the “drop dead date” for individuals, even though the employer mandate was delayed. Going to market with a crappy, bug-ridden product has ruined and killed many companies. It’ll be interesting to see what happens in the coming months and years, especially when the site works better and more people encounter sticker shock.

As far as our craft goes, I suspect that in the next 6 to 12 months or so we’ll be treated to numerous stories by various software evangelists who will proclaim that their chosen methodology or hybrid methodology or tool or process or whatever saved the day for the health exchange web sites. Whether or not any of this is true is beside the point. Be on the lookout for snake oil!

I find it amusing that media stories seem to focus on testing, as if testing were a magic filter through which perfect software passes. But I’ve worked with enough software executives who thought exactly that, so I should not be surprised when the media, which knows nothing about software development, reports it that way.

I saw my first story today that a chosen methodology would have prevented these problems. Here it is:

http://eliassenblog.wordpress.com/2013/10/22/an-agile-approach-would-have-eliminated-the-technical-problems-surrounding-the-acas-website/

I think this site is selling Agile services, so it’s not surprising that they are saying Agile would have solved these problems. But it makes me wonder if the person who wrote the article ever made software for the government before to understand the context in which they would be practicing Agile. My post here explains the context I lived in while making that CRM Web app for the government. If I had tried to pitch Agile to them, they would have looked at me as if I had sprouted a second head. When I’ve practiced Agile, it has been in a context where some level of trust is extended to the dev team. That is simply not the way the government rolls on its contracts, because of its abject fear of being bamboozled.

I’m not surprised that people are doing this, although Paul Fleming Jr doesn’t indicate whether he knows for certain if Agile or Waterfall or whatever was used by CGI Federal in the development of the Healthcare.gov web site. (My guess is that whatever was initially used, it scaled and morphed into a nightmare FUBAR scenario…) I haven’t read about what development method was used for Healthcare.gov in any of the reports I’ve read so far, although outside of tech publications I would not expect it to be referenced or analyzed. Quite a lot of the open CGI Federal development jobs in the DC area use “Agile” in their title or descriptions, so call me skeptical about Mr Fleming’s post. This idea that the Feds are still requiring “waterfall” all the time is just not true. Plus, the notion amongst Agilists that “waterfall” is somehow fundamentally bad is a ploy that has always soured me on the more dogmatic. I am not dismissing Agile or Scrum or iterative development hybrids, but too many of these Agile consultants carry a hammer around all the time and everything looks like a nail to them.

Good point about the feds not necessarily eschewing agile now. I worked on my government project long enough ago that agile was still new and scary to them. That might have changed.

This story on 10/16 in Bloomberg BusinessWeek by Paul Ford references “sprint cycles” for the front-end web work on Healthcare.gov by Development Seed. http://www.businessweek.com/articles/2013-10-16/open-source-everything-the-moral-of-the-healthcare-dot-gov-debacle#p2 So it’s almost certain that Agile/Scrum was used.

The Washington Post had a roundup of comments concerning Waterfall vs Agile: http://www.washingtonpost.com/blogs/wonkblog/wp/2013/10/27/reader-mail-can-agile-development-work-for-government-anyway/ Again, nobody quoted really seems to know what was used. I sent writer Lydia DePillis the Bloomberg article.

Eric “Lean Startup” Ries had a quote in Politico about how he suspected this was a Waterfall project: http://www.politico.com/story/2013/10/obamacare-affordable-care-act-tech-no-easy-fix-for-silicon-valley-98875.html – I followed up with him and he admitted he had no knowledge whether it was Waterfall or Agile. I sent Ries and Politico writer Jessica Meyers the Bloomberg article.

Thanks for the links. This multi-contractor approach strikes me as an obvious challenge because of the opportunity for “not our problem”-ing things. I dunno, now I’m seriously armchair quarterbacking.
