16K COVID-19 Cases Go Missing In UK Due To Government's Use Of Excel CSVs For Tracking

from the excel? dept

Yes, yes, you’re sick of hearing about COVID-19. Me too. But the dominant force of 2020 continues to provide news, often times with a technology focus. This mismanaged pandemic has already given us an explosion of esports, students gaming remote learning systems, and enough dystopia to make George Orwell vomit in his grave.

But to really get your anger bubbles gurgling, you need turn only to the myriad of ways far too many governments have taken a moment that requires real leadership and forethought, and pissed it all down their legs. America appears to be trying to lead the charge in this, with our shining city on the hill mostly being illuminated by headlights of cars carrying sick passengers looking to get tested for this disease. Still, we’re not alone when it comes to sheer asshatery. The UK recently managed to lose thousands of COVID-19 cases… because it was tracking them in Excel CSVs.

The issue was caused by the way the agency brought together logs produced by commercial firms paid to analyse swab tests of the public, to discover who has the virus. They filed their results in the form of text-based lists – known as CSV files – without issue.

PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards.

Public Health England (PHE) decided to put all of this information into a file using the XLS format. XLS was first introduced in 1987 and was replaced by the XLSX format over a decade ago. Putting aside the use of Excel to monitor positive COVID-19 cases in a major industrialized nation for just a moment, just the use of an antiquated format managed to lose PHE over sixteen thousand positive cases.

How? Well, XLS has restrictions as to how many rows of data it can record.

As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of. And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases.

When that total was reached, further cases were simply left off.

Which means the people that had COVID-19 weren’t tracked for contact tracing. The government and its people didn’t have a complete picture as to either the total case count for the disease, nor its positivity rate. In other words, the agency in charge of national health failed to keep the nation informed as to its risk exposure because it didn’t know how to properly use a common office application that it repurposed to record COVID-19 data.

Labour’s shadow health secretary Jonathan Ashworth said lives had still been put at risk because the contact-tracing process had been delayed.

“Thousands of people [were] blissfully unaware they’ve been exposed to Covid, potentially spreading this deadly virus at a time when hospital admissions are increasing,” he told the House of Commons. “This isn’t just a shambles. It’s so much worse.”

The UK’s Health Secretary told the House of Commons that PHE had decided to replace the use of Excel, or what he called a “legacy system”, two months ago. But apparently PHE hadn’t gotten around to it yet.

And still hasn’t, actually. In fact, PHE’s plan to temporarily fix all of this is… more Excel!

To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.

But insiders acknowledge that the current clunky system needs to be replaced by something more advanced that excludes Excel, as soon as possible.

When you hear complaints that governments are not taking this pandemic seriously, this is what they mean.

Filed Under: , , ,

Rate this comment as insightful
Rate this comment as funny
You have rated this comment as insightful
You have rated this comment as funny
Flag this comment as abusive/trolling/spam
You have flagged this comment
The first word has already been claimed
The last word has already been claimed
Insightful Lightbulb icon Funny Laughing icon Abusive/trolling/spam Flag icon Insightful badge Lightbulb icon Funny badge Laughing icon Comments icon

Comments on “16K COVID-19 Cases Go Missing In UK Due To Government's Use Of Excel CSVs For Tracking”

Subscribe: RSS Leave a comment
34 Comments
Anonymous Coward says:

The UK recently managed to lose thousands of COVID-19 cases… because it was tracking them in Excel CSVs.

This is a misreading. The very next paragraph says the CSV files (presumably not having anything to do with Excel) were fine. The format to blame is XLS; or, better, the software to blame is Excel (any reasonable software would warn the user before throwing away part of their data).

Still, CSV often leads to trouble, because it’s really a family of formats. RFC 4180 attempts to standardize it, but if nobody specified this as the format to use, probably at least one piece of software will be using something subtly different—subtle enough that things will appear to work at first, until there’s a string containing a comma, backslash, double-quote, or newline; or the first line starts with "Sep=". And you could still be screwed if someone opens that in Excel, since it doesn’t default to 4180.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re: Re:

The claim is that Excel is limited to 65535 or 65536 records in certain circumstances. One would think there would be some hard-to-ignore errors/warnings when dealing with larger datasets, e.g.:

  • ERROR: you’re trying to save a file with more rows that this format allows; delete some data so there are no more than X rows and try saving again, or save in XLSX instead
  • WARNING: the file you’re trying to open has Y rows, and Excel can only load X: cancel / load the first X rows only

But, apparently, Excel sometimes silently truncates the datasets. If so, that’s bad design.

This comment has been deemed insightful by the community.
PaulT (profile) says:

Re: Re: Re: Re:

It’s been a while since I’ve seen spreadsheets that large, due to thankfully being employed in areas where people are capable of picking the right tool for the job. But, I do recall having seen such errors in the past.

I’d say it’s more likely that the errors you described were being displayed, but the monkeys using the spreadsheets just ignored them. I hate to side with Microsoft, but if the choice is between assuming competence on their part or on the part of people who thought that a spreadsheet was the correct tool for this job, I’ll side with them.

Anonymous Coward says:

Re: Re: Re:2 Re:

I’d say it’s more likely that the errors you described were being displayed, but the monkeys using the spreadsheets just ignored them.

If so, those monkeys are masters of spin to be claiming it’s a Microsoft problem. It seems rather unlikely to me, but I don’t have a copy of Excel to check with.

Were they errors, they couldn’t have just been "ignored", because Excel would’ve aborted the saving/loading and nobody would have received the truncated data. Warnings could be ignored, but software should make it hard to do by accident and should make the consequences obvious (e.g., it shouldn’t be an OK/Cancel dialog defaulting to OK).

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re: Re:2 Re:

The standard error when Excel fails reading a file says that there is an error with the file (not Excel) and would you like Excel to fix the file. Most people just click Yes and Excel throws away stuff until it can read the file.

Recent case in point: was using XLSX but kept getting errors with an outputted file with 150,000 rows. Turned out XLSX wasn’t having a problem with the row count since it will let you have a million rows in the new format, but it was choking on the fact that it only allows 65,535 hyperlinks in a single sheet. And it kept saying the file had an error. No, the file was perfect, Excel just failed and blamed everyone else. How did Microsoft ever become the software giant it is today with bullshit like this?

PaulT (profile) says:

Re: Re: Re:3 Re:

"How did Microsoft ever become the software giant it is today with bullshit like this?"

With office suites, they indulged in a lot of deliberate sabotage that prevented competitors like Novell from having their software work properly on Windows 95, leading customers to buy Microsoft Office instead…

For some time they had a monopoly in the space, and while free alternatives do exist now, since they had to reverse engineer Microsoft’s file formats to be compatible they don’t always work 100% or offer the same collaboration tools, so corporate environments refuse to switch.

Anonymous Coward says:

Re: Re:

That’s why they really should just use xml. It’s easy to parse. There’s plenty of tools available for it so that you can transform the data into whatever format you need it. Plus, it’s extensible. so if you find you need some additional information, you don’t need to "add another column" possibly messing up existing parsers.

Also, I really suspect the name of the file where they stored the data was called "Excel.ppt".

Anonymous Coward says:

Re: Re: Re: Re:

The problem wasn’t the CSV file, but rather the import into an old excel format, and xml would not fix that problem.

It kind of would—as I recall, Excel can’t import XML files. While not a CSV problem per se, it’s really easy to import a CSV to Excel incorrectly. XML is harder to deal with in general, which should also make it harder for non-programmers to fuck up (although there’s no shortage of opportunity for programmers to fuck it up).

Paul says:

Dev: To create a quality database for tracking these test results will take 6 weeks and X amount of money

Project Manager: Piffle, I can knock something up in Excel before lunch

Dev: Have at it then, just don’t come to me when it goes wrong

*some weeks passes*

Project Manager: Oi, dev, why didn’t you warn me about this

This comment has been deemed insightful by the community.
PaulT (profile) says:

Re: Re:

"Dev: Have at it then, just don’t come to me when it goes wrong"

It’s much easier now that everyone’s working remotely, but I quickly learned to routinely demand that any decisions were made via email, ticketing or IM and not verbally. It’s amazing how many times that’s saved my skin when some middle manager tried blaming techs for not doing something that they had explicitly ordered not to be done. There’s not a company I’ve worked for where some penny pinching management type hasn’t ignored all warnings to save a few hundred, then went into full blame mode when a preventable outage cost thousands.

Anonymous Coward says:

Re: Re: Re:

<OT>

Documenting decisions, incidents and actions helps with vermin like lawyers too. Lawyers hate opponents and individuasl who document out the wazoo. They like everything to be verbal, that way everything is "he said, she said" in front of the judge. Arguing is easy-peasy.

On the other hand, document everything, and in writing ask for clarification/confirmation. That way in front of a judge, there is a document and the lawyer has to explain why he didn’t clarify/explain when invited to.

Not a cure all for the scum of law. However, fighting someone who documents thoroughly adds significantly to a lawyers costs and often isn’t worth it. Even better, on RARE occasions, proper documentation can lead to a scum-bag lawyer getting what they deserve; unfortunately not often enough due to the lawyer conspiracy of "Professional Courtesy".

Similarly, managers hate fighting documents. Documenting isn’t fun, but unless you are one of those who go through life lucky, then it’s better to be prepared. Documents which exist but weren’t needed are a small waste of time. Documents who don’t exist and are needed can make a big difference in the course of one’s future.

</OT>

Scary Devil Monastery (profile) says:

Re: Re: Re:

"I quickly learned to routinely demand that any decisions were made via email, ticketing or IM and not verbally."

This, right there. Always make sure there’s a correspondence chain detailing who asked for what and when.

Because if such correspondence doesn’t exist, it’ll be the fault of the grunt who did the work, not the one who ordered the work done to those specifications.

jilocasin (profile) says:

nuclear facepalm

Wow just doesn’t seem appropriate enough.

  • Using a spreadsheet when a database should be used
  • Using an antiquated spreadsheet when modern versions are available
  • Losing track of thousands of Covid-19 cases, causing more misery and death

Priceless

If it was a matter of cost, someone should introduce them to LibreOffice. The standard .ods format can handle 1,048,576 (2^20) rows and it’s free.

This comment has been deemed insightful by the community.
Toom1275 (profile) says:

Top-rated comment on this from Ars, by deet:

It’s nice to be in IT and to know the right way to do everything and to have everything you need in order to do whatever you want.

It sucks when your job is to get data from one organization to another organization and you’re crushed under a pile of conflicting demands where the stakes are high, there’s no budget or time to go hiring a database application developer, everyone else has kicked the can down the road, and you’re the last person willing to figure out some way — any way — of getting the task done because it must be done and nobody really cares how you do it, as long as you get it sent and can move on to the next thing, which is also on fire. Your only hope in that situation is that any errors can be sorted out later.

R2_v2.0 (profile) says:

Not too much self reflection here

Tech people on here who have worked with large orgs and government should really understand how ‘new and emerging requirements’ get handled in these situations.
Firstly, almost no development will be getting done in-house. Your new reporting solution will need to go to an external party who is just dying to milk time and money from your government department.
People are here talking about databases which might be part of the solution but then how are you extracting data and where’s it going to? I mean, do people really think the contact tracers have SQL access/skills?
No, contact tracers are a hastily assembled call-centre group with no CRM system who probably work by having a team-leader assigning cases out of a spreadsheet (yes, a spreadsheet).
The real story here is how, in the era of ubiquitous IT, it is still almost impossibly hard for technology solutions to respond quickly to changing circumstances in a way which is reasonable for front-line workers, manages data security, is flexible and is accurate. And that’s not even considering cost.

I’m not suggesting Excel is the ‘right way’ but there are very good reasons it gets used.

As a footnote, I notice with interest mention of the XLS ‘format’. I’d be interested to know more about that, it’s a common data-analyst trick to add an XLS extension to a tab-delimited text file. File associations will cause this to open in a recipients Excel with a warning about file format. In this scenario you are not limited in the number of rows but I wonder what happens if someone tries to save said file. I imagine it will apply the ‘correct’ XLS format and truncate the file.

Add Your Comment

Your email address will not be published. Required fields are marked *

Have a Techdirt Account? Sign in now. Want one? Register here

Comment Options:

Make this the or (get credits or sign in to see balance) what's this?

What's this?

Techdirt community members with Techdirt Credits can spotlight a comment as either the "First Word" or "Last Word" on a particular comment thread. Credits can be purchased at the Techdirt Insider Shop »

Follow Techdirt

Techdirt Daily Newsletter

Ctrl-Alt-Speech

A weekly news podcast from
Mike Masnick & Ben Whitelaw

Subscribe now to Ctrl-Alt-Speech »
Techdirt Deals
Techdirt Insider Discord
The latest chatter on the Techdirt Insider Discord channel...
Loading...