Scientists Forced To Change Names Of Human Genes Because Of Microsoft's Failure To Patch Excel

from the code-is-law dept

Six years ago, Techdirt wrote about a curious issue with Microsoft's Excel. A default date conversion feature was altering the names of genes, because they looked like dates. For example, the tumor suppressor gene DEC1 (Deleted in Esophageal Cancer 1) was being converted to "1-DEC". Hardly a widespread problem, you might think. Not so: research in 2016 found that nearly 20% of 3500 papers taken from leading genomic journals contained gene lists that had been corrupted by Excel's re-interpretation of names as dates. Although there don't seem to be any instances where this led to serious errors, there is a natural concern that it could distort research results. The good news is this problem has now been fixed. The rather surprising news is that it wasn't Microsoft that fixed it, even though Excel was at fault. As an article in The Verge reports:

Help has arrived, though, in the form of the scientific body in charge of standardizing the names of genes, the HUGO Gene Nomenclature Committee, or HGNC. This week, the HGNC published new guidelines for gene naming, including for "symbols that affect data handling and retrieval." From now on, they say, human genes and the proteins they express will be named with one eye on Excel's auto-formatting. That means the symbol MARCH1 has now become MARCHF1, while SEPT1 has become SEPTIN1, and so on. A record of old symbols and names will be stored by HGNC to avoid confusion in the future.
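For anyone who wants to check their own gene lists before they go anywhere near a spreadsheet, here is a minimal Python sketch. It is only a heuristic: the month-plus-digits pattern is an assumption about what a spreadsheet might read as a date, not Excel's actual parsing rules, and the names `looks_like_date`, `safe_symbol` and `RENAMED` are hypothetical. The rename table contains only the two examples quoted above, not the full HGNC list.

```python
import re

# Month names and abbreviations that a spreadsheet might treat as the
# start of a date. This list is an assumption, not Excel's real rule set.
_MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month prefix, an optional separator, then a one- or two-digit number.
_DATE_LIKE = re.compile(
    r"^(?:%s)[-/ ]?\d{1,2}$" % "|".join(_MONTHS), re.IGNORECASE
)

# Only the two renames mentioned in the quoted article.
RENAMED = {"MARCH1": "MARCHF1", "SEPT1": "SEPTIN1"}


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a month-plus-day date."""
    return bool(_DATE_LIKE.match(symbol.strip()))


def safe_symbol(symbol: str) -> str:
    """Return the updated symbol if one is known, else the input unchanged."""
    return RENAMED.get(symbol.upper(), symbol)


if __name__ == "__main__":
    for gene in ("DEC1", "MARCH1", "SEPT1", "MARCHF1", "HECA"):
        flag = "at risk" if looks_like_date(gene) else "ok"
        print(f"{gene}: {flag}, suggested symbol: {safe_symbol(gene)}")
```

A check like this catches the old symbols before import, but it does nothing about files that have already been converted.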

So far, 27 genes have been re-named in this way. Modifying gene names is not in itself unheard of. The Verge article notes that names that made sense to experts, but which might alarm or offend lay people, have also been changed from time to time:

"We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene," says [Elspeth Bruford, the coordinator of HGNC]. "For example, HECA [a cancer-related human gene] used to have the gene name 'headcase homolog (Drosophila),' named after the equivalent gene in fruit fly, but we changed it to 'hdc homolog, cell cycle regulator' to avoid potential offense."

It is nice to know that we won't need to worry about serious problems flowing from Excel's habit of automatically re-naming cell entries. But it's rather troubling that Microsoft doesn't seem to have thought the problem worthy of its attention or a fix, despite it being known for at least six years. It shows once again how people are being forced to adapt to the software they use, rather than the other way around. Or, as Lawrence Lessig famously wrote: "code is law."

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.


Filed Under: autoconversion, dates, excel, gene names, genes, spreadsheets
Companies: microsoft

Reader Comments


  1. ECA, 12 Aug 2020 @ 3:13pm

    Re: Re: Format

    MS and older formats..

    Long ago,
    the Amiga had a very neat trick. The first part of loading the file TOLD the prog/data what format it was in.

    If you know Windows and DOS, a file can have any ext. and the system still can't decide what the format is unless it's told. With the thought of loading a file and Excel deciding what to do... it's MS. THEY decide what to do.
    I've loaded standard files into MS Office and other MS products, and the prog tends to DO ITS OWN THING. Whereas if I used an alt. program designed for the ORIGINAL formats, it never had a problem. It was like files im/exported back and forth to Apple.
    Postscript was Wonderful for about a mess.
