Scientists Forced To Change Names Of Human Genes Because Of Microsoft's Failure To Patch Excel

from the code-is-law dept

Six years ago, Techdirt wrote about a curious issue with Microsoft’s Excel. A default date conversion feature was altering the names of genes, because they looked like dates. For example, the tumor suppressor gene DEC1 (Deleted in Esophageal Cancer 1) was being converted to “1-DEC”. Hardly a widespread problem, you might think. Not so: research in 2016 found that nearly 20% of 3500 papers taken from leading genomic journals contained gene lists that had been corrupted by Excel’s re-interpretation of names as dates. Although there don’t seem to be any instances where this led to serious errors, there is a natural concern that it could distort research results. The good news is this problem has now been fixed. The rather surprising news is that it wasn’t Microsoft that fixed it, even though Excel was at fault. As an article in The Verge reports:

Help has arrived, though, in the form of the scientific body in charge of standardizing the names of genes, the HUGO Gene Nomenclature Committee, or HGNC. This week, the HGNC published new guidelines for gene naming, including for “symbols that affect data handling and retrieval.” From now on, they say, human genes and the proteins they expressed will be named with one eye on Excel’s auto-formatting. That means the symbol MARCH1 has now become MARCHF1, while SEPT1 has become SEPTIN1, and so on. A record of old symbols and names will be stored by HGNC to avoid confusion in the future.
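To see why the renamed symbols sidestep the problem, here is a minimal sketch that flags symbols shaped like a month name followed by a day number. The pattern and the `looks_like_date` helper are assumptions made for illustration; they are a rough heuristic, not Excel's actual conversion rules.

```python
import re

# Rough heuristic only: a month name or abbreviation followed by one or two digits.
# This is an illustrative assumption, not Excel's real date-detection logic.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)

def looks_like_date(symbol):
    """Return True if the symbol could plausibly be read as a month plus a day."""
    return DATE_LIKE.match(symbol) is not None

# Old and new symbols mentioned in the article.
symbols = ["DEC1", "MARCH1", "SEPT1", "MARCHF1", "SEPTIN1", "HECA"]
flagged = [s for s in symbols if looks_like_date(s)]
print(flagged)
```

Under this heuristic the old symbols are flagged, while the replacements MARCHF1 and SEPTIN1 no longer match a month-plus-day shape.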

So far, 27 genes have been re-named in this way. Modifying gene names in itself is not unheard of. The Verge article notes that, in the past, names that made sense to experts, but which might alarm or offend lay people, are also changed from time to time:

“We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene,” says [Elspeth Bruford, the coordinator of HGNC]. “For example, HECA [a cancer-related human gene] used to have the gene name ‘headcase homolog (Drosophila),’ named after the equivalent gene in fruit fly, but we changed it to ‘hdc homolog, cell cycle regulator’ to avoid potential offense.”

It is nice to know that we won’t need to worry about serious problems flowing from Excel’s habit of automatically re-naming cell entries. But it’s rather troubling that Microsoft doesn’t seem to have thought the problem worthy of its attention or a fix, despite it being known for at least six years. It shows once again how people are being forced to adapt to the software they use, rather than the other way around. Or, as Lawrence Lessig famously wrote: “code is law”.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.

Companies: microsoft


Comments on “Scientists Forced To Change Names Of Human Genes Because Of Microsoft's Failure To Patch Excel”

46 Comments
Anonymous Coward says:

Re: Format

From what I understand, that may not work in all cases.
Depending on how the document is saved, that data may not be preserved. So if somebody else opens that document on another computer, it would change the names to dates before anybody could adjust the column… and changing the column type after the fact won’t change the data back.

This comment has been deemed insightful by the community.
Mike (profile) says:

Re: Re: Format

You are correct. A very common means of sharing data is the CSV (comma-separated values) file format. The file is structured so that commas separate columns, and line breaks separate rows.

The CSV is a very file-size efficient way of sharing data sets, but it stores literally no formatting information.

Excel is usually registered as the default application to open CSVs, and because there is no meta information about the data in the cells, Excel tries to be clever with its parsing of the information.

This leads to all sorts of fun issues – strings being interpreted as dates like in the article above, phone numbers being interpreted as numbers (and consequently losing their leading 0 if they’re not in international format), fractional prices losing their precision when being rounded to 2 places, etc.
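As a minimal sketch of the alternative, Python's standard `csv` module hands back every field as an unmodified string, so nothing gets reinterpreted on load. The sample data and the `read_rows_as_text` helper below are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: a gene symbol, a phone number with a leading zero,
# and a price with more than two decimal places.
SAMPLE_CSV = (
    "symbol,phone,price\n"
    "SEPT1,0123456789,1.2345\n"
    "MARCH1,0987654321,10.5\n"
)

def read_rows_as_text(text):
    """Parse CSV text and return each row as a dict of unmodified strings."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = read_rows_as_text(SAMPLE_CSV)
for row in rows:
    print(row)
```

Any conversion to dates or numbers then becomes an explicit step taken by whoever reads the data, rather than a guess made when the file is opened.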

This comment has been deemed insightful by the community.
James Burkhardt (profile) says:

Re: Format

Except that Excel will drop that choice whenever possible.

I am an accountant. I use .csv files to upload large transactions into my accounting software, which requires very specific formatting.

Notably, I need to avoid the ‘date’ format in Excel, which stores the date as a number instead of the actual mm/dd/yy text, and I need leading zeros without any punctuation in one column.

If I used plain text and saved my CSV file, Excel would helpfully save space by removing very critical leading zeros. If I loaded the file, it would strip the dates out of the file and force them into date format. It fought me. I have since switched to LibreOffice, which gives me the option to set formatting before I load a file, and will accept leading zeros being saved to CSV format.
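For anyone producing such upload files from a script instead of a spreadsheet, here is a minimal sketch using Python's standard `csv` module. The column layout, the eight-digit account width, and the helper names `format_row` and `write_csv` are assumptions for illustration, not the format of any particular accounting package.

```python
import csv
import io
from datetime import date

def format_row(account_number, width, posted, amount):
    """Return one CSV row as strings: zero-padded account, mm/dd/yy date, amount."""
    return [
        str(account_number).zfill(width),  # keep leading zeros as text
        posted.strftime("%m/%d/%y"),       # literal mm/dd/yy, not a serial number
        f"{amount:.2f}",                   # fixed two decimal places
    ]

def write_csv(rows):
    """Serialize rows to CSV text without any type guessing."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = write_csv([
    ["account", "date", "amount"],
    format_row(4521, 8, date(2020, 8, 6), 1250.5),
])
print(csv_text)
```

Because every value is already a string when it reaches the writer, the leading zeros and the date text land in the file exactly as formatted.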

It’s an issue with any data set saved as a CSV or other file type that doesn’t carry format. It’s most critical with large data sets that benefit in size from a lack of formatting. Excel’s auto-format on load kills many of its best uses for me. Just formatting the sheet once doesn’t help plain data files like CSV when Excel gets its hands on them. All it takes is one intern opening the file and autosave catching the changes for you to lose lots of data.

It’s a toggle, Microsoft. A free software company has better CSV auto-format code than you. Get it together.

TripMN says:

Re: Re: Format

I was just bitching on another social media site today about Excel’s poor choices in everything. Today’s gripe was about the welcome screen that helpfully shows you all of your recent files but neglects to show the file extension so you can’t tell which is which.

Also high on my aggravation list, the ISO-8601 date format that the entire world uses (YYYY-MM-DD) is not a standard date format type and has to be manually added every time you want it to make your dates look like that.

And yes, every time you load any kind of file or copy/paste any data into it, it tries to auto-format the data using what could be best described as the "best guess of a 5th grader currently flunking math and science".

Yeah, this topic makes this developer a bit not happy.

ryuugami says:

Re: Re: Re: Format

And yes, every time you load any kind of file or copy/paste any data into it, it tries to auto-format the data using what could be best described as the "best guess of a 5th grader currently flunking math and science".

Auto-format itself wouldn’t be much of a problem, if it weren’t for the completely brain-dead handling of the result: Excel tries to auto-format, AND DISCARDS PRE-FORMATTED DATA, with no way to get it back.

You’d think a ~35-year-old application whose primary purpose is organizing data wouldn’t work so hard on actively destroying said data.

Anonymous Coward says:

Re: Re: Format

Its a toggle, Microsoft. a free software company has better CSV auto-format code then you. Get it together.

Surely you know that every MS product is required to have a totally insane and broken solution to a problem which has either been solved sensibly for decades by everyone else, or which only exists as a problem because of other insane design decisions. It’s in their design requirements, right next to the one forbidding compliance with the user interface standards.

Scary Devil Monastery (profile) says:

Re: Re: Format

"Its a toggle, Microsoft. a free software company has better CSV auto-format code then you. Get it together."

Hey, don’t give them ideas. MS never managed to create the illusion of "choice" without including that damn paperclip to provide you with "helpful" suggestions. No one wants Clippy back.

ECA (profile) says:

Re: Re: Format

MS and older formats..

Long ago,
the Amiga had a very neat trick. The first part of loading the file TOLD the prog/data what format it was in.

If you know Windows and DOS, a file can have any ext. and the system still can’t decide what the format is unless it’s told. With the thought of loading and Excel deciding what to do… it’s MS.. THEY decide what to do.
I’ve loaded standard files into MS Office and other MS products and the prog tends to DO ITS OWN THING.. whereas if I used an alt. program designed for the ORIGINAL formats, it never had a problem. It was like files im/exported back and forth to Apple.
Postscript was Wonderful for this..talk about a mess.

This comment has been deemed insightful by the community.
Thad (profile) says:

I think it’s worthwhile to ask what percentage of users type in "SEPT1" and expect it to be interpreted as a date versus the percentage who type it in and don’t. I don’t know if there’s data for that, but I would expect the majority of people typing "SEPT1" to mean September 1st. In which case no, I don’t think the correct solution is for Microsoft to modify Excel’s behavior in a way that’s unexpected and inconvenient for a larger number of users for the sake of a smaller minority, I think it’s for the smaller minority to learn how to change the default settings in Excel.

This comment has been deemed insightful by the community.
Ben (profile) says:

Re: Re: what percentage of users

Microsoft should make this behaviour something you can turn on and off. (And store that state in the spreadsheet, not just in the options of the installation.)
Then those who need the stability of their data to be maintained can get that behaviour, and those who like the convenience of using the most widely used spreadsheet application in the world can have it hold their hand for them.

PaulT (profile) says:

Re: Re: what percentage of users

"It’s hardly Microsoft’s fault that geneticists chose names that look like dates"

It’s hardly geneticists’ fault that Microsoft tries to second-guess every decision they make and insert its own interpretation without permission. It should be easy to turn off Microsoft’s automatic alteration of your data, and it’s their fault if they make design choices that ensure that it’s not an easy choice to specify.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re:

No, but the correct solution is for Microsoft to incorporate persistence in the .xls file format for retaining preferences set during document creation. If you designated a column as plaintext when creating the spreadsheet, it should REMAIN plaintext when opened on other systems by your coworkers.

This comment has been deemed insightful by the community.
Queex (profile) says:

Re: Re:

Autoconverting to a datatype in a spreadsheet, whatever the type, is a defect, plain and simple. The fact that it’s often absurdly aggressive in its conversion makes it even worse. Your example – SEPT1 – should not even be considered valid for conversion into a date as it doesn’t have a year. Any assumption as to what year to use – i.e. this year, the next September to occur – is a very strong assumption. Plus Excel will interpret a whole bunch of separators as being an attempt at a date, even those that don’t fit the current locale.

A lack of a conversion can always be fixed after the fact. An unwanted conversion often cannot.
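A minimal sketch of the conservative approach, assuming a rule where only a complete, unambiguous YYYY-MM-DD string is converted and everything else is kept as entered. The `parse_cell` function is hypothetical and not any spreadsheet's actual behaviour.

```python
import re
from datetime import date

# Only a full four-digit year, two-digit month, and two-digit day qualifies.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def parse_cell(text):
    """Return a date for an unambiguous YYYY-MM-DD string; otherwise the text itself."""
    if ISO_DATE.fullmatch(text):
        try:
            return date.fromisoformat(text)
        except ValueError:
            pass  # right shape but not a real date, e.g. month 13
    return text  # never guess: keep the original string

for value in ["SEPT1", "1-DEC", "2020-08-06", "2020-13-01"]:
    print(repr(value), "->", repr(parse_cell(value)))
```

Anything without a year, such as SEPT1, stays a string, so the user can still convert it later if a date really was intended.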

Excel isn’t alone in this defect, most spreadsheets mimic this defective behaviour. It’s an antifeature, and should be purged ASAP, from every piece of software with it.

Tanner Andrews (profile) says:

Re: Re:

what percentage of users type in "SEPT1" and expect it to be interpreted as a date

Expect somewhere in the neighborhood of epsilon. That is a mighty odd way of typing a date. I would expect users to give a space, and most of them to leave out the `t’.

And remember, we are not actually looking at user keyed input here. We are looking at data from a file. What percentage of useful programs would store a date as "SEPT1"? I would expect roughly zero.

Someone at Microsoft is immune to brain injury.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Autoconverting to a datatype in a spreadsheet, whatever the type, is a defect, plain and simple.

Yes, certainly–the way Microsoft does it. (Of course "the way Microsoft does it" is, in the programming world, a euphemism for "ostentatiously wrong".) Merely autoconverting would just be wrong with a capital "R". The ostentation consists in insisting on preserving the error across saves.

Some real programming languages carry the string as entered UNTIL it is used in computation that requires a specific data type. (Javascript does this–and no quibbling about "real": we’re comparing to spreadsheets here.) This approach can be made to work–nothing is mangled until it’s absolutely necessary to mangle it.
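The idea can be sketched in a few lines of Python: a cell stores exactly what was typed and converts only when a caller asks for a number. The `Cell` class and its method names are hypothetical, written for illustration only.

```python
class Cell:
    """Holds the text exactly as entered; converts only when a caller asks."""

    def __init__(self, raw):
        self.raw = raw

    def as_number(self):
        """Interpret the text as a float; raises ValueError if it is not numeric."""
        return float(self.raw)

    def __str__(self):
        # Saving or displaying the cell always uses the original text.
        return self.raw

cells = [Cell("SEPT1"), Cell("007"), Cell("2.50")]

# Arithmetic forces a conversion, but only on the cells involved.
total = cells[1].as_number() + cells[2].as_number()
print(total)

# The stored text is untouched, so leading zeros and gene symbols survive.
print([str(c) for c in cells])
```

Because the conversion produces a new value instead of overwriting the stored text, a wrong guess costs nothing: the original entry is still there.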

I suspect Microsoft fell into the "premature optimization" trap–which is a temptation even to real programmers: Let’s make the buggy program fast (or in small memory), then (maybe) try to fix the bugs. The fix always turns out to be much harder (more likely to break something else) and generally cancels out the speedup of the optimization or worse. (Microsoft is not keen on fixing bugs that the user can theoretically work around–they are considered user errors. Which isn’t WRONG … if you include the choice of software in the first place.)

ryuugami says:

Re: Re:

Though in recent years MS has been gradually moving towards Apple-style totalitarian control (and combining it with Google-style adware and spyware, for good measure), I’d say at this point they’re still an order of magnitude better than Apple.

That advantage is shrinking, but sadly not because Apple is getting better…

Anonymous Coward says:

Excel is not the problem; your browser is!

I feel one of the most common sources of corrupted data is when you find a csv file on the internet and want to save it on your computer.

But unfortunately, in order to help you, the browser does not simply save the file locally. Instead it launches Excel and hands it the file. And as a naive user, you immediately save the file from within Excel, not realising you just saved a bunch of autoconverted and possibly corrupted data…

Had the browser just saved the file locally, that data would be fine and you could try to your heart’s content to import or open it in a number of applications, formats, with or without any automatic conversion or datatype detection, until you were satisfied that your software of choice had interpreted the data correctly.

Main workaround: don’t let the browser open an application to handle csv files.
In fact, don’t let the computer in general (I’m looking at you, Explorer) use any ‘default’ application to open any csv file which attempts to automagically convert or interpret your data! It will probably fail at some point, and possibly without you realising at that time.

Anonymous Coward says:

Re: Excel is not the problem; your browser is!

In IT we like to refer to this problem with error codes like —
ID-10T
or
E.R.B.C.A.K.

You can blame Microsoft for a lot of things, but the settings you have on your browser of what to automatically do (like opening an APP instead of saving the file) is totally your problem, not theirs.

Tanner Andrews (profile) says:

I could go on a 10-page rant about the never ending ribbon changes

Back in the day, we had the option of using a cloth ribbon instead of a carbon ribbon. You did not have to change it very often, and it was really up to you as to how faint you were willing for your type to be as the ink was transferred from the ribbon to the paper.

Ask your office supply vendor about cloth ribbons. For some people, this may be an upgrade.

