The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale

from the what's-in-a-name dept

Maybe someday AI will be sophisticated, nuanced, and accurate enough to help us with platform content moderation, but that day isn’t today.

Today it prevents an awful lot of perfectly normal and presumably TOS-abiding people from even signing up for platforms. A recent tweet from someone unable to sign up to use an app because it didn’t like her name, as well as many, many, MANY replies from people who’ve had similar experiences, drove this point home:

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

But of course, Facebook is not the only instance where censorship rules based on bare pattern matching interfere not just with speech but with speakers’ ability to even get online to speak.

This dynamic is what’s known as the Scunthorpe Problem. Scunthorpe is a town in the UK whose residents have had an appallingly difficult time using the Internet due to a naughty word being contained within the town name.

The Scunthorpe problem is the blocking of e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that are shared with another (usually obscene) word. While computers can easily identify strings of text within a document, broad blocking rules may result in false positives, causing innocent phrases to be blocked.

The problem was named after an incident in 1996 in which AOL’s profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town’s name contains the substring cunt. Years later, Google’s opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.
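To make the failure concrete, here is a minimal sketch in Python of the kind of bare substring check that produces it. The blocklist and the function names (`BLOCKLIST`, `naive_filter`, `word_boundary_filter`) are hypothetical and chosen for illustration; this is not AOL's or Google's actual code, which the source does not describe.

```python
import re

# Hypothetical one-entry blocklist, taken from the example in the text.
BLOCKLIST = ["cunt"]


def naive_filter(text: str) -> bool:
    """Return True if any blocked string appears anywhere in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def word_boundary_filter(text: str) -> bool:
    """Return True only if a blocked string appears as a whole word."""
    return any(
        re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
        for term in BLOCKLIST
    )


if __name__ == "__main__":
    # The substring check rejects the town name; the whole-word check does not.
    print(naive_filter("Scunthorpe"))
    print(word_boundary_filter("Scunthorpe"))
```

Whole-word matching lets the town name through, but it is still only pattern matching: it would still flag anyone whose real name happens to be a listed word, and it still has no idea what the text means.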

(A related dynamic, the Clbuttic Problem, creates issues of its own when, instead of blocking outright, software automatically replaces the allegedly naughty words with ostensibly less-naughty ones. People attempting to discuss such non-prurient topics as Buttbuttin’s Creed and the Lincoln Buttbuttination find this sort of officious editing particularly unhelpful…)
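The replacement variant is just as easy to reproduce. The sketch below assumes a hypothetical one-entry replacement table (`REPLACEMENTS`) and a case-preserving substitution; the names are illustrative and not taken from any real filter.

```python
import re

# Hypothetical replacement table: "naughty" string -> milder string.
REPLACEMENTS = {"ass": "butt"}

# Match any listed string anywhere in the text, ignoring case.
PATTERN = re.compile(
    "|".join(re.escape(term) for term in REPLACEMENTS), re.IGNORECASE
)


def _swap(match: re.Match) -> str:
    """Look up the replacement and keep the original leading capital."""
    found = match.group(0)
    mild = REPLACEMENTS[found.lower()]
    return mild.capitalize() if found[0].isupper() else mild


def naive_replace(text: str) -> str:
    """Replace every occurrence, even inside longer, innocent words."""
    return PATTERN.sub(_swap, text)


if __name__ == "__main__":
    print(naive_replace("a classic game"))
    print(naive_replace("Assassin's Creed"))
```

Because the pattern has no notion of word boundaries, "classic" and "Assassin" are rewritten along with anything the filter was actually meant to catch.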

While examples of these dynamics can be amusing, each is also quite chilling to speech, and to speakers wishing to speak.

This kind of censorship is not something we should be demanding more of, but every time people call for “AI” as a solution to online content challenges, these are the problems the call invites.

A big part of the problem is that calls for “AI” tend to treat it like some magical incantation, as if just adding it will solve all our problems. But in the end, AI is just software. Software can be very good at doing certain things, like finding patterns, including patterns in words (and people’s names…). But it’s not necessarily good at knowing what to make of those patterns.

More sophisticated software may be better at understanding context, or even sometimes learning context, but there are still limits to what we can expect from these tools. They are at best imperfect reflections of the imperfect humans who created them, and it’s a mistake to forget that they have not yet replicated, or replaced, human judgment, which itself is often imperfect.

Which is not to say that there is no role for software to help in content moderation. The things that software is good at can make it an important tool to help support human decision-making about online content, especially at scale. But it is a mistake to expect software to supplant human decision-making. Because, as we see from these accruing examples, when we over-rely on software, it ends up being real humans that we hurt.
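One way to picture software supporting rather than supplanting human judgment: a pattern match routes a signup to a person for review instead of rejecting it outright. This is only a sketch under stated assumptions; `BLOCKLIST`, `ReviewQueue`, and `handle_signup` are hypothetical names, and no platform mentioned here is known to work this way.

```python
from dataclasses import dataclass, field

# Hypothetical blocklist, reusing the example from the text.
BLOCKLIST = ["cunt"]


@dataclass
class ReviewQueue:
    """Hypothetical holding area for names awaiting a human decision."""
    pending: list = field(default_factory=list)

    def submit(self, name: str) -> None:
        self.pending.append(name)


def handle_signup(name: str, queue: ReviewQueue) -> str:
    """Accept clean names; send pattern matches to a human, not the bin."""
    if any(term in name.lower() for term in BLOCKLIST):
        queue.submit(name)
        return "pending_review"
    return "accepted"


if __name__ == "__main__":
    queue = ReviewQueue()
    print(handle_signup("Scunthorpe United Fan", queue))
    print(queue.pending)
```

The match is treated as a signal, not a verdict. Whether that is workable depends on how many flagged items the human reviewers can actually handle, which is the scale problem all over again.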

Comments on “The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale”

46 Comments
Mason Wheeler says:

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

I remember the story, from around 7-8-ish years ago, of a guy named Mark Zuckerberg who had a heck of a time signing up for a Facebook account, because its automated filters kept flagging him as fraudulently attempting to impersonate their founder, despite multiple manual interventions and appropriate documentation provided that yes, this was in fact his real, legal name.

Christenson says:

Duplicate Problem

I once accidentally collected the prescription for a namesake of mine (first and last name) in a CVS pharmacy. The birthdate straightened it out.

But dayum, don’t you think I should be able to sign my name Tom, Dick, or Harry?? lol (or Blue, here’s grinning at TD!)

And Facebook, grow the fuck up, or I’ll have to shove something in someone’s Scunthorpe, just like in a Philip K Dick novel involving Wang computers, or was that an ee cummings poem?

Anonymous says:

Re: Duplicate Problem

There was a story I saw online about someone who found two records in their student database, differing only by sex. Same name, birthdate, address. It ended up being two married students: last name and address shared due to marriage, and shared birthdates happen when most people start at the same age.

Handles are probably better than "real" names at avoiding these problems.

Re: Re: Duplicate Problem

Indeed, I have to bowdlerize my own name on some platforms because their net nanny doesn’t like “Cockcroft.”

One twerp on Twitter told me I should change it, but hell, no. It’s my name and it’s up to all the stupid little weenies to grow the hell up. Then go look up British place names to find more things to be artificially offended about. The seaside ones are the funniest.

PaulT says:

Re:

Typically, it’s so that other people can actually find you, since the entire point of social networking is to converse with people who know you IRL.

If you don’t care for that, fair enough, but it’s no mystery why people who want to talk to family and friends they may have previously lost contact with wish to make themselves easy to find.

PaulT says:

Re: Re: Re:

What happens when you search on Facebook?

I have a very common name and you couldn’t find me on Google very easily, but if you search for my name on Facebook you will see me listed along with a recognisable photo. I’ll probably come up fairly early in the list if we were to share some contacts. I’ve caught up with a lot of lost acquaintances I made pre-social media that way, which may not have happened had I used some kind of unique pseudonym (since people who had lost contact wouldn’t know what to search for).

I do also know people who use pseudonyms exclusively on there, but they tend to be the people deliberately trying to keep old friends away from them, which is not the majority in my experience.

Ninja says:

It gets particularly annoying when you are playing a goddamn single-player game that MUST be connected and you can’t go silly on names.

Old but gold: http://www.cracked.com/blog/5-reasons-diablo-iii-represents-gamings-annoying-future/

Ppl need to stop being stupid moralists. Dicks, pussies and other bodily functions should have stopped being taboo a long time ago. Facebook and other platforms overmoderating are just a symptom of our stupid moralism.

Mark says:

I am the author of an open source program used by several thousand people worldwide in the science and engineering fields. I often get emails from people with questions about use or some feature of the program. Recently I had an exchange with a gentleman from Belgium (?) with the unfortunate last name of Niggerman. His emails were always filtered to the “Deleted” folder despite there being no rules set to do so. I could not even whitelist his email address.

Also, remember that story about some Christian-oriented browsing / publishing filter that changed well-known runner Tyson Gay’s name to Tyson Homosexual and actor Dick Van Dyke’s name to Penis van Lesbian?

And who could forget the kerfuffle over the naming of the Harry Baals Government Center? https://en.wikipedia.org/wiki/Harry_Baals
