Content Moderation Case Studies: Stopping Malware In Search Leads To Unsupported Claims Of Bias (2007)

from the bias-here,-bias-everywhere dept

Summary: As detailed in a thorough oral history in Wired back in 2017, it's hard to overstate the importance of Google's Safe Browsing blocklist effort, which began as a project in 2005 but fully launched in 2007. The effort was a response to the recognition that malicious websites were trying to trick people into visiting them in order to install various forms of malware. Google's Safe Browsing list, and its corresponding API (used by pretty much every other major browser, including Safari and Firefox), has become a crucial part of stopping people from being lured to dangerous websites that may damage or compromise their computers.
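Today, browsers and other clients can query the list through the Safe Browsing Lookup API (v4) by POSTing candidate URLs to the `threatMatches:find` endpoint. The sketch below builds such a request in Python; the client ID and API key are placeholders, and a URL is considered unsafe when the response's `matches` list is non-empty.

```python
import json

# Real v4 Lookup endpoint; the key and client ID below are placeholders.
API_URL = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

def build_lookup_request(urls, api_key):
    """Build the endpoint URL and JSON body for a threatMatches:find call."""
    body = {
        "client": {"clientId": "example-client", "clientVersion": "1.0"},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }
    return f"{API_URL}?key={api_key}", json.dumps(body)

# The returned endpoint and payload would be sent as an HTTP POST;
# an empty "matches" object in the response means the URL is not listed.
endpoint, payload = build_lookup_request(["http://example.test/page"], "YOUR_API_KEY")
```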

Of course, as with any set of filters and blocklists, questions are always raised about the error rate, and whether or not you have too many false positives (or false negatives). And, not surprisingly, when sites are added to the blocklist, many website operators become upset. Part of the problem was that, all too often, the websites had become compromised without the operator knowing about it — leading them to claim they were falsely being blocked. From the oral history:

One interesting thing that happened was related to how we communicated with web masters who were affected by Safe Browsing alerts. Because very quickly when we started looking into the problem of how users might be exposed to malware on the web, we realized that a lot of it came from websites that were actually benign, but were compromised and started delivering malware via exploits. The site owners or administrators typically did not realize that this was happening.

Of course, when sites are put on blocklists without their operators even realizing the sites are compromised, those operators often assume there is some sort of bias against them or the content of their websites.

Matt Cutts, who for many years was in charge of stopping "web spam" at Google, wrote a few blog posts in the early years responding to people who accused Google of blocking websites whose content it disagreed with, explaining how he handles such complaints. In one such post, Cutts responds to a vocal Google critic who claimed Google was blocking the website of a think tank that had expressed opinions differing from Google's on net neutrality.

Cutts points out that this is not accurate: Google had only blocked a specific page on that think tank's website, because the page itself was compromised and visiting it would install malware on the visitor's computer:

If you visit pff.org/issues-pubs/, you'll see that it's a web form. It looks like pff.org stored their data in a SQL database but didn't correctly sanitize/escape input from users, which led to a SQL injection attack where regular users got exposed to malicious code. As a result, normal users appear to have loaded urls like hxxp://www.ausbnr .com/ngg.js and hxxp://www.westpacsecuresite .com/b.js <--- Don't go to urls like this unless you are 1) a security researcher or 2) want to infect your machine. Notice that even in this case, Google didn't flag the entire pff.org site, just the one directory on the site that appeared to be dangerous for users. I never like it when people accuse Google of flagging a site as malware just because we don't like it for some reason. The bright side of this incident is that pff.org will find out about a security hole on their site that was hurting their users (it looks like pff.org has disabled the search on the vulnerable page in the last few hours, so it appears that they're responding quickly to this issue). Flagging malware on the web doesn't earn any money for Google, but it's clearly a Good Thing for users and for the web. I'm glad we do it, even if it means that sometimes we have to write a generic malware post to debunk misconceptions.
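The vulnerability class Cutts describes is worth making concrete. The details of pff.org's actual stack aren't known, so the following is a generic illustration using an in-memory SQLite database: splicing user input directly into a SQL string lets a quoted payload rewrite the query's logic, while a parameterized query treats the same input as inert data.

```python
import sqlite3

# Illustrative only: a toy table standing in for a publications database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pubs (id INTEGER, title TEXT)")
conn.execute("INSERT INTO pubs VALUES (1, 'Net Neutrality Report')")

user_input = "x' OR '1'='1"  # attacker-controlled form field

# VULNERABLE: the input is spliced into the SQL string, so the quoted
# payload turns the WHERE clause into a tautology that matches every row.
unsafe = f"SELECT title FROM pubs WHERE title = '{user_input}'"
print(conn.execute(unsafe).fetchall())  # every row comes back

# SAFE: a parameterized query passes the input as data, never as SQL.
safe = "SELECT title FROM pubs WHERE title = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # no rows match
```

A real attack would go further, using the same opening to write a malicious script reference into the database so it is served back to every visitor, which matches the injected `.js` URLs Cutts describes.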

Decisions for Google:

  • How do you determine which sites are included in the Safe Browsing blocklist?
  • Should the blocklist cover entire domains, or just specific pages within a domain?
  • How should Google inform website operators that their sites are distributing malware?
  • How do you determine which sites are deliberately distributing malware, and which are simply compromised?
  • How should Google respond to accusations of false positives?

Questions and policy implications to consider:

  • If Google is too transparent, will that actually help those with malicious intent get around the blocklists?
  • Since Google is not making money from this blocklist, how many resources will it take to keep the list accurate and to handle appeals and questions from blocked sites?
  • Should a single company be making these decisions?

Resolution: Google took a number of steps to try to alleviate the concerns, starting initially with a diagnostic tool for sites that were put on the blocklist.

Over time that expanded into Google's Search Console for websites. From Wired's oral history of Safe Browsing, here is Panos Mavrommatis, the Engineering Director of the program, describing how and why they built the Search Console:

In our first interactions with web masters they would often be surprised. So we started building tools dedicated to web masters, now called Search Console. The basic feature was that we would try to guide the web master to the reason that their website was infected, or if we didn't know the exact reason we would at least tell them which pages on their server were distributing malware, or we would show them a snippet of code that was injected into their site.
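One simple signal a scanner of the kind Mavrommatis describes could surface is a `<script>` tag loading code from a domain the site owner never intended. This is a hypothetical sketch, not Google's actual detection logic; the allowlist, regex, and page below are all made up for illustration.

```python
import re

# Hypothetical allowlist of domains the site owner expects to load scripts from.
ALLOWED_DOMAINS = {"pff.org", "ajax.googleapis.com"}

# Naive pattern extracting the host from absolute script-src URLs.
SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']https?://([^/"\']+)', re.IGNORECASE)

def suspicious_script_sources(html):
    """Return script-source domains that are not on the site's allowlist."""
    return [d for d in SCRIPT_SRC.findall(html) if d not in ALLOWED_DOMAINS]

# A page with an injected external script (fictional domain).
page = '<html><script src="http://www.ausbnr.example/ngg.js"></script></html>'
print(suspicious_script_sources(page))  # ['www.ausbnr.example']
```

A production scanner would do far more (render the page, follow redirects, run the code in a sandbox), but even this crude check illustrates the kind of injected snippet Search Console could show a webmaster.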

Along the way, they have faced mistakes and errors, including a technical error in 2009 that, for a brief time, flagged every single Google search result as potentially harmful.

Google also put together an appeals process for those who felt their site did not belong on the blocklist, and enlisted a third party, the non-profit StopBadware.org (formed out of Harvard's Berkman Klein Center in 2006), to help review appeals when a site gets listed in Safe Browsing.

This partnership represents an early version of having an independent third party help review content moderation choices on a platform.

Companies: google


Comments
Sok Puppette says:

it’s hard to overstate the importance of Google’s Safe Browsing blocklist effort […] has become a crucial part of stopping people from being lured to dangerous websites that may damage or compromise their computers.

I always turn it off. Never been "lured" yet.

I think you will find that it is very, very easy to overstate the importance of something like that. You’ve done it without even trying.

Anonymous Coward says:

"You are a vicious demon-spawned malefactor" — no evidence of bias under any circumstances. Because, after all, there are persons in every large group of that description.

"All blondes/demagogues/plutocrats/elbonians/quintarians/… are VDSM’s" — always bias under every circumstance. Because, after all, there are persons in every large group that do not fit that description.

The fact is, a brunette demagogic lower-slobbovian pantheist can list four reasons why he might have been banned. But they are almost certainly all wrong, unless he can show that ALL OTHER b/d/ls/p’s were banned. The most likely cause is, he’s one of the b/d/ls/p’s who fall into the VDSM category. Without the showing, we should all be assuming automated/inaccurate VDSM checks. And even most publicised suspicious-sounding statistical tests aren’t valid, and most non-mathematicians can’t tell which ones.

Anonymous Coward says:

It doesn’t work to block them because they form a global network from NY to Serbia to Iran to China and back to CA and right down to South Africa and Brazil

Google should do more to block cyber attacks related to extremist content…

And rip out Microsoft’s internet explorer lasers like it was supposed to in the 90s
