Content Moderation Case Studies: Copyright Claims On White Noise (2018)

from the white-noise-is-public-domain dept

Summary: Nearly every platform hosting user-generated content is required (usually by law) to have policies in place to deal with copyright-infringing material. However, not all content on these platforms is covered by copyright, and that can lead to complications, since policies are often built on the assumption that everything must be covered by some form of copyright.

Australia-based music technologist Sebastian Tomczak, who has a PhD in computer-generated music, created a 10-hour “low level white noise” recording from scratch. He generated the audio file himself, made a video version of it, and posted it to YouTube. In early 2018, he discovered that there had been five separate copyright claims on the video from four separate copyright holders.

Each claim asserted that another white noise video held a copyright on white noise, and that Tomczak’s video infringed on it. Amusingly, each claim designates which short segment of the 10-hour video infringes on the claimant’s work, even though the entire 10 hours is literally the same white noise.

None of the claims demanded that Tomczak’s video be taken down. Instead, they sought to “monetize” it under YouTube’s ContentID offering, which allows copyright holders to leave up videos they claim are infringing but divert any advertising revenue to themselves.

Somewhat incredibly, one copyright holder claims that Tomczak’s video infringes on two separate videos of their own, both of which also offer white noise.

One company involved – Catapult Distribution – say that Tomczak’s composition infringes on the copyrights of “White Noise Sleep Therapy”, a client selling the title “Majestic Ocean Waves”. It also manages to do the same for the company’s “Soothing Baby Sleep” title. The other complaints come from Merlin Symphonic Distribution and Dig Dis for similar works.

It appears that all of the claims were automated claims, using various services that scan videos for similarities. However, it does not appear that any of those services first check if the originating videos actually involve a valid copyright in the first place. Instead, they often are based on an entire account, and just search for any similar videos, whether or not there is a valid copyright.

Decisions to be made by YouTube:

  • Is white noise even covered by copyright?
  • Should the platform allow users to claim the monetization rights on other similar videos in which there is no valid copyright?
  • If there are multiple copyright claims (and monetization claims) on the same video, how is it determined who has the rights and who gets to monetize?
  • Should automated systems be allowed to make copyright claims without any regard to actual copyright status?

Questions and policy implications to consider:

  • If copyright laws and policies are built on the assumption that every piece of content is covered by copyright, how should internet websites deal with situations in which there does not appear to be a valid copyright?
  • What are the long term implications of automated systems that do not involve any actual lawyers or experts reviewing either copyright takedown or monetization requests?

Resolution: Tomczak seemed to find the situation more amusing than anything else and noted that he’d received a few similar notices in the past. He expected that after contesting these claims, YouTube would likely drop them:

“In any of the cases where I think a given claim would be an issue, I would dispute it by saying I could either prove that I have made the work, have the original materials that generated the work, or could show enough of the components included in the work to prove originality. This has always been successful for me and I hope it will be in this case as well.”

Indeed, a few days after he contested the claims (and those claims received widespread press attention), YouTube did release all of the claims on the white noise video. Tomczak has separately argued that this case — even with the final outcome — suggests that parts of the system need to change.

“Hopefully cases like these with the white noise, which shows how sort of broken their copyright system is, can shed some light on it or get YouTube to think about changing their system,” he said.

Originally posted on the Trust & Safety Foundation website.

Companies: youtube

Comments on “Content Moderation Case Studies: Copyright Claims On White Noise (2018)”

43 Comments
This comment has been deemed insightful by the community.
Anonymous Coward says:

Is white noise even covered by copyright?

Is white noise even covered by copyright?

Compendium of U.S. Copyright Office Practices (3rd ed.)
Chapter 300 – Copyrightable Authorship: What Can Be Registered

313.2 Works That Lack Human Authorship

[T]he Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.

Examples:

• A claim based on a mechanical weaving process that randomly produces irregular shapes in the fabric without any discernible pattern.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re: Is white noise even covered by copyright?

… would have to be governed by Australian copyright laws…

Not in the United States. A foreign copyright in the United States is still governed by U.S. copyright law. Material that is not copyrightable within the U.S. is still not copyrightable in the U.S.

For instance, it doesn’t matter where Naruto is domiciled. The macaque does not have an enforceable copyright in the United States. Period.

ECA says:

Re: Re: Re: Is white noise even covered by copyright?

The problem here comes with the RIAA and MPAA spreading around the world.
It wasn’t BAD in the past, when someone could jump to Japan and get a bunch of anime CHEAP, or to India for Bollywood, and bring them to the USA, and we didn’t worry about OTHERS’ CR. AND THEY DIDN’T WORRY ABOUT OURS.
And that is how tech got spread around the world.

For some reason, this is what’s happening NOW with international CR, and the USA corps bitching that China is stealing them.
China has been BUYING TONS of them from around the world, and most USA corps with CR have to use it/supply it to have things made. Which means China gets a copy. What happens after that point is interesting.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Is white noise even covered by copyright?

Irrelevant question.

Copyright trolls only care about money and control. In this case money. They want profits, and because they can leverage ContentID to send profits their way for nothing, they do so. It doesn’t matter that the content isn’t copyrightable. What matters is that the ContentID system will gladly give them money if they demand it. So long as the benefit of them doing this outweighs the penalties, they will write off the penalties as the cost of doing business.

Anonymous Coward says:

Re: Signal processing police calls.

From the (linked above) TorrentFreak article:

“… The video was created by generating a noise waveform of 10 hours length using the freeware software Audacity and the built-in noise generator. The resulting 10-hour audio file was then imported into ScreenFlow, where the text was added and then rendered as one 10-hour video file,” [Sebastian Tomczak] explains.

Anonymous Coward says:

Re: Signal processing police calls.

You are correct in that two samples of random noise will not be literally the same. Use of the word literally here is the classic definition that does not include figuratively.

If one were to look at the characteristics of each sample, one could make the claim that they were characteristically the same. They used the same pseudo random number generator and the same application with the same inputs, etc.

It’s not the same as being the same – lol

Anonymous Coward says:

Re: Re: Signal processing police calls.

They used the same pseudo random number generator

If the run (recording) time is long enough, those generators repeat their output; as they do if started with the same initial values. I suspect 10 hours is long enough for several repeats of the generated sequence.

Anonymous Coward says:

Re: Re: Re: Signal processing police calls.

Actually, repeating over a 10 hour period is unlikely. Assuming a 32 bit PRNG, we have a period of 4,294,967,296 cycles. Assuming CD-ROM quality of 44,100 samples per second, that would take 97,391.55 seconds before repeating, or about 27 hours. If a larger PRNG were used, the period would be larger. For instance, the period of a 64 bit generator would be over 13 million years.
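The arithmetic in this comment can be checked with a short sketch. It assumes, as the comment does, a generator whose period equals 2 to the power of its state size and exactly one generator output per audio sample; the function and constant names are illustrative only.

```python
# Sketch of the period arithmetic, assuming a full-period PRNG and
# one PRNG output consumed per audio sample (both are assumptions).
SAMPLE_RATE_HZ = 44_100                   # CD-quality sample rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600     # Julian year


def seconds_until_repeat(state_bits: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Playback time before a generator with period 2**state_bits repeats."""
    return (2 ** state_bits) / sample_rate_hz


secs_32 = seconds_until_repeat(32)
hours_32 = secs_32 / 3600
years_64 = seconds_until_repeat(64) / SECONDS_PER_YEAR

print(f"32-bit: {secs_32:,.2f} s, about {hours_32:.1f} hours")
print(f"64-bit: about {years_64 / 1e6:.1f} million years")
```

Under those assumptions the 32-bit case lasts roughly 27 hours, longer than the 10-hour recording, which matches the figures quoted above.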

Anonymous Coward says:

Re: Re: Re:2 Signal processing police calls.

Assuming a 32 bit PRNG…

The glibc sources are, of course, widely available. Here are current sources for:

From the comments on lines 67-69 of random.c (which are repeated on lines 68-70 of random_r.c):

By default, the package runs with 128 bytes of state information and generates far better random numbers than a linear congruential generator.

Do browse the source to see the rest of that comment, including the discussion of the period of the generator.

Anonymous Coward says:

Re: Re: Re:3 Signal processing police calls.

The glibc sources are, of course, widely available.

Although, looking more carefully at Sebastian Tomczak’s explanation, and particularly with regards to his use of ScreenFlow, it appears likely that he was running that software on a Mac platform. Obviously, that does not necessarily mean that he ran Audacity on Mac. But it tends towards that guess.

Anonymous Coward says:

Re: Re: Re:3 Signal processing police calls.

And the point is?

Given the sources you’ve indicated, the PRNG used has far more than 32 bits of state and therefore a period far exceeding the rather minimal period stated a few posts back. Using the default of 128 bytes and assuming one long word used for management data, the period would be on the order of 2^960, or about 10^289. Call it about 10^277 years. And that’s the low end. If those 64 bits of management data were used to increase the period, it has an upper limit of 2^1024, or 10^308, or about 10^296 years as the upper limit. In any case, 32 bits of state gives a period well beyond 10 hours and increasing the state size just increases the period to ludicrous durations.
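A quick way to sanity-check these orders of magnitude is to work in logarithms. The sketch below keeps the comment's own assumption that the period is 2 raised to the number of usable state bits, which is an upper bound rather than a property of any particular generator, and again assumes 44,100 outputs consumed per second.

```python
import math

# Assumptions: period == 2**state_bits (an upper bound, not a measured
# property of glibc's generator) and 44,100 outputs consumed per second.
SAMPLE_RATE_HZ = 44_100
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_period(state_bits: int) -> float:
    """Base-10 exponent of 2**state_bits."""
    return state_bits * math.log10(2)


def log10_years(state_bits: int) -> float:
    """Base-10 exponent of the playback time in years before repeating."""
    return log10_period(state_bits) - math.log10(SAMPLE_RATE_HZ * SECONDS_PER_YEAR)


for bits in (960, 1024):
    print(f"{bits} bits: period ~10^{log10_period(bits):.0f}, "
          f"~10^{log10_years(bits):.0f} years of audio")
```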

Anonymous Coward says:

Re: Re: Re:4 Signal processing police calls.

Given the sources…

Continuing to quote from the comment in random_r.c, now from lines 85-86:

The total period of the generator is approximately deg*(2**deg – 1)

Then looking down at the actual code on lines 121-2, it looks to me like a default of 128 bytes of state corresponds to a degree 31 polynomial(*). Plugging that back into the formula given in the comment would be 31*(2**31 – 1) or roughly 2**5 * 2**31 = 2**36.

(*) Clearly, in an LP64 model, 128 bytes isn’t going to store 31 longs successfully, although it will store 31 int32_t’s. The comment about “longs” just seems to differ from the actual code in that model. Presumably the comment dates from an era when (I)LP32 was the prevalent model.
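Plugging the degree into the formula quoted from the source comment gives a concrete number. The sketch below evaluates deg*(2**deg - 1) for deg = 31 and converts it to playback time, assuming (this is not confirmed by the thread) that one generator output is consumed per audio sample at 44,100 samples per second.

```python
import math

DEG = 31                  # polynomial degree read from the glibc default, per the comment
SAMPLE_RATE_HZ = 44_100   # assumed: one generator output per CD-quality sample

period = DEG * (2 ** DEG - 1)        # formula quoted from the source comment
log2_period = math.log2(period)      # how close this is to 2**36
hours = period / SAMPLE_RATE_HZ / 3600

print(f"period = {period:,} outputs (about 2**{log2_period:.2f})")
print(f"about {hours:.0f} hours, or {hours / 24:.1f} days, of audio before repeating")
```

Under that assumption, the approximate period is still far longer than a 10-hour recording, though nowhere near the astronomically large figures that a 2^960 period would imply.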

Anonymous Coward says:

Re: Re: Re: Signal processing police calls.

… those generators repeat their output

A very quick browse through the current Audacity source turns up line 127 in noise.cpp:
buffer[i] = mAmp * ((rand() / div) - 1.0f);
At first glance, this looks to me like the white noise generator uses the system rand() function.

But note that this is the first time I’ve ever looked at the Audacity source, and I may be mis-reading it horribly. And, even if I’m reading the current source correctly, this is certainly a later version than was used a couple years ago.
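To make the quoted line easier to follow, here is a rough Python rendering of the same mapping. It is a sketch under stated assumptions: `div` is taken to be half of RAND_MAX (the quoted line does not show its definition), RAND_MAX is set to the glibc value, and Python's `random.Random` merely stands in for the C `rand()` function, so the actual sample values will differ from Audacity's.

```python
import random

RAND_MAX = 2**31 - 1  # glibc's value; other platforms use other values (assumption)


def noise_sample(r: int, amp: float, rand_max: int = RAND_MAX) -> float:
    """Map an integer in [0, rand_max] to a sample in [-amp, +amp]."""
    div = rand_max / 2.0           # assumed definition of `div`
    return amp * ((r / div) - 1.0)


def white_noise(n: int, amp: float = 0.8, seed: int = 1) -> list[float]:
    """Generate n samples; random.Random is a stand-in for C rand()."""
    rng = random.Random(seed)
    return [noise_sample(rng.randint(0, RAND_MAX), amp) for _ in range(n)]


samples = white_noise(5)
print(samples)
```

With these assumptions, an output of 0 maps to -amp and an output of RAND_MAX maps to +amp, so every sample is uniformly distributed across the chosen amplitude range.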

Anonymous Coward says:

Re: Re: Re: Signal processing police calls.

Modern pseudo random number generation includes the possible use of noise encountered in the hardware to augment the sequence in addition to other sources. The specific item used here may allow such options, I do not know, but the claim was not limited to this instance.

Anonymous Coward says:

Re: Re: Re:2 Signal processing police calls.

… noise encountered in the hardware to augment the sequence…

When using the rand() function from the standard library, the C standard requires the sequence to be repeatable.

Since the C standard documents are not freely available, here’s a link for the final C11 committee draft. See “7.22.2 Pseudo-random sequence generation functions”. Or, more conveniently in this case, see the POSIX standard, since in this respect, POSIX is “aligned with the ISO C standard.”

The srand() function uses the argument as a seed for a new sequence of pseudo-random numbers to be returned by subsequent calls to rand(). If srand() is then called with the same seed value, the sequence of pseudo-random numbers shall be repeated. If rand() is called before any calls to srand() are made, the same sequence shall be generated as when srand() is first called with a seed value of 1.

(Emphasis.)

I do recognize that your “claim” is much more hand-wavy and mushy about prng’s in general, than intended to address this specific instance.

But in this specific instance, where Audacity is using rand(), while non-repeatable hardware-generated randomness might be used to seed the prng, the standards practically prohibit hardware noise augmentation of the resulting sequence.
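The repeatability requirement can be illustrated with a Python port of the portable example implementation of rand() and srand() printed in the C standard. This is only an illustration: real C libraries such as glibc use different algorithms, the class name is hypothetical, and the 32-bit wraparound is an assumption about the width of `unsigned long`.

```python
class CStyleRand:
    """Port of the C standard's example rand()/srand() (illustrative only)."""

    def __init__(self) -> None:
        self._next = 1  # behaves as if srand(1) had been called

    def srand(self, seed: int) -> None:
        self._next = seed & 0xFFFFFFFF

    def rand(self) -> int:
        # Assumes a 32-bit unsigned long, hence the mask.
        self._next = (self._next * 1103515245 + 12345) & 0xFFFFFFFF
        return (self._next // 65536) % 32768


def take(gen: CStyleRand, n: int) -> list[int]:
    return [gen.rand() for _ in range(n)]


a, b, unseeded = CStyleRand(), CStyleRand(), CStyleRand()
a.srand(42)
b.srand(42)
same_seed_matches = take(a, 5) == take(b, 5)

seeded_one = CStyleRand()
seeded_one.srand(1)
default_matches_seed_one = take(unseeded, 5) == take(seeded_one, 5)

print(same_seed_matches, default_matches_seed_one)
```

Two generators seeded with the same value emit identical sequences, and an unseeded generator behaves as if seeded with 1, which is the behavior the quoted POSIX text requires.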

Anonymous Coward says:

Re: Re: Re:3 Signal processing police calls.

For what it’s worth ….

OP comment to which I replied:
"If there are substantial "literally the same" parts, we are not talking about white noise."

Sorry for the hand wavy and mushy text as you put it, but I was not addressing the specific usage in this case because the OP did not address the specific usage in this case.

Subsequent comments pointed out that deprecated versions of pseudo random number generation repeat over time. I pointed out this particular problem has been addressed.

Anonymous Coward says:

Re: Re: Re:4 Signal processing police calls.

… versions of pseudo random number generation repeat over time.

It’s probably worth clearly distinguishing between repeatable sequences and repeating cycles within sequences.

  • A repeatable sequence may conceivably be infinite in length without any internal cycles. For example, the sequence of digits or bits of π can be generated and regenerated repeatably, out to any length, even though that potentially-infinite sequence contains no cycles within it.
  • On the other hand, again for example, a sequence generated by ordering a finite set of elements necessarily has fixed length. If that sequence is expanded out to a greater length by repeating it as a sub-sequence of the expanded sequence, then by construction, that expanded sequence contains repeating cycles.
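The two notions can be seen side by side in a deliberately tiny generator. The sketch below uses a hypothetical linear congruential generator with only 16 possible states, chosen so the cycle is short enough to observe; it is not the generator used by any real library.

```python
from itertools import islice


def lcg_states(seed: int, a: int = 5, c: int = 3, m: int = 16):
    """Yield successive states of a toy LCG: x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def cycle_length(seed: int) -> int:
    """Number of outputs between two occurrences of the same state."""
    seen = {}
    for i, x in enumerate(lcg_states(seed)):
        if x in seen:
            return i - seen[x]
        seen[x] = i


run_1 = list(islice(lcg_states(7), 40))
run_2 = list(islice(lcg_states(7), 40))

print("repeatable:", run_1 == run_2)              # same seed, same sequence
print("cycle length:", cycle_length(7))           # internal repetition
print("repeats within run:", run_1[:16] == run_1[16:32])
```

Repeatability is about rerunning with the same seed, while the cycle length is a property of a single run. A generator can have the first property with or without a noticeably short cycle.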

In many applications, neither of those attributes is necessarily a “problem”.

Especially for simulation applications, the capability to repeat a sequence in a later computation run has been considered important enough to require it in standards documents.

In other applications, though, particularly cryptography, totally different qualities may indeed be more important. In cryptographic applications, usually it’s most important that the sequence is, in some sense, “unpredictable”. Often, the desired cryptographic qualities of random numbers may have formal definitions that are ill-fit to any pseudo-random number generation process. There are quite a few cryptographic “problems”.

For a white noise application, I’d think that the most important quality would be that the sequence is fairly gaussian-distributed in both time and frequency domains (within bandwidth limitations). Although, I do suspect that some rather non-gaussian distributions may sound “whitish” enough for casual listening.
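One hedged way to probe the last point: "white" refers to a flat spectrum, which is equivalent to samples being uncorrelated with one another, and that property does not depend on the amplitude distribution. The sketch below compares uniform and Gaussian noise using sample autocorrelation. The seed, sample count, and helper name are arbitrary choices for illustration, and Python's `random` module stands in for whatever generator an audio tool would use.

```python
import random


def autocorrelation(xs: list[float], lag: int) -> float:
    """Normalized sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


rng = random.Random(2018)   # fixed seed so the run is repeatable
N = 20_000
uniform_noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]
gaussian_noise = [rng.gauss(0.0, 1.0 / 3.0) for _ in range(N)]

for name, xs in (("uniform", uniform_noise), ("gaussian", gaussian_noise)):
    lags = [round(autocorrelation(xs, k), 3) for k in range(1, 6)]
    print(name, lags)   # values near zero indicate a roughly flat spectrum
```

Both signals show autocorrelation close to zero at nonzero lags, which is consistent with the suspicion that a non-Gaussian amplitude distribution can still sound "whitish".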

Anonymous Coward says:

Re: Re: Re:5 Signal processing police calls.

It is also worth pointing out that random number generation is not limited to software based systems.

"In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm. Such devices are often based on microscopic phenomena that generate low-level, statistically random "noise" signals, such as thermal noise, the photoelectric effect, involving a beam splitter, and other quantum phenomena. These stochastic processes are, in theory, completely unpredictable, and the theory’s assertions of unpredictability are subject to experimental test. This is in contrast to the paradigm of pseudo-random number generation commonly implemented in computer programs. "

Hardware random number generator

This comment has been deemed insightful by the community.
That One Guy says:

'You know what, looks like we were mistaken...'

You could solve a massive amount of copyright related issues and abuses if you simply made the law equal, such that the penalties for issuing bogus claims were treated and punished just as harshly as claims of infringement.

This would not only make people much more careful regarding what they claimed was infringement but by making the penalties equal it would also provide an incentive to bring penalties for infringement down to sane levels, because someone sending out claims would always have to face the possibility that they might be on the receiving end of the penalty.

This comment has been deemed insightful by the community.
Rekrul says:

A couple years ago, I filmed my back window, at night in the summer just to record how loud the insects were. I uploaded it to YouTube and was immediately informed that the audio had been muted due to a copyright claim.

There was literally no other sound other than the insects and a slight hum from my computer’s fans.

Anonymous Coward says:

Re: Re: Re: Re:

Yes, but the suggested incorrection of "everything is copyrighted" to "everything is copywrote" would also replace the correct past participle form with a simple past tense form.

Apparently everything is now copyrighted (copywrote?)

For "right", the past tense form and the past participle form are both "righted", but for "write", one is "wrote" and the other is "written". Right?
