Awesomeness: Millions Of Public Domain Images Being Put Online

from the go-use-them dept

Here’s some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He’s been going through Internet Archive scans of old, public domain books, isolating the illustrations, and posting each one as an individual image. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.


To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.

As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
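
For the curious, the region-picking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Leetaru's actual code: the `OcrBlock` record, the `"picture"` label, the `scale` factor between the OCR coordinates and the original scan, and the `min_side` filter are all hypothetical names and choices made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class OcrBlock:
    """One layout block reported by the OCR pass (hypothetical record shape)."""
    kind: str  # assumed labels: "text" or "picture"
    box: Box   # coordinates on the page image the OCR engine analysed


def picture_boxes(blocks: List[OcrBlock], scale: float = 1.0,
                  min_side: int = 50) -> List[Box]:
    """Return the boxes of picture blocks, rescaled to the original scan."""
    boxes = []
    for block in blocks:
        if block.kind != "picture":
            continue  # text regions were already converted to searchable text
        left, top, right, bottom = (round(v * scale) for v in block.box)
        # Assumed filter: skip tiny regions, which are more likely specks
        # or ornaments than real illustrations.
        if right - left >= min_side and bottom - top >= min_side:
            boxes.append((left, top, right, bottom))
    return boxes


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut a box out of a page stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

A real pipeline would presumably hand each box to an imaging library, which would cut the region out of the full-resolution scan and write it as a Jpeg. The `crop` helper here works on plain rows of pixel values only to keep the sketch free of dependencies.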

Already over 2.6 million images have been posted to Flickr in this manner — all completely in the public domain. From a historical perspective, the images are fascinating — and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:




There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. The Flickr page for each image gives information about the book, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that the Flickr pages, rather than saying these are public domain images, say that there are “no known copyright restrictions.” While that’s accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it’s really too bad that it’s so problematic to come out and say “this is in the public domain, do whatever the hell you want with it.”


Comments on “Awesomeness: Millions Of Public Domain Images Being Put Online”

15 Comments
That One Guy says:

Come one, come all, and place your bets!

While awesome for archival purposes if nothing else, I give it a week at most before some bot starts tagging pictures, demanding they be removed, and claiming that at least some of them are still under copyright. That will be followed shortly thereafter (assuming Flickr doesn’t just pull them immediately) by the ones running the bot doubling down and insisting that yes, they do indeed own the rights to the images, and will be filing a lawsuit if they aren’t taken down immediately.

Because when there’s absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?

bob says:

Lather, rinse, repeat

It’s interesting that Leetaru has taken on images. He is a major force behind GDELT, the Global Database of Events, Language, and Tone, which uses automated techniques to mine news sources for event summaries (among other things).

Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.
