Harvard Opens Up Its Massive Caselaw Access Project

from the good-to-see dept

Almost exactly three years ago, we wrote about the launch of an ambitious project by Harvard Law School to scan all federal and state court cases and get them online (for free) in a machine-readable format (not just PDFs!), with open APIs for anyone to use. Earlier this week, case.law officially launched with 6.4 million cases, some going back as far as 1658. There are still some limitations, some of them placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though the structure of the deal means some of these restrictions will likely decrease over time).

Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight-up interface for anyone to use. As it stands, you can access the data either via the site's API or through bulk downloads. Unfortunately, bulk downloads are part of what's limited by the Ravel/LexisNexis restrictions. They are available for cases in Illinois and Arkansas, but only because both of those states already make their cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.
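
To get a feel for what a 500-cases-per-day cap means in practice, here is a minimal sketch of a client-side download budget. It does not call the case.law API at all. The `DailyQuota` class and the `days_needed` helper are hypothetical names made up for illustration, and the only figures taken from the post are the 500-per-day limit and the rough 6.4 million case count.

```python
from dataclasses import dataclass, field
from datetime import date

# Per-user daily cap mentioned in the post; everything else here is illustrative.
DAILY_CASE_LIMIT = 500


@dataclass
class DailyQuota:
    """Hypothetical local counter for downloads made on each calendar day."""
    limit: int = DAILY_CASE_LIMIT
    used: dict = field(default_factory=dict)  # maps date -> cases already fetched

    def remaining(self, day: date) -> int:
        return self.limit - self.used.get(day, 0)

    def take(self, day: date, wanted: int) -> int:
        """Reserve up to `wanted` downloads for `day` and return how many were granted."""
        granted = max(0, min(wanted, self.remaining(day)))
        self.used[day] = self.used.get(day, 0) + granted
        return granted


def days_needed(total_cases: int, limit: int = DAILY_CASE_LIMIT) -> int:
    """Whole days required to fetch `total_cases` at `limit` per day (ceiling division)."""
    return -(-total_cases // limit)


quota = DailyQuota()
today = date(2018, 10, 31)
print(quota.take(today, 300))   # 300 granted
print(quota.take(today, 300))   # only 200 left for the day
print(days_needed(6_400_000))   # 12800 days if every case counted against the cap
```

At that rate a single account would need about 35 years to pull the full collection, assuming every case counted against the cap, which is why the limit matters more for bulk research than for casual lookups.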

The real question is what others will build with the API. The site has launched with four sample applications that are all pretty cool.

  • H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.
  • A wordcloud app that currently shows the "most used words" in California cases in various years. Here, for example, are the word clouds in California cases from 1871... and 2012. See if you can tell which one's which.


  • Caselaw Limericks, which appears to randomly generate what it believes is a rhyming limerick from the case law. Here's what I got:

Her son Julius is a confirmed thief.
He did not turn over a new leaf.
The vessel, not.
the parking lot.
Respondent concedes this in its brief.

    The quality overall is... a bit mixed. But it's fun.
  • And, finally, in time for Halloween, Witchcraft in Law, which totals up cases that cite "witchcraft" by state.

Hopefully this inspires a lot more on the development side as well.

Filed Under: caselaw, caselaw access project, legal data, public info, public records, transparency
Companies: harvard, lexisnexis, ravel

Reader Comments

  1. keithzg, 31 Oct 2018 @ 4:16pm

    Re-publishing and archiving

    In terms of PACER, I think the main effort has been https://free.law/recap/ ?

    And yeah, I was thinking, someone should definitely coordinate this. I'd certainly run a CRON job on one of my systems to pull down another 500 downloads per day, orchestrated to avoid duplication of effort by some central server like how bitcoin mining pools work.
