Harvard Opens Up Its Massive Caselaw Access Project

from the good-to-see dept

Almost exactly three years ago, we wrote about the launch of an ambitious project by Harvard Law School to scan all federal and state court cases and get them online (for free) in a machine-readable format (not just PDFs!), with open APIs for anyone to use. And, earlier this week, case.law officially launched, with 6.4 million cases, some going back as far as 1658. There are still some limitations -- some placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though the structure of the deal means some of these restrictions will likely ease over time).

Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight-up interface for anyone to use. As it stands, you can access the data either via the site's API or through bulk downloads. Unfortunately, bulk downloads are part of what the Ravel/LexisNexis restrictions limit: they are available only for cases in Illinois and Arkansas, and only because both of those states already make cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.
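
For anyone scripting against that 500-case daily cap, a client-side counter is one way to stay under it. The sketch below is a minimal, hypothetical illustration in Python: the `DailyCaseBudget` class and its method names are invented for this example and are not part of the project's API, and it assumes the limit resets with the calendar day.

```python
from datetime import date


class DailyCaseBudget:
    """Hypothetical client-side tracker for a per-day download cap."""

    def __init__(self, limit=500):
        self.limit = limit   # 500 per day, the figure given in the post
        self._day = None     # calendar day the current count applies to
        self._used = 0

    def _roll(self, today):
        # Assumption: the cap resets when the calendar day changes.
        if today != self._day:
            self._day = today
            self._used = 0

    def remaining(self, today=None):
        """Return how many downloads are left for the given day."""
        self._roll(today or date.today())
        return self.limit - self._used

    def take(self, n, today=None):
        """Reserve up to n downloads; return how many are actually allowed."""
        self._roll(today or date.today())
        allowed = max(0, min(n, self.limit - self._used))
        self._used += allowed
        return allowed
```

A script would call `take(n)` before each batch and fetch only as many cases as it returns. How the service itself counts or resets the limit is not described in the post, so treat the reset rule here as a guess.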

The real question is what others will build with the API. The site has launched with four sample applications that are all pretty cool.

  • H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.
  • A wordcloud app that currently shows the "most used words" in California cases in various years. Here, for example, are the word clouds for California cases from 1871... and 2012. See if you can tell which one's which.

    [Images: word clouds for California cases from 1871 and 2012]

  • Caselaw Limericks, which appears to randomly generate what it believes is a rhyming limerick from the case law. Here's what I got:

Her son Julius is a confirmed thief.
He did not turn over a new leaf.
The vessel, not.
the parking lot.
Respondent concedes this in its brief.

    The quality overall is... a bit mixed. But it's fun.
  • And, finally, in time for Halloween, Witchcraft in Law, which totals up cases that cite "witchcraft" by state.
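
The word cloud and witchcraft apps both come down to simple counting over case text. Here is a rough Python sketch of that idea, not the sample apps' actual code: the record layout (`jurisdiction`, `year`, `text`), the function names, and the tiny stop-word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real app would use a longer one.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "was", "for"}


def most_used_words(cases, jurisdiction, year, top_n=10):
    """Count the most frequent non-stop words for one jurisdiction and year."""
    counts = Counter()
    for case in cases:
        if case["jurisdiction"] == jurisdiction and case["year"] == year:
            words = re.findall(r"[a-z]+", case["text"].lower())
            counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


def cases_mentioning(cases, term):
    """Tally, per jurisdiction, how many cases contain the term at least once."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    tally = Counter()
    for case in cases:
        if pattern.search(case["text"]):
            tally[case["jurisdiction"]] += 1
    return tally


# Made-up sample records, not real case text.
sample = [
    {"jurisdiction": "California", "year": 1871, "text": "The railroad claimed the land."},
    {"jurisdiction": "California", "year": 1871, "text": "Land title and railroad grant."},
    {"jurisdiction": "Massachusetts", "year": 1692, "text": "Accused of witchcraft in Salem."},
]
```

Calling `most_used_words(sample, "California", 1871)` gives the raw counts a word cloud would size its words by, and `cases_mentioning(sample, "witchcraft")` gives the per-state totals. Note that the witchcraft app is described as counting cases that cite the word, so a whole-word text match like this one is only an approximation.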

Hopefully this inspires a lot more on the development side as well.


Filed Under: caselaw, caselaw access project, legal data, public info, public records, transparency
Companies: harvard, lexisnexis, ravel


Reader Comments


  1. Anonymous Coward, 31 Oct 2018 @ 2:51pm

    Re: Non-Free

    6b: By submitting your User Content to the H2O Services, you also agree to allow H2O to license your Content under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License

    That only means that people other than the copyright holder are not allowed to use the data in a book if they obtain it under that license. The contributor is free to license or sell their own works as part of a commercial enterprise. Similarly, anybody with a commercial enterprise in mind is free to approach the copyright holder to obtain a license that permits commercial use; they just have to live with, and compete with, the Creative Commons version.

