Top 10 Open Dataset Resources on Github

The top open dataset repositories on Github include a variety of data, freely available for use by researchers, practitioners, and students alike.

Over the past several months we have had a look at a number of top Github repository collections, such as:

This post will be a bit different, in that we are looking at the top open dataset repositories that Github has to offer. The post was inspired by the Github Open Data Showcase, which is good, but which is not very large. Ideally, I would like to make a list of the top open datasets on Github, period; however, this gets tricky, since searching for "open data," or any variant of this search term, is going to lead to complications on a site set up with the explicit goal of sharing open source projects and their data.

I decided to take the offerings in this showcase which were not explicitly noted as being out of date and add in 3 additional strictly-dataset repos with the highest numbers of stars I could find from simple search, rank them all accordingly, and present them here. We have found at KDnuggets that datasets are one of the most sought-after pieces of the data science puzzle for many readers, and hopefully this fresh batch (at least, fresh from our perspective) is of use to some of our readers.

We are currently conducting our latest Annual KDnuggets Analytics Software Poll, and so the particular percentages from last year may change, but we know that open source tools have been used by 73% of data scientists in the past 12 months. While this number reflects software, and not data, it is easy to surmise that open data is a heavily-relied upon commodity in data science and related data-oriented disciplines for research, practice, and production alike, for myriad reasons.

So here they are, the open dataset repos with the highest number of stars as of the time of writing.

Brought to us by Xiaming (Sammy) Chen, this seems to be the undisputed leader of the open dataset collections available on Github. This curated list is organized by such topics as biology, sports, museums, and natural language, and appears to include several hundred datasets. Most are free, but there is a disclaimer at the top of the list that some are not.

 

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Making AI software smarter by adding human feedback

1 Mar, 2018

On the surface, artificial intelligence voice assistants like Siri, Alexa, Cortana and Google Assistant seem smart, connected and somewhat human. …

Read more

3 keys to keep your data lake from becoming a data swamp

16 Jun, 2017

For years, buoyed by technologies like Apache Hadoop, organizations have been seeking to build data lakes — enterprise-wide data management …

Read more

Three Ways to Generate Profit With the Data You Already Have

19 Jul, 2017

Andrew Roman Wells is the CEO of Aspirenta, and Kathy Williams Chiang is VP, Business Insights, at Wunderman Data Management.  …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.