Here are 13 resources on Machine Learning data sets.
Landsat 8 data is available for anyone to use via Amazon S3. All Landsat 8 scenes from 2015 are available along with a selection of cloud-free scenes from 2013 and 2014. All new Landsat 8 scenes are made available each day, often within hours of production. MathWorks has created a freely-downloadable tool for accessing, processing, and visualizing Landsat on AWS data in MATLAB. With this tool, you can create a map display of scene locations with markers that show each scene’s metadata.
NASA NEX is a collaboration and analytical platform that combines state-of-the-art supercomputing, Earth system modeling, workflow management and NASA remote-sensing data. Through NEX, users can explore and analyze large Earth science data sets, run and share modeling algorithms, collaborate on new or existing projects and exchange workflows and results within and among other science communities.
The 1000 Genomes Project is an international collaboration which has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype context. The final phase of the project sequenced more than 2500 individuals from 26 different populations around the world and produced an integrated set of phased haplotypes with more than 80 million variants for these individuals. The Amazon mirror contains the complete data set from the project and the data can be found at: s3.amazonaws.com/1000genomes.
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.