As our world continues to become more data-focused, more and more datasets are at our disposal to analyze. This provides a lot of opportunities to explore and build up data science skills. Below you’ll find a sampling of fun datasets to get you started.
A great starting place is Kaggle.com. Kaggle has a LOT of datasets available for download and exploration. In addition to these datasets, it also offers forums for discussing the data and competitions that often feature prizes.
- Spotify Daily Top 200 Charts for over 3 years
- Netflix Original Films & IMDB Scores
- Heart Attack Analysis
- Daily Index Prices for Multiple Stock Exchanges
Data.gov has over 300,000 datasets available for download and use. They provide an online repository of policies, tools, case studies, and other resources to support data governance, management, exchange, and use throughout the federal government.
Amazon Web Services (AWS)
AWS is better known for its cloud-computing services, but it also offers a variety of datasets hosted in its cloud. This provides an easy way to try out AWS services like SageMaker, Glue, Rekognition, Lex, Comprehend, Polly, etc.
- Corpus of web crawl data from over 50 billion web pages
- Japanese dictionaries and word embeddings for use with Natural Language Processing (NLP)
- eBird Status and Trends
You never know what you’re going to find on r/datasets/. Reddit provides an opportunity to see what others are looking for and what they have done with a particular dataset.
- Treatment database with patients, medications, and treatments
- Sentiment analysis
- fivethirtyeight datasets
Tidy Tuesday is a project from the R4DS Online Learning Community and the R for Data Science textbook. Each Tuesday a raw dataset is posted along with a chart or article related to the dataset, and you are tasked with exploring the data. While their focus is on R, there’s no reason other programming languages couldn’t be used. They also have archives of past weeks’ datasets.
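If you'd rather take a first look at a dataset in Python, here is a minimal sketch using only the standard library. The sample data, its column names, and the `summarize` helper are all made up for illustration; in practice you would read a CSV file downloaded from one of the sources above.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and values are invented for illustration only.
SAMPLE_CSV = """title,genre,score
Film A,Drama,7.1
Film B,Comedy,5.9
Film C,Drama,6.4
Film D,Documentary,
"""


def summarize(csv_text, numeric_column):
    """Return row count, missing count, and mean/min/max for one numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = []
    missing = 0
    for row in rows:
        raw = (row.get(numeric_column) or "").strip()
        if raw == "":
            # Count empty cells instead of failing on them
            missing += 1
        else:
            values.append(float(raw))
    return {
        "rows": len(rows),
        "missing": missing,
        "mean": statistics.mean(values) if values else None,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


summary = summarize(SAMPLE_CSV, "score")
print(summary)
```

To try it on a real file, one option is to pass the file's contents instead of the sample, for example `summarize(open("my_data.csv", newline="").read(), "some_column")`, where the file name and column name are placeholders for whatever you downloaded.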
Don’t Delay, Explore Today!
As you can see, there are a variety of datasets at your fingertips just waiting to be explored! Don’t let the stress of not knowing where to start keep you from digging into some data sources. Happy exploring!