Data Science: Introduction

A dataset (also spelled ‘data set’) is a collection of raw statistics and information generated by a research study. Datasets produced by government agencies or non-profit organizations can usually be downloaded free of charge. However, datasets developed by for-profit companies may be available for a fee.

Most datasets can be located by identifying the agency or organization that focuses on your research area of interest. For example, if you are interested in public opinion on social issues, Pew Research Center is a good place to look. For population data, the U.S. government’s Population Estimates Program, available through American FactFinder, is a good source.

An “open data” philosophy, the belief that data should be freely accessible, is becoming more common among governments and business organizations around the world. Open data efforts have been led by both government and non-government organizations, such as the Open Knowledge Foundation. Learn more by exploring The Open Data Handbook. Two related trends are also helping to drive the availability and accessibility of datasets and statistics: “Big Data”, in which extremely large amounts of data are analyzed for new and interesting perspectives, and data visualization.

For information about citing datasets, see this post from the APA Style Blog: How to Cite a Data Set in APA Style.

Google Dataset Search is a search engine that indexes metadata for millions of datasets in thousands of repositories across the Web. Much like Google Scholar, it lets you find datasets wherever they’re hosted, whether that is a publisher’s site, a digital library, or an author’s personal web page.

Dataset Search can be useful to a broad audience, whether you're looking for scientific data, government data, or data provided by news organizations. Simply enter what you are looking for, and the results will guide you to the published dataset on the repository provider’s site.
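
If you run the same kinds of searches often, the "enter what you are looking for" step can be scripted. The sketch below builds a Dataset Search results link from a set of search terms. It assumes the base address and the `query` parameter that appear in the browser's address bar after a search. This is an observed pattern, not a documented API, and may change. The function name `dataset_search_url` is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Assumption: this base address and the "query" parameter mirror what the
# browser shows after a manual search; they are not a documented API.
DATASET_SEARCH_BASE = "https://datasetsearch.research.google.com/search"


def dataset_search_url(terms: str) -> str:
    """Return a Dataset Search results link for the given search terms."""
    # Collapse stray whitespace so the link stays tidy.
    cleaned = " ".join(terms.split())
    if not cleaned:
        raise ValueError("search terms must not be empty")
    # urlencode escapes spaces and punctuation for safe use in a URL.
    return f"{DATASET_SEARCH_BASE}?{urlencode({'query': cleaned})}"


if __name__ == "__main__":
    # Open the printed link in a browser to see the matching datasets.
    print(dataset_search_url("public opinion social issues"))
```

The link only opens the results page. You still follow each result to the repository provider's site to download the dataset.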

Resources: Primary

Resource | Description
ACM Digital Library | Full text of Association for Computing Machinery (ACM) publications (books not included).
APA PsycInfo | A comprehensive database for the field of psychology and psychological aspects of related disciplines.
IEEE Xplore Digital Library | Electrical and computer engineering journals, conference proceedings, and standards from IEEE/IET.
Web of Science | Search all the databases on the Web of Science platform.

Resources: Secondary

Resource | Description
MathSciNet | Reviews, with abstracts, to the world's literature in mathematics and related areas.