Library Digital Collections

Data from: Exploration and Explanation in Computational Notebooks

About this collection

Extent

1 digital object.

Cite This Work

Rule, Adam; Tabard, Aurélien; Hollan, James D. (2018). Data from: Exploration and Explanation in Computational Notebooks. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0JW8C39

Description

In July 2017, our team queried, downloaded, and analyzed approximately 1.25 million Jupyter Notebooks in public repositories on GitHub. By our calculation this was about 95% of all Jupyter Notebooks publicly available on GitHub at the time. This dataset includes:
  • ~1.25 million Jupyter Notebooks
  • Metadata about each notebook
  • Metadata about each of the nearly 200,000 public repositories that contained a Jupyter Notebook
  • Top-level README files for nearly 150,000 repositories containing a Jupyter Notebook

In addition to this core data, the dataset includes:
  • A smaller starter dataset of 1,000 randomly selected repositories containing ~6,000 notebooks
  • CSV files summarizing and indexing the notebooks, repositories, and READMEs
  • Log files documenting when each file was downloaded
  • Scripts for our initial analysis of the dataset
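
As a rough illustration of how a single notebook from a dataset like this might be inspected, the sketch below counts cells by type. It relies only on the fact that a Jupyter Notebook file is a JSON document: nbformat 4 stores cells in a top-level "cells" list, while the older nbformat 3 nests them under "worksheets". The function name `count_cell_types` and the inline example notebook are hypothetical and are not taken from the dataset or its analysis scripts.

```python
import json
from collections import Counter


def count_cell_types(notebook: dict) -> Counter:
    """Count cells by type in a parsed .ipynb document (hypothetical helper)."""
    # nbformat 4 keeps cells at the top level; nbformat 3 nests them in worksheets.
    cells = notebook.get("cells")
    if cells is None:
        cells = [
            cell
            for sheet in notebook.get("worksheets", [])
            for cell in sheet.get("cells", [])
        ]
    return Counter(cell.get("cell_type", "unknown") for cell in cells)


# A tiny hand-written notebook, standing in for a file from the dataset.
example = json.loads("""
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
     "execution_count": null, "outputs": []},
    {"cell_type": "code", "metadata": {}, "source": ["print(2)"],
     "execution_count": null, "outputs": []}
  ]
}
""")

counts = count_cell_types(example)
print(dict(counts))
```

To apply this to a downloaded file, the inline string would be replaced with `json.load` on an opened .ipynb file. Some files in a collection of this size may not parse as valid JSON, so wrapping the load in a try/except is a sensible precaution.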

Date Collected
  • July 2017
Date Issued
  • 2018
Creators
  • Rule, Adam
  • Tabard, Aurélien
  • Hollan, James D.
Funding

This research was funded by NSF grants #1319829 and #1735234 as well as NLM grant #T15LM011271.

Language
  • English