Library Digital Collections

Data from: Exploration and Explanation in Computational Notebooks

About this collection

Extent

1 digital object.

Cite This Work

Rule, Adam; Tabard, Aurélien; Hollan, James D. (2018). Data from: Exploration and Explanation in Computational Notebooks. UC San Diego Library Digital Collections. https://doi.org/10.6075/j0jw8c39

Description

In July 2017, our team queried, downloaded, and analyzed approximately 1.25 million Jupyter notebooks in public repositories on GitHub. By our calculation, this was about 95% of all Jupyter notebooks publicly available on GitHub at the time. This dataset includes:
  • ~1.25 million Jupyter notebooks
  • Metadata about each notebook
  • Metadata about each of the nearly 200,000 public repositories that contained a Jupyter notebook
  • Top-level README files for nearly 150,000 repositories containing a Jupyter notebook

In addition to this core data, the collection includes:
  • A smaller starter dataset with 1,000 randomly selected repositories containing ~6,000 notebooks
  • CSV files summarizing and indexing the notebooks, repositories, and READMEs
  • Log files documenting when each file was downloaded
  • Scripts for our initial analysis of the dataset
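
As a rough illustration of how an individual notebook from the collection might be inspected, the sketch below counts cells by type. It assumes the files follow the standard Jupyter `.ipynb` JSON layout (a top-level `cells` list in nbformat 4, or cells nested under `worksheets` in nbformat 3). The function name `count_cell_types` and the sample notebook are hypothetical and are not taken from the dataset or its analysis scripts.

```python
import json
from collections import Counter


def count_cell_types(notebook_json: str) -> Counter:
    """Count cells by type in a notebook supplied as a JSON string."""
    nb = json.loads(notebook_json)
    # nbformat 4 keeps cells at the top level; nbformat 3 nests them
    # inside a list of "worksheets".
    cells = nb.get("cells")
    if cells is None:
        cells = [
            cell
            for sheet in nb.get("worksheets", [])
            for cell in sheet.get("cells", [])
        ]
    return Counter(cell.get("cell_type", "unknown") for cell in cells)


# Tiny hand-written notebook, for illustration only (not from the dataset).
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {}, "source": ["x = 2"],
         "outputs": [], "execution_count": None},
    ],
})

counts = count_cell_types(example)
print(counts["code"], counts["markdown"])
```

To apply the same function to a file on disk, read its text first, for example `count_cell_types(open(path, encoding="utf-8").read())`, where `path` is whatever location the notebook was extracted to.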

Date Collected
  • July 2017
Date Issued
  • 2018
Funding

This research was funded by NSF grants #1319829 and #1735234 as well as NLM grant #T15LM011271.

Language
  • English