Text and Data Mining Resources

The Library provides access to a variety of text and data mining resources for faculty and students. Whether you’re taking your first steps into text mining or need access to thousands of documents for your research, the Library has resources that can help!

New resource! Nexis Data Lab is a text and data mining platform for working with LexisNexis data. The platform provides a Jupyter Notebook environment that supports both R and Python code. Nexis Data Lab's coverage is strongest for news content (newspapers, newsletters, wires, transcripts, and related press), and Company & Financial data will be available through the platform by the end of this month.

UC San Diego currently has a small number of Nexis Data Lab accounts for use by faculty and students. For more information or to request access to an account, contact Data Science Librarian Stephanie Labou.

ProQuest’s TDM Studio opens up millions of newspaper articles, dissertations, and primary sources to text and data mining. It provides Python and R interfaces alongside push-button data visualizations, supporting research as well as teaching and learning. Request an account directly to get started with TDM Studio today.

With Gale Digital Scholar Lab, users can explore UC San Diego holdings from Gale Primary Sources using digital humanities tools (no programming knowledge required!). Rediscover and interpret the past through analysis and visualization of historical texts, including newspapers, books, archival collections, and more. Users will need to create a personal Digital Scholar Lab account online before they can begin selecting and analyzing materials on this platform.

Constellate, the new text analysis platform from ITHAKA, supports teaching, learning, and performing text analysis. With Constellate, users can build datasets from JSTOR or Portico content, view instant visualizations of their data, and work through the Constellate Python tutorials to learn text analysis methods. To use the platform, visit the Constellate website, select “log in through your institution”, and sign in with your UC San Diego SSO credentials.

UC San Diego affiliates are also able to attend Constellate text analysis classes online:

  • Python Basics is a four-day, one-week class running the week of April 4 that helps users get started writing Python code
  • Tokenize your own Texts is a three-day, one-week class running the week of April 11 that introduces users to methods for tokenizing texts and creating datasets compatible with existing Constellate Notebooks