Text analysis is more popular than ever, but finding ready-to-analyze text datasets, especially for news coverage, can be tricky.
In this workshop, presenters will provide an overview of the TDM Studio platform and explain how to search for content in newspaper databases, build a text corpus and import that corpus, already cleaned and formatted for use, into a Jupyter Notebook for further analysis.
Presenters will also demonstrate how to analyze this text content with Python, focusing on n-grams, sentiment analysis and data visualization. During the workshop, attendees can follow along using the same dataset or create and analyze their own text corpus. No prior experience is required. Registration is required and limited to UC San Diego affiliates.
What is TDM Studio?
More than 200 library-licensed ProQuest content products, including government, archival, dissertation and news databases, are available for analysis with R or Python through ProQuest TDM Studio. The news databases available in TDM Studio include historical and current full text of major dailies such as the New York Times, Washington Post and Wall Street Journal.
Presenters
- Stephanie Labou, Data Science Librarian, UC San Diego Library
- Brandon Williams, TDM Studio Technical Consultant
- John Dillon, Manager, Product Management for TDM Studio