# UCSD MAS-DSE 260
## Mapping the U.S. Non-Profit Space: Classy Data Analysis Project
Advisors: Ben Cipollini and Ilkay Altintas

### CONFIGURATION
```
make build
make start
make stop
```

### MODULES
In a browser window, go to `localhost:8888`

### APP
In a browser window, go to `localhost:80` or `localhost`

### INTRO
The social sector is typically viewed in terms of nonprofit
organizations and the cause categories they belong to. It’s
clear, however, that while younger generations are active in
social causes, they think more in terms of current events and
social causes than in terms of organizations, so much so that
new donor churn now peaks at over 80%.

It is less clear how to lay out a common “social space” where
the organizations that drive social change and potential donors
could connect and find organization and cause recommendations,
and where discovery of new causes, and of news events within
causes, could be facilitated.

In this project, we build this social space from the ground up.
We combine IRS Form 990 data (the returns filed by nonprofits)
with external textual information, such as social media, to
create a robust semantic space.

### TEAM
* Budget Manager: Jeet Nagda
  * Management of AWS funds and cloud-related activity
* Project Manager: Howard Tai
  * High-level project planning and assignment of responsibilities
* Project Coordinator: Erin Hansen
  * Stakeholder correspondence and record-keeping during meetings
* Report Manager: Carlos Pimentel
  * Tracking and coordinating report deliverables
* Record Keeper: Juan Reyes
  * Management of GitHub repo and Docker environment

### ARCHITECTURE
Our data pipeline is divided into modules that each accomplish a discrete task. In the final product, we used Python as the common language and Docker as the common platform throughout.

The first module (Form Data Processor) fetches information from the AWS repository using the IRSX tool for each of the organizations listed in a large manifest. An additional experimental module (Website Data Processor) also scrapes the HTML text of the organization’s website listed on the Form 990. The XML payload is parsed and stored as a document in a MongoDB instance.

The clustering module (Cluster Processor) reads data samples from the MongoDB instance and creates three labels, or cluster IDs, for each organization: one for each axis of comparison. These labels are loaded back into the MongoDB document for persistence. Finally, the last module (App Web Server) reads from the database to create visualizations for high-level queries or for a given input organization.
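
The data flow described above can be illustrated with a small, self-contained sketch. It is not the project's actual code: the XML snippet is a heavily simplified, hypothetical stand-in for a Form 990 payload, a plain dictionary stands in for the MongoDB collection, the axis names in `AXES` are placeholders, and `toy_label` is a deterministic placeholder rather than a real clustering model. All function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Form 990 XML payload.
SAMPLE_XML = """
<Return>
  <EIN>000000000</EIN>
  <Name>Example Org</Name>
  <Website>example.org</Website>
  <Mission>Provide meals to families in need</Mission>
</Return>
"""

# Placeholder axis names; the README only says there is one label per axis.
AXES = ("axis_a", "axis_b", "axis_c")


def parse_payload(xml_text):
    """Parse the XML payload into a flat document (tag name -> text)."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag.lower(): (child.text or "").strip() for child in root}


def store(collection, doc):
    """Store the document keyed by EIN (a dict stands in for MongoDB)."""
    collection[doc["ein"]] = doc


def toy_label(text, n_clusters=3):
    """Placeholder for a clustering model: a deterministic bucket in [0, n_clusters)."""
    return sum(text.encode("utf-8")) % n_clusters


def attach_labels(collection):
    """Write one cluster ID per axis back onto every stored document."""
    for doc in collection.values():
        doc["clusters"] = {axis: toy_label(doc["mission"] + axis) for axis in AXES}


def lookup(collection, ein):
    """What a web server might read for one input organization."""
    doc = collection[ein]
    return {"name": doc["name"], "clusters": doc["clusters"]}


if __name__ == "__main__":
    db = {}
    store(db, parse_payload(SAMPLE_XML))
    attach_labels(db)
    print(lookup(db, "000000000"))
```

In a real deployment, the dictionary would be replaced by a database collection and `toy_label` by a model fitted on sampled documents; the sketch only shows how the parse, store, label, and read steps hand data to one another.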


### DOI
Identifier: doi:10.6075/J079431X
https://doi.org/10.6075/J079431X

Creators:
* Tai, Howard
* Hansen, Erin
* Nagda, Jeet
* Pimentel, Carlos
* Reyes, Juan

```
Title:	Classy Data Analysis: Mapping the U.S. Non-Profit Space
Publisher:	UC San Diego Library Digital Collections
Publication year:	2019
Resource type:	Dataset/Dataset
```