Introduction
========================================

This project covers a range of approaches to persona-based chatbots. Various parts of the code base are provided to enable an understanding of the project's topics. The final Rachelbot API and app are contained in the respective directories model-api and chatbot-app. The provided model is one of many that were created as part of the process. Further development of the project post-submission is expected.

Here is an overview of the directories:

* adem-scoring: an operationalization of the pre-trained model provided by Michael Noseworthy, implementing the approach outlined in "Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses" by Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau (2018). Michael's code runs on Theano and so has a very particular setup that I've created using docker.
* chatbot-app: a Flask application for interacting with the trained Rachelbot. It needs to be run together with model-api.
* data-cleaning: all code related to processing the Rachel data for visualization, modeling and analysis.
* model-api: a simple Flask API that enables communication to and from the Rachelbot model.
* model-search: a review of the most comprehensive architecture search conducted during the project, built using the ludwig library. This is of special interest to deep learning researchers.
* references: the original code by Thomas Wolf and Hugging Face for importing a pre-trained universal transformer for transfer learning (i.e. TransferTransfo).
* visualization: LDA and wordcloud exploratory data analysis.

Setup Environments
========================================

With the exception of the adem-scoring directory, all code was executed in a Windows 10 (Professional) environment. You will likely need a GPU with compute capability 5.5+ in order to run the model-api, though the chatbot-app was specifically designed to run on a machine without one.

Main Environment

The main environment for this code is called 'transfer'. It is a Python 3.6 environment. The libraries for the transfer environment are listed in the file 'transfer-environment.yaml'. You will need to set up a conda environment with these library versions.

Code in the following directories is executed in the main environment:

* chatbot-app
* data-cleaning
* model-api
* visualization

Model Search

The model search makes use of a library that imposes special requirements (ludwig). In order to conduct the model search, you will need to create a conda environment using the libraries listed in model-search-environment.yaml. Note that you will want to run the following commands to ensure that spacy is downloaded in the model-search environment:

    conda install -c conda-forge spacy
    python -m spacy download en

This applies only to this directory:

* model-search

ADEM Environment

The ADEM environment requires a Linux build and a very particular setup in order to generate scores. The scorer is a deep neural network with many parameters. Be forewarned that it can take 8+ hours to score a dataset. To set up the docker container and score a model, you will need to do some work: a sketch of the expected prediction CSV follows, and the full scoring checklist comes after it.
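For orientation, here is a minimal sketch, assuming pandas is available in whichever environment you use to prepare the file, of the shape the prediction CSV should take. The column names (conv_input, output, prediction) follow the sample provided in adem-scoring/adem-docker/dir-1/adem-scorer; the example rows and the exact file name are placeholders, not output from the project's pipeline.

    # Sketch only: build a prediction CSV in the shape the ADEM scorer expects.
    # Column names follow the sample in adem-scoring/adem-docker/dir-1/adem-scorer;
    # the rows below are illustrative placeholders.
    import pandas as pd

    prediction_df = pd.DataFrame({
        # what was fed into the trained model
        "conv_input": ["How was your day?"],
        # what Rachel actually said in the scripts (the target)
        "output": ["It was fine, I guess."],
        # what the trained model actually predicted
        "prediction": ["Pretty good, thanks for asking."],
    })

    # Save without the index so the file contains only the three columns.
    prediction_df.to_csv("prediction_df.csv", index=False)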
To score a trained model, you'll need the following:

- Ensure that you have a dataframe with at least three columns: the model input, the true label (what Rachel said) and the model's predictions
- Save this into a csv called prediction_df, a sample of which is provided in adem-scoring/adem-docker/dir-1/adem-scorer
-- conv_input is the name of the inputs to the trained model
-- output is what Rachel said in the scripts (i.e. the target prediction)
-- prediction is what the trained model actually predicted
- Follow the detailed instructions in the adem-scoring-procedure.txt file.

Instructions
========================================

Starting the Rachelbot

- Ensure that you are in the transfer environment
- Ensure that ports 5300 and 5500 are available on your device
- CD into model-api and run the command 'python api.py', which will start the API (port 5500)
- CD into chatbot-app and run the command 'python apps.py', which will start the app (port 5300)
- In your browser, go to localhost:5300 and start chatting!

Visualizations and Data Cleaning

You can view these in simple notebooks. CD to the relevant directory and enter the command 'jupyter notebook'. The wordcloud visualization is the most interesting; it provides context about the language between Rachel and her interlocutors. Warning: Phoebe's wordcloud may have some... special qualities.

The Model Search

The model-search directory contains some of the model searching that was conducted as part of the project. This was a grid search over architectural features (directionality, encoder type, search type, attention and pre-trained embeddings). A large number of models were trained using the ludwig library. A model and architecture summary is provided in two CSV files:

* model-performance indicates how individual models performed
* architecture-performance indicates how a feature (like beam search) compared across the models that included it

To reduce the total number of models, the search took place in two tiers. You can visualize how each model did by going into the tier and model (e.g. t1-search/t1-trained/t1_gbb) and running the visualization command.

ADEM Scorer

Once you have a model that you'd like to score, you can use the ADEM scorer to evaluate it. This requires that you first create a csv file with the model inputs, true labels and model predictions. Use the template and instructions in the ADEM Environment section above. You will get back a dataframe of scores from which you can construct an average. A score of 5 on ADEM means a person would very likely see the response as human-generated, whereas a 1 means a human would likely regard it as machine-generated in a Turing test setting. The overall metric has an r2 of 0.422 with actual human ratings, which is much better than other metrics like perplexity and BLEU. An example of averaging the returned scores appears at the end of this README.

Additional Work
========================================

There is additional work planned for the project, so don't be shy about reaching out to Alexander Orona.
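Example: Averaging ADEM Scores
========================================

As a reference for the ADEM Scorer step above, here is a minimal sketch of turning the scorer's output into a single number. The file name adem_scores.csv and the column name score are assumptions for illustration only; substitute whatever your scoring run actually writes out.

    import pandas as pd

    # Hypothetical output file and column name -- adjust both to match what
    # the ADEM scorer actually produces for your run.
    scores = pd.read_csv("adem_scores.csv")
    average_score = scores["score"].mean()

    # ADEM scores run from 1 (likely machine-generated) to 5 (likely human-generated).
    print(f"Average ADEM score: {average_score:.2f}")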