# IoT Fever Analysis Report

TemPredict launched in March 2020 as a collaboration between UCSF, UCSD, and the Finnish wearable company Oura. The objective is to identify physiological signals from the wearable and provide early alerts for COVID-19 symptoms and diagnoses. The study's first phase ended on November 30th, 2020, with ~65,000 active participants. Participants shared data from their Oura ring going back to January 2020 and answered onboarding, daily, and monthly surveys about demographics, symptoms, diagnoses, and other relevant information. The Oura ring collects physiological data such as heart rate, respiratory rate, skin temperature, and metabolic equivalent of task (MET). This data is stored, managed, and analyzed at SDSC.

Our study focuses on developing an architecture that supports systematic exploration of different approaches and comparison between them. We analyze the data, extract new features, and build various algorithms that can be used for the early detection of COVID-19. To detect the onset of infection, we define a healthy window for each individual, derived by analyzing the daily rhythm of that individual's physiological signals. To avoid false detections, the model calculates a dynamic baseline for each individual. Higher-order features, such as ratios of temperature and activity, heart rate and its variability, and deviations from the baseline, are then identified. Various models, including Random Forest, XGBoost, AdaBoost, and CWT-based approaches, were trained, tested, and evaluated.

## Getting Started

Virtual environment setup to launch the pipeline:

1. Execute `git pull` on `iot_covid_analysis`.
2. Run `launch_pipeline_vi_env.sh` from the command line in `git/iot_covid_analysis/`.
3. Enter the virtual environment name at the prompt: `Enter Virtual Env name: <>`
4. Enter the VM folder path at the prompt: `Enter VM folder: `
5. The script launches `ML_Pipeline.ipynb` in Jupyter Notebook. If there are any errors while launching the notebook, please launch it manually.

Repository layout:

1. The `components` folder has the entire pipeline code.
2. The `dashboard` folder has a separate `README.md` which explains how to run the Dash app in a virtual environment.
3. Sample notebooks are available under the `test_notebooks` folder.
4. `ML_Pipeline.ipynb` under the `src` folder helps to execute the pipeline and evaluate the model.

| Config setting | Data type | Description |
| --- | --- | --- |
| test_run | bool | consider only 10 PIDs for the process |
| apply_pid_filter | bool | consider only the PIDs listed in **src\data\px242_pids.csv** |
| merge_new_baselines | bool | merge dynamic baseline PIDs (**src\data\pickles\\>_df_BL_Generated_by_Weekday.pkl**) |
| run_from_local | bool | run the app on the local machine or on the cluster |
| db_user_name | string | database username |
| db_password | string | database password |
| offline_db | bool | use a file-based DB instead of awesomeDB |
| local_base_data_folder | string | location of the data folder when **run_from_local** is true |
| server_base_data_folder | string | location of the data folder when **run_from_local** is false |
| local_dashboard_base_data_folder | string | location of the dashboard data folder when **run_from_local** is true |
| server_dashboard_base_data_folder | string | location of the dashboard data folder when **run_from_local** is false |
| dask_partitions | int | number of Dask partitions |
| dask_workers | int | number of Dask workers |
| baseline_days_to_consider | int | number of days to consider as the baseline window |
| cvd_days_to_consider | int | number of days to consider as the covid (cvd) window |
| cvd_sliding_window_start | int | days prior to the dx date |
| cvd_sliding_window_end | int | days after the dx date |
| label_pre_window_interval_from_dx_date | int | covid label: number of buffer days prior to dx_date/px_date |
| label_post_window_interval_from_dx_date | int | covid label: number of buffer days after dx_date/px_date |

Sample configuration:

```json
{
  "test_run": false,
  "apply_pid_filter": true,
  "merge_new_baselines": true,
  "run_from_local": true,
  "db_user_name": "",
  "db_password": "",
"offline_db":true, "local_base_data_folder":"/home/gitrepo/iot_covid_analysis/data/", "server_base_data_folder":"<>", "local_dashboard_base_data_folder":"/home/gitrepo/iot_covid_analysis/src/dashboard/data/", "server_dashboard_base_data_folder":"<>", "dask_partitions":8, "dask_workers":8, "baseline_days_to_consider": 19, "cvd_days_to_consider": 19, "cvd_sliding_window_start": -14, "cvd_sliding_window_end": 7, "label_pre_window_interval_from_dx_date" : 2, "label_post_window_interval_from_dx_date" : 2 } ``` ## About the app The project has built a platform that facilitates systematic explorations of physiological data for various purposes. Furthermore, the code is completely modularized and can be extended for any future work. A single Dashboard to analyze the physiological data in the time and frequency domain has helped the Tempredict team to evaluate their algorithm to derive physiological max data (Px Date). The Tempredict project running for the last year has developed a model for the covid prediction that only uses preliminary features. This product enables Tempredict to easily explore and analyze the physiological data in one single dashboard. It also allows Tempredict and other future research projects to rapidly explore various models in parallel and evaluate and compare the results in one dashboard. This platform will rank the features for various models and help to identify and select the best suitable features for the models. The project code has been shared with the public, enabling other researchers, students, and data scientists to leverage this codebase for further research or extend the analysis for other medical detections. Clinicians currently don’t have the historical physiological signal data of individuals and the necessary tools to analyze the individuals’ daily rhythm. This product will help clinicians analyze individuals’ physiological signals over time in one single dashboard. 
## Folder structure

```
iot_covid_analysis (root folder)
├── data (input and output folder)
│   ├── pickles (output folder: output artifacts contain input physiology
│   │            information, which cannot be shared in a public repository)
│   └── timescaledb (input folder: needs a database connection, which cannot
│                    be shared in a public repository)
├── src (Python files for the pipeline)
│   ├── components (data preprocessing and modeling components for the pipeline)
│   ├── cwt (CWT source)
│   └── dashboard (dashboard application)
│       ├── README.md (dashboard readme)
│       ├── rundash.sh (dashboard bash script to launch the virtual env)
│       └── main_app.py (dashboard main app)
├── docs (document folder)
│   ├── fever_analysis_final_report_dse_2021_mas_group2.pdf
│   └── fever_analysis_final_report_dse_2021_mas_group2_Poster.pdf
├── launch_pipeline_vi_env.sh (pipeline shell script to launch the virtual env)
└── README.md (pipeline readme)
```

Note: Due to the data restrictions of the Timescale DB at sdsc.edu, no input or output files are provided. However, if the user has Nautilus environment access, these sources will work from there.
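The sample configuration shown earlier can be loaded and validated with a small helper. This is a minimal sketch, not code from the repository: `load_config` and `REQUIRED_KEYS` are hypothetical names, and only a subset of the documented settings is checked; the key names and the `run_from_local` folder-selection behavior follow the config table above.

```python
import json
import tempfile
from pathlib import Path

# Subset of the config keys documented in the table above, with expected types.
REQUIRED_KEYS = {
    "test_run": bool,
    "apply_pid_filter": bool,
    "run_from_local": bool,
    "baseline_days_to_consider": int,
    "cvd_days_to_consider": int,
}

def load_config(path):
    """Load the pipeline config and pick the data folder for the current mode."""
    cfg = json.loads(Path(path).read_text())
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(cfg.get(key), typ):
            raise ValueError(f"config key {key!r} missing or not {typ.__name__}")
    # run_from_local switches between the local and server data folders
    folder_key = ("local_base_data_folder" if cfg["run_from_local"]
                  else "server_base_data_folder")
    cfg["base_data_folder"] = cfg[folder_key]
    return cfg

# Demo: write a minimal config to a temp file and load it.
sample = {
    "test_run": False,
    "apply_pid_filter": True,
    "run_from_local": True,
    "baseline_days_to_consider": 19,
    "cvd_days_to_consider": 19,
    "local_base_data_folder": "/home/gitrepo/iot_covid_analysis/data/",
    "server_base_data_folder": "<>",
}
tmp = Path(tempfile.mkdtemp()) / "config.json"
tmp.write_text(json.dumps(sample))
cfg = load_config(tmp)
```

Failing fast on a missing or mistyped key keeps configuration errors from surfacing halfway through a long pipeline run.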