Projects and Publications

Awaiting Publication

Machine Learning Prediction of Patient Outcomes

We compared the predictive performance of Logistic Regression, Decision Trees, Naive Bayes, Random Forests and XGBoost for predicting the occurrence of Hospital Acquired Pressure Injuries (HAPI) in 57,227 hospitalizations, containing 241 positive cases, acquired from Dartmouth Hitchcock Medical Center from April 2011 to December 2016. The five classifiers were trained to predict HAPI incidence, and performance was assessed using the C-statistic, also known as the Area Under the Receiver Operating Characteristic Curve (AUC).
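As a rough illustration of the metric (not the code used in the study), the C-statistic can be read as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy labels and scores are made up for this example.

```python
def auc(labels, scores):
    """C-statistic: chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Synthetic example: 2 positive cases among 6 hospitalizations.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
toy_auc = auc(labels, scores)
print(toy_auc)  # 0.875
```

The pairwise loop is quadratic, which is fine for a toy example; with very rare outcomes such as 241 positives in 57,227 stays, a rank-based implementation from an established library would be the practical choice.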

https://www.medrxiv.org/content/10.1101/2020.03.29.20047084v2

We also provided a means to visually assess the factors important to each patient’s prediction, regardless of the modeling approach, through SHapley Additive exPlanations (SHAP).
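To give an idea of what a Shapley explanation contains, here is a minimal sketch that computes exact Shapley values for one prediction by enumerating every feature subset. It is not the SHAP library and not the code from the paper: the `toy_risk` model, the patient vector and the all-zero baseline are hypothetical, and "removing" a feature is approximated by swapping in its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature for the prediction predict(x)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's values, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(v):
    # Hypothetical risk score with an interaction between features 1 and 2.
    return 0.1 * v[0] + 2.0 * v[1] + v[1] * v[2]


patient = [3.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, patient, baseline)
print([round(p, 3) for p in phi])
```

The contributions always add up to the difference between the patient's prediction and the baseline prediction, which is what makes them usable across different model types. Exact enumeration grows exponentially with the number of features, so real tools rely on approximations.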

Statistics

Outlier Detection

In this project I used Python and Gaussian anomaly detection to search for outliers in the Thyroid data set from the UC Irvine Machine Learning Repository.

In practice, this means having the computer make a probable diagnosis of hypothyroidism without knowing any of the threshold laboratory values that doctors use to estimate thyroid function:

https://nbviewer.jupyter.org/github/jorflima/Python-Code/blob/master/Gaussian_Anomaly_Detection .ipynb

This same multivariate method is very useful for detecting anomalies in healthcare that may be linked to abuse, fraud, low quality or high costs.
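The idea can be sketched in a few lines. The example below is a simplified version that fits one Gaussian per feature and treats the features as independent, whereas a full multivariate version would also estimate the covariance between features. The lab values are synthetic and the column meanings (a TSH-like and a T4-like measurement) are assumptions for illustration only, not the UCI data.

```python
import math
from statistics import mean, stdev


def fit_gaussians(rows):
    """Estimate a (mean, standard deviation) pair for each column."""
    return [(mean(col), stdev(col)) for col in zip(*rows)]


def log_density(params, row):
    """Log of the product of per-feature normal densities for one row."""
    total = 0.0
    for (mu, sigma), v in zip(params, row):
        z = (v - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


# Synthetic (TSH-like, T4-like) values; the last row is deliberately unusual.
rows = [
    (1.2, 8.1), (2.0, 7.5), (1.5, 9.0), (2.4, 8.4),
    (1.8, 7.9), (1.1, 8.8), (2.2, 8.0), (14.0, 2.5),
]
params = fit_gaussians(rows)
densities = [log_density(params, r) for r in rows]

# The row with the lowest density is the strongest outlier candidate.
most_anomalous = min(range(len(rows)), key=lambda i: densities[i])
print(most_anomalous, rows[most_anomalous])
```

In practice, a cut-off on the density is tuned on a small set of labelled cases, and every record falling below it is flagged, rather than reporting only the single lowest value.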

Machine Learning

Renal Disease Classification

In this project I used the R statistical program, coupled with the H2O library for machine learning, to predict the stage of renal disease based on patient characteristics and the Charlson comorbidity index. In other words, I tried to answer the question: are patients with additional diseases more likely to have a more advanced stage of kidney insufficiency? And if so, can I use that information for prediction?

I compared a Random Forest model with a simple Neural Network for classification:

https://github.com/jorflima/R-Code/blob/master/Bayes_Forests_Neural_Nets_H2O.Rmd

Big Data

Data Wrangling

Sometimes just opening the file and manipulating the data can be a challenge in itself!

Using a laptop and R, with the aid of libraries such as dplyr, I performed ETL (Extraction, Transformation and Loading) on the 2016 National Inpatient Sample (NIS) dataset. I selected the relevant ICD-10 codes, removed unnecessary variables and created new ones to prepare the data for future analytical modelling.
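The project itself was done in R with dplyr; the sketch below only mirrors the three steps described above (select by diagnosis code, drop unneeded variables, derive new ones) in plain Python. The record layout, the field names and the `elderly` flag are hypothetical and do not reflect the actual NIS column names. The only clinical fact relied on is that ICD-10 codes beginning with N18 denote chronic kidney disease.

```python
CKD_PREFIX = "N18"  # ICD-10 category for chronic kidney disease


def first_ckd_code(record):
    """Return the first CKD diagnosis code on the record, or None."""
    for code in record["diagnoses"]:
        if code.startswith(CKD_PREFIX):
            return code
    return None


def prepare(records):
    """Keep CKD stays, drop unused fields and add derived variables."""
    prepared = []
    for record in records:
        ckd_code = first_ckd_code(record)
        if ckd_code is None:
            continue  # selection step: skip stays without a CKD code
        prepared.append({
            "age": record["age"],            # kept variable
            "ckd_code": ckd_code,            # new variable: matching code
            "elderly": record["age"] >= 65,  # new variable: derived flag
        })
    return prepared


# Synthetic records for illustration only.
records = [
    {"age": 71, "diagnoses": ["I10", "N183"], "billing_note": "x"},
    {"age": 45, "diagnoses": ["J189"], "billing_note": "y"},
    {"age": 58, "diagnoses": ["N185", "E119"], "billing_note": "z"},
]
clean = prepare(records)
print(clean)
```

With several million records, the same logic would normally be applied in chunks or with a data-frame library rather than a Python list held in memory.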

The NIS is part of the Healthcare Cost and Utilization Project (HCUP) and is the largest publicly available all-payer inpatient care database in the United States, usually containing between 7 and 8 million records, depending on the year.

https://github.com/jorflima/R-Code/blob/master/Chronic_Kidney_Disease_Project.Rmd

Publication

Home Care Visits Effectiveness in an HMO

A retrospective cohort study was performed to assess the impact of a Case Management Home Care Program supplied by the Unimed-BH medical cooperative on hospitalization-free survival time among eligible patients 60 years or older. A Cox proportional hazards model was fitted to assess the impact of home visits by health professionals on hospitalization-free survival time in a sample of 2,943 elders, while adjusting for patient age, physical dependence, medicines, feeding route, pressure ulcers, supplemental oxygen therapy, cognitive impairment, outpatient visits, and hospitalizations in the preceding quarter.

https://pubmed.ncbi.nlm.nih.gov/25402253/

Risk factors for shorter hospitalization-free survival time were: degree of physical dependence, enteral nutrition, supplemental oxygen therapy, pressure ulcers, and hospital admissions in the previous quarter. Higher rates of home visits by physicians and nurses showed a protective dose-response effect on hospitalization-free survival time.
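For readers unfamiliar with the Cox model, the quantity it maximizes is the partial log-likelihood: at each observed hospitalization, the patient who was hospitalized is compared with everyone still at risk at that time. The sketch below evaluates it for a single binary covariate on synthetic data with no tied event times. It is not the study's code or data, and the variable names are made up for illustration.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied events."""
    loglik = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still hospitalization-free just before t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk)
        loglik += beta * x[i] - math.log(denom)
    return loglik


# Synthetic follow-up times, event flags (1 = hospitalized, 0 = censored)
# and a home-visit indicator (1 = received visits).
times = [2, 3, 5, 7, 8, 10]
events = [1, 1, 0, 1, 1, 0]
visits = [0, 0, 1, 0, 1, 1]

ll_null = cox_partial_loglik(0.0, times, events, visits)
ll_protective = cox_partial_loglik(-1.0, times, events, visits)
print(ll_null, ll_protective)
```

On this toy data, a negative coefficient fits better than zero, which corresponds to a hazard ratio below 1, that is, a protective association. A real analysis would estimate the coefficients of all covariates jointly with an established survival package.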