Major depressive disorder prediction using data science

Vincent Peter C. Magboo, Ma. Sheila A. Magboo


Background:  Major depressive disorder is a mood disorder that has affected many people worldwide. It is characterized by persistently low or depressed mood, anhedonia or decreased interest in pleasurable activities, feelings of guilt or worthlessness, lack of energy, poor concentration, appetite changes, psychomotor retardation or agitation, sleep disturbances, or suicidal thoughts.

Objective:  The objective of the study was to predict the presence of major depressive disorder using a variety of machine learning classification algorithms (logistic regression, Naive Bayes, support vector machine, random forest, adaptive boosting, and extreme gradient boosting) on a publicly available depression dataset.

Methodology:  After data pre-processing, several experiments were performed to assess recursive feature elimination with cross-validation (RFE-CV) as a feature selection method and the synthetic minority over-sampling technique (SMOTE) as a remedy for dataset imbalance. Several machine learning algorithms were applied to an anonymized, publicly available depression dataset. Feature importance scores of the top-performing models were also generated. All simulation experiments were implemented in Python 3.8 with its machine learning libraries (Scikit-learn, Keras, TensorFlow, Pandas, Matplotlib, Seaborn, NumPy).
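The combination of over-sampling and RFE-CV described in the methodology can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic (generated with `make_classification`), the helper `smote_like_oversample` is a hypothetical, simplified stand-in for SMOTE written for this example, and all parameter values (`k=5`, five folds, F1 scoring) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical helper: balance a binary dataset by interpolating
    between minority-class points and their nearest minority neighbours
    (the core idea of SMOTE, without its refinements)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X.copy(), y.copy()
    X_min = X[y == minority]
    # Ask for k + 1 neighbours and drop the first column, which is
    # normally the query point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    y_new = np.full(n_new, minority)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Synthetic, imbalanced stand-in for the depression dataset (assumption).
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Over-sample the training split only, so the test split stays untouched.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

# RFE-CV: drop one feature per step and keep the subset with the best
# cross-validated F1-score.
selector = RFECV(LogisticRegression(max_iter=1000), step=1,
                 cv=StratifiedKFold(5), scoring="f1")
selector.fit(X_bal, y_bal)

test_accuracy = selector.score(X_test, y_test)
print("features kept:", selector.n_features_)
print("test accuracy:", round(test_accuracy, 3))
```

In this sketch the over-sampling happens before cross-validation, so synthetic points can appear in validation folds. A stricter setup would resample inside each fold, for example with the `SMOTE` class and pipeline utilities of the imbalanced-learn package.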

Results:  Logistic regression yielded the top-performing model, with excellent performance metrics (91% accuracy, 93% sensitivity, 85% specificity, 93% recall, 93% F1-score, and 0.78 Matthews correlation coefficient). Feature importance scores of the most relevant attributes were also generated for the best model.
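For readers less familiar with these metrics, the sketch below shows how each is derived from the four cells of a binary confusion matrix. The counts are hypothetical and chosen only to keep the arithmetic simple; they are not the study's data. The function name `binary_metrics` is likewise an assumption of this example. Note that sensitivity and recall are two names for the same quantity.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=90, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a useful summary when the classes are imbalanced.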

Conclusion: The findings suggest that data science techniques powered by machine learning models can support the diagnosis of major depressive disorder with acceptable results. Deploying these models in clinical practice could further enhance the diagnostic acumen of health professionals. With data analytics and machine learning, data scientists can better understand mental illness, contributing to prompt and improved diagnosis and, in turn, to early intervention and medical treatment that ensure the best quality of care for patients.


Keywords: Major Depressive Disorder (MDD) prediction; machine learning; recursive feature elimination with cross validation (RFE-CV); synthetic minority over-sampling technique (SMOTE); feature importance


Print ISSN: 2704-3517; Online ISSN: 2738-042X