Social media recommendation algorithms are steadily improving: they are now more likely to show content that users will interact with. They do this by recommending content based on users’ previous interactions. A drawback, however, concerns a person who is depressed and mostly interacts with depressing content: what would be recommended to them? If the algorithm would recommend triggering content, what could prevent those recommendations?
Every individual’s life is immeasurable in worth, and depression and suicide are topics that should be taken seriously. More often than not, people focus on physical well-being rather than mental well-being. For some people, going to professionals or even guidance counselors for mental health concerns is considered taboo or is looked down upon. Thus, people may resort to posting their thoughts anonymously online as a means of coping with negative thoughts.
Social media use, in this case on Reddit, is part of an individual’s online self-presentation and may reveal a bigger picture than can be seen in a face-to-face setting. By building a model that can classify whether a text relates to suicide or depression, this study aims to supplement suicide prevention efforts and may assist with mental health concerns such as depression.
Individuals who fall into the generated classes can be presented with appropriate suicide prevention web pages or mental health resources. Additionally, social media algorithms could be prevented from suggesting triggering content to these people. Future studies can improve suicide prevention and mental health assistance by building on the results of this project.
- Extract the folder from the zipped file that you can download through this DownGit link.
- Launch Jupyter Notebook or JupyterLab.
- Navigate to the project folder containing `main.ipynb`.
- Open `main.ipynb`. This contains the data pre-processing and cleaning, and the Exploratory Data Analysis.
- To see the model training and tuning, open `ModelingPT1.ipynb` and `ModelingPT2.ipynb`.
- To play around with the models, open `ModelPrediction.ipynb`.
This GitHub repository contains five Jupyter notebooks and three CSV files.
Jupyter notebooks | Description |
---|---|
`main.ipynb` | Main notebook that also holds the Data Cleaning and Pre-processing, and EDA |
`ModelingPT1.ipynb` | Notebook that holds the training and tuning of BERT and RoBERTa models |
`ModelingPT2.ipynb` | Notebook that holds the training and tuning of Logistic Regression, Multinomial Naive Bayes, and Random Forest Classifier models |
`ModelingPT2_Lemmatized.ipynb` | Notebook that holds the training and tuning of Logistic Regression, Multinomial Naive Bayes, and Random Forest Classifier models with the lemmatized dataset |
`ModelPrediction.ipynb` | Notebook that allows the prediction of user-inputted text using all of the trained models |
Running the `main.ipynb` notebook creates three CSV files, which hold the dataset that was used to train the models. The first two files differ in the presence of unnecessary character sequences (i.e., hashtags, media links, square brackets, usernames, and retweet tags): the first file keeps them, while the second has them removed. The third file is the lemmatized version of the second file.
CSV files | Description |
---|---|
`cleaned_data.csv` | Dataset with unnecessary character sequences |
`cleaned_data_with_char_seq_removal.csv` | Dataset without the unnecessary character sequences |
`lemmatized_with_char_seq_removal.csv` | Lemmatized version of the dataset without the unnecessary character sequences |
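
To illustrate what removing the unnecessary character sequences might look like, here is a minimal sketch using only Python's standard `re` module. It is not the code from `main.ipynb`: the function name `remove_char_sequences`, the regular expressions, and the choice to drop bracketed text together with its brackets are all assumptions made for illustration, and the notebook's actual rules may differ.

```python
import re

# Hypothetical patterns for illustration only; the notebook may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # media links
RETWEET_RE = re.compile(r"\bRT\b:?")            # retweet tags
USERNAME_RE = re.compile(r"@\w+:?")             # usernames such as @someone
HASHTAG_RE = re.compile(r"#\w+")                # hashtags
BRACKET_RE = re.compile(r"\[[^\]]*\]")          # square brackets and their contents (assumption)
WHITESPACE_RE = re.compile(r"\s+")


def remove_char_sequences(text: str) -> str:
    """Strip the character sequences listed above and tidy the whitespace."""
    for pattern in (URL_RE, RETWEET_RE, USERNAME_RE, HASHTAG_RE, BRACKET_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "RT @user: feeling low today #sad https://t.co/abc [removed]"
    print(remove_char_sequences(sample))
```

Links are removed first so that later patterns do not match fragments inside a URL.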
You can try out our trained models in the `ModelPrediction.ipynb` notebook! Note that the Random Forest classifiers are not included in it.
The model with the highest accuracy (i.e., RoBERTa) can be accessed on this website.
- Jean Pauline Gozon
- Gillian Nicole Jamias
- Andrea Jean Marcelo
- Anton Gabriel Reyes
- Francheska Josefa Vicente