Author: Chris Gian
Project Description:
- Build an API that determines whether content is appropriate, where "appropriate" means the content does not violate the kind of terms of service agreement a typical social media company would have.
Spotify® Support Community Terms, Spotify® Support Community Guidelines:
8. Always use an appropriate and respectful language when you post information in the Community. Avoid racist, sexist, abusive, harassing, defamatory, pornographic, threatening, obscene, condescending or otherwise offensive language that could be considered detrimental to other users, or Spotify employees or moderators.
9. Do not post information or create threads for the promotion or advertisement of commercial products or services.
From the above, content of the following types violates the guidelines and will be removed unilaterally:
- racist, sexist, abusive, harassing, defamatory, pornographic, threatening, obscene, condescending, offensive
From this list, I will target the most discrete of these categories:
- Racism, sexism, threats
Given the above objectives, the following data sources will be used:
- General Resource:
- Sexism:
- Automatic Misogyny Identification
- notes: need to get password from that team.
- Automatic Misogyny Identification
- Online Bullying:
- Fox News Hate Speech:
- Internal Validity: Can the model moderate content within the confines of this experiment (train-test split / K-Folds)?
- External Validity: Can the model moderate content outside of the experiment (real-world data)?
- Measures:
TBD
TBD
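The internal-validity check mentioned above (train-test split / K-Folds) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual evaluation code; the function name `k_fold_indices` and its parameters are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration: each sample lands in exactly one
    test fold, and the remaining samples form that fold's training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Shuffle once with a fixed seed so the folds are reproducible.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
```

Each `(train_idx, test_idx)` pair would index into the labeled examples: a model is fit on the training indices and scored on the held-out ones, and the per-fold scores are then averaged. Which scores to use is still open, since the measures are TBD.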
- identify quality datasets (one, two, three)
- create a method to extract a CSV of IDs and labels and pass the IDs through the Twitter API
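As a rough sketch of the last item, the snippet below reads a CSV of IDs and labels and groups the IDs into batches that could later be sent to the Twitter API. The column names (`id`, `label`), the helper names, the batch size, and the sample rows are all assumptions made for illustration; the real datasets may use a different layout, and the API call itself is deliberately left out.

```python
import csv
import io


def read_labeled_ids(csv_file, id_column="id", label_column="label"):
    """Return a dict mapping each ID to its label.

    Assumes a header row; the column names are hypothetical defaults.
    """
    reader = csv.DictReader(csv_file)
    return {row[id_column]: row[label_column] for row in reader}


def batch_ids(ids, batch_size=100):
    """Split IDs into lists of at most batch_size (assumed per-request limit)."""
    ids = list(ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# Made-up sample data standing in for a real dataset file.
sample = io.StringIO("id,label\n101,sexism\n102,none\n103,racism\n")
labels = read_labeled_ids(sample)
batches = batch_ids(labels, batch_size=2)
```

Each batch would then be passed to whatever lookup call is used to fetch the text, and the returned text joined back to `labels` by ID.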