Text moderation SoFIFA

Text Moderation Overview

We currently use third-party AI text moderation tools to detect several kinds of undesirable content, including but not limited to sexual content, hate, violence, bullying, promotions, and links to external sites. The tools support more than 15 languages, and their text models are also trained to understand the semantic meaning of emojis.

Text Classification

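Within each class, levels are ordered by severity, from 3 (most severe) to 0 (benign). Some classes are binary: they report only level 3 or 0, plus -1 when the language of the input is not supported.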
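
As a rough illustration of how these levels could be consumed, the sketch below maps a set of per-class levels to a single action. The result format (a mapping from class name to integer level), the function name decide, and both thresholds are assumptions made for this example; they are not part of the third-party tool's API or of any stated site policy.

```python
from enum import IntEnum
from typing import Mapping


class Severity(IntEnum):
    UNSUPPORTED = -1  # some classes report -1 when the input language is not supported
    BENIGN = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical thresholds, chosen only for illustration.
BLOCK_AT = Severity.HIGH
REVIEW_AT = Severity.MEDIUM


def decide(levels: Mapping[str, int]) -> str:
    """Return 'block', 'review', or 'allow' based on the most severe class level."""
    worst = max((Severity(v) for v in levels.values()), default=Severity.BENIGN)
    if worst >= BLOCK_AT:
        return "block"
    if worst >= REVIEW_AT:
        return "review"
    return "allow"
```

For example, decide({"sexual": 1, "hate": 0}) falls below both thresholds, while any class at level 3 triggers the strictest action regardless of the others.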

Sexual

3: Intercourse, masturbation, porn, sex toys and genitalia
2: Sexual intent, nudity and lingerie
1: Informational statements that are sexual in nature, affectionate activities (kissing, hugging, etc.), flirting, pet names, relationship status, sexual insults and rejecting sexual advances
0: The text does not contain any of the above

Hate

3: Slurs, hate speech, promotion of hateful ideology
2: Negative stereotypes or jokes, degrading comments, denouncing slurs, challenging a protected group's morality or identity, violence against religion
1: Positive stereotypes, informational statements, reclaimed slurs, references to hateful ideology, immorality of protected group's rights
0: The text does not contain any of the above

Violence

3: Serious and realistic threats, mentions of past violence
2: Calls for violence, destruction of property, calls for military action, calls for the death penalty outside a legal setting, mentions of self-harm/suicide
1: Denouncing acts of violence, soft threats (kicking, punching, etc.), violence against non-human subjects, descriptions of violence, gun usage, abortion, self-defense, calls for capital punishment in a legal setting, destruction of small personal belongings, violent jokes
0: The text does not contain any of the above

Bullying

3: Slurs or profane descriptors toward specific individuals, encouraging suicide or severe self-harm, severe violent threats toward specific individuals
2: Non-profane insults toward specific individuals, encouraging non-severe self-harm, non-severe violent threats toward specific individuals, silencing or exclusion
1: Profanity in a non-bullying context, playful teasing, self-deprecation, reclaimed slurs, degrading a person's belongings, bullying toward organizations, denouncing bullying
0: The text does not contain any of the above

Drugs

3: Descriptions of the acquisition of drugs and text that explicitly promotes, advertises, or encourages drug use
2: References to past drug acquisition or use as well as descriptions of recreational use that do not promote drugs to others
1: Language around drugs that is neutral or informational, discouraging, or ambiguous in meaning
0: The text does not contain any of the above

Child Exploitation

3: Asking for or trading child pornography (CP) or related links, mentioning proclivity for CP, identifiably underage users soliciting sex or pornography, roleplay involving children, mentions of sexual activity or sexual fetishes involving children
0: The text does not include any of the above
-1: Language of input is not supported

Child Safety (Beta)

3: Content that contains a direct or indirect threat of physical violence to children in a school or school-related setting
0: The text does not include any of the above
-1: Language of input is not supported

Self Harm (Beta)

3: Content related to promoting, planning, or carrying out suicide, non-suicidal self harm (cutting, burning, etc.), and behaviors associated with eating disorders
0: The text does not include any of the above
-1: Language of input is not supported

Gibberish

3: Keyboard spam and phrases or words that are completely incomprehensible (e.g., "kgvjbwklrgjb", "ef2$gt rgbu")
0: The text does not include the above
-1: Language of input is not supported

Spam

3: The text is intended to redirect a user to a different platform, including email addresses, phone numbers, and certain links
0: The text does not include the above OR includes a link to an allowlisted domain (i.e., popular, reputable sites)
-1: Language of input is not supported
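
The allowlist rule for links can be pictured with a small sketch. This is not how the third-party classifier works internally; it only illustrates the rule that a link to an allowlisted domain stays at level 0 while other links are level 3. The domain list, the URL pattern, and the function names are hypothetical, and email addresses and phone numbers are deliberately not handled here.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; the real list of reputable domains is not published here.
ALLOWLISTED_DOMAINS = {"example.com", "example.org"}

# Simplified pattern: anything starting with http:// or https:// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def is_allowlisted(host: str) -> bool:
    """True if host is an allowlisted domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in ALLOWLISTED_DOMAINS)


def link_spam_level(text: str) -> int:
    """Return 3 if the text links to a non-allowlisted domain, otherwise 0."""
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname or ""
        if not is_allowlisted(host):
            return 3
    return 0
```

Matching on the parsed hostname rather than on the raw string avoids treating a URL such as https://spam.invalid/example.com as allowlisted.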

Promotions

3: Asking for likes/follows/shares, advertising monthly newsletters/special promotions, asking for donations/payments, advertising products, selling pornography, giveaways
0: The text does not include the above
-1: Language of input is not supported
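
The classes above fall into two groups: those with four graded levels (0 to 3) and those that report only 3, 0, or -1. A minimal sketch of a sanity check built on that distinction follows. The snake_case class identifiers and the function names are assumptions made for this example, and the grouping reflects only the levels listed in this document.

```python
from typing import List, Mapping, Set

# Classes listed in this document with graded levels 0 to 3.
GRADED: Set[str] = {"sexual", "hate", "violence", "bullying", "drugs"}

# Classes listed in this document with only 3, 0, or -1 (language not supported).
BINARY: Set[str] = {
    "child_exploitation", "child_safety", "self_harm",
    "gibberish", "spam", "promotions",
}


def allowed_levels(class_name: str) -> Set[int]:
    """Return the set of levels this document lists for the given class."""
    if class_name in GRADED:
        return {0, 1, 2, 3}
    if class_name in BINARY:
        return {-1, 0, 3}
    raise KeyError(f"unknown class: {class_name}")


def invalid_classes(levels: Mapping[str, int]) -> List[str]:
    """Return the sorted names of classes whose level is not listed for that class."""
    return sorted(name for name, level in levels.items()
                  if level not in allowed_levels(name))


def unsupported_classes(levels: Mapping[str, int]) -> List[str]:
    """Return the sorted names of classes that reported -1 (language not supported)."""
    return sorted(name for name, level in levels.items() if level == -1)
```

Keeping the -1 results separate makes it possible to route them to a fallback, such as manual review, instead of treating them as benign.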