Content Moderation Simulation

Content moderation—the process of reviewing content created and shared by users to see if it complies with a digital platform’s policies—is a challenging and complex undertaking, especially at scale. But before we dive deeper into content moderation, let’s take a moment to review some of the key takeaways covered in earlier T&S Curriculum chapters:

Industry Overview

  • Trust & Safety (T&S) is a term used to describe the teams at internet companies and service providers that work to ensure users are protected from harmful and unwanted experiences.
  • T&S has evolved from removing bad content to having a much larger role in creating the content and product policies governing a company’s products and services, as well as developing the tools, systems, and techniques for enforcing those policies.
  • The advent of social media has increased the need for user-generated content (UGC) to be moderated, and this moderation is a major focus of T&S work.

Creating and Enforcing Policy

  • Policy is the set of rules and guidelines that a platform uses to govern the conduct of its users.
  • Policy often exists in two forms: an external document providing an overview of the company’s expectations of user behavior, and an internal document detailing exactly how and when to apply the policies in making specific decisions.
  • Policies change in scope over time due to a variety of factors, including changes in societal or user behavior, shifts in the regulatory landscape, and improvements in detection capabilities.
  • Policy models vary in their scope, and different types are used by different websites and platforms. Similarly, the abuse that policy intends to mitigate varies across platforms; some examples include Violent and Criminal Behavior, Regulated Goods and Services, Offensive & Objectionable Content, User Safety, Scaled Abuse, and Deceptive & Fraudulent Behavior.

Content Moderation and Operations 

  • Content moderation is the process of reviewing online user-generated content (UGC) for compliance against a digital platform’s policies regarding what is and is not allowed to be shared on their platform.
  • Moderation is done through human review or automated tooling, or a combination of both (a minimal routing sketch follows this list).
  • Content moderation results in a few different potential enforcement actions, such as content deletion, banning, temporary suspension, feature blocking, reducing visibility, labeling, demonetization, withholding payments, and referral to law enforcement.
  • Content moderation is important for many reasons, including ensuring safety and privacy, supporting free expression, and generating trust, revenue, and growth.  
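As noted in the list above, many platforms combine automated tooling with human review. Below is a minimal Python sketch of one common pattern — automated handling of high-confidence cases, with the ambiguous middle routed to a human queue. The thresholds, function names, and score source are invented for illustration; this is not any platform’s real system:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One piece of user-generated content awaiting review."""
    item_id: str
    text: str


def route(item: Item, violation_score: float) -> str:
    """Route UGC based on a hypothetical classifier score in [0, 1].

    High-confidence violations are actioned automatically; uncertain
    cases go to a human moderator; low scores are left alone.
    """
    if violation_score >= 0.95:   # confident violation: act automatically
        return "auto_remove"
    if violation_score >= 0.40:   # ambiguous: escalate to human review
        return "human_review_queue"
    return "no_action"            # likely benign: leave the content up


print(route(Item("c1", "example comment"), 0.97))  # auto_remove
print(route(Item("c2", "example comment"), 0.55))  # human_review_queue
```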

Policy Tiers, Complexity, and Enforcement Actions 

Not all policies are created or enforced equally. Most platforms rank certain violations higher than others due to a variety of factors. These policies will be enforced as P0 and receive the highest priority in review, which may mean a quicker turnaround time, stricter SLAs, and a more specialized moderation skillset. Certain policies may also be more complex than others. Some require a binary or straightforward decision, while others require several steps to reach a determination. Others still may require a more contextual review, may cross-reference several policies, or may demand additional operational nuance.
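To make the tiering concrete, here is a minimal sketch — with invented tier names and SLA windows, since real values vary by platform — of how a review queue might always surface P0 jobs first and attach a deadline derived from each tier’s SLA:

```python
import heapq
from datetime import datetime, timedelta, timezone

# Hypothetical tier -> (priority, SLA window) mapping; real values vary.
TIERS = {
    "P0": (0, timedelta(hours=1)),   # e.g. imminent-harm violations
    "P1": (1, timedelta(hours=24)),
    "P2": (2, timedelta(days=3)),
}

queue: list[tuple[int, datetime, str]] = []


def enqueue(item_id: str, tier: str) -> None:
    """Add a review job; the heap orders by priority, then deadline."""
    priority, sla = TIERS[tier]
    deadline = datetime.now(timezone.utc) + sla
    heapq.heappush(queue, (priority, deadline, item_id))


enqueue("post-17", "P2")
enqueue("stream-04", "P0")   # popped first despite being enqueued later
enqueue("comment-9", "P1")

while queue:
    priority, deadline, item_id = heapq.heappop(queue)
    print(f"review {item_id} (P{priority}) by {deadline.isoformat(timespec='minutes')}")
```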

Most platforms also have different types of enforcement actions that correlate to the severity and priority of the violation. If content violates more than one policy, a moderator may need to choose one from several enforcement options. Enforcement actions include, but are not limited to: temporary suspensions, permanent bans, temporary removal of access to parts or features of a product, rate limiting, and shadow banning accounts.
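One common rule of thumb when content violates several policies at once is to apply the single most severe applicable action rather than stacking every option. The sketch below illustrates that idea with an invented severity ordering and a hypothetical policy-to-action mapping:

```python
from enum import IntEnum


class Action(IntEnum):
    """Enforcement actions ordered least to most severe (assumed ordering)."""
    LABEL = 1
    REDUCE_VISIBILITY = 2
    FEATURE_BLOCK = 3
    TEMP_SUSPENSION = 4
    PERMANENT_BAN = 5


# Hypothetical policy -> default action mapping, for illustration only.
POLICY_ACTIONS = {
    "spam": Action.REDUCE_VISIBILITY,
    "graphic_violence": Action.LABEL,
    "credible_threat": Action.PERMANENT_BAN,
}


def choose_action(violated_policies: list[str]) -> Action:
    """Return the most severe action among all violated policies."""
    return max(POLICY_ACTIONS[p] for p in violated_policies)


print(choose_action(["spam", "graphic_violence"]).name)  # REDUCE_VISIBILITY
print(choose_action(["spam", "credible_threat"]).name)   # PERMANENT_BAN
```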

By the end of this module, you will: 

  • Apply the concepts previously learned in the T&S Curriculum to understand what content moderation looks like in context; 
  • Understand the different enforcement actions taken in accordance with moderation scenarios;
  • Engage in moderation of fictitious scenarios to make enforcement decisions. 

Content Moderation Simulation Module

The following 20 questions cover a range of scenarios a content moderator might find themselves evaluating. While a variety of policy categories and situations are included in these scenarios, they are not exhaustive. This simulation will give you a taste of the real work content moderators do, but it is not meant to be a full-fledged reenactment. We’ve endeavored to sanitize any content that might be overly offensive for the sake of this exercise; however, please be mindful that content moderation can take a mental toll on those working in the field. As such, it’s important to be aware of the potential for reviewing triggering content before jumping in and doing the work. These scenarios focus purely on human review: they do not describe how the content made its way to the moderator in the first place, nor do they reference automated enforcement mechanisms (like classifiers), which are often used in tandem with human moderators. With all of this in mind, let’s start moderating!

1 / 20

A user writes a comment on their friend’s post, saying “if you bring up this topic again, I will feed you to a velociraptor 😂”

2 / 20

A video of a political protest from a local news channel includes images of nude protestors holding signs and chanting. Shots are framed to avoid showing bare nipples or genitalia. In wider shots the bodies of protestors are pixelated.

3 / 20

A user makes a comment about women’s ability to drive and jokes that they are naturally bad drivers.

4 / 20

A user states the Holocaust is a hoax perpetrated by the Allies, Jews, and/or the Soviet Union.

5 / 20

A user jokes with her friend, a guest on her livestream, making fun of her for losing a game they were playing. The guest responds by telling her friend to “go die!”

6 / 20

A video of a social commentator discussing public morals includes clips taken from a pornographic movie. The participants in the clips are clearly engaged in an explicit sex act, but due to the camera angle no genitalia are visible.

7 / 20

A user receives a private message that reads: “hey. i'll give you $100 for sex.”

8 / 20

A user account uses the name and a publicly available photo of a famous model and links to services where fans can donate money. The account previously used the name and photo of a different person and was set up in a different country. The model in question already has a separate account that is verified and trusted.

9 / 20

Review this image to see if it violates policy. The accompanying text reads: “I love martial arts!”

10 / 20

A user posts a photo in which they have a rope in their hands and are tying a noose around the handrail of a staircase. The caption on the photo reads: “i’ve accepted my fate…no turning back now.”

11 / 20

A user posts frequently about suicidal tendencies and comments that suicide could be the best solution if you’re going through a rough time.

12 / 20

Review this image to see if it violates policy.

13 / 20

An ad with this photo includes the text: “Assorted pain meds / opiates, leftover prescription.”

14 / 20

A user shares a story about her recent car accident and then shares a photo of her injury; the photo depicts exposed tendons in her hand.

15 / 20

A user posts the image above with the text: “All hail Osama Bin Laden and the great work he has done for our people! None can compare to his genius.”

16 / 20

A user displays the flag of ISIS as the main background of their video while the audio is a commentary condemning the acts of ISIS in the Middle East.

17 / 20

A user plays a video game on livestream in which a swastika symbol is displayed, as shown in the attached image.

18 / 20

A user calls out immigrants in their country, saying that they are degrading the country's culture and should go back to their own land.

19 / 20

A user announces on a livestream that he is on his way to “shoot up a public school in the next block.”

20 / 20

A user posts a photo of a rat with a caption that compares people of a certain race to the rodent in the photo.


Acknowledgements

Authors│James Gresham, Abigail Schmidt, Noor Haneyah, Allison Weilandics
Contributors│Harsha Bhatlapenumarthy, Dona Bellow, Sarah Godlewski
Special Thanks│Automattic Team