Content Moderation Simulation

Content moderation—the process of reviewing content created and shared by users to see if it complies with a digital platform’s policies—is a challenging and complex undertaking, especially at scale. But before we dive deeper into content moderation, let’s take a moment to review some of the key takeaways covered in earlier T&S Curriculum chapters:

Industry Overview

  • Trust & Safety (T&S) is a term used to describe the teams at internet companies and service providers that work to ensure users are protected from harmful and unwanted experiences.
  • T&S has evolved from removing bad content to having a much larger role in creating the content and product policies governing a company’s products and services, as well as developing the tools, systems, and techniques for enforcing those policies.
  • The advent of social media has increased the need for user-generated content (UGC) to be moderated, and this moderation is a major focus of T&S work.

Creating and Enforcing Policy

  • Policy is the set of rules and guidelines that a platform uses to govern the conduct of its users.
  • Policy often exists in two forms: an external document providing an overview of the company’s expectations of user behavior, and an internal document detailing exactly how and when to apply the policies in making specific decisions.
  • Policies change in scope over time due to a variety of factors, including changes in societal or user behavior, shifts in the regulatory landscape, and improvements in detection capabilities.
  • Policy models vary in their scope, and different types are used by different websites and platforms. Similarly, the abuse that policy intends to mitigate varies across platforms; some examples include Violent and Criminal Behavior, Regulated Goods and Services, Offensive & Objectionable Content, User Safety, Scaled Abuse, and Deceptive & Fraudulent Behavior.

Content Moderation and Operations 

  • Content moderation is the process of reviewing online user-generated content (UGC) for compliance against a digital platform’s policies regarding what is and is not allowed to be shared on their platform.
  • Moderation is done through human review or automated tooling, or a combination of both (a minimal routing sketch follows this list).
  • Content moderation results in a few different potential enforcement actions, such as content deletion, banning, temporary suspension, feature blocking, reducing visibility, labeling, demonetization, withholding payments, and referral to law enforcement.
  • Content moderation is important for many reasons, including ensuring safety and privacy, supporting free expression, and generating trust, revenue, and growth.  
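As noted in the list above, many platforms combine automated tooling with human review. Below is a minimal Python sketch of one common pattern — automated handling of high-confidence cases, with the ambiguous middle routed to a human queue. The thresholds, function names, and score source are invented for illustration; this is not any platform’s real system:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One piece of user-generated content awaiting review."""
    item_id: str
    text: str


def route(item: Item, violation_score: float) -> str:
    """Route UGC based on a hypothetical classifier score in [0, 1].

    High-confidence violations are actioned automatically; uncertain
    cases go to a human moderator; low scores are left alone.
    """
    if violation_score >= 0.95:   # confident violation: act automatically
        return "auto_remove"
    if violation_score >= 0.40:   # ambiguous: escalate to human review
        return "human_review_queue"
    return "no_action"            # likely benign: leave the content up


print(route(Item("c1", "example comment"), 0.97))  # auto_remove
print(route(Item("c2", "example comment"), 0.55))  # human_review_queue
```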

Policy Tiers, Complexity, and Enforcement Actions 

Not all policies are created or enforced equally. Most platforms rank certain violations higher than others due to a variety of factors. These policies will be enforced as P0 and receive the highest priority in review, which may mean a quicker turnaround time, stricter SLAs, and a more specialized moderation skillset. Certain policies may also be more complex than others. Some require a binary or straightforward decision, while others require several steps to reach a determination. Others still may require a more contextual review, may cross-reference several policies, or may demand additional operational nuance.
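To make the tiering concrete, here is a minimal sketch — with invented tier names and SLA windows, since real values vary by platform — of how a review queue might always surface P0 jobs first and attach a deadline derived from each tier’s SLA:

```python
import heapq
from datetime import datetime, timedelta, timezone

# Hypothetical tier -> (priority, SLA window) mapping; real values vary.
TIERS = {
    "P0": (0, timedelta(hours=1)),   # e.g. imminent-harm violations
    "P1": (1, timedelta(hours=24)),
    "P2": (2, timedelta(days=3)),
}

queue: list[tuple[int, datetime, str]] = []


def enqueue(item_id: str, tier: str) -> None:
    """Add a review job; the heap orders by priority, then deadline."""
    priority, sla = TIERS[tier]
    deadline = datetime.now(timezone.utc) + sla
    heapq.heappush(queue, (priority, deadline, item_id))


enqueue("post-17", "P2")
enqueue("stream-04", "P0")   # popped first despite being enqueued later
enqueue("comment-9", "P1")

while queue:
    priority, deadline, item_id = heapq.heappop(queue)
    print(f"review {item_id} (P{priority}) by {deadline.isoformat(timespec='minutes')}")
```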

Most platforms also have different types of enforcement actions that correlate to the severity and priority of the violation. If content violates more than one policy, a moderator may need to choose one from several enforcement options. Enforcement actions include, but are not limited to: temporary suspensions, permanent bans, temporary removal of access to parts or features of a product, rate limiting, and shadow banning accounts.
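One common rule of thumb when content violates several policies at once is to apply the single most severe applicable action rather than stacking every option. The sketch below illustrates that idea with an invented severity ordering and a hypothetical policy-to-action mapping:

```python
from enum import IntEnum


class Action(IntEnum):
    """Enforcement actions ordered least to most severe (assumed ordering)."""
    LABEL = 1
    REDUCE_VISIBILITY = 2
    FEATURE_BLOCK = 3
    TEMP_SUSPENSION = 4
    PERMANENT_BAN = 5


# Hypothetical policy -> default action mapping, for illustration only.
POLICY_ACTIONS = {
    "spam": Action.REDUCE_VISIBILITY,
    "graphic_violence": Action.LABEL,
    "credible_threat": Action.PERMANENT_BAN,
}


def choose_action(violated_policies: list[str]) -> Action:
    """Return the most severe action among all violated policies."""
    return max(POLICY_ACTIONS[p] for p in violated_policies)


print(choose_action(["spam", "graphic_violence"]).name)  # REDUCE_VISIBILITY
print(choose_action(["spam", "credible_threat"]).name)   # PERMANENT_BAN
```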

By the end of this module, you will: 

  • Apply the concepts previously learned in the T&S Curriculum to understand what content moderation looks like in context; 
  • Understand the different enforcement actions taken in accordance with moderation scenarios;
  • Engage in moderation of fictitious scenarios to make enforcement decisions. 

Content Moderation Simulation Module

The following 20 questions cover a range of scenarios a content moderator might find themselves evaluating. While a variety of policy categories and situations are included in these scenarios, they are not exhaustive. This simulation will give you a taste of the real work content moderators do, but it is not meant to be a full-fledged reenactment. We’ve endeavored to sanitize any content that might be overly offensive for the sake of this exercise; however, please be mindful that content moderation can take a mental toll on those working in the field. As such, it’s important to be aware of the potential for reviewing triggering content before jumping in and doing the work. These scenarios focus purely on human review: they do not describe how the content made its way to the moderator in the first place, nor do they reference automated enforcement mechanisms (like classifiers), which are often used in tandem with human moderators. With all of this in mind, let’s start moderating!

1 / 20

A user writes a comment on their friend’s post, saying “if you bring up this topic again, I will feed you to a velociraptor 😂”

2 / 20

A video of a political protest from a local news channel includes images of nude protestors holding signs and chanting. Shots are framed to avoid showing bare nipples or genitalia. In wider shots the bodies of protestors are pixelated.

3 / 20

A user makes a comment about women’s ability to drive and jokes that they are naturally bad drivers.

4 / 20

A user states the Holocaust is a hoax perpetrated by the Allies, Jews, and/or the Soviet Union.

5 / 20

A user jokes with her friend, a guest on her livestream, making fun of her for losing a game they were playing. The guest responds by telling her friend to “go die!”

6 / 20

A video of a social commentator discussing public morals includes clips taken from a pornographic movie. The participants in the clips are clearly engaged in an explicit sex act, but due to the camera angle no genitalia are visible.

7 / 20

A user receives a private message that reads: “hey. i'll give you $100 for sex.”

8 / 20

A user account uses the name and a publicly available photo of a famous model and links to services where fans can donate money. The account previously used the name and photo of a different person and was set up in a different country. The model in question already has a separate account that is verified and trusted.

9 / 20

Review this image to see if it violates policy. The accompanying text reads: “I love martial arts!”

10 / 20

A user posts a photo in which they have a rope in their hands and are tying a noose around the handrail of a staircase. The caption on the photo reads: “i’ve accepted my fate…no turning back now.”

11 / 20

A user posts frequently about suicidal tendencies and comments that suicide could be the best solution if you’re going through a rough time.

12 / 20

Review this image to see if it violates policy.

13 / 20

An ad with this photo includes the text: “Assorted pain meds / opiates, leftover prescription.”

14 / 20

A user shares a story about her recent car accident and then shares a photo of her injury; the photo depicts exposed tendons in her hand.

15 / 20

A user posts the image above with the text: “All hail Osama Bin Laden and the great work he has done for our people! None can compare to his genius.”

16 / 20

A user displays the flag of ISIS as the main background of their video while the audio is a commentary condemning the acts of ISIS in the Middle East.

17 / 20

A user plays a video game on livestream in which a swastika symbol is displayed, as shown in the attached image.

18 / 20

A user calls out immigrants in their country, saying that they are degrading the country's culture and should go back to their own land.

19 / 20

A user announces on a livestream that he is on his way to “shoot up a public school in the next block.”

20 / 20

A user posts a photo of a rat with a caption that compares people of a certain race to the rodent in the photo.


Acknowledgements

Authors│James Gresham, Abigail Schmidt, Noor Haneyah, Allison Weilandics
Contributors│Harsha Bhatlapenumarthy, Dona Bellow, Sarah Godlewski
Special Thanks│Automattic Team