Defining abuse types is the first step in policy creation; the next is determining how abuse policies are enforced. Broadly, policies can be enforced reactively, after a user, trusted flagger, or other entity flags content, or proactively, often using artificial intelligence to identify abusive content or behavior before it is flagged or even available on a platform. (See the Content Moderation and Operations chapter and the Automated Systems and AI chapter for additional information on how proactive content moderation is performed.)
Whether policies are enforced reactively or proactively depends on a variety of factors and differs from company to company. Key considerations when choosing an enforcement method include resources, user expectations of privacy, abuse types, and legal considerations, among other factors.
Smaller services often lack the engineering and operations resources to build the machine learning classifiers that proactive enforcement requires, and as a result rely more on user flags to identify abusive content. Such services may also lack human reviewers or moderators to check user-generated content for policy compliance before it is posted. While some commercial solutions exist to identify abusive content, such as profanity or explicit-content filters, such products often require customization to be useful.
User Expectations of Privacy
Proactive enforcement often makes sense when a product or service’s user-generated content is publicly accessible or viewable by anyone. For example, tweets or user reviews on Google Maps are public, and thus screening out hate speech, harassment, and sexually explicit content is essential to keeping the service from being overwhelmed with abusive content. There is also little expectation of privacy for content that is posted publicly, and therefore a higher expectation that the product or service will proactively review public content for policy adherence. On the other hand, products and services in which content is shared with a small audience or directly with another user tend to rely more heavily on reactive reporting, because proactive enforcement can intrude on users’ sense of privacy. For example, users of Skype or Snapchat direct messaging expect that their conversations remain private and are not scanned or monitored for abusive content. While some forms of proactive enforcement can occur without reviewing content, often by using metadata or behavior-pattern analysis, many forms of abuse can only be determined by examining the content itself.
Products and services can often calibrate the method of enforcement based on the type of content or behavior. Proactive enforcement is usually applied when the abuse type is disruptive, widely unwanted, severe in nature, and clearly defined. It may make sense to proactively detect and filter spam, for example, because it is disruptive and widely unwanted, and failing to do so would hurt the product or service’s popularity. For more severe forms of abuse, such as child sexual abuse material (CSAM), a product or service may choose to proactively identify such content even in more private settings, such as 1:1 chats, where user expectations of privacy are higher. For less severe policy violations, waiting for a user flag may make more sense. Abuse types that are clearly defined and have few “gray areas” may also be more suitable for proactive enforcement because machine tools such as hash matching or machine learning classifiers can identify such content with high accuracy, leading to fewer false positives that could result in wrongful enforcement actions or overwhelm manual review systems.
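The hash matching mentioned above can be illustrated with a minimal sketch. This example uses exact cryptographic hashes for simplicity; production systems typically use perceptual hashes (such as PhotoDNA or PDQ) that also match near-duplicates. The hash below is just the SHA-256 of a placeholder string, not real data.

```python
import hashlib

# Hypothetical blocklist of hashes of known-violating files.
# (This entry is the SHA-256 of the placeholder b"test".)
KNOWN_VIOLATION_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def matches_known_content(upload_bytes: bytes) -> bool:
    """Return True if an upload exactly matches known violating content."""
    digest = hashlib.sha256(upload_bytes).hexdigest()
    return digest in KNOWN_VIOLATION_HASHES
```

Because the lookup is a set-membership check, matching scales to very large blocklists, which is one reason hash matching produces so few false positives for clearly defined abuse types.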
In certain cases, proactive enforcement may be legally required. For example, India’s Intermediary Liability Guidelines (2021) require significant social media intermediaries to “deploy technology-based measures, including automated tools or other mechanisms” to proactively detect and filter content depicting rape, CSAM, or content removed previously. Similarly, the European Union’s Copyright Directive (2019) instructs online content-sharing service providers to make “best efforts” to block future uploads of content that has previously been removed pursuant to a takedown notice. As the Brookings Institution noted, “This will force internet services to implement filters to check every piece of content uploaded to the site against a database of known copyrighted works.”
Determining the enforcement method to use can also depend on a range of other factors. Product ethos, the way in which a product or service is framed to the public, can often be a key consideration. Products that optimize for safety may select a more proactive enforcement regime, whereas products that prioritize free expression and a more hands-off attitude may prefer to act against abuse only when such content is reported (in addition to having fewer abuse policies in general). Proactive enforcement can also help identify abusive content where there is user complicity: members of a private Google Group dedicated to buying and selling drugs probably will not flag abusive content and behavior occurring within the group, because they all benefit from it. Even so, proactive enforcement has its limits: artificial intelligence cannot catch every novel form of abusive content and behavior, and human reviewers are imperfect.
Determining which enforcement action moderators should take in response to a given policy violation is an essential part of designing a fair, comprehensive policy and enforcement system. There are a wide variety of possible actions, each with different tradeoffs. Finding the right balance depends on a variety of factors. These include the nature and severity of the violation, the type of content (e.g., sponsored or organic content), the potential harm involved, and the likely response and nature of the responsible party. The following are types of enforcement regularly used throughout the industry, roughly ordered from simple and common actions to more complex and rarely used actions.
Professor of Law at Santa Clara University and TSPA co-founder Eric Goldman breaks down enforcement actions, or “remedy options,” into five broad categories—content regulation, account regulation, visibility reductions, monetary, and other—and provides examples of each in the table below.
| Content Regulation | Account Regulation | Visibility Reductions | Monetary | Other |
| --- | --- | --- | --- | --- |
| Remove content | Terminate account | Shadow ban | Forfeit accrued earnings | Educate users |
| Suspend content | Suspend account | Remove from external search index | Terminate future earnings (by item or account) | Assign strikes/warnings |
| Relocate content | Suspend posting rights | No-follow author’s links | Suspend future earnings (by item or account) | Outing/unmasking |
| Edit/redact content | Remove credibility badges | Remove from internal search index | Fine author/impose liquidated damages | Report to law enforcement |
| Interstitial warning | Reduce service levels (data, speed, etc.) | Downgrade internal search visibility | | Put user/content on blocklist |
| Add warning legend | Shaming | No auto-suggest | | Community service |
| Add counterspeech | | No/reduced internal promotion | | Restorative justice/apology |
| Disable comments | | No/reduced navigation links | | |
| | | Display only to logged-in readers | | |
Content deletion is the removal of a piece of policy-violating content from the platform. It is the most common action taken by platforms for a wide range of abuse types. Not all violating content is deliberate or malicious, and it is common for content to be removed with no further consequences to the user.
Banning is the permanent removal or blocking of a user or account from the platform. This can also extend to banning any new accounts (if they can be identified) that the user attempts to create or uses to access the platform. Content from before an account is banned may still be visible or may be removed as part of the ban, depending on the nature of both the platform and the policy violation.
Banning can be the result of a single serious violation or multiple smaller violations. Industrial models in particular often use strike-based systems to ban repeat violators.
Banning and content deletion are the simplest and most intuitive forms of penalty, and as such are usually the first enforcement options implemented on nearly all platforms, across all enforcement models, and at all scales.
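The strike-based systems mentioned above can be sketched in a few lines. The three-strike threshold and the return labels here are illustrative assumptions; real platforms tune thresholds per abuse type and often weight strikes by severity.

```python
from dataclasses import dataclass

STRIKES_BEFORE_BAN = 3  # hypothetical threshold

@dataclass
class Account:
    user_id: str
    strikes: int = 0
    banned: bool = False

def apply_violation(account: Account, severe: bool = False) -> str:
    """Delete the violating content; ban on a single severe violation
    or once accumulated strikes reach the threshold."""
    if severe:
        account.banned = True
        return "content_deleted_account_banned"
    account.strikes += 1
    if account.strikes >= STRIKES_BEFORE_BAN:
        account.banned = True
        return "content_deleted_account_banned"
    return "content_deleted"
```

Note that content deletion happens on every violation, while the ban is reserved for either a single severe violation or the accumulation of minor ones, mirroring the two paths to banning described above.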
Temporary suspension is identical to banning, but lasts only for either a specified period of time or until the user completes certain specified actions, after which the account is automatically reinstated.
Timed temporary suspensions of escalating length are frequently used as a precursor to banning a user permanently, in an attempt to both change user behavior and to convey the compounding seriousness of repeated policy violations.
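An escalation ladder of this kind can be modeled as a simple lookup keyed by the number of prior suspensions. The durations below are assumptions for illustration, not any platform’s actual policy.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical escalation ladder of suspension lengths.
SUSPENSION_LADDER = [
    timedelta(days=1),
    timedelta(days=7),
    timedelta(days=30),
]

def next_suspension(prior_suspensions: int) -> Optional[timedelta]:
    """Return the next suspension length for a repeat violator,
    or None to signal that the ladder is exhausted and a
    permanent ban applies."""
    if prior_suspensions < len(SUSPENSION_LADDER):
        return SUSPENSION_LADDER[prior_suspensions]
    return None
```

Encoding the ladder as data rather than logic makes it easy for policy teams to adjust escalation lengths without code changes.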
Contingent temporary suspensions are most frequently used in three situations: (1) when the company suspects a user’s account is compromised; (2) when the company wants the user to read and acknowledge a set of policies before returning to the site to help minimize repeated abusive behavior; or (3) when the company needs the user to contact support teams to answer questions as part of investigating an incident.
Feature blocking encompasses any restriction of access to certain features of a platform based on previous actions of a user, either temporarily or permanently. This might involve specifically removing access to features that have been misused in the past, or to features that would generally be considered higher risk or are more difficult to moderate, such as live streaming. Feature blocking has the advantage of allowing users to remain active on the platform, while minimizing potential harm from their actions. Historically, feature blocking has been mostly used on large, established platforms with diverse features that might justify a specific feature blocking action.
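A feature block can be modeled as a per-user set of restricted capabilities that is checked before each feature is served. The feature names and in-memory store below are hypothetical; a real system would persist these restrictions and attach expiry times for temporary blocks.

```python
# user_id -> set of blocked feature names (a module-level dict stands in
# for what would be a persistent store in production)
_blocked_features: dict[str, set[str]] = {}

def block_feature(user_id: str, feature: str) -> None:
    """Restrict a user's access to a single feature, e.g. live streaming."""
    _blocked_features.setdefault(user_id, set()).add(feature)

def can_use(user_id: str, feature: str) -> bool:
    """Check the block list before exposing a feature to the user."""
    return feature not in _blocked_features.get(user_id, set())
```

The key design property is granularity: the user’s account and all other features remain untouched, which is what allows the user to stay active while the highest-risk surface is closed off.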
Reducing visibility refers to steps that reduce how often and how prominently a piece of content or an account is viewed. These steps are most often used on platforms where the product itself guides and curates a user’s experience with algorithms. Visibility reductions may be imposed directly by platforms practicing the Centralized/Industrial model of T&S, or result from user downvotes on platforms using Community Reliant models.
The form that visibility reduction may take can vary significantly based on the nature of the product itself. Common examples include removing the user/content from features such as recommendations or trending stories; downranking the user’s or content’s position in search results or feeds; and auto-collapsing comments on threaded posts.
Reducing visibility is distinct from the more general field of search/feed quality, which evaluates whether content is relevant, interesting, or useful for ranking purposes based on views and clicks. Reducing visibility is a deliberate action to minimize viewing because of the nature of the content. However, there is occasionally crossover between these fields such as when a Community Reliant platform hides content with large numbers of downvotes.
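The distinction can be seen in a minimal ranking sketch: the relevance score is computed as usual, and a separate, deliberately applied demotion factor determines where flagged content lands. The tier names and factors here are illustrative assumptions, not any platform’s real values.

```python
# Illustrative demotion factors applied on top of normal relevance scoring.
DEMOTION_FACTORS = {
    "normal": 1.0,
    "borderline": 0.5,   # e.g., excluded from recommendations
    "restricted": 0.1,   # e.g., near-invisible in feeds
}

def ranked_feed(items: list[tuple[str, float, str]]) -> list[str]:
    """items: (content_id, relevance_score, visibility_tier) triples.
    Returns content IDs ordered by demoted score, highest first."""
    demoted = [
        (cid, score * DEMOTION_FACTORS[tier])
        for cid, score, tier in items
    ]
    return [cid for cid, _ in sorted(demoted, key=lambda p: p[1], reverse=True)]
```

Keeping relevance scoring and T&S demotion as separate multiplicative terms mirrors the separation described above: search/feed quality teams own the former, while enforcement decisions set the latter.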
Labeling involves attaching a message to a user or piece of content to provide information to the viewer. These labels can be used to inform the viewer of any concerns or of important information relevant to the content or the topic discussed. For example, labels may be used to provide authoritative information on a topical subject which is more prone to false information (e.g., COVID), or to comply with laws such as POFMA.
Labels can be placed either next to the content or over the content so that it is not visible unless a viewer chooses to click through to see it. The latter is often used to hide content that is allowed on a platform but is potentially offensive or disturbing. The most common example is age-restricted content involving violence or nudity, which requires the user to click through and/or sign in.
Occasionally, labels or other messaging can be presented to the original user who posted the content, rather than those who view the content. The most common example of this is support and helpline information for users at risk of suicide.
Demonetization is an action that prevents users from earning income and specifically applies to platforms where users can earn money from their content, usually through advertising. Demonetization is often applied to content that is allowed on the platform, but which is controversial or which advertisers may not wish to sponsor or be directly associated with.
Withholding payments is similar to demonetization but applies specifically to marketplaces where the company serves as an intermediary between a buyer and a seller. It is most often imposed in response to suspected compromised accounts and fraudulent sales, but may also be used in conjunction with banning and referral to law enforcement as a penalty for serious criminal behavior.
Referral to Law Enforcement
Referral to law enforcement is the most serious action that platforms can take and involves forwarding details of the user and their situation to law enforcement or other related third parties such as the National Center for Missing and Exploited Children (NCMEC). Situations where law enforcement may be contacted include:
- People at imminent risk of serious self harm or suicide;
- People planning, threatening, or committing crimes involving serious physical harm such as murder;
- The sale of dangerous and/or illegal goods;
- Terrorists and terrorist organizations;
- Abuse of children, including the sharing of video and images;
- Human trafficking.
Violations of this type are generally rare compared to other forms of online abuse. However, because these situations are extremely serious and often time-sensitive, having clear policy guidelines and action plans for whom to contact, and how, is essential to handling them effectively.
In addition to proactive reports by a company to law enforcement, law enforcement may also make legal requests of platforms for information on users for criminal, civil, regulatory, or other legal reasons. These requests can be made with or without a court order or similar legal ruling. Some of these situations may involve imminent and time sensitive threats, so such requests are generally very high priority. For more information, see the Trust & Safety and Law Enforcement chapter.