Automation is the creation and application of technologies to produce and deliver services with minimal human intervention. Trust and safety professionals use automation in a myriad of ways to achieve several core trust and safety functions and objectives. In particular, automation can improve enforcement scale and speed, efficiency, consistency, and accuracy.
In this chapter, automation includes technologies that assist human review (e.g., by flagging content, or portions of it, that may violate a service’s community guidelines or content policy) as well as technologies deployed for “automated decision making” (i.e., there is no human in the loop before a decision is reached about a particular piece of content or an account). Trust and safety teams also use automation tools more broadly, as discussed below.
Enforcement Scale and Speed
One of the never-ending challenges for trust and safety is identifying large volumes of harmful, unwanted, or otherwise policy-violating content, conduct, or accounts before users are exposed to them. Automation technology plays a critical role in proactively identifying such content at a speed and scale impossible to replicate with manual operations. For example, according to Facebook’s Community Standards Enforcement Report (Q2 2021), automated technologies such as machine learning classifiers proactively take action on more than 90% of content in areas such as hate speech, sexually explicit content, and child sexual exploitation before users see it. Similarly, YouTube’s Transparency Report shows that from April to June 2021 more than 94% of removed videos were first detected by “automated flagging,” with the remainder taken down after reports from users, nongovernmental organizations (NGOs), trusted flaggers, or government agencies.
The use of automation to proactively identify and remove policy-violating content has increased significantly since the last half of the 2010s. Before then, most platforms deployed automated detection sparingly and usually only in narrow circumstances, such as obvious low-quality spam, exact matches to known child sexual abuse material, and Digital Millennium Copyright Act violations. Both technological advances and greater public attention on contextual issues such as misinformation and hate speech have been major contributors to the expanded use of automation.
Efficiency
Efficiency is a critical operational objective of trust and safety teams. Wisely deploying limited review capacity, specialized knowledge, tools, and resources is essential for teams to deliver as much positive impact as possible. Automation technologies can improve efficiency in three primary ways: (1) by minimizing the number of human reviewers needed to make a decision; (2) by routing content or accounts that need to be reviewed to the most appropriate human reviewer (e.g., reviewers with the appropriate language skills or subject matter knowledge); and (3) by prioritizing reviews in the most impactful way possible.
Below are specific examples of how automation systems can be used to improve efficiency:
- Grouping duplicate user abuse reports into a single report so that content policy review teams don’t have to review the same content repeatedly.
- Grouping content that is similar or that comes from a common source such as a single user, so it can be reviewed together in a single bulk action.
- Grouping similar types of content and suspected policy violations into dedicated queues, so reviewers don’t have to spend as much time switching between different review types, content types, or policies.
- Automatically detecting the language of a piece of content, allowing content to be routed to the reviewer with the appropriate language expertise.
- Automatically removing content that is an exact or near exact match to content that has been previously deemed to violate policy.
- Identifying and prioritizing user reports needing review based on the probability of the content violating policy and severity of abuse.
- Extracting keywords or objects to aid in reviewer decisions about whether to remove, warn, limit viewability, or take other moderation measures.
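As a minimal illustration of the grouping and prioritization bullets above, the sketch below groups duplicate reports by an exact content hash and ranks the resulting queue by a violation-probability-times-severity score. All report data, field names, and weights here are invented for illustration; real systems would use near-duplicate (perceptual) hashing and model-produced scores.

```python
import hashlib
from collections import defaultdict

# Hypothetical abuse reports: (report_id, reported_content,
# model-estimated violation probability, policy severity weight).
reports = [
    ("r1", "buy cheap pills now", 0.92, 1),
    ("r2", "buy cheap pills now", 0.92, 1),  # duplicate of r1
    ("r3", "graphic violence clip", 0.75, 5),
]

def content_key(content: str) -> str:
    # Exact-match key; production systems also use perceptual hashing
    # to catch lightly modified copies of the same content.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# 1) Group duplicate reports so reviewers see each item only once.
grouped = defaultdict(list)
for report_id, content, prob, severity in reports:
    grouped[content_key(content)].append((report_id, prob, severity))

# 2) Rank the review queue by probability of violation x severity of harm,
#    so the likeliest and most harmful violations are reviewed first.
queue = sorted(
    ((entries[0][1] * entries[0][2], [r[0] for r in entries])
     for entries in grouped.values()),
    key=lambda item: item[0],
    reverse=True,
)
```

Here the suspected graphic-violence report outranks the spam reports despite a lower violation probability, because its severity weight dominates the score.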
Consistency
Automated systems can also play a critical role in ensuring policy enforcement decisions are made the same way each time. Without consistency, users can become frustrated when the same or a nearly identical piece of content or type of behavior is removed or penalized when shared by one user but permitted when shared by another. While in select cases there may be a rationale for such inconsistency (e.g., a policy change or additional context), in many cases it is simply the result of multiple content reviewers reaching different decisions about the same case. Automated systems can help ensure that content deemed policy-violating is removed consistently across a platform and that similar content is also actioned appropriately. This can be done via basic rule-setting, hash matching, or blocklisting (topics discussed in more detail later in this chapter).
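A toy version of the hash-matching and blocklisting approach mentioned above might look like the following. The hash set and blocklist contents are placeholders, and real deployments typically use perceptual hashes so that near-identical content matches as well.

```python
import hashlib

# Placeholder data: hashes of content previously judged to violate policy,
# plus a simple keyword blocklist. Both are illustrative, not real rules.
KNOWN_VIOLATING_HASHES = {
    hashlib.sha256(b"previously removed content").hexdigest(),
}
BLOCKLIST = {"bannedterm"}

def should_remove(content: str) -> bool:
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if digest in KNOWN_VIOLATING_HASHES:
        # Identical content is always actioned the same way,
        # regardless of which user posted it or when.
        return True
    # Basic rule-setting: flag content containing blocklisted terms.
    return any(term in content.lower() for term in BLOCKLIST)
```

Because the decision is a deterministic function of the content itself, two uploads of the same material can never receive different outcomes, which is the consistency property described above.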
Accuracy
There will always be a degree of subjectivity and specific edge cases within community guidelines, standards, and policies. These instances can lead not only to inconsistency but also to inaccurate assessments or reviews. In some circumstances, automated systems can be more accurate and precise than human reviewers. This is especially true when an assessment requires analyzing a large amount of data about an account’s behavior or activity. Automated systems, for example, may detect botnets, fake accounts, or inauthentic engagement more accurately than human reviewers. Often, automated systems can provide much-needed context or tip-offs for human reviewers or analysts, who can then make more accurate assessments. However, if an automated system, such as a machine learning classifier used to detect sexually explicit content, is trained on incomplete, biased, or inaccurate data, it may have the opposite effect, unintentionally producing more inaccurate reviews. (AI bias is discussed in more detail later in this chapter.)
Other Trust and Safety Uses
Automated systems can also be leveraged for additional trust and safety purposes. For example, they can be used to limit the amount of traumatic content a reviewer has to see. One way to do this is to reduce the graphic detail of the content by blurring images and video, or by converting video to a series of screenshots. In many situations violations will still be evident and actionable without exposing the reviewer to the entire piece of content in full detail. Such features can be switched on and off, so the reviewer only views the full content when absolutely necessary.
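The blurring idea can be sketched as a simple pixelation pass. Here a grayscale image is modeled as a 2D list of brightness values purely for illustration; a production pipeline would apply the same downsampling idea to real media with an imaging library.

```python
def pixelate(image, block=2):
    """Replace each block x block tile with its average brightness,
    reducing graphic detail while keeping the overall scene recognizable."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Average the pixels in this tile...
            tile = [image[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            avg = sum(tile) // len(tile)
            # ...and paint the whole tile with that average.
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out
```

Increasing the block size discards more detail, which maps to the idea of a reviewer-controlled toggle: strong pixelation by default, full detail only when a decision requires it.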
Automated systems can also be used to monitor and manage the types and degrees of traumatic content to which reviewers are exposed. For example, such a system could be used to reduce the chance of reviewers having to see the same content repeatedly, limit extended exposure to specific policy violation types, or proactively suggest wellness breaks.
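One way such monitoring could work is a per-reviewer counter that flags when exposure limits are crossed. The thresholds and category names below are invented for illustration; real limits would be set in consultation with wellness professionals.

```python
from collections import Counter

# Hypothetical thresholds, for illustration only.
MAX_ITEMS_PER_CATEGORY = 25   # cap exposure to any one violation type
BREAK_AFTER_ITEMS = 100       # suggest a wellness break after this many reviews

class ExposureMonitor:
    """Tracks what one reviewer has seen and suggests interventions."""

    def __init__(self):
        self.seen = Counter()

    def record(self, category: str) -> list:
        """Log one reviewed item; return any recommended interventions."""
        self.seen[category] += 1
        actions = []
        if self.seen[category] >= MAX_ITEMS_PER_CATEGORY:
            actions.append(f"rotate reviewer off '{category}' queue")
        if sum(self.seen.values()) >= BREAK_AFTER_ITEMS:
            actions.append("suggest wellness break")
        return actions
```

A real system would also track which specific items a reviewer has seen, so the same traumatic content is not routed to them twice.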
Additionally, automated systems can be used to build and improve upon other automated systems—creating a flywheel effect in which technology helps “train” new technology to more consistently, precisely, and quickly detect particular types of content. Other potential uses of automation include efforts to gather more data around a new trust and safety related issue or content area, or measure the prevalence of certain forms of content or behavior. These potential uses are nascent but highlight how automated technologies will continue to impact nearly all areas of trust and safety in the future.
In-depth Look: Automation and the Christchurch Attack
The March 2019 attack by a lone gunman at two mosques in Christchurch, New Zealand reveals both the efficacy and the limits of automated systems deployed by large tech companies hosting user-generated content. The video of the massacre, which was initially live streamed by the attacker on Facebook, was quickly re-uploaded by users on Facebook as well as other major platforms, most notably YouTube and Twitter. These companies used automated tools in different ways, including removing millions of re-uploads of the video. According to Facebook, of the 1.5 million re-uploads of the video it removed in the days following the attack, 1.2 million were blocked by automated systems at the moment of upload. When users searched for information about the attack, automated tools used by Google, YouTube, and other major platforms served content produced by news agencies rather than unverified and often misinformation-laden content produced and posted by individuals.
At the same time, however, automated systems struggled to prevent the viral spread of variations of the video. Within days of the live stream, users had made more than 800 distinct edits of the footage, according to The Guardian. YouTube noted that small modifications, such as adding watermarks or logos or altering the size of the clips, made it harder for its automated systems to detect and remove the videos. As a result, clips proliferated and variations of the video continued to resurface on platforms months after the attack.
Since the 2019 massacre and the Christchurch Call to Action, many tech companies have adjusted their automated systems to detect harmful, policy-violating content more comprehensively and to limit the viral spread of similar content. They have also developed incident management protocols for responding to future crises. This in-depth look at the Christchurch attack highlights how automated systems are necessary but rarely sufficient for detecting and limiting the spread of policy-violating, and often deeply painful and harmful, content.