To measure and control the health and effectiveness of content moderation efforts, operations teams regularly monitor a variety of metrics that reflect the different goals of content moderation. While many of these metrics are extremely common, there is considerable industry variation in both precise definitions and naming conventions, including the same or similar names being used for different metrics. Please use caution when looking at terminology from different platforms.
Enforcement Volume Metrics
Enforcement volume metrics are the most basic metrics used in moderation operations, and represent counting events that are part of the moderation process. These metrics are cheap and easy to create as they have low technical requirements and don’t require any additional reviews or actions from reviewers beyond regular enforcement. As a result, they are usually among the first metrics implemented in moderation operations.
Incoming
Incoming captures the volume of content flagged for review, either through user reports or proactively by a platform’s detection mechanisms. It generally represents the number of reviews that the platform has decided to or has been requested to perform.
Incoming acts as the supply of potential reviews made available for reviewers to evaluate. As such, it should generally be kept balanced with available review resources. Falling out of balance can result in reviewer time being wasted when there is no content to review, or delays and failures to review potentially serious problems.
The number of user reports can be influenced by altering the reporting process – for example, creating friction by requiring more detail from the reporter. Proactive incoming volumes can be changed by simply raising or lowering the threshold for a review. However, while incoming volumes are relatively easy to control, any changes are likely to have substantial knock-on effects on other metrics and on users.
Closes
Closes capture the volume of content that was closed by a platform’s content moderation system, either through manual review or automated decisions. The exact definition of what counts as a close can vary. A close may require a concrete decision that content is or is not violating, or it may include situations where content does not need review, such as a previous review or the content already being deleted.
Together, incoming and close metrics help platforms assess demand and how their review capacity is being distributed and managed, so that they can plan supply accordingly.
Actions and Action Rate
Actions represents the number of decisions where an act of moderation was made as a direct result of the decision. Unlike closes, which include decisions that content was non-violating, actions only include those decisions where some material action, such as removal of content, was taken. Actions are also regularly monitored through action rate, the percentage of closed volumes that receive an action.
High action rates are often desirable as they can indicate that review efforts are being spent on decisions that materially affect the platform and its users.
Increasing action rates involves decreasing the amount of non-violating content coming in for review. This can be achieved by many methods including improving classifiers and other proactive detection systems, or educating users to improve the quality of user reports.
Actions metrics are also regularly used in public reporting. Reporting that a large number of accounts or pieces of content have been removed is a simple and effective way to communicate to users, regulators, and the public that moderation efforts are actively working to protect users and penalize bad actors, and to convey the large scales often involved in such efforts.
Actions and action rate are frequently broken down into different actions and action types. For example, it is common to separate actions where content was removed from a platform from actions where content was restricted such as applying a warning label. This helps to prevent confusing or misleading comparisons between severe violations – particularly rarer ones – and more common and less severe violations of policy.
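As a sketch, action rate and a per-action-type breakdown might be computed from a batch of close records like the following. The `decision` field and its values are illustrative, not a standard schema; real pipelines would read these from a decisions log.

```python
from collections import Counter

def action_rate(closes: list[dict]) -> dict:
    """Compute overall action rate and a per-action-type breakdown.

    Each close is a dict like {"decision": "remove" | "label" | "no_action"};
    these field names and values are hypothetical.
    """
    total = len(closes)
    actions = [c["decision"] for c in closes if c["decision"] != "no_action"]
    return {
        "closes": total,
        "actions": len(actions),
        "action_rate": len(actions) / total if total else 0.0,
        "by_type": dict(Counter(actions)),  # e.g. removals vs. warning labels
    }

closes = [
    {"decision": "remove"},
    {"decision": "no_action"},
    {"decision": "label"},
    {"decision": "remove"},
]
print(action_rate(closes))
# 3 of 4 closes received an action -> action rate 0.75,
# split between "remove" (2) and "label" (1)
```

Separating the `by_type` counts mirrors the practice described above of reporting removals and restrictions distinctly rather than as one combined figure.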
Time Based Metrics
Time based metrics are based on the amount of time taken to perform various parts of the content moderation process. These metrics are derived from event logs – lists of events that occur as part of the moderation process. While these don’t require any additional review effort, they are more complex to implement and maintain than volume metrics.
Time based metrics are often monitored both as a total figure for the amount of time taken, and also as means and medians. It is typical for reviews and processes on different types of content and with different levels of scrutiny to take substantially different amounts of time, so averages are mostly used on subsets where reviews are broadly similar to ensure fair comparisons and reduce volatility.
Time based metrics are also commonly monitored based on what percentage of cases exceed a predefined target time. These targets are usually included as Service Level Agreements (SLAs) and are used to define expectations for review and operations teams of how long a task should take or the maximum acceptable time for a process to be completed.
Any part of a moderation process with defined and recorded start and end times can have a time metric. Special handling may be required when these times are not permanently fixed – for example, if a review can be reopened after being closed.
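The summaries described above — totals, means, medians, and the percentage of cases exceeding an SLA target — can be derived from logged durations along these lines (the 120-second target below is a made-up example; real SLAs vary by queue and severity):

```python
from statistics import mean, median

def time_metrics(durations_s: list[float], sla_s: float) -> dict:
    """Summarise review durations and SLA breaches.

    durations_s: elapsed seconds between a recorded start and end event.
    sla_s: hypothetical target time for the process.
    """
    breaches = sum(1 for d in durations_s if d > sla_s)
    return {
        "total_s": sum(durations_s),
        "mean_s": mean(durations_s),
        "median_s": median(durations_s),
        "sla_breach_pct": 100.0 * breaches / len(durations_s),
    }

print(time_metrics([30, 45, 60, 600], sla_s=120))
# one review of 600s exceeds the 120s target -> 25% breach rate;
# note the mean (183.75s) is pulled far above the median (52.5s)
```

The gap between mean and median in the example illustrates why averages are best computed over broadly similar subsets: a single long review dominates the mean.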
Review Time
Also known as handling time, this metric captures the time taken to review and make a decision on each piece of content. This metric helps platforms understand the amount of time being spent on review by content moderators. It is primarily relevant to reviewer performance and to forecasting moderation demand and supply, but can also affect response times.
Review time usually refers to the time taken for a single decision by a single reviewer. However, sometimes review time can also refer to the total amount of review time spent on a piece of content which may include escalations, quality checks, and other multi-stage review processes. This is often referred to as total review time.
Some of the common factors that can influence review time include:
- Clear guidelines on how to perform the review process in a way that balances speed with quality. Examples of this may include how much of a long piece of content like a video or website to review, how far back to go in user history, whether to check external links, and which information is in and out of scope.
- Highlighting relevant information such as keywords in large paragraphs or timestamps in videos so reviewers can quickly identify the most relevant information.
- Simplifying the decision process by keeping the number of steps and options low and creating clear separation between categories.
- User interface improvements such as keyboard shortcuts, buttons with appropriate sizing and spacing, and clear visual feedback.
Response Time
Response time represents the time between a report being made and a completed decision about whether the content is violating. Response time is a more user-aligned metric that better captures the user experience of how long it takes for a report to be dealt with.
Response time can be affected by slow handling in a similar way to review time, but in most circumstances the majority of response time is time spent queuing for a review. Response time may also include the time required for automated processes such as classification and prioritization, and routing between queues.
Response time is often a particular focus for regulators, as it provides an externally visible check on how quickly reports are handled. Legal requirements like Germany’s NetzDG law sometimes require action to be taken within a certain time of a complaint being made, and response time is regularly used to monitor and ensure compliance.
Improving response time can be achieved by increasing the number of reviews completed through additional staff and resources or quicker reviews, or by decreasing the number of incoming reviews to complete. Either option reduces both the average and maximum time spent queuing. Distributing review operations around different time zones to increase hours of operation can also improve response times, as can removing any general problems that could cause blocks or delays such as technical bugs.
Time to Action
Also known as velocity, this metric represents the time between content being uploaded or created and a completed decision about whether the content is violating. Time to action is broadly similar to response time, except it also covers the time taken for content to be identified or reported.
Time to Action is a useful metric for measuring the potential for harm to users. Content that is taken down quickly has less opportunity to be shown or shared. However, it is less directly influenceable than response time as it often relies on prompt reporting by users. In some verticals, particularly those that are secretive or insular, both volume and speed of user reporting can be significantly lower.
Quality Metrics
Quality metrics are generally based on re-checks of previous reviews by either the existing review teams, subject matter experts, or dedicated quality reviewers. As a result, quality has many metrics that are the same or similar to enforcement, based on volumes of reviews incoming and closed, rates of various decisions, and time taken.
Error Rate
Error rate is the percentage of decisions where an incorrect decision is made, based on evaluation through a quality process. This metric is also often formulated as quality rate, the percentage of decisions where a correct decision is made.
Errors in quality are regularly broken down into the four categories of error highlighted previously in the Quality section – False Positives, False Negatives, Wrong Selection, and Technical Error. These subsets are particularly important if the cost of a specific type of error is extremely high, and it may be preferable to keep those specific errors to a minimum even if it means a lower quality rate overall.
As an example, a False Negative when there is an imminent risk of serious physical harm is extremely undesirable. In such situations it may be worth accepting a higher False Positive rate by encouraging reviewers to err on the side of taking action, even if that results in multiple False Positives for every False Negative.
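A minimal sketch of how error rate and the four-category breakdown might be summarised from a batch of quality audits follows; the outcome labels are illustrative stand-ins for whatever a real quality tool records.

```python
from collections import Counter

# Hypothetical labels matching the four error categories discussed above
ERROR_TYPES = {"false_positive", "false_negative",
               "wrong_selection", "technical_error"}

def error_metrics(audits: list[str]) -> dict:
    """Summarise quality audit outcomes.

    Each outcome is either "correct" or one of the four error categories.
    """
    errors = [a for a in audits if a in ERROR_TYPES]
    rate = len(errors) / len(audits)
    return {
        "error_rate": rate,
        "quality_rate": 1 - rate,
        "by_category": dict(Counter(errors)),  # for monitoring costly error types
    }

audits = ["correct"] * 18 + ["false_positive", "false_negative"]
print(error_metrics(audits))
# 2 errors in 20 audits -> error rate 0.1, quality rate 0.9,
# one error in each of two categories
```

Keeping the per-category counts alongside the headline rate supports the trade-off described above, where one error type may be far more costly than another.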
Consistency
Consistency is a metric applied when the same content is reviewed two or more times by regular review processes, and is a measure of how often mismatches occur. This can be based on the number of reviews that disagree with the majority or the number of times when there is any disagreement.
Consistency is similar to other quality metrics in that it provides information about potential mistakes. The primary difference is that consistency is more focused on whether a review process will reliably produce the same decision. Consistency is more likely to be affected by vague or subjective policies, differences in training between teams or locations, and variation in reviewer experience and ability.
Appeals Metrics
As with quality, appeals metrics involve re-checks of previous reviews by either the existing review teams, subject matter experts, or dedicated quality reviewers. As a result, appeals has many metrics that are the same or similar to enforcement, based on volumes of reviews incoming and closed, rates of various decisions, and time taken.
Overturns and Overturn Rate
Overturns represents the total volume of successful appeals – appeals where the original decision that a violation has occurred is overruled. Overturn volume is often used to calculate overturn rate, which represents either the percentage of all content decisions that are overturned or the ratio of overturns to actions.
Overturn rate can be a difficult metric to interpret, and will usually require context from other metrics such as incoming and closed appeals. A high overturn rate can reflect an effective and efficient appeals process, but can also reflect a large number of mistakes. Similarly, a low overturn rate can suggest that decisions are mostly correct, or alternatively that users have given up attempting to appeal.
A common variation on overturns are revokes. These are broadly similar, but may also include decisions where the original decision was made correctly but the action still needs to be changed or removed. Reasons for this might include a hacked account that has been restored to the rightful owner, violating content that the user has deleted, or a change in policy that means content that was actioned according to older rules is no longer violating.
Some platforms also allow users who submitted a report to appeal a decision that the content is non-violating. When these decisions are successfully appealed, they may be reported separately, included in overturn rate, or included with other enforcement reviews.
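The two denominators for overturn rate noted above can be made concrete with a small sketch; the aggregate counts below are invented for illustration, and which formulation a platform uses should always be stated alongside the figure.

```python
def overturn_rates(overturns: int, actions: int, total_decisions: int) -> dict:
    """The two common formulations of overturn rate.

    Inputs are aggregate counts for the same reporting period.
    """
    return {
        # share of all content decisions that were overturned
        "per_decision": overturns / total_decisions,
        # ratio of overturns to actioned decisions only
        "per_action": overturns / actions,
    }

print(overturn_rates(overturns=50, actions=2_000, total_decisions=10_000))
# 0.5% of all decisions, but 2.5% of actioned decisions
```

The fivefold gap between the two figures in this example shows why comparing overturn rates across platforms without knowing the denominator can be misleading.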
Successful Appeal Rate
Successful appeal rate represents the percentage of appeal requests that were successful. In addition to fulfilling many of the same roles as action rate for appeals processes, this metric is also useful for monitoring user perceptions about platform policies and appeals processes.
For example, if users regularly appeal unsuccessfully in a particular vertical, it may reflect that the reasons for the original decision have not been explained clearly enough, or that users disagree with a particular policy or how it is enforced. This is particularly likely if users attempt to appeal multiple times for the same content or issue.
Time to Resolution
Time to resolution is a metric that represents the time taken for an appeal to be successful. This is similar to response time in enforcement, except it includes time taken for multiple appeal attempts and communication efforts between the platform and the user.
Note: Time to resolution is mostly used in areas where the owner or maintainer affected by the violation is not acting maliciously. One example of this is a forum that becomes overrun with spam links, and gets removed from search engines as a result. In circumstances like these it is generally desirable to clearly communicate any outstanding problems to the user and resolve appeals quickly to help restore their content as fast as possible. Time to resolution can help measure how effectively this is done.
Some metrics that are relevant to content moderation are less directly tied to day-to-day operational decisions. While still important, these metrics may be influenced by many other factors, particularly in large organizations with diverse workflows, volatile populations, or high levels of automation. Different operations teams may have different levels of control or responsibility for these metrics, though they are very likely to influence how operations are run.
Prevalence
Also referred to as abuse rate, violation rate, or other similar terms, this metric represents the overall rate of policy violations on the platform. This is normally based on a random sample of all content on the platform. Prevalence is one of the closest possible metrics to the primary goals of moderation, representing an estimate of all violations that exist at time of measurement.
Prevalence is an expensive metric to generate. It requires high quality reviewing of samples of content from the overall population, while generating little or no enforcement value. Larger and larger sample volumes are needed to provide statistically useful measures for rare violations, for splits such as language or country, or for short timescales such as every week or day.
Some sampling methodologies such as stratification or consistency sampling can potentially be used to reduce sample sizes. However, this increases the technical complexity of the sampling process and calculations, and there is a limit to the savings that can be made. As a result, prevalence should be regarded as a premium metric – important and powerful, but at a high cost.
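Why rare violations demand such large samples can be seen with a standard sample-size formula. This is a simplified sketch using the normal approximation for a proportion at roughly 95% confidence; real prevalence programs use more careful designs.

```python
import math

def required_sample(p_expected: float, margin: float, z: float = 1.96) -> int:
    """Rough sample size to estimate a violation rate to +/- margin
    at ~95% confidence (normal approximation; illustrative only)."""
    return math.ceil(z * z * p_expected * (1 - p_expected) / (margin * margin))

# A common violation (~5% prevalence) measured to +/- 1 percentage point:
print(required_sample(0.05, margin=0.01))       # ~1,825 items

# A rare violation (~0.05%) measured to comparable *relative* precision
# (+/- 0.01 percentage points) needs two orders of magnitude more:
print(required_sample(0.0005, margin=0.0001))   # ~192,000 items
```

Each of those sampled items also needs a high-quality review, which is what makes prevalence an expensive metric to maintain at fine-grained splits or short timescales.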
Cost
Cost represents the financial outlay required to perform content moderation. Cost is usually derived from time based metrics like review time and the compensation rate of the reviewers. From these, metrics for the financial cost of closing and actioning different types of review can be calculated.
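The derivation from review time and compensation rate might look like the following sketch. The hourly rate and volumes are invented, and as noted below, real cost figures would also need to attribute overhead, training, and wastage.

```python
def review_costs(review_time_s: float, closes: int, actions: int,
                 hourly_rate: float) -> dict:
    """Derive cost-per-close and cost-per-action from total review time.

    hourly_rate is a hypothetical fully-loaded reviewer rate.
    """
    total_cost = review_time_s / 3600 * hourly_rate
    return {
        "total_cost": total_cost,
        "cost_per_close": total_cost / closes,
        "cost_per_action": total_cost / actions,
    }

# 500 closes averaging 90 seconds each, at a $30/hour rate
print(review_costs(review_time_s=500 * 90, closes=500, actions=120,
                   hourly_rate=30.0))
# total $375.00 -> $0.75 per close, $3.125 per action
```

Note how cost per action exceeds cost per close whenever action rate is below 100%, since non-actioned reviews still consume paid review time.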
Cost metrics can be useful in financial and business planning. However, they often have significant practical limitations. Correctly assessing and assigning the costs of business overhead can be challenging, as can working out the direct and indirect costs of making mistakes or failing to perform a review, or time lost due to wastage or technical problems.
Using cost metrics can also create problems with ethical decisions and their perception. As a practical matter no operations team has an infinite budget, and decisions on where and how to assign staff and resources must be made. Despite this, assigning a dollar cost to specific reviews or verticals can create an uncomfortable dynamic between financial motivations and the protection of users, and conflict around the relative importance of addressing one issue over another.
Impressions
Impressions represent the number of times a piece of content was shown. Impressions are often already measured in online platforms, as they are widely used as a metric for audience engagement and reach, as well as being required for billing in areas like advertising.
Measuring impressions can be useful for prioritizing and focusing review efforts on the content affecting the largest number of users. Impressions can also be used as a way of measuring the effect and impact of enforcement efforts on users, which provides a contrast to metrics based on monitoring workflows such as closed volumes.
However, this metric can sometimes be misleading. Catching violating content before it reaches a wider audience, while undoubtedly a better outcome, may not look as impactful as catching it afterwards. Also, when accounts, sites or groups that contain a mix of violating and non violating content are taken down it can be difficult to determine how many impressions were from each.
It is also possible to estimate the number of impressions that content would have received had it not been actioned. This can be done through forecasting models based on historical data, though these can be volatile and inaccurate. An alternative method is for a platform to rerun content presentation algorithms but with moderation actions ignored, identifying what content would have been presented to users had no moderation occurred. This can increase accuracy but is often technically complex and expensive.
Using impressions as a metric is generally only effective for violations where the harm comes from being shown to large numbers of users such as spam or various offensive content violations. In areas like human trafficking where bad actors are actively trying to avoid wider visibility and exposure, impressions is a poor measure of the harmful impact involved.
Authors│Harsha Bhatlapenumarthy, James Gresham
Contributors│Jan Eissfeldt, Elissa Emory, Amanda Menking, Nivedita Mishra
Special Thanks│Sarah Banks, Kate Gotimer, Martin Le Bars, Colleen Reding Mearn, Mackenzie Tudor, Charlotte Willner