Common Types of Data in Trust & Safety

User Data

Many different types of user data can help trust and safety practitioners understand the health, trust, and safety of a given platform. Some of this data may overlap with common user experience metrics, and some may be more specific to trust and safety.

Metrics are regularly calculated over different time frames, in particular daily, weekly, and monthly intervals. These different granularities allow monitoring and identification of trends and events at different scales, and make it easier to provide summaries and reviews. Occasionally more complex intervals and metrics, such as rolling windows and moving averages, are also used to better surface patterns in the data.
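
To make this concrete, the sketch below computes a daily active user count and a 7-day moving average from a hypothetical event log using pandas; the table, column names, and sample values are all illustrative assumptions rather than any particular platform's schema.

```python
import pandas as pd

# Hypothetical event log: one row per user action on the platform.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 17:40", "2024-01-02 08:05",
        "2024-01-03 12:30", "2024-01-03 22:10", "2024-01-04 07:55",
    ]),
    "user_id": ["a", "b", "a", "c", "a", "b"],
})

# Daily active users: distinct users per calendar day.
# Weekly or monthly counts follow the same pattern with resample("W"), etc.
daily_active = (
    events.set_index("timestamp")
          .resample("D")["user_id"]
          .nunique()
)

# A 7-day moving average smooths day-of-week effects and highlights trends.
dau_7day_avg = daily_active.rolling(window=7, min_periods=1).mean()

print(daily_active)
print(dau_7day_avg)
```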

Data that may be beneficial to collect to yield knowledge and wisdom on platform problems:

  • Engagement rates. “Engagement” refers to how much users are actively interacting with the platform, but it may be defined differently depending on the platform or purpose. For instance, engagement may be operationalized as logging into an app, watching a video, or adding a comment. Regardless of how engagement is defined, T&S teams use these metrics for a variety of purposes, such as assessing how “viral” a misinformation post is or how many users may be exposed to harmful content.
  • Number of active users. This metric generally refers to how many users are engaged, as mentioned above. Again, “active” may be defined differently, such as staying on an app for at least one hour each day, or logging into the platform at least five days a week. T&S teams may use this metric to assess the health of the platform or the potential impact of interventions (e.g., whether the number of active users increased after procedures were put in place to improve the safety of the platform).
  • Return rates. This refers to percentages of activities or users coming back to certain activities or platforms. Any identifiable activity or behavior can be used to define a return metric—common examples include logins, visits to a page, or use of a feature. Sometimes, the reverse of a return rate—users who do not continue activities—is used instead; this is often referred to as feature churn or usage churn. 

Return rates / usage churn metrics can be used to assess the health of the platform and the effect of potential interventions on both the violating and non-violating population. Different definitions of return measured over different timescales can provide detailed insight into whether users are having a positive experience, both in general and specifically when interacting with trust and safety processes (a simple calculation is sketched after this list).

  • Account deactivation rates. These metrics suggest users are leaving the platform, whether by actively deleting accounts or simply by no longer using them for an extended period. Account deactivation can be attributed to a variety of different factors related to Trust & Safety:
      • Individual or repeated negative interactions with users such as harassment or threats;
      • Failure of the platform to take or communicate actions promptly when a report is made;
      • Receiving a penalty that is perceived as unfair;
      • Large systemic events such as a major negative news story about the platform.

Account deactivation rates are sometimes referred to as account churn, customer churn, or simply churn.
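
As a rough illustration of the return-rate and churn definitions above, the sketch below uses a hypothetical login table and a pair of one-week windows; the data, column names, and window choice are assumptions, and real pipelines would compute these measures across many periods at once.

```python
import pandas as pd

# Hypothetical table of the distinct days on which each user logged in.
logins = pd.DataFrame({
    "user_id": ["a", "a", "b", "c", "c", "d"],
    "login_date": pd.to_datetime([
        "2024-01-03", "2024-01-10", "2024-01-05",
        "2024-01-04", "2024-01-12", "2024-01-06",
    ]),
})

# Users active in week 1 and week 2 of the (illustrative) observation window.
week1 = set(logins.loc[logins["login_date"] < "2024-01-08", "user_id"])
week2 = set(logins.loc[logins["login_date"] >= "2024-01-08", "user_id"])

# Return rate: share of week-1 users who logged in again in week 2.
return_rate = len(week1 & week2) / len(week1)

# Usage churn is simply the complement of the return rate.
usage_churn = 1 - return_rate

# Account churn would instead divide accounts deactivated during the period
# by the accounts that were active at the start of the period.
print(f"return rate: {return_rate:.0%}, usage churn: {usage_churn:.0%}")
```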

Data that may be beneficial to collect to yield knowledge and wisdom on content trends:

  • Report rates can provide important insights into the level of abuse on a platform. In general, the number of violations reported by users should scale with the number and visibility of overall violations as well as how much users care about said violations being prevented.

However, there are other factors that affect report rates. Poorly defined or confusing policies may lead to users making incorrect reports in good faith. Additionally, policies that are misaligned with user expectations may result in more reports for non-violating behavior because users believe it should be actioned regardless of what the policy says. Reports may also be influenced by increases in controversial content (such as during elections), viral content, or abuse of the reporting system. Reporting systems that are complicated or difficult to use can also affect report rates.

Report rates do not always accurately reflect how much policy violation is taking place, particularly across different violation types. When a violation is hidden or restricted in some way (such as users sharing illegal content through a private group), it’s reasonable to expect lower report rates than for widely visible violations such as spam and misinformation. Report rates can also depend on the perceived severity of violations and how immediately obvious they are; for example, a well-crafted fraud may be less likely to generate reports because it isn’t instantly recognizable as violating.
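
One way to partially account for visibility differences is to normalize report counts by exposure. The sketch below, built on hypothetical per-violation-type tallies, computes reports per 1,000 content views; the figures and column names are illustrative only, and a low normalized rate for hidden violations should still not be read as low prevalence.

```python
import pandas as pd

# Hypothetical tallies of user reports and content views by violation type.
stats = pd.DataFrame({
    "violation_type": ["spam", "misinformation", "illegal_content"],
    "user_reports": [12_400, 8_900, 35],
    "content_views": [4_200_000, 3_100_000, 90_000],
})

# Reports per 1,000 views gives a visibility-adjusted report rate, which is
# easier to compare across violation types than raw report counts.
stats["reports_per_1k_views"] = (
    stats["user_reports"] / stats["content_views"] * 1_000
)
print(stats)
```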

  • Sentiment and behavior metrics. Sometimes specific behaviors or signals, while not inherently violating policy, can be a sign of problems. These metrics can be used to judge the size of populations where policy violations are more likely to occur. A particularly well-studied example is sentiment analysis (identifying general moods and opinions in content), where increases in anger or contempt can be used to understand or predict changes in the number of policy violations (a toy example appears below).

Other behavioral metrics can be more functional. For example, a surge in new search keywords and web traffic among old websites might indicate an increase in hacked sites where content is replaced with spam or ads. 
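
As a toy illustration of the sentiment signal mentioned above, the sketch below scores a few invented comments with NLTK's off-the-shelf VADER analyzer and tracks the share that are strongly negative; real deployments typically rely on models tuned to the platform's own content and languages.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

# Hypothetical user comments.
comments = [
    "Thanks, this was really helpful!",
    "You people are pathetic and I hope your account gets banned.",
    "Not sure I agree, but interesting point.",
]

# The compound score ranges from -1 (most negative) to +1 (most positive);
# a rising share of strongly negative comments can flag emerging problems.
negative_share = sum(
    analyzer.polarity_scores(text)["compound"] <= -0.5 for text in comments
) / len(comments)
print(f"share of strongly negative comments: {negative_share:.0%}")
```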

  • Network level metrics. Although network level metrics may be more prominent in a social network context, they are also relevant to other online platforms. Trends and patterns can be identified by comparing various potential identifiers including usernames and contact information with device ID, IP address, or payment method. This allows T&S professionals to proactively identify a network of scammers, spammers, and/or perpetrators of other abusive behaviors on their platform. 
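
A minimal sketch of this kind of identifier linking is shown below: it builds a graph from a hypothetical account-to-device mapping with networkx and surfaces clusters of accounts that share devices. The account and device IDs are invented, and the same pattern applies to IP addresses or payment instruments.

```python
import networkx as nx

# Hypothetical mapping of accounts to the devices they have used.
account_devices = [
    ("acct_1", "device_A"), ("acct_2", "device_A"),
    ("acct_2", "device_B"), ("acct_3", "device_B"),
    ("acct_4", "device_C"),
]

# Build a graph of accounts and devices, then find clusters of accounts
# connected to each other through shared devices.
graph = nx.Graph()
graph.add_edges_from(account_devices)

for component in nx.connected_components(graph):
    accounts = sorted(node for node in component if node.startswith("acct_"))
    if len(accounts) > 1:
        print("possible coordinated cluster:", accounts)
```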

Content Moderator Data

Performance Data

Measuring performance in content moderation, for both systems and staff, requires data related to the reviews and processes they use. There are many criteria used in evaluating performance, including:

  • Accuracy refers to how often a review decision is the correct one (that is, the decision the platform would ideally want taken), usually based on the relevant policy. It’s also common to monitor sub-metrics like the number of false positives and false negatives to capture information about different types of mistakes that may be more or less serious (both are illustrated in the sketch after this list).
  • Review time refers to the amount of time taken to complete a review. This may include time waiting in queues or going through automatic processes, or only the time when the reviewer is actively working on a specific review.
  • Review volume refers to the number of reviews completed successfully.
  • Consistency is how often reviewers agree with each other on the same decision when multiple reviewers handle a single report, which can help separate systemic issues such as policy or training problems from individual mistakes. This can be measured during quality reviews or when review decisions are made by consensus (see the agreement statistic in the sketch after this list).
  • Action rates refer to the number and percentage of each specific action taken, which can indicate whether the team is penalizing too leniently or too aggressively. It is also important to consider this type of data within the broader context and over time. For instance, a spike in action rates for a specific abuse may simply reflect the seasonality of certain behaviors (e.g., misinformation prior to an election season).
  • Skip / pass rates are the rates at which reviews are skipped, reassigned, or otherwise left incomplete. This can reflect particularly difficult policy decisions, but also issues like incorrect language identification, review tool bugs, or necessary information not being available.
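
The sketch below illustrates two of these measures on hypothetical moderation decisions: accuracy with its false-positive and false-negative breakdown against "gold" labels from quality reviewers, and inter-reviewer agreement via Cohen's kappa as one possible consistency statistic. The labels and decisions are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Hypothetical decisions: gold labels plus two reviewers' calls.
gold      = ["violating", "violating", "benign", "benign", "violating", "benign"]
reviewer1 = ["violating", "benign", "benign", "violating", "violating", "benign"]
reviewer2 = ["violating", "benign", "benign", "benign", "violating", "benign"]

# Confusion matrix against the gold labels separates the two error types:
# false negatives (missed violations) and false positives (over-enforcement).
tn, fp, fn, tp = confusion_matrix(
    gold, reviewer1, labels=["benign", "violating"]
).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy={accuracy:.2f}, false positives={fp}, false negatives={fn}")

# Cohen's kappa measures agreement between two reviewers beyond chance,
# which helps separate systemic policy or training issues from individual error.
print(f"reviewer agreement (kappa): {cohen_kappa_score(reviewer1, reviewer2):.2f}")
```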

Learn more about enforcement and operational performance metrics in the Operations and Content Moderation chapter.

Wellness Data

Given the nature of content moderation, it is important to proactively and continuously monitor content moderator wellbeing and provide preventive care. To empirically and reliably assess content moderator wellbeing, research-validated psychometric instruments may be adopted. Industry experts have written extensively on this topic, so we refer readers to those resources rather than reproduce the details here. To select psychometrics tailored to a given platform and set of tasks, it is common to consult a licensed psychometrician and/or psychologist.

Depending on whether an internal wellness team is available to support content moderators, wellness data such as service usage and employee service satisfaction may also be available. Industry experts generally find it useful to create dashboards reflecting real-time service usage (e.g., the number of 1:1 counseling sessions requested, the number of attendees at group sessions), the reason for seeking support (e.g., personal, work, content impact), and post-service employee feedback. Issues such as access control, privacy, and the ethics of displaying such sensitive data should also be considered when collecting and analyzing wellness metrics.
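
A minimal sketch of such an aggregation, assuming a hypothetical and access-controlled session log, is shown below; only aggregate counts would be surfaced on a dashboard, never individual records.

```python
import pandas as pd

# Hypothetical, access-controlled log of wellness sessions (no identities).
sessions = pd.DataFrame({
    "week": ["2024-W01", "2024-W01", "2024-W02", "2024-W02", "2024-W02"],
    "session_type": ["1:1", "group", "1:1", "1:1", "group"],
    "reason": ["content impact", "personal", "work", "content impact", "work"],
})

# Weekly counts by session type and by reason for seeking support.
usage_by_type = sessions.groupby(["week", "session_type"]).size()
usage_by_reason = sessions.groupby(["week", "reason"]).size()
print(usage_by_type, usage_by_reason, sep="\n\n")
```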

People Metrics

Although people metrics for content moderators may largely overlap with those for employees in general, it is critical to consider them specifically for this group. Depending on a variety of factors (e.g., geographical culture, company culture), content moderators may not explicitly ask for help when they are not faring well. Metrics such as attendance, absenteeism, unplanned leave, and attrition should therefore be considered in tandem with the wellness metrics noted above to gain a broad understanding of content moderator wellbeing.
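
As a simple illustration, the sketch below computes an absenteeism rate and a monthly attrition rate from hypothetical team figures; the numbers and the definitions used (unplanned absence days over scheduled days, departures over starting headcount) are common conventions but by no means the only choices.

```python
# Hypothetical monthly figures for a content moderation team.
scheduled_days = 420         # total scheduled working days for the team
unplanned_absence_days = 21  # days lost to unplanned leave
starting_headcount = 20
departures = 2

# Absenteeism: share of scheduled time lost to unplanned absence.
absenteeism_rate = unplanned_absence_days / scheduled_days

# Attrition: share of the starting headcount that left during the month.
attrition_rate = departures / starting_headcount

print(f"absenteeism: {absenteeism_rate:.1%}, monthly attrition: {attrition_rate:.1%}")
```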