Defining & Measuring Success

As with every other team in a business, it’s important to develop and track metrics to gauge the impact of an investigations and intelligence team’s work. Failing to define and measure success leaves teams producing unclear outcomes and developing unfocused work processes that do not necessarily improve bottom-line user safety. Before designing team-level or operational metrics, investigations and intelligence teams should identify which guiding-star metrics they need to align with. These metrics can be based on wider business outcomes or wider trust and safety outcomes. Some examples of guiding-star metrics include:

  • Reductions in risk, or costs (financial, regulatory);
  • Volume or scale of regulatory or media escalations;
  • User sentiment scores, or user reporting aggregated feedback;
  • Feedback from user / safety advisory councils;
  • Violation rates for a platform’s Terms of Service or Community Guidelines.
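The last guiding-star metric above, a violation rate for a platform’s Terms of Service or Community Guidelines, can be computed directly from review outcomes. Below is a minimal sketch in Python; the `ReviewedItem` structure and the sample data are hypothetical, not taken from any particular platform:

```python
from dataclasses import dataclass

@dataclass
class ReviewedItem:
    item_id: str
    violates_policy: bool  # outcome of a moderation review (hypothetical field)

def violation_rate(items: list[ReviewedItem]) -> float:
    """Share of reviewed content found to violate the Terms of Service
    or Community Guidelines. Returns 0.0 for an empty sample."""
    if not items:
        return 0.0
    violating = sum(1 for i in items if i.violates_policy)
    return violating / len(items)

# Illustrative sample: four reviewed items, one violation.
reviewed = [
    ReviewedItem("a1", False),
    ReviewedItem("a2", True),
    ReviewedItem("a3", False),
    ReviewedItem("a4", False),
]
print(f"Violation rate: {violation_rate(reviewed):.1%}")  # Violation rate: 25.0%
```

Tracked over time, a falling violation rate can serve as the guiding-star signal that team-level metrics roll up to.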

Although other teams within trust and safety can use numerical data points (such as the number of takedowns of violating content or the number of user reports handled) to measure an organization’s effectiveness in mitigating abuse, an investigations team’s success does not depend on how many actions have already been taken. Instead, an investigations team’s effectiveness depends on how well it can prepare an organization to handle known and unknown threats to its customers.

A successful investigative team must first have a solid investigation management strategy before it can start taking on investigation requests from internal and external parties. Many team leaders may jump straight to the following actions to establish strategy:

  • Select existing ops analysts (such as escalation specialists, content moderators, market specialists) to become investigations analysts;  
  • Create intake channels to receive investigation requests from internal or external parties;
  • Set up alerts to make sure analysts have enough leads on potential threats to start conducting investigations;
  • Count how many investigation reports are produced and enforcement actions are taken before deciding whether additional staff, skills, and resources (such as logging and monitoring vendors, systems, or tools) are needed;
  • Identify and clarify whether the investigations team will run long-term, deep investigations or act as incident responders for high-value attacks.

However, an organization will benefit more over the long term from setting clear scope, objectives, and expectations before assigning analysts to start collecting and analyzing threats. Doing so prevents an investigations team from declining due to employee burnout (explained in more detail in the following section).

To determine whether quantitative, qualitative, or both types of metrics are useful for developing an investigation management strategy, an organization and its leaders need to answer the following questions:

  • What type of product or platform abuse exceeds what current ops teams, systems, processes, data, and infrastructure can capture?
  • How prepared is the organization to handle and mitigate this abuse, in terms of access to the data needed to investigate it and the personnel dedicated (and needed) to handle it?
  • What negative consequences will the organization face if it does not take any action?
  • How much time does the organization have before it starts facing those negative consequences?

Once an organization has answers to these questions, it will have an easier time making staffing decisions for an investigative and intelligence-gathering function within its Trust and Safety team.

An Example of Defining & Measuring Success

Below is an example of how a fictional company and its leadership may approach these questions.

An Example: Love at First Loaf

Love at First Loaf offers a service where a user uploads a photo to its website and the company prints bread loaves with the image printed on each slice. The user can have the loaves shipped to themselves, or submit a recipient’s mailing address to gift the customized loaves by physical mail.

A couple of months after Love at First Loaf launched, the company’s customer support team received a complaint that a religious group had repeatedly received bread loaves, sent by an unknown user, with offensive imagery and slurs printed on the slices. The religious group does not know how to prevent this individual from sending them bread, and the company does not know how to moderate what type of imagery someone can upload to its site. Love at First Loaf wants to dedicate more time and staff to finding out who is placing these orders. So, they plan their strategy:

  • What is Love at First Loaf’s guiding-star metric specific to this team’s strategy?
    • Customer Satisfaction, or Brand Reputation Score
  • What type of product or platform abuse exceeds what current ops teams, systems, processes, data, and infrastructure can capture?
    • A user’s ability to upload offensive imagery when making a bread loaf request
  • How prepared is the organization to handle and mitigate this abuse, in terms of access to the data needed to investigate it and the personnel dedicated to handling it?
    • Partially prepared. The company can dedicate two staff members from the customer support team to investigate the issue, with no set deadline from the complainant to complete the investigation. But Love at First Loaf lacks the following to handle and mitigate current and future abuse:
      • Policies that allow the company to prevent a certain customer from placing orders;
      • Policies and processes that allow someone like the religious group to opt out of receiving loaves from the company;
      • Previous reports from users describing a similar type of abuse;
      • Internal data about how frequently this abuse occurs.
  • What negative consequences will the organization face if it does not take any action?
    • More complaints from users for the same type of abuse, which can cause:
      • Reputational loss from users being harassed by offensive bread loaves; 
      • Revenue loss from users refusing to use Love at First Loaf’s services, or advertisers pulling out of promoting Love at First Loaf;
      • News coverage about the issue, which can deepen reputational and revenue losses while also potentially inciting others to commit the same or additional types of abuse.
  • How much time does an organization have before it starts facing the above-mentioned negative consequences?
    • Undetermined. The organization currently knows of only this one complaint, and the investigation requestor has not set a strict deadline for completion.
    • Given that there are no safeguards to prevent the user from uploading offensive imagery or to refuse such orders from being processed, the consequences can occur at any moment.

Notice in the above example that Love at First Loaf now has a clearer understanding of what resources, policies, and processes it needs before it can use metrics to evaluate progress. Maturing and established investigative teams can use metrics similar to those that cybersecurity incident response teams use to monitor and evaluate the prevalence of abuse, the reduction in abuse over time, and the level of compliance with existing policies and procedures. These metrics can demonstrate the thoroughness of an investigation’s analysis and findings, and how an organization can position itself to prevent certain types of abuse.

Unfortunately, many of the challenges that trust and safety professionals face (such as fraud or threats of harm) cannot be entirely eliminated, because crime and abuse are part of human nature. Instead, investigative teams and other intelligence-gathering groups can use metrics to determine how to make abuse more manageable and preventable, if not erasable.
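One way to show that abuse is becoming more manageable, rather than erased, is to track the trend in reports about a known abuse type over time. A minimal sketch, with an assumed list of weekly report counts (the function name and sample numbers are illustrative):

```python
def report_trend(weekly_counts: list[int]) -> float:
    """Percentage change in reports between the first and last period.
    Negative values suggest the abuse is becoming more manageable;
    positive values suggest it is growing."""
    if len(weekly_counts) < 2 or weekly_counts[0] == 0:
        raise ValueError("need at least two periods and a nonzero baseline")
    first, last = weekly_counts[0], weekly_counts[-1]
    return (last - first) / first * 100

# Weekly reports about one abuse type, before and after mitigations.
counts = [120, 95, 80, 60]
print(f"Change in reports: {report_trend(counts):+.0f}%")  # Change in reports: -50%
```

A sustained negative trend can be evidence of better management of an abuse type, even when the count never reaches zero.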

A Guide to Metrics

Below is a guide to how various metrics can be used to track an investigative team’s progress, or to capture key findings in an investigation report as outlined earlier in this chapter. Ultimately, it is up to the investigation management team and organizational leadership to decide how to use metrics for maintaining investigative functions and tracking progress. (Please note that the definitions below may differ from traditional definitions of data points used in other fields and industries.)

Each data point below is listed with its purpose, what it measures, who collects the data, and its use cases.

Time to Acknowledge/Triage (TTA), or Mean Time to Acknowledge (MTTA)
  • Purpose: Calculate the amount of time between when an alert was raised and when someone acknowledged it and started working on the investigation. Can be reported per alert or consolidated as an average.
  • Measuring: Efficiency/timeliness in uncovering a potential or known issue from a user, stakeholder, or third party; efficiency/timeliness in responding to a known issue.
  • Data Collector: Investigations teams in the short term for newly discovered or identified abuses; data science or engineering teams in the long term for overall metrics tracking, so investigations teams have more time to uncover abuses proactively.
  • Use Cases: Use TTA/MTTA to calculate how quickly the team can respond to an alert. The team can then find ways to consolidate certain alerts, balance out their schedules, or hire more staff. A high TTA/MTTA may mean:
    • The team is suffering from alert fatigue and is burnt out;
    • There are more false positive alerts than actionable alerts;
    • Too many alerts have been categorized as high priority;
    • The team is handling one or more complex cases that warrant their own investigations instead of being treated as individual alerts.

Time to Detect (TTD)
  • Purpose: Calculate the amount of time between when an issue occurred and when a team uncovered it. For an investigations team, this could be the duration of the investigation itself, because not all abuse types can be “resolved” or completely prevented.
  • Measuring: Efficiency or timeliness in uncovering a potential or known issue.
  • Data Collector: Investigations teams in the short term for newly discovered or identified abuses; data science or engineering teams in the long term for overall metrics tracking, so investigations teams have more time to uncover abuses proactively.
  • Use Cases: Use TTD to assess how long it takes an investigator or team to identify an issue in a certain domain. A high TTD can mean the team lacks the resources, data access, clear chain of command, or knowledge to uncover an issue. It can indicate an opportunity to train current investigators, recognize an investigator’s good work or domain expertise, or provide investigators more access to data and automated resources to help them become more proficient and efficient.

Time to Contain (TTC)
  • Purpose: Calculate the amount of time between detection (TTD) and when the investigations team started taking enforcement actions or other measures to mitigate an issue. This metric describes how a Trust and Safety team handles platform abuse more accurately than Time to Resolve (TTR) does.
  • Measuring: Efficiency or timeliness in responding to a potential or known issue; quality of an investigative report (generally, if the initial report is comprehensive enough to help an organization take action quickly, the TTC becomes shorter, which correlates with high-quality investigative reporting).
  • Data Collector: May be hard for any team to capture, because a holistic or average metric depends on how frequently a type of abuse occurs; a data science or engineering team is better positioned to track this than the investigations team itself, to prevent bias.
  • Use Cases: An investigations team can compare the TTC from when the team first learned of an abuse or unusual behavior against future instances of containing that abuse, to measure how much the organization has adapted to handling it. A lower TTC over time can indicate how adept an organization has become at handling a certain problem, or how the problem became less complex to handle thanks to improvements in operations, team knowledge, and resources allocated to the issue.

Number of reports/alerts about an issue
  • Purpose: Calculate how many alerts about the same type of issue or abuse are known to the organization.
  • Measuring: Scale of abuse; frequency of abuse; existing volume or changes in volume; potential risk to the organization if the issue continues.
  • Data Collector: Investigations teams for unknown abuse types; data science teams for known abuse types.
  • Use Cases: Take the number of alerts and break them down:
    • Which alerts were actionable, which were false positives, and which had insufficient information to warrant further action?
    • Which abuses receive more or fewer alerts than others? Why?
    • Which alerts may require additional review or processes from other teams that can cause the MTTD or MTTA to increase or decrease?

Number of users or accounts
  • Purpose: Identify the number of people involved. Can range from individual users of a platform or service to one user and their followers (or mutual users linked to that user on the same platform or service).
  • Measuring: Scale of abuse; potential risk to the organization and the communities it serves.
  • Data Collector: Investigations teams for unknown abuse types; data science teams for known abuse types.
  • Use Cases: Break down user counts by which users were impacted by the abuse identified in the alert, what types of users they are, and what other types of users may or may not be vulnerable to the abuse.

Number of places/countries involved or impacted
  • Purpose: Can be specified at the local, regional, national, international, or global level.
  • Measuring: Scale of abuse; potential risk to the organization and the communities it serves; the number of policies and procedures that may need to be implemented for compliance.
  • Data Collector: Investigations, policy, and/or corporate communications teams.
  • Use Cases: If the organization doesn’t have a dedicated database for this metric, consider identifying where to pull this information so the team can find trends between a user, a user’s behavior, and a location over time. Sources for this information can include:
    • Location as listed on a profile;
    • Time zone;
    • User’s IP logs;
    • Billing information.
  Over time, the team can break down risk as it relates to a country, its laws, its demographics, and typical product usage and user behaviors for that country.

Amount of external reporting about an issue
  • Purpose: Different from TTA, where the alert is meant to prompt someone to take action. Could be the number of news articles or watchlists, or the minutes of video, audio, or other news recordings that mention or cover the issue from an informative perspective.
  • Measuring: Potential risk to the organization; the number of policies and procedures that may need to be implemented for compliance.
  • Data Collector: Investigations, policy, and/or corporate communications teams; third-party groups, NGO partners, and/or think tanks.
  • Use Cases: Calculate this metric to show other teams or organizational leadership the significance of, and public concern about, a certain issue, and why the team needs more resources, staff, or higher prioritization from leadership to take more action on it.

Number of entry points or attack vectors for a certain type of abuse
  • Purpose: Identify how many points (e.g., devices, login methods, methods of access attempts) need to be monitored to prevent a certain type of abuse.
  • Measuring: Scale of abuse; the amount of maintenance required to uphold a policy or process that mitigates a type of abuse.
  • Data Collector: Investigations team to uncover the entry points/attack vectors; policy team to confirm which entry points are manageable or enforceable.
  • Use Cases: Calculate these to show product teams which entry points or attack vectors can be mitigated with automated alerts and processes, public education and awareness, or product developments.

Dates and times of re-review
  • Purpose: Identify when a known abuse type was investigated again after initial enforcement or mitigative actions took place.
  • Measuring: Level of compliance; risk reduction; recidivism (return of offending behavior after actions were taken) from a known abuse type and its previous mitigative actions.
  • Data Collector: Investigations teams.
  • Use Cases: This metric can be used together with MTTA, MTTD, and MTTC to assess how the investigations team evolves over time and how it handles known abuse types.

Number of bugs, feature requests, policy changes, or process changes that occurred as a result of an investigation
  • Purpose: Calculate how many changes an organization made. These numbers can also be tracked over time, not just immediately after the first iteration of an investigation into a newly identified or known abuse type.
  • Measuring: Level of impact the Trust and Safety team made on the organization; level of compliance; the costs, headcount, or other resources used or needed in the future to manage an issue.
  • Data Collector: Product, engineering, data science, and/or policy teams; management or senior leadership.
  • Use Cases: Use this metric to highlight the complexity and scale of an issue’s effect on the organization and how many resources are used or needed to handle the problem. This gives the organization a better understanding of how much impact an investigation has in calling for improvements across one or multiple areas.

Number of internal teams or external parties involved in an investigation
  • Purpose: Calculate how many personnel were involved in handling an investigation. This can be tracked over time to see whether fewer or more people are needed to mitigate and prevent a certain type of abuse.
  • Measuring: Scale of abuse; level of impact the Trust and Safety team made on the organization; the costs, headcount, or other resources used or needed to manage an issue.
  • Data Collector: Investigations teams.
  • Use Cases: Use this metric to quantify how much staff or public involvement an investigation required, to evaluate the scale and impact of a threat on the organization.

Scale/degree of averted crisis or escalation
  • Purpose: Calculate the magnitude and costs of a crisis or escalation that would have happened if not for the detection and mitigation work of the investigations/intelligence team.
  • Measuring: Financial penalties, brand reputation damage, and headcount or legal resources expended to mitigate a crisis or escalation.
  • Data Collector: Legal and/or public policy teams.
  • Use Cases: Use this metric to explain how effectively the organization mitigated risk, comparing how it would fare if it took no action, partial action, or all actions based on an investigation report’s findings.

Change/reduction in subsequent user reports about the threat vector in question
  • Purpose: Calculate the change in the number or rate of user reports.
  • Measuring: Scale of abuse; change (increase or reduction) in abuse after problem identification and mitigation.
  • Data Collector: Data science teams.
  • Use Cases: Combine this metric with MTTA, MTTD, and MTTC to see whether a threat has been mitigated or better managed over time.
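
The time-based metrics above (TTA/MTTA, TTD, and TTC) can all be derived from per-case timestamps. Below is a minimal sketch of those calculations; the `Case` structure and its field names are assumptions about how an organization might log its investigations, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Case:
    occurred_at: datetime      # when the abuse began (hypothetical field names)
    detected_at: datetime      # when an alert surfaced the issue
    acknowledged_at: datetime  # when an investigator started working on it
    contained_at: datetime     # when enforcement or mitigation began

def ttd(c: Case) -> timedelta:
    # Time to Detect: occurrence until the team uncovered the issue
    return c.detected_at - c.occurred_at

def tta(c: Case) -> timedelta:
    # Time to Acknowledge: alert raised until someone started working on it
    return c.acknowledged_at - c.detected_at

def ttc(c: Case) -> timedelta:
    # Time to Contain: detection until mitigation started
    return c.contained_at - c.detected_at

def mtta(cases: list[Case]) -> timedelta:
    # Mean Time to Acknowledge across a set of cases
    return sum((tta(c) for c in cases), timedelta()) / len(cases)

cases = [
    Case(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 12),
         datetime(2024, 1, 1, 13), datetime(2024, 1, 2, 9)),
    Case(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 9),
         datetime(2024, 1, 5, 12), datetime(2024, 1, 5, 18)),
]
print(mtta(cases))  # average of a 1-hour and a 3-hour acknowledgment: 2:00:00
```

The same per-case records can feed the re-review and report-volume metrics above, so a single logging schema can support most of this table.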