Defining & Measuring Success

As with every other team in a business, it’s important to develop and track metrics to gauge the impact of an investigations and intelligence team’s work. Failing to define and measure success leaves teams producing unclear outcomes and developing unfocused work processes that do not necessarily improve bottom-line user safety. Before designing team-level or operational metrics, investigations and intelligence teams should identify which guiding-star metrics they need to align with. These metrics can be based on wider business outcomes or wider trust and safety outcomes. Some examples of guiding-star metrics include:

  • Reductions in risk, or costs (financial, regulatory);
  • Volume or scale of regulatory or media escalations;
  • User sentiment scores, or user reporting aggregated feedback;
  • Feedback from user / safety advisory councils;
  • Violation rates for a platform’s Terms of Service or Community Guidelines.
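The last guiding-star metric above, a violation rate for a platform’s Terms of Service or Community Guidelines, can be computed directly from review outcomes. Below is a minimal sketch in Python; the `ReviewedItem` structure and the sample data are hypothetical, not taken from any particular platform:

```python
from dataclasses import dataclass

@dataclass
class ReviewedItem:
    item_id: str
    violates_policy: bool  # outcome of a moderation review (hypothetical field)

def violation_rate(items: list[ReviewedItem]) -> float:
    """Share of reviewed content found to violate the Terms of Service
    or Community Guidelines. Returns 0.0 for an empty sample."""
    if not items:
        return 0.0
    violating = sum(1 for i in items if i.violates_policy)
    return violating / len(items)

# Illustrative sample: four reviewed items, one violation.
reviewed = [
    ReviewedItem("a1", False),
    ReviewedItem("a2", True),
    ReviewedItem("a3", False),
    ReviewedItem("a4", False),
]
print(f"Violation rate: {violation_rate(reviewed):.1%}")  # Violation rate: 25.0%
```

Tracked over time, a falling violation rate can serve as the guiding-star signal that team-level metrics roll up to.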

Although other teams within trust and safety can use numerical data points (such as the number of takedowns of violating content or the number of user reports handled) to measure an organization’s effectiveness in mitigating abuse, an investigations team’s success does not depend on how many actions have already been taken. Instead, an investigations team’s effectiveness depends on how well it can prepare an organization to handle known and unknown threats to its customers.

A successful investigative team must first have a solid investigation management strategy before it can start taking on investigation requests from internal and external parties. Many team leaders may jump straight to the following actions to establish strategy:

  • Select existing ops analysts (such as escalation specialists, content moderators, market specialists) to become investigations analysts;  
  • Create intake channels to receive investigation requests from internal or external parties;
  • Set up alerts to make sure analysts have enough leads on potential threats to start conducting investigations;
  • Count how many investigation reports are produced and enforcement actions are taken before deciding whether additional staff, skills, and resources (such as logging and monitoring vendors, systems, or tools) are needed;
  • Identify and clarify whether the investigations team will run long-term, deep investigations or act as incident responders for high-value attacks.

However, an organization will benefit more over the long term from setting clear scope, objectives, and expectations before assigning analysts to start collecting and analyzing threats. Doing so prevents an investigations team from declining due to employee burnout (explained in more detail in the following section).

To determine whether quantitative, qualitative, or both types of metrics are useful for developing an investigation management strategy, an organization and its leaders need to answer the following questions:

  • What type of product or platform abuse exceeds what current ops teams, systems, processes, data, and infrastructure can capture?
  • How prepared is the organization to handle and mitigate this abuse, in terms of access to the data needed to investigate it and the personnel dedicated (and needed) to handle it?
  • What negative consequences will the organization face if it does not take any action?
  • How much time does the organization have before it starts facing those negative consequences?

Once an organization has answers to these questions, it will have an easier time making staffing decisions for an investigative and intelligence-gathering function within its Trust and Safety team.

An Example of Defining & Measuring Success

Below is an example of how a fictional company and its leadership may approach these questions.

An Example: Love at First Loaf

Love at First Loaf offers a service where a user uploads a photo to its website and the company prints bread loaves with the image printed on each slice. The user can have the loaves shipped to themselves, or submit a recipient’s mailing address to gift the customized loaves by physical mail.

A couple of months after Love at First Loaf launched, the company’s customer support team received a complaint that a religious group had repeatedly received bread loaves, sent by an unknown user, with offensive imagery and slurs printed on the slices. The religious group does not know how to prevent this individual from sending them bread, and the company does not know how to moderate what type of imagery someone can upload to its site. Love at First Loaf wants to dedicate more time and staff to finding out who is placing these orders. So, they plan their strategy:

  • What is Love at First Loaf’s guiding-star metric specific to this team’s strategy?
    • Customer Satisfaction, or Brand Reputation Score
  • What type of product or platform abuse exceeds what current ops teams, systems, processes, data, and infrastructure can capture?
    • A user’s ability to upload offensive imagery when making a bread loaf request
  • How prepared is the organization to handle and mitigate this abuse, in terms of access to the data needed to investigate it and the personnel dedicated to handling it?
    • Partially prepared. The company can dedicate two staff members from the customer support team to investigate the issue, with no set deadline from the complainant to complete the investigation. But Love at First Loaf lacks the following to handle and mitigate current and future abuse:
      • Policies that allow the company to prevent a certain customer from placing orders;
      • Policies and processes that allow someone like the religious group to opt out of receiving loaves from the company;
      • Previous reports from users describing a similar type of abuse;
      • Internal data about how frequently this abuse occurs.
  • What negative consequences will the organization face if it does not take any action?
    • More complaints from users for the same type of abuse, which can cause:
      • Reputational loss from users being harassed by offensive bread loaves; 
      • Revenue loss from users refusing to use Love at First Loaf’s services, or advertisers pulling out of promoting Love at First Loaf;
      • News coverage about the issue, which can deepen reputational and revenue losses while also potentially inciting others to commit the same or additional types of abuse.
  • How much time does an organization have before it starts facing the above-mentioned negative consequences?
    • Undetermined. The organization currently knows of only this one complaint, and the investigation requestor has not set a strict deadline for completion.
    • Given that there are no safeguards to prevent the user from uploading offensive imagery or to refuse such orders from being processed, the consequences can occur at any moment.

Notice in the above example that Love at First Loaf now has a clearer understanding of what resources, policies, and processes it needs before it can use metrics to evaluate progress. Maturing and established investigative teams can use metrics similar to those that cybersecurity incident response teams use to monitor and evaluate the prevalence of abuse, the reduction in abuse over time, and the level of compliance with existing policies and procedures. These metrics can demonstrate the thoroughness of an investigation’s analysis and findings, and how an organization can position itself to prevent certain types of abuse.

Unfortunately, many of the challenges that trust and safety professionals face (such as fraud or threats of harm) cannot be entirely eliminated, because crime and abuse are part of human nature. Instead, investigative teams and other intelligence-gathering groups can use metrics to determine how to make abuse more manageable and preventable, if not erasable.
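One way to show that abuse is becoming more manageable, rather than erased, is to track the trend in reports about a known abuse type over time. A minimal sketch, with an assumed list of weekly report counts (the function name and sample numbers are illustrative):

```python
def report_trend(weekly_counts: list[int]) -> float:
    """Percentage change in reports between the first and last period.
    Negative values suggest the abuse is becoming more manageable;
    positive values suggest it is growing."""
    if len(weekly_counts) < 2 or weekly_counts[0] == 0:
        raise ValueError("need at least two periods and a nonzero baseline")
    first, last = weekly_counts[0], weekly_counts[-1]
    return (last - first) / first * 100

# Weekly reports about one abuse type, before and after mitigations.
counts = [120, 95, 80, 60]
print(f"Change in reports: {report_trend(counts):+.0f}%")  # Change in reports: -50%
```

A sustained negative trend can be evidence of better management of an abuse type, even when the count never reaches zero.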

A Guide to Metrics

Below is a guide to how various metrics can be used to track an investigative team’s progress, or to capture key findings in an investigation report as outlined earlier in this chapter. Ultimately, it is up to the investigation management team and organizational leadership to decide how to use metrics for maintaining investigative functions and tracking progress. (Please note that the definitions below may differ from traditional definitions of data points used in other fields and industries.)

Each data point below is listed with its purpose, what it measures, who collects the data, and its use cases.

Time to Acknowledge/Triage (TTA), or Mean Time to Acknowledge (MTTA)
  • Purpose: Calculate the amount of time between when an alert was raised and when someone acknowledged it and started working on the investigation. Can be reported per alert or consolidated as an average.
  • Measuring: Efficiency/timeliness in uncovering a potential or known issue from a user, stakeholder, or third party; efficiency/timeliness in responding to a known issue.
  • Data Collector: Investigations teams in the short term for newly discovered or identified abuses; data science or engineering teams in the long term for overall metrics tracking, so investigations teams have more time to uncover abuses proactively.
  • Use Cases: Use TTA/MTTA to calculate how quickly the team can respond to an alert. The team can then find ways to consolidate certain alerts, balance out their schedules, or hire more staff. A high TTA/MTTA may mean:
    • The team is suffering from alert fatigue and is burnt out;
    • There are more false positive alerts than actionable alerts;
    • Too many alerts have been categorized as high priority;
    • The team is handling one or more complex cases that warrant their own investigations instead of being treated as individual alerts.

Time to Detect (TTD)
  • Purpose: Calculate the amount of time between when an issue occurred and when a team uncovered it. For an investigations team, this could be the duration of the investigation itself, because not all abuse types can be “resolved” or completely prevented.
  • Measuring: Efficiency or timeliness in uncovering a potential or known issue.
  • Data Collector: Investigations teams in the short term for newly discovered or identified abuses; data science or engineering teams in the long term for overall metrics tracking, so investigations teams have more time to uncover abuses proactively.
  • Use Cases: Use TTD to assess how long it takes an investigator or team to identify an issue in a certain domain. A high TTD can mean the team lacks the resources, data access, clear chain of command, or knowledge to uncover an issue. It can indicate an opportunity to train current investigators, recognize an investigator’s good work or domain expertise, or provide investigators more access to data and automated resources to help them become more proficient and efficient.

Time to Contain (TTC)
  • Purpose: Calculate the amount of time between detection (TTD) and when the investigations team started taking enforcement actions or other measures to mitigate an issue. This metric describes how a Trust and Safety team handles platform abuse more accurately than Time to Resolve (TTR) does.
  • Measuring: Efficiency or timeliness in responding to a potential or known issue; quality of an investigative report (generally, if the initial report is comprehensive enough to help an organization take action quickly, the TTC becomes shorter, which correlates with high-quality investigative reporting).
  • Data Collector: May be hard for any team to capture, because a holistic or average metric depends on how frequently a type of abuse occurs; a data science or engineering team is better positioned to track this than the investigations team itself, to prevent bias.
  • Use Cases: An investigations team can compare the TTC from when the team first learned of an abuse or unusual behavior against future instances of containing that abuse, to measure how much the organization has adapted to handling it. A lower TTC over time can indicate how adept an organization has become at handling a certain problem, or how the problem became less complex to handle thanks to improvements in operations, team knowledge, and resources allocated to the issue.

Number of reports/alerts about an issue
  • Purpose: Calculate how many alerts about the same type of issue or abuse are known to the organization.
  • Measuring: Scale of abuse; frequency of abuse; existing volume or changes in volume; potential risk to the organization if the issue continues.
  • Data Collector: Investigations teams for unknown abuse types; data science teams for known abuse types.
  • Use Cases: Take the number of alerts and break them down:
    • Which alerts were actionable, which were false positives, and which had insufficient information to warrant further action?
    • Which abuses receive more or fewer alerts than others? Why?
    • Which alerts may require additional review or processes from other teams that can cause the MTTD or MTTA to increase or decrease?

Number of users or accounts
  • Purpose: Identify the number of people involved. Can range from individual users of a platform or service to one user and their followers (or mutual users linked to that user on the same platform or service).
  • Measuring: Scale of abuse; potential risk to the organization and the communities it serves.
  • Data Collector: Investigations teams for unknown abuse types; data science teams for known abuse types.
  • Use Cases: Break down user counts by which users were impacted by the abuse identified in the alert, what types of users they are, and what other types of users may or may not be vulnerable to the abuse.

Number of places/countries involved or impacted
  • Purpose: Can be specified at the local, regional, national, international, or global level.
  • Measuring: Scale of abuse; potential risk to the organization and the communities it serves; the number of policies and procedures that may need to be implemented for compliance.
  • Data Collector: Investigations, policy, and/or corporate communications teams.
  • Use Cases: If the organization doesn’t have a dedicated database for this metric, consider identifying where to pull this information so the team can find trends between a user, a user’s behavior, and a location over time. Sources for this information can include:
    • Location as listed on a profile;
    • Time zone;
    • User’s IP logs;
    • Billing information.
  Over time, the team can break down risk as it relates to a country, its laws, its demographics, and typical product usage and user behaviors for that country.

Amount of external reporting about an issue
  • Purpose: Different from TTA, where the alert is meant to prompt someone to take action. Could be the number of news articles or watchlists, or the minutes of video, audio, or other news recordings that mention or cover the issue from an informative perspective.
  • Measuring: Potential risk to the organization; the number of policies and procedures that may need to be implemented for compliance.
  • Data Collector: Investigations, policy, and/or corporate communications teams; third-party groups, NGO partners, and/or think tanks.
  • Use Cases: Calculate this metric to show other teams or organizational leadership the significance of, and public concern about, a certain issue, and why the team needs more resources, staff, or higher prioritization from leadership to take more action on it.

Number of entry points or attack vectors for a certain type of abuse
  • Purpose: Identify how many points (e.g., devices, login methods, methods of access attempts) need to be monitored to prevent a certain type of abuse.
  • Measuring: Scale of abuse; the amount of maintenance required to uphold a policy or process that mitigates a type of abuse.
  • Data Collector: Investigations team to uncover the entry points/attack vectors; policy team to confirm which entry points are manageable or enforceable.
  • Use Cases: Calculate these to show product teams which entry points or attack vectors can be mitigated with automated alerts and processes, public education and awareness, or product developments.

Dates and times of re-review
  • Purpose: Identify when a known abuse type was investigated again after initial enforcement or mitigative actions took place.
  • Measuring: Level of compliance; risk reduction; recidivism (return of offending behavior after actions were taken) from a known abuse type and its previous mitigative actions.
  • Data Collector: Investigations teams.
  • Use Cases: This metric can be used together with MTTA, MTTD, and MTTC to assess how the investigations team evolves over time and how it handles known abuse types.

Number of bugs, feature requests, policy changes, or process changes that occurred as a result of an investigation
  • Purpose: Calculate how many changes an organization made. These numbers can also be tracked over time, not just immediately after the first iteration of an investigation into a newly identified or known abuse type.
  • Measuring: Level of impact the Trust and Safety team made on the organization; level of compliance; the costs, headcount, or other resources used or needed in the future to manage an issue.
  • Data Collector: Product, engineering, data science, and/or policy teams; management or senior leadership.
  • Use Cases: Use this metric to highlight the complexity and scale of an issue’s effect on the organization and how many resources are used or needed to handle the problem. This gives the organization a better understanding of how much impact an investigation has in calling for improvements across one or multiple areas.

Number of internal teams or external parties involved in an investigation
  • Purpose: Calculate how many personnel were involved in handling an investigation. This can be tracked over time to see whether fewer or more people are needed to mitigate and prevent a certain type of abuse.
  • Measuring: Scale of abuse; level of impact the Trust and Safety team made on the organization; the costs, headcount, or other resources used or needed to manage an issue.
  • Data Collector: Investigations teams.
  • Use Cases: Use this metric to quantify how much staff or public involvement an investigation required, to evaluate the scale and impact of a threat on the organization.

Scale/degree of averted crisis or escalation
  • Purpose: Calculate the magnitude and costs of a crisis or escalation that would have happened if not for the detection and mitigation work of the investigations/intelligence team.
  • Measuring: Financial penalties, brand reputation damage, and headcount or legal resources expended to mitigate a crisis or escalation.
  • Data Collector: Legal and/or public policy teams.
  • Use Cases: Use this metric to explain how effectively the organization mitigated risk, comparing how it would fare if it took no action, partial action, or all actions based on an investigation report’s findings.

Change/reduction in subsequent user reports about the threat vector in question
  • Purpose: Calculate the change in the number or rate of user reports.
  • Measuring: Scale of abuse; change (increase or reduction) in abuse after problem identification and mitigation.
  • Data Collector: Data science teams.
  • Use Cases: Combine this metric with MTTA, MTTD, and MTTC to see whether a threat has been mitigated or better managed over time.
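
The time-based metrics above (TTA/MTTA, TTD, and TTC) can all be derived from per-case timestamps. Below is a minimal sketch of those calculations; the `Case` structure and its field names are assumptions about how an organization might log its investigations, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Case:
    occurred_at: datetime      # when the abuse began (hypothetical field names)
    detected_at: datetime      # when an alert surfaced the issue
    acknowledged_at: datetime  # when an investigator started working on it
    contained_at: datetime     # when enforcement or mitigation began

def ttd(c: Case) -> timedelta:
    # Time to Detect: occurrence until the team uncovered the issue
    return c.detected_at - c.occurred_at

def tta(c: Case) -> timedelta:
    # Time to Acknowledge: alert raised until someone started working on it
    return c.acknowledged_at - c.detected_at

def ttc(c: Case) -> timedelta:
    # Time to Contain: detection until mitigation started
    return c.contained_at - c.detected_at

def mtta(cases: list[Case]) -> timedelta:
    # Mean Time to Acknowledge across a set of cases
    return sum((tta(c) for c in cases), timedelta()) / len(cases)

cases = [
    Case(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 12),
         datetime(2024, 1, 1, 13), datetime(2024, 1, 2, 9)),
    Case(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 9),
         datetime(2024, 1, 5, 12), datetime(2024, 1, 5, 18)),
]
print(mtta(cases))  # average of a 1-hour and a 3-hour acknowledgment: 2:00:00
```

The same per-case records can feed the re-review and report-volume metrics above, so a single logging schema can support most of this table.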