Whether teams are sharing raw data or communicating the information, knowledge, or wisdom generated from it, there are many benefits and challenges to consider.
Data Transparency
Benefits
- Trust and Accountability: As previously mentioned, without ongoing and transparent communication, misunderstandings or rumors about a platform’s policies can outshine the facts. Sharing trust and safety metrics with the public is critical for building trust and demonstrating accountability. Many online platforms publish transparency reports quarterly or annually, sharing statistics such as government requests for information, the volume and types of violating behavior on the platform, and, where appropriate, appeals and restorations. These reports show current and future stakeholders that a platform takes safety seriously.
- Monitoring and Measuring Performance: Metrics are necessary to understand current trust and safety processes and to measure the effectiveness of new initiatives. Key Performance Indicators (KPIs) are a common tool for aligning stakeholders on processes and initiatives across a business, including trust and safety goals. However, KPIs for trust and safety are challenging to define: stakeholders may share a clear understanding of what bad performance looks like (e.g., regulatory investigations, negative customer feedback), but there is not always a clear definition of what good performance looks like. Companies must decide which goals they are comfortable with, identify a way to measure progress toward those goals, and then communicate both the goals and their progress to stakeholders inside and outside the organization.
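As a rough illustration of the metrics described above, the Python sketch below summarizes a list of enforcement actions into transparency-report-style counts and one possible KPI: the share of appealed actions that were later restored. The `Enforcement` record, its fields, and the 10% target are hypothetical; they stand in for whatever schema and goals a given team actually defines.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enforcement:
    """One enforcement action (hypothetical schema for illustration)."""
    policy: str      # policy area violated, e.g. "spam" or "harassment"
    appealed: bool   # whether the affected user appealed
    restored: bool   # whether the content or account was later restored


def transparency_metrics(actions: list[Enforcement]) -> dict:
    """Summarize enforcement actions into report-style counts and one KPI."""
    appeals = sum(a.appealed for a in actions)
    restored_on_appeal = sum(a.appealed and a.restored for a in actions)
    # Share of appealed actions that were reversed; None when there were no appeals.
    rate: Optional[float] = restored_on_appeal / appeals if appeals else None
    return {
        "total_actions": len(actions),
        "actions_by_policy": dict(Counter(a.policy for a in actions)),
        "appeals": appeals,
        "restored_on_appeal": restored_on_appeal,
        "appeal_restoration_rate": rate,
    }


# Hypothetical goal; what counts as "good" is a decision each team has to make.
TARGET_MAX_RESTORATION_RATE = 0.10


def meets_target(metrics: dict) -> Optional[bool]:
    """Compare the appeal restoration rate against the team's chosen ceiling."""
    rate = metrics["appeal_restoration_rate"]
    return None if rate is None else rate <= TARGET_MAX_RESTORATION_RATE


if __name__ == "__main__":
    sample = [
        Enforcement("spam", appealed=False, restored=False),
        Enforcement("spam", appealed=True, restored=True),
        Enforcement("harassment", appealed=True, restored=False),
        Enforcement("harassment", appealed=False, restored=False),
    ]
    report = transparency_metrics(sample)
    print(report["appeal_restoration_rate"], meets_target(report))  # 0.5 False
```

A high restoration rate might point to over-enforcement, but whether 10% (or any other value) is an acceptable ceiling is exactly the kind of goal a team has to define for itself.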
Challenges
- Operational Costs: Complete transparency can be expensive, depending on how much data validation and quality review is required. Additional costs commonly arise from developing and publishing accompanying materials, such as visual design and the platforms used to host reports.
- Data Availability: Platforms usually don’t log every possible type of data and metric, so transparency is often limited by what data is available to analyze and share. Balancing this reality with public expectations that companies always know what’s happening on their platforms can be particularly difficult for trust and safety teams, especially when other parts of the business make claims or promises about moderation capabilities without relevant data to support them.
- Privacy and Legal Considerations: Many types of data, including data about individuals and evidence of crimes, are regulated in jurisdictions around the world. These regulations constrain whether certain datasets or analyses can be shared widely. Some organizations use emerging Privacy Enhancing Technologies (PETs), such as differential privacy and synthetic data, to reduce the privacy risks of sharing certain types of regulated data. However, there is no one-size-fits-all solution, and regulators in the United States and the EU have already issued public warnings about organizations’ misleading or inaccurate claims of “anonymized data.”
- Differing Perspectives or Bias: External stakeholders, including journalists, academics, customers, and regulators, have their own expectations and biases about how much data should be available and how it should be interpreted. As with trust and safety professionals, their perspectives are shaped by varying degrees of subject matter expertise, technical acumen, and lived experience. There is no guarantee that every (or any) audience will interpret transparency reports and data with the same context or values as trust and safety teams do.
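Differential privacy, one of the PETs mentioned above, can be illustrated with the Laplace mechanism, a standard way to publish aggregate counts. The sketch below adds Laplace noise to per-category counts before release. It assumes each person contributes at most one record to the whole table (so the sensitivity is 1), and the function names and example categories are hypothetical. It is a teaching sketch only: Python's `random` module is not cryptographically secure, naive floating-point implementations of the Laplace mechanism have known weaknesses, and production releases should rely on vetted open-source libraries such as OpenDP.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # If X, Y ~ Exponential(rate = 1/scale) independently, X - Y ~ Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(counts: dict[str, int], epsilon: float,
                 seed: Optional[int] = None) -> dict[str, int]:
    """Release per-category counts with epsilon-differential privacy.

    Assumes each person contributes at most one record to the whole table,
    so adding or removing one person changes one count by at most 1
    (L1 sensitivity = 1) and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)  # seeded only for reproducible demos
    scale = 1.0 / epsilon
    released = {}
    for category, true_count in counts.items():
        noisy = true_count + laplace_noise(scale, rng)
        # Rounding and clamping at zero are post-processing, which does not
        # weaken the differential privacy guarantee.
        released[category] = max(0, round(noisy))
    return released


if __name__ == "__main__":
    monthly = {"spam": 1200, "harassment": 85, "impersonation": 3}
    print(noisy_counts(monthly, epsilon=0.5, seed=42))
```

A smaller `epsilon` adds more noise and gives stronger privacy, so choosing it is a policy decision as much as a technical one. Note how the noise barely affects a large count like spam but can substantially distort a small one like impersonation.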
Trust and Safety Storytelling Tools
Stories help us understand each other. There are many ways to use narrative techniques to explain important information like events, trends, and risks that surface during data analysis. Two of the most common tools used by trust and safety teams to share information about data in a consumable format are visualizations and reports.
Visualizations can help trust and safety professionals translate the big ideas or narratives reflected in their datasets (e.g., the knowledge and wisdom levels of the DIKW framework) into a form audiences can grasp quickly. This is particularly helpful when communicating with audiences who may not have the same level of experience working with data or technical systems. When using visualizations to communicate about trust and safety topics, it’s important that the main idea is immediately recognizable and that graphical elements enhance (rather than distract from) key points.
Reports are popular tools for trust and safety communication because they’re familiar to business stakeholders such as company executives, clients, and regulators. Reports produced as part of a contractual or regulatory obligation typically have specific requirements for what data and insights must be included and what types of graphs are expected, and may even come with an exact template to follow. Reports are also important for documentation and archives. The key to a good trust and safety report is striking a balance between big-picture messages and nuanced details.
Because many resources on best practices for visualization and reporting already exist, their technical details are beyond the scope of this chapter. However, it is important to note that visualizations and reports may differ depending on whether they serve internal or external purposes. For instance, a report for a specific regulatory body may need to adhere to the reporting standards that body has set. A report for an internal trust and safety team, by contrast, may include more exploratory data and visualizations, as well as deep dives into specific topics. Examples of external reports include the transparency reports published by major social media platforms such as Meta and TikTok.
Conclusion
Many aspects of trust and safety rely on leveraging and understanding data. As such, it is essential to understand why data matters, what types of data exist, and how data is commonly used in this industry. This chapter serves as a broad overview of existing data practices in the field. As with many areas of trust and safety, these practices may change and mature in response to factors such as industry trends, legal and regulatory updates, and shifts in user behavior. Nonetheless, understanding how data is currently used will help trust and safety professionals collaborate better both internally and externally, and may also help shape the future of data practices at their organizations.
Appendix
Resources
Please note that the inclusion of the following resources does not imply endorsement of products either by the authors of this chapter or by TSPA.
Key Functions and Roles
Please see Key Functions and Roles for descriptions of different roles. Of note, specific job titles, levels, and responsibilities for trust and safety team members vary by organization, although there is some commonality, particularly within the same industry. The role descriptions provided are general and focus on the common positions within the private sector. To some degree, all of these roles can be involved in creating and enforcing trust and safety policies, as well as responding to specific incidents.
Programming Languages
Trust and safety teams use programming languages to process large amounts of data and to speed up analysis and insight generation. Below are some common languages used in trust and safety.
Python
Python is a general-purpose programming language (GPL). GPLs are designed for a wide range of tasks, thanks both to their flexibility and to the availability of libraries: add-ons that provide additional functionality. Python is one of the most common GPLs in the world, with extensive community support. Although Python was not built specifically for statistical analysis the way R was, its extensive libraries, covering everything from machine learning to visualization to retrieving data from online APIs, make it a valuable tool for data science and analytics.
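As a small example of Python's flexibility, the sketch below uses only the standard library (`csv`, `datetime`, and `collections`) to turn a raw export of user reports into weekly counts per category. The column names and sample rows are hypothetical. In practice, many teams would do the same work with a data analysis library such as pandas.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import date

# Hypothetical export of user reports; real column names will differ.
SAMPLE_EXPORT = """report_id,created_at,category
1,2024-01-01,spam
2,2024-01-02,spam
3,2024-01-03,harassment
4,2024-01-08,spam
"""


def weekly_counts(csv_text: str) -> dict[tuple[int, int], Counter]:
    """Count reports per category for each ISO (year, week)."""
    weeks = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        iso = date.fromisoformat(row["created_at"]).isocalendar()
        # Key by (ISO year, ISO week) so weeks spanning New Year group correctly.
        weeks[(iso[0], iso[1])][row["category"]] += 1
    return dict(weeks)


if __name__ == "__main__":
    for (year, week), counts in sorted(weekly_counts(SAMPLE_EXPORT).items()):
        print(f"{year}-W{week:02d}", dict(counts))
```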
R
R is one of the most popular open-source, object-oriented programming languages. R is known for its versatility in data wrangling and its statistical capabilities, thanks to its ever-growing collection of open-source packages.
SQL
SQL (Structured Query Language) is a programming language used to work with databases. It is the most common language for extracting and organizing data stored in relational databases. SQL has many dialects that differ slightly in syntax and functionality, but these differences are generally easy to resolve. Although SQL lacks advanced statistical functions and other powerful analytics tools, and does not have the flexibility of a general-purpose language, it remains one of the most useful programming languages for analyzing and understanding data.
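To keep the example self-contained, the sketch below runs a typical SQL aggregation (counting reports per category since a given date) through Python's built-in `sqlite3` module. The `reports` table and its columns are hypothetical, and the query uses the SQLite dialect; date handling in particular may need small adjustments in other dialects.

```python
import sqlite3


def reports_by_category(conn: sqlite3.Connection, since: str) -> list[tuple[str, int]]:
    """Return (category, report_count) pairs for reports created on or after `since`."""
    query = """
        SELECT category, COUNT(*) AS report_count
        FROM reports
        WHERE created_at >= ?          -- ISO 8601 date strings compare correctly as text
        GROUP BY category
        ORDER BY report_count DESC, category ASC
    """
    # Parameter binding (the ?) avoids assembling SQL strings by hand.
    return conn.execute(query, (since,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `reports` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (id INTEGER PRIMARY KEY, created_at TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports (created_at, category) VALUES (?, ?)",
        [
            ("2024-01-01", "spam"),
            ("2024-01-02", "spam"),
            ("2024-01-02", "harassment"),
            ("2023-12-31", "spam"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(reports_by_category(build_demo_db(), "2024-01-01"))
```

Pairing SQL for extraction and aggregation with a general-purpose language for further analysis is a common division of labor, which fits the strengths and limits of SQL described above.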
Research Methods & Storytelling
Campbell, Donald T., and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Ravenio Books, 2015.
Evergreen Data, August 21, 2023. https://stephanieevergreen.com/blog/.
“How to Determine the Correct Survey Sample Size.” Qualtrics, August 21, 2023. https://www.qualtrics.com/experience-management/research/determine-sample-size/.
“How to Use Stratified Random Sampling in 2023.” Qualtrics, August 21, 2023. https://www.qualtrics.com/experience-management/research/stratified-random-sampling/.
Kurnoff, Janine, and Lee Lazarus. Everyday Business Storytelling: Create, Simplify, and Adapt A Visual Narrative for Any Audience. Wiley, 2021.
Acknowledgements
Authors│James Gresham, Xieyining “Irene” Huang, Melanie Ensign
Contributors│Harsha Bhatlapenumarthy
Special Thanks│Massimo Belloni, Max Aliapoulios