12/12/2024 | Press release | Distributed by Public on 12/12/2024 12:23
Trust Testing - a methodology specifically designed to address the complexities of agentic AI - focuses on systematically uncovering and addressing subtle biases in AI systems while building user trust across diverse cultural contexts
As businesses increasingly embrace agentic AI, an advanced form of artificial intelligence (AI) capable of making decisions and taking actions autonomously, a technological transformation with vast potential is underway. For the enterprise, AI agents promise to revolutionize industries by augmenting employees, enhancing customer experiences, and unlocking a digital labor force for every vertical.
Agentic AI systems include Agentforce, Salesforce's complete AI system that integrates data, AI, automation, and humans to deploy trusted AI agents for concrete business outcomes. Although it's still early days, companies like Wiley and Unity Environmental University are seeing promising results.
Wiley, an early adopter, is resolving cases over 40% faster with Agentforce than their previous chatbot. Unity Environmental University expects to save $800 in staff resource time for every request for information made via their website with Agentforce.
Wiley, an early adopter, is resolving cases over 40% faster with Agentforce than their previous chatbot.
Toni Morgan, Product Management DirectorYet, as agent capabilities grow, so too does our responsibility to mitigate risk.
Agents' capacity to take independent action introduces important ethical questions: How do we ensure these systems are fair and equitable? How do we prevent them from perpetuating stereotypes or alienating users from diverse cultural or linguistic backgrounds? The very qualities that make agentic AI transformative - its ability to learn, adapt, and operate autonomously - also make it vulnerable to reinforcing biases, eroding user trust, and amplifying systemic inequities if left unchecked. These risks are further compounded by the non-deterministic nature of generative AI, where outputs can vary depending on context.
Without rigorous ethical frameworks for testing and deploying these systems, businesses risk not only reputational damage but also the potential to lose the trust of the very people they aim to serve.
Without rigorous ethical frameworks for testing and deploying these systems, businesses risk not only reputational damage but also the potential to lose the trust of the very people they aim to serve.
Toni Morgan, Product Management DirectorAt Salesforce, with efforts led by our Office of Ethical & Humane Use, we're committed to addressing these challenges head-on through a comprehensive strategy that includes red teaming, various adversarial testing methods, and Trust Testing - a complementary type of product testing designed to help Salesforce agentic AI products better serve globally diverse populations.
Traditional manual testing frameworks have long played a key role in AI validation, providing a human lens to complement automated methods. These approaches are essential for maintaining technical accuracy and robustness. But as agentic AI becomes more sophisticated, these frameworks reveal the significant limitation of missing the lived experiences of diverse users. They tend to follow static checklists that focus on technical correctness, rather than assessing how real users from different cultural, linguistic, or social backgrounds, and with varying disability statuses, might experience the system. Without human testers representing diverse perspectives, these frameworks miss subtle, yet impactful, forms of bias.
Unlike traditional machine learning models, agentic AI acts autonomously in response to human input, which introduces unique challenges: responses can vary depending on context, making it harder to identify and address biases embedded in the system. For example, imagine a career advice agent unintentionally steering women away from leadership roles or a customer service agent misunderstanding regional idioms. These biases, while subtle, can erode trust and alienate users.
Traditional testing frameworks fail to account for the dynamic and context-sensitive nature of agentic AI systems. This is where Trust Testing - a methodology designed to address the complexities of agentic AI by combining manual testing with real-world interactions led by diverse users - stands apart.
With the launch of Agentforce for Service, we faced a challenge: how to rigorously test for bias in a system that takes semi-autonomous action. While traditional testing methods formed the foundation of our product testing strategy, we knew that a different kind of testing was required to surface the subtle forms of bias that diverse users experience. To meet this challenge, we tapped a diverse population of employees from across Salesforce's global workforce to evaluate the trustworthiness of prompt responses. Participants engaged with the AI in simulated real-world scenarios, creating detailed personas to represent diverse user experiences and using those personas to explore interactions that probed for biases, inconsistencies, and gaps in cultural sensitivity.
In other words, Trust Testing leverages diverse perspectives to understand how AI systems can maintain user trust even when outputs can't be predicted.
Toni Morgan, Product Management DirectorIn other words, Trust Testing leverages diverse perspectives to understand how AI systems can maintain user trust even when outputs can't be predicted. Trust Testing expands upon our existing ethical product testing efforts which prioritize diverse perspectives to uncover unintentional biases in the customer experience. This serves as a cornerstone in our trust architecture, rigorously testing and identifying potential biases and vulnerabilities to ensure our AI systems are not only robust but also align with the Salesforce's standards of ethical responsibility.
This methodology goes beyond simple accuracy metrics that focus solely on finding gaps in the product experience. It examines how different users might interact with, trust, or mistrust an agentic AI system based on their lived experiences, cultural contexts, and personal knowledge - a methodology where testers develop dynamic characters with unique backgrounds, challenges, and motivations.
For example, one persona might represent a bilingual educator from India who switches between English and Hindi in conversation. Another might reflect a non-binary software engineer navigating corporate culture in Japan. By embedding these personas into the testing process, we can evaluate how AI systems handle diverse linguistic, cultural, and contextual scenarios.
To make sure the testing process captured a full spectrum of perspectives, Salesforce partnered with our Equality Groups to recruit testers from diverse cultural, linguistic, and professional backgrounds, including those with disabilities. This approach complements traditional forms of testing by centering the human experience when using AI. For instance, testers identified system inconsistencies in how the AI responded to different English dialects, cultural references, and identity disclosures, helping us refine responses to be more inclusive and contextually appropriate.
By prioritizing lived experience and diverse expertise, Trust Testing uncovered critical insights that strengthened system performance while building trust with global users.
Toni Morgan, Product Management DirectorThis approach also avoids the pitfalls of traditional testing, in which the individual identities of those from marginalized groups are leveraged as proxies for that entire demographic group. Instead, persona-based testing empowers testers to contribute their insights by allowing them to create dynamic, multidimensional personas that reflect their lived experiences, avoiding the reductive practice of treating individuals as demographic stand-ins. By prioritizing lived experience and diverse expertise, Trust Testing uncovered critical insights that strengthened system performance while building trust with global users.
Trust Testing transformed our understanding of how AI systems handle linguistic and cultural diversity. We identified four critical focus areas for improvement:
Most significantly, these learnings demonstrate how Trust Testing can systematically uncover and address subtle biases in AI systems, while building genuine user trust across diverse cultural contexts. These lessons go beyond improving individual products - they highlight how businesses can address the complexities of global AI deployment. In an era where agentic AI systems are becoming increasingly autonomous, trust is paramount. Trust Testing gives businesses a framework to ensure their AI systems are not only innovative but also equitable, adaptive, and aligned with the realities of global users.
Trust Testing has also become a platform for upskilling our own workforce. By engaging diverse testers as collaborators, the process naturally evolved into a framework for developing critical AI competencies.
Participants in Trust Testing gained hands-on experience with Agentforce while building essential skills for the AI-powered future, including:
Through this integrated approach, testers not only helped improve Agentforce but also developed practical skills that enhanced their technological readiness. The process transformed what could have been a simple testing exercise into a mutual learning opportunity, ensuring our workforce grows alongside our AI capabilities.
As businesses increasingly integrate agentic AI into their workflows, the need for ethical and transparent testing frameworks has never been more urgent. Trust Testing is one part of a multi-layered approach to testing AI at Salesforce Trust Testing and is also a blueprint for how businesses everywhere can build trustworthy AI systems that equitably serve diverse user groups.
The path to ethical AI deployment lies in recognizing bias identification and AI skills development not as separate hurdles, but as interconnected opportunities to create more inclusive and effective AI systems. Trust Testing is more than a tool - it's a call to action for the industry to lead with equity and accountability. As AI systems gain increasing autonomy, ensuring they serve all communities equitably will be critical. Together, we can build a future where AI is not only innovative but also ethical, equitable, and trustworthy.