salesforce.com Inc.

12/12/2024 | Press release

Diverse and Ethical Product Testing: How Salesforce Builds Trust in Agentforce

Trust Testing - a methodology specifically designed to address the complexities of agentic AI - focuses on systematically uncovering and addressing subtle biases in AI systems while building user trust across diverse cultural contexts

As businesses increasingly embrace agentic AI, an advanced form of artificial intelligence (AI) capable of making decisions and taking actions autonomously, a technological transformation with vast potential is underway. For the enterprise, AI agents promise to revolutionize industries by augmenting employees, enhancing customer experiences, and unlocking a digital labor force for every vertical.

Agentic AI systems include Agentforce, Salesforce's complete AI system that integrates data, AI, automation, and humans to deploy trusted AI agents for concrete business outcomes. Although it's still early days, companies like Wiley and Unity Environmental University are seeing promising results.

Wiley, an early adopter, is resolving cases over 40% faster with Agentforce than their previous chatbot. Unity Environmental University expects to save $800 in staff resource time for every request for information made via their website with Agentforce.

Yet, as agent capabilities grow, so too does our responsibility to mitigate risk.

Addressing the ethical implications of agentic AI

Agents' capacity to take independent action introduces important ethical questions: How do we ensure these systems are fair and equitable? How do we prevent them from perpetuating stereotypes or alienating users from diverse cultural or linguistic backgrounds? The very qualities that make agentic AI transformative - its ability to learn, adapt, and operate autonomously - also make it vulnerable to reinforcing biases, eroding user trust, and amplifying systemic inequities if left unchecked. These risks are further compounded by the non-deterministic nature of generative AI, where outputs can vary depending on context.

Without rigorous ethical frameworks for testing and deploying these systems, businesses risk not only reputational damage but also the potential to lose the trust of the very people they aim to serve.

At Salesforce, with efforts led by our Office of Ethical & Humane Use, we're committed to addressing these challenges head-on through a comprehensive strategy that includes red teaming, various adversarial testing methods, and Trust Testing - a complementary type of product testing designed to help Salesforce agentic AI products better serve globally diverse populations.

Manually testing for bias: Where traditional methods fall short

Traditional manual testing frameworks have long played a key role in AI validation, providing a human lens to complement automated methods. These approaches are essential for maintaining technical accuracy and robustness. But as agentic AI becomes more sophisticated, these frameworks reveal a significant limitation: they miss the lived experiences of diverse users. They tend to follow static checklists that focus on technical correctness rather than assessing how real users from different cultural, linguistic, or social backgrounds, and with varying disability statuses, might experience the system. Without human testers representing diverse perspectives, these frameworks miss subtle yet impactful forms of bias.

Unlike traditional machine learning models, agentic AI acts autonomously in response to human input, which introduces unique challenges: responses can vary depending on context, making it harder to identify and address biases embedded in the system. For example, imagine a career advice agent unintentionally steering women away from leadership roles or a customer service agent misunderstanding regional idioms. These biases, while subtle, can erode trust and alienate users.
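
One standard way to surface this kind of bias (a common technique, not one the article attributes to Salesforce) is a counterfactual probe: send the agent the same request with only a demographic detail swapped and flag responses that diverge. Here is a minimal Python sketch; `query_agent` is a hypothetical stand-in for whatever call invokes the agent, and the 0.8 similarity threshold is an arbitrary illustrative choice.

```python
import difflib

def query_agent(prompt: str) -> str:
    """Hypothetical stand-in for the call that invokes the AI agent."""
    raise NotImplementedError("wire this to your agent's API")

def counterfactual_probe(template, variants, threshold=0.8):
    """Ask the same question with only a demographic detail swapped,
    then flag any pair of responses that differ materially."""
    responses = {v: query_agent(template.format(persona=v)) for v in variants}
    flagged = []
    pairs = list(responses.items())
    for i, (v_a, r_a) in enumerate(pairs):
        for v_b, r_b in pairs[i + 1:]:
            similarity = difflib.SequenceMatcher(None, r_a, r_b).ratio()
            if similarity < threshold:  # substantively different answers
                flagged.append((v_a, v_b, similarity))
    return flagged

# Example probe: career advice should not change with the asker's gender.
flags = counterfactual_probe(
    "I'm a {persona} with ten years of engineering experience. "
    "Should I apply for the VP of Engineering opening?",
    ["woman", "man", "non-binary person"],
)
```

Because generative outputs are non-deterministic, a single divergence is weak evidence; in practice each variant would be sampled repeatedly and flagged pairs routed to human reviewers, which is exactly where a diverse tester pool matters.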

Traditional testing frameworks fail to account for the dynamic and context-sensitive nature of agentic AI systems. This is where Trust Testing - a methodology designed to address the complexities of agentic AI by combining manual testing with real-world interactions led by diverse users - stands apart.

Grounding Agentforce in Trust Testing: Understanding its capacity for bias

With the launch of Agentforce for Service, we faced a challenge: how to rigorously test for bias in a system that takes semi-autonomous action. While traditional testing methods formed the foundation of our product testing strategy, we knew that a different kind of testing was required to surface the subtle forms of bias that diverse users experience. To meet this challenge, we tapped a diverse population of employees from across Salesforce's global workforce to evaluate the trustworthiness of prompt responses. Participants engaged with the AI in simulated real-world scenarios, creating detailed personas to represent diverse user experiences and using those personas to explore interactions that probed for biases, inconsistencies, and gaps in cultural sensitivity.
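
The article doesn't describe Salesforce's internal tooling, but the shape of one such testing session is easy to picture. In the sketch below, every field name and example value is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class TrustTestRecord:
    """One tester interaction with the agent, captured for later review."""
    persona: str       # the tester-authored persona driving the session
    scenario: str      # the simulated real-world situation
    prompt: str        # what the tester asked, in character
    response: str      # the agent's reply
    trust_rating: int  # e.g. 1 (eroded trust) to 5 (fully appropriate)
    issues: list[str]  # tagged problems: bias, inconsistency, cultural gap
    notes: str         # free-form tester commentary

record = TrustTestRecord(
    persona="Bilingual educator from India",
    scenario="Disputing a billing charge",
    prompt="The amount debited is more than the invoice, kindly revert.",
    response="...",  # agent output would go here
    trust_rating=2,
    issues=["misread the Indian English 'revert' (reply) as 'undo'"],
    notes="Agent offered to reverse the payment instead of replying.",
)
```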

In other words, Trust Testing leverages diverse perspectives to understand how AI systems can maintain user trust even when outputs can't be predicted. Trust Testing expands upon our existing ethical product testing efforts, which prioritize diverse perspectives to uncover unintentional biases in the customer experience. It serves as a cornerstone in our trust architecture, rigorously identifying potential biases and vulnerabilities to ensure our AI systems are not only robust but also aligned with Salesforce's standards of ethical responsibility.

Designing the right type of test: The Trust Test

This methodology goes beyond simple accuracy metrics that focus solely on finding gaps in the product experience. It examines how different users might interact with, trust, or mistrust an agentic AI system based on their lived experiences, cultural contexts, and personal knowledge. To do this, testers develop dynamic personas with unique backgrounds, challenges, and motivations.

For example, one persona might represent a bilingual educator from India who switches between English and Hindi in conversation. Another might reflect a non-binary software engineer navigating corporate culture in Japan. By embedding these personas into the testing process, we can evaluate how AI systems handle diverse linguistic, cultural, and contextual scenarios.
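
To make the persona idea concrete, here is one plausible way a test team might represent such characters in code. The structure and field names are illustrative assumptions, not Salesforce's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class TestPersona:
    """A multidimensional, tester-authored character used to drive
    trust-testing conversations with the agent."""
    name: str
    background: str
    languages: list[str]
    cultural_context: str
    probes: list[str] = field(default_factory=list)  # scenarios to explore

personas = [
    TestPersona(
        name="Bilingual educator",
        background="Teacher in India who switches between English and Hindi",
        languages=["English", "Hindi"],
        cultural_context="Indian English idioms and honorifics",
        probes=["mid-conversation language switching", "regional vocabulary"],
    ),
    TestPersona(
        name="Non-binary software engineer",
        background="Engineer navigating corporate culture in Japan",
        languages=["English", "Japanese"],
        cultural_context="Japanese workplace norms",
        probes=["identity disclosure", "pronoun and honorific handling"],
    ),
]
```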

To make sure the testing process captured a full spectrum of perspectives, Salesforce partnered with our Equality Groups to recruit testers from diverse cultural, linguistic, and professional backgrounds, including those with disabilities. This approach complements traditional forms of testing by centering the human experience when using AI. For instance, testers identified system inconsistencies in how the AI responded to different English dialects, cultural references, and identity disclosures, helping us refine responses to be more inclusive and contextually appropriate.

This approach also avoids a pitfall of traditional testing, in which the individual identities of people from marginalized groups are leveraged as proxies for their entire demographic group. Instead, persona-based testing empowers testers to contribute their insights through dynamic, multidimensional personas that reflect their lived experiences, rather than treating any individual as a demographic stand-in. By prioritizing lived experience and diverse expertise, Trust Testing uncovered critical insights that strengthened system performance while building trust with global users.

Results: What we learned from Trust Testing

Trust Testing transformed our understanding of how AI systems handle linguistic and cultural diversity. We identified four critical focus areas for improvement:

  1. Global English Variations: We moved beyond U.S.-centric language models to ensure our systems accommodate English as it is spoken across regions like India, Australia, and the U.K. This means users now experience responses that reflect their local context, vocabulary, and communication styles - whether it's interpreting regional idioms or understanding colloquial expressions (a consistency check along these lines is sketched after this list).
  2. Multilingual Interactions: The testing showed us how to improve Agentforce to support conversations that mix languages or use non-standard English. For example, our AI can now better handle scenarios where users seamlessly switch between languages mid-conversation or use localized phrases that deviate from traditional grammar. This ensures smoother, more intuitive interactions for multilingual users.
  3. Identity Handling: We improved how our systems manage identity-related disclosures, such as references to a "partner" versus a "spouse." These adjustments allow the AI to respond more respectfully and inclusively, avoiding assumptions that might alienate or misrepresent users.
  4. More than Technical Accuracy: While accuracy remains foundational, Trust Testing revealed that building trust requires more than technically correct answers. By addressing cultural sensitivity, language diversity, and inclusivity, our AI systems now deliver responses that resonate with users on both a cognitive and emotional level, bridging both reliability and approachability.
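
The first two focus areas also lend themselves to automated regression checks alongside human review: phrase the same request in several regional variants and verify the agent resolves them the same way. In the minimal sketch below, `query_agent` and `resolve_intent` are hypothetical helpers (the latter might be a keyword rubric or an LLM judge), and the sample phrasings are illustrative:

```python
# Each intent is paired with regional phrasings that should all be
# handled identically by the agent.
DIALECT_VARIANTS = {
    "order_status": [
        "Where's my package at?",                            # informal U.S.
        "Has my parcel been dispatched yet?",                # U.K.
        "Please do the needful and share my order status.",  # Indian English
    ],
}

def resolve_intent(response: str) -> str:
    """Hypothetical helper that labels which intent the agent's reply
    actually served."""
    raise NotImplementedError

def check_dialect_consistency(query_agent) -> dict[str, bool]:
    """Return, per intent, whether every regional phrasing was handled
    as that same intent."""
    results = {}
    for intent, phrasings in DIALECT_VARIANTS.items():
        handled = {resolve_intent(query_agent(p)) for p in phrasings}
        results[intent] = handled == {intent}
    return results
```

A failing entry here would be triaged back to the human persona reviews, since idiom coverage itself depends on testers' lived knowledge.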

Most significantly, these learnings demonstrate how Trust Testing can systematically uncover and address subtle biases in AI systems, while building genuine user trust across diverse cultural contexts. These lessons go beyond improving individual products - they highlight how businesses can address the complexities of global AI deployment. In an era where agentic AI systems are becoming increasingly autonomous, trust is paramount. Trust Testing gives businesses a framework to ensure their AI systems are not only innovative but also equitable, adaptive, and aligned with the realities of global users.

Leveraging non-technical skills to improve a technical product

Trust Testing has also become a platform for upskilling our own workforce. By engaging diverse testers as collaborators, the process naturally evolved into a framework for developing critical AI competencies.

Participants in Trust Testing gained hands-on experience with Agentforce while building essential skills for the AI-powered future, including:

  • AI Literacy: Understanding how AI systems work, their limitations, and their ethical implications.
  • Critical Analysis: Learning to evaluate AI outputs for subtle biases and inconsistencies.
  • Collaborative Problem-Solving: Working in diverse teams to navigate complex challenges.
  • Data Interpretation: Recognizing how data shapes AI outcomes and where improvements are needed.
  • Ethical Reasoning: Applying principles of fairness and equity in testing and design.

Through this integrated approach, testers not only helped improve Agentforce but also developed practical skills that enhanced their technological readiness. The process transformed what could have been a simple testing exercise into a mutual learning opportunity, ensuring our workforce grows alongside our AI capabilities.

Looking ahead

As businesses increasingly integrate agentic AI into their workflows, the need for ethical and transparent testing frameworks has never been more urgent. Trust Testing is one part of a multi-layered approach to testing AI at Salesforce, and it is also a blueprint for how businesses everywhere can build trustworthy AI systems that equitably serve diverse user groups.

The path to ethical AI deployment lies in recognizing bias identification and AI skills development not as separate hurdles, but as interconnected opportunities to create more inclusive and effective AI systems. Trust Testing is more than a tool - it's a call to action for the industry to lead with equity and accountability. As AI systems gain increasing autonomy, ensuring they serve all communities equitably will be critical. Together, we can build a future where AI is not only innovative but also ethical, equitable, and trustworthy.

More information:

Toni Morgan, Product Management Director