Zscaler Inc.

29/08/2024 | News release | Distributed by Public on 29/08/2024 18:38

Automate ITSM Workflows and Accelerate IT Resolutions Using AI

At Zenith Live, I had the pleasure of talking to our customers about how AI can revolutionize IT service management (ITSM) by detecting issues early and initiating automated root cause analysis, helping reduce Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR). In this article, we'll recap what was covered during my session.

We also trained hundreds of people in the ZDX workshop, and if you're interested in ZDX Certifications, there's more on that later. Read on!

This article is organized into two main parts:

  • Automate ITSM workflows: In the first half, we explore the recent innovations in alert routing with ServiceNow and Zscaler Workflow Automation.
  • Accelerate IT resolutions using AI: In the second half, we'll discuss the innovations in the ZDX AI space.

If you are new to ZDX, here's a quick summary

Troubleshooting application performance and network issues is time-consuming for the Tier 1 to Tier 4 service desk and IT teams responsible for user experience. They're further hampered by the fact that legacy monitoring tools don't share context and require manual correlation. Beyond that, major skill gaps lead to unnecessary escalations, even for simple tickets that could have been resolved early instead of being pushed to higher tiers.

Zscaler Digital Experience (ZDX) addresses these challenges by providing a multi-tenant, cloud-based monitoring platform that probes, benchmarks, and measures digital experiences for every user in the organization.

By leveraging the same Zscaler Client Connector that many customers have already deployed, ZDX performs synthetic probing to desired SaaS applications or internet-based services, offering critical insights into the user experience. This allows IT support teams to triage, escalate, and close tickets more efficiently, significantly reducing the burden on IT support.

Automate ITSM Workflows

Switching gears: ZDX has built-in workflows that trigger incident creation when an anomaly is detected. It can do this using webhooks, email, or the ServiceNow plugin, which is available in the ServiceNow AppStore.
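To make the webhook option more concrete, here is a minimal sketch of how a receiving system might turn an alert payload into an incident record. The payload field names (`alert_name`, `alert_type`, `severity`), the `Incident` structure, and the severity-to-urgency mapping are all assumptions made for illustration; they are not the documented ZDX webhook format.

```python
import json
from dataclasses import dataclass


@dataclass
class Incident:
    short_description: str
    category: str
    urgency: int  # 1 = high, 2 = medium, 3 = low


# Hypothetical severity-to-urgency mapping; adjust to your ITSM conventions.
URGENCY_BY_SEVERITY = {"high": 1, "medium": 2, "low": 3}


def incident_from_alert(raw_body: str) -> Incident:
    """Turn a JSON alert body (assumed shape, not the real ZDX schema) into an incident."""
    alert = json.loads(raw_body)
    severity = str(alert.get("severity", "low")).lower()
    return Incident(
        short_description=f"ZDX alert: {alert.get('alert_name', 'unnamed alert')}",
        category=alert.get("alert_type", "Uncategorized"),
        # Unknown severities fall back to the lowest urgency.
        urgency=URGENCY_BY_SEVERITY.get(severity, 3),
    )


# Example payload with made-up values.
sample = json.dumps(
    {"alert_name": "High latency to Outlook", "alert_type": "Network", "severity": "High"}
)
incident = incident_from_alert(sample)
print(incident)
```

In practice, a function like this would sit behind whatever HTTP endpoint receives the webhook, and the resulting record would be handed to your ITSM tool's own API.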

Here are some of the options available to customers for integrating ZDX into their ITSM systems:

  • ZDX ServiceNow Integration
  • Zscaler Workflow Automation
  • Integrations leveraging ZDX Open APIs

Let's take a look at each of them in detail:

ZDX ServiceNow Integration

ZDX provides out-of-the-box integration with ServiceNow; the plugin that enables it is available in the ServiceNow Marketplace. Once configured, it provides the following capabilities:

  • Map the categories and subcategories for incoming alerts or created incidents
  • Create Deep Tracing sessions to provide deeper granularity and process-level information for a user
  • Run a Root Cause Analysis to help detect and identify the root cause of a drop in an application's ZDX Score

Here's a video I recorded if you want to see how the integration is done.

Once the setup is complete, ZDX Alerts start flowing into ServiceNow as Incidents.

The integration of ZDX with ServiceNow not only allows alerts to flow seamlessly into ServiceNow as incidents but also brings user-level details into the ServiceNow ticket.

When a ServiceNow incident is opened for a user, the user's ZDX experience score is populated within the incident. This enables ServiceDesk technicians to provide a white-glove experience by having immediate insight into the current state of the user's experience.

Additionally, ServiceDesk technicians can run an on-demand troubleshooting session. This session captures granular details and allows them to analyze, evaluate, and troubleshoot issues for a specific user, device, or application.

Zscaler Workflow Automation

We've covered the ServiceNow integration, but what if you need more control over which users, departments, or even ServiceNow instances handle specific tickets?

To address these challenges and provide enhanced control, we are launching Zscaler Workflow Automation, currently in beta. This new feature allows for precise configuration and management of ticket routing, ensuring that the right teams handle the right issues efficiently. Let's dive into it.

Once Workflow Automation is enabled for your tenant, you can define multiple ServiceNow destinations and create routing rules for your alerts. These rules ensure that alerts flow into ServiceNow based on specified criteria.

You can configure a workflow to trigger for a specific alert type, such as Network, when the alert is classified as high severity.

This workflow will then assign the ticket to the appropriate ServiceNow tenant or user according to your predefined configurations. This ensures that critical issues are routed to the right team or individual for prompt attention and resolution, enhancing the efficiency and responsiveness of your IT support operations.
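The routing logic described above can be sketched as a small first-match rule table. This is only an illustration of the idea: the `RoutingRule` fields, the severity ranking, and the destination labels are hypothetical and do not reflect how Workflow Automation is actually configured or implemented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoutingRule:
    alert_type: str    # e.g. "Network"
    min_severity: str  # lowest severity that triggers the rule
    destination: str   # hypothetical label for a ServiceNow instance or team


# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def route(alert_type: str, severity: str, rules: Sequence[RoutingRule]) -> Optional[str]:
    """Return the destination of the first matching rule, or None if nothing matches."""
    for rule in rules:
        type_matches = rule.alert_type.lower() == alert_type.lower()
        severe_enough = (
            SEVERITY_RANK[severity.lower()] >= SEVERITY_RANK[rule.min_severity.lower()]
        )
        if type_matches and severe_enough:
            return rule.destination
    return None


# Made-up rules: high-severity Network alerts go to one instance,
# any Application alert goes to another.
RULES = [
    RoutingRule("Network", "high", "network-ops-instance"),
    RoutingRule("Application", "low", "app-support-instance"),
]

print(route("Network", "high", RULES))
print(route("Network", "low", RULES))
```

A first-match design keeps the behavior predictable: rule order is the only thing that decides which destination wins when several rules could apply.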

Integrations leveraging ZDX Open APIs

During the session, we also explored how ZDX integrates seamlessly with advanced analytics tools like Splunk, Power BI, and Moogsoft. These integrations offer significant flexibility and enhanced analytics capabilities, allowing you to gain deeper insights into your IT environment.

Please note that these options are not intended to export all data from ZDX into a third-party data analytics solution. Instead, they enable you to export the most relevant data for your specific use case, within the constraints of API limits.
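As a rough illustration of exporting only the relevant data while respecting API limits, the sketch below pages through results, keeps only the rows that match a filter, and backs off when it is rate limited. The `fetch_page` callable is a stand-in for a real API call and is an assumption of this example, as are the HTTP 429 convention and the cursor-based paging; consult the ZDX API documentation for the actual endpoints, limits, and pagination scheme.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# fetch_page(cursor) -> (status_code, items, next_cursor); a stand-in for a real API call.
FetchPage = Callable[[Optional[str]], Tuple[int, List[Dict], Optional[str]]]


def export_filtered(
    fetch_page: FetchPage,
    keep: Callable[[Dict], bool],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Page through results, keep only relevant rows, and back off on HTTP 429."""
    results: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            status, items, next_cursor = fetch_page(cursor)
            if status != 429:
                break
            if attempt == max_retries:
                raise RuntimeError("rate limit still exceeded after retries")
            sleep(base_delay * 2 ** attempt)  # exponential backoff before retrying
        if status != 200:
            raise RuntimeError(f"unexpected status {status}")
        results.extend(item for item in items if keep(item))
        if next_cursor is None:
            return results
        cursor = next_cursor


def make_fake_fetcher() -> FetchPage:
    """Fake data source for demonstration: the first call is rate limited."""
    calls = {"n": 0}
    pages = {
        None: ([{"app": "Outlook", "score": 28}, {"app": "Box", "score": 90}], "p2"),
        "p2": ([{"app": "Outlook", "score": 75}], None),
    }

    def fetch(cursor: Optional[str]) -> Tuple[int, List[Dict], Optional[str]]:
        calls["n"] += 1
        if calls["n"] == 1:
            return 429, [], None
        items, next_cursor = pages[cursor]
        return 200, items, next_cursor

    return fetch


rows = export_filtered(
    make_fake_fetcher(),
    keep=lambda row: row["app"] == "Outlook",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(rows)
```

Filtering as close to the source as possible keeps the number of calls, and the volume of data pushed to the analytics tool, within what the use case actually needs.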

Leveraging the ZDX API for Custom Integrations:

  • Splunk: Integrate ZDX with Splunk to create real-time dashboards and visualizations, helping identify trends, anomalies, and potential issues before they impact users.
  • Power BI: Use Power BI with ZDX to generate comprehensive reports and interactive dashboards, offering advanced data modeling and visual insights into your IT operations.
  • Moogsoft: Integrate ZDX with Moogsoft for AI-driven incident management, using machine learning to detect patterns, predict incidents, and automate remediation, reducing MTTD and MTTR.

We have a Jupyter Notebook in our GitHub repository that walks through the ZDX APIs, along with examples of some of the above-mentioned use cases.

Innovations in AI/ML for IT Troubleshooting

We explored the latest innovations in machine learning (ML) and artificial intelligence (AI) that assist with troubleshooting and detecting issues in IT environments. These technologies can accelerate IT resolutions by providing deeper insights and automated solutions.

ZDX has leveraged AI/ML across its feature set from very early on, starting with Automated Root Cause Analysis as our first ML-based feature. Here are some of the ZDX features that leverage AI/ML:

  • Automated Root Cause Analysis
  • Incident Dashboard
  • Self Service
  • ZDX Copilot

Let's look at each of these.

Automated Root Cause Analysis

Now let's take a look at how IT admins can leverage AI/ML to get to the root cause of the issue quickly: ZDX can swiftly identify the root cause of user experience issues with its new AI-powered root cause analysis capability.

The Automated Root Cause Analysis feature is a powerful tool that assists both ServiceDesk and Tier 4 teams in quickly pinpointing the source of issues.

When a user's score falls into the poor category, the "Analyze Score" button triggers a correlation of data points (CPU, Memory, Network, Wi-Fi, DNS, etc.) to determine what caused the degradation in user experience.

It then provides a verdict with detailed insights into the root cause.

The analysis table provides key details for a specific date and time in the graph:

  • Factor: Identifies a probable cause for the low score.
  • Explanation: Describes why the factor might be an issue.
  • Confidence Level: Indicates the assumed accuracy of the analysis based on similar issues.
  • Provide Feedback: Allows users to rate the accuracy of the analysis by clicking the thumbs-up icon.

With ZDX, you can compare application scores to understand why they might vary over time. Score comparisons can reveal why a current score differs significantly from a previous one. This feature utilizes web, device, and Cloud Path metrics to determine differences in scoring.

To start your comparison, select a point within the ZDX Score Over Time graph and choose from the "Compare to" drop-down menu:

  • Compare the ZDX Score of your selected point to a previous score.
  • Compare the ZDX Score of your selected point to a future score, up to the current date and time.

In the above example, we demonstrate the differences in network statistics between the compared point and the analyzed point. It provides detailed information to the ServiceDesk admin on what has changed.
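Conceptually, a score comparison boils down to diffing two snapshots of metrics and reporting what changed. The sketch below shows that idea only; the metric names and values are invented for illustration, and the real feature draws on web, device, and Cloud Path metrics in ways not described here.

```python
from typing import Dict, Tuple


def compare_snapshots(
    analyzed: Dict[str, int], compared: Dict[str, int]
) -> Dict[str, Tuple[int, int, int]]:
    """Return {metric: (compared_value, analyzed_value, delta)} for shared metrics that changed."""
    changes = {}
    for metric in sorted(analyzed.keys() & compared.keys()):
        delta = analyzed[metric] - compared[metric]
        if delta != 0:
            changes[metric] = (compared[metric], analyzed[metric], delta)
    return changes


# Hypothetical snapshots: the analyzed point (low score) and an earlier compared point.
analyzed_point = {"latency_ms": 180, "packet_loss_pct": 3, "wifi_signal_pct": 80}
compared_point = {"latency_ms": 40, "packet_loss_pct": 0, "wifi_signal_pct": 80}

for metric, (before, after, delta) in compare_snapshots(analyzed_point, compared_point).items():
    print(f"{metric}: {before} -> {after} ({delta:+d})")
```

Unchanged metrics are left out of the result, which mirrors the goal of showing the ServiceDesk admin only what is different between the two points.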

Incident Dashboard

So far, we've seen how to trigger root cause analysis on a degraded score. But since ZDX already collects millions of telemetry points from the end-user experience perspective, we went a step further and developed the Incident Dashboard, which correlates these data points and surfaces deep-rooted issues in your environment that you may not yet be aware of.

The ZDX Incidents Dashboard provides a comprehensive view of IT incidents impacting user device performance, categorized into Wi-Fi, Last Mile ISP, ZIA/ZPA Public Service Edge, Application, and more.

It uses AI/ML to detect and analyze incidents, offering real-time monitoring and detailed metrics, such as the number of incidents, impacted users, and their geographic locations. Filters allow you to refine the data by geolocation, type, and time range. Key features include incident analysis over time, visualization of incidents on a map, and detailed incident insights, enhancing IT operations and service quality.

Key components:

  • Detailed Metrics: View total incidents, impacted users, and incidents across key areas
  • Incidents Over Time: Analyze the number of incidents and impacted users over selected periods
  • Geographic Visualization: Map-based display of incident epicenters with different icons for each incident type

A common question is about the criteria or thresholds that trigger incidents on the dashboard. While much of this is driven by machine learning, here are some high-level details on the thresholds and data monitored over time to trigger an incident.

Self Service

Continuing with the same AI innovations, we next looked at resolving issues at the user level before they impact the end-user experience. With Self Service, we give the user a gentle nudge (a notification) that a CPU or Wi-Fi issue might be affecting their experience.

Self Service can help users identify the root cause of issues related to CPU usage and Wi-Fi access, allowing users to investigate potential solutions without the need to contact customer support. When enabled for your users, Self Service provides notifications when issues are detected and need attention. Each notification contains a brief diagnosis and recommendation that might resolve the CPU or Wi-Fi issue.


ZDX Copilot

ZDX Copilot offers versatile capabilities for various IT functions:

  • Upskilling Service Desk Analysts: New hires or service desk analysts can quickly enhance their skills by asking domain-specific questions or extracting knowledge from documentation with simple queries.
  • Advanced Analysis for Experienced Analysts: More tenured analysts can delve deeper into issues. For instance, if multiple employees in Paris report slow Outlook performance, they can ask, "Why was Outlook slow for users in Paris at 9 AM today?" to uncover root causes and performance trends.
  • Automating Configuration Tasks: Copilot can automate tasks based on recurring issues. For example, if frequent Outlook complaints from Paris users are detected, you can set Copilot to trigger an alert when more than 25% of users in Paris experience poor Outlook performance.
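To show what an alert condition like "more than 25% of users in Paris experience poor Outlook performance" amounts to, here is a small sketch of the underlying check. It assumes a score at or below 33 counts as "poor" and that one score per user is available; both are assumptions of this example, and the sketch says nothing about how Copilot itself evaluates or configures such alerts.

```python
from typing import Sequence

POOR_SCORE_MAX = 33  # assumption: scores from 0 to 33 are treated as "poor"


def should_alert(user_scores: Sequence[int], threshold: float = 0.25) -> bool:
    """Return True when the share of users with a poor score exceeds the threshold."""
    if not user_scores:
        return False  # no data, nothing to alert on
    poor_users = sum(1 for score in user_scores if score <= POOR_SCORE_MAX)
    return poor_users / len(user_scores) > threshold


# Made-up Outlook scores for users in one location.
paris_outlook_scores = [20, 30, 70, 80]
print(should_alert(paris_outlook_scores))
```

Note that the comparison is strict: exactly 25% of users with a poor score does not trigger the alert, matching the "more than 25%" wording.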

ZDX Copilot aids IT employees across various functions in upskilling, automating tasks, gaining digital experience insights, and performing in-depth performance analysis. By leveraging knowledge from over 500 trillion daily metrics across devices, networks, and applications, observed by the world's largest security cloud, ZDX with Copilot helps your teams significantly improve efficiency and collaboration across IT operations, service desks, and security.

Cover the frequently asked questions about ZDX Copilot - Frozen LLM.

And to conclude, let's look at the financial benefits of implementing ZDX

A properly deployed ZDX solution offers significant financial advantages. For a company with 45,000 users, we projected annual cost savings of approximately $7.4 million, driven by:

  • Productivity gains
  • Operational efficiency
  • Tool consolidation
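For a sense of scale, the projection can be expressed per user. The short calculation below uses only the two figures quoted above (45,000 users and approximately $7.4 million per year); the per-user and per-month values are simple derivations from those numbers, not separately published figures.

```python
# Figures quoted in the article.
users = 45_000
annual_savings_usd = 7_400_000  # approximate projection

# Derived values (simple division, not published figures).
per_user_per_year = annual_savings_usd / users
per_user_per_month = per_user_per_year / 12

print(f"Per user per year:  ${per_user_per_year:,.2f}")
print(f"Per user per month: ${per_user_per_month:,.2f}")
```

This works out to roughly $164 per user per year, which can be a handy figure when scaling the estimate to an organization of a different size, assuming savings scale linearly with user count.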

Conclusion:

Incorporating AI and ML into ITSM workflows not only enhances efficiency but also drives substantial cost savings. ZDX's comprehensive approach addresses common IT challenges, providing a robust solution for modern IT environments.

For more details on our analysis and to see how we reached these savings, check out the guide.

If you've made it this far, thank you for reading! Please reach out to your account team if you would like to hear more about the features discussed in this article and how ZDX can enhance your end user and employee experience.