Palo Alto Networks Inc.

08/21/2024 | News release | Distributed by Public on 08/22/2024 06:58

Autoencoder Is All You Need: Profiling and Detecting Malicious DN...

Executive Summary

To improve our detection of suspicious network activity, we leveraged a deep learning method to profile and detect malicious DNS traffic patterns. Based on these DNS profiles, we developed multiple detection modules, each designed to identify suspicious domains from different perspectives. We will explore how these DNS traffic patterns correlate with specific types of cyberattack activity through various case studies.

DNS resolution traffic, as the initial stage of network communication, provides critical insights into cyberthreats behind malicious network traffic. Analyzing DNS traffic patterns and characteristics can help us detect and prevent unauthorized infiltration attempts.

For example, upon gaining access to a host within a victim's network, an attacker often deploys malware that periodically connects to its command and control (C2) servers. This results in DNS traffic patterns for these C2 domains that can be markedly different from typical benign DNS activity.

Our detector captured 170 emerging suspicious domains in May 2024. The resulting signatures blocked approximately 374,000 malicious DNS requests every day.

The malicious DNS traffic detector is deployed on the Advanced DNS Security service to provide real-time protection against various network threats. The detected malicious domains are also shared with Advanced URL Filtering.

If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.

Autoencoder-Based DNS Traffic Profiling

The Palo Alto Networks Advanced DNS Security service continuously monitors real-world DNS traffic to detect and block threats within organizations' environments. To aid in threat hunting, all DNS requests are logged and delivered to our backend detection systems. This logging grants us the ability to collect and investigate the time series data of DNS traffic for each domain and customer device in real-time.
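As a rough illustration of what "time series data of DNS traffic for each domain and customer device" can look like, the sketch below buckets log events into hourly request counts per (domain, device) pair. The log schema (timestamp, domain, device ID), the function name hourly_series and the one-hour bucket size are assumptions made for this example; the article does not describe the actual log format.

```python
from collections import defaultdict

def hourly_series(events, start_ts, hours):
    """Count DNS requests per hour for each (domain, device) pair.

    events: iterable of (unix_ts, domain, device_id) tuples.
            This schema is hypothetical, chosen only for the example.
    Returns {(domain, device_id): [count_hour_0, ..., count_hour_N-1]}.
    """
    series = defaultdict(lambda: [0] * hours)
    for ts, domain, device in events:
        bucket = int((ts - start_ts) // 3600)
        # Ignore events outside the observation window.
        if 0 <= bucket < hours:
            series[(domain, device)][bucket] += 1
    return dict(series)

# Toy log: three lookups from one device, one from another.
events = [
    (0, "example.com", "dev-1"),
    (10, "example.com", "dev-1"),
    (3700, "example.com", "dev-1"),
    (7300, "example.org", "dev-2"),
]
series = hourly_series(events, start_ts=0, hours=3)
```

Each resulting list is one variable-length time series of the kind that the profiling step described next would consume.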

Analyzing this time series data can help track malicious DNS activity. By comparing emerging traffic trends with known malicious patterns, we can uncover emerging attacker domain names.

However, running any comparisons on raw traffic data is complex, making the analysis computationally expensive. Furthermore, this complexity will grow exponentially as the time series data scales larger. Therefore, a major challenge in analyzing large-scale time series data is determining how to process and store the data efficiently and scalably.

To overcome these challenges, we transform our dynamic DNS traffic time series data into lower, fixed-dimensional vectors called DNS profiles with the autoencoder technique. An autoencoder is a deep learning model that is designed to compress its input into a lower-dimensional vector, then reconstruct the output from this representation. People typically use an autoencoder for dimensionality reduction and feature learning.

We constructed our autoencoder with recurrent neural network (RNN) cells so it can take variable-dimensional inputs and output compressed, fixed-dimensional vectors that preserve the characteristics of the input. We use the same set of time series data as both the input and the ground truth to train the model, and we take the intermediate vector as our traffic profile for each device. In this way, the profiles closely model the characteristics of the input DNS traffic time series data.
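To make the "variable-length in, fixed-length out" property concrete, here is a minimal sketch of only the encoder half of a recurrent autoencoder, written as a plain forward pass. It is not the model described in this article: the weights are random and untrained, the cell is a basic tanh RNN, and the names make_encoder and hidden_size are hypothetical. In a real autoencoder, a decoder would reconstruct the series from the final hidden state, and the weights would be learned by minimizing the reconstruction error against the same input series.

```python
import math
import random

def make_encoder(hidden_size, seed=0):
    """Build an untrained tanh-RNN encoder (random weights, illustration only)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    bias = [0.0] * hidden_size

    def encode(series):
        # The hidden state has a fixed size regardless of len(series).
        h = [0.0] * hidden_size
        for x in series:
            h = [
                math.tanh(
                    w_in[i] * x
                    + sum(w_rec[i][j] * h[j] for j in range(hidden_size))
                    + bias[i]
                )
                for i in range(hidden_size)
            ]
        # The final hidden state plays the role of the "DNS profile".
        return h

    return encode

encode = make_encoder(hidden_size=4)
short_profile = encode([0.0, 1.0, 0.0])
long_profile = encode([0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0])
```

Both inputs, although of different lengths, map to a vector of four values, which is what allows downstream models to treat every profile uniformly.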

Figure 1 plots our autoencoder's validation loss curve during the model training. The loss stabilizes at a low level after 5,000 epochs of training, indicating this is the point where the model has successfully learned the manifold of real-world DNS traffic and its representative characteristics.

Malicious DNS Traffic Detection

After profiling a domain's DNS traffic into a fixed-dimensional vector, we can leverage various machine learning algorithms, including classification, clustering and anomaly detection models, to discover and analyze malicious network traffic patterns.

DNS Profile Classification

To identify the DNS resolution traffic to malicious domains, we implemented a high-precision classification model trained on historical benign and malicious DNS traffic profiles. The classification model serves a real-time detection system that is able to capture and block ongoing attack traffic as soon as a domain presents suspicious DNS traffic patterns. We illustrate the details of our detection pipeline in the conclusion section.
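The article does not say which classification algorithm is used. Purely as a stand-in, the sketch below trains a nearest-centroid classifier on labeled profile vectors and adds a margin parameter that biases decisions toward "benign", which is one simple way to trade recall for precision. The two-dimensional toy profiles, the function names and the margin value are all assumptions for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(benign, malicious):
    """'Training' here is just computing one centroid per class."""
    return centroid(benign), centroid(malicious)

def is_malicious(profile, model, margin=0.0):
    """Flag a profile only if it is closer to the malicious centroid
    than to the benign one by at least `margin` (hypothetical knob)."""
    benign_c, malicious_c = model
    return math.dist(profile, malicious_c) + margin < math.dist(profile, benign_c)

# Toy historical profiles (made-up values).
benign = [[0.9, 0.1], [0.8, 0.2]]
malicious = [[0.1, 0.9], [0.2, 0.8]]
model = train(benign, malicious)

clear_hit = is_malicious([0.1, 0.8], model, margin=0.2)
ambiguous = is_malicious([0.5, 0.5], model, margin=0.2)
```

With a positive margin, ambiguous profiles that sit between the two classes are left unflagged, so fewer benign domains are blocked at the cost of missing some borderline malicious ones.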

DNS Traffic Pattern Clustering

In addition to detecting the DNS requests of malicious domains, our DNS profiles also help us go a step further and understand the characteristics of different attack behaviors in malicious network traffic. Specifically, we apply a clustering algorithm to the malicious DNS profiles to identify different groups of malicious DNS traffic so that we can analyze various DNS patterns and hunt down attack campaigns effectively.

Figure 2 categorizes the traffic of malicious domains into three distinct clusters based on their DNS profiles. The first cluster shows a steady flow of traffic with minimal spikes. Dynamic DNS (DDNS) is one of the behaviors that will generate these high frequency DNS requests and attackers commonly abuse this technique.

The IP addresses of DDNS domains are changed frequently by their name servers controlled by DDNS providers or adversaries. As a result, the time-to-live (TTL) values of DDNS records are usually short to force the clients to query for potential resolution updates frequently.

Domains in the second cluster shown in Figure 2 experience a moderate amount of traffic, typically 10-20 requests every hour. This cluster could correspond to DNS tunneling domains, which are contacted by malware periodically for data exfiltration. To extract the stolen data, the attackers usually initialize several DNS requests each time for different subdomains.

The third cluster in Figure 2 is characterized by infrequent activity. These domains are mostly inactive but exhibit periodic bursts of DNS queries on a weekly basis, which is the representative behavior of malware heartbeat communication.
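The grouping described above can be sketched with k-means, used here only as a familiar example since the article does not name its clustering algorithm. The two features per profile (average requests per hour and a burstiness score) and their values are invented for the example; real DNS profiles are learned vectors, not hand-picked features. Initial centers are simply the first k points so that the run is deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic initialization (first k points)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy profiles: (avg requests per hour, burstiness). Values are made up to
# mimic steady high-frequency, moderate periodic, and mostly dormant traffic.
points = [
    [24.0, 0.1], [15.0, 0.5], [1.0, 5.0],
    [23.0, 0.2], [14.0, 0.6], [1.5, 4.5],
]
labels, centers = kmeans(points, k=3)
```

On this toy data, the steady, moderate and mostly dormant profiles end up in three separate groups, mirroring the three clusters discussed for Figure 2.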

Anomaly DNS Traffic Detection

As demonstrated in the previous section, our clustering process revealed several commonly seen DNS profile patterns. However, we noticed that some DNS profiles presented uncommon trend patterns that were significantly different from others.

The significant variations in these trends could indicate either intentional malicious behavior or unintentional issues. To identify all such irregular DNS profile trend patterns, we leverage an anomaly detection algorithm that helps detect outliers.
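One simple way to find such outliers, shown here as an assumption rather than the algorithm actually deployed, is to measure each profile's distance from the median profile and flag distances with a large modified z-score (based on the median absolute deviation). The threshold of 3.5 is a common rule of thumb, and the toy profiles are made up.

```python
import math
import statistics

def find_outliers(profiles, threshold=3.5):
    """Return indices of profiles unusually far from the median profile."""
    center = [statistics.median(col) for col in zip(*profiles)]
    dists = [math.dist(p, center) for p in profiles]
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:
        # All distances are (nearly) identical; nothing stands out.
        return []
    # 0.6745 scales the MAD so the score is comparable to a z-score.
    return [
        i for i, d in enumerate(dists)
        if 0.6745 * (d - med) / mad > threshold
    ]

# Six similar toy profiles and one that is very different.
profiles = [
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.0, 1.3],
    [1.4, 1.0], [0.7, 0.8], [9.0, 9.0],
]
outliers = find_outliers(profiles)
```

Median-based statistics are used because a single extreme profile would otherwise inflate the mean and standard deviation and partly hide itself.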

Figure 3 illustrates the DNS requests trend for the domain run[.]sh from a specific device that presents abnormal time series data. We notice that there is an abnormally high frequency of requests to the domain. Since the trend is relatively stable over the whole 24-hour time period, we conclude that this is most likely done programmatically.

Furthermore, the domain name is also a commonly used name for a bash script program. This leads us to conclude that the DNS requests may have been produced unintentionally.

A user may attempt to run a script with this filename but inadvertently trigger a DNS lookup for the name instead. For instance, if someone uses the browser's address bar to search for a local file or GitHub file named run[.]sh, the browser may treat the name as a domain and issue a DNS lookup.

From our data, we see that besides this device, there are thousands of DNS queries for the same domain every day globally. At the time of detection, this domain was a parked domain and it was not involved in any attack campaign. However, if an attacker takes control of the domain, they can benefit from a large amount of unintentional DNS requests.

Case Study

In this section, we dive into detailed case studies focusing on the detection of malicious traffic patterns through DNS traffic profiling. This analysis covers various network abuses, each presenting unique traffic patterns.

Our profiling method can extract the distinct characteristics of different attacks, enabling efficient detection of intrusion attempts from large-scale network traffic logs. These cases demonstrate the efficacy of our system and provide insights into the landscape of network cyberthreat activities.

Command and Control

Attackers typically use C2 domains for servers that send malware and communicate with devices compromised by malware. Our real-time malicious DNS traffic pattern detector can effectively capture DNS traffic of C2 domains based on their characteristic behavior patterns.

Our detector identified one such C2 domain, biillpi[.]com. Figure 4 shows the DNS request trends for this domain. We observe that the domain receives requests that peak once per day with relatively stable gaps. This pattern correlates with the malicious network activity pattern of Trojans.
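A pattern of "peaks once per day with relatively stable gaps" can be quantified, for example, by the coefficient of variation of the gaps between request bursts: a value near zero means highly regular timing. The sketch below is an illustrative heuristic, not the detector described in this article, and the timestamps are synthetic.

```python
import statistics

def gap_regularity(timestamps):
    """Return (mean_gap_seconds, coefficient_of_variation) of inter-arrival gaps."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2 or sum(gaps) == 0:
        raise ValueError("need at least three timestamps that are not all equal")
    mean_gap = statistics.fmean(gaps)
    return mean_gap, statistics.pstdev(gaps) / mean_gap

DAY = 86400
# Synthetic daily beacon with a few minutes of jitter.
beacon = [0, DAY, 2 * DAY + 600, 3 * DAY - 300, 4 * DAY]
# Synthetic human-like browsing: clustered visits, then a long pause.
browsing = [0, 50, 4000, 4100, 90000]

beacon_gap, beacon_cv = gap_regularity(beacon)
_, browsing_cv = gap_regularity(browsing)
```

Here the beacon has a mean gap of one day and a coefficient of variation well under 0.05, while the irregular series scores above 1, which is the kind of separation a heartbeat detector relies on.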

Malware typically stays dormant for extended periods of time and activates periodically to contact the C2 server in a heartbeat pattern to confirm connection and obtain further instructions. Therefore, the network traffic a Trojan produces is sparse and presents stable patterns. Furthermore, we also observed that the DNS traffic for this domain consists of many different subdomains with random strings, which could carry data that an attacker attempted to exfiltrate through DNS tunneling.
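Subdomains made of random-looking strings can be scored with Shannon entropy over their characters, a common first-pass heuristic for spotting encoded data in DNS labels. This is offered as an illustration only; the article does not state how its system treats subdomains, and the sample labels below are invented.

```python
import math
from collections import Counter

def shannon_entropy(label):
    """Shannon entropy of a string in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

ordinary = shannon_entropy("www")            # one repeated character
random_like = shannon_entropy("x7f9q2kd0az3")  # hypothetical encoded label
```

Entropy alone produces false positives (for example, content-hash hostnames used by CDNs), so in practice it would be one signal among several rather than a verdict.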

Malicious DDNS

DDNS services offer a way for domain owners to automatically update the IP addresses associated with their domain names on the fly. This allows websites and services to operate seamlessly with dynamic IP addresses.

Threat actors can abuse DDNS services for C2 traffic using different IP addresses over time for the same domain. This type of DNS traffic presents exclusive characteristics:

  • Compared to legitimate websites and services, malicious DDNS domains receive more sparse network traffic, as there would be no continuous visits to these malicious domains.
  • DDNS domain records will have short TTL values, so we would see more DNS requests within a single communication session to the attackers' infrastructure.

Figure 5 presents the DNS requests to a malicious Trojan's C2 domain robotatten[.]com, hosted by nameservers from the DDNS provider ztomy[.]com. The DDNS service resolves this domain to many IP addresses across the world, and each record has a DNS TTL of five minutes. We notice that while there are multiple DNS requests within each session, the overall trend over the course of days is relatively sparse.
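The effect of a short TTL on request volume within a session is simple arithmetic. Assuming a client that caches each answer for exactly the TTL and re-resolves as soon as it expires (an idealization), the number of lookups during a session is the session length divided by the TTL, rounded up. The function name and the one-hour session are hypothetical.

```python
import math

def lookups_per_session(session_seconds, ttl_seconds):
    """Lookups issued by an idealized caching client that needs the
    record for the whole session and re-resolves on every TTL expiry."""
    return max(1, math.ceil(session_seconds / ttl_seconds))

# One-hour session against a record with a five-minute TTL.
short_ttl = lookups_per_session(3600, 300)
# Same session against a record with a one-day TTL.
long_ttl = lookups_per_session(3600, 86400)
```

Under these assumptions, the five-minute TTL yields 12 lookups in an hour versus a single lookup for a one-day TTL, which is why short-TTL DDNS records show multiple requests per session.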

Strategically Aged Domains

Strategically aged domains refer to domains that are registered and left dormant for months or even years before being actively used for attack campaigns. Advanced persistent threat (APT) groups occasionally use this strategy for their C2 domains so their traffic can evade traditional domain-based reputation checks.

A strategically aged domain's DNS traffic will present a sudden burst or pattern change when the domain is activated. Our system is able to capture this indicative signal of cyberattacks from massive DNS traffic.

Figure 6 shows an example of this type of DNS traffic trend. An infected host periodically sends a limited amount of heartbeat traffic to the malicious domain pococo[.]cc, but not in a uniform manner, over the course of one day. After a successful infiltration, we would observe a much higher volume of DNS traffic toward the domains in high frequency for Trojan operations and data exfiltration.
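A sudden activation after a long dormant period can be approximated by comparing each day's request count with a trailing baseline. The sketch below is a hand-written heuristic for illustration, not the profile-based detection described in this article; the 30-day window, the 10x factor and the minimum of 20 requests are arbitrary assumptions.

```python
def first_activation(daily_counts, dormant_days=30, factor=10, min_requests=20):
    """Return the index of the first day whose volume exceeds `factor` times
    the average of the preceding `dormant_days` days, or None if no such day."""
    for day in range(dormant_days, len(daily_counts)):
        window = daily_counts[day - dormant_days:day]
        baseline = sum(window) / dormant_days
        if daily_counts[day] >= min_requests and daily_counts[day] > factor * baseline:
            return day
    return None

# Synthetic aged domain: 40 silent days, a trickle, then a burst.
aged = [0] * 40 + [1, 0, 2] + [150, 180]
# Synthetic steady domain with constant daily volume.
steady = [100] * 45

activation_day = first_activation(aged)
steady_result = first_activation(steady)
```

The minimum-request floor keeps a change from zero to one or two lookups from counting as a burst, while the steady domain is never flagged because its volume never departs from its own baseline.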

Domain Squatting

Our detector also captures traffic toward squatting domains. One example is comcadt[.]net, which is a typosquatting domain mimicking a popular telecommunications company. Since the characters d and s are neighbors on the keyboard, an unintentional typographical error (typo) would lead a victim to the typosquatting domain.
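The keyboard-neighbor reasoning can be checked mechanically. The sketch below builds an approximate QWERTY adjacency map and tests whether one label differs from another by exactly one adjacent-key substitution. The staggered-row neighbor model and the function names are assumptions, and the sample labels are generic placeholders rather than any real brand.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def build_neighbors():
    """Approximate QWERTY adjacency: left/right keys plus the two nearest
    keys in the rows directly above and below (staggered layout)."""
    pos = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}
    neighbors = {ch: set() for ch in pos}
    for ch, (r, c) in pos.items():
        candidates = [
            (r, c - 1), (r, c + 1),          # same row
            (r - 1, c), (r - 1, c + 1),      # row above
            (r + 1, c - 1), (r + 1, c),      # row below
        ]
        for rr, cc in candidates:
            if 0 <= rr < len(ROWS) and 0 <= cc < len(ROWS[rr]):
                neighbors[ch].add(ROWS[rr][cc])
    return neighbors

NEIGHBORS = build_neighbors()

def is_adjacent_key_typo(candidate, target):
    """True if candidate differs from target by exactly one substitution
    where the typed key is adjacent to the intended key."""
    if len(candidate) != len(target):
        return False
    diffs = [(a, b) for a, b in zip(candidate, target) if a != b]
    return len(diffs) == 1 and diffs[0][0] in NEIGHBORS.get(diffs[0][1], set())

d_next_to_s = "d" in NEIGHBORS["s"]
typo = is_adjacent_key_typo("exampke", "example")   # k typed instead of l
not_typo = is_adjacent_key_typo("exampqe", "example")  # q is far from l
```

This only covers single substitutions; omissions, insertions and transposed characters would need an edit-distance check in addition.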

Figure 7 shows that the DNS queries for comcadt[.]net experienced alternating active and dormant phases, with each phase lasting several days. Investigating the DNS responses for comcadt[.]net, we found this domain was hosted by more than 50 different malicious IP addresses located in the United States and the Netherlands.

Furthermore, the DNS records only had a TTL of 10 minutes, indicating the possible use of fast flux, a technique that makes the malicious domain resolve to many malicious IP addresses that rapidly circulate. Cybercriminals can use this technique to improve the resilience of their attacking infrastructure while preventing investigators from effectively isolating and blocking their attacks.
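The two fast flux signals mentioned here, short TTLs and many rotating IP addresses, can be combined into a simple check over observed DNS answers. This is a hedged illustration only: the thresholds are arbitrary, the function name is hypothetical, and the IP addresses come from reserved documentation ranges rather than real observations.

```python
def looks_like_fast_flux(answers, max_ttl=600, min_distinct_ips=10):
    """answers: iterable of (ip, ttl_seconds) pairs observed for one domain.
    Flags domains whose records all have short TTLs and that resolve to
    many distinct IPs. Thresholds are illustrative assumptions."""
    answers = list(answers)
    if not answers:
        return False
    distinct_ips = {ip for ip, _ in answers}
    short_ttl = all(ttl <= max_ttl for _, ttl in answers)
    return short_ttl and len(distinct_ips) >= min_distinct_ips

# 50 distinct addresses, each served with a 10-minute TTL.
rotating = [(f"192.0.2.{i}", 600) for i in range(1, 51)]
# One stable address served with a one-hour TTL.
stable = [("198.51.100.7", 3600)] * 20

flux_flag = looks_like_fast_flux(rotating)
stable_flag = looks_like_fast_flux(stable)
```

Legitimate content delivery networks also use short TTLs and large IP pools, so a check like this needs additional context, such as the reputation of the IP addresses, before it supports a verdict.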

Internet Scam

Scamming websites also generate representative DNS resolution patterns that are captured by the malicious DNS traffic detector. One example is the domain carollewis[.]network.

Figure 8 shows that DNS traffic for this scam activity is relatively sparse, since once victims notice the scam website, they tend to not visit it again. We also observed that most of this scam traffic appeared during business hours.

We find that the detected scam domain hosts several dynamically generated URLs. These URLs redirect the user to various suspicious landing pages.

An example URL is cqk1rt8hubcc73f3775g.networkcyclechain[.]com/01, which was a fake antivirus page when we checked it in a lab environment. Figure 9 presents a screenshot of the fake antivirus landing page.

Conclusion

Malicious DNS requests generated by attackers present patterns that are distinct from legitimate DNS traffic. This insight allows us to identify malicious domains based on DNS traffic patterns and characteristics.

To capture attack indicators from DNS traffic, we developed an autoencoder-based deep learning profiling solution to vectorize streaming DNS request traffic. Our autoencoder model is efficient and scalable enough to encode DNS trends in real time. Once we create a baseline of the DNS traffic, we leverage comprehensive threat intelligence to build the classifier that identifies the representative request patterns for various cyberthreats.

How Palo Alto Networks Incorporates Autoencoder-Based DNS Traffic Profiling Into Our Detections

Figure 10 shows the architecture of our system. Our traffic encoder ingests real-time logs from our Advanced DNS Security system to generate and continuously update DNS profiles for each domain and source tuple.

We store all the profiles to an in-memory database so we can achieve high throughput and scalability. The maliciousness classifier scans all updated profiles to hunt for emerging attacks.
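The flow of "update profiles continuously, then scan only the updated ones" can be sketched with a small in-memory store keyed by (domain, source) tuples. Everything in this example is hypothetical: the class name ProfileStore, its methods, and especially the update function, which is a simple moving average standing in for the real encoder.

```python
class ProfileStore:
    """Toy in-memory profile store keyed by (domain, source)."""

    def __init__(self, update_fn, initial):
        self._profiles = {}
        self._dirty = set()          # keys changed since the last scan
        self._update_fn = update_fn
        self._initial = initial

    def ingest(self, domain, source, count):
        """Fold a new request count into the profile for this key."""
        key = (domain, source)
        previous = self._profiles.get(key, self._initial)
        self._profiles[key] = self._update_fn(previous, count)
        self._dirty.add(key)

    def drain_updated(self):
        """Return profiles changed since the last call, then reset the set."""
        updated = {key: self._profiles[key] for key in self._dirty}
        self._dirty.clear()
        return updated

def moving_average(previous, count):
    # Stand-in for the encoder update; NOT the autoencoder from the article.
    return 0.5 * previous + 0.5 * count

store = ProfileStore(update_fn=moving_average, initial=0.0)
store.ingest("a.example", "10.0.0.1", 4)
store.ingest("a.example", "10.0.0.1", 4)
first_scan = store.drain_updated()
second_scan = store.drain_updated()
```

Tracking which keys changed lets a classifier examine only fresh profiles on each pass instead of rescanning the whole store.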

The detected malicious domain names are delivered to the Next-Generation Firewall through the cloud-delivered security services so that the firewall can block any further communication to these domains as soon as possible.

Figure 11 presents the detection performance of our detector. In May 2024, our detector captured 170 emerging malicious domains. All signatures from the detector blocked an average of 374,000 malicious DNS requests in our customers' networks every day during this period.

Palo Alto Networks Mitigations

Palo Alto Networks continuously monitors the traffic from the Advanced DNS Security customers to detect and block emerging cyberthreats as soon as they present suspicious network activities. The detected malicious domains are also shared with Advanced URL Filtering.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

  • North America Toll-Free: 866.486.4842 (866.4.UNIT42)
  • EMEA: +31.20.299.3130
  • APAC: +65.6983.8730
  • Japan: +81.50.1790.0200

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.

Indicators of Compromise

Below is a list of domains and the URLs discussed in this article.

  • run[.]sh
  • biillpi[.]com
  • robotatten[.]com
  • pococo[.]cc
  • comcadt[.]net
  • carollewis[.]network
  • cqk1rt8hubcc73f3775g.networkcyclechain[.]com/01