ICANN - Internet Corporation for Assigned Names and Numbers

10/10/2024 | News release | Distributed by Public on 10/10/2024 09:54

Updates on ICANN’s Domain Abuse Activity Monthly Reports

Carlos Hernandez Ganan, Siôn Lloyd, Samaneh Tajalizadehkhoob

Since February 2024, ICANN's Security, Stability, and Resiliency (SSR) research team has noted a significant rise in reports within the Domain Abuse Activity Reporting (DAAR) system. In response, we paused DAAR monthly report publications and started a thorough investigation to better understand the surge, as explained in our earlier announcement. It took some time, but the SSR team now has a clearer understanding of the factors that influenced the underlying data. In this blog post, we share some of the lessons learned throughout this process.

Understanding the Surge

A brief reminder of how the DAAR monthly reports are created: The DAAR system is run by our DAAR contractor, CyberToolBelt. The system takes as inputs the list of all domains from TLD zones and lists of "reported abuse domains" from multiple Reputation Block Lists (RBLs). Once processed, it provides an aggregate monthly output file that the SSR team then uses to create the DAAR monthly report. All the steps in this system, from data collection to processing to creating DAAR reports, are automated; however, the SSR team manually reviews the monthly DAAR reports before publication to detect possible anomalies. To understand the surge observed in February 2024, the SSR team analyzed several potential factors for the increase in suspect domain reports, considering:

  1. An increase in the total number of reported domains.
  2. Changes in data collection methods.
  3. Changes in data attribution.
  4. Modifications in the collection infrastructure.

These factors could originate from either our DAAR contractor, CyberToolBelt, or RBL providers of DAAR: SURBL, Spamhaus, Anti-Phishing Working Group (APWG), PhishTank, Malware Patrol or Abuse.ch.
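
As a rough illustration of the aggregation step in the reminder above, the sketch below counts zone domains and RBL-listed domains per TLD. It is not the actual DAAR or CyberToolBelt logic: the function names, the in-memory inputs, and the choices to count a domain once even if several feeds report it and to ignore reported domains that are absent from the zones are all assumptions made for the example.

```python
from collections import Counter


def tld_of(domain):
    """Return the last label of a domain name, lower-cased."""
    return domain.rstrip(".").lower().rsplit(".", 1)[-1]


def aggregate_by_tld(zone_domains, rbl_feeds):
    """Count zone domains and RBL-listed domains per TLD.

    zone_domains: iterable of domain names taken from TLD zone files.
    rbl_feeds: dict mapping a (hypothetical) feed name to reported domains.
    """
    zone = {d.rstrip(".").lower() for d in zone_domains}

    # Assumption: a domain reported by several feeds is counted once.
    reported = set()
    for domains in rbl_feeds.values():
        reported.update(d.rstrip(".").lower() for d in domains)

    # Assumption: only reported domains present in the zones are counted.
    listed = reported & zone

    totals = Counter(tld_of(d) for d in zone)
    abused = Counter(tld_of(d) for d in listed)
    return {tld: {"domains": totals[tld], "listed": abused[tld]} for tld in totals}
```

With this shape, a change in any single input (a larger feed, a different normalization, a new attribution rule) shows up directly in the per-TLD counts, which is why each of the four factors above had to be checked.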

To uncover the cause of the surge, we engaged with CyberToolBelt and RBL providers to discuss any observed patterns or policy changes. Our findings revealed that the increase resulted from multiple factors:

  • A natural rise in the number of reported domains.
  • A policy change that resulted in a technical pipeline issue. One of our RBL providers, Spamhaus, identified a group of domains being reused for malicious Domain Name System (DNS) servers, a trend also observed by other feed providers. To prevent malicious use of these domains, Spamhaus decided to extend their retention time, resulting in a significant increase in the cumulative count of abused domains on their list. This policy change occurred in late 2023. The technical implementation of this policy change, combined with how CyberToolBelt set up their collection method and pipeline, created an unexpected mismatch. Consequently, CyberToolBelt received all the new "retained" domains in one batch in February 2024, instead of the gradual increase that actually began in late 2023.
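
The effect of a longer retention time can be illustrated with a simplified model. The sketch below assumes a constant number of new entries per day, a list that was already in steady state before the change, and a collector that does not see the retained entries until a later catch-up day. All numbers and function names are hypothetical, and the actual mechanism of the mismatch is not described in detail here.

```python
def provider_count(day, change_day, daily_new, old_retention, new_retention):
    """Entries on the provider's list at the end of `day`.

    Assumes `daily_new` entries are added every day, the list was in
    steady state before `change_day`, and entries still listed on
    `change_day` are kept for `new_retention` days instead of
    `old_retention` days.
    """
    if day < change_day:
        return daily_new * old_retention
    return daily_new * min(day - change_day + old_retention, new_retention)


def collector_count(day, change_day, catch_up_day, daily_new,
                    old_retention, new_retention):
    """Entries seen by a collector that misses retained entries until `catch_up_day`."""
    if day < catch_up_day:
        return daily_new * old_retention
    return provider_count(day, change_day, daily_new, old_retention, new_retention)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the pattern visible.
    args = dict(change_day=100, daily_new=100, old_retention=30, new_retention=90)
    for day in (99, 130, 159, 160):
        print(day,
              provider_count(day, **args),
              collector_count(day, catch_up_day=160, **args))
```

In this model the provider's list grows by at most `daily_new` entries per day after the change, while the collector's view stays flat and then jumps in a single step on the catch-up day, which matches the general shape of a gradual increase arriving as one batch.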

Key Takeaways

From our discussions with RBL providers and our DAAR contractor, we gained valuable insights relevant to our community, especially those measuring DNS abuse.

Interconnected Data Dependencies: Anyone experienced with large-scale data pipelines knows that such issues are common. Data pipelines often face a domino effect; a change in one component can impact the entire system. Fortunately, we did not have one RBL influencing others, and we detected the unusual pattern early, preventing further confusion.

Understanding RBL Use Case: RBL data providers collect data at scale. As noted in our OCTO document (OCTO-37), clients use this data for various purposes, such as whitelisting, spam filtering, block listing, and occasionally for abuse metrics. Each client focuses on its own specific needs and, unlike ICANN, often overlooks the broader ecosystem. Most clients neither notice nor care about changes in a provider's collection or retention policies, so RBL providers lack the awareness and incentive to inform us about such changes. For us, even minor changes are significant because we study the DNS abuse landscape over time. While we clarify to providers how our use case differs from those of other clients, it remains our responsibility to use their data properly.

Data Granularity Challenges: Given the reasons above and the vast amount of data generated daily, RBL providers lack the capacity to store long-term, granular data. Some update their feeds every 15 minutes, which complicates fine-grained historical investigations. To address this, we needed to cross-reference three independent data sets: ICANN's internal RBL data, CyberToolBelt's data, and the RBL providers' own data.
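
A cross-reference of this kind can be sketched as a simple set comparison. The example below is a minimal illustration, assuming each data set has already been reduced to a list of domain names for the same point in time; the source labels and function names are hypothetical and do not reflect the tooling actually used.

```python
def cross_reference(sources):
    """Map each domain to the sorted list of sources that contain it.

    sources: dict mapping a source label to an iterable of domain names.
    """
    seen = {}
    for name, domains in sources.items():
        for domain in set(domains):
            seen.setdefault(domain.lower(), set()).add(name)
    return {domain: sorted(names) for domain, names in seen.items()}


def disagreements(sources):
    """Return only the domains that are missing from at least one source."""
    return {
        domain: names
        for domain, names in cross_reference(sources).items()
        if len(names) < len(sources)
    }
```

Domains returned by `disagreements` are the ones worth inspecting first, since they indicate where one copy of the data diverges from the others.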

Recommendations for Future Practices

For those using RBLs, consider the following:

  • Implement Alerts and Logs: Use alerts to detect unusual increases in file sizes, retaining files that triggered alerts for future investigation.
  • Monitor Metrics: Track metrics like the average lifetime of entries in RBLs to detect potential changes.
  • Adapt to Change: Be prepared for changes in processing pipelines, as what works today may not suffice tomorrow.
  • Embrace DataOps Best Practices: Automate data pipelines, ensure data quality, foster collaboration, and maintain version control to streamline operations and reduce errors.
  • Utilize Monitoring Tools: Deploy tools such as integrate.io or Fivetran for real-time data flow tracking and anomaly detection.
  • Conduct Audits: Regularly audit and stress-test data collection and processing pipelines to identify bottlenecks.
  • Maintain Communication: Keep open lines of communication with data providers to stay informed about infrastructure changes.
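
The first two recommendations can be sketched in a few lines. The example below is one possible approach, not a description of ICANN's or CyberToolBelt's tooling: the median-based threshold, the default factor of 2.0, and the function names are assumptions chosen for illustration.

```python
from datetime import date
from statistics import median


def size_alert(history, new_size, factor=2.0):
    """Return True if `new_size` exceeds `factor` times the median of `history`.

    history: sizes (in bytes or entries) of previously collected feed files.
    """
    if not history:
        return False  # Nothing to compare against yet.
    return new_size > factor * median(history)


def average_lifetime(first_seen, last_seen):
    """Average number of days that entries stayed on a list.

    first_seen / last_seen: dicts mapping a domain to a datetime.date.
    Both the first and the last day are counted as listed days.
    """
    spans = [
        (last_seen[d] - first_seen[d]).days + 1
        for d in first_seen
        if d in last_seen
    ]
    return sum(spans) / len(spans) if spans else 0.0


if __name__ == "__main__":
    print(size_alert([100, 110, 105], 300))
    print(average_lifetime(
        {"a.example": date(2024, 1, 1)},
        {"a.example": date(2024, 1, 3)},
    ))
```

A file that triggers `size_alert` would be kept for later investigation, and a sustained shift in `average_lifetime` for a feed would be a hint that the provider's retention policy may have changed.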

DAAR Monthly Reports

For the months impacted by the recent changes, January and February 2024, we will issue updated reports while retaining previous versions on the DAAR webpage. We will continue to provide reports for the remaining months, consistent with past practices.

Moving Forward

While DAAR has fulfilled its initial purpose, it is becoming outdated. ICANN is leveraging industry insights, academic research, and community input to develop a new system that addresses broader issues beyond DNS reputational data.

We also acknowledge the efforts of other community members in advancing DNS abuse measurement and research. We encourage those interested in similar tools to explore resources like Netbeacon, Clean DNS, and DAP.live.

In conclusion, these lessons learned will inform our ongoing efforts to improve data handling and reporting, ultimately strengthening our understanding of the DNS abuse landscape.

Authors

Samaneh Tajalizadehkhoob

Director, Security, Stability and Resiliency Research

Samaneh reports to John Crain, Chief Security, Stability & Resiliency Officer, and is part of the Office of CTO (OCTO) group. She is based in ICANN's Europe Region and works remotely from the Netherlands. As the SSR Specialist, Samaneh works in close coordination with other ICANN organization functions to implement ICANN's Security, Stability and Resiliency strategies. Samaneh carries out research on DNS security and abuse. She also represents ICANN on matters relating to the SSR of the Internet's system of unique identifiers within ICANN's remit, and helps to develop technical work and positions and to produce materials related to the administration of those identifiers from an SSR perspective.

Samaneh comes from a multi-disciplinary background. While she is an Electronics Engineer by training, she studied Engineering and Policy Analysis for her master's degree. She holds a PhD in Internet Security and Data Analytics from the Delft University of Technology in the Netherlands. She worked as a post-doctoral researcher at the same university, where she did research on banking security and underground markets using advanced statistical techniques and machine learning.

She has collaborated with other research teams as a visiting scholar. At the DistriNet Research Group at KU Leuven, she worked on Internet measurements to estimate web vulnerabilities and measure patching practices of hosting servers. Additionally, she worked with scholars from the security and privacy lab at the University of Innsbruck on designing abuse metrics that can reliably measure the security performance of Internet identifiers.

Samaneh has authored publications on web security, cyber security, Internet measurements, the underground economy, and the design of security metrics using advanced statistical methods.

Samaneh speaks English, Farsi, and Dutch, and has basic knowledge of Arabic. She is a big fan of board games. In her free time, she runs and plays tennis and the piano.

Siôn Lloyd

Principal Security, Stability & Resiliency Scientist

Siôn Lloyd joined ICANN in January 2020 and currently serves the organization as a Lead Security, Stability, Resiliency Specialist.

Carlos Hernandez Ganan

Principal Security, Stability & Resiliency Scientist

Carlos Ganan joined the ICANN organization in January 2020. He currently holds the title of Lead Security, Stability & Resiliency Specialist.