12/01/2024 | Press release | Distributed by Public on 12/01/2024 21:38
Today, we are announcing support for Snowflake and Amazon Athena as new sources for AWS Clean Rooms data collaborations. AWS Clean Rooms helps you and your partners more seamlessly and securely analyze your collective datasets without sharing or copying one another's underlying data. This enhancement helps you collaborate with datasets stored in Snowflake or those queryable through Athena features, such as AWS Lake Formation permissions or AWS Glue Data Catalog views, without moving or revealing the source data.
You often need to collaborate with partners to analyze datasets to get insights for research and development, investments, or marketing and advertising campaigns. In some cases, your partners' datasets are stored or managed outside of Amazon Simple Storage Service (Amazon S3), and companies want to reduce or eliminate the complexity, cost, compliance risks, and delays that are associated with moving or copying data. Companies also find that copying data can result in them using outdated information, potentially reducing the quality of the insights gained.
This launch helps companies to collaborate on the most up-to-date collective datasets in an AWS Clean Rooms collaboration with zero extract, transform, and load (zero-ETL). This eliminates the cost and complexity associated with migrating datasets out of existing environments. For example, an advertiser with data stored in Amazon S3 and a media publisher with data stored in Snowflake can run an audience overlap analysis to determine the percentage of users present in their collective datasets without having to build ETL data pipelines, or share underlying data with one another. No underlying data from external data sources is permanently stored in AWS Clean Rooms during the collaboration process and any data temporarily read into the AWS Clean Rooms analysis environment is deleted upon query completion. You can now work with your partners regardless of where their data is stored, streamlining the process of generating insights.
Let me show you how to use this feature.
How to use multiple clouds and data sources in AWS Clean Rooms
To demonstrate this feature, I use a scenario between an advertiser, Company A, and a publisher, Company B. Company A wants to know how many of their high-value users can be reached on Company B's website before running an ad campaign. Company A stores their data in Amazon S3. Company B stores their data in Snowflake. To use AWS Clean Rooms, both parties must have their own AWS accounts.
In this demo, Company A, the advertiser, is the collaboration creator. Company A creates the AWS Clean Rooms collaboration and invites Company B, who has data hosted in Snowflake, to collaborate. You can follow the specific steps to create a collaboration in the AWS Clean Rooms general availability announcement blog post.
Next, I show how Company B, the publisher, creates a configured table in AWS Clean Rooms, specifying Snowflake as the data source and providing the Secrets Manager Amazon Resource Name (ARN). AWS Secrets Manager helps you manage, retrieve, and rotate secrets such as database credentials throughout their lifecycles. Your secret must contain the credentials for a Snowflake user with read-only permission to the data you want to collaborate with. AWS Clean Rooms will use it to read your secret and access the data stored in Snowflake. See the Secrets Manager documentation for step-by-step instructions for creating your secret.
Using Company B's AWS account, I go to the AWS Clean Rooms console and choose Tables under Configured resources. I choose Configure new table. I choose Snowflake under Third-party clouds and data sources. I enter the Secret ARN for the secret that contains Snowflake credentials for a role with read access to the dataset stored in Snowflake I want to collaborate with. These are the credentials that you use to verify the identity of the entity trying to access the Snowflake table and schema. If you don't have a secret ARN, you can create a new secret using the Store a new secret for this table option.
To define the table and schema details, I use the Import from file option and choose the Columns View Information Schema CSV file I exported from Snowflake to populate the information for me. You can also enter the information manually.
For this demo, I choose All columns under the Columns allowed in collaborations. Next, I choose Configure new table.
I go to the configured table and observe the table details, such as AWS accounts allowed to create queries and columns available for querying. On this page, I can edit the table name, description, and analysis rule.
As part of configuring a table to use in AWS Clean Rooms for collaboration analysis, I need to configure an analysis rule. An analysis rule is a privacy-enhancing control that each data owner sets up on a configured table. An analysis rule determines how the configured table can be analyzed. I choose Configure analysis rule to configure a custom analysis rule that allows custom queries to be run on the configured table.
In Step 1, I proceed with the selections. You can use JSON editor to create, paste, or import an analysis rule definition in a JSON format. I choose Next.
In Step 2, I choose Allow any queries created by specific collaborators to run without review on this table under Analyses for direct querying. With this option, only queries provided by the AWS accounts that I specify in the list of allowed accounts can be run on the table. All analysis templates created by the allowed accounts will automatically be allowed to be run on this table without requiring a review. I choose the allowed account under AWS account ID and choose Next.
In Step 3, I proceed with the selections. I choose None under Columns not allowed in output to allow all columns to be shown in the query output. I choose Not allowed under Additional analyses applied to output, so no additional analyses can be run on this table. I choose Next.
In the final step, I review the configuration and choose Configure analysis rule.
Next, I associate the table with the collaboration Company A, the advertiser, created using Associate to collaboration.
On the pop-up window, I choose a collaboration from the ones with active memberships and select Choose collaboration.
On the next page, I choose the Configured table name and enter the Name under Table associations details. I choose a method to authorize AWS Clean Rooms to give the permission to query the table. I choose Associate table.
Company A, the advertiser, and Company B, the publisher, can now run an audience overlap analysis to determine the percentage of users present in their collective datasets without accessing each other's raw data. The analysis helps determine how much of the advertiser's audience can be reached by the publisher. By evaluating the overlap, advertisers can determine whether the publisher provides unique reach or if the publisher's audience predominantly overlaps with the advertiser's existing audience, without either party having to move or share their source data. I switch to Company A's account and go to AWS Clean Rooms console. I choose the collaboration I created and run the following query to get the audience overlap analysis result:
select count (distinct emailaddress) from customer_data_example as advertiser inner join synthetic_customer_data as publisher on 'emailaddress' = 'publisher_hashed_email_address'
In this example, I used Snowflake as a data source. You can also run queries on this data using Athena while following AWS Lake Formation permissions. This helps you do row- and column-level filtering with Lake Formation fine-grained access control and transform data using AWS Glue Data Catalog views before the datasets are associated to the collaboration.
Customer and partner voices
"Data security and privacy is essential to our work at Kinective Media by United Airlines, the world's first traveler media network," said Khatidja Ajania, Director, Strategic Partnerships, Kinective Media by United Airlines. "AWS Clean Rooms support of source data in multiple clouds and AWS sources enables us to securely and seamlessly work with more brands to deliver on closed loop measurement and other key use cases. This enhancement will make it easier for us to securely deliver personalized experiences, content, and relevant offerings to millions of United travelers through privacy-enhanced collaboration with our advertisers and partners."
"Snowflake recognizes the challenges of source data interoperability across tech stacks when using data clean room technology; we are excited to see the progress and one more step taken in the direction of a shared goal to empower users to unlock the full potential of their data partnerships through their solution of choice, safely and effectively" - Kamakshi Sivaramakrishnan, General Manager, Snowflake Data Clean Rooms
Now available
Support for Snowflake and Athena as data sources in AWS Clean Rooms offers significant benefits for cross-cloud collaboration. This launch eliminates the need for data movement across clouds and data sources and simplifies the collaboration process. This is a first step in our efforts to expand the ways in which customers can securely collaborate with any of their partners while protecting sensitive information, regardless of where their data is stored.
Get started with AWS Clean Rooms today. To learn more about collaborating with multiple data sources, visit the AWS Clean Rooms documentation.
- Esra