AI's Role in Protecting Consumer Data

Updated: July 22, 2024

Published: July 18, 2024

Two years back, I was wowed reading this Guardian article written by AI. Today, I can't imagine a workday without using ChatGPT.

AI is a productivity genius, and it's only going to get better. But it's not without its flaws.

AI's overnight success has left privacy checks and data protection laws scrambling to catch up. That's not good news if you're a company that deals with sensitive user data.

Deploying off-the-shelf AI chatbots without guardrails might harm your business. Just ask Air Canada!

While AI is great at simplifying and expediting processes, it can also "create" undesirable content based on personal data.

So what can you do as a business to protect consumer data and use AI responsibly? Let's get into that.

The Privacy Pitfalls of AI

Most of us pay little attention to what we feed the generative AI tools that help us in so many ways.

From product development cycles and code reviews to marketing and hiring strategies, a lot of first-party data gets shared with third-party AI tools.

Since both the laws and our understanding of how AI works are still limited, we're wandering into uncharted territory.

How do mainstream AI tools work?

The tools we call AI are essentially large language models (LLMs) that excel at pattern recognition, imitation, and prediction. LLMs need large datasets (hence the name) to read and learn from before they can effectively reply to user prompts.

When you ask ChatGPT to write a marketing strategy for Gen Z, it doesn't understand the context. But after reading thousands of content pieces on similar topics, it knows enough to predict the most appropriate answer.
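To make "prediction" concrete, here's a deliberately tiny Python sketch. It's nothing like a real LLM, just a word-frequency toy, but it shows how pattern counting alone can produce a plausible next word without any understanding.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the huge datasets real LLMs train on.
corpus = ("gen z loves short videos . gen z loves authenticity . "
          "gen z skips long ads .").split()

# Count which word tends to follow which (a crude stand-in for pattern learning).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the follower seen most often in the training data."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("gen"))    # -> "z": no understanding, just statistics
print(predict_next("loves"))  # -> "short" (ties go to the first word counted)
```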

The same thing happens when you ask DALL-E to draw a picture of a cowboy astronaut.

The point of contention is the large datasets these AI tools need to train on. Quality and relevance matter a lot here, and companies are ready to pay top dollar for training data.

AI companies use public and licensed datasets to train their products. These datasets consist of years of social media posts, forum discussions, blog posts, and even all of Wikipedia. For example, GPT-3 was trained on roughly 45 terabytes of text data.

User Data Security Risks

If something is online and publicly available, chances are it's used to train several AI tools today. Here are some reasons why privacy becomes a concern:

Lack of Transparency

We don't know how and where these datasets are stored, what measures AI companies take to protect sensitive info, or how well users are protected against cyberattacks.

Sure, OpenAI takes the lead with extensive documentation, but AI companies aren't yet required by regulation to provide that transparency.

On top of that, we don't know how the AI tools create an answer. Since generative AI often hallucinates or makes up facts, it can create legal challenges for businesses.

Lack of Accountability

There's great confusion over who's responsible for generative AI's mistakes. In Air Canada's case, the airline tried to argue that its chatbot was a separate legal entity, but it failed to convince the tribunal.

Unless you're on the ChatGPT Enterprise plan, OpenAI uses your chat history to train its models by default. You can opt out, but the setting is buried deep in your account preferences.

Now, picture this scenario: A customer service rep wants to send a follow-up email to a client recapping their conversation and outlining their requests.

They open their personal OpenAI account to ask ChatGPT to write a draft, including personally identifiable information (PII) to produce a more detailed output.

That interaction then becomes part of OpenAI's training model, including the client's personal data.
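One practical guardrail, sketched below under the assumption that you're calling a third-party model from your own code, is to scrub obvious PII from prompts before they ever leave your systems. The regex patterns and placeholder tags here are illustrative only; real redaction needs broader coverage (names, addresses, account numbers) or a dedicated DLP tool.

```python
import re

# Illustrative patterns only; production PII detection needs far more coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected PII with placeholder tags before the prompt leaves your systems."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

draft_request = (
    "Write a follow-up email to Jane at jane.doe@example.com (555-867-5309) "
    "recapping her refund request."
)
print(redact(draft_request))
# -> "Write a follow-up email to Jane at [EMAIL] ([PHONE]) recapping her refund request."
```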

Many companies have yet to set guardrails on how their employees can (and should) leverage AI tools.

This lack of accountability for data protection can lead to incidents of synthetic identity theft and breaches in customer data.

Lack of Well-Defined Policies

At the end of the day, it all boils down to the regulatory framework.

Unfortunately, authorities seem as clueless as users about AI. We've yet to see established policies that hold LLM and AI companies liable for data collection and storage.

Companies and government agencies are working together to create policies for safe and meaningful AI usage but GDPR-like implementation is still far away.

Steps to Protect Consumer Privacy

If you're a business using AI to its full potential, you need to make concerted efforts to safeguard consumer data.

Here are a few ways you can protect privacy:

Communicate with users and allow them to opt out.

If you're using an AI chatbot to serve customers, make sure you include a disclaimer that explains how customer data may be processed and how customers can opt out of that processing.

When customers know what they're using and the implications, they will be less likely to be surprised by AI hallucinations.
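As a rough sketch (the field names and storage here are hypothetical), an opt-out can be enforced in code by checking a stored consent flag before any chat transcript is retained for AI training or analytics:

```python
from dataclasses import dataclass

@dataclass
class CustomerPreferences:
    customer_id: str
    ai_processing_opt_out: bool = False  # set from the disclaimer / preference center

def handle_chat_turn(prefs: CustomerPreferences, message: str, reply: str, store: list) -> None:
    """Only retain transcripts for analytics/training when the customer hasn't opted out."""
    if prefs.ai_processing_opt_out:
        return  # serve the reply, but keep nothing for downstream AI use
    store.append({"customer": prefs.customer_id, "message": message, "reply": reply})

transcript_log: list = []
prefs = CustomerPreferences(customer_id="cus_123", ai_processing_opt_out=True)
handle_chat_turn(prefs, "Where is my order?", "It ships tomorrow.", transcript_log)
print(transcript_log)  # -> [] because the customer opted out
```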

Providing information to your customers on what to do if their identity is stolen could be an important part of any communication strategy.

Air Canada's case is a good example of how transparency could have helped the company avoid the lawsuit.

Establish a privacy-first design.

This is an extension of the first step. If you have a company-wide privacy-first design, you're more likely to protect user data from leaking.

Privacy-first design puts user data and privacy at the forefront of security and implements steps to comply with regulations and earn consumer trust. It includes proactive maintenance, end-to-end security, transparent documentation and communication, and respect for consumer data.

This privacy-first design, combined with proper AI infrastructure, ensures comprehensive protection and trust in handling user data.

Improve dataset quality.

Since data governance is a major part of privacy-first design, you can try using zero-party and first-party data to train generative AI.

Meta's Llama models are built for custom training to fit unique use cases, and ChatGPT offers plugins and custom models that work with datasets supplied by users. If you can feed the AI plenty of internal data, it can serve your employees and customers better.

Custom datasets are going to be the norm going forward as more publications block GPT crawlers and decide to train their own LLMs. For example, Bloomberg released BloombergGPT last year, based on years of proprietary financial reports.

Authentic data is only the first step. You have to actively weed out algorithmic biases to reduce user discrimination and maintain data hygiene to reduce the attack surface.
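As a simplified illustration of that hygiene step, the sketch below deduplicates first-party records and drops anything that still contains obvious PII before it reaches a fine-tuning set. A production pipeline would use far more thorough checks.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # stand-in for a fuller PII detector

def clean_training_records(records: list[str]) -> list[str]:
    """Deduplicate and drop records that still contain obvious PII before fine-tuning."""
    seen, cleaned = set(), []
    for text in records:
        normalized = " ".join(text.split()).lower()
        if normalized in seen:      # basic data hygiene: no duplicates
            continue
        if EMAIL.search(text):      # keep PII out of the training set entirely
            continue
        seen.add(normalized)
        cleaned.append(text)
    return cleaned

raw = [
    "How do I reset my password?",
    "How do I reset my password?",             # duplicate
    "Refund for order 1042, contact a@b.com",  # contains PII
]
print(clean_training_records(raw))  # -> ['How do I reset my password?']
```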

Educate employees.

Employees who are aware of the pitfalls of data-hungry AI tools are more likely to keep data safe. But they're also an easy gateway to troves of AI data if you're not careful.

Ironically, AI-powered phishing attempts, deepfakes, and CEO fraud have given rise to a new breed of cybercrime that is precise, effective, and hard to detect.

Employees need to be resistant to identity theft and social engineering attacks to protect not only themselves but also customer data.

Train your employees regarding public data sharing, AI tool management, and compliance. This includes not uploading random pictures to get those cinematic portraits, reading the privacy policies before inputting data, and clearing chat history frequently.

At the workplace, encourage them to use enterprise versions only and to tweak settings to keep data secure, and conduct frequent seminars to keep them up to date with industry developments.

Follow global data protection laws.

It's a good idea to stick to established data privacy laws even if you're not compelled to follow them. Policies like GDPR, CASL, HIPAA, and CCPA provide a good framework for data collection, consent, and the right to be forgotten.

Even though the laws don't have fleshed-out clauses for AI regulations yet, the overarching ideas are similar and can be implemented in any organization that values privacy.

AI as a privacy protector?

Enough of the fearmongering: AI's impact is vast and can be felt on both sides. But it can also lead the way in data protection.

Rethinking AI Training Datasets

Raw training data can lead to large-scale data breaches, which is not good news for consumers. However, cutting AI models off from up-to-date datasets isn't the solution.

The middle ground here is federated learning and additive secret sharing (ASS).

Federated learning decentralizes training by letting the central model learn from local models that stay on each user's device, so only model updates, not raw data, are shared. This reduces data dependency and preserves local datasets.
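Here's a minimal numpy sketch of the core idea (federated averaging on a toy linear model): each "device" trains on data that never leaves it, and only the model weights are sent back and averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One gradient step of linear regression on data that never leaves the device."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# Three "devices", each holding private local data.
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
global_weights = np.zeros(3)

for _ in range(50):  # communication rounds
    # Each device sends back only its updated weights, never its dataset.
    local_weights = [local_update(global_weights, X, y) for X, y in devices]
    global_weights = np.mean(local_weights, axis=0)  # federated averaging

print(global_weights)
```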

ASS takes it further by splitting sensitive values into randomized shares, so the central server only ever works with pieces that reveal nothing on their own and pushes protected results back to the devices. ASS essentially protects data in transit and in use.
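A toy sketch of the additive part: a value is split into random shares modulo a large prime, any single share reveals nothing, and the server only ever reconstructs the aggregate.

```python
import random

PRIME = 2**61 - 1  # all arithmetic happens modulo a large prime

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares that sum back to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Two users secret-share their values; no single share reveals either value.
alice_shares = share(42, 3)
bob_shares = share(100, 3)

# Each party adds the shares it holds; only the *sum* is ever reconstructed.
summed_shares = [(a + b) % PRIME for a, b in zip(alice_shares, bob_shares)]
print(reconstruct(summed_shares))  # -> 142
```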

Next up is the differential privacy framework, which is essential for large-scale statistical analysis. If you have a bulk of PII-attached data, differential privacy adds "noise," small random perturbations, so AI models can learn from the aggregate without accessing the sensitive parts.

It shares only the necessary statistics with AI tools while keeping the PII protected. When you use AI to build reports and dossiers, differential privacy can keep your data from leaking.
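Here's a minimal sketch of the classic Laplace mechanism behind many differential privacy tools: noise scaled to the query's sensitivity and a privacy budget (epsilon) is added to an aggregate before it's shared.

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_count(values: list[bool], epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise; one person joining or leaving shifts it by at most 1."""
    true_count = sum(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

opted_in = [True] * 830 + [False] * 170  # e.g., customers who clicked an AI-generated offer
print(dp_count(opted_in))                # close to 830, but never the exact figure
```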

Additionally, generative AI can create synthetic data that mimics real-world numbers without actually using them. It anonymizes the training data so the actual data stays protected.
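As a toy illustration (real synthetic-data generators are far more sophisticated), you can fit simple summary statistics on a sensitive column and sample brand-new values from them instead of exposing the originals:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for real, PII-linked order values.
real_order_values = rng.lognormal(mean=3.5, sigma=0.4, size=1_000)

# Model only the aggregate shape of the data, then sample brand-new records from it.
log_mean, log_std = np.log(real_order_values).mean(), np.log(real_order_values).std()
synthetic_order_values = rng.lognormal(mean=log_mean, sigma=log_std, size=1_000)

print(real_order_values.mean().round(2), synthetic_order_values.mean().round(2))
# Similar statistics, but no synthetic record corresponds to any real customer.
```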

AI-Powered Cybersecurity

According to Google Cloud's cybersecurity forecast, generative AI will contribute to sophisticated cyberattacks in 2024. The best way to tackle bad AI is with good AI.

Generative adversarial networks (GANs) are used to analyze existing cybercrime patterns and develop plausible attacks based on historical data.

It's more accurate than it sounds because GANs split training into two parts: a generator that creates plausible cybercrime scenarios and a discriminator that evaluates each result and decides whether it could be real.

Once the discriminator can no longer reliably tell generated scenarios from real ones, those scenarios are realistic enough to inform proactive steps against future cybercrime.

As GANs keep developing based on extensive data, they'll be able to detect the slightest of anomalies and help security teams prevent cyberattacks from ever happening.
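The sketch below shows that generator-versus-discriminator loop in compressed form using PyTorch on made-up numeric "attack telemetry"; a real system would train on rich security logs with far more careful feature engineering.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for historical attack telemetry: 4 numeric features per event.
real_attacks = torch.randn(256, 4) * 0.5 + torch.tensor([1.0, -2.0, 0.5, 3.0])

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
discriminator = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2_000):
    noise = torch.randn(256, 8)
    fake_attacks = generator(noise)

    # Discriminator: learn to tell real attack patterns from generated ones.
    d_loss = (loss_fn(discriminator(real_attacks), torch.ones(256, 1)) +
              loss_fn(discriminator(fake_attacks.detach()), torch.zeros(256, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: learn to produce scenarios the discriminator finds plausible.
    g_loss = loss_fn(discriminator(fake_attacks), torch.ones(256, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# Generated scenarios can now seed red-team exercises or anomaly-detection tests.
print(generator(torch.randn(3, 8)).detach())
```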

Companies like Aura, Darktrace, IBM, and SentinelOne already use AI to stay ahead of criminals, monitor for fraud with three-bureau credit monitoring, and protect system integrity.

Going forward, we expect mobile device management (MDM) policies to extend to bring your own AI (BYOAI) as more employees will use democratized custom models.

Banning BYOAI outright may not work, as employees will chase efficiency gains without informing the company, leading to potential data leaks.

In such cases, businesses should work with employees to build and deploy responsible AI models and encourage them to operate under MDM.

What is AI's role in a cookie-less world?

In a more recent development, AI seems to be the marketing champion of the post-cookie world. Remember the annoying popups on websites that ask you to accept cookies? Google is phasing out the third-party cookies behind them in favor of something more sophisticated.

Cookies track users across the internet to help marketers serve better ads to them. Depending on execution, this could be invasive to online privacy.

Google's alternative is the Privacy Sandbox, which attaches three broad interest topics to a user and shares them with advertisers to serve ads. This drastically reduces ad targeting precision, but AI can help!

Since marketers can't get specific behavioral parameters anymore, they can use AI to stay precise and effective. With synthetic data and differential privacy frameworks, marketers can A/B test strategies, feed research, and extract insights from data dumps, all with the help of AI.

Consumer Data and Their Locations

User privacy in the age of AI is a hotly debated topic simply because we don't have enough data to act on it. AI regulations will have far-reaching consequences for businesses so authorities can't afford to miss the mark.

Several policies are being created and amended to address the rapidly evolving AI implications. Here are a few of them you should follow and keep your eyes on:

  • CPRA: The California Privacy Rights Act (CPRA) is the evolved version of the original CCPA. It borrows heavily from GDPR, requiring companies that handle California residents' data to communicate their data-sharing and storage policies. Consumers must also be allowed to opt out should they choose to.
  • Delete Act: Several states, including California, Vermont, and Texas, require data brokers who collect and sell user data to register with the state. California's Delete Act goes a step further and allows users to request that data brokers permanently delete their data.
  • Partnership on AI: The PAI coalition focuses on the safe and responsible development of AI models, making sure they protect user data and keep pace with evolving security threats. It's a growing coalition with extensive resources for companies that want more control over their AI models.

So, what's the takeaway?

For every AI-manipulated robocall or deepfake, there's an incredible story of productivity hacks, strategy building, and personal development.

It's true that AI has ushered in a new age of privacy problems we're ill-equipped to handle right this moment, but practices will develop and crystallize into regulations.

Until then, we should look beyond the charm of the word "AI" and see it for what it is today: a sophisticated pattern recognition and generation tool built on large sets of user data.

If we take accountability for data protection and follow the fundamental privacy principles laid out in GDPR and the CPRA, we'll be able to keep consumer data protected.
