
How Localization Teams are Leveraging AI For Faster Turnarounds

Published: September 10, 2024

Welcome to 'The AI Edge,' a series exploring how professionals across industries are using AI at work.

Today, we've got a Q&A style interview with HubSpot's Dierk Runne, a senior manager who leads the product localization team and oversees their AI efforts.

I wanted to find out if LLMs are making localization easier or if legacy systems (like neural machine translation) are still more reliable.

On y va! (Let's go!)

Tell me what you do and your role at HubSpot.

The localization team at HubSpot is basically an internal service provider within the company for any team, stakeholder, or department with a translation/localization need.

Not sure if we want to get into the difference between the two here.

Tell us, what's the difference?

My go-to example is this: Translating a British car to an American road would mean simply moving it. Localizing it would mean putting the steering wheel on the other side because you drive on the other side of the road in the US.

Localization means more adaptation to the local market and more editorial freedom.

Got it! Now back to your role and team.

We work a lot with marketing, which has a fairly content-heavy playbook, and we support our knowledge base, which is available in multiple languages.

We have roughly 30 people on the team, including trained linguists and translators, but only for the five core languages (other than English). They help us operate in all the languages in which we have a website.

Beyond that, we work a lot with translation agencies that we outsource to, but we take care of the coordination, the budget management, and the communications management with those external resources.

I'm the manager for product localization, so in-app software localization falls under my team. I'm also the tech guy on the localization team, meaning when it comes to questions about how we use AI, you've come to the right person.

That's always good to hear. Where does AI fit into the work you do, if at all?

In the current conversation, when people say AI, they mostly mean generative AI. In translation, the use of AI goes back a lot further.

In fact, the technology that current LLMs are built on - the transformer architecture - was initially developed by Google for machine translation purposes. Google Translate, for example, switched to neural machine translation in roughly 2016 and has since moved to transformer-based models.

It's the same underlying technology; what changed with generative AI is mostly scale.

My team has been using machine translation, typically referred to as neural machine translation (NMT), meaning it's based on a neural network architecture, usually a transformer.

We've been using that at scale since 2019, and the biggest use case we have is HubSpot's knowledge base (KB).

The knowledge base has hundreds, if not thousands, of articles that get updated extremely frequently because, obviously, our product updates all the time.

One of the most important criteria for the translated KB is that the time to translation is as short as possible. The idea is that if an English article gets updated, it shouldn't take weeks until the translated versions are updated; the information needs to be there ASAP.

That's a very standard use case for machine translation. Could it be done with human effort? Sure, but we would need a lot more people to take care of the sheer volume and frequency of updates that we're seeing there.

But anything that is high visibility, we typically want to have a set of human eyes on the output from the machine translation.

With the KB, the use case is somewhere in between - we tend to review the top 20% of articles in terms of view count.

Then there's stuff like user-generated content, like comments and forums. It's for the convenience of the user, and typically what should go along with that is a disclaimer that says, "Hey, this was generated via machine translation, and it might not have been proofed by a human."

That transparency piece is very important, to let people know about that upfront and not try to hide it in any way.

We are starting to use more AI translation for marketing content, and that will always have a layer of human review.

We can generate a blog post using ChatGPT or any other LLM, but would you feel OK with just publishing it, sight unseen? Probably not. The same is true for machine translation, though there are use cases where we are OK with publishing the raw output.

As a follow-up to that: you said a lot of the work is done by machine translation, and more recently you're doing more marketing content. How much are you relying on machine translation versus LLMs?

For the moment, the vast majority of it is still the traditional machine translation system.

Machine translation engines are highly specific tools, whereas LLMs are generalist tools.

You can ask an LLM anything and you will get an answer. Whether or not it's a quality answer is beside the point, but you can't ask a machine translation engine to draw you a picture, right? That doesn't work.

Machine translation is a single-purpose tool, and because of that, these engines still tend to have the edge over large language models when it comes to translation quality, and especially translation speed.

That being said, LLMs are getting there in terms of quality. There have been some surprising advancements, and it is something that we are keeping a close eye on.

We have started experimenting a bit with LLMs for error detection in translation, sort of as a second layer, because LLMs have better semantic understanding.

Basically, you get an output from a machine translation engine and then you let an LLM determine if it's a good translation. Is this accurate, is this fluent, is this using the right terminology?
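
To make that concrete, here's a minimal sketch of what such a check could look like - assuming the OpenAI Python client and made-up example strings, not HubSpot's actual pipeline:

```python
# A minimal sketch of LLM-based translation QA - assumes the OpenAI Python
# client (pip install openai) and an OPENAI_API_KEY in the environment.
# Model name, prompt wording, and example strings are illustrative only.
from openai import OpenAI

client = OpenAI()

def check_translation(source: str, translation: str, target_lang: str) -> str:
    """Ask an LLM whether an MT output has an error (accuracy, fluency, terminology)."""
    prompt = (
        f"You are reviewing a machine translation into {target_lang}.\n"
        f"Source (English): {source}\n"
        f"Translation: {translation}\n"
        "Is there an error in this translation (accuracy, fluency, or terminology)? "
        "Answer YES or NO, then give a one-sentence reason."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(check_translation(
    "Connect your inbox to HubSpot.",
    "Verbinden Sie Ihren Posteingang mit HubSpot.",
    "German",
))
```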

Tell me more about the experiments you're running with LLM-based error detection. Is there a specific model you're using that you can share?

We have been working with an external service provider recently on running an experiment where we tested a whole bunch of different systems, including OpenAI. That was mostly a proof of concept.

Right now, we're mostly looking for ways to introduce these systems into our existing workflows because so far, everything has been kind of a dedicated project where we have to extract a bunch of content to put in front of the LLM. So it's outside of the typical process.

The next big thing is going to be: how can we insert these extra steps or extra checks into the process in a way that doesn't disrupt it, but that provides added value within the process?

Generative AI is mostly looked at as an ancillary tool that can help improve translations even more.

Have you seen any early results in terms of which AI model was better?

In our testing, we got the best results with GPT-4 at the time - 4o hadn't been released at that point. We tried two approaches. The first was a very simple prompt: "Is there an error in this translation: YES or NO."

We got it to just above 80% accuracy across all of the language pairs that we have, which might not sound like much, but it is actually pretty good.

The second step we tried was to get the LLMs to classify the error. There, we saw the accuracy go way down in some languages.

In the multilingual space, these models show - and the same is true for translation engines - vast differences in terms of output quality and capabilities between languages.

It's all an effect of training data because in some languages, there is just not a whole lot of training data available.

I know your team is still thinking through how you want to integrate LLMs into your workflow, but I'm curious, what does that process look like in an ideal world?

My ideal scenario would be defining certain content elements and assets in the HubSpot ecosystem that we would watch via an LLM. The LLM would flag things for us, like, "Hey, someone should really be looking at this."

I think that's a pretty cool use case. In our case specifically, we can also use this to rework our existing translation database.

When it comes to translation, there's a database system called translation memory, which basically means everything that we've ever translated is stored in a gigantic database. If we ever have to translate this content again, we can reuse our existing translations.

It's faster and cheaper, and it provides a lot more consistency than re-translating everything from scratch every single time. And this database, too, is a good candidate for running an LLM over it to look for improvements and errors.
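
As a rough illustration of the concept (not HubSpot's actual tooling), a translation memory is essentially a lookup table keyed by source segment and target language - something like this toy version:

```python
# Toy translation memory: reuse previously translated segments on exact match.
# Real TM systems also do fuzzy matching and store metadata; this is a sketch.
from typing import Optional

class TranslationMemory:
    def __init__(self) -> None:
        # (source_segment, target_lang) -> stored translation
        self._store: dict[tuple[str, str], str] = {}

    def add(self, source: str, target_lang: str, translation: str) -> None:
        self._store[(source, target_lang)] = translation

    def lookup(self, source: str, target_lang: str) -> Optional[str]:
        # Exact match only; a real system would also return fuzzy matches
        # with a similarity score so a linguist can adapt them.
        return self._store.get((source, target_lang))

tm = TranslationMemory()
tm.add("Create a new deal.", "fr", "Créez une nouvelle transaction.")
print(tm.lookup("Create a new deal.", "fr"))    # hit: reuse the stored translation
print(tm.lookup("Create a new ticket.", "fr"))  # None: send to MT or a translator
```

An error-detection pass like the earlier LLM sketch could then be run over each stored entry to surface candidates for cleanup.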

That's very interesting. Any other ideal use cases?

So far, if we translate content, the French page will look pretty much exactly like the English page, except everything is obviously in French. But one thing that we haven't really done a lot of is repurposing and restructuring content.

LLMs are very good at summarization and paraphrasing - this is something we might want to experiment with going forward.

That could be a pretty cool way of taking more advantage of our English content. We have a pretty massive content creation engine on the English-speaking side.

Not just marketing, but sales enablement. We have tons of content-creating teams, and in the other markets, there are typically a lot fewer people working on something similar.

This could be a nice middle ground where we can more easily digest the amount of content on the English side and get it into other languages, but in a different format.

What would you say is the biggest roadblock?

From a technical implementation perspective, I don't think that there are any big roadblocks.

We can basically flip that switch and use GPT-4o for translation purposes pretty much any time we want. The big issue - and this is not just with generative AI, but it is typically a bigger issue with generative AI - is terminology. This is the number one issue in any automated translation process.

If you have something like HubSpot's Service Hub, for example, you want that term rendered as is, every time. If you just put Service Hub into Google Translate, you will probably get different translations in different contexts.

That is obviously a huge detriment, and it is not by any means a solved issue. However, a lot of machine translation providers have glossary systems. I would say DeepL is probably the leader at the moment.
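
For context, here's roughly what a glossary looks like in practice - a hedged sketch using the deepl Python package (my assumption, not HubSpot's setup; the auth key and entries are placeholders, and method names should be checked against DeepL's current documentation):

```python
# Sketch of enforcing "do not translate" terminology with the deepl package
# (pip install deepl). Auth key and glossary entries below are placeholders.
import deepl

translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")

# A glossary pins how specific terms are rendered - here, "Service Hub"
# is kept as-is instead of being translated differently per context.
glossary = translator.create_glossary(
    "HubSpot product terms",
    source_lang="EN",
    target_lang="DE",
    entries={"Service Hub": "Service Hub"},
)

result = translator.translate_text(
    "Learn how Service Hub handles customer tickets.",
    source_lang="EN",
    target_lang="DE",
    glossary=glossary,
)
print(result.text)
```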

How would you recommend that a small to medium-sized business leverage LLMs? Or would you recommend actually that they lean more on machine translation still?

For a small-to-medium business, the assumption would be that there's probably not a lot of shepherding of the system that can happen on the business's side.

In this case, I would probably actually recommend going with a traditional machine translation provider if the purpose is generating translated content.

It is a more focused system and it is easier to set up, whereas LLMs will likely need a lot more fine tuning and a lot of iterations to get you to where you would want to be.

It kind of goes back to LLMs being generalists, right? They will perform well across a variety of tasks, but if you want the peak performance for translation tasks, I would go with a machine translation provider.

And there are fantastic sort of out-of-the-box solutions that will get you very far, and would be quite easy to implement as well.

When it comes to LLMs, is there a scenario in which they're the best options right now, despite being generalists?

If someone is just looking for a plug-and-play solution, like you turn it on and everything is handled, then I'd say an LLM is pretty much almost as good an option as an out-of-the-box machine translation provider.

Where I would give the lead to machine translation providers is typically the amount of customization that you can do. Or, I should say, how much effort it takes to achieve that customization. LLMs will take more work to get to the same point.

What would you say is the most overrated claim about LLMs right now?

There are so many - when it comes to localization in particular, there's this notion that we will lose our jobs.

The translation and localization industries have been through this since 2016, when the current generation of neural machine translation systems was released.

We've gone through this entire discussion already, but it was confined to our niche, so it didn't make the broad news as much as it does now with LLMs.

Localization isn't much different than other areas, like code generation, for example. An LLM doesn't mean you can get rid of your engineers. You can, but at your own risk. The same is true in the localization space.

Can you replace linguists, translators, reviewers, with LLMs? Yeah, you can do that if you want, but you might not like the consequences of that.

Lastly, how do you see LLMs impacting the work that localization teams do?

Localization probably provides the perfect crystal ball there, just because we have already gone through a cycle of this: initially, the hype is way up here.

Everyone's talking about how translators are going to be out of a job. It's now eight years later, and it hasn't happened so far. In fact, we're busier than ever.

It doesn't come down to the question, "Do we have human translators, or do we have machine translators?"

Those systems allow you to significantly increase your coverage, be that across languages or across content types. Scale is what these systems are really, really good at, and it's what they really enable, with or without human involvement.

I don't honestly foresee a massive change in how things have been going in localization with the introduction of LLMs, just because we were already at that point.
