Capturing clean data with Ai will save you in the future.

It seems that every week another company or new start-up is pitching some artificial intelligence (Ai) powered or data-driven product. The reasoning behind this is obvious: as key buzzwords and anchor points, Ai and data are veritable pheromones for potential clients and investors.

In 2017, 53% of companies reported using big data in at least one aspect of their business (Columbus, 2017); this number is expected to rise in the coming years. Companies need to be asking themselves where it is necessary to capture quality data, what they define as quality data and how they can utilize it to benefit their business. Ai and data are consistently linked together: to take maximum advantage of Ai, the need for quality data to power it cannot be overstated.

What is quality data? Quality data will vary from industry to industry and even between applications. The core principles will remain the same, however. Quality data is data that fits its intended use in operations. For translation memory, this means that the more memory matches we have, the higher the data quality is. Some companies use a multi-vendor model, often opting for the cheapest translation in the moment. Doing so not only loses translation memory benefits but also runs the risk of poor data in the future. The translation memory will certainly not match across all their vendors, and there is no easy way to clean it and hand it back to a single vendor.

A large multi-billion dollar technology company based in Seattle has translated over 500,000 words with Straker. In a recent project, this company received a 63% discount on a 75,000-word project by capitalizing on translation memory. This saved them not only money but also time: the project was completed almost twice as quickly as the same project would have been without any translation memory available.

With Ai set to revolutionize the technology industry, the translation industry is primed to reap the benefits of this revolution. Millions of words are translated every day, and each of these words provides a data point that can power Ai and machine learning engines to improve and give better results to customers.

So how can language service providers and their clients capitalize on this potential?

The first and largest hurdle is capturing that data. At Straker Translations, we use Ai powered translation technology such as translation memory. Translation memory works by ‘remembering’ a client’s previous translations. When a translator works on a new job and comes across words and phrases that have been translated before, the translation memory will flag up these repetitions. The more words a client translates, the more the machines learn. Translation memory is unlimited. All updated content is automatically uploaded into the client’s bespoke translation memory for the next time.
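To make the idea of ‘remembering’ concrete, here is a minimal sketch of a translation memory lookup. It is an illustration only, not Straker's implementation: the function name best_match, the 0.75 similarity threshold and the sample English-Spanish segments are all assumptions made for this example, and the similarity score comes from Python's standard difflib module rather than from any translation-specific engine.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None.

    `memory` maps previously translated source segments to their translations.
    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two strings share fewer characters.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# Hypothetical client memory built from earlier jobs.
memory = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Your order has shipped.": "Su pedido ha sido enviado.",
}

exact = best_match("Click the Save button.", memory)         # repeated segment
fuzzy = best_match("Click the Save button now.", memory)     # near repetition
miss = best_match("Welcome to our annual report.", memory)   # new content
```

In this sketch a repeated segment comes back with a score of 1.0, a near repetition comes back with a lower score so that a translator can review the suggestion, and genuinely new content returns None and is translated from scratch.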

Keeping this memory up to date, or ‘clean’, is the key to powering machine translation in the future, and can help to lower the cost and reduce the time it takes to complete a translation project.
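What ‘clean’ means in practice will differ between providers, but one plausible reading is a memory with no duplicate entries and no source segment that carries two competing translations. The sketch below shows that kind of check under those assumptions. The function name clean_memory and the sample entries are hypothetical, and a real workflow would hand the conflicts to a linguist rather than resolve them automatically.

```python
from collections import defaultdict


def clean_memory(entries):
    """Split (source, target) pairs into consistent entries and conflicts."""
    targets_by_source = defaultdict(set)
    for source, target in entries:
        # Collapse stray whitespace so trivially different copies merge.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if source and target:
            targets_by_source[source].add(target)

    clean = {}
    conflicts = {}
    for source, targets in targets_by_source.items():
        if len(targets) == 1:
            # Exactly one translation on record: safe to keep.
            clean[source] = next(iter(targets))
        else:
            # Several competing translations: flag for human review.
            conflicts[source] = sorted(targets)
    return clean, conflicts


# Hypothetical entries merged from two different vendors.
entries = [
    ("Sign in", "Iniciar sesión"),
    ("Sign in ", "Iniciar sesión"),
    ("Shopping cart", "Carrito de compras"),
    ("Shopping cart", "Cesta de la compra"),
]
clean, conflicts = clean_memory(entries)
```

Here the two copies of "Sign in" collapse into one clean entry, while "Shopping cart" is flagged because the two vendors translated it differently, which is the kind of mismatch a multi-vendor model tends to produce.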

There are plenty of companies trying to tackle the machine translation problem, from behemoths like IBM and Google to niche providers such as DeepL. None of these companies have been able to disrupt the 40 billion USD translation industry, mainly because languages are so subjective. Every company has its own internal language, and capturing this data within the translation memory is paramount to the success of any potential machine translations in the future for that company.

Straker has managed to shake up the translation industry by developing a best-in-class Ai data-driven machine translation engine interface to enable humans and machines to work together for maximum efficiency, allowing centralized quality control, speed improvements and adaptive machine learning.

As language service providers look to gain a competitive advantage, cutting costs for existing and potential clients is key. Using machine translation as a first pass, a model known as post edit, is commonplace in the industry.

The majority of providers pay their editors by the hour, but charge their clients per word. The best way to bring this per word cost down is to utilize the client’s translation memory to power the machine translation. This can effectively cut costs by 30-40% compared to using a traditional model. When done correctly, the costs will only go down as the machine translation becomes more and more accurate. This will be powered by capturing more and more data, allowing Ai to continually redefine the translation process and reduce the time and effort needed from humans. While human translators are still needed, Ai and clean data will allow humans and Ai to work together seamlessly and produce the best quality work for clients.

Companies that regularly use language service providers should find out how those providers are using their data, who owns this data and whether a post edit model is in use. Future-proofing the workflow now will allow companies to gain greater efficiencies and cost savings in the future.

By Gianluca Savenije
New Business Acquisitions Manager

Columbus, L. (2017). 53% Of Companies Are Adopting Big Data Analytics. Retrieved from