Translating Images and Scanned Documents
Translation technology has come a long way in recent years. From Neural Machine Translation (NMT) to Translation Management Systems (TMS), technology plays a significant role in supporting human translators and smoothing their workflow. Nevertheless, not all translation tasks are created equal, especially when it comes to translating images. You won’t always have a pristine, cleanly formatted document in an ideal file type. More often than not, you’ll find yourself holding a multitude of uneditable, unfriendly file types that simply don’t work the way you want them to.
Just the thought of rewriting files for translation is galling, whether it’s an infographic containing important text elements or a stack of physical documents that you’ve labored to scan and digitize. When you factor in the hours such tasks require, they can become an unwanted and unexpected expense.
Have. No. Fear. Text United has put together a simple how-to guide to save you time, money, and, most importantly, sanity. We’ve busted the jargon, included some examples, and even pointed you in the direction of some useful free services to boot.
#1 OCRs are your best friend
An OCR, short for Optical Character Recognition (or Optical Character Reader, if you mean the tool itself), does pretty much what it says on the tin. The technology is designed to recognize characters in non-editable file types and convert them into editable text.
Most OCRs will take your file and convert it into something more palatable for translation, such as a .docx or .txt, which you can then easily feed into your Computer-Aided Translation Tool (CAT Tool for short). There are plenty of free-to-use OCRs available online, like Free OCR to Word or FreeOCR. You’ll even find that Google Docs has a built-in feature: upload the file to Google Drive, right-click it, and open it with Google Docs.
Whether you use Text United or somebody else for your CAT Tool needs, this is the next step in your translation journey.
Once you’ve got your converted file, double-check that all of your text is present and correct. While OCRs have a good hit rate, they can’t always reach 100% accuracy, and the quality of the source document or image is a big factor.
“In a sample of 45 pages to be representative of the libraries digitized newspaper collection 1803-1954, we found that raw OCR accuracy varied from 71% to 98.02%”
– Rose Holley, How Good Can It Get? – D-Lib Magazine
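To put a number on accuracy for your own files, you can compare a short passage of OCR output against a hand-checked version of the same passage. The sketch below is one common way to do that, not necessarily the method behind the figures quoted above: it counts character-level errors with the Levenshtein edit distance and reports the share of characters that came through correctly. The function names and the sample strings are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character from b
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Percentage of reference characters the OCR got right (floored at 0)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


# Hypothetical sample: "l" read as "1", and "m" read as "rn".
reference = "Translation technology has come a long way."
ocr_output = "Translation techno1ogy has corne a long way."
print(f"Character accuracy: {character_accuracy(reference, ocr_output):.2f}%")
```

Checking a representative page or two this way gives you a rough idea of how much manual cleanup to budget for before the file goes into your CAT Tool.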
#2 OCR Example Results
As a quick sample, we put one of our PDFs through a free online OCR converter that outputs a nice handy .docx file, which we could easily upload to Google Docs and edit from there.
Although it skewed our formatting and design a little, you can see the accuracy of the OCR conversion was very high. Also, the mistakes are now easily correctable before entering the translation phase!
As we mentioned, our PDF file is high quality and modern, and the OCR has no issue converting it. Nevertheless, a low-quality scan of a physical document may yield less accurate results.
Another factor that can affect your outcome is the quality of the software you decide to use. A free service will certainly do the trick for small batches of high-quality image files. However, a more advanced, paid OCR tool like ABBYY FineReader will better handle low-quality images while offering a more comprehensive toolset. This can enable you to recreate the layout of the scanned document and correct any errors within the software. Furthermore, it will automatically detect low-resolution text that the program cannot fully read, and it lets you correct that text manually before creating your output file.
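If your OCR tool does not highlight uncertain words for you, a rough script can at least point your proofreader at the likeliest trouble spots. The following is a minimal sketch built on two simple assumptions of ours: a digit sitting directly next to a letter, or a stray symbol such as "|", often signals a misread character. It is not how any particular OCR product detects unreadable text, and it will raise false alarms on legitimate tokens like "3D" or "1st".

```python
import re
from typing import List

# Heuristic patterns (illustrative assumptions, not an exhaustive list).
DIGIT_NEXT_TO_LETTER = re.compile(r"[A-Za-z]\d|\d[A-Za-z]")
STRAY_SYMBOL = re.compile(r"[|~^`_]")  # rarely seen in running prose


def flag_suspect_tokens(text: str) -> List[str]:
    """Return whitespace-separated tokens that look like OCR misreads."""
    suspects = []
    for token in text.split():
        if DIGIT_NEXT_TO_LETTER.search(token) or STRAY_SYMBOL.search(token):
            suspects.append(token)
    return suspects


# Hypothetical OCR output with two misread words.
sample = "The c0mpany signed the contract in 2019. Tota| due: 45 EUR"
print(flag_suspect_tokens(sample))
```

Treat the result as a to-do list for a human reviewer rather than as an automatic fix.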
#3 Eyes on the Prize
As mentioned above, a clean and tidy document is essential to maximize the outcome from your CAT Tool. An OCR that attempts to recreate the layout of your original file may result in information that is out of context, making your translator’s life more difficult!
Keep a keen eye on formatting and spacing in your OCR output. Converted text can end up broken into unnecessary segments, so a single sentence can become several. For example, a hard line break left in the middle of a sentence will usually make a CAT Tool treat each half as a separate segment.
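Much of this cleanup can be done before the file ever reaches your CAT Tool. Here is a minimal sketch, assuming plain-text OCR output in which hard line breaks fall mid-sentence and blank lines mark paragraph boundaries. The helper names are hypothetical, and the rule it applies (keep joining lines until one ends in ".", "!" or "?") is deliberately simple, so headings without closing punctuation will be glued to the line that follows them.

```python
from typing import List

CLOSERS = "\"')]”’"  # quotes and brackets that may trail the final punctuation


def ends_sentence(text: str) -> bool:
    """True if the text ends in sentence-final punctuation, ignoring closers."""
    return text.rstrip(CLOSERS).endswith((".", "!", "?"))


def merge_broken_lines(text: str) -> List[str]:
    """Join lines that break mid-sentence; blank lines still split paragraphs."""
    merged: List[str] = []
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: paragraph boundary, so flush whatever is pending.
            if buffer:
                merged.append(buffer)
                buffer = ""
            continue
        buffer = f"{buffer} {line}" if buffer else line
        if ends_sentence(buffer):
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)
    return merged


# Hypothetical OCR output: one sentence spread over three lines.
ocr_text = (
    "Once you have your converted\n"
    "file, check that all of\n"
    "your text is present.\n"
    "OCR is not perfect."
)
for segment in merge_broken_lines(ocr_text):
    print(segment)
```

Each item in the returned list is then a candidate segment, which keeps a sentence together for the translator instead of scattering it across several fragments.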
#4 Lighten the Load
If time is of the essence and you’re looking for an all-in-one solution, you’ll be pleased to know that Text United supports 36 file types, including images and PDFs!
We’ll create a translatable version of the file in-house, and our professional human translators will do the rest! Once the translation process is complete, we make sure that you receive the translated file in its original format and layout.
Take a look below to see more of our core features, including the ones discussed in this article.