How to translate XML files effectively?


XML is widely popular because its structure lets you separate content from its form and categorize data in a way that software can process.

What is XML?

XML (Extensible Markup Language) is one of the most widely used file formats for representing and transferring structured data. It is also the basis of many XML-based markup languages, which underlie other popular file formats, such as SVG, XLIFF, DOCX, XLSX, and TMX.

XML files can be edited with any conventional text editor. However, dedicated XML editors are often used to create and edit them, because they check the structure as you type and so help prevent faulty XML files. Many editorial systems and authoring tools now work with XML as their default format.

XML files can, but do not have to, have the file extension .xml. Many markup languages are built on XML, and their files often have their own extensions. Examples include SVG, an XML-based format for 2D vector graphics, and formats popular in the translation industry, such as TMX, TBX, and XLIFF.

We’re not going to cover the more specific XML-based file types, such as XLIFF (although you can find more info about it here), but we will explore how to effectively translate XML files using the Text United platform. We will also briefly cover TMX files, as they are used for the import and export of Translation Memories, another extremely important feature of the Text United translation system.

Are you ready to discover the XML magic? Then keep on reading!

The basics

First of all, we need to explain the basic structure of the translatable content in XML files.

A short example would look like this:

<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

In this example, there are four elements that contain content for translation.

These elements are

<title>, <author>, <year>, and <price>

Two elements contain other elements rather than text, namely

<bookstore> and <book>

One element, <book>, has an attribute, category, which takes two values: ‘children’ and ‘web’.
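To make this structure concrete, here is a minimal sketch that walks the example with Python's standard xml.etree.ElementTree module and sorts the elements into those that contain other elements, those that contain text, and the attributes found along the way. The variable names are illustrative only, and this is not how any particular translation tool analyzes a file.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

containers = set()   # tags of elements that hold other elements
leaves = set()       # tags of elements that hold text
attributes = {}      # (element tag, attribute name) -> set of values seen

for elem in root.iter():
    if len(elem):                      # the element has child elements
        containers.add(elem.tag)
    elif elem.text and elem.text.strip():
        leaves.add(elem.tag)
    for name, value in elem.attrib.items():
        attributes.setdefault((elem.tag, name), set()).add(value)

print(sorted(containers))
print(sorted(leaves))
print(attributes)
```

The three collections correspond to the three observations above: the container elements, the elements with translatable text, and the category attribute with its values.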

You may wonder why this matters if you simply have an XML file that needs translating. Think about your own experience: sometimes you want to translate only parts of the file, not its whole content. What do you do then? Do you extract the content manually, or is there an easier way to do this?

The process

Using the example above, let us say that you only want to translate the book title ‘Learning XML’. For this, you would have to exclude the element <book> with the attribute category and the value ‘children’. This is why document setting templates are important when translating XML files.

Using Text United, you can do this when uploading your file by modifying the setup of the XML filter. In the extraction settings, you only need to define which elements you want to exclude (in this case, the element <book> with the attribute category and the value ‘children’).

Voila! The system does everything else by itself, and you can carry on localizing your file. In the translated file, you will get all the elements translated, and only the excluded elements will remain untouched. This option enables you to customize, play with, and adapt your XML translation to your own specific needs.
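The idea behind such an exclusion rule can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not Text United's filter: the function name extract_translatable and its parameters are hypothetical, and a real filter offers far more options.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J. K. Rowling</author>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
  </book>
</bookstore>"""


def extract_translatable(root, excluded_tag, attr, value):
    """Collect (tag, text) pairs, skipping every excluded subtree.

    Hypothetical helper: an element is excluded when its tag equals
    `excluded_tag` and its attribute `attr` has the given `value`.
    """
    segments = []

    def walk(elem):
        if elem.tag == excluded_tag and elem.get(attr) == value:
            return  # leave the whole subtree untouched
        if len(elem) == 0 and elem.text and elem.text.strip():
            segments.append((elem.tag, elem.text.strip()))
        for child in elem:
            walk(child)

    walk(root)
    return segments


segments = extract_translatable(ET.fromstring(XML), "book", "category", "children")
print(segments)
```

Everything inside the excluded <book> is skipped, so ‘Harry Potter’ never reaches the translator, while the content of the other <book> is still extracted.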

The mystery of TMX

Let us touch upon a special type of XML file: the TMX file format. TMX stands for Translation Memory eXchange. TMX files are created by CAT tools and other localization applications and are used for exchanging translation memories (TM), so only applications that support this format can make use of them.

At Text United, we use TMX as the default way to import and export translation memories to and from our system. If you want to reuse previous translations stored in your old TMX files, you can simply import them to the platform directly from your Text United account. Once uploaded, the translations will be used in all projects with the matching language combinations.
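For readers who have never opened a TMX file, the sketch below shows the general shape of one and how translation pairs can be read from it with Python's standard library. The sample content, the header values, and the function name read_tmx are made up for illustration, and the sketch ignores inline markup and most header metadata that real TMX files carry.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, made-up TMX document: one translation unit (tu) with two
# language variants (tuv), each holding a segment (seg).
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Learning XML</seg></tuv>
      <tuv xml:lang="de"><seg>XML lernen</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text, source, target):
    """Return (source, target) segment pairs for one language combination."""
    pairs = []
    for tu in ET.fromstring(text).iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source in by_lang and target in by_lang:
            pairs.append((by_lang[source], by_lang[target]))
    return pairs


pairs = read_tmx(TMX, "en", "de")
print(pairs)
```

Each pair is one entry of the translation memory for the requested language combination.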

Translate XML files without the hassle

If you manage your translation projects using a file-based approach, you need to take care of a few factors that will make your workflow run more smoothly.

For instance, when your XML files contain HTML tags (for example, <p>), you can choose the XML + HTML filter.

    There are two things to keep in mind when setting up filters for XML files:

  1. The segmentation needs to be double-checked
  2. The tags are crucial

Double check the segmentation
If you have a string that reads, for example, “The segmentation is <b>very</b> important.”, you need to know that the <b> element will break the string down into three parts:

  • The segmentation is
  • very
  • important

To keep all three parts within one translation segment, mark the <b> element as “internal” so that it no longer breaks strings in XML files into multiple parts.
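The difference between the two behaviors can be sketched with Python's standard library. The wrapping <text> element, the function name as_single_segment, and the {1}/{2} placeholder notation are assumptions made for this illustration; the sketch also handles only one level of inline elements.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<text>The segmentation is <b>very</b> important.</text>")

# Without special handling, every text node becomes its own segment.
split_segments = [t.strip() for t in elem.itertext() if t.strip()]


def as_single_segment(elem):
    """Keep inline children inside one segment as numbered placeholders.

    Hypothetical helper: each inline element is replaced by an opening
    and a closing placeholder such as {1} and {2}.
    """
    parts = [elem.text or ""]
    counter = 0
    for child in elem:
        open_id, close_id = counter + 1, counter + 2
        counter += 2
        inner = "".join(child.itertext())
        parts.append(f"{{{open_id}}}{inner}{{{close_id}}}")
        parts.append(child.tail or "")
    return "".join(parts)


single_segment = as_single_segment(elem)
print(split_segments)
print(single_segment)
```

The first result shows the three fragments a translator would otherwise see in isolation. The second shows one complete sentence in which the <b> element travels along as a pair of placeholders.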

Tags are crucial:
When you start translating your file, you will see small green numbers in the segments. They stand in for the elements in your XML file and protect them from accidental changes. For example, the <b> element from the previous example would be shown as the green numbers 1 and 2.

It is very important to understand that tags must not be deleted: removing one will corrupt your file.
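A simple safeguard against this kind of damage is to compare the tags in the source and in the translation before the file is rebuilt. The sketch below assumes that the protected tags are written as {1}, {2}, and so on, which is a notation chosen for this example only; the function name tags_preserved is hypothetical as well.

```python
import re

# Assumed placeholder notation for protected tags: {1}, {2}, ...
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def tags_preserved(source, target):
    """Return True if the target contains exactly the same placeholders as the source."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(target))


source = "The segmentation is {1}very{2} important."
good = "Die Segmentierung ist {1}sehr{2} wichtig."
bad = "Die Segmentierung ist {1}sehr wichtig."  # closing tag was deleted

print(tags_preserved(source, good))
print(tags_preserved(source, bad))
```

The check compares only which placeholders are present, so a translation may still move them to wherever the target language needs them.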

Do you need assistance with translating your XML file? Or maybe, you have questions about uploading your TMX files to our system? Either way, we encourage you to reach out to us – we will gladly answer all of your questions and explain everything in detail. No strings attached!

XML in software localization

When it comes to software localization, make sure to place all of your front-end texts into external resource files. Android, for example, uses XML for its localization resources. There are three types of entries: string, string-array, and plurals.

The first type of entry, string, contains simple strings or sentences. Each entry must have a name attribute that is unique within the resource file.

The string-array type also has a name attribute, which must likewise be unique within the resource file. Each string-array contains multiple items. These items are treated as a single translation entity, so a particular language can have a different number of entries.

The last entry type, plurals, also has a name attribute (which must be unique as well) and contains a list of items. Each item has a quantity attribute that is used to select the appropriate plural form of the string for a particular number of elements.
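To illustrate the three entry types, the sketch below embeds a small, made-up Android resource file and reads it with Python's standard library. The resource names and texts are invented for this example, and load_resources is a hypothetical helper rather than part of any Android or Text United tooling.

```python
import xml.etree.ElementTree as ET

# A made-up strings.xml with one entry of each type.
STRINGS_XML = """<resources>
    <string name="app_title">Bookstore</string>
    <string-array name="categories">
        <item>Children</item>
        <item>Web</item>
    </string-array>
    <plurals name="books_found">
        <item quantity="one">%d book found</item>
        <item quantity="other">%d books found</item>
    </plurals>
</resources>"""


def load_resources(text):
    """Split a resource file into strings, string arrays, and plurals."""
    strings, arrays, plurals = {}, {}, {}
    for entry in ET.fromstring(text):
        name = entry.get("name")
        if entry.tag == "string":
            strings[name] = entry.text or ""
        elif entry.tag == "string-array":
            arrays[name] = [item.text or "" for item in entry.findall("item")]
        elif entry.tag == "plurals":
            # Map each quantity (one, other, ...) to its plural form.
            plurals[name] = {
                item.get("quantity"): item.text or ""
                for item in entry.findall("item")
            }
    return strings, arrays, plurals


strings, arrays, plurals = load_resources(STRINGS_XML)
print(strings)
print(arrays)
print(plurals)
```

Keeping each string-array and each plurals entry together as one unit mirrors the point above: a target language may need a different number of items or plural forms than the source.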


 


Written by: Marek Piorkowski

Marek is the founder and CEO of Text United. He closely follows global developments in the translation industry.
