Creating translation-oriented source documents

Written by Yves Vanneste

Today, almost all translation companies use translation memory (TM) systems in their translation process. A TM is a database that stores sentences (“segments”) for future use, thereby increasing consistency and reducing translation time and cost.


A well-prepared and cleanly formatted source document can save a lot of time and money during translation with a TM system, because the system's recognition capabilities only pay off if the segments to be translated are actually identical or similar to segments it has already stored.
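
To make the idea of “identical or similar” segments more concrete, here is a minimal sketch of a TM lookup. It is only an illustration: the `tm` dictionary, the example sentences and the `best_match` helper are made up for this post, and Python's `difflib` stands in for the proprietary matching that commercial TM systems use.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory TM: source segment -> stored translation.
tm = {
    "Press the green button to start the machine.":
        "Drücken Sie die grüne Taste, um die Maschine zu starten.",
}

def best_match(segment, memory):
    """Return (score, source, translation) for the most similar stored segment."""
    best = (0.0, None, None)
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for merely similar ones.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best

# An identical segment is a full match; a similar one scores below 1.0.
exact = best_match("Press the green button to start the machine.", tm)
fuzzy = best_match("Press the red button to start the machine.", tm)
print(f"exact: {exact[0]:.0%}, fuzzy: {fuzzy[0]:.0%}")
```

The cleaner the source text, the more often a new sentence lands at or near a full match and the less the translator has to retype.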


Here are some tips for optimally preparing source documents for translation during their initial creation.


PDF files vs. original file formats

Whenever possible, avoid using PDF files as the source format for translation. Always try to provide the original file from which the PDF was created, since some programs cannot edit PDF files directly and the files then have to be converted into another format (usually Word) before translation. The converted documents generally have to be cleaned up before translation because the converted text usually contains too many formatting errors to be translated sensibly with a TM system.



Hard and soft line breaks


Avoid hard line breaks (paragraph marks) within sentences; otherwise, the TM system cannot offer sensible segments for translation.

Hard line breaks should only be used where a new paragraph actually starts. TM systems use segment delimiters to decide where a translation unit (normally a sentence) ends. These characters are generally ., !, ? and ¶. A hard line break is always detected as a segment end, so a line break within a sentence splits it into two segments and manual editing is required.
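
The effect of a hard line break inside a sentence can be sketched in a few lines of Python. This is a simplified illustration, not how any particular TM system is implemented: the `segment` function and the sample sentences are made up, and real tools use far more elaborate segmentation rules (for abbreviations, numbers and so on).

```python
import re

# Split after ., ! or ? followed by whitespace, and at every hard line break.
SEGMENT_END = re.compile(r"(?<=[.!?])\s+|\n")

def segment(text):
    """Return the list of non-empty segments found in text."""
    return [s.strip() for s in SEGMENT_END.split(text) if s.strip()]

clean = "Remove the cover before cleaning the filter.\nReplace the cover."
broken = "Remove the cover before\ncleaning the filter.\nReplace the cover."

print(segment(clean))
# ['Remove the cover before cleaning the filter.', 'Replace the cover.']
print(segment(broken))
# ['Remove the cover before', 'cleaning the filter.', 'Replace the cover.']
```

In the second case the first sentence arrives as two fragments, neither of which will match the complete sentence stored in the TM.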

Soft line breaks (Shift+Enter in Word) should also be avoided. TM systems do not interpret them as segment ends, so the affected units are not detected correctly and have to be reworked manually by the translator.



Blank spaces and tabs


Use tabs or indents to indent text rather than a series of blank spaces. Once the document has been read into a TM system, all of these characters are displayed. In almost all cases, a translation containing the same number of blank spaces will look different from the source document, so the text has to be reworked after translation.
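
If you want to check a document for this before sending it off, a small script run on a plain-text export can list the suspicious places. The following is only a sketch under that assumption; the `find_space_runs` helper and the sample text are hypothetical.

```python
import re

# Two or more consecutive blank spaces usually mean manual alignment.
SPACE_RUN = re.compile(r" {2,}")

def find_space_runs(text):
    """Return (line_number, column, run_length) for each run of 2+ spaces."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in SPACE_RUN.finditer(line):
            hits.append((line_no, match.start() + 1, len(match.group())))
    return hits

# Line 1 is aligned with blank spaces, line 2 with a tab.
sample = "Voltage:     230 V\nWeight:\t12 kg"
print(find_space_runs(sample))
# [(1, 9, 5)]
```

Only the first line is reported; the tab in the second line is left alone.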



Superfluous formatting


If you frequently use colored highlighting to visually emphasize passages of text, make sure that it has been removed completely before translation, including from line breaks and blank spaces, where it is easy to overlook. Otherwise, this “invisible” formatting is offered to the translator as possible formatting and may result in the translator writing directly in the wrong font or color.





Hyphenation


If you want to use hyphenation in your text, either use the “automatic hyphenation” function or insert optional hyphens manually. Many people simply insert a normal hyphen instead. This causes the following problem for the translator and the TM system: standard hyphens are treated as normal characters and add an extra character to the word in which they are placed. As a result, the translation unit stored in the TM will not be a 100% match even if exactly the same sentence appears again without hyphenation.
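
The difference between the two kinds of hyphen can be illustrated as follows. This sketch makes two assumptions: the optional hyphen is represented by the Unicode soft hyphen (U+00AD), and it is dropped before comparison. The `normalize` and `similarity` helpers are hypothetical, and `difflib` is only a stand-in for a real TM system's match calculation.

```python
from difflib import SequenceMatcher

SOFT_HYPHEN = "\u00ad"  # assumption: optional hyphens are stored as U+00AD

def normalize(segment):
    # Assumption: optional hyphens carry no content and can be dropped.
    return segment.replace(SOFT_HYPHEN, "")

def similarity(a, b):
    """Similarity of two segments between 0.0 and 1.0 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

stored = "Check the documentation before installation."
with_hard_hyphen = "Check the documen-tation before installation."
with_soft_hyphen = "Check the documen\u00adtation before installation."

print(similarity(stored, with_soft_hyphen) == 1.0)  # optional hyphen: full match
print(similarity(stored, with_hard_hyphen) < 1.0)   # normal hyphen: no full match
```

The normal hyphen cannot be told apart from a genuine hyphen such as the one in “well-prepared”, so it has to stay in the segment and counts as a difference.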

(Source: Nicole Keller, Multilingual Mag., 11-2011.)



At Audience Global we offer highly specialized language services and expertise to help you manage language challenges so that you can expand your business globally and increase your sales revenues.

We specialize in providing reliable, cost-effective translation, localization, content management and multilingual publishing services to customers across the globe.

For more information visit our website.
