AM Blog

4 Helpful Ways to Use Regular Expressions in Trados

Regex is an abbreviation of Regular Expression. In SDL Trados Studio, regex can be used to filter segments that match a pattern, find content, set up translation checks, create new segmentation rules for breaking segments in the TM, and run advanced searches.

Replace tab characters and paragraph marks

In regex, tab characters and paragraph marks (line breaks or soft returns) are written as follows:

\n: new line (Shift+Enter)

\t: tab
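As a quick illustration outside Trados, the sketch below uses Python's re module, which treats \t and \n the same way for simple patterns like these. Trados Studio is a .NET application, so its own regex flavor may differ in other details. The sample text is made up; the code replaces each tab with a space, then each line break with a space.

```python
import re

# Made-up segment text containing a tab (\t) and a soft return (\n).
text = "Name:\tJohn Smith\nRole:\tTranslator"

# \t matches a tab character: replace each one with a single space.
tabs_replaced = re.sub(r"\t", " ", text)
# \n matches a line break: replace it with a space so the text reads as one line.
one_line = re.sub(r"\n", " ", tabs_replaced)

print(one_line)  # Name: John Smith Role: Translator
```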

For example, to filter segments that contain paragraph marks, first open a find tool. There are three ways to find and filter in Trados:

  • Search box in Review tab (1)
  • Advanced Display Filter in View tab (2)
  • Ctrl + F or Ctrl + H (3)

Then, choose whether to search the Source or Target text. Enter the characters that represent a paragraph mark (\n) in the search box and press Enter.

Note: The Regex check box under the Search box on the Review tab is ticked automatically. With the other two methods, tick the Regex check box manually to use regular expressions.

Option 1 (default)

Option 2 (tick Regex manually)

Option 3 (tick Regex manually)
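What the filter does can be modeled in a few lines of Python. This is only an assumption-based sketch, not how Trados works internally: the file is represented as a hypothetical list of (source, target) pairs, and filter_segments is a hypothetical helper that returns the numbers of the segments whose chosen side contains a paragraph mark.

```python
import re

# Hypothetical (source, target) segment pairs standing in for a Trados file.
segments = [
    ("Click Save.", "Nhấp Lưu."),
    ("Line one\nLine two", "Dòng một\nDòng hai"),
    ("Press Enter\nto continue", "Nhấn Enter để tiếp tục"),
]

paragraph_mark = re.compile(r"\n")

def filter_segments(pairs, pattern, side="source"):
    """Return 1-based segment numbers whose source (or target) text matches pattern."""
    idx = 0 if side == "source" else 1
    return [n for n, pair in enumerate(pairs, start=1) if pattern.search(pair[idx])]

print(filter_segments(segments, paragraph_mark))            # [2, 3]
print(filter_segments(segments, paragraph_mark, "target"))  # [2]
```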

Set up Verification tools

Verification, the translation checking tool in Trados, can use regex to find potential errors. For example, we can create a rule that makes Trados check whether the paragraph marks in the target text match those in the source text.

First, go to the settings by selecting Project Settings > Verification > QA Checker 3.0 > Regular Expressions.

Tick the Search regular expressions check box. In the Description box, enter a name for the verification rule. Next, enter the regex to check in the corresponding source and target boxes. Finally, select an action in the Condition box (e.g. report if a match for the regex in the source text cannot be found in the target text).

When you run Verification, Trados will then flag any segment that fails the rule, for example a segment whose translation is missing a paragraph mark that the source text contains.
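The check that such a rule performs can be modeled roughly as follows. This is a hypothetical sketch in Python, not Trados's implementation: missing_in_target mirrors the condition described above (the source matches but the target does not), and count_mismatch is a stricter variant, added here as an assumption, that compares how many times the pattern occurs on each side. The function names and sample segments are illustrative only.

```python
import re

def missing_in_target(pairs, source_rx, target_rx):
    """Segments where source_rx matches the source but target_rx finds nothing in the target."""
    src, tgt = re.compile(source_rx), re.compile(target_rx)
    return [n for n, (s, t) in enumerate(pairs, start=1)
            if src.search(s) and not tgt.search(t)]

def count_mismatch(pairs, rx):
    """Stricter variant: segments where the pattern occurs a different number of times."""
    pattern = re.compile(rx)
    return [n for n, (s, t) in enumerate(pairs, start=1)
            if len(pattern.findall(s)) != len(pattern.findall(t))]

# Hypothetical source/target pairs.
pairs = [
    ("Line one\nLine two", "Dòng một\nDòng hai"),
    ("Press Enter\nto continue", "Nhấn Enter để tiếp tục"),
    ("A\nB\nC", "A\nB C"),
]

print(missing_in_target(pairs, r"\n", r"\n"))  # [2]
print(count_mismatch(pairs, r"\n"))            # [2, 3]
```

Segment 3 passes the presence check because its target still contains one line break, but the count-based variant catches that one of the two source breaks was lost.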

Create new segmentation rules to break segments in TM

By default, Trados breaks segments at common punctuation marks such as the period “.” and the colon “:”. However, we can add rules so that Trados also breaks segments at characters matched by a regex, e.g. the newline character (\n).

To do this, open the settings of the project’s TMs (Project Settings > [Language Pairs > All Language Pairs > Translation Memory and Automated Translation] > Settings). In a default project, the “Translation Memory and Automated Translation” window opens automatically when you select Project Settings.

Then, select Language Resources > Segmentation Rules.

Click Add to create a new segmentation rule.

In the Before break column, enter the regex at which segments should break. Note that the Regular Expression check box becomes available only after you type an expression manually.

Click OK to save. Then re-add the file to apply the new segmentation rules.
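To see what a break rule does, here is a simplified Python model. It assumes each rule is a pair of patterns: the text before a break point must end with a match for the first, and the text after it must start with a match for the second. This loosely mirrors the Before break column and, as we understand it, the After break column beside it. DEFAULT_RULES is only a rough, hypothetical stand-in for the built-in period and colon rules, not Trados's actual rule set.

```python
import re

# Hypothetical rule format: (before_break, after_break) regex pairs.
# Rough stand-in for the defaults: break after "." or ":" followed by whitespace.
DEFAULT_RULES = [(r"[.:]", r"\s")]
# The same rules plus a new one that breaks right after a newline (\n).
NEWLINE_RULES = DEFAULT_RULES + [(r"\n", r"")]

def segment(text, rules):
    """Split text at every position where some rule's before/after patterns both match."""
    compiled = [(re.compile(f"(?:{before})\\Z"), re.compile(after))
                for before, after in rules]
    breaks = [i for i in range(1, len(text))
              if any(b.search(text[:i]) and a.match(text, i) for b, a in compiled)]
    pieces, start = [], 0
    for i in breaks + [len(text)]:
        piece = text[start:i].strip()  # drop whitespace left at the edges
        if piece:
            pieces.append(piece)
        start = i
    return pieces

text = "Step 1\nStep 2. Done"
print(segment(text, DEFAULT_RULES))  # ['Step 1\nStep 2.', 'Done']
print(segment(text, NEWLINE_RULES))  # ['Step 1', 'Step 2.', 'Done']
```

The sketch anchors the "before" pattern with \Z rather than $, because in Python $ also matches just before a trailing newline, which would shift the break point.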

Advanced Search with Regex

Example 1: Find British English spellings such as behaviour, colour, and humour.

Enter the following command:

Find: (\w+)our [Explanation: one or more word characters followed by ‘our’]

In the above example, the part to the left of ‘our’ is enclosed in parentheses, which marks it as a group. Inside the group, \w+ matches one or more word characters (letters, digits, or the underscore).
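To check what the pattern actually matches, you can try it in Python's re module. It is used here only as a stand-in; simple patterns like this should behave the same way in Trados, but test them there too. The sample sentence is made up.

```python
import re

text = "The colour and behaviour of your flour"
pattern = re.compile(r"(\w+)our")

# group(0) is the whole match; group(1) is the captured part before "our".
matches = [m.group(0) for m in pattern.finditer(text)]
stems = [m.group(1) for m in pattern.finditer(text)]
print(matches)  # ['colour', 'behaviour', 'your', 'flour']
print(stems)    # ['col', 'behavi', 'y', 'fl']
```

Note that the pattern also catches words such as "your" and "flour", and it matches inside longer words (for "yourself" it finds "your"). Adding word boundaries, as in \b\w+our\b, limits matches to whole words, but the results still need a manual review.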

Example 2: Find dates in October, November, and December written in a format such as 20th November.

Enter the following command:

Find: (\d+th)(\s)(October|November|December)

In this example, we used a regex containing 3 groups:

Group 1: (\d+th) – one or more digits followed by “th” (e.g. 20th)

Group 2: (\s) – a single whitespace character

Group 3: (October|November|December) – any one of the three month names
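Running the same pattern in Python on a made-up sentence shows the three groups, and also one limitation: because group 1 requires “th”, a date such as 3rd October is not found. The broader variant at the end is our own suggestion, not part of the original example; test it in Trados before relying on it.

```python
import re

text = "Due 20th November, shipped 3rd October, reviewed 11th December."
pattern = re.compile(r"(\d+th)(\s)(October|November|December)")

found = [m.group(0) for m in pattern.finditer(text)]
groups = [m.groups() for m in pattern.finditer(text)]
print(found)   # ['20th November', '11th December']
print(groups)  # [('20th', ' ', 'November'), ('11th', ' ', 'December')]

# Broader variant (an assumption): also accept the suffixes st, nd, and rd.
broader = re.compile(r"(\d+(?:st|nd|rd|th))(\s)(October|November|December)")
broader_found = [m.group(0) for m in broader.finditer(text)]
print(broader_found)  # ['20th November', '3rd October', '11th December']
```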

Example 3: Find numbers in a given format, e.g. 100,000.00

Find: \d+,\d+\.\d+

In this example, “\d+” matches one or more digits and “\.” matches a literal period, so the entire search string is interpreted as [digits],[digits].[digits]. Each bracketed part can contain one or more digits, so an unusually grouped number such as 10,00.2 will also show up in the search results. A number without a decimal part, such as 15,231,562, will not, because the pattern requires a period.
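A quick Python check with made-up figures illustrates this and shows one more detail: the pattern can match part of a longer number, so in 1,234,567.89 only 234,567.89 is found. Python's re module is used here as a stand-in for the Trados search.

```python
import re

text = "Totals: 100,000.00, 10,00.2, 15,231,562 and 1,234,567.89"
pattern = re.compile(r"\d+,\d+\.\d+")

# With no groups in the pattern, findall returns the full matched strings.
found = pattern.findall(text)
print(found)  # ['100,000.00', '10,00.2', '234,567.89']
```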

Thus, regex gives translators more ways to process a file or filter segments as needed, which greatly helps with translation quality control.