Each line in the [Phonetic] section represents one replacement (transformation) rule. Rules whose search strings begin with the same letter must be grouped together; the order inside such a group need not be alphabetical, but it determines priority: the higher a rule stands in the group, the higher its priority, and the first rule that matches is the one applied. In the following example, 'GH _' is listed before 'G K', so 'GH _' has higher priority than 'G K'. The underscore '_' represents the empty string "". If 'GH _' came after 'G K', the second rule would never match because the algorithm stops searching for more rules after the first match. The above rules transform any 'GH' to an empty string (deleting it) and transform any other 'G' to 'K'.
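The first-match behaviour can be illustrated with a minimal Python sketch. It is not the actual implementation: the function name `transform` and the rule list are hypothetical, and the sketch assumes that characters without a matching rule are copied through unchanged so that the effect of the two rules stays visible.

```python
# Rules as (search, replacement) pairs in file order; '_' denotes the empty string.
RULES = [("GH", "_"), ("G", "K")]

def transform(word, rules):
    """Apply the first matching rule at each position (simplified sketch)."""
    out = []
    i = 0
    while i < len(word):
        for search, repl in rules:
            if word.startswith(search, i):
                # First matching rule wins; later rules are not consulted.
                out.append("" if repl == "_" else repl)
                i += len(search)
                break
        else:
            # Assumption of this sketch: unmatched characters are kept as-is.
            out.append(word[i])
            i += 1
    return "".join(out)

print(transform("GHOST", RULES))                  # OST
print(transform("GOT", RULES))                    # KOT
print(transform("GHOST", list(reversed(RULES))))  # KHOST ('GH _' is never reached)
```

Reversing the list shows why the order matters: once 'G K' stands first, it consumes the 'G' of 'GH' and the rule 'GH _' can no longer apply.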
At the end of the first string of a line (the search string), a number of characters may optionally be given in brackets. Exactly one of these characters must match, comparable to the '[ ]' brackets in regular expressions. The rule 'DG(EIY) J', for example, would match any 'DGE', 'DGI' or 'DGY' and replace it with 'J'. This way you can reduce several rules to one.
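As a rough illustration of the bracket notation, the following Python sketch splits a search string into its literal part and its set of alternatives and then tests a match at a given position. The helper names `parse_search` and `matches_at` are hypothetical, and the sketch handles only a literal followed by one optional bracket group.

```python
import re

def parse_search(search):
    """Split e.g. 'DG(EIY)' into the literal 'DG' and the set {'E', 'I', 'Y'}."""
    m = re.fullmatch(r"([A-Z]+)(?:\(([A-Z]+)\))?", search)
    if m is None:
        raise ValueError(f"unsupported search string: {search!r}")
    return m.group(1), set(m.group(2) or "")

def matches_at(word, pos, search):
    """Return the number of characters matched at pos, or 0 if there is no match."""
    literal, alternatives = parse_search(search)
    if not word.startswith(literal, pos):
        return 0
    if not alternatives:
        return len(literal)
    nxt = pos + len(literal)
    # Exactly one character from the bracket group must follow the literal.
    if nxt < len(word) and word[nxt] in alternatives:
        return len(literal) + 1
    return 0

print(matches_at("EDGE", 1, "DG(EIY)"))    # 3
print(matches_at("BUDGIE", 2, "DG(EIY)"))  # 3
print(matches_at("DGA", 0, "DG(EIY)"))     # 0
```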
At the end of the search string, one or more dashes '-' may be placed. Such a search string must match in full, but only its beginning is replaced. The rule 'TCH-- _' will match any word containing 'TCH' (like 'match') but will only replace the first character 'T' with an empty string. The number of dashes determines how many characters at the end of the match are left unreplaced. After the replacement, the search for transformation rules continues with the unreplaced 'CH'.
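The effect of the dashes can be sketched in Python as follows. This is a simplified illustration for a single rule only; the function name `apply_dash_rule` and its return shape are assumptions made for the example.

```python
def apply_dash_rule(word, pos, search, replacement):
    """Apply a rule such as 'TCH-- _' at pos.

    Returns (new_word, next_pos), or None if the rule does not match.
    """
    literal = search.rstrip("-")
    keep = len(search) - len(literal)  # number of trailing dashes
    if keep >= len(literal):
        raise ValueError("more dashes than characters that could be replaced")
    if not word.startswith(literal, pos):
        return None  # the whole literal must match
    replaced_len = len(literal) - keep  # only the beginning is replaced
    text = "" if replacement == "_" else replacement
    new_word = word[:pos] + text + word[pos + replaced_len:]
    # The search continues right after the replacement, i.e. at the kept tail.
    return new_word, pos + len(text)

new_word, next_pos = apply_dash_rule("MATCH", 2, "TCH--", "_")
print(new_word)             # MACH
print(new_word[next_pos:])  # CH
```

In the example the 'T' is deleted, and the position returned points at the untouched 'CH', where the search for further rules would resume.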
If a '<' is appended to the search string, the search for replacement rules continues with the replacement string and not with the next character of the word. The rule 'PH< F', for example, would replace 'PH' with 'F' and then start searching again for a replacement rule for 'F...'. If there were also rules like 'FO O' and 'F _', then words like 'PHOXYZ' would be transformed to 'OXYZ', and any occurrence of 'PH' that is not followed by an 'O' would be deleted, as in 'PHIXYZ -> IXYZ'.
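The follow-up behaviour of '<' can be traced with a small Python sketch. It is only an approximation under stated assumptions: the name `transform` and the `max_steps` guard are hypothetical, characters without a matching rule are left unchanged, and no other control characters are handled.

```python
RULES = [("PH<", "F"), ("FO", "O"), ("F", "_")]

def transform(word, rules, max_steps=1000):
    """Rewrite word in place; a '<' rule restarts the search at the replacement."""
    i = 0
    steps = 0
    while i < len(word):
        steps += 1
        if steps > max_steps:
            # Guard added for this sketch: cyclic follow-up rules would loop forever.
            raise RuntimeError("follow-up rules did not terminate")
        for search, repl in rules:
            follow_up = search.endswith("<")
            literal = search[:-1] if follow_up else search
            if not word.startswith(literal, i):
                continue
            text = "" if repl == "_" else repl
            word = word[:i] + text + word[i + len(literal):]
            if not follow_up:
                i += len(text)  # normal rule: continue after the replacement
            # follow-up rule: stay at i, so the replacement itself is searched
            break
        else:
            i += 1  # assumption: characters without a rule are kept unchanged
    return word

print(transform("PHOXYZ", RULES))  # OXYZ
print(transform("PHIXYZ", RULES))  # IXYZ
```

Without the '<' (that is, with the rule 'PH F'), the same function returns 'FOXYZ' for 'PHOXYZ', because the search then continues after the inserted 'F' rather than at it.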
The control character '^' says that the search string only matches at the beginning of words, so the rule 'RH^ R' will only apply to words like 'RHESUS' but not 'PERHAPS'. You can append another '^' to the search string. In that case the algorithm treats the rest of the word completely separately from the string matched at the beginning. This is useful for prefixes whose pronunciation does not depend on the rest of the word and vice versa, like 'OVER^^' in English.
In the same way as '^', the control character '$' restricts a rule to words that end with the search string. 'GN$ N' only matches words like 'SIGN' but not 'SIGNUM'. If you use '^' and '$' together, both of them must fit: 'ENOUGH^$ NF' will only match the word 'ENOUGH' and nothing else.
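The two anchors '^' and '$' can be checked with a short Python sketch. The helper names `split_anchors` and `matches_at` are hypothetical, and the sketch treats a doubled '^^' like a single '^'; the separate handling of the rest of the word is not modelled here.

```python
def split_anchors(search):
    """Return (literal, at_start, at_end) for search strings like 'RH^' or 'GN$'."""
    at_end = search.endswith("$")
    if at_end:
        search = search[:-1]
    at_start = search.endswith("^")
    literal = search.rstrip("^")
    return literal, at_start, at_end

def matches_at(word, pos, search):
    """Check whether the anchored search string matches word at position pos."""
    literal, at_start, at_end = split_anchors(search)
    if not word.startswith(literal, pos):
        return False
    if at_start and pos != 0:
        return False  # '^' requires the match to begin the word
    if at_end and pos + len(literal) != len(word):
        return False  # '$' requires the match to end the word
    return True

print(matches_at("RHESUS", 0, "RH^"))        # True
print(matches_at("PERHAPS", 2, "RH^"))       # False
print(matches_at("SIGN", 2, "GN$"))          # True
print(matches_at("SIGNUM", 2, "GN$"))        # False
print(matches_at("ENOUGH", 0, "ENOUGH^$"))   # True
print(matches_at("ENOUGHS", 0, "ENOUGH^$"))  # False
```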
Of course you can combine all of the mentioned control characters, but they must occur in this order: '< ^ $'. All characters must be written in CAPITAL letters.
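The ordering and capitalisation requirements can be checked mechanically. The Python sketch below is one possible validator, not part of the described software: the names `SearchSpec` and `parse_controls` are hypothetical, and everything before the trailing control characters (including brackets and dashes) is passed through as an uninterpreted body.

```python
import re
from typing import NamedTuple

class SearchSpec(NamedTuple):
    body: str         # search string without trailing '<', '^', '$'
    follow_up: bool   # '<' present
    word_start: int   # number of '^' characters (0, 1 or 2)
    word_end: bool    # '$' present

# Control characters must appear in the order '<', then '^', then '$'.
_PATTERN = re.compile(r"(?P<body>[^<^$]+)(?P<follow><?)(?P<start>\^{0,2})(?P<end>\$?)")

def parse_controls(search):
    m = _PATTERN.fullmatch(search)
    if m is None:
        raise ValueError(f"control characters missing or out of order: {search!r}")
    body = m.group("body")
    if body != body.upper():
        raise ValueError(f"search string must be in capital letters: {search!r}")
    return SearchSpec(body, m.group("follow") == "<",
                      len(m.group("start")), m.group("end") == "$")

print(parse_controls("ENOUGH^$"))
print(parse_controls("OVER^^"))
for bad in ("GN$^", "ph<"):
    try:
        parse_controls(bad)
    except ValueError as err:
        print("rejected:", err)
```

A search string such as 'GN$^' is rejected because '$' precedes '^', and 'ph<' is rejected because it is not written in capital letters.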
If absolutely no rule can be found (which might happen if you use unusual characters for which you have no replacement rule), the next character will simply be skipped and the search for replacement rules will continue with the rest of the word.