Bash translated strings

[ Date | 2023-11-17 23:47 -0500 ]
[ Current movie | Lost in Translation ]

Introduction

The Bash shell and programming language supports a specific string syntax, wherein characters are enclosed between $" and ", which I have never seen anyone use on purpose [1].

Bash string syntaxes

Bash supports multiple syntaxes for strings:

Name            Example    Description
Single-quoted   'hello'    Verbatim contents
Double-quoted   "hello"    Interpolation enabled
ANSI-C quoted   $'hello'   Some backslash sequences are interpreted
Translatable    $"hello"   Looks up translation based on locale settings

The first two syntaxes exist in POSIX. Single quotes are appropriate when no interpolation is expected or wanted: each character within a single-quoted string is taken as-is, so there is no way to include a single quote character within the string itself.

$ foo=hello

# Single quotes
#
# `$foo' is retained literally:
$ echo 'Hello, $foo.'
Hello, $foo.

# Double quotes
#
# `$foo' is replaced with the value of variable `foo':
$ echo "Hello, $foo."
Hello, hello.
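
Since a single-quoted string cannot contain a single quote, the usual workarounds are to leave the quoted string briefly or to switch to another syntax. The following is a small sketch of both idioms; the variable names are only illustrative.

```bash
#!/usr/bin/env bash
# Idiom 1 (POSIX): close the quote, add an escaped quote, reopen the quote.
posix_style='It'\''s verbatim: $foo'

# Idiom 2 (Bash, not POSIX sh): ANSI-C quoting accepts \' as an escape.
ansi_c_style=$'It\'s verbatim: $foo'

# Both lines print: It's verbatim: $foo
printf '%s\n' "$posix_style" "$ansi_c_style"
```

Neither form expands `$foo`, so both keep the verbatim behaviour of single quotes.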

The dollar-single-quote syntax can be useful to print some non-printable characters using backslash escapes, or to make it clear in the source code that specific whitespace characters are used:

$ echo $'foo\tbar' >out.tsv
$ od -c out.tsv
0000000    f   o   o  \t   b   a   r  \n
0000010

In the above, we can see that characters 'f', 'o', 'o', tab, 'b', 'a', 'r', were written (plus a final newline added by echo).
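
The same syntax is handy when a whitespace character acts as a delimiter. As a sketch (the record contents and variable names are made up for illustration), a tab-separated line can be split with an explicit tab in IFS:

```bash
#!/usr/bin/env bash
# A record with two tab-separated fields; the second field contains a space.
record=$'foo\tbar baz'

# $'\t' makes it obvious in the source that the separator is a tab,
# not a run of spaces.
IFS=$'\t' read -r first second <<<"$record"

printf 'first=%s second=%s\n' "$first" "$second"
```

Because only the tab is in IFS for the read command, the space inside the second field is preserved.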

I have never seen the fourth syntax, $"foo", used on purpose. I think all the uses I’ve witnessed around me stemmed from assuming that this was the way to get the feature set of double quotes, as some programming languages require more than a plain "foo" for interpolation.

Translatable strings

Until recently (as of Bash 5.1, which was current until September 2022), documentation on the feature was limited to [2]:

A double-quoted string preceded by a dollar sign ($"string") will cause the string to be translated according to the current locale. The gettext infrastructure performs the message catalog lookup and translation, using the LC_MESSAGES and TEXTDOMAIN shell variables. If the current locale is C or POSIX, or if there are no translations available, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.

This at least hints at gettext being used, but does not explain in usable detail how the two environment variables mentioned are used. I also find the last sentence cryptic: what does it mean for a string, once it has been translated, to be double-quoted?

Documentation for GNU gettext warns against using the feature:

GNU bash 2.0 or newer has a special shorthand for translating a string and substituting variable values in it: $"msgid". But the use of this construct is discouraged, due to the security holes it opens and due to its portability problems.

The security problems are:

  1. [Summary: accidental introduction of backquotes using a combination of legacy Asian character sets and running on a platform that does not properly group a character’s bytes together]

  2. A translator could - voluntarily or inadvertently - use backquotes “`...`” or dollar-parentheses “$(…)” in her translations. The enclosed strings would be executed as command lists by the shell.

This explains what the Bash man page meant with “the replacement is double-quoted”: after the characters in $"foo" are matched with a translation, that translation is not the end result; rather the string is interpreted again as if the characters in the translation had been written within double quotes in the script. For example, a translation that includes $foo as a substring would result in variable foo being retrieved.
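
The double-quoting behaviour can be observed even without any translation catalog. The sketch below assumes the C locale and no text domain, in which case the dollar sign is ignored and the string behaves exactly like a double-quoted one, expansions included. The variable names are only illustrative.

```bash
#!/usr/bin/env bash
# Assumption for this sketch: force the C locale and clear any text domain,
# so that no catalog lookup can change the string.
unset TEXTDOMAIN TEXTDOMAINDIR
export LC_ALL=C

foo=world

# Variable and command substitutions are still performed inside $"...".
translatable=$"Hello, $foo. Today is $(echo Friday)."
plain="Hello, $foo. Today is $(echo Friday)."

printf '%s\n' "$translatable"
```

When a translation is found, the same expansions are applied to the translated text instead, which is where the risk described by the gettext documentation comes from.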

Translatable string tutorial

  1. Sample script

    The following script, print-uname, uses a translatable string:

    #! /bin/bash
    set -euo pipefail
    printf $"Here is a uname: %s\n" "$(uname)"

    Without any special setup, this prints:

    $ ./print-uname
    Here is a uname: Linux
  2. Extract translatable strings from script:

    $ bash --dump-po-strings print-uname | tee foo.po
    #: print-uname:3
    msgid "Here is a uname: %s\\n"
    msgstr ""
  3. Add translation:

    Edit foo.po into, for example:

    #: print-uname:3
    msgid "Here is a uname: %s\\n"
    msgstr "Voici un uname: %s\\n"
  4. Compile po file into mo:

    First, create directory where the compiled translation file can be found by gettext:

    $ mkdir --verbose --parents locale/fr/LC_MESSAGES
    mkdir: created directory 'locale/fr'
    mkdir: created directory 'locale/fr/LC_MESSAGES'

    Compile:

    $ msgfmt --output-file locale/fr/LC_MESSAGES/foo.mo foo.po
  5. Call script with proper environment:

    Our “text domain”, set in variable TEXTDOMAIN, is an arbitrary identifier for our program. I use text domain foo in this example.

    In order to use a non-standard directory for our translation files, we have to set TEXTDOMAINDIR.

    With a locale (set with LANG or others) whose language is fr, translations are searched for in $TEXTDOMAINDIR/(language)/LC_MESSAGES/$TEXTDOMAIN.mo, which is locale/fr/LC_MESSAGES/foo.mo in this example.

    Running our script with variables set correctly results in the translation being used in place of “Here is a uname”:

    $ TEXTDOMAINDIR=locale TEXTDOMAIN=foo LANG=fr_FR.UTF-8 ./print-uname
    Voici un uname: Linux
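
The steps above can be strung together in a scratch directory. This is only a sketch under a few assumptions: the temporary directory and the po_template variable are illustrative, the compile and run steps are skipped when msgfmt (GNU gettext) is not installed, and the French text only appears if the fr_FR.UTF-8 locale exists on the system and LC_ALL is not overriding LANG.

```bash
#!/usr/bin/env bash
# Sketch of tutorial steps 1-5, run in a throwaway directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Step 1: the sample script, written out verbatim.
cat >print-uname <<'EOF'
#! /bin/bash
set -euo pipefail
printf $"Here is a uname: %s\n" "$(uname)"
EOF

# Step 2: extract translatable strings (the script itself is not run).
po_template=$(bash --dump-po-strings print-uname)
printf '%s\n' "$po_template"

# Steps 3-5 need msgfmt, so they are skipped when it is missing.
if command -v msgfmt >/dev/null 2>&1; then
    # Step 3: the translated catalog, as edited by hand in the tutorial.
    cat >foo.po <<'EOF'
#: print-uname:3
msgid "Here is a uname: %s\\n"
msgstr "Voici un uname: %s\\n"
EOF
    # Step 4: compile it where gettext will look for text domain "foo".
    mkdir -p locale/fr/LC_MESSAGES
    msgfmt --output-file locale/fr/LC_MESSAGES/foo.mo foo.po
    # Step 5: run with the text domain, its directory, and a French locale.
    TEXTDOMAINDIR=locale TEXTDOMAIN=foo LANG=fr_FR.UTF-8 bash print-uname
fi
```

If the final line still prints the English text, the most likely cause is that the requested locale is not available, in which case the lookup is not attempted.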

Security issue in action

If an evil translator were to use backquotes or the $(...) construct in their po file:

#: print-uname:3
msgid "Here is a uname: %s\\n"
msgstr "Heh: $(head -n1 /etc/passwd)\\n"

then, once foo.po has been recompiled with msgfmt, the embedded command is run by the script:

$ TEXTDOMAINDIR=locale TEXTDOMAIN=foo LANG=fr_FR.UTF-8 ./print-uname
Heh: root:x:0:0:root:/root:/bin/bash

Note how translating the string $"Here is a uname" resulted in a command being run which was not present in the script.
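
One way to catch this kind of translation before it ships is to scan po files for characters that trigger expansion. The helper below is hypothetical (find_risky_translations is not part of gettext or Bash) and deliberately crude: it only looks at lines starting with msgstr, so continuation lines of multi-line entries are not checked, and it will also flag a legitimate literal dollar sign.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) msgstr lines that contain
# a dollar sign or a backquote, either of which can start an expansion when
# the translation is treated as a double-quoted string.
# Exit status is that of grep: 0 if something was found, 1 otherwise.
find_risky_translations() {
    grep -nE '^msgstr.*[$`]' "$1"
}

# Usage example on a throwaway copy of the malicious catalog.
po_file=$(mktemp)
cat >"$po_file" <<'EOF'
#: print-uname:3
msgid "Here is a uname: %s\\n"
msgstr "Heh: $(head -n1 /etc/passwd)\\n"
EOF
find_risky_translations "$po_file"
rm -f "$po_file"
```

On the catalog above, the third line is reported. A check like this is a review aid, not a substitute for preventing expansion in the shell itself.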

Mitigation

As of Bash version 5.2 (released on 2022-09-26, but still not available from all common distributions as of late 2023, especially long-term-maintained variants), the translation feature and its documentation have improved.

First, there is a way to prevent expansion of strings after they are translated, making the replacements work as if the translated strings had been enclosed in single quotes. I believe this is almost always what is wanted (cases where parts of strings have to be replaced are probably best handled by using and translating printf format strings). This is enabled with the new shopt option noexpand_translation.

Second, the documentation includes a tutorial that gives good pointers on how to actually use the feature:

Locale-Specific Translation

Prefixing a double-quoted string with a dollar sign (‘$’), such as $"hello, world", will cause the string to be translated according to the current locale. The gettext infrastructure performs the lookup and translation, using the LC_MESSAGES, TEXTDOMAINDIR, and TEXTDOMAIN shell variables, as explained below. See the gettext documentation for additional details not covered here. If the current locale is C or POSIX, if there are no translations available, of if the string is not translated, the dollar sign is ignored. Since this is a form of double quoting, the string remains double-quoted by default, whether or not it is translated and replaced. If the noexpand_translation option is enabled using the shopt builtin […], translated strings are single-quoted instead of double-quoted.

The rest of this section is a brief overview of how you use gettext to create translations for strings […]

Improved script

#! /bin/bash
set -euo pipefail

## Prevent translated $"..." strings from introducing expansions.
## Requires Bash 5.2.
shopt -s noexpand_translation

printf $"Here is a uname: %s\n" "$(uname)"

And, indeed, using noexpand_translation plugs the security hole we previously observed [3]:

$ TEXTDOMAINDIR=locale TEXTDOMAIN=foo LANG=fr_FR.UTF-8 ./print-uname
Heh: $(head -n1 /etc/passwd)

In the above, translated string $(head -n1 /etc/passwd) got printed verbatim rather than being interpreted as a command substitution.
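
A script that may run on older Bash versions can probe for the option instead of failing outright. This is a minimal sketch; the safe_translation variable is an illustrative name, and what to do when the option is missing (abort, or avoid $"..." strings altogether) is left to the script author.

```bash
#!/usr/bin/env bash
# Try to enable the option; on Bash older than 5.2, shopt fails with an
# "invalid shell option name" error, which is silenced here.
safe_translation=no
if shopt -s noexpand_translation 2>/dev/null; then
    safe_translation=yes
fi

if [ "$safe_translation" = no ]; then
    printf 'warning: Bash %s lacks noexpand_translation\n' "$BASH_VERSION" >&2
fi

printf 'noexpand_translation enabled: %s\n' "$safe_translation"
```

Keeping the probe on its own lines near the top of the script, before any $"..." string, mirrors the placement of shopt in the improved script and avoids depending on exactly when Bash performs the catalog lookup.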

Conclusions

With the addition of the noexpand_translation shell option in Bash version 5.2, I believe that the security issues raised in GNU gettext documentation are addressed.


  1. This is a completely anecdotal report; I am certain that high-visibility projects exist that use Bash’s translation features extensively. This only reflects what I have been exposed to.

  2. From the Bash version 5.1.4 man page.

  3. With a version of Bash older than 5.2, this would fail with a localized error message saying that noexpand_translation is not a supported shell option; perhaps « nom d’option du shell non valable » 🥖🥖🥖.

www.kurokatta.org

