Last update: 1997-05-20
9945-2-9

Class: No change
_____________________________________________________________________________

Topic: LC_CTYPE
Relevant Sections: E.3.5.3

Defect Report:
-----------------------

In Section 3.5.3 - Variables, the standard states that:

    This variable [LC_CTYPE] shall determine the interpretation of
    sequences of bytes of text data as characters (e.g. single- versus
    multibyte characters), which characters are defined as letters
    (character class alpha) and <blank>s (character class blank), and
    the behaviour of character classes within pattern matching.
    Changing the value of LC_CTYPE after the shell has started shall
    not affect the lexical processing of shell commands in the current
    shell execution environment or its subshells (see 3.12).

    [Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p. 128, lines 268-276]

The standard also states that the LANG variable ``shall provide a
default value for the LC_* variables, as described in 2.6'' [Ibid.,
p. 128, line 261] and that the LC_ALL variable ``shall interact with
the LANG and LC_* variables as described in 2.6.'' [Ibid., p. 128,
line 264]

In Section 2.6 - Environment Variables, the standard summarizes the
meanings of these variables:

    LANG      This variable shall determine the locale category for any
              category not specifically selected via a variable
              starting with LC_. LANG and the LC_ variables can be used
              by applications to determine the language for messages
              and instructions, collating sequences, date formats, etc.
              Additional semantics of this variable, if any, are
              implementation defined.

    LC_ALL    This variable shall override the value of the LANG
              variable and the value of any of the other variables
              starting with LC_.

    [...]

    LC_CTYPE  This variable shall determine the locale category for
              character handling functions. This environment variable
              shall determine the interpretation of sequences of bytes
              of text data as characters (e.g. single- versus multibyte
              characters), the classification of characters (e.g.
              alpha, digit, graph), and the behaviour of character
              classes. Additional semantics of this variable, if any,
              are implementation defined.

    [Ibid., pp. 76-77, lines 2635-2658]

Does changing LC_ALL (or LANG if LC_CTYPE is not set) affect the
lexical processing of shell commands in the current shell execution
environment? Is the intent of the standard that any changes to
environment variables that cause a new LC_CTYPE to be used shall be
ignored by the shell once it has started execution?

An implementation of sh must use the locale specified in LC_CTYPE when
reading a script. For example, isalpha/isalnum are used to parse
variable names. Consider this simple command:

    FO<O-umlaut>=BAR cmd

If isalnum('<O-umlaut>') is true, then this will parse as a variable
assignment followed by the command cmd; otherwise the whole word
FO<O-umlaut>=BAR is argument 0. Similarly, cmd will be subject to alias
expansion in the former case. There is no need to validate variable
names at other times. In such an implementation, changing LC_CTYPE
causes no problems.

What are the problems with the following commands:

    LANG=locale-with-O-umlaut
    FO<O-umlaut>=BAR

Then consider this sequence of commands:

    [ -n "$FO<O-umlaut>" ] && alias echo=:
    echo foo

In both cases the parsing of the second line is determined by the
execution of the first line. Traditional implementations execute the
first line, then parse and execute the second line. What would a
compiler do? On the other hand, if they were embedded in {...} or any
other shell compound command, they would both be parsed before being
executed.

So we have two cases where behaviour is poorly defined or context
dependent. I suggest that the behaviour of setting LC_CTYPE be made
undefined. Changing LANG in an interactive shell is a reasonable thing
to do, and an implementation may immediately change all locales with no
problems. Having all but one locale change, and just in the shell, is
unintuitive and not required.
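
As an illustration only, the name check that the defect report
attributes to isalpha/isalnum can be sketched in sh itself. The
function name is_assignment_word is hypothetical and appears nowhere in
the standard; the sketch uses the [:alpha:] and [:alnum:] bracket
classes, whose membership is governed by LC_CTYPE, as a stand-in for
the C functions. It assumes a shell that supports character classes in
case patterns. Whether a running shell picks up a later change to the
locale variables for pattern matching varies between implementations.

```sh
# Hypothetical helper: succeeds if "$1" has the form NAME=value, where
# NAME is validated with locale-dependent character classes (standing in
# for the isalpha/isalnum calls mentioned in the defect report).
is_assignment_word() {
    case $1 in
        *=*) name=${1%%=*} ;;   # text before the first '='
        *)   return 1 ;;        # no '=' at all: an ordinary word
    esac
    case $name in
        '')              return 1 ;;  # nothing before the '='
        [![:alpha:]_]*)  return 1 ;;  # must start with a letter or '_'
        *[![:alnum:]_]*) return 1 ;;  # the rest must be alnum or '_'
    esac
    return 0
}

# <O-umlaut> encoded in UTF-8 (an assumption; the report names no encoding).
o_umlaut=$(printf '\303\226')

for word in "FOO=BAR" "FO${o_umlaut}=BAR" "cmd"; do
    if is_assignment_word "$word"; then
        echo "assignment: $word"
    else
        echo "ordinary word: $word"
    fi
done
```

When the script is started with LC_ALL=C, the word containing
<O-umlaut> is reported as an ordinary word, because its bytes fall
outside the alnum class of the POSIX locale. In a locale whose alnum
class contains the character, a shell that honours LC_CTYPE in pattern
matching may classify the same word as an assignment, which is the
ambiguity the report describes.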

WG15 response for 9945-2:1993
-----------------------------------

The standard clearly states that changes to LC_CTYPE shall not take
effect within the current shell execution environment. This is
discussed in the rationale in Section E.3.5.3.

Rationale for Interpretation:
-----------------------------

None.
_____________________________________________________________________________