Last update: 1997-05-20
9945-2-24               Class: Defect situation

The standard states what it states, and conforming implementations must
conform to this. However, concerns have been raised about this which are
being referred to the Sponsors of the standard for consideration as a
future amendment.

_____________________________________________________________________________

Topic:                  tr
Relevant Sections:      4.64.5.1

Defect Report:
-----------------------

Component: tr - Sect 4.64.5.1
Submitted by: Alex White
Ref. No.: tr.1

Proposed Resolution:

The interpretation request correctly describes what is in the standard
but this was not what was intended. The working group will draft and
propose a change to .2b to describe what was originally intended.

_____________________________________________________________________________

In Section 4.64.5.1 - Standard Input {of tr}, the standard states that
the standard input to tr ``can be any file type.''
[Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p. 483, line 10456]

However, in Section 4.64.5.3 - Environment Variables {of tr}, the
standard states that the LC_COLLATE variable ``shall determine the
behaviour of range expressions and equivalence classes.''
[Ibid., p. 483, lines 10499-10500]

and in Section 4.64.7 - Extended Description {of tr}, the standard
states that

        the \octal construct [...] can be used to represent characters
        with specific coded values. An octal sequence shall consist of
        a backslash followed by the longest sequence of one-, two-, or
        three-octal-digit characters (01234567). The sequence shall
        cause the character whose encoding is represented by the one-,
        two-, or three-digit octal integer to be placed into the array.
        [Ibid., p. 484, lines 10525-10530]

These two statements cause tr to be unusable on any files of type other
than text. Historically, tr has been used to manipulate files containing
binary data.
For example, the perfectly valid, and useful construct:

        tr -d '\200-\2ff'

to delete all characters with the top bit on or even

        tr '\200-\2ff' '\0-\1ff'

to strip the top bit (which are useful operations on binary files), no
longer work. For example, in the PC character set, \200 is a C-cedilla,
and \2ff is not defined as a glyph. Therefore, according to section
4.64.5.3, the most likely interpretation is characters which collate
from C-cedilla (probably the letter D) through the end will all match
here. This is clearly wrong, not historical practice, and of no use
whatsoever.

May we interpret the standard as permitting octal escape sequences as
endpoints of a range to not use the collating order, but rather byte
ordering?

WG15 response for 9945-2:1993
-----------------------------------

The standard is clear in its requirement that octal sequences used as
endpoints in a range be treated as collating elements. The
implementation must follow this requirement. Concern over the wording
of this area of this standard has been forwarded to the sponsors.

Rationale for Interpretation:
-----------------------------

None.

_____________________________________________________________________________
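
The byte-ordering behaviour that the request asks about can be
illustrated with a short sketch. It is not taken from the standard or
from any tr implementation; the function names (expand, tr_delete,
tr_translate) are hypothetical, and it handles only literal bytes,
\ooo escapes and simple ranges. Because f is not an octal digit, the
sketch assumes the endpoints meant in the request are \377 (byte 255)
and \177 (byte 127).

```python
import re

# Longest run of one to three octal digits after a backslash,
# following the wording quoted from Section 4.64.7.
_OCTAL = re.compile(rb"\\([0-7]{1,3})")


def _tokens(operand):
    """Split an operand into byte values, decoding \\ooo escapes."""
    values = []
    i = 0
    while i < len(operand):
        match = _OCTAL.match(operand, i)
        if match:
            value = int(match.group(1), 8)
            if value > 0xFF:
                raise ValueError("octal escape does not fit in one byte")
            values.append(value)
            i = match.end()
        else:
            values.append(operand[i])
            i += 1
    return values


def expand(operand):
    """Expand ranges by byte value (assumption: no collation involved).

    Simplification: any '-' that sits between two other bytes is
    treated as a range operator.
    """
    toks = _tokens(operand)
    out = bytearray()
    i = 0
    while i < len(toks):
        if i + 2 < len(toks) and toks[i + 1] == ord("-"):
            out.extend(range(toks[i], toks[i + 2] + 1))
            i += 3
        else:
            out.append(toks[i])
            i += 1
    return bytes(out)


def tr_delete(data, set1):
    """Rough analogue of: tr -d set1"""
    return data.translate(None, expand(set1))


def tr_translate(data, set1, set2):
    """Rough analogue of: tr set1 set2 (sets must be the same length)."""
    return data.translate(bytes.maketrans(expand(set1), expand(set2)))


sample = bytes([0x41, 0xC7, 0x80, 0x7F, 0xFF])
print(tr_delete(sample, rb"\200-\377"))                # b'A\x7f'
print(tr_translate(sample, rb"\200-\377", rb"\0-\177"))  # b'AG\x00\x7f\x7f'
```

Under this byte-order reading, every byte from 128 through 255 falls
inside the range regardless of locale, which is the historical
behaviour the request describes. Under the collation reading required
by the WG15 response, membership in the range would instead depend on
the LC_COLLATE setting.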