WG15 Defect Report Ref: 9945-2-27
Topic: LC_COLLATE


This is an approved interpretation of 9945-2:1993.

Last update: 1997-05-20


								9945-2-27

	Class: The ambiguous situation

The standard is unclear on this issue, and as such no conformance
distinction can be made between alternative implementations based on this.
This is being referred to the Sponsors of the standard for clarifying 
wording in the next amendment.

 _____________________________________________________________________________

	Topic:			LC_COLLATE
	Relevant Sections:	2.5.2.2


Defect Report:
-----------------------

    (Section 2.5.2.2, LC_COLLATE, lines 1654-1658 in Draft 12)
    "User-defined ordering of collating elements. Each collating
    element shall be assigned a collation value defining its order
    in the character (or basic) collation sequence. This ordering
    is used by regular expressions and pattern matching and, unless
    collation weights are explicitly specified, also as the collation
    weight to be used in sorting."

Given this passage, assume there are two similar LC_COLLATE fragments.
The fragments include lowercase letters only to simplify the examples.
Here is the first fragment:

<a>		<a>;<a>;<a>
<a-grave>	<a>;<a-grave>;<a-grave>
<a-acute>	<a>;<a-acute>;<a-acute>
<b>		<b>;<b>;<b>
<c>		<c>;<c>;<c>
<d>		<d>;<d>;<d>
. . .
<z>		<z>;<z>;<z>
. . .

Here is the second fragment:

<a>		<a>;<a>;<a>
<b>		<b>;<b>;<b>
<c>		<c>;<c>;<c>
<d>		<d>;<d>;<d>
. . .
<z>		<z>;<z>;<z>
<a-grave>	<a>;<a-grave>;<a-grave>
<a-acute>	<a>;<a-acute>;<a-acute>
. . .
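
For orientation, here is a minimal Python sketch of how the three weight
levels of the second fragment might drive sorting. It assumes the usual
multi-level comparison (primary weights across the whole string first,
then secondary, then tertiary) and ranks each weight symbol by its
position in the fragment. The names ORDER, WEIGHTS, RANK, and sort_key
are hypothetical, and the entries elided by ". . ." are left out.

```python
# Elements of the second fragment in locale-file order (elided entries omitted).
ORDER = ["a", "b", "c", "d", "z", "a-grave", "a-acute"]

# Three weight symbols per element, as written in the fragment.
WEIGHTS = {name: (name, name, name) for name in ORDER}
WEIGHTS["a-grave"] = ("a", "a-grave", "a-grave")
WEIGHTS["a-acute"] = ("a", "a-acute", "a-acute")

# Assumption: a weight symbol's rank is its position in the file.
RANK = {name: i for i, name in enumerate(ORDER)}


def sort_key(word):
    """Build a (primary, secondary, tertiary) key for a list of element names."""
    return tuple(
        tuple(RANK[WEIGHTS[element][level]] for element in word)
        for level in range(3)
    )


words = [["a-grave", "b"], ["a", "c"], ["a", "b"], ["b"]]
ordered = sorted(words, key=sort_key)
```

Under these assumptions the accented a's sort together with <a> even
though they are listed after <z>, because their primary weight is <a>.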


Suppose a user wanted to find all words that begin with a letter
in the range a-c. At the XoJIG meeting, we agreed that a locale
built using the first fragment returns words that begin with <a>,
<a-grave>, <a-acute>, <b>, and <c>. However, there were varying
opinions about whether the second fragment would return the same
results, or would exclude <a-grave> and <a-acute>. So the question
is this:

Should an RE run against a locale built using the second fragment
include the accented a's in the range because they are defined as
being in the same equivalence class as <a>, or should it exclude
the accented a's because they are listed outside the range of a-c?
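
The two readings in question can be sketched in Python as follows. This
is only an illustration under stated assumptions: each fragment is
modelled as a list of (collating element, primary weight) pairs in
locale-file order, the entries elided by ". . ." are omitted, and the
function names are hypothetical rather than taken from the standard.

```python
# (collating element, primary weight) pairs in locale-file order.
FIRST = [("a", "a"), ("a-grave", "a"), ("a-acute", "a"),
         ("b", "b"), ("c", "c"), ("d", "d"), ("z", "z")]
SECOND = [("a", "a"), ("b", "b"), ("c", "c"), ("d", "d"), ("z", "z"),
          ("a-grave", "a"), ("a-acute", "a")]


def range_by_file_order(fragment, lo, hi):
    """Reading 1: a range covers the elements listed between lo and hi."""
    names = [name for name, _ in fragment]
    return names[names.index(lo):names.index(hi) + 1]


def range_by_primary_weight(fragment, lo, hi):
    """Reading 2: a range covers elements whose primary weight lies in lo..hi."""
    names = [name for name, _ in fragment]
    position = {name: i for i, name in enumerate(names)}
    primary = dict(fragment)
    lo_rank = position[primary[lo]]
    hi_rank = position[primary[hi]]
    return [n for n in names if lo_rank <= position[primary[n]] <= hi_rank]


first_by_order = range_by_file_order(FIRST, "a", "c")
first_by_weight = range_by_primary_weight(FIRST, "a", "c")
second_by_order = range_by_file_order(SECOND, "a", "c")
second_by_weight = range_by_primary_weight(SECOND, "a", "c")
```

In this model both readings agree for the first fragment. For the second
fragment, the file-order reading yields only <a>, <b>, and <c>, while
the weight reading also yields <a-grave> and <a-acute>.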



WG15 response for 9945-2:1993 
-----------------------------------
The standard is ambiguous in this area, since it is not clear what the
phrase "collation sequence order" means.  The two possibilities are
"the order in the locale file", or "the order determined by the weights
in the locale file".  The standard allows either behavior.  Concern over
the wording of this area has been forwarded to the Sponsors of the standard.

Rationale for Interpretation:
-----------------------------
None.
 _____________________________________________________________________________