Last update: 1997-05-20
9945-2-2        Class: No change
_____________________________________________________________________________

Topic:                  Regular expressions
Relevant Sections:      B.5.2

Defect Report:
-----------------------

In Section B.5.2 - Description {of C Binding for Regular Expression
Matching}, the standard states that the re_nsub member of the regex_t
structure represents the number of parenthesized subexpressions found in
pattern. [Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p. 766, lines
329-331]

The standard then states that the pmatch argument shall point to an array
with at least nmatch elements, and regexec() shall fill in the elements of
that array with offsets of the substrings of string that correspond to the
parenthesized subexpressions of pattern: pmatch[i].rm_so shall be the byte
offset of the beginning and pmatch[i].rm_eo shall be one greater than the
byte offset of the end of substring i. (Subexpression i begins at the ith
matched open parenthesis, counting from 1.) Offsets in pmatch[0] shall
identify the substring that corresponds to the entire regular expression.
[Ibid., p. 766-767, lines 339-346]

Thus, if pmatch[] contains nmatch elements, it can only hold nmatch-1
parenthesized subexpressions of string, since pmatch[0] represents the
entire regular expression.

The standard also states that ``if there are more than nmatch
subexpressions in pattern (pattern itself counts as a subexpression), then
regexec() [...] shall record only the first nmatch substrings.''
[Ibid., p. 767, lines 347-350]

Lines 347-350 appear to contradict lines 339-346; the latter talks about
parenthesized subexpressions, while the former mentions plain
subexpressions.

Is the intent of the standard to allow the re_nsub member to include the
subexpression representing the entire regular expression in the count
(since it is considered a subexpression on page 767, lines 347-350), or
does it only count explicitly parenthesized subexpressions? We believe
this is the easiest way to rectify the ambiguity.

WG15 response for 9945-2:1993
-----------------------------------

The subexpression representing the entire RE is to be included in the
count represented in the re_nsub member. No change in wording is
necessary.

Rationale for Interpretation:
-----------------------------

The section quoted in the request, from Section B.5.2 (but lines 327-338
in the Standard), contains the phrase "(pattern itself counts as a
subexpression)", which the committee considers key to interpreting this
apparent conflict.
_____________________________________________________________________________
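
Illustration (not part of the interpretation):

The pmatch[] layout discussed in the request can be sketched in C as
follows. This is a minimal sketch, not normative text. The helper name
match_offsets() and its parameter list are hypothetical, introduced here
only for illustration; regcomp(), regexec(), regfree(), regex_t,
regmatch_t and REG_EXTENDED are the interfaces from <regex.h>. The helper
passes back whatever value the implementation stores in re_nsub, without
assuming whether the whole pattern is included in that count.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not defined by the standard).
 *
 * Compiles pattern as an extended RE, matches it against string, and lets
 * regexec() record at most nmatch substrings in pmatch[]:
 *   pmatch[0]  offsets of the substring matched by the entire RE
 *   pmatch[i]  offsets of the substring matched by the ith parenthesized
 *              subexpression, counting from 1
 * On successful compilation, *nsub receives the re_nsub value reported by
 * the implementation. Returns 0 on a match, otherwise the nonzero code
 * returned by regcomp() or regexec().
 */
int match_offsets(const char *pattern, const char *string,
                  size_t nmatch, regmatch_t pmatch[], size_t *nsub)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && string != NULL && nsub != NULL);

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;              /* nothing to free if compilation failed */

    *nsub = preg.re_nsub;       /* value as reported by the implementation */

    rc = regexec(&preg, string, nmatch, pmatch, 0);
    regfree(&preg);
    return rc;
}
```

For the pattern "(a+)(b+)" and the string "xaabbby", a call with nmatch
set to 3 records the whole match in pmatch[0] and the two parenthesized
substrings in pmatch[1] and pmatch[2]. A call with nmatch set to 2 records
only pmatch[0] and pmatch[1]. A caller that wants every substring can
allocate re_nsub + 1 elements, which leaves room for pmatch[0] under
either reading of the count.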