[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:char Character Parsers]

This module includes parsers for single characters. Currently, this
module includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding-specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to
    #include

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`.

[heading char_]

The no-argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single-argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.
Character mapping is inherently platform dependent. The standard does
not guarantee, for example, that `'A' < 'Z'`. That is why Spirit2
purposely attaches a specific __char_encoding_namespace__ (such as
ASCII, ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16/32 and 64 bit characters, the char-set statically
switches from a `std::bitset` implementation when the character type is
not greater than 8 bits, to a sparse bit/boolean set which uses a sorted
vector of disjoint ranges (`range_run`). The set is constructed from
ranges such that adjacent or overlapping ranges are coalesced.

`range_runs` are very space-economical in situations where there are
lots of ranges and a few individual disjoint values. Searching is
O(log n) where n is the number of ranges.]

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles posix style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation ^ character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single-argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]
Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c)              // c is a char

[heading Header]

    // forwards to
    #include

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit`]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                        converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                        that evaluates to anything that can be converted to a `char`
                        or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                        that specifies a char-set definition string following a syntax
                        that resembles posix style regular expression character sets
                        (without the enclosing square brackets and without the
                        negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]           [Semantics]]
    [[`c`]                  [Create a char parser from a char, `c`.]]
    [[`lit(c)`]             [Create a char parser from a char, `c`.]]
    [[`ns::char_`]          [Create a char parser that matches any character in the
                            `ns` encoding.]]
    [[`ns::char_(c)`]       [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`]    [Create a char-range parser that matches characters from
                            range (`f` to `l`, inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]      [Create a char-set parser with `ns` encoding from a char-set
                            definition string, `cs`.]]
    [[`~cp`]                [Negate `cp`.
                            The result is a negated char parser that
                            matches any character in the `ns` encoding except the
                            characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]           [Attribute]]
    [[`c`]                  [__unused__ or if `c` is a __qi_lazy_argument__, the character
                            type returned by invoking it.]]
    [[`lit(c)`]             [__unused__ or if `c` is a __qi_lazy_argument__, the character
                            type returned by invoking it.]]
    [[`ns::char_`]          [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]       [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`]    [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]                [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy char_ using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to
    #include

Also, see __include_structure__.
[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alpha-numeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]