[/
    Copyright (c) 2022 Vinnie Falco (vinnie.falco@gmail.com)

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

    Official repository: https://github.com/boostorg/url
]

[section Character Sets]

A ['character set] represents a subset of low-ASCII characters,
used as a building block for constructing rules. The library
models them as callable predicates invocable with this
equivalent signature:

```
/// Return true if ch is in the set
bool( char ch ) const noexcept;
```

The __CharSet__ concept describes the requirements on
syntax and semantics for these types. Here we declare
a character set type that includes the horizontal and
vertical whitespace characters:

[code_grammar_2_2]

The type trait __is_charset__ determines if a type meets
the requirements:

[code_grammar_2_3]

Character sets are always passed as values. As with rules,
we declare an instance of the type for notational convenience.
The `constexpr` designation is used to make it a zero-cost
abstraction:

[code_grammar_2_4]

For best results, ensure that user-defined character set types
are `constexpr` constructible.

The functions __find_if__ and __find_if_not__ are used to
search a string for the first matching or the first non-matching
character from a set. The example below skips any leading
whitespace and then returns everything from the first
non-whitespace character to the last non-whitespace
character:

[code_grammar_2_5]

The function can now be called thusly:

[code_grammar_2_6]

The library provides these often-used character sets:

[table Character Sets [
    [Value]
    [Description]
][
    [[link url.ref.boost__urls__grammar__alnum_chars `alnum_chars`]]
    [
        Contains the uppercase and lowercase letters, and digits.
    ]
][
    [[link url.ref.boost__urls__grammar__alpha_chars `alpha_chars`]]
    [
        Contains the uppercase and lowercase letters.
    ]
][
    [[link url.ref.boost__urls__grammar__digit_chars `digit_chars`]]
    [
        Contains the decimal digit characters.
    ]
][
    [[link url.ref.boost__urls__grammar__hexdig_chars `hexdig_chars`]]
    [
        Contains the uppercase and lowercase hexadecimal
        digit characters.
    ]
][
    [[link url.ref.boost__urls__grammar__vchars `vchars`]]
    [
        Contains the visible characters (i.e. non whitespace).
    ]
]]


Some of the character sets in the library have implementations
optimized for the particular character set or optimized in general,
often in ways that take advantage of opportunities not available
to standard library facilities. For example, custom code enhancements
using Streaming SIMD Extensions 2
([@https://en.wikipedia.org/wiki/SSE2 SSE2]),
available on all x86 and x64 architectures.

[heading The lut_chars Type]

The __lut_chars__ type satisfies the __CharSet__
requirements and offers an optimized `constexpr`
implementation which provides enhanced performance
and notational convenience for specifying character
sets. Compile-time instances can be constructed
from strings:

[code_grammar_2_7]

We can use `operator+` and `operator-` notation to add and
remove elements from the set at compile time. For example,
sometimes the character 'y' sounds like a vowel:

[code_grammar_2_8]

The type is named after its implementation, which is a
lookup table ("lut") of packed bits. This allows for a
variety of construction methods and flexible composition.
Here we create the set of visible characters using a lambda:

[code_grammar_2_9]

Alternatively:

[code_grammar_2_10]

Differences can be calculated with `operator-`:

[code_grammar_2_11]

We can also remove individual characters:

[code_grammar_2_12]

[endsect]