// // Copyright (c) 2023 Alan de Freitas (alandefreitas@gmail.com) // // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt) // // Official repository: https://github.com/boostorg/url // = Compound Rules The rules shown so far have defined https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols[__terminal symbols__,window=blank_], representing indivisible units of grammar. To parse more complex things, a https://en.wikipedia.org/wiki/Parser_combinator[__parser combinator__,window=blank_] (or __compound rule__) is a rule which accepts as parameters one or more rules and combines them to form a higher order algorithm. In this section we introduce the compound rules provided by the library, and how they may be used to express more complex grammars. == Tuple Rule Consider the following grammar: [source,cpp] ---- version = "v" dec-octet "." dec-octet ---- We can express this using `tuple_rule`, which matches one or more specified rules in sequence. The folllowing defines a sequence using some character literals and two decimal octets, which is a fancy way of saying a number between 0 and 255: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_1,indent=0] ---- This rule has a value type of cpp:std::tuple[], whose types correspond to the value type of each rule specified upon construction. The decimal octets are represented by the `dec_octet_rule` which stores its result in an `unsigned char`: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_2,indent=0] ---- To extract elements from cpp:std::tuple[] the function cpp:std::get[] must be used. In this case, we don't care to know the value for the matching character literals. The `tuple_rule` discards match results whose value type is `void`. We can use the `squelch` compound rule to convert a matching value type to `void`, and reformulate our rule: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_3,indent=0] ---- When all but one of the value types is `void`, the cpp:std::tuple[] is elided and the remaining value type is promoted to the result of the match: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_4,indent=0] ---- == Optional Rule BNF elements in brackets denote optional components. These are expressed using `optional_rule`, whose value type is an `optional`. For example, we can adapt the port rule from above to be an optional component: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_5,indent=0] ---- In this example we build up a rule to represent an endpoint as an IPv4 address with an optional port: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_6,indent=0] ---- This can be simplified; the library provides `ipv4_address_rule` whose result type is `ipv4_address`, offering more utility than representing the address simply as a collection of four numbers: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_7,indent=0] ---- == Variant Rule BNF elements separated by unquoted slashes represent a set of alternatives from which one element may match. We represent them using `variant_rule`, whose value type is a variant. Consider the following HTTP production rule which comes from https://datatracker.ietf.org/doc/html/rfc7230#section-5.3"[rfc7230,window=blank_]: [source,cpp] ---- request-target = origin-form / absolute-form / authority-form / asterisk-form ---- The __request-target__ can be exactly one of these things. Here we define the rule, using `origin_form_rule`, `absolute_uri_rule`, and `authority_rule` which come with the library, and obtain a result from parsing a string: [source,cpp] ---- include::example$unit/doc_grammar.cpp[tag=code_grammar_3_8,indent=0] ---- In the next section we discuss facilities to parse a repeating number of elements.