Ruiko EBNF

Grammar

ignore [token1, token2, ...]
# optional: discard the tokenizers with the given names.
# it only takes effect when you're using EBNFParser's automatic tokenizing function.

deftoken directory1.directory2...directoryn.filename
# use your custom tokenizing function. It cannot be applied when you're using automatic tokenizing.
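
For instance, a minimal sketch of `ignore` (the token name Space and its pattern are invented here for illustration):

ignore [Space]
# the automatic tokenizer will now discard whitespace tokenizers

Space := R'\s+';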


token1  := ...;
token2  := ...;

token3 cast := ...;
# define a cast map

token4 cast as K := ...;
# define a cast map and a custom prefix

token5 as K := ...;
# only define a custom prefix


token6 := ...;

token7 of token5 := ...;
# add more patterns to token5
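
For example, a sketch of `of` (the token names Keyword and MoreKeys are made up here):

Keyword             := R'if|else';
MoreKeys of Keyword := R'while|for';
# the patterns of MoreKeys are added to Keyword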

parser1 ::=  token3 token5+ | [token6] token4* (parser2 parser3){3, 10};
# define a combined parser
/*
   `|` means `or`,
   `[<patterns>]` means `optional`,
   `(<patterns>)` groups several patterns into a new pattern,
   `pattern+` means one or more,
   `pattern*` means zero or more,
   `pattern{a, b}` matches the pattern at least `a` and at most `b` times,
   `pattern{a}` matches the pattern at least `a` times.
*/
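
As a concrete sketch of these operators (all names below are invented for illustration):

Num   := R'[0-9]+';
Comma := ',';
Sign  := R'[+-]';
# an optional sign, one number, then zero or more comma-separated numbers
numList ::= [Sign] Num (Comma Num)*;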

parser2 throw [parser3 ';'] ::= parser3 parser1 ';';
/*
the result from `parser2` will not contain
    any term (`Tokenizer` or `Ast`) with name=parser3 or string=";"
*/
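
For example, a sketch that drops the separators from the result (Expr and stmts are illustrative names):

Expr := R'[a-z]+';
# the ';' terms are matched but thrown away, so the Ast produced
# by stmts contains only the Expr tokenizers
stmts throw [';'] ::= (Expr ';')+;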

For a more precise definition, see the bootstrap grammar here.

Regex Prefix

A regex prefix in ruiko EBNF adds a regex pattern to the token_table, which is used to generate an automatic tokenizing function (unless you use your own custom tokenizing function).

To use a regex prefix, just write R'<your regex pattern>'.

  • url.ruiko
url := R'https.*?\.(com|cn|org|net)';
other := R'.';
parserToTest throw [other] ::= (url | other)+;

Test it:

ruiko url.ruiko url --test
python test_lang.py parserToTest "https://github.comasdas https://123.net"
=========================ebnfparser test script================================
parserToTest[
    [name: url, string: "https://github.com"]
    [name: url, string: "https://123.net"]
]

Note that regex matching only happens during tokenizing. When literal parsers and combined parsers consume tokenizers, they check whether each tokenizer's name is the one they expect (in fact, parsers compare memory addresses rather than the names themselves, so EBNFParser is very fast in this step).

Cast Map

SomeToken cast as S := 'abc';
Alpha               := R'[a-z]+';
F                   ::= S'abc' | Alpha;

The ruiko code above defines a token named SomeToken with the prefix S.

However, when the input source is split into a sequence of tokenizers, the literal parser Alpha cannot match a tokenizer with attribute string="abc" produced by EBNFParser's automatic tokenizing, even though Alpha is supposed to match every string matched by the regex pattern "[a-z]+". That's because every "abc" has been cast into a unique string in a buffer pool, and all of those tokenizers are given the name SomeToken, not Alpha.

In other words, there is exactly one string with the value "abc", located at a unique memory address, and every literal parser defined with "abc" matches that string only.

As noted in Section Regex Prefix, the literal parser defined as Alpha := R'[a-z]+' only matches tokenizers whose name is Alpha.
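
Under the grammar above, this means only references through the prefix can match the cast tokenizer (a sketch; MatchesAbc and OnlyAlpha are illustrative names):

MatchesAbc ::= S'abc';
# accepts [name: SomeToken, string: "abc"] from the buffer pool
OnlyAlpha  ::= Alpha;
# expects tokenizers named Alpha, so it fails on the cast "abc"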

Custom Prefix

If you're using custom tokenizing, several Ruikowa.ObjectRegex.Tokenizer objects with the same attribute string="abc" (sharing the same memory address) could have different names.

To distinguish them from each other, you can do the following:

  • Grammar
SomeToken as S := 'abc';
Alpha          := R'[a-z]+';
F              ::= S'abc' | Alpha;
G              ::= 'abc';
H              ::= G | F ;

  • Tokenizers
[name: SomeToken, string: "abc"]
...

If you use the combined parser G to match the tokenizers above, you'll fail. In the grammar, G is defined as G ::= 'abc', which means G only accepts a tokenizer with attribute name="auto_const" and attribute string="abc" (where the string comes from the unique buffer pool, not from regex matching).
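
A sketch of the fix, under the same grammar: refer to the tokenizer through its custom prefix, as F already does:

G ::= S'abc';
# G now accepts [name: SomeToken, string: "abc"]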