I am trying to build a parser for regular expressions by building the abstract syntax tree from scratch (without any project dependencies or tools such as the cup parser in Java, etc). I don't want to save all the information contained in the regex but, instead, I want to simplify it as much as possible.
As an example, x::=y|z should lead to the same AST as the character class x::=[yz]. However, since regexes can (and do) get very complicated, I can't decide what equivalences to implement. For instance, I don't know how to save negative choice x::=[^b], which would be equivalent to x::=a|c|d|e|...
What abstractions would you make? Could some of those abstractions lead to wrong ASTs later on?
Aucun commentaire:
Enregistrer un commentaire