jsparagus/js_parser: Generating a parser for JavaScript
In this directory:
- esgrammar.pgen - A grammar for the mini-language the ECMAScript standard uses to describe ES grammar.
- es.esgrammar - The actual grammar for ECMAScript, in emu-grammar format, extracted automatically from the spec.
- extract_es_grammar.py - The script that creates es.esgrammar.
- es-simplified.esgrammar - A hacked version of es.esgrammar that jsparagus can actually handle.
- generate_js_parser_tables.py - A script to generate a JS parser based on es-simplified.esgrammar. Read on for instructions.
How to run it
To generate a parser, follow these steps:
$ cd ..
$ make init
$ make all
Note: The last step currently takes about 35 seconds to run on my laptop. jsparagus is slow.
Once you're done, to see your parser run, try this:
$ cd crates/driver
$ cargo run --release
The build also produces a copy of the JS parser in Python.
After make all, you can use make jsdemo to run that.
How simplified is "es-simplified"?
Here are the differences between es.esgrammar, the actual ES grammar, and es-simplified.esgrammar, the simplified version that jsparagus can actually handle:
- The four productions with [~Yield] and [~Await] conditions are dropped. This means that yield and await do not match IdentifierReference or LabelIdentifier. I think it's better to do that in the lexer.
- Truncated lookahead. Lookahead assertions longer than one token are cut down, because jsparagus otherwise fails with:
  ValueError: unsupported: lookahead > 1 token, [['{'], ['function'], ['async', ('no-LineTerminator-here',), 'function'], ['class'], ['let', '[']]
- Delete a rule that uses "but not", since it's not implemented:
  Identifier : IdentifierName but not ReservedWord
  Making sense of this rule in the context of an LR parser is an interesting task; see issue #28.
- Ban loops of the form for (async of EXPR) STMT by adjusting a lookahead assertion. The grammar is not LR(1).
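To make the lexer-side idea from the list above concrete, here is a minimal sketch of how a lexer could decide whether a name is usable as an identifier. This is not how jsparagus does it: classify_name, RESERVED_WORDS, and the in_generator / in_async flags are all hypothetical names invented for illustration, the reserved-word set is deliberately incomplete, and strict mode and module code are ignored.

```python
# Hypothetical sketch; none of these names come from jsparagus.

# A small subset of reserved words, enough for illustration only.
RESERVED_WORDS = frozenset({
    "break", "case", "class", "const", "for", "function",
    "if", "return", "var", "while",
})


def classify_name(name, in_generator=False, in_async=False):
    """Return "Keyword" or "Identifier" for an already-lexed IdentifierName.

    in_generator and in_async are assumed to be supplied by the caller,
    playing the role of the grammar's [Yield] and [Await] parameters.
    """
    if name in RESERVED_WORDS:
        # Stands in for "IdentifierName but not ReservedWord".
        return "Keyword"
    if name == "yield" and in_generator:
        # Stands in for the dropped [~Yield] condition.
        return "Keyword"
    if name == "await" and in_async:
        # Stands in for the dropped [~Await] condition.
        return "Keyword"
    return "Identifier"
```

With a check like this, the grammar itself would not need the [~Yield] / [~Await] conditions or the "but not" rule; the catch is that the lexer then has to know whether it is inside a generator or async function.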