JavaScript parser for Java
From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java
The grammar documented in that file is a context-free representation of the grammar this parser parses. It disagrees with EcmaScript 262 Edition 3 (ES3) where implementations disagree with ES3. The rules for semicolon insertion and the backtracking needed to parse some expressions properly are commented thoroughly in the code, since semicolon insertion requires information from both the lexer and the parser and is not determinable with finite lookahead.
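To make the lexer/parser split concrete, here is a minimal sketch (not Caja's code) of why semicolon insertion needs both layers: the lexer has to remember whether a line break preceded each token, and the parser consults that flag when it reaches a restricted production such as `return`. The class and method names (AsiSketch, lex, returnTakesExpression) are hypothetical, the lexer is deliberately crude, and only the `return` rule is shown. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

class AsiSketch {
    // A token plus the one bit of lexer state that ASI needs:
    // whether a line terminator appeared before it.
    record Token(String text, boolean newlineBefore) {}

    // Deliberately crude lexer: splits on whitespace and on ';', '{' and '}'.
    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean sawNewline = false;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            boolean punct = c == ';' || c == '{' || c == '}';
            if (!Character.isWhitespace(c) && !punct) {
                cur.append(c);
                continue;
            }
            if (cur.length() > 0) { // flush the pending word
                out.add(new Token(cur.toString(), sawNewline));
                cur.setLength(0);
                sawNewline = false;
            }
            if (c == '\n') {
                sawNewline = true;
            } else if (punct) {
                out.add(new Token(String.valueOf(c), sawNewline));
                sawNewline = false;
            }
        }
        if (cur.length() > 0) {
            out.add(new Token(cur.toString(), sawNewline));
        }
        return out;
    }

    // Parser-side check for the restricted production
    //   return [no LineTerminator here] Expression
    // A line break right after `return` ends the statement (a semicolon is inserted).
    static boolean returnTakesExpression(List<Token> tokens, int returnIndex) {
        if (returnIndex + 1 >= tokens.size()) {
            return false;
        }
        Token next = tokens.get(returnIndex + 1);
        if (next.newlineBefore()) {
            return false;
        }
        return !next.text().equals(";") && !next.text().equals("}");
    }

    public static void main(String[] args) {
        System.out.println(returnTakesExpression(lex("return 1;"), 0)); // true
        System.out.println(returnTakesExpression(lex("return\n1"), 0)); // false: read as `return; 1;`
    }
}
```

Neither layer alone can decide this: the lexer does not know which productions are restricted, and the parser never sees the line breaks unless the lexer records them.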
Noteworthy features
- Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.
- Does not parse Firefox-style catch (<Identifier> if <Expression>), since those don't work on IE and many other interpreters.
- Recognizes const, since many interpreters do (not IE), but warns.
- Allows, but warns on, trailing commas in Array and Object constructors.
- Allows keywords as identifier names but warns, since different interpreters have different keyword sets. This allows us to use an expansive keyword set.
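The "allow, but warn" policy in the last two bullets can be sketched like this: instead of rejecting a keyword used as an identifier, or a trailing comma, the checker accepts the input and records a warning, and it keeps going so several problems are reported in one pass. This is a hypothetical illustration; LenientIdentifierCheck, the keyword list and the warning text are made up and are not Caja's API (Java 16 or later assumed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LenientIdentifierCheck {
    // Deliberately expansive: mixes core keywords with reserved words
    // that only some engines treat as keywords.
    static final Set<String> KEYWORDS = Set.of(
            "var", "function", "return", "if", "else", "for", "while", "new",
            "const", "class", "enum", "let", "yield", "interface", "package");

    // Every problem is recorded; nothing stops at the first one.
    final List<String> warnings = new ArrayList<>();

    // Rough identifier shape check (Java's rules are close to, not equal to, JS's).
    static boolean isIdentifierShaped(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Keywords are accepted as identifier names, but produce a warning.
    boolean acceptIdentifier(String name, int line) {
        if (!isIdentifierShaped(name)) {
            return false; // a genuine error, not a portability warning
        }
        if (KEYWORDS.contains(name)) {
            warnings.add("line " + line + ": keyword '" + name + "' used as an identifier");
        }
        return true;
    }

    // A trailing comma before ']' is accepted but warned about, because older
    // engines (notably old IE) disagreed on the resulting array length.
    void checkArrayLiteral(String literal, int line) {
        if (literal.replaceAll("\\s+", "").endsWith(",]")) {
            warnings.add("line " + line + ": trailing comma in array literal");
        }
    }

    public static void main(String[] args) {
        LenientIdentifierCheck check = new LenientIdentifierCheck();
        check.acceptIdentifier("total", 1);
        check.acceptIdentifier("enum", 2);
        check.checkArrayLiteral("[1, 2, ]", 3);
        check.warnings.forEach(System.out::println);
    }
}
```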
To parse strict code, pass in a PedanticWarningMessageQueue that converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
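The idea behind a pedantic queue can be sketched as a message queue that collects every message (so one pass reports all problems) and, in strict mode, promotes anything at WARNING or above to FATAL_ERROR. The enum borrows the level names mentioned above, but its exact members, the class names and the methods are assumptions for illustration, not Caja's actual MessageQueue API (Java 16 or later assumed).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical levels, ordered from least to most severe.
enum MessageLevel { LOG, LINT, WARNING, ERROR, FATAL_ERROR }

record Message(MessageLevel level, String text) {}

// Collects every message instead of stopping at the first one.
class SimpleMessageQueue {
    private final List<Message> messages = new ArrayList<>();

    void add(MessageLevel level, String text) {
        messages.add(new Message(adjust(level), text));
    }

    // Hook for subclasses; the lenient queue keeps levels as reported.
    protected MessageLevel adjust(MessageLevel level) {
        return level;
    }

    List<Message> messages() {
        return List.copyOf(messages);
    }

    boolean hasFatal() {
        return messages.stream().anyMatch(m -> m.level() == MessageLevel.FATAL_ERROR);
    }
}

// Strict mode: WARNING and anything more severe becomes FATAL_ERROR.
class PedanticMessageQueue extends SimpleMessageQueue {
    @Override
    protected MessageLevel adjust(MessageLevel level) {
        return level.compareTo(MessageLevel.WARNING) >= 0 ? MessageLevel.FATAL_ERROR : level;
    }
}

class PedanticDemo {
    public static void main(String[] args) {
        SimpleMessageQueue lenient = new SimpleMessageQueue();
        SimpleMessageQueue strict = new PedanticMessageQueue();
        for (SimpleMessageQueue q : List.of(lenient, strict)) {
            q.add(MessageLevel.WARNING, "trailing comma in array literal");
            q.add(MessageLevel.LINT, "missing semicolon");
        }
        System.out.println(lenient.hasFatal()); // false
        System.out.println(strict.hasFatal());  // true
    }
}
```

Keeping the promotion in a subclass means the parser itself never needs a "strict" flag; the caller chooses the policy by choosing the queue.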
CajaTestCase.java shows how to set up a parser, and the fromResource and fromString methods in the same class show how to get an input of the right kind.
When using Java 8, there is a trick you can use to parse with the Nashorn implementation that comes out of the box. By looking at the unit tests in the OpenJDK source code, you can see how to use only the parser, without doing all the extra compilation.
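// Configure Nashorn to parse only, without compiling or running the script
Options options = new Options("nashorn");
options.set("anon.functions", true);
options.set("parse.only", true);
options.set("scripting", true);

// Parse errors are reported to this ErrorManager
ErrorManager errors = new ErrorManager();
Context context = new Context(options, errors, Thread.currentThread().getContextClassLoader());

// Later JDK 8 updates replaced the public Source constructors with Source.sourceFor(...);
// on early JDK 8 builds this was new Source("test", ...)
Source source = Source.sourceFor("test", "var a = 10; var b = a + 1;" +
        "function someFunction() { return b + 1; } ");
Parser parser = new Parser(context.getEnv(), source, errors);

// parse() returns a FunctionNode representing the whole script
FunctionNode functionNode = parser.parse();
Block block = functionNode.getBody();
List<Statement> statements = block.getStatements();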
Once this code runs, you will have the Abstract Syntax Tree (AST) for the script's three top-level statements in the statements list.
This can then be interpreted or manipulated to suit your needs.
The previous example works with the following imports:
import jdk.nashorn.internal.ir.Block;
import jdk.nashorn.internal.ir.FunctionNode;
import jdk.nashorn.internal.ir.Statement;
import jdk.nashorn.internal.parser.Parser;
import jdk.nashorn.internal.runtime.Context;
import jdk.nashorn.internal.runtime.ErrorManager;
import jdk.nashorn.internal.runtime.Source;
import jdk.nashorn.internal.runtime.options.Options;
import java.util.List;
You might need to add an access rule to make jdk/nashorn/internal/** accessible.
In my context, I am using JavaScript as an expression language for my own Domain Specific Language (DSL), which I then compile to Java classes at runtime and use. The AST lets me generate appropriate Java code that captures the intent of the JavaScript expressions.
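The compile-to-Java step can be illustrated without Nashorn at all. The sketch below assumes a tiny hypothetical AST (numbers, variable references, binary operators) standing in for the nodes a real JavaScript parser would return, and walks it to emit an equivalent Java expression string. It handles numeric operands only (JavaScript's string `+` is not covered), and the name `vars` for the variable map in the generated code is an assumption. In a real setup you would translate the parser's own node classes instead and then compile the generated source, for example with javax.tools.JavaCompiler; that step is omitted here (Java 16 or later assumed).

```java
import java.util.Map;

class JsToJavaSketch {
    // Hypothetical expression AST standing in for a real parser's node classes.
    interface Expr {}
    record Num(double value) implements Expr {}
    record Var(String name) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // JS operators with a direct Java equivalent when both operands are numbers.
    private static final Map<String, String> OPS = Map.of(
            "+", "+", "-", "-", "*", "*", "/", "/",
            "===", "==", "!==", "!=", "<", "<", ">", ">");

    // Emits a fully parenthesised Java expression. Variables are read from a
    // Map<String, Double> that the generated code is assumed to call 'vars'.
    static String toJava(Expr e) {
        if (e instanceof Num n) {
            return Double.toString(n.value());
        }
        if (e instanceof Var v) {
            // Explicit unboxing so that == compares values, not Double references.
            return "((double) vars.get(\"" + v.name() + "\"))";
        }
        if (e instanceof Bin b) {
            String op = OPS.get(b.op());
            if (op == null) {
                throw new IllegalArgumentException("unsupported operator: " + b.op());
            }
            return "(" + toJava(b.left()) + " " + op + " " + toJava(b.right()) + ")";
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    public static void main(String[] args) {
        // AST for the JavaScript rule: price * 1.2 > 100
        Expr rule = new Bin(">", new Bin("*", new Var("price"), new Num(1.2)), new Num(100));
        System.out.println(toJava(rule));
        // prints ((((double) vars.get("price")) * 1.2) > 100.0)
    }
}
```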
Nashorn is available with Java SE 8; it was later deprecated in Java 11 and removed in Java 15.
The link to information about getting the Nashorn source code is here: https://wiki.openjdk.java.net/display/Nashorn/Building+Nashorn
A previous answer describes a way to get under the covers of JDK 8 to parse JavaScript. A supported API for this is now being mainlined in Java 9. Nice!
This will mean that you don't need to include any libraries; instead, we can rely on an official implementation from the Java team. Parsing JavaScript programmatically becomes much easier, without stepping into taboo areas of Java code such as the jdk.nashorn.internal packages.
Applications of this might be where you want to use JavaScript for a rules engine, which gets parsed and compiled into some other language at runtime. The AST lets you 'understand' the logic as written in the concise JavaScript language and then generate less pretty logic in some other language or framework for execution or evaluation.
http://openjdk.java.net/jeps/236
Summary from the link above:
Define a supported API for Nashorn's ECMAScript abstract syntax tree.
Goals
- Provide interface classes to represent Nashorn syntax-tree nodes.
- Provide a factory to create a configured parser instance, with configuration done by passing Nashorn command-line options via an API.
- Provide a visitor-pattern API to visit AST nodes.
- Provide sample/test programs to use the API.
Non-Goals
- The AST nodes will represent notions in the ECMAScript specification insofar as possible, but they will not be exactly the same. Wherever possible the javac tree API's interfaces will be adopted for ECMAScript.
- No external parser/tree standard or API will be used.
- There will be no script-level parser API. This is a Java API, although scripts can call into Java and therefore make use of this API.
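The visitor-pattern goal is easiest to picture with a small example. This is a generic sketch of the pattern over a hypothetical three-node AST, not the API that JEP 236 actually delivered (the jdk.nashorn.api.tree package, which was removed along with Nashorn in Java 15); all type names here are made up (Java 16 or later assumed).

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {
    // Each node dispatches itself to the matching visitor method (double dispatch).
    interface Node {
        <R> R accept(Visitor<R> v);
    }

    interface Visitor<R> {
        R visitNumber(NumberNode n);
        R visitIdentifier(IdentifierNode n);
        R visitBinary(BinaryNode n);
    }

    record NumberNode(double value) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitNumber(this); }
    }

    record IdentifierNode(String name) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitIdentifier(this); }
    }

    record BinaryNode(String op, Node left, Node right) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitBinary(this); }
    }

    // Example visitor: collects identifier names in left-to-right order.
    static class IdentifierCollector implements Visitor<List<String>> {
        private final List<String> names = new ArrayList<>();

        public List<String> visitNumber(NumberNode n) { return names; }

        public List<String> visitIdentifier(IdentifierNode n) {
            names.add(n.name());
            return names;
        }

        public List<String> visitBinary(BinaryNode n) {
            n.left().accept(this);
            n.right().accept(this);
            return names;
        }
    }

    public static void main(String[] args) {
        // AST for the JavaScript expression: a + b * 2
        Node ast = new BinaryNode("+", new IdentifierNode("a"),
                new BinaryNode("*", new IdentifierNode("b"), new NumberNode(2)));
        System.out.println(ast.accept(new IdentifierCollector())); // [a, b]
    }
}
```

The payoff of the pattern is that new analyses (collecting names, generating code, evaluating) become new Visitor implementations, while the node classes stay untouched.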
Here are two ANTLR grammars for EcmaScript, more or less working or complete (see comments on this post):
- http://www.antlr.org/grammar/1206736738015/JavaScript.g (incomplete?)
- http://www.antlr.org/grammar/1153976512034/ecmascriptA3.g (buggy?)
From the ANTLR 5 minute introduction:
ANTLR reads a language description file called a grammar and generates a number of source code files and other auxiliary files. Most uses of ANTLR generate at least one (and quite often both) of these tools:
A Lexer: This reads an input character or byte stream (i.e. characters, binary data, etc.), divides it into tokens using patterns you specify, and generates a token stream as output. It can also flag some tokens such as whitespace and comments as hidden using a protocol that ANTLR parsers automatically understand and respect.
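A hand-written lexer (not ANTLR-generated) can illustrate the lexer role described above: it turns characters into tokens and, instead of discarding whitespace and comments, keeps them in the stream flagged as hidden, which is roughly the idea behind ANTLR's hidden tokens. The token kinds and the Token record are hypothetical, and the sketch ignores strings, regex literals and multi-character operators (Java 16 or later assumed).

```java
import java.util.ArrayList;
import java.util.List;

class TinyJsLexer {
    enum Kind { IDENT, NUMBER, PUNCT, WHITESPACE, COMMENT }

    // 'hidden' tokens stay in the stream but are skipped by the parser.
    record Token(Kind kind, String text, boolean hidden) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
                tokens.add(new Token(Kind.WHITESPACE, src.substring(start, i), true));
            } else if (src.startsWith("//", i)) {
                while (i < src.length() && src.charAt(i) != '\n') i++;
                tokens.add(new Token(Kind.COMMENT, src.substring(start, i), true));
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                i = end < 0 ? src.length() : end + 2; // unterminated comment runs to EOF
                tokens.add(new Token(Kind.COMMENT, src.substring(start, i), true));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && (Character.isDigit(src.charAt(i)) || src.charAt(i) == '.')) i++;
                tokens.add(new Token(Kind.NUMBER, src.substring(start, i), false));
            } else if (Character.isJavaIdentifierStart(c)) {
                i++;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENT, src.substring(start, i), false));
            } else {
                i++; // any other character is a single-character punctuator
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c), false));
            }
        }
        return tokens;
    }

    // What a parser would consume: the visible tokens only.
    static List<Token> visible(List<Token> tokens) {
        return tokens.stream().filter(t -> !t.hidden()).toList();
    }

    public static void main(String[] args) {
        List<Token> all = tokenize("var a = 10; // the answer");
        visible(all).forEach(t -> System.out.println(t.kind() + " " + t.text()));
    }
}
```

Keeping hidden tokens rather than dropping them lets tools such as pretty-printers or documentation extractors still see the comments.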
A Parser: This reads a token stream (normally generated by a lexer), and matches phrases in your language via the rules (patterns) you specify, and typically performs some semantic action for each phrase (or sub-phrase) matched. Each match could invoke a custom action, write some text via StringTemplate, or generate an Abstract Syntax Tree for additional processing.
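Likewise, the parser role can be sketched by hand as a recursive-descent parser that consumes a token stream and builds an AST, with one method per grammar rule. This is not what ANTLR would generate from the grammars linked above; it only illustrates the token-stream-to-tree step, for a hypothetical grammar of numbers, identifiers, the four arithmetic operators and parentheses (Java 16 or later assumed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyExprParser {
    // AST produced by the parser.
    interface Node {}
    record Num(String value) implements Node {}
    record Ident(String name) implements Node {}
    record BinOp(char op, Node left, Node right) implements Node {}

    // Optional whitespace, then a number, an identifier or a one-character operator.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\d+(?:\\.\\d+)?|[A-Za-z_$][\\w$]*|[-+*/()])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private TinyExprParser(String src) {
        Matcher m = TOKEN.matcher(src);
        int end = 0;
        // Tokens must be contiguous; anything unmatched stops the loop.
        while (end < src.length() && m.find(end) && m.start() == end) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!src.substring(end).isBlank()) {
            throw new IllegalArgumentException("unexpected input at " + end);
        }
    }

    static Node parse(String src) {
        TinyExprParser p = new TinyExprParser(src);
        Node n = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return n;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // expr := term (('+' | '-') term)*   (left-associative)
    private Node expr() {
        Node left = term();
        while ("+".equals(peek()) || "-".equals(peek())) {
            char op = tokens.get(pos++).charAt(0);
            left = new BinOp(op, left, term());
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    private Node term() {
        Node left = factor();
        while ("*".equals(peek()) || "/".equals(peek())) {
            char op = tokens.get(pos++).charAt(0);
            left = new BinOp(op, left, factor());
        }
        return left;
    }

    // factor := NUMBER | IDENT | '(' expr ')'
    private Node factor() {
        String t = peek();
        if (t == null) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        pos++;
        if (t.equals("(")) {
            Node inner = expr();
            if (!")".equals(peek())) {
                throw new IllegalArgumentException("expected )");
            }
            pos++;
            return inner;
        }
        char first = t.charAt(0);
        if (Character.isDigit(first)) return new Num(t);
        if (Character.isLetter(first) || first == '_' || first == '$') return new Ident(t);
        throw new IllegalArgumentException("unexpected token: " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("a + b * (c - 1)"));
    }
}
```

Operator precedence falls out of the rule nesting: `term` binds tighter than `expr` because `expr` only ever combines complete terms.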