Book review - The definitive ANTLR 4 reference

Sat 18 February 2023 | tags: books

The Definitive ANTLR 4 Reference Second Edition by Terence Parr (Pragmatic Bookshelf, Feb 2013). ISBN: 978-1934356999.

Summary: Programmers run into parsing problems all the time. Whether it's a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language--ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.

I had a problem to solve where I needed to take user search strings like "from:2023-01-01 to:2023-01-31 type:large" and turn them into a SQL query with a WHERE clause similar to "SELECT * FROM contoso_products t WHERE 1=1 AND t.startdate >= @P1 AND t.enddate <= @P2 AND t.frobtype IN (@P3)". The user controlled the number of WHERE clauses, so a given clause might not occur at all, or it might occur multiple times (possibly up to 200 parameters). The parameter values were also supplied by the user, which meant that dumping the values directly into SQL was unsafe due to injection attacks and/or bugs. Therefore, the input string had to be parsed and carefully translated into valid, parameterized SQL.

A reasonable question is: why not use an ORM? The problem was that the available ORM would not support dynamic WHERE clauses. Each possible query would have to be compiled by the ORM, which could generate bad query execution plans and/or cause problems with joins. It was possible for multiple tables to be involved in the queries, and exposing that complexity to the user was unlikely to be easy to support.

Fortunately, ANTLR 4 made it relatively easy to define a grammar and use the Listener interface to build up the WHERE clause. Because the SQL WHERE clause started with "1=1", every subsequent clause could be generated in the form " AND foo = @P" or a similar variation, with no special case for the first one. The parameter numbers started at 1 and counted up, so setting the values was relatively straightforward, as ANTLR provides the matched text in the context passed to the ParseTreeListener.
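Here is a minimal sketch of that pattern in Python, written without the ANTLR runtime so that it runs on its own. It is not the code I actually used: the names WhereClauseBuilder, exit_term, FIELD_SQL and build_query are made up for illustration, and a plain whitespace split stands in for the generated parser and tree walker. With ANTLR, exit_term would be an exit method on the generated listener, and key and value would come from the matched text in the context object.

```python
# Hypothetical mapping from search keys to SQL fragments; "{p}" marks the parameter slot.
FIELD_SQL = {
    "from": " AND t.startdate >= {p}",
    "to": " AND t.enddate <= {p}",
    "type": " AND t.frobtype IN ({p})",
}


class WhereClauseBuilder:
    """Listener-style builder: one callback per matched search term."""

    def __init__(self):
        # Starting with 1=1 lets every later clause begin with " AND ".
        self.sql = "SELECT * FROM contoso_products t WHERE 1=1"
        self.params = {}

    def exit_term(self, key, value):
        # Stand-in for a listener callback (assumption, not the ANTLR API).
        if key not in FIELD_SQL:
            raise ValueError(f"unknown search key: {key!r}")
        # Parameter numbers start at 1 and count up.
        name = f"@P{len(self.params) + 1}"
        self.sql += FIELD_SQL[key].format(p=name)
        # The user's value is only stored as a parameter, never spliced into the SQL text.
        self.params[name] = value


def build_query(search):
    builder = WhereClauseBuilder()
    # Stand-in for the generated parser plus tree walker.
    for term in search.split():
        key, sep, value = term.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed term: {term!r}")
        builder.exit_term(key, value)
    return builder.sql, builder.params
```

Calling build_query("from:2023-01-01 to:2023-01-31 type:large") yields the SQL string from the example above plus a dictionary mapping @P1, @P2 and @P3 to the user's values, ready to be bound by the database driver.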

My main problem with actually using ANTLR 4 from the book was that the practical differences and limitations were not emphasized or even stated at all. I got more help from this Stack Overflow post (How does the ANTLR lexer disambiguate its rules) than from the book chapter about lexers. It would really have helped if the book had started out by stating that lexer rule names must start with a capital letter, and that the longest matching lexer rule is preferred over other rules (when several rules match the same length of input, the rule defined first wins).
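To make the lexer behaviour concrete, here is a rough imitation of it in plain Python. This is only a sketch of the "longest match wins, earliest rule breaks ties" idea, not how ANTLR is implemented; the rule names and patterns in RULES and the tokenize function are hypothetical.

```python
import re

# Hypothetical lexer rules in definition order (names capitalized, as ANTLR requires).
RULES = [
    ("FROM", re.compile(r"from")),
    ("COLON", re.compile(r":")),
    ("WORD", re.compile(r"[a-z]+")),
    ("DATE", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # drop whitespace tokens
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these rules, "from" becomes a FROM token because FROM and WORD match the same four characters and FROM is defined first, while "fromage" becomes a single WORD token because WORD matches more input. Had I known this up front, ordering my own rules would have taken a lot less trial and error.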

I'm also using ANTLR 4 to automatically parse a Go program, but that's a topic for another day.

Overall rating of the book: 3 out of 5 stars. It was very helpful for understanding how to use Listeners, but it was not helpful with several practical problems.
