Lingo
A parser generator for Crystal, inspired by Parslet.
Lingo provides text processing by:
- parsing the string into a tree of nodes
- providing a visitor to allow you to work from the tree
Installation
Add this to your application's shard.yml:
dependencies:
  lingo:
    github: rmosolgo/lingo
Usage
Let's write a parser for highway names. The result will be a method for turning strings into useful objects:
def parse_road(input_str)
  ast = RoadParser.new.parse(input_str)
  visitor = RoadVisitor.new
  visitor.visit(ast)
  visitor.road
end

road = parse_road("I-5N")
# <Road @interstate=true, @number=5, @direction="N">
(See more examples in /examples.)
In the USA, we write highway names like this:
50 # Route 50
I-64 # Interstate 64
I-95N # Interstate 95, Northbound
29B # Business Route 29
Parser
The general structure is {interstate?}{number}{direction?}{business?}. Let's express that with Lingo rules:
class RoadParser < Lingo::Parser
  # Match a string:
  rule(:interstate) { str("I-") }
  rule(:business) { str("B") }

  # Match a regex:
  rule(:digit) { match(/\d/) }
  # Express repetition with `.repeat`
  rule(:number) { digit.repeat }

  rule(:north) { str("N") }
  rule(:south) { str("S") }
  rule(:east) { str("E") }
  rule(:west) { str("W") }
  # Compose rules by name
  # Express alternation with |
  rule(:direction) { north | south | east | west }

  # Express sequence with >>
  # Express optionality with `.maybe`
  # Name matched strings with `.named`
  rule(:road_name) {
    interstate.named(:interstate).maybe >>
      number.named(:number) >>
      direction.named(:direction).maybe >>
      business.named(:business).maybe
  }

  # You MUST name a starting rule:
  root(:road_name)
end
Applying the Parser
An instance of a Lingo::Parser subclass has a .parse method which returns a tree of Lingo::Nodes.
RoadParser.new.parse("250B") # => <Lingo::Node ... >
It uses the rule named by root.
Making Rules
These methods help you create rules:
- str("string") matches string exactly
- match(/[abc]/) matches the regex exactly
- a | b matches a or b
- a >> b matches a followed by b
- a.maybe matches a or nothing
- a.repeat matches one-or-more as
- a.repeat(0) matches zero-or-more as
- a.absent matches not-a
- a.named(:a) names the result :a for handling by a visitor
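As a rough sketch of how these helpers combine, here is a small grammar for comma-separated integers such as "1,-22,333". The IntListParser class and its rule names are hypothetical and exist only for illustration. The sketch also assumes that a parenthesized group of rules can be repeated the same way a named rule can.

```crystal
require "lingo"

# Hypothetical grammar (not part of Lingo): a comma-separated list of integers.
class IntListParser < Lingo::Parser
  rule(:digit) { match(/\d/) }
  rule(:comma) { str(",") }
  rule(:minus) { str("-") }

  # Optional sign, then one or more digits.
  rule(:integer) { minus.maybe >> digit.repeat }

  # One integer, then zero or more ",integer" pairs.
  # Assumption: a parenthesized sequence accepts .repeat like any other rule.
  rule(:list) {
    integer.named(:integer) >>
      (comma >> integer.named(:integer)).repeat(0)
  }

  root(:list)
end

ast = IntListParser.new.parse("1,-22,333")
```

Each integer is named at the point of use, mirroring the road_name rule, so a visitor could later collect the :integer nodes.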
Visitor
After parsing, you get a tree of Lingo::Nodes. To turn that into an application object, write a visitor.
The visitor may define enter and exit hooks for nodes named with .named in the Parser. It may set up some state during #initialize, then reach that state through the visitor variable inside hooks.
class RoadVisitor < Lingo::Visitor
  # Set up an accumulator
  getter :road

  def initialize
    @road = Road.new
  end

  # When you find a named node, you can do something with it.
  # You can access the current visitor as `visitor`
  enter(:interstate) {
    # since we found this node, this is an interstate
    visitor.road.interstate = true
  }

  # You can access the named Lingo::Node as `node`.
  # Get the matched string with `.full_value`
  enter(:number) {
    visitor.road.number = node.full_value.to_i
  }

  enter(:direction) {
    visitor.road.direction = node.full_value
  }

  enter(:business) {
    # since we found this node, this is a business route
    visitor.road.business = true
  }
end
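The visitor relies on a Road class that this README never defines. A minimal sketch of one possible shape follows. The class body is an assumption: the property names are taken from the setters the visitor calls, while the types and defaults are guesses.

```crystal
# Hypothetical Road class: a plain value object holding the four fields
# that RoadVisitor assigns. Types and defaults are assumptions.
class Road
  property interstate : Bool = false
  property business : Bool = false
  property number : Int32 = 0
  property direction : String? = nil
end

road = Road.new
road.interstate = true
road.number = 5
road.direction = "N"
```

Defaulting the two flags to false lets the visitor set them only when the matching node is present, which is how the enter hooks above behave.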
Visitor Hooks
During the depth-first visitation of the resulting tree of Lingo::Nodes, you can handle visits to nodes named with .named:
- enter(:match) is called when entering a node named :match
- exit(:match) is called when exiting a node named :match
Within the hooks, you can access two magic variables:
- visitor is the Visitor itself
- node is the matched Lingo::Node which exposes:
  - #full_value: the full matched string
  - #line, #column: position information for this match
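To make the hook order concrete, here is a small sketch that records every enter and exit call for a named node. WordsParser and WordLogger are hypothetical names invented for this example. It uses only the hooks and node methods listed above, and assumes a parenthesized group of rules can be repeated.

```crystal
require "lingo"

# Hypothetical grammar: lowercase words separated by single spaces.
class WordsParser < Lingo::Parser
  rule(:letter) { match(/[a-z]/) }
  rule(:space) { str(" ") }
  rule(:word) { letter.repeat }
  rule(:words) {
    word.named(:word) >> (space >> word.named(:word)).repeat(0)
  }
  root(:words)
end

# Hypothetical visitor: logs each :word node on the way in and on the way out.
class WordLogger < Lingo::Visitor
  getter :log

  def initialize
    @log = [] of String
  end

  enter(:word) {
    visitor.log << "enter #{node.full_value} at #{node.line}:#{node.column}"
  }

  exit(:word) {
    visitor.log << "exit #{node.full_value}"
  }
end

ast = WordsParser.new.parse("hello world")
logger = WordLogger.new
logger.visit(ast)
logger.log.each { |entry| puts entry }
```

Since the :word nodes here have no named children, each enter entry should be followed directly by the exit entry for the same word.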
About this Project
Goals
- Low barrier to entry: easy-to-learn API, short zero-to-working time
- Easy-to-read code, therefore easy-to-modify
- Useful errors (not accomplished)
Non-goals
- Blazing-fast performance
- Theoretical correctness
TODO
- [ ] Add some kind of debug output
How slow is it?
Let's compare the built-in JSON parser to a Lingo JSON parser:
./lingo/benchmark $ crystal run --release slow_json.cr
Stdlib JSON 126.45k (± 1.55%) fastest
Lingo::JSON 660.18 (± 1.28%) 191.54× slower
Ouch, that's a lot slower.
But, it's on par with Ruby and parslet, the inspiration for this project:
$ ruby parslet_json_benchmark.rb
Calculating -------------------------------------
Parslet JSON 4.000 i/100ms
Built-in JSON 3.657k i/100ms
-------------------------------------------------
Parslet JSON 45.788 (± 4.4%) i/s - 232.000
Built-in JSON 38.285k (± 5.3%) i/s - 193.821k
Comparison:
Built-in JSON: 38285.2 i/s
Parslet JSON : 45.8 i/s - 836.13x slower
Both Parslet and Lingo are slower than handwritten parsers. But, they're easier to write!
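The slowdown factors in both reports are just ratios of iterations per second. The sketch below recomputes them from the rounded figures printed above, so the last digit can differ slightly from what the benchmark tools reported. The variable names are made up for this example.

```crystal
# Iterations per second, copied from the benchmark output above.
stdlib_crystal = 126_450.0 # Stdlib JSON (Crystal), printed as "126.45k"
lingo = 660.18             # Lingo::JSON
stdlib_ruby = 38_285.2     # Built-in JSON (Ruby)
parslet = 45.788           # Parslet JSON

# Slowdown = throughput of the built-in parser / throughput of the generated one.
lingo_slowdown = stdlib_crystal / lingo
parslet_slowdown = stdlib_ruby / parslet

puts "Lingo vs. Crystal stdlib: #{lingo_slowdown.round(1)}x slower"
puts "Parslet vs. Ruby built-in: #{parslet_slowdown.round(1)}x slower"
puts "Lingo vs. Parslet raw throughput: #{(lingo / parslet).round(1)}x"
```

The last line compares raw throughput across the two runs. That comparison is only meaningful if both benchmarks parsed comparable input on comparable hardware, which the output above does not confirm.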
Development
- Run the tests with crystal spec
- Install Ruby & guard, then start a watcher with guard