Parsek

Parser library for Kotlin consisting of a tokenizer and expression parser.

Tokenization

Tokenization is the process of splitting the input into a stream of tokens that is consumed by a parser.

In Parsek, this work is split between two classes, Lexer and Scanner.

Lexer

The lexer (source, kdoc) is basically an iterator for a stream of tokens that is generated by splitting the input using regular expressions.

Regular expressions are mapped to token types using a function which typically just returns a fixed token type inline. The function can be used to implement a second layer of mapping, but this should be fairly uncommon. Input mapped to null (typically whitespace) will not be reported.
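The mapping described above can be illustrated with a small stand-alone sketch. It does not use Parsek's Lexer; SketchType, SketchToken, SketchRule, SketchLexer and sketchTokenize are hypothetical names chosen for this illustration, and the actual constructor signatures are documented in the kdoc. Each rule pairs a regular expression with a function that returns a token type, or null for input that should not be reported.

```kotlin
// Hypothetical stand-alone sketch; none of these names are Parsek's API.
enum class SketchType { NUMBER, IDENTIFIER, SYMBOL }

data class SketchToken(val type: SketchType, val text: String, val position: Int)

// Pairs a regular expression with a function returning the token type,
// or null when the matched input (e.g. whitespace) should be skipped.
class SketchRule(val regex: Regex, val toType: (String) -> SketchType?)

class SketchLexer(
    private val input: String,
    private val rules: List<SketchRule>
) : Iterator<SketchToken> {
    private var pos = 0
    private var pending: SketchToken? = null

    override fun hasNext(): Boolean {
        while (pending == null && pos < input.length) {
            val start = pos
            for (rule in rules) {
                // Accept only non-empty matches that begin exactly at the current position.
                val match = rule.regex.find(input, start)
                    ?.takeIf { it.range.first == start && it.value.isNotEmpty() }
                    ?: continue
                pos = start + match.value.length
                val type = rule.toType(match.value)
                if (type != null) pending = SketchToken(type, match.value, start)
                break
            }
            check(pos > start) { "Unrecognized input at position $start" }
        }
        return pending != null
    }

    override fun next(): SketchToken {
        if (!hasNext()) throw NoSuchElementException()
        val token = pending!!
        pending = null
        return token
    }
}

val sketchRules = listOf(
    SketchRule(Regex("\\s+")) { null },  // whitespace is mapped to null and dropped
    SketchRule(Regex("[0-9]+")) { SketchType.NUMBER },
    SketchRule(Regex("[A-Za-z_][A-Za-z0-9_]*")) { SketchType.IDENTIFIER },
    SketchRule(Regex("[-+*/()=]")) { SketchType.SYMBOL }
)

fun sketchTokenize(input: String): List<SketchToken> =
    SketchLexer(input, sketchRules).asSequence().toList()
```

With these rules, sketchTokenize("x = 42") yields three tokens (an identifier, a symbol and a number); the whitespace between them is matched but never reported.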

The lexer is usually not used directly; instead, it is handed to the Scanner, which in turn is used by the parser.

The reason for the Lexer/Scanner split is to separate "raw" parsing from providing a nice and convenient API. The small API surface of the Lexer allows us to easily install additional processing between the Lexer and Scanner, for instance for context-sensitive newline filtering.
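Because the lexer side is just an iterator, a processing step can be expressed as a function from one token iterator to another. The following is a minimal sketch of context-sensitive newline filtering under that assumption: newlines are dropped while inside parentheses. RawToken and filterNewlines are hypothetical names, and the string-based token kind is a simplification; this is not Parsek's own filtering code.

```kotlin
// Hypothetical token shape for this sketch only.
data class RawToken(val kind: String, val text: String)

// Wraps a token iterator and suppresses NEWLINE tokens inside parentheses,
// so that a parser only sees newlines where they are significant.
fun filterNewlines(tokens: Iterator<RawToken>): Iterator<RawToken> = iterator {
    var depth = 0
    for (token in tokens) {
        when (token.text) {
            "(" -> depth++
            ")" -> if (depth > 0) depth--
        }
        if (token.kind == "NEWLINE" && depth > 0) continue
        yield(token)
    }
}
```

The result is again a plain iterator, so it could be passed on wherever the unfiltered token stream would have been used.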

Typically, the Lexer is constructed directly inline where the Scanner is constructed.

Token

The token class (source, kdoc) stores the token type (typically a user-defined enum), the token text and the token position. Token instances are generated by the Lexer.

RegularExpressions

The RegularExpressions object (source, kdoc) contains a set of useful regular expressions for source code and data format tokenization.

Scanner

The Scanner class (source, kdoc) provides a simple API for convenient access to the token stream generated by the Lexer.

The scanner provides a notion of a "current" token that can be inspected multiple times, as opposed to iterator.next(), where the current token is "gone" after the call. This makes it easy to hand the scanner with the current token down in a recursive descent parser until it is consumed and processed by the corresponding handler.
It provides unlimited dynamic lookahead.
It provides a tryConsume() convenience method that checks for a given token text and, when it is found, consumes the token and returns true.
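The three properties above can be sketched with a small buffer in front of a token iterator. This is an illustration only: apart from tryConsume(), which is named above, every name here (SketchScanner, current, lookAhead, consume, eof) is an assumption, and tokens are reduced to plain strings to keep the example short. The real API is described in the kdoc.

```kotlin
// Hypothetical sketch; not Parsek's Scanner implementation.
class SketchScanner(private val source: Iterator<String>) {
    private val buffer = ArrayDeque<String>()

    // Pulls tokens from the source until `count` tokens are buffered or the source ends.
    private fun fill(count: Int) {
        while (buffer.size < count && source.hasNext()) buffer.addLast(source.next())
    }

    val eof: Boolean
        get() { fill(1); return buffer.isEmpty() }

    // The current token; reading it repeatedly does not advance the scanner.
    val current: String
        get() { fill(1); return buffer.first() }

    // Looks `offset` tokens past the current one; returns null beyond the end of input.
    fun lookAhead(offset: Int): String? {
        fill(offset + 1)
        return buffer.getOrNull(offset)
    }

    // Removes and returns the current token.
    fun consume(): String {
        fill(1)
        return buffer.removeFirst()
    }

    // Consumes the current token and returns true only if its text matches.
    fun tryConsume(text: String): Boolean {
        if (eof || current != text) return false
        consume()
        return true
    }
}
```

Lookahead is "dynamic" in this sketch because the buffer grows only as far as a caller actually peeks, rather than being fixed to a constant number of tokens.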

Scanner Use Cases

Typical use cases that only need a scanner and no expression parser are data formats such as JSON or CSV.

For a simple example, please refer to the JSON parser example.

Expression Parser

The configurable expression parser (source, kdoc) operates on a tokenizer. It is stateless and should be shared and reused.

For ternary expressions, create a suffix expression and use the supplied tokenizer to consume the rest of the ternary.
Functions / "Apply" can be implemented in a similar way. Alternatively, this can be implemented in primary expression parsing by checking for an opening brace after the primary expression.
"Grouping" brackets should be implemented where primary expressions are processed, too.
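To make the ideas above concrete, here is a stand-alone sketch of a precedence-based evaluator. It does not use Parsek's expression parser or its configuration API; SketchExpressionEvaluator and evaluateSketch are hypothetical names, and the token handling is deliberately minimal. The "?" operator is treated as a suffix whose handler consumes the rest of the ternary from the token stream, and grouping parentheses are handled where primary expressions are parsed.

```kotlin
// Hypothetical sketch; evaluates directly instead of building a parse tree.
class SketchExpressionEvaluator(private val tokens: List<String>) {
    private var pos = 0

    // Binary operators with their precedence; higher binds tighter.
    private val precedence = mapOf("+" to 1, "-" to 1, "*" to 2, "/" to 2)

    private val current: String?
        get() = tokens.getOrNull(pos)

    private fun expect(text: String) {
        check(current == text) { "Expected '$text' at token index $pos" }
        pos++
    }

    fun evaluate(): Double {
        val result = parseExpression(0)
        check(pos == tokens.size) { "Unexpected token '${current}' at index $pos" }
        return result
    }

    private fun parseExpression(minPrecedence: Int): Double {
        var left = parsePrimary()
        while (true) {
            val op = current ?: break
            if (op == "?" && minPrecedence == 0) {
                // Ternary as a suffix: the handler consumes "then : else" itself.
                // Both branches are evaluated here, which is fine for pure arithmetic.
                pos++
                val thenValue = parseExpression(0)
                expect(":")
                val elseValue = parseExpression(0)
                left = if (left != 0.0) thenValue else elseValue
                continue
            }
            val prec = precedence[op] ?: break
            if (prec < minPrecedence) break
            pos++
            val right = parseExpression(prec + 1)  // left-associative binary operators
            left = when (op) {
                "+" -> left + right
                "-" -> left - right
                "*" -> left * right
                else -> left / right
            }
        }
        return left
    }

    private fun parsePrimary(): Double {
        val token = current ?: error("Unexpected end of input")
        pos++
        return when (token) {
            // Grouping is handled at the primary expression level.
            "(" -> parseExpression(0).also { expect(")") }
            else -> token.toDoubleOrNull() ?: error("Unexpected token '$token'")
        }
    }
}

// Splits the input into numbers and single-character symbols, then evaluates it.
fun evaluateSketch(input: String): Double {
    val tokens = Regex("[0-9]+(\\.[0-9]+)?|\\S").findAll(input).map { it.value }.toList()
    return SketchExpressionEvaluator(tokens).evaluate()
}
```

For example, evaluateSketch("(1 + 2) * 3") returns 9.0, and evaluateSketch("1 + 0 ? 10 : 20") returns 10.0 because the ternary binds more loosely than the addition. A function call could be handled in the same place as grouping, by checking for an opening bracket after a primary expression has been parsed.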

Expression Parser-Based Examples

A simple example evaluating mathematical expressions directly (as opposed to building an explicit parse tree) can be found in the tests.
A complete PL/0 parser is included in the examples module to illustrate how to use the expression parser and tokenizer for a simple but computationally complete language: Parser.kt, Pl0Test.kt
A parser for mathematical expressions: ExpressionParser.kt, ExpressionsTest.kt
A simple example for using the scanner and expression parser to implement a simple indentation-based programming language: mython, MythonTest.kt
A BASIC interpreter using Parsek: https://github.com/stefanhaustein/basik
