# Known limitations of `tree-sitter-pico8-lua`

This document used to track parse incorrectness around PICO-8's
line-significant shorthand `if (cond) ...` / `while (cond) ...`
constructs. As of v0.3 the external scanner emits a `LINE_END` token
when the parser is at the body-or-terminator decision point of a
shorthand statement and the next byte is `\n` / `\r` / EOF, so the body
of a shorthand is correctly bounded to its source line.

There are no other known parse-incorrectness issues at this time.
Removing this file (or leaving it as a brief stub) is fine once you're
confident no documentation links still point at the old limitation
sections.

## How line-significance is wired up (for reference)

PICO-8 deviates from standard Lua in two places where a newline is
syntactically significant:

- `if (cond) <stmts...>`: the consequence (and any same-line `else`
  alternative) extends to end-of-line, not to a matching `end`.
- `while (cond) <stmts...>`: same line-bounded body as the
  shorthand `if`.

Tree-sitter has no built-in concept of newlines as syntactic tokens
when `/\s/` is in `extras` (and we want it there: every other
construct treats whitespace transparently). The canonical fix is an
**external scanner** that gates a synthetic terminator token on
`valid_symbols`. We do exactly that:

- `src/scanner.c` exposes a `LINE_END` external symbol. The scanner
  looks at the raw lookahead before the lexer has a chance to skip
  extras, and emits `LINE_END` only when the parser actually expects
  one (i.e., `valid_symbols[LINE_END] == true`). At any other
  position, the scanner's `LINE_END` branch returns false, and the `\n`
  falls through to be eaten silently by the `/\s/` extras pattern.
- `LINE_END` is **zero-width**; the scanner does not consume the
  newline. This matters for nested shorthands: `if (a) if (b) c()\nd()`
  has to terminate BOTH shorthands at the same `\n`. With a zero-width
  terminator, each enclosing shorthand sees the same `\n` in turn and
  reduces. Once no shorthand is on the stack, `LINE_END` is no longer
  in `valid_symbols`, the scanner returns false, and the `\n` is
  consumed by extras. The emit chain is bounded by static nesting
  depth, so there's no infinite-loop risk despite the zero width.

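The two bullets above can be modelled in a few lines of standalone C.
This is a sketch, not the contents of `src/scanner.c`: the real scanner
receives a `TSLexer` from `tree_sitter/parser.h`, whereas `ModelLexer`,
`scan_line_end`, and `close_shorthands_at` are hypothetical names
invented here so the example compiles on its own. It assumes the parser
drops `LINE_END` from `valid_symbols` as soon as no shorthand is left
open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the external token enum. */
enum TokenType { LINE_END, TOKEN_TYPE_COUNT };

/* Hypothetical model of the lexer state; the real scanner gets a TSLexer. */
typedef struct {
    const char *input; /* NUL-terminated source; NUL stands in for EOF */
    size_t pos;        /* index of the raw lookahead byte */
} ModelLexer;

/* True when the raw lookahead is `\n`, `\r`, or EOF. */
static bool is_line_break_or_eof(const ModelLexer *lx) {
    char c = lx->input[lx->pos];
    return c == '\0' || c == '\n' || c == '\r';
}

/* Gated, zero-width check: never advances pos, and only succeeds when
 * the parser has marked LINE_END as valid at this position. */
static bool scan_line_end(const ModelLexer *lx, const bool *valid_symbols) {
    if (!valid_symbols[LINE_END]) return false;
    return is_line_break_or_eof(lx);
}

/* Models `depth` nested shorthands meeting one line break. Returns how
 * many LINE_END tokens are emitted before extras consume the break. */
static int close_shorthands_at(ModelLexer *lx, int depth) {
    int emitted = 0;
    for (;;) {
        /* Assumption: LINE_END is valid only while a shorthand is open. */
        bool valid_symbols[TOKEN_TYPE_COUNT] = { [LINE_END] = depth > 0 };
        if (!scan_line_end(lx, valid_symbols)) break;
        emitted++; /* zero-width: pos is unchanged */
        depth--;   /* the innermost shorthand reduces */
    }
    /* Now that no shorthand is open, the break falls through to extras. */
    if (lx->input[lx->pos] == '\n' || lx->input[lx->pos] == '\r') lx->pos++;
    return emitted;
}
```

For the nested example `if (a) if (b) c()\nd()`, calling
`close_shorthands_at` with the lexer positioned on the `\n` and a depth
of 2 emits two terminators and only then advances past the newline. The
loop runs at most `depth` times, which is the bound on the emit chain.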

The shorthand rules in `grammar.js` end with `$._line_end`; the body
and the optional `else` alternative are both `$.statement, repeat($.statement)`,
allowing PICO-8's multi-statement single-line bodies
(`if (falling) wheeee() splat()`).

The cross-language pattern is "external scanner + valid_symbols-gated
terminator," same as `tree-sitter-r` (the closest analogue) and
similar in spirit to Ruby's paired `_line_break` / `_no_line_break`
hint tokens. Reaching for `\s` removal or per-rule extras is **not**
necessary for this style of line-significance; only Python-style
INDENT/DEDENT requires the heavier refactor.

## Test coverage

`test/corpus/shorthand_line_end.txt` exercises:

- Single- and multi-statement shorthand bodies, terminated by `\n` and
  by EOF.
- Same-line `else` (single- and multi-statement alternative).
- The historical dangling-else case (shorthand inside a standard `if`,
  with `else` on a later line, which must bind to the outer `if`).
- Line comment trailing the shorthand body (the comment is in extras
  and the trailing `\n` still triggers `LINE_END`).
- Shorthand inside a `do`-block (the `\n` before the closing `end`
  terminates the shorthand cleanly).
- Nested shorthand `if`s on the same line (one `\n` must close both).
- Coexistence with standard `if (parenthesized) then ... end`: the
  GLR conflict resolves on whether `then` follows.