zed-p8/grammars/pico-8-lua/KNOWN_LIMITATIONS.md
2026-05-15 00:16:13 -07:00


# Known limitations of `tree-sitter-pico8-lua`
This document used to track parse incorrectness around PICO-8's
line-significant shorthand `if (cond) ...` / `while (cond) ...`
constructs. As of v0.3 the external scanner emits a `LINE_END` token
when the parser is at the body-or-terminator decision point of a
shorthand statement and the next byte is `\n` / `\r` / EOF, so the body
of a shorthand is correctly bounded to its source line.

There are no other known parse-incorrectness issues at this time.

Removing this file (or leaving it as a brief stub) is fine once you're
confident no documentation links still point at the old limitation
sections.
## How line-significance is wired up (for reference)
PICO-8 deviates from standard Lua in two places where a newline is
syntactically significant:

- `if (cond) <stmts...>`: the consequence (and any same-line `else`
  alternative) extends to end-of-line, not to a matching `end`.
- `while (cond) <stmts...>`: same line-bounded body as the
  shorthand `if`.

Tree-sitter has no built-in concept of newlines as syntactic tokens
when `/\s/` is in `extras` (and we want it there: every other
construct treats whitespace transparently). The canonical fix is an
**external scanner** that gates a synthetic terminator token on
`valid_symbols`. We do exactly that:

- `src/scanner.c` exposes a `LINE_END` external symbol. The scanner
  looks at the raw lookahead before the lexer has a chance to skip
  extras, and emits `LINE_END` only when the parser actually expects
  one (i.e., `valid_symbols[LINE_END] == true`). At any other
  position, the scanner's `LINE_END` branch returns false, and the `\n`
  falls through to be eaten silently by the `/\s/` extras pattern.
- `LINE_END` is **zero-width**: the scanner does not consume the
  newline. This matters for nested shorthands: `if (a) if (b) c()\nd()`
  has to terminate **both** shorthands at the same `\n`. With a zero-width
  terminator, each enclosing shorthand sees the same `\n` in turn and
  reduces. Once no shorthand is on the stack, `LINE_END` is no longer
  in `valid_symbols`, the scanner returns false, and the `\n` is
  consumed by extras. The emit chain is bounded by static nesting
  depth, so there's no infinite-loop risk despite the zero width.

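
The gating and zero-width behaviour described above can be modelled in a few lines of C. This is a minimal sketch, not the contents of `src/scanner.c`: the real scanner works against tree-sitter's lexer object and `valid_symbols` array, whereas here a plain `char` (with `'\0'` standing in for EOF) and a `bool` take their place so the example compiles on its own. The names `at_line_boundary`, `scan_line_end`, and `drain_shorthands` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the raw byte at the cursor ends a source line.
   Assumption for this sketch: '\0' stands in for EOF. */
static bool at_line_boundary(char lookahead) {
    return lookahead == '\n' || lookahead == '\r' || lookahead == '\0';
}

/* Models the LINE_END branch: succeed only if the parser currently
   accepts LINE_END (line_end_valid plays the role of
   valid_symbols[LINE_END]) and the cursor sits on a line boundary.
   Nothing is consumed, so the token is zero-width. */
static bool scan_line_end(char lookahead, bool line_end_valid) {
    return line_end_valid && at_line_boundary(lookahead);
}

/* Models the emit chain for nested shorthands. Each emitted LINE_END
   lets the parser reduce one enclosing shorthand; once none remain,
   LINE_END is no longer valid and the loop stops. The cursor `pos`
   is never advanced. Returns the number of LINE_END tokens emitted. */
static int drain_shorthands(const char *src, size_t pos, int open_shorthands) {
    assert(src != NULL && open_shorthands >= 0);
    int emitted = 0;
    while (scan_line_end(src[pos], open_shorthands > 0)) {
        emitted++;
        open_shorthands--; /* one shorthand reduced per LINE_END */
    }
    return emitted;
}
```

For `if (a) if (b) c()\nd()` with the cursor on the `\n` and two shorthands open, `drain_shorthands` emits exactly two tokens and then stops, which is the bounded chain described in the second bullet. With no shorthand open it emits nothing, leaving the `\n` for the extras pattern.
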
The shorthand rules in `grammar.js` end with `$._line_end`; the body
and the optional `else` alternative are each a `$.statement` followed by
`repeat($.statement)` (that is, one or more statements), which allows
PICO-8's multi-statement single-line bodies
(`if (falling) wheeee() splat()`).

The cross-language pattern is "external scanner + `valid_symbols`-gated
terminator," the same one `tree-sitter-r` uses (the closest analogue) and
similar in spirit to Ruby's paired `_line_break` / `_no_line_break`
hint tokens. Removing `/\s/` from `extras` or switching to per-rule extras
is **not** necessary for this style of line-significance; only Python-style
INDENT/DEDENT requires the heavier refactor.
## Test coverage
`test/corpus/shorthand_line_end.txt` exercises:

- Single- and multi-statement shorthand bodies, terminated by `\n` and
  by EOF.
- Same-line `else` (single- and multi-statement alternative).
- The historical dangling-else case (shorthand inside a standard `if`,
  with `else` on a later line, which must bind to the outer `if`).
- Line comment trailing the shorthand body (the comment is in extras
  and the trailing `\n` still triggers `LINE_END`).
- Shorthand inside a `do`-block (the `\n` before the closing `end`
  terminates the shorthand cleanly).
- Nested shorthand `if`s on the same line (one `\n` must close both).
- Coexistence with standard `if (parenthesized) then ... end`: the
  GLR conflict resolves on whether `then` follows.