Known limitations of tree-sitter-pico8-lua
This document used to track parse incorrectness around PICO-8's
line-significant shorthand if (cond) ... / while (cond) ...
constructs. As of v0.3 the external scanner emits a LINE_END token
when the parser is at the body-or-terminator decision point of a
shorthand statement and the next byte is \n / \r / EOF, so the body
of a shorthand is correctly bounded to its source line.
There are no other known parse-incorrectness issues at this time. Removing this file (or leaving it as a brief stub) is fine once you're confident no documentation links still point at the old limitation sections.
How line-significance is wired up (for reference)
PICO-8 deviates from standard Lua in two places where a newline is syntactically significant:
- if (cond) <stmts...>: the consequence (and any same-line else alternative) extends to end-of-line, not to a matching end.
- while (cond) <stmts...>: same line-bounded body as the shorthand if.
Tree-sitter has no built-in concept of newlines as syntactic tokens
when /\s/ is in extras (and we want it there: every other
construct treats whitespace transparently). The canonical fix is an
external scanner that gates a synthetic terminator token on
valid_symbols. We do exactly that:
- src/scanner.c exposes a LINE_END external symbol. The scanner looks at the raw lookahead before the lexer has a chance to skip extras, and emits LINE_END only when the parser actually expects one (i.e., valid_symbols[LINE_END] == true). At any other position, the scanner's LINE_END branch returns false, and the \n falls through to be eaten silently by the /\s/ extras pattern.
- LINE_END is zero-width: the scanner does not consume the newline. This matters for nested shorthands: if (a) if (b) c()\nd() has to terminate BOTH shorthands at the same \n. With a zero-width terminator, each enclosing shorthand sees the same \n in turn and reduces. Once no shorthand is on the stack, LINE_END is no longer in valid_symbols, the scanner returns false, and the \n is consumed by extras. The emit chain is bounded by static nesting depth, so there's no infinite-loop risk despite the zero width.
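The gating and zero-width behavior described above can be sketched in plain C. This is a simplified model, not the code in src/scanner.c: SketchLexer, scan_line_end, and close_shorthands are hypothetical names invented for illustration, and the cursor struct stands in for the real lexer object so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scanner's external-token enum. */
enum TokenType { LINE_END, TOKEN_TYPE_COUNT };

/* Hypothetical stand-in for the lexer: a cursor into the source text. */
typedef struct {
    const char *src;
    size_t pos;
    size_t len;
} SketchLexer;

/* Reports LINE_END only when the parser asked for it and the next byte is
 * \n, \r, or EOF. The lexer is taken as const: the cursor never moves, which
 * is what makes the token zero-width. */
static bool scan_line_end(const SketchLexer *lx, const bool *valid_symbols) {
    if (!valid_symbols[LINE_END]) return false;  /* not expected here */
    if (lx->pos >= lx->len) return true;         /* EOF terminates too */
    char c = lx->src[lx->pos];
    return c == '\n' || c == '\r';
}

/* Models the parser side: every open shorthand requests LINE_END once at the
 * same position and then reduces, so the loop is bounded by nesting depth. */
static int close_shorthands(const SketchLexer *lx, int open_shorthands) {
    assert(open_shorthands >= 0);
    int emitted = 0;
    while (open_shorthands > 0) {
        bool valid[TOKEN_TYPE_COUNT] = { [LINE_END] = true };
        if (!scan_line_end(lx, valid)) break;    /* still mid-line */
        emitted++;
        open_shorthands--;
    }
    return emitted;
}
```

With the cursor on the \n of if (a) if (b) c()\nd(), close_shorthands with a depth of 2 emits two terminators without advancing. Once the depth reaches zero, nothing requests LINE_END and the newline is left for extras. In an actual tree-sitter scanner the same checks would presumably read lexer->lookahead and call lexer->eof(lexer) from tree_sitter/parser.h rather than index a string.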
The shorthand rules in grammar.js end with $._line_end; the body
and the optional else alternative are both $.statement, repeat($.statement)
(that is, one or more statements), which allows PICO-8's multi-statement
single-line bodies, e.g. if (falling) wheeee() splat().
The cross-language pattern is "external scanner + valid_symbols-gated
terminator," same as tree-sitter-r (the closest analogue) and
similar in spirit to Ruby's paired _line_break / _no_line_break
hint tokens. Reaching for \s removal or per-rule extras is not
necessary for this style of line-significance; only Python-style
INDENT/DEDENT requires the heavier refactor.
Test coverage
test/corpus/shorthand_line_end.txt exercises:
- Single- and multi-statement shorthand bodies, terminated by \n and by EOF.
- Same-line else (single- and multi-statement alternative).
- The historical dangling-else case (shorthand inside a standard if, with else on a later line; it must bind to the outer if).
- Line comment trailing the shorthand body (the comment is in extras and the trailing \n still triggers LINE_END).
- Shorthand inside a do-block (the \n before the closing end terminates the shorthand cleanly).
- Nested shorthand ifs on the same line (one \n must close both).
- Coexistence with standard if (parenthesized) then ... end: the GLR conflict resolves on whether then follows.