# Known limitations of `tree-sitter-pico8-lua`

This document used to track parse incorrectness around PICO-8's line-significant shorthand `if (cond) ...` / `while (cond) ...` constructs. As of v0.3 the external scanner emits a `LINE_END` token when the parser is at the body-or-terminator decision point of a shorthand statement and the next byte is `\n` / `\r` / EOF, so the body of a shorthand is correctly bounded to its source line.

There are no other known parse-incorrectness issues at this time. Removing this file (or leaving it as a brief stub) is fine once you're confident no documentation links still point at the old limitation sections.

## How line-significance is wired up (for reference)

PICO-8 deviates from standard Lua in two places where a newline is syntactically significant:

- `if (cond) ...`: the consequence (and any same-line `else` alternative) extends to end-of-line, not to a matching `end`.
- `while (cond) ...`: same line-bounded body as the shorthand `if`.

Tree-sitter has no built-in concept of newlines as syntactic tokens when `/\s/` is in `extras` (and we want it there: every other construct treats whitespace transparently). The canonical fix is an **external scanner** that gates a synthetic terminator token on `valid_symbols`. We do exactly that:

- `src/scanner.c` exposes a `LINE_END` external symbol. The scanner looks at the raw lookahead before the lexer has a chance to skip extras, and emits `LINE_END` only when the parser actually expects one (i.e., `valid_symbols[LINE_END] == true`). At any other position, the scanner's `LINE_END` branch returns false, and the `\n` falls through to be eaten silently by the `/\s/` extras pattern.
- `LINE_END` is **zero-width**: the scanner does not consume the newline. This matters for nested shorthands: `if (a) if (b) c()\nd()` has to terminate BOTH shorthands at the same `\n`. With a zero-width terminator, each enclosing shorthand sees the same `\n` in turn and reduces. Once no shorthand is on the stack, `LINE_END` is no longer in `valid_symbols`, the scanner returns false, and the `\n` is consumed by extras. The emit chain is bounded by static nesting depth, so there's no infinite-loop risk despite the zero width.

The shorthand rules in `grammar.js` end with `$._line_end`; the body and the optional `else` alternative are both `$.statement, repeat($.statement)`, allowing PICO-8's multi-statement single-line bodies (`if (falling) wheeee() splat()`).

The cross-language pattern is "external scanner + valid_symbols-gated terminator," same as `tree-sitter-r` (the closest analogue) and similar in spirit to Ruby's paired `_line_break` / `_no_line_break` hint tokens. Reaching for `\s` removal or per-rule extras is **not** necessary for this style of line-significance; only Python-style INDENT/DEDENT requires the heavier refactor.

## Test coverage

`test/corpus/shorthand_line_end.txt` exercises:

- Single- and multi-statement shorthand bodies, terminated by `\n` and by EOF.
- Same-line `else` (single- and multi-statement alternative).
- The historical dangling-else case (shorthand inside a standard `if`, with `else` on a later line, which must bind to the outer `if`).
- Line comment trailing the shorthand body (the comment is in extras and the trailing `\n` still triggers `LINE_END`).
- Shorthand inside a `do`-block (the `\n` before the closing `end` terminates the shorthand cleanly).
- Nested shorthand `if`s on the same line (one `\n` must close both).
- Coexistence with standard `if (parenthesized) then ... end`: the GLR conflict resolves on whether `then` follows.
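
## Appendix: a toy model of the zero-width `LINE_END`

For readers who want to see the zero-width, `valid_symbols`-gated terminator from "How line-significance is wired up" in isolation, here is a minimal sketch in plain C. It is not the contents of `src/scanner.c`: the real scanner talks to tree-sitter through its lexer API, whereas this model replaces that with a hypothetical `ModelLexer` struct and a plain `bool` standing in for `valid_symbols[LINE_END]`. The names `ModelLexer`, `at_line_boundary`, `scan_line_end`, and `close_shorthands` are all invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer state: raw source bytes plus the
   current byte offset. The real scanner reads this through tree-sitter. */
typedef struct {
    const char *src;
    size_t len;
    size_t pos;
} ModelLexer;

/* True when the next byte is '\n' or '\r', or when we are at EOF. */
bool at_line_boundary(const ModelLexer *lx) {
    if (lx->pos >= lx->len) {
        return true; /* EOF also terminates a shorthand body */
    }
    char c = lx->src[lx->pos];
    return c == '\n' || c == '\r';
}

/* Zero-width scan: reports LINE_END without advancing `pos`.
   `line_end_valid` models valid_symbols[LINE_END]; when it is false the
   scan fails and the newline is left for the whitespace extras to skip. */
bool scan_line_end(const ModelLexer *lx, bool line_end_valid) {
    return line_end_valid && at_line_boundary(lx);
}

/* Models `open_shorthands` nested shorthand statements all waiting for a
   terminator at the same offset. Each emitted LINE_END closes one of them.
   Returns how many LINE_END tokens were emitted; `pos` never moves, and the
   loop is bounded by the nesting depth. */
int close_shorthands(const ModelLexer *lx, int open_shorthands) {
    int emitted = 0;
    while (open_shorthands > 0 && scan_line_end(lx, true)) {
        open_shorthands--;
        emitted++;
    }
    return emitted;
}
```

Under this model, positioning the lexer at the `\n` in `if (a) if (b) c()\nd()` (byte offset 17) with two open shorthands yields two `LINE_END` emissions while the offset stays at 17. Once nothing is open, a call with `line_end_valid == false` fails and the newline is left for extras.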