PICO-8's shorthand `if (cond) stmt [else stmt]` is line-bounded, but tree-sitter has no built-in newline awareness. Without an external scanner ( the same mechanism tree-sitter-python uses for INDENT / DEDENT / NEWLINE ), the grammar greedily binds `else` to the nearest `if` and takes only one consequence statement for the shorthand body. Token classification is unaffected, so syntax highlighting renders identically to a correct parse; only auto-indent and semantic selection are subtly off, in a code pattern that is uncommon in real PICO-8 code. New `grammars/pico-8-lua/KNOWN_LIMITATIONS.md` walks through both incorrect cases ( the dangling-else mis-bind and the multi-statement shorthand body ), tabulates which Zed features are and aren't affected, and sketches the fix. README cross-links it from the "Known limitations" block and adds it as a prerequisite to the v0.3 LSP work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Known limitations of tree-sitter-pico8-lua
PICO-8's Lua dialect is line-significant in two places: the body of a
shorthand if (cond) ... / while (cond) ... extends to end-of-line, and
the optional else of a shorthand if must be on the same line as the
opening if. Tree-sitter has no built-in concept of newlines as syntactic
tokens — to encode line-significance correctly we'd need an external
scanner ( a C file that emits synthetic line-end tokens, the same
mechanism tree-sitter-python uses for INDENT/DEDENT/NEWLINE ).
We have intentionally not written that scanner yet. This document tracks the resulting parse incorrectness so it isn't forgotten when we revisit.
1. Dangling-else mis-bind in nested if
-- intended: outer if/else, with shorthand-if as a single statement
-- inside the outer if's consequence.
if is_noisy then
  if (is_goose()) honk()
else
  toot()
end
The grammar's shorthand if rule uses prec.right on its optional else
clause, so it greedily eats any else it can see — matching the
classic "associate else with nearest if" convention from C / Java.
That's wrong for PICO-8, where the line break after honk() should
have closed the shorthand. The bound-too-tight parse:
- `else` is parsed as the shorthand's alternative, not the outer `if`'s.
- The outer `if_statement` ends up with no `else_statement` child.
- The trailing `end` still resolves to the outer `if_statement`, so the source still parses cleanly ( no `ERROR` node ).
Indistinguishable case: the current grammar and a line-aware grammar
produce the same, correct parse here, because the else really is on
the same line as the shorthand:
if is_noisy then
  if (is_goose()) honk() else toot()
end
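The two cases above hinge on a single decision: whether an `else` that follows a shorthand consequence attaches to the shorthand. The toy JavaScript model below illustrates that decision only. It is not the real grammar (which tree-sitter generates), and `bindsElse` and its `{ text, line }` token shape are hypothetical names invented for this sketch. Line numbers count from the `if is_noisy then` line.

```javascript
// Toy model only: tokens are { text, line } objects, not tree-sitter nodes.
// `shorthandLine` is the line on which the shorthand `if (` starts.
function bindsElse(nextToken, shorthandLine, lineAware) {
  if (!nextToken || nextToken.text !== 'else') return false;
  // Greedy (current) behaviour: any `else` that follows is taken.
  if (!lineAware) return true;
  // Line-aware behaviour: only an `else` on the shorthand's own line counts.
  return nextToken.line === shorthandLine;
}

// Case 1: the shorthand is on line 2, `else` is on line 3.
const outOfLineElse = { text: 'else', line: 3 };
// Indistinguishable case: `if (is_goose()) honk() else toot()` on line 2.
const sameLineElse = { text: 'else', line: 2 };

console.log(bindsElse(outOfLineElse, 2, false)); // greedy: binds (the mis-bind)
console.log(bindsElse(outOfLineElse, 2, true));  // line-aware: does not bind
console.log(bindsElse(sameLineElse, 2, false));  // greedy: binds
console.log(bindsElse(sameLineElse, 2, true));   // line-aware: binds as well
```

The two models disagree only when the `else` sits on a later line, which is why the same-line form parses correctly today.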
2. Multi-statement shorthand body
-- both statements are conditional in PICO-8.
if (is_falling()) wheeee() splat()
The grammar's shorthand_if_statement rule takes exactly one
consequence statement, so this parses as:
- `shorthand_if_statement` with consequence `wheeee()`
- followed by an unconditional `splat()` statement
A line-aware grammar would gather every statement up to end-of-line into the shorthand body. Visually:
-- this and the previous example produce the SAME parse tree under
-- the current grammar, which is wrong for the previous example.
if (is_falling()) wheeee()
splat()
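As a rough illustration of the difference, the toy JavaScript model below decides which statements end up in the shorthand body under each behaviour. It is a sketch, not the grammar: `shorthandBody` and the `{ text, line }` statement shape are hypothetical, and real statements would come from the parser rather than a hand-written list.

```javascript
// Toy model only: `stmts` lists the statements that follow the shorthand's
// `(cond)`, in source order, each as { text, line }.
function shorthandBody(stmts, shorthandLine, lineAware) {
  // Current grammar: exactly one consequence statement.
  if (!lineAware) return stmts.slice(0, 1);
  // Line-aware grammar: keep gathering until the line break closes the body.
  const body = [];
  for (const stmt of stmts) {
    if (stmt.line !== shorthandLine) break;
    body.push(stmt);
  }
  return body;
}

const texts = (body) => body.map((stmt) => stmt.text);

// if (is_falling()) wheeee() splat()
const oneLine = [{ text: 'wheeee()', line: 1 }, { text: 'splat()', line: 1 }];
// if (is_falling()) wheeee()
// splat()
const twoLines = [{ text: 'wheeee()', line: 1 }, { text: 'splat()', line: 2 }];

console.log(texts(shorthandBody(oneLine, 1, false)));  // wheeee() only (wrong)
console.log(texts(shorthandBody(oneLine, 1, true)));   // wheeee() and splat()
console.log(texts(shorthandBody(twoLines, 1, false))); // wheeee() only
console.log(texts(shorthandBody(twoLines, 1, true)));  // wheeee() only
```

Under the current behaviour both inputs give the same body, which mirrors the "same parse tree" observation above.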
What does this break?
The parse is structurally wrong but token classification stays correct, because every keyword and identifier is still itself regardless of which parent node owns it. So:
| Feature | Affected? | Notes |
|---|---|---|
| `highlights.scm` ( syntax highlighting ) | No | else is @keyword.conditional whether it's a child of shorthand_if_statement or else_statement. |
| `outline.scm` ( file outline ) | No | Doesn't traverse if-bodies. |
| Bracket matching | No | Independent of if/else structure. |
| Injections | No | Independent. |
| `indents.scm` ( auto-indent ) | Subtly | A mis-bound else is inside a shorthand_if_statement, which is not an @indent node; so the next line may land at the wrong indent column. |
| Semantic selection ( "expand selection" ) | Subtly | Cursor on toot() expands to shorthand_if_statement instead of else_statement → outer if_statement. |
| `folds.scm` / `textobjects.scm` | Potentially | Not currently shipped; would inherit the structural bug if we add them. |
| Static analysis / LSP-style features | Yes | Anything that walks the AST to reason about reachability or scope ( e.g. "unreachable code", goto-definition through a conditional branch ) will mis-report. None of this is shipped today. |
For v0.2's stated scope ( syntax highlighting + a basic outline ), the visible symptom is "auto-indent occasionally off by one column inside a nested-if-with-out-of-line-else", which only bites a relatively uncommon code pattern. Deferred until v0.3 LSP work, which needs a correct AST.
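To see why token-level features survive while structure-level features do not, the JavaScript sketch below compares two hand-written trees for example 1. The trees are hypothetical: node names follow this document, but the exact child layout of the real grammar is assumed, and `leaf`, `node`, `leaves` and `ancestors` are helpers invented for this illustration.

```javascript
// Hypothetical, hand-written trees; not output from the real parser.
const leaf = (text) => ({ type: text, children: [] });
const node = (type, ...children) => ({ type, children });

// Current (greedy) parse: `else toot()` belongs to the shorthand.
const greedy = node('if_statement',
  leaf('if'), leaf('is_noisy'), leaf('then'),
  node('shorthand_if_statement',
    leaf('if'), leaf('('), leaf('is_goose()'), leaf(')'), leaf('honk()'),
    leaf('else'), leaf('toot()')),
  leaf('end'));

// Line-aware parse: `else toot()` belongs to the outer if.
const lineAware = node('if_statement',
  leaf('if'), leaf('is_noisy'), leaf('then'),
  node('shorthand_if_statement',
    leaf('if'), leaf('('), leaf('is_goose()'), leaf(')'), leaf('honk()')),
  node('else_statement', leaf('else'), leaf('toot()')),
  leaf('end'));

// Leaf tokens in source order, ignoring which parent owns them.
function leaves(tree) {
  if (tree.children.length === 0) return [tree.type];
  return tree.children.flatMap((child) => leaves(child));
}

// Types of the nodes above the first leaf whose text matches, outermost first.
function ancestors(tree, text, path = []) {
  if (tree.children.length === 0) return tree.type === text ? path : null;
  for (const child of tree.children) {
    const found = ancestors(child, text, [...path, tree.type]);
    if (found) return found;
  }
  return null;
}

console.log(leaves(greedy).join(' ') === leaves(lineAware).join(' ')); // true
console.log(ancestors(greedy, 'toot()'));    // ends in shorthand_if_statement
console.log(ancestors(lineAware, 'toot()')); // ends in else_statement
```

The leaf sequence is identical in both trees, which is what highlighting depends on; only the ancestor chain of `toot()` differs, which is what expand-selection and auto-indent depend on.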
Fixing it later
The canonical approach is an external scanner. Sketch:
- Add an `external` symbol like `_logical_line_end` that is emitted at every `\n` not preceded by line-continuation context.
- Make `shorthand_if_statement` take the form `seq('if', '(', expr, ')', stmt, optional(seq(/* not _logical_line_end yet */ 'else', stmt)), $._logical_line_end)`.
- Allow the `shorthand_if_statement` consequence to be `repeat1(stmt)` so a one-line `if (x) a() b()` puts both calls in the shorthand body.
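Put together, the steps above might look roughly like the following rule builder. This is a sketch under assumptions, not the project's grammar: `_expression` and `_statement` are assumed rule names standing in for `expr` and `stmt`, `shorthandIfRule` is a hypothetical helper, and the combinators are passed in as arguments only so the sketch can be exercised outside `tree-sitter generate` (in a real `grammar.js`, `seq`, `optional` and `repeat1` are DSL globals). The else branch is left as a single statement, as in the list.

```javascript
// Sketch only: builds the proposed line-aware shorthand-if rule.
// `$` is the rule-reference object tree-sitter passes to each rule function.
function shorthandIfRule($, { seq, optional, repeat1 }) {
  return seq(
    'if', '(', $._expression, ')',
    repeat1($._statement),                // every statement up to end-of-line
    optional(seq('else', $._statement)),  // an else on the same line only
    $._logical_line_end,                  // produced by the external scanner
  );
}

// How it might be wired into grammar.js (not executed here):
//
//   module.exports = grammar({
//     name: 'pico8_lua',                       // assumed grammar name
//     externals: $ => [$._logical_line_end],
//     rules: {
//       shorthand_if_statement: $ =>
//         shorthandIfRule($, { seq, optional, repeat1 }),
//       // ...remaining rules unchanged
//     },
//   });
```

Because the rule now ends in `_logical_line_end`, an `else` on a following line can no longer be reached from inside the shorthand. The scanner would presumably also need to emit the token at end-of-file so that a shorthand on the last line still closes.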
The scanner needs to be written in C, placed in src/scanner.c, and its
tokens registered via the grammar's externals field. tree-sitter-python's
scanner is a good reference for the pattern.