Document line-significance limitations in the Pico-8 Lua grammar

PICO-8's shorthand `if (cond) stmt [else stmt]` is line-bounded, but
tree-sitter has no built-in newline awareness. Without an external
scanner ( the same mechanism tree-sitter-python uses for
INDENT / DEDENT / NEWLINE ), the grammar greedily binds `else` to the
nearest `if` and takes only one consequence statement for the
shorthand body.

Token classification is unaffected, so syntax highlighting renders
identically to a correct parse; only auto-indent and semantic selection
are subtly off, in a code pattern that is uncommon in real PICO-8 code.

New `grammars/pico-8-lua/KNOWN_LIMITATIONS.md` walks through both
incorrect cases ( the dangling-else mis-bind and the multi-statement
shorthand body ), tabulates which Zed features are and aren't affected,
and sketches the fix. README cross-links it from the "Known
limitations" block and lists the external scanner as a prerequisite
for the v0.3 LSP work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -10,3 +10,6 @@ build/
.tree-sitter/
grammars/p8_cart/
grammars/pico8_lua/

# scratch directory for stuff to show an AI agent or reference in the IDE
.local/

@@ -53,6 +53,20 @@ their PICO-8 work continue to get standard Lua treatment.

### Known limitations

- **Line-significant dialect features are not modeled.** PICO-8's
  shorthand `if (cond) stmt [else stmt]` is line-bounded: a
  later-line `else` belongs to an enclosing standard `if`, not the
  shorthand. Without an external scanner, the grammar can't see
  newlines, so it greedily binds `else` to the nearest `if` ( the
  C / Java convention ) and treats a multi-statement single-line
  shorthand body as one statement plus a sequence of unconditional
  follow-ups. The parse is structurally wrong but **tokens still
  classify correctly**, so syntax highlighting renders identically
  to a correct parse; only auto-indent and semantic selection are
  subtly affected. Full write-up:
  [`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
  Slated for v0.3 work alongside LSP integration ( which needs a
  correct AST ).
- **No language server.** No completion, hover docs, or diagnostics for
  PICO-8 builtins yet, only a static `function.builtin` highlight on
  recognized names. See Roadmap.

@@ -163,7 +177,13 @@ fallback `unknown_section` rule.

### v0.3 — Language server integration

Prerequisite: an external scanner for `tree-sitter-pico8-lua` so the
shorthand-if and shorthand-while bodies are line-bounded the way PICO-8
defines them. See [`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
LSP features that walk the AST ( unreachable-code lint, goto-definition
through a conditional branch ) need correct structure.

Then wire up [`japhib/pico8-ls`](https://github.com/japhib/pico8-ls) ( or whichever
PICO-8 LSP is most maintained at the time ) for:

- Completion of PICO-8 builtins ( `spr`, `circfill`, `btn`, `flr`, … ).

@@ -0,0 +1,106 @@
# Known limitations of `tree-sitter-pico8-lua`

PICO-8's Lua dialect is **line-significant** in two places: the body of a
shorthand `if (cond) ...` / `while (cond) ...` extends to end-of-line, and
the optional `else` of a shorthand `if` must be on the same line as the
opening `if`. Tree-sitter has no built-in concept of newlines as syntactic
tokens; by default they are skipped along with other whitespace. To encode
line-significance correctly we'd need an **external scanner** ( a C file
that emits synthetic line-end tokens, the same mechanism
`tree-sitter-python` uses for `INDENT`/`DEDENT`/`NEWLINE` ).

We have intentionally not written that scanner yet. This document tracks
the resulting incorrect parses so they aren't forgotten when we revisit.

## 1. Dangling-`else` mis-bind in nested `if`

```lua
-- intended: outer if/else, with shorthand-if as a single statement
-- inside the outer if's consequence.
if is_noisy then
  if (is_goose()) honk()
else
  toot()
end
```

The grammar's shorthand `if` rule uses `prec.right` on its optional `else`
clause, so it greedily eats any `else` it can see, matching the
classic "associate else with nearest if" convention from C / Java.
That's wrong for PICO-8, where the line break after `honk()` should
have closed the shorthand. The resulting parse binds too tightly:

- `else` is parsed as the shorthand's alternative, not the outer if's.
- The outer `if_statement` ends up with no `else_statement` child.
- The trailing `end` still resolves to the outer `if_statement`,
  so the source still parses cleanly ( no `ERROR` node ).

**Indistinguishable case**: the current grammar and a line-aware grammar
produce the same tree here, and it is the correct one, because the
`else` really is on the same line as the shorthand:

```lua
if is_noisy then
  if (is_goose()) honk() else toot()
end
```

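The difference between the two binding rules can be sketched with a small stand-alone model. This is not the tree-sitter grammar and not PICO-8's own parser: `bindElse` is a hypothetical helper that only tracks which open `if` each `else` attaches to, and it assumes that an `if` is shorthand when its line has no `then`, that every `end` closes an `if`, and that keywords never appear inside strings.

```javascript
// Toy model: NOT the tree-sitter grammar and NOT PICO-8's own parser.
// Assumptions: an `if` is shorthand when its line contains no `then`,
// every `end` closes an `if`, and keywords never appear inside strings.
function bindElse(source, lineAware) {
  const open = [];     // stack of `if`s that could still take an `else`
  const bindings = []; // one record per `else`, in source order
  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    if (lineAware) {
      // A line break closes every shorthand `if` opened on an earlier line.
      while (open.length > 0 && open[open.length - 1].shorthand &&
             open[open.length - 1].line < line) {
        open.pop();
      }
    }
    const words = text.replace(/--.*$/, "").match(/[A-Za-z_]\w*/g) || [];
    const hasThen = words.includes("then");
    for (const word of words) {
      if (word === "if") {
        open.push({ line, shorthand: !hasThen });
      } else if (word === "else" && open.length > 0) {
        const owner = open[open.length - 1]; // nearest open `if`
        bindings.push({ elseLine: line, ifLine: owner.line, shorthand: owner.shorthand });
        if (owner.shorthand) open.pop();     // a shorthand takes at most one `else`
      } else if (word === "end") {
        // `end` closes the nearest standard `if` and any shorthand above it.
        while (open.length > 0 && open[open.length - 1].shorthand) open.pop();
        open.pop();
      }
    }
  });
  return bindings;
}

const nested = [
  "if is_noisy then",
  "  if (is_goose()) honk()",
  "else",
  "  toot()",
  "end",
].join("\n");

console.log(bindElse(nested, false)); // greedy: models the current grammar
console.log(bindElse(nested, true));  // line-aware: models PICO-8's rule
```

In greedy mode the `else` on line 3 is attributed to the shorthand `if` on line 2; in line-aware mode it goes to the standard `if` on line 1. For the same-line variant both modes report the same owner, which is why that case is indistinguishable.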
## 2. Multi-statement shorthand body

```lua
-- both statements are conditional in PICO-8.
if (is_falling()) wheeee() splat()
```

The grammar's `shorthand_if_statement` rule takes exactly one
consequence statement, so this parses as:

- `shorthand_if_statement` with consequence `wheeee()`
- followed by an unconditional `splat()` statement

A line-aware grammar would gather every statement up to end-of-line
into the shorthand body. Visually:

```lua
-- this and the previous example produce the SAME parse tree under
-- the current grammar, which is wrong for the previous example.
if (is_falling()) wheeee()
splat()
```

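One way to see what "the body extends to end-of-line" means is to rewrite a shorthand line into long-form Lua, where the `end` lands at the line break. The sketch below is a toy under stated assumptions, not PICO-8's actual implementation: `expandShorthand` is a hypothetical helper, it handles a single line, it assumes no parentheses or `--` inside string literals, and it treats a later `then` / `do` on the line as a sign that the statement is already long-form.

```javascript
// Toy rewrite of ONE shorthand line into long-form Lua. Not PICO-8's real
// implementation. Assumptions: no parentheses or `--` inside string literals,
// and a `then` / `do` later on the line means the line is already long-form.
function expandShorthand(line) {
  const match = /^(\s*)(if|while)\s*\(/.exec(line);
  if (!match) return line; // not a shorthand candidate
  const [prefix, indent, keyword] = match;

  // Walk forward to the parenthesis that closes the condition.
  let depth = 1;
  let i = prefix.length;
  while (i < line.length && depth > 0) {
    if (line[i] === "(") depth += 1;
    else if (line[i] === ")") depth -= 1;
    i += 1;
  }
  if (depth !== 0) return line; // unbalanced parentheses: leave untouched

  const condition = line.slice(prefix.length, i - 1).trim();
  const body = line.slice(i).replace(/--.*$/, "").trim();
  const opener = keyword === "if" ? "then" : "do";
  if (body === "" || new RegExp(`\\b${opener}\\b`).test(body)) return line;

  // Everything up to end-of-line is the body, however many statements it holds.
  return `${indent}${keyword} ${condition} ${opener} ${body} end`;
}

console.log(expandShorthand("if (is_falling()) wheeee() splat()"));
// prints: if is_falling() then wheeee() splat() end
```

Both calls end up inside the `then ... end` block, which is the structure a line-aware grammar would need to produce; the current grammar instead leaves `splat()` outside the conditional.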
## What does this break?

The parse is structurally wrong but **token classification stays
correct**, because every keyword and identifier is still itself
regardless of which parent node owns it. So:

| Feature | Affected? | Notes |
|---|---|---|
| `highlights.scm` ( syntax highlighting ) | No | `else` is `@keyword.conditional` whether it's a child of `shorthand_if_statement` or `else_statement`. |
| `outline.scm` ( file outline ) | No | Doesn't traverse if-bodies. |
| Bracket matching | No | Independent of if/else structure. |
| Injections | No | Independent. |
| `indents.scm` ( auto-indent ) | Subtly | A mis-bound `else` is inside a `shorthand_if_statement`, which is not an `@indent` node; so the next line may land at the wrong indent column. |
| Semantic selection ( "expand selection" ) | Subtly | Cursor on `toot()` expands to `shorthand_if_statement` instead of `else_statement` → outer `if_statement`. |
| `folds.scm` / `textobjects.scm` | Potentially | Not currently shipped; would inherit the structural bug if we add them. |
| Static analysis / LSP-style features | Yes | Anything that walks the AST to reason about reachability or scope ( e.g. "unreachable code", goto-definition through a conditional branch ) will mis-report. None of this is shipped today. |

For v0.2's stated scope ( syntax highlighting + a basic outline ), the
visible symptom is "auto-indent occasionally off by one column inside a
nested-if-with-out-of-line-else", which only bites a relatively
uncommon code pattern. The fix is deferred until the v0.3 LSP work,
which needs a correct AST.

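The reasoning behind the "No" and "Subtly" rows can be checked on hand-built stand-ins for the two trees of the nested example in section 1. This is only an illustration: the trees are written by hand rather than produced by the parser, and apart from `if_statement`, `shorthand_if_statement` and `else_statement` the flat leaf layout is a simplifying assumption.

```javascript
// Hand-built stand-ins for the two trees of the nested example in section 1.
// Only `if_statement`, `shorthand_if_statement` and `else_statement` come from
// the grammar; the flat leaf layout is a simplification.
const node = (type, ...children) => ({ type, children });
const toks = (text) => text.split(" ").map((t) => ({ text: t }));

const shorthandOnly = toks("if ( is_goose ( ) ) honk ( )");
const elseBranch = toks("else toot ( )");

// Current grammar: the shorthand swallows the later-line `else`.
const misbound = node("if_statement",
  ...toks("if is_noisy then"),
  node("shorthand_if_statement", ...shorthandOnly, ...elseBranch),
  ...toks("end"));

// Line-aware grammar: the `else` belongs to the outer `if`.
const lineAware = node("if_statement",
  ...toks("if is_noisy then"),
  node("shorthand_if_statement", ...shorthandOnly),
  node("else_statement", ...elseBranch),
  ...toks("end"));

// Leaf tokens in source order: what a token-level highlight query sees.
const leaves = (n) => (n.children ? n.children.flatMap(leaves) : [n.text]);

// Type of the node that directly owns the first leaf with the given text.
function parentTypeOf(tree, text) {
  for (const child of tree.children || []) {
    if (child.text === text) return tree.type;
    const found = parentTypeOf(child, text);
    if (found) return found;
  }
  return null;
}

console.log(leaves(misbound).join(" ") === leaves(lineAware).join(" "));
console.log(parentTypeOf(misbound, "else"), parentTypeOf(lineAware, "else"));
```

The leaf sequences are identical, so anything keyed on tokens alone behaves the same under either tree. Only the owner of `else` ( and of `toot` ) changes, which is exactly what indent and selection queries look at.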
## Fixing it later

The canonical approach is an external scanner. Sketch:

1. Add an external token like `_logical_line_end` that the scanner emits
   at every `\n` *not* preceded by line-continuation context.
2. Make `shorthand_if_statement` take the form
   `seq('if', '(', expr, ')', stmt, optional(seq('else', stmt)), $._logical_line_end)`,
   so that an `else` is only accepted before `_logical_line_end`.
3. Allow `shorthand_if_statement` consequence to be `repeat1(stmt)` so a
   one-line `if (x) a() b()` puts both calls in the shorthand body.

The scanner needs to be written in C, declared in the grammar's
`externals` field, and placed in `src/scanner.c`. `tree-sitter-python`'s
scanner is a good reference for the pattern.
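The three steps above might combine into grammar fragments along the following lines. This is a sketch under assumptions, not the project's grammar: `_expression` and `_statement` are placeholder rule names, and the first four definitions are stand-ins that let the fragment run under plain Node ( in a real `grammar.js` the tree-sitter CLI supplies `seq`, `optional`, `repeat1` and the `$` argument, so the stand-ins would be dropped ).

```javascript
// Stand-ins so the fragment runs under plain Node. In a real grammar.js the
// tree-sitter CLI supplies `seq`, `optional`, `repeat1` and the `$` argument.
const seq = (...members) => ({ type: "SEQ", members });
const optional = (content) => ({ type: "OPTIONAL", content });
const repeat1 = (content) => ({ type: "REPEAT1", content });
const $ = new Proxy({}, { get: (_, name) => ({ type: "SYMBOL", name }) });

// Hypothetical fragments to merge into grammar.js; `_expression` and
// `_statement` are placeholder rule names.
const externals = ($) => [$._logical_line_end]; // step 1

const rules = {
  shorthand_if_statement: ($) => seq(
    "if", "(", $._expression, ")",
    repeat1($._statement),               // step 3: every statement on the line
    optional(seq("else", $._statement)), // step 2: `else` only before the line end
    $._logical_line_end,                 // emitted by the external scanner
  ),
};

const rule = rules.shorthand_if_statement($);
console.log(JSON.stringify(rule, null, 2));
```

Because `_logical_line_end` is the last member of the sequence, an `else` on a later line can no longer be absorbed by the shorthand: the rule has already been closed by the time the parser reaches it. The C scanner that actually produces the token is not shown here.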