diff --git a/.gitignore b/.gitignore
index 778c74b..813885a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,6 @@ build/
 .tree-sitter/
 grammars/p8_cart/
 grammars/pico8_lua/
+
+# scratch directory for stuff to show an AI agent or reference in the IDE
+.local/
diff --git a/README.md b/README.md
index 75e3eeb..9e58f00 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,20 @@ their PICO-8 work continue to get standard Lua treatment.
 
 ### Known limitations
 
+- **Line-significant dialect features are not modeled.** PICO-8's
+  shorthand `if (cond) stmt [else stmt]` is line-bounded — a
+  later-line `else` belongs to an enclosing standard `if`, not the
+  shorthand. Without an external scanner, the grammar can't see
+  newlines, so it greedily binds `else` to the nearest `if` ( the
+  C / Java convention ) and treats a multi-statement single-line
+  shorthand body as one statement plus a sequence of unconditional
+  follow-ups. The parse is structurally wrong but **tokens still
+  classify correctly**, so syntax highlighting renders identically
+  to a correct parse; only auto-indent and semantic selection are
+  subtly affected. Full write-up:
+  [`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
+  Slated for v0.3 work alongside LSP integration ( which needs a
+  correct AST ).
 - **No language server.** No completion, hover docs, or diagnostics
   for PICO-8 builtins yet — only a static `function.builtin` highlight
   on recognized names. See Roadmap.
@@ -163,7 +177,13 @@ fallback `unknown_section` rule.
 
 ### v0.3 — Language server integration
 
-Wire up [`japhib/pico8-ls`](https://github.com/japhib/pico8-ls) ( or whichever
+Prerequisite: an external scanner for `tree-sitter-pico8-lua` so the
+shorthand-if and shorthand-while bodies are line-bounded the way PICO-8
+defines them. See [`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
+LSP features that walk the AST ( unreachable-code lint, goto-definition
+through a conditional branch ) need correct structure.
+
+Then wire up [`japhib/pico8-ls`](https://github.com/japhib/pico8-ls) ( or whichever
 PICO-8 LSP is most maintained at the time ) for:
 
 - Completion of PICO-8 builtins ( `spr`, `circfill`, `btn`, `flr`, … ).
diff --git a/grammars/pico-8-lua/KNOWN_LIMITATIONS.md b/grammars/pico-8-lua/KNOWN_LIMITATIONS.md
new file mode 100644
index 0000000..aca73a8
--- /dev/null
+++ b/grammars/pico-8-lua/KNOWN_LIMITATIONS.md
@@ -0,0 +1,106 @@
+# Known limitations of `tree-sitter-pico8-lua`
+
+PICO-8's Lua dialect is **line-significant** in two places: the body of a
+shorthand `if (cond) ...` / `while (cond) ...` extends to end-of-line, and
+the optional `else` of a shorthand `if` must be on the same line as the
+opening `if`. Tree-sitter has no built-in concept of newlines as syntactic
+tokens — to encode line-significance correctly we'd need an **external
+scanner** ( a C file that emits synthetic line-end tokens, the same
+mechanism `tree-sitter-python` uses for `INDENT`/`DEDENT`/`NEWLINE` ).
+
+We have intentionally not written that scanner yet. This document tracks
+the resulting parse incorrectness so it isn't forgotten when we revisit.
+
+## 1. Dangling-`else` mis-bind in nested `if`
+
+```lua
+-- intended: outer if/else, with shorthand-if as a single statement
+-- inside the outer if's consequence.
+if is_noisy then
+  if (is_goose()) honk()
+else
+  toot()
+end
+```
+
+The grammar's shorthand `if` rule uses `prec.right` on its optional `else`
+clause, so it greedily eats any `else` it can see — matching the
+classic "associate else with nearest if" convention from C / Java.
+That's wrong for PICO-8, where the line break after `honk()` should
+have closed the shorthand. The bound-too-tight parse:
+
+- `else` is parsed as the shorthand's alternative, not the outer if's.
+- The outer `if_statement` ends up with no `else_statement` child.
+- The trailing `end` still resolves to the outer `if_statement`,
+  so the source still parses cleanly ( no `ERROR` node ).
+
+**Indistinguishable case** — both parses are correct here, because the
+`else` really is on the same line as the shorthand:
+
+```lua
+if is_noisy then
+  if (is_goose()) honk() else toot()
+end
+```
+
+## 2. Multi-statement shorthand body
+
+```lua
+-- both statements are conditional in PICO-8.
+if (is_falling()) wheeee() splat()
+```
+
+The grammar's `shorthand_if_statement` rule takes exactly one
+consequence statement, so this parses as:
+
+- `shorthand_if_statement` with consequence `wheeee()`
+- followed by an unconditional `splat()` statement
+
+A line-aware grammar would gather every statement up to end-of-line
+into the shorthand body. Visually:
+
+```lua
+-- this and the previous example produce the SAME parse tree under
+-- the current grammar, which is wrong for the previous example.
+if (is_falling()) wheeee()
+splat()
+```
+
+## What does this break?
+
+The parse is structurally wrong but **token classification stays
+correct**, because every keyword and identifier is still itself
+regardless of which parent node owns it. So:
+
+| Feature | Affected? | Notes |
+|---|---|---|
+| `highlights.scm` ( syntax highlighting ) | No | `else` is `@keyword.conditional` whether it's a child of `shorthand_if_statement` or `else_statement`. |
+| `outline.scm` ( file outline ) | No | Doesn't traverse if-bodies. |
+| Bracket matching | No | Independent of if/else structure. |
+| Injections | No | Independent. |
+| `indents.scm` ( auto-indent ) | Subtly | A mis-bound `else` is inside a `shorthand_if_statement`, which is not an `@indent` node; so the next line may land at the wrong indent column. |
+| Semantic selection ( "expand selection" ) | Subtly | Cursor on `toot()` expands to `shorthand_if_statement` instead of `else_statement` → outer `if_statement`. |
+| `folds.scm` / `textobjects.scm` | Potentially | Not currently shipped; would inherit the structural bug if we add them. |
+| Static analysis / LSP-style features | Yes | Anything that walks the AST to reason about reachability or scope ( e.g. "unreachable code", goto-definition through a conditional branch ) will mis-report. None of this is shipped today. |
+
+For v0.2's stated scope ( syntax highlighting + a basic outline ), the
+visible symptom is "auto-indent occasionally off by one column inside a
+nested-if-with-out-of-line-else", which only bites a relatively
+uncommon code pattern. Deferred until v0.3 LSP work, which needs a
+correct AST.
+
+## Fixing it later
+
+The canonical approach is an external scanner. Sketch:
+
+1. Add an `external` symbol like `_logical_line_end` that emits at every
+   `\n` *not* preceded by line-continuation context.
+2. Make `shorthand_if_statement` take the form
+   `seq('if', '(', expr, ')', stmt, optional(seq(\
+   /* not _logical_line_end yet */ 'else', stmt)), $._logical_line_end)`.
+3. Allow `shorthand_if_statement` consequence to be `repeat1(stmt)` so a
+   one-line `if (x) a() b()` puts both calls in the shorthand body.
+
+The scanner needs to be written in C, registered via the `externals`
+field, and built into `src/scanner.c`. `tree-sitter-python`'s scanner is
+a good reference for the pattern.
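
The line-bounding rule that `KNOWN_LIMITATIONS.md` describes could be pinned down with a small reference oracle before any scanner exists. The sketch below is a toy in plain JavaScript (the language of `grammar.js`). It is not part of the patch and does not use any tree-sitter API. `shorthandBodyStart` and `elseOwners` are hypothetical helper names, and the sketch assumes that no parentheses or `else` keywords hide inside strings or comments.

```javascript
// Hypothetical helper: if `line` opens a PICO-8 shorthand `if (cond) ...`,
// return the index just past the closing paren of the condition;
// otherwise return -1. Assumes no parens inside strings or comments.
function shorthandBodyStart(line) {
  const open = /^\s*if\s*\(/.exec(line);
  if (!open) return -1;

  // Walk forward to the paren that balances the opening one.
  let depth = 1;
  let i = open[0].length;
  for (; i < line.length && depth > 0; i++) {
    if (line[i] === "(") depth++;
    else if (line[i] === ")") depth--;
  }
  if (depth !== 0) return -1; // condition does not close on this line

  // `if (cond) then` is a standard if with a parenthesized condition.
  if (/^\s*then\b/.test(line.slice(i))) return -1;
  return i;
}

// Hypothetical helper: for every line containing an `else` keyword, report
// whether a line-aware parse should attach it to a shorthand `if` opened
// on the same line, or to some enclosing standard `if`.
function elseOwners(source) {
  const elseKeyword = /\belse\b/; // does not match `elseif`
  return source.split("\n").flatMap((line, index) => {
    if (!elseKeyword.test(line)) return [];
    const bodyStart = shorthandBodyStart(line);
    const sameLine = bodyStart >= 0 && elseKeyword.test(line.slice(bodyStart));
    return [{ line: index + 1, owner: sameLine ? "shorthand" : "enclosing" }];
  });
}

const outOfLineElse = [
  "if is_noisy then",
  "  if (is_goose()) honk()",
  "else",
  "  toot()",
  "end",
].join("\n");

console.log(elseOwners(outOfLineElse));
```

Run against the section 1 example, this reports the `else` on line 3 as owned by the enclosing `if`, while the same-line variant `if (is_goose()) honk() else toot()` is reported as `shorthand`. An oracle along these lines could back a regression test for the eventual external scanner, since it encodes the intended binding independently of the grammar.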