parse EOL as a token
@@ -8,7 +8,7 @@ standard Lua 5.2 (compound assignments, `?` print shorthand, single-line
 operators `@`/`%`/`$`, the rotate/logical-shift family `<<>` / `>><` / `>>>`,
 and `#include`).
 
-## Status — v0.2
+## Status — v0.3 (unreleased)
 
 Working today:
 
@@ -51,22 +51,22 @@ their PICO-8 work continue to get standard Lua treatment.
 | `#include path` directive | ✓ |
 | `_init` / `_update` / `_update60` / `_draw` highlighted as builtins | ✓ |
 
+### Line-significance (resolved in v0.3)
+
+PICO-8's shorthand `if (cond) ...` and `while (cond) ...` are
+line-bounded: a later-line `else` belongs to an enclosing standard
+`if`, not the shorthand, and a multi-statement single-line shorthand
+body collects every statement on the line. The external scanner emits
+a zero-width `LINE_END` token at `\n` / `\r` / EOF when (and only
+when) the parser is at the body-or-terminator decision point of a
+shorthand statement, so the AST now matches PICO-8 semantics — see
+[`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md)
+for the wiring detail and
+[`grammars/pico-8-lua/test/corpus/shorthand_line_end.txt`](grammars/pico-8-lua/test/corpus/shorthand_line_end.txt)
+for the test corpus.
+
 ### Known limitations
 
-- **Line-significant dialect features are not modeled.** PICO-8's
-shorthand `if (cond) stmt [else stmt]` is line-bounded — a
-later-line `else` belongs to an enclosing standard `if`, not the
-shorthand. Without an external scanner, the grammar can't see
-newlines, so it greedily binds `else` to the nearest `if` ( the
-C / Java convention ) and treats a multi-statement single-line
-shorthand body as one statement plus a sequence of unconditional
-follow-ups. The parse is structurally wrong but **tokens still
-classify correctly**, so syntax highlighting renders identically
-to a correct parse; only auto-indent and semantic selection are
-subtly affected. Full write-up:
-[`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
-Slated for v0.3 work alongside LSP integration ( which needs a
-correct AST ).
 - **No language server.** No completion, hover docs, or diagnostics for
 PICO-8 builtins yet — only a static `function.builtin` highlight on
 recognized names. See Roadmap.
@@ -173,17 +173,22 @@ a Lua identifier resembles a section marker ( e.g. `local __foo__ = 1`
 must remain a `line`, not be re-tokenized as a marker ), and the
 fallback `unknown_section` rule.
 
+The Lua grammar has a corpus under `grammars/pico-8-lua/test/corpus/` —
+run `( cd grammars/pico-8-lua && npx tree-sitter test )`. The corpus
+exercises shorthand `if`/`while` line-end behavior: dangling-else,
+multi-statement bodies, EOF termination, nested same-line shorthands,
+and coexistence with standard `if (parenthesized) then ... end`.
+
 ## Roadmap
 
 ### v0.3 — Language server integration
 
-Prerequisite: an external scanner for `tree-sitter-pico8-lua` so the
-shorthand-if and shorthand-while bodies are line-bounded the way PICO-8
-defines them. See [`grammars/pico-8-lua/KNOWN_LIMITATIONS.md`](grammars/pico-8-lua/KNOWN_LIMITATIONS.md).
-LSP features that walk the AST ( unreachable-code lint, goto-definition
-through a conditional branch ) need correct structure.
+The line-significance prerequisite is now satisfied (see *Line-significance*
+above), so LSP features that walk the AST — unreachable-code lint,
+goto-definition through a conditional branch — have a correct structure
+to work against.
 
-Then wire up [`japhib/pico8-ls`](https://github.com/japhib/pico8-ls) ( or whichever
+Wire up [`japhib/pico8-ls`](https://github.com/japhib/pico8-ls) ( or whichever
 PICO-8 LSP is most maintained at the time ) for:
 
 - Completion of PICO-8 builtins ( `spr`, `circfill`, `btn`, `flr`, … ).
@@ -1,106 +1,73 @@
 # Known limitations of `tree-sitter-pico8-lua`
 
-PICO-8's Lua dialect is **line-significant** in two places: the body of a
-shorthand `if (cond) ...` / `while (cond) ...` extends to end-of-line, and
-the optional `else` of a shorthand `if` must be on the same line as the
-opening `if`. Tree-sitter has no built-in concept of newlines as syntactic
-tokens — to encode line-significance correctly we'd need an **external
-scanner** ( a C file that emits synthetic line-end tokens, the same
-mechanism `tree-sitter-python` uses for `INDENT`/`DEDENT`/`NEWLINE` ).
+This document used to track parse incorrectness around PICO-8's
+line-significant shorthand `if (cond) ...` / `while (cond) ...`
+constructs. As of v0.3 the external scanner emits a `LINE_END` token
+when the parser is at the body-or-terminator decision point of a
+shorthand statement and the next byte is `\n` / `\r` / EOF, so the body
+of a shorthand is correctly bounded to its source line.
 
-We have intentionally not written that scanner yet. This document tracks
-the resulting parse incorrectness so it isn't forgotten when we revisit.
+There are no other known parse-incorrectness issues at this time.
+Removing this file (or leaving it as a brief stub) is fine once you're
+confident no documentation links still point at the old limitation
+sections.
 
-## 1. Dangling-`else` mis-bind in nested `if`
+## How line-significance is wired up (for reference)
 
-```lua
--- intended: outer if/else, with shorthand-if as a single statement
--- inside the outer if's consequence.
-if is_noisy then
-if (is_goose()) honk()
-else
-toot()
-end
-```
+PICO-8 deviates from standard Lua in two places where a newline is
+syntactically significant:
 
-The grammar's shorthand `if` rule uses `prec.right` on its optional `else`
-clause, so it greedily eats any `else` it can see — matching the
-classic "associate else with nearest if" convention from C / Java.
-That's wrong for PICO-8, where the line break after `honk()` should
-have closed the shorthand. The bound-too-tight parse:
+- `if (cond) <stmts...>` — the consequence (and any same-line `else`
+alternative) extends to end-of-line, not to a matching `end`.
+- `while (cond) <stmts...>` — same line-bounded body as the
+shorthand `if`.
 
-- `else` is parsed as the shorthand's alternative, not the outer if's.
-- The outer `if_statement` ends up with no `else_statement` child.
-- The trailing `end` still resolves to the outer `if_statement`,
-so the source still parses cleanly ( no `ERROR` node ).
+Tree-sitter has no built-in concept of newlines as syntactic tokens
+when `/\s/` is in `extras` (and we want it there: every other
+construct treats whitespace transparently). The canonical fix is an
+**external scanner** that gates a synthetic terminator token on
+`valid_symbols`. We do exactly that:
 
-**Indistinguishable case** — both parses are correct here, because the
-`else` really is on the same line as the shorthand:
+- `src/scanner.c` exposes a `LINE_END` external symbol. The scanner
+looks at the raw lookahead before the lexer has a chance to skip
+extras, and emits `LINE_END` only when the parser actually expects
+one (i.e., `valid_symbols[LINE_END] == true`). At any other
+position, the scanner's LINE_END branch returns false, and the `\n`
+falls through to be eaten silently by the `/\s/` extras pattern.
+- `LINE_END` is **zero-width** — the scanner does not consume the
+newline. This matters for nested shorthands: `if (a) if (b) c()\nd()`
+has to terminate BOTH shorthands at the same `\n`. With a zero-width
+terminator, each enclosing shorthand sees the same `\n` in turn and
+reduces. Once no shorthand is on the stack, `LINE_END` is no longer
+in `valid_symbols`, the scanner returns false, and the `\n` is
+consumed by extras. The emit chain is bounded by static nesting
+depth, so there's no infinite-loop risk despite the zero width.
 
-```lua
-if is_noisy then
-if (is_goose()) honk() else toot()
-end
-```
+The shorthand rules in `grammar.js` end with `$._line_end`; the body
+and the optional `else` alternative are both `$.statement, repeat($.statement)`,
+allowing PICO-8's multi-statement single-line bodies
+(`if (falling) wheeee() splat()`).
 
-## 2. Multi-statement shorthand body
+The cross-language pattern is "external scanner + valid_symbols-gated
+terminator," same as `tree-sitter-r` (the closest analogue) and
+similar in spirit to Ruby's paired `_line_break` / `_no_line_break`
+hint tokens. Reaching for `\s` removal or per-rule extras is **not**
+necessary for this style of line-significance; only Python-style
+INDENT/DEDENT requires the heavier refactor.
 
-```lua
--- both statements are conditional in PICO-8.
-if (is_falling()) wheeee() splat()
-```
+## Test coverage
 
-The grammar's `shorthand_if_statement` rule takes exactly one
-consequence statement, so this parses as:
+`test/corpus/shorthand_line_end.txt` exercises:
 
-- `shorthand_if_statement` with consequence `wheeee()`
-- followed by an unconditional `splat()` statement
-
-A line-aware grammar would gather every statement up to end-of-line
-into the shorthand body. Visually:
-
-```lua
--- this and the previous example produce the SAME parse tree under
--- the current grammar, which is wrong for the previous example.
-if (is_falling()) wheeee()
-splat()
-```
+- Single- and multi-statement shorthand bodies, terminated by `\n` and
+by EOF.
+- Same-line `else` (single- and multi-statement alternative).
+- The historical dangling-else case (shorthand inside a standard `if`,
+with `else` on a later line — must bind to the outer `if`).
+- Line comment trailing the shorthand body (the comment is in extras
+and the trailing `\n` still triggers `LINE_END`).
+- Shorthand inside a `do`-block (the `\n` before the closing `end`
+terminates the shorthand cleanly).
+- Nested shorthand `if`s on the same line (one `\n` must close both).
+- Coexistence with standard `if (parenthesized) then ... end` — the
+GLR conflict resolves on whether `then` follows.
-
-## What does this break?
-
-The parse is structurally wrong but **token classification stays
-correct**, because every keyword and identifier is still itself
-regardless of which parent node owns it. So:
-
-| Feature | Affected? | Notes |
-|---|---|---|
-| `highlights.scm` ( syntax highlighting ) | No | `else` is `@keyword.conditional` whether it's a child of `shorthand_if_statement` or `else_statement`. |
-| `outline.scm` ( file outline ) | No | Doesn't traverse if-bodies. |
-| Bracket matching | No | Independent of if/else structure. |
-| Injections | No | Independent. |
-| `indents.scm` ( auto-indent ) | Subtly | A mis-bound `else` is inside a `shorthand_if_statement`, which is not an `@indent` node; so the next line may land at the wrong indent column. |
-| Semantic selection ( "expand selection" ) | Subtly | Cursor on `toot()` expands to `shorthand_if_statement` instead of `else_statement` → outer `if_statement`. |
-| `folds.scm` / `textobjects.scm` | Potentially | Not currently shipped; would inherit the structural bug if we add them. |
-| Static analysis / LSP-style features | Yes | Anything that walks the AST to reason about reachability or scope ( e.g. "unreachable code", goto-definition through a conditional branch ) will mis-report. None of this is shipped today. |
-
-For v0.2's stated scope ( syntax highlighting + a basic outline ), the
-visible symptom is "auto-indent occasionally off by one column inside a
-nested-if-with-out-of-line-else", which only bites a relatively
-uncommon code pattern. Deferred until v0.3 LSP work, which needs a
-correct AST.
-
-## Fixing it later
-
-The canonical approach is an external scanner. Sketch:
-
-1. Add an `external` symbol like `_logical_line_end` that emits at every
-`\n` *not* preceded by line-continuation context.
-2. Make `shorthand_if_statement` take the form
-`seq('if', '(', expr, ')', stmt, optional(seq(\
-/* not _logical_line_end yet */ 'else', stmt)), $._logical_line_end)`.
-3. Allow `shorthand_if_statement` consequence to be `repeat1(stmt)` so a
-one-line `if (x) a() b()` puts both calls in the shorthand body.
-
-The scanner needs to be written in C, registered via the `externals`
-field, and built into `src/scanner.c`. `tree-sitter-python`'s scanner is
-a good reference for the pattern.
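The zero-width emit chain for nested shorthands can be pictured with a toy model that has no tree-sitter dependency. This is a sketch under stated assumptions, not the real scanner or parser: the hypothetical `Model.open` counter stands in for "`LINE_END` is in `valid_symbols`", and each emitted `LINE_END` is assumed to reduce exactly one open shorthand. For `if (a) if (b) c()\nd()` it emits two `LINE_END` tokens at the same byte offset before the newline falls through to extras.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not tree-sitter. `open` is a hypothetical stand-in for parser
   state: while it is > 0, LINE_END is treated as a valid symbol. */
typedef struct {
    int open;            /* shorthand statements still awaiting LINE_END */
    int line_end_tokens; /* LINE_END tokens emitted so far */
    size_t pos;          /* byte offset of the lookahead */
} Model;

static bool is_line_break(char c) { return c == '\n' || c == '\r' || c == '\0'; }

/* One scanner call: emit a zero-width LINE_END (pos unchanged) only when a
   shorthand is open and the lookahead is \n, \r, or EOF ('\0'). */
static bool scan_line_end(Model *m, const char *src) {
    if (m->open > 0 && is_line_break(src[m->pos])) {
        m->line_end_tokens++;
        m->open--;   /* assumption: shifting LINE_END reduces one shorthand */
        return true; /* zero-width: m->pos is not advanced */
    }
    return false;
}

/* Drive the model at a line break: repeated zero-width emissions close every
   open shorthand, then the break itself is consumed as whitespace (extras).
   Returns how many LINE_END tokens were emitted at this offset. */
static int close_line(Model *m, const char *src) {
    assert(is_line_break(src[m->pos]));
    int emitted = 0;
    while (scan_line_end(m, src)) {
        emitted++;
    }
    if (src[m->pos] != '\0') {
        m->pos++; /* extras consume the newline; at EOF there is nothing to consume */
    }
    return emitted;
}
```

The `while` loop in `close_line` terminates because `open` strictly decreases on every emission, which is the toy counterpart of "the emit chain is bounded by static nesting depth".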
@@ -68,6 +68,12 @@ export default grammar({
 $._block_string_start,
 $._block_string_content,
 $._block_string_end,
+
+// PICO-8 line-significance: terminates the body of `if (cond) ...` /
+// `while (cond) ...` shorthand. The scanner emits this only when the
+// parser is at a state expecting it; everywhere else a newline falls
+// through to /\s/ in extras and is skipped. See src/scanner.c.
+$._line_end,
 ],
 
 supertypes: ($) => [$.statement, $.expression, $.declaration, $.variable],
@@ -168,14 +174,20 @@ export default grammar({
 'end'
 ),
 
-// PICO-8 single-line: while (cond) stmt
+// PICO-8 single-line: while (cond) stmt {stmt}
+// Body extends to end-of-line (or EOF). The $._line_end terminator
+// is emitted by the external scanner when it sees \n/\r/EOF at a
+// position where the parser expects line-end; until then, additional
+// statements on the same line accumulate into the body.
 shorthand_while_statement: ($) =>
 seq(
 'while',
 '(',
 field('condition', $.expression),
 ')',
-field('body', $.statement)
+field('body', $.statement),
+repeat(field('body', $.statement)),
+$._line_end
 ),
 
 repeat_statement: ($) =>
@@ -205,19 +217,28 @@ export default grammar({
 ),
 else_statement: ($) => seq('else', field('body', optional_block($))),
 
-// PICO-8 single-line: if (cond) stmt [else stmt]
-// prec.right resolves the dangling-else ambiguity in favor of greedy
-// attach to the nearest preceding shorthand `if`, matching PICO-8
-// semantics where shorthand if/else live on one line.
+// PICO-8 single-line: if (cond) stmt {stmt} [else stmt {stmt}]
+// Both the consequence and the alternative extend to end-of-line.
+// The $._line_end terminator (emitted by the external scanner on
+// \n/\r/EOF) prevents a later-line `else` from binding to a
+// shorthand `if` on a previous line, matching PICO-8 semantics.
 shorthand_if_statement: ($) =>
-prec.right(seq(
+seq(
 'if',
 '(',
 field('condition', $.expression),
 ')',
 field('consequence', $.statement),
-optional(seq('else', field('alternative', $.statement)))
-)),
+repeat(field('consequence', $.statement)),
+optional(
+seq(
+'else',
+field('alternative', $.statement),
+repeat(field('alternative', $.statement))
+)
+),
+$._line_end
+),
 
 for_statement: ($) =>
 seq(
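The revised `shorthand_if_statement` rule has the shape `if ( cond ) statement {statement} [else statement {statement}] LINE_END`. A hand-written recursive-descent sketch of that shape shows why a later-line `else` is no longer absorbed: the body loops stop at the line-break token, and because the terminator is not consumed, an enclosing shorthand (or the enclosing block) still sees it. This is an illustration only; tree-sitter generates a table-driven parser rather than anything like this. The token kinds are hypothetical (`TK_IF` stands for a whole `if ( cond )` prefix, `TK_CALL` for any simple statement), and `assert` stands in for error handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-lexed tokens. TK_NEWLINE / TK_EOF mark the places where
   the real scanner may emit LINE_END. */
typedef enum { TK_IF, TK_CALL, TK_ELSE, TK_NEWLINE, TK_EOF } Tok;

typedef struct {
    const Tok *toks;
    size_t i; /* index of the lookahead token */
} Parser;

typedef struct {
    int consequence; /* statements collected into the consequence field */
    int alternative; /* statements collected into the same-line else branch */
} Shorthand;

static Tok peek(const Parser *p) { return p->toks[p->i]; }

static int starts_statement(Tok t) { return t == TK_IF || t == TK_CALL; }

static Shorthand parse_shorthand_if(Parser *p);

/* statement := CALL | shorthand_if (the only two statement forms in this toy) */
static void parse_statement(Parser *p) {
    if (peek(p) == TK_IF) {
        (void)parse_shorthand_if(p);
    } else {
        assert(peek(p) == TK_CALL);
        p->i++;
    }
}

/* shorthand_if := IF statement {statement} [ELSE statement {statement}] LINE_END */
static Shorthand parse_shorthand_if(Parser *p) {
    Shorthand s = { 0, 0 };
    assert(peek(p) == TK_IF);
    p->i++;
    do {
        parse_statement(p);
        s.consequence++;
    } while (starts_statement(peek(p)));
    if (peek(p) == TK_ELSE) {
        p->i++;
        do {
            parse_statement(p);
            s.alternative++;
        } while (starts_statement(peek(p)));
    }
    /* LINE_END: require a line break but do not consume it (zero-width), so
       an enclosing shorthand or block still sees the same token. */
    assert(peek(p) == TK_NEWLINE || peek(p) == TK_EOF);
    return s;
}
```

In the later-line `else` case the shorthand returns with the parser still positioned on the newline and an empty alternative; in the real grammar that newline is then skipped as whitespace and the `else` is left for the enclosing standard `if`, which is what the dangling-else corpus case checks.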
@@ -538,6 +538,21 @@
 "type": "SYMBOL",
 "name": "statement"
 }
+},
+{
+"type": "REPEAT",
+"content": {
+"type": "FIELD",
+"name": "body",
+"content": {
+"type": "SYMBOL",
+"name": "statement"
+}
+}
+},
+{
+"type": "SYMBOL",
+"name": "_line_end"
 }
 ]
 },
@@ -729,9 +744,6 @@
 ]
 },
 "shorthand_if_statement": {
-"type": "PREC_RIGHT",
-"value": 0,
-"content": {
 "type": "SEQ",
 "members": [
 {
@@ -762,6 +774,17 @@
 "name": "statement"
 }
 },
+{
+"type": "REPEAT",
+"content": {
+"type": "FIELD",
+"name": "consequence",
+"content": {
+"type": "SYMBOL",
+"name": "statement"
+}
+}
+},
 {
 "type": "CHOICE",
 "members": [
@@ -779,6 +802,17 @@
 "type": "SYMBOL",
 "name": "statement"
 }
+},
+{
+"type": "REPEAT",
+"content": {
+"type": "FIELD",
+"name": "alternative",
+"content": {
+"type": "SYMBOL",
+"name": "statement"
+}
+}
 }
 ]
 },
@@ -786,9 +820,12 @@
 "type": "BLANK"
 }
 ]
+},
+{
+"type": "SYMBOL",
+"name": "_line_end"
 }
 ]
-}
 },
 "for_statement": {
 "type": "SEQ",
@@ -3696,6 +3733,10 @@
 {
 "type": "SYMBOL",
 "name": "_block_string_end"
+},
+{
+"type": "SYMBOL",
+"name": "_line_end"
 }
 ],
 "inline": [],
@@ -1195,7 +1195,7 @@
 "named": true,
 "fields": {
 "alternative": {
-"multiple": false,
+"multiple": true,
 "required": false,
 "types": [
 {
@@ -1215,7 +1215,7 @@
 ]
 },
 "consequence": {
-"multiple": false,
+"multiple": true,
 "required": true,
 "types": [
 {
@@ -1231,7 +1231,7 @@
 "named": true,
 "fields": {
 "body": {
-"multiple": false,
+"multiple": true,
 "required": true,
 "types": [
 {
File diff suppressed because it is too large (+16418 / -15205)
@@ -11,6 +11,13 @@ enum TokenType {
 BLOCK_STRING_START,
 BLOCK_STRING_CONTENT,
 BLOCK_STRING_END,
+
+// PICO-8 line-significance: terminates the body of `if (cond) ...` /
+// `while (cond) ...` shorthand. Emitted only when the parser expects it
+// (see scan() — this token is gated on valid_symbols[LINE_END]) so that
+// newlines outside of shorthand contexts continue to fall through to
+// extras and be skipped silently.
+LINE_END,
 };
 
 static inline void consume(TSLexer *lexer) { lexer->advance(lexer, false); }
@@ -157,6 +164,34 @@ static bool scan_comment_content(Scanner *scanner, TSLexer *lexer) {
 bool tree_sitter_pico8_lua_external_scanner_scan(void *payload, TSLexer *lexer, const bool *valid_symbols) {
 Scanner *scanner = (Scanner *)payload;
 
+// LINE_END must be checked before any whitespace-skipping path below,
+// because the bytes that signal it (\n, \r, EOF) would otherwise be
+// consumed as extras and be invisible to us. The check is also
+// intentionally placed before the block_string / block_comment branches
+// so that those branches' skip_whitespaces() can't eat our newline.
+//
+// The scanner emits LINE_END only when the parser's current state lists
+// it as valid (i.e., we're at the body-or-terminator decision point of a
+// shorthand_if_statement / shorthand_while_statement). Everywhere else,
+// \n falls through to the /\s/ extras pattern and is skipped silently,
+// so this branch is invisible to the rest of the grammar.
+//
+// LINE_END is intentionally zero-width: we do NOT consume the newline.
+// That lets nested shorthands on the same line each see the same \n and
+// close in turn (e.g. `if (a) if (b) c()\nd()` — the \n must terminate
+// BOTH shorthands so that `d()` is a top-level statement). Once every
+// enclosing shorthand has reduced, LINE_END is no longer in any parser
+// state's valid_symbols, the scanner returns false, and the trailing
+// \n is consumed by /\s/ in extras as usual. There is no infinite-loop
+// risk: each LINE_END shift reduces one shorthand statement, so the
+// emit chain is bounded by static nesting depth.
+if (valid_symbols[LINE_END] &&
+(lexer->lookahead == '\n' || lexer->lookahead == '\r' ||
+lexer->lookahead == 0)) {
+lexer->result_symbol = LINE_END;
+return true;
+}
+
 if (valid_symbols[BLOCK_STRING_END] && scan_block_end(scanner, lexer)) {
 reset_state(scanner);
 lexer->result_symbol = BLOCK_STRING_END;
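To exercise the new `LINE_END` branch in isolation, the sketch below copies its condition into a function over a hypothetical `MockLexer` struct. `MockLexer` is not tree-sitter's `TSLexer`: it only mimics the two members the branch touches (`lookahead`, `result_symbol`) plus a byte offset used to confirm that nothing is consumed. The enum is abridged, so its numeric values are not expected to match the real `TokenType`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Abridged, hypothetical copy of the token enum: the real TokenType may have
   more members, so these numeric values need not match scanner.c. */
enum TokenType { BLOCK_STRING_START, BLOCK_STRING_CONTENT, BLOCK_STRING_END, LINE_END, TOKEN_TYPE_COUNT };

/* Hypothetical stand-in for TSLexer with only the members this branch uses. */
typedef struct {
    const char *src;
    uint32_t pos;       /* byte offset of the lookahead; never advanced here */
    int32_t lookahead;  /* current character, 0 at end of input */
    uint16_t result_symbol;
} MockLexer;

static void mock_lexer_at(MockLexer *lx, const char *src, uint32_t pos) {
    lx->src = src;
    lx->pos = pos;
    lx->lookahead = (unsigned char)src[pos];
    lx->result_symbol = 0;
}

/* Same condition as the LINE_END branch in the scanner hunk: gated on
   valid_symbols, matches \n, \r, or EOF, and returns without advancing. */
static bool scan_line_end(MockLexer *lx, const bool *valid_symbols) {
    if (valid_symbols[LINE_END] &&
        (lx->lookahead == '\n' || lx->lookahead == '\r' || lx->lookahead == 0)) {
        lx->result_symbol = LINE_END;
        return true;
    }
    return false;
}
```

The branch only inspects the current lookahead: with `LINE_END` not valid it never fires, and a byte that is not a line break (such as a space) never produces the token even when it is valid.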
@@ -0,0 +1,249 @@
+================================================================
+shorthand if — single statement body, terminated by newline
+================================================================
+
+if (cond) honk()
+toot()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments)))
+(function_call
+name: (identifier)
+arguments: (arguments)))
+
+================================================================
+shorthand if — single statement body, terminated by EOF
+================================================================
+
+if (cond) honk()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))
+
+================================================================
+shorthand if — multi-statement body collected into shorthand
+================================================================
+
+if (is_falling()) wheeee() splat()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (function_call
+name: (identifier)
+arguments: (arguments))
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))
+
+================================================================
+shorthand if — same-line else
+================================================================
+
+if (cond) honk() else toot()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))
+alternative: (function_call
+name: (identifier)
+arguments: (arguments))))
+
+================================================================
+shorthand if — same-line multi-statement else
+================================================================
+
+if (cond) honk() else toot() squawk()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))
+alternative: (function_call
+name: (identifier)
+arguments: (arguments))
+alternative: (function_call
+name: (identifier)
+arguments: (arguments))))
+
+================================================================
+shorthand if nested in standard if — `else` on later line binds
+to OUTER if, not the shorthand (PICO-8 line-significance)
+================================================================
+
+if is_noisy then
+if (is_goose()) honk()
+else
+toot()
+end
+
+----------------------------------------------------------------
+
+(chunk
+(if_statement
+condition: (identifier)
+consequence: (block
+(shorthand_if_statement
+condition: (function_call
+name: (identifier)
+arguments: (arguments))
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))
+alternative: (else_statement
+body: (block
+(function_call
+name: (identifier)
+arguments: (arguments))))))
+
+================================================================
+shorthand if — line comment between body and newline still
+terminates the shorthand at the newline (line comment is in
+extras and is attached to the deepest enclosing node)
+================================================================
+
+if (cond) honk() -- inline
+toot()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))
+(comment
+content: (comment_content)))
+(function_call
+name: (identifier)
+arguments: (arguments)))
+
+================================================================
+shorthand if inside a do-block — newline before `end` terminates
+shorthand, then `end` closes the do-block
+================================================================
+
+do
+if (cond) honk()
+end
+
+----------------------------------------------------------------
+
+(chunk
+(do_statement
+body: (block
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))))
+
+================================================================
+shorthand while — multi-statement body, terminated by newline
+================================================================
+
+while (running) tick() draw()
+cleanup()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_while_statement
+condition: (identifier)
+body: (function_call
+name: (identifier)
+arguments: (arguments))
+body: (function_call
+name: (identifier)
+arguments: (arguments)))
+(function_call
+name: (identifier)
+arguments: (arguments)))
+
+================================================================
+shorthand while — single statement body, terminated by EOF
+================================================================
+
+while (cond) tick()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_while_statement
+condition: (identifier)
+body: (function_call
+name: (identifier)
+arguments: (arguments))))
+
+================================================================
+nested shorthand ifs on the same line — a single newline must
+terminate BOTH shorthands (otherwise the outer one greedily
+absorbs the next-line statement)
+================================================================
+
+if (a) if (b) c()
+d()
+
+----------------------------------------------------------------
+
+(chunk
+(shorthand_if_statement
+condition: (identifier)
+consequence: (shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))
+(function_call
+name: (identifier)
+arguments: (arguments)))
+
+================================================================
+standard if with parenthesized condition coexists with shorthand
+— GLR resolves on the token after `)` (then vs statement)
+================================================================
+
+if (cond) then a() end
+if (cond) a()
+
+----------------------------------------------------------------
+
+(chunk
+(if_statement
+condition: (parenthesized_expression
+(identifier))
+consequence: (block
+(function_call
+name: (identifier)
+arguments: (arguments))))
+(shorthand_if_statement
+condition: (identifier)
+consequence: (function_call
+name: (identifier)
+arguments: (arguments))))
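The last corpus case works because both rules begin with `if (cond)`. One rough way to picture the resolution: after the closing parenthesis two interpretations are alive, and the next token prunes whichever cannot continue. The sketch deliberately simplifies tree-sitter's GLR handling. `resolve_after_paren` and its keyword list are hypothetical, and it ignores conditions that continue past the parenthesis (for example `if (a) and b then`), which the real parser still treats as a standard `if`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which reading of `if ( expr )` survives once the next token is known. */
typedef enum { IF_NONE, IF_STANDARD, IF_SHORTHAND } IfKind;

/* Keywords that cannot begin a statement (abridged, hypothetical list). */
static const char *const NON_STARTERS[] = { "then", "else", "elseif", "end", "and", "or", "not", "until", "in" };

/* Toy test: a statement starts with an identifier-like word that is not a
   non-starter keyword (so `if`, `while`, or a call name all qualify). */
static bool starts_statement(const char *tok) {
    for (size_t i = 0; i < sizeof NON_STARTERS / sizeof NON_STARTERS[0]; i++) {
        if (strcmp(tok, NON_STARTERS[i]) == 0) return false;
    }
    char c = tok[0];
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Both interpretations are alive after `)`; keep only those that can accept next_tok. */
static IfKind resolve_after_paren(const char *next_tok) {
    bool standard_alive = strcmp(next_tok, "then") == 0; /* if_statement: ... then block end */
    bool shorthand_alive = starts_statement(next_tok);    /* shorthand_if_statement: ... statement */
    assert(!(standard_alive && shorthand_alive));       /* `then` is a non-starter, so at most one survives */
    if (standard_alive) return IF_STANDARD;
    if (shorthand_alive) return IF_SHORTHAND;
    return IF_NONE; /* neither interpretation survives: a syntax error in this toy */
}
```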