Carbide: Partial Parsing & Supersets

This weekend I finally released Carbon 1.0.0! It was technically about three weeks past the last deadline and two months or so past the initial release goal, but it’s here now. I took that extra time writing a brand new partial grammar parser for Carbide and fixing up a ton of math library mishaps.

That partial grammar parser is what I’m here to talk about in this post.

Carbide has requirements undoubtedly familiar to those aware of the numerous JavaScript abstractions out there, notably those where regular JS code stays valid code, like TypeScript. They add features to the language without changing any existing semantics, which makes adopting them a lot easier.

There are two popular options for implementing a language extension like this:

  • Write a full lexer, transform the AST, generate target language code
  • Hack together some regular expressions

The first option is complex and requires a lot of code, and the latter option is error prone. So what do we do? A hybrid!

Partial parsing is a term I use to describe how I built the Carbide parser. It lets me have the ease of regular pattern matching with the power of a full-on parser.

Let’s take a look at one feature in Carbide and how it’s implemented in Carbon 1.0.0, Direct Arrow Notation.

What is DAN?

Direct Arrow Notation allows vector index lookups by name, swizzling, and fast unpacking of arrays with a single operator. Here’s a bit of basic sample usage:

1
2
3
4
5
6
7
8
9
10
11
local vec3 = {1, 2, 3}

print(vec3->x, vec3->y, vec3->z) -- 1  2  3
print(vec3->xyz) -- 1  2  3

print(vec3->rgb) -- 1  2  3
print(vec3->stp) -- 1  2  3
print(vec3->uv)  -- 1  2

print(vec3->yyy) -- 2  2  2
print(vec3->zyx) -- 3  2  1

As you can see, it takes a table and expands it into a tuple of a fixed length. It supports compound lookups, swizzling, and various aliases like vectors, colors, and texture coordinates.

DAN also supports a few extra, less-obvious uses:

1
2
3
4
5
6
7
8
9
local scale = 2
local a = {1, 2}
local b = {3, 4}

-- Scalar, back to vector
print({a->xy * scale})

-- Distribution
print(a->xy * b->xy) -- 3  4  6  8

This is to say that operators before and after compound DAN lookups are defined! The last code block compiles to:

1
2
3
4
5
6
7
8
9
local scale = 2
local a = {1, 2}
local b = {3, 4}

-- Scalar, back to vector
print({a[1] * scale, a[2] * scale})

-- Distribution
print(a[1] * b[1], a[1] * b[2], a[2] * b[1],  a[2] * b[2])

In the first example, we were able to scale a vector and pack it back into a vector very compactly. In the second example, we got compile-time distribution done for us!

A more practical example might use the LÖVE API to map objects down to graphics methods a little better:

1
2
3
4
5
6
7
local rect = {
	scale = 2,
	pos = {10, 10},
	size = {50, 50}
}

love.graphics.draw("fill", rect.pos->xy, rect.size->xy * scale)

Unlike Lua’s vanilla ‘unpack’ function, Carbide’s DAN doesn’t lose arguments when used in the middle of an argument list.

So how would you implement a feature like this in a stable manner without a full Lua parser alongside your extended syntax? Partial parsing!

Implementation

Carbide is powered by Lua patterns and an internal function called ‘matchexpr’ that matches a single Lua expression. That function only spans 100 lines!

All the code specific to DAN has to do is find a DAN lookup (using the pattern “%->([%w_]+)” in 1.0.0) and then match the previous and next expressions. The expressions are then used as wrappers for every lookup individually.

There are downsides to parsing like this!

  • Strings have to be stripped before any parsing occurs or they’ll be butchered
  • Comments aren’t preserved (at least in Carbon’s implementation)
  • Order of feature application matters!
  • Features must be aware of other features’ syntax

Someday I may switch to a full parser like other Lua abstractions, but for now, partial parsing suites Carbide’s needs very well.