Implementing Brainfuck

Posted on September 27, 2024 by Brandon Wilson

APL is a pretty great language for quickly iterating on program architecture. As a case in point, here is a quick comparison of a couple of brainfuck implementations.

The brainfuck virtual machine is essentially a minimal Harvard architecture: we have an instruction space, a data tape, and indices into each. If you are unfamiliar with the (tiny!) specification, take a look at brainfuck’s official website.
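Concretely, that state amounts to a handful of small pieces. A minimal Python sketch of the same setup (the names here are mine, not from the APL below):

```python
# Minimal brainfuck machine state, mirroring the APL setup:
# a Harvard-style split between instruction space and data tape.
prg = "+++."          # instruction space (any brainfuck program)
ip = 0                # index into the instructions
tape = [0] * 10**6    # data tape, zero-initialized as in the APL version
dp = 0                # index into the tape
buf = []              # pending input characters (the b buffer, used later)
```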

Anyway, we start our implementation by initializing state:

run←{
 ⎕IO∘←0 ⋄ n t b∘←0 (1e6⍴0) ⍬ ⋄ prg←⍵              ⍝ Data index, tape, and input buffer
 _←2{p[⍵]←⍺[⍺⍸⍵]}⌿⊢∘⊂⌸¯1⌽+⍀1 ¯1 0['[]'⍳⍵⊣p∘←⍳≢⍵]  ⍝ Parent vector for brackets
 _←⍵ ⍺⍺⍣{⍵=¯1+≢prg}0                              ⍝ Run instructions from 0 to last
}
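The second line of run builds a parent vector p pairing each bracket with its partner. The same pairing can be computed with an explicit stack; a rough Python analogue (the helper name is mine, and it returns a two-way map rather than a parent vector):

```python
def match_brackets(prg):
    """Map each bracket's index to the index of its partner."""
    pairs, stack = {}, []
    for i, c in enumerate(prg):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()            # most recent unmatched '['
            pairs[i], pairs[j] = j, i
    return pairs
```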

This operator runs single-instruction executors recursively over the full instruction input, which lets us plug and play different implementations of the execution unit.
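Stripped of APL specifics, the driver is just iterated application of a step function until the instruction pointer runs off the end of the program. A sketch of that contract in Python (names are mine):

```python
def run(step, prg):
    """Drive a single-instruction executor over the whole program.

    step(prg, ip) executes the instruction at ip and returns the
    index of the next instruction to execute.
    """
    ip = 0
    while ip < len(prg):
        ip = step(prg, ip)
```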

Version 0

One obvious implementation mirrors the language specification directly:

v0←{
 ⍺[⍵]='>':⍵+1⊣n+←1
 ⍺[⍵]='<':⍵+1⊣n-←1
 ⍺[⍵]='+':⍵+1⊣t[n]+←1
 ⍺[⍵]='-':⍵+1⊣t[n]-←1
 ⍺[⍵]='.':⍵+1⊣⍞←⎕UCS t[n]
 ⍺[⍵]=',':⍵+1⊣t[n]←⍞
 (⍺[⍵]='[')∧t[n]≠0:⍵+1
 (⍺[⍵]='[')∧t[n]=0:⊃⌽⍸p=⍵
 (⍺[⍵]=']')∧t[n]=0:⍵+1
 (⍺[⍵]=']')∧t[n]≠0:p[⍵]
}

This essentially gives us a big switch statement that executes each instruction. Squinting a bit, this architecture is spiritually in the same family as an LR parser, since it is fundamentally limited to a small (one-token) parse window.

Since brainfuck has jumps with [ and ] instructions, our single-instruction executors need to also update the instruction pointer address. We’re managing this state by taking the address in and returning the address of the next instruction to execute.
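In that address-in, address-out style, a conventional-language step function would look roughly like this (Python sketch; the state names and the pairs bracket map are my own, and the ',' input case is omitted for brevity):

```python
def step_v0(prg, ip, state):
    """Execute one instruction; return the next instruction index.

    state carries the data pointer dp, the tape, and a pairs map
    from each bracket's index to its partner's index.
    """
    c = prg[ip]
    if c == '>': state['dp'] += 1
    elif c == '<': state['dp'] -= 1
    elif c == '+': state['tape'][state['dp']] += 1
    elif c == '-': state['tape'][state['dp']] -= 1
    elif c == '.': print(chr(state['tape'][state['dp']]), end='')
    elif c == '[' and state['tape'][state['dp']] == 0:
        return state['pairs'][ip] + 1   # jump past the matching ']'
    elif c == ']' and state['tape'][state['dp']] != 0:
        return state['pairs'][ip]       # jump back to the matching '['
    return ip + 1
```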

The repetition of ⍵+1 is a tad ugly, and it really seems like we should be able to merge the n+←1 and n-←1 cases. Ditto for some of the other structural repetitions. For example, a strategy like the following lets us merge < and > cases:

2>i←'<>'⍳⍺[⍵]: ⍵+1⊣n+←¯1 1[i]

Version 1

Very similar code works for + and - as well; however, the similarity between <> and +- cases also begs for further fusion. And what about bracket instructions and ⍵+1 repetition?

Staring at the problem for a bit, we may notice that fusing instruction cases sometimes requires updating multiple state variables in a single case. This suggests an approach that updates all state in a single “case”. Said another way, we can transpose how we think about the relationship between instructions and state. Instead of dispatching a specific state update per instruction, we can holistically update all state at once:

v1←{
 n+←¯1 1 0['<>'⍳⍺[⍵]]                            ⍝ Update data index
 t[n]+←¯1 1 0['-+'⍳⍺[⍵]]                         ⍝ Update data cell
 _←{⍞←⎕UCS⍵}⍣(⍺[⍵]='.')⊢t[n]                     ⍝ Handle output
 _←{t[n] b∘←1 1⊂{⎕UCS⍞}⍣(b≡⍬)⊢b}⍣(⍺[⍵]=',')⊢⍬    ⍝ Handle input
 1+(3 2⍴⍵(⊃⌽⍸p=⍵)(p[⍵])⍵ ⍵ ⍵)['[]'⍳⍺[⍵];t[n]=0]  ⍝ Update instruction index
}

Not only is the code shorter, but now it has us thinking in a data-first paradigm, compared to the more procedural flavor of v0. Indeed, each line of v1 corresponds to a single state variable, modulo input/output handling.
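The transposition reads naturally in other languages too: every state variable gets one unconditional, table-driven update on every instruction. A rough Python analogue (names are mine; I/O is elided to keep the shape visible):

```python
def step_v1(prg, ip, state):
    """Update every piece of state on every instruction, table-driven."""
    c = prg[ip]
    state['dp'] += {'<': -1, '>': 1}.get(c, 0)                 # data index
    state['tape'][state['dp']] += {'-': -1, '+': 1}.get(c, 0)  # data cell
    zero = state['tape'][state['dp']] == 0
    if c == '[':                                               # instruction index
        return state['pairs'][ip] + 1 if zero else ip + 1
    if c == ']':
        return ip + 1 if zero else state['pairs'][ip] + 1
    return ip + 1
```

Note that, like the 3-by-2 table in v1, a non-zero ']' jumps to one past its matching '[' rather than re-checking the bracket itself.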

A Desideratum

Speaking of input/output, notice the change in our input handler. Brainfuck’s I/O instructions read and write a single character at a time. However, ⍞ accepts input up until a newline, and brainfuck applications in the wild also typically process line-oriented input. By maintaining a buffer of one input line in b and only calling ⍞ when b is empty, we get much improved input UX.
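The buffering discipline itself is language-independent: hold one line of characters, refill only when the buffer runs dry. A hedged Python sketch (names are mine):

```python
buf = []  # pending characters from the most recent input line

def read_char(readline=input):
    """Return one character for ',', refilling the buffer with a
    whole line (plus its newline) only when it is empty."""
    if not buf:
        buf.extend(readline() + '\n')
    return buf.pop(0)
```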

Backporting this to v0 changes the line

⍺[⍵]=',':⍵+1⊣t[n]←⍞

into

⍺[⍵]=',':⍵+1⊣t[n] b∘←1 1⊂{⎕UCS⍞}⍣(b≡⍬)⊢b

Performance

How do these implementations stack up in terms of execution performance?

On the one hand, apart from the case detection, v0 executes only enough code to handle the current instruction, while v1 does a near-constant amount of work for every instruction. On the other hand, perhaps the guard statements in v0 overwhelm any such advantage.

As a simple first-order comparison, we use the Hello World example provided on Wikipedia:

    PRG ←'++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++.'
    PRG⍪←'.+++.>>.<-.<.+++.------.--------.>>+.>++.'
    cmpx 'v0 run PRG' 'v1 run PRG'
v0 run PRG → 1.2E¯1 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
v1 run PRG → 3.7E¯1 | +198% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Future Exploration

It’s also tempting to write a brainfuck compiler. In fact, there are obvious opportunities for instruction fusion and parallel execution. Apart from its parsimony, what stands out to me about the v1 architecture is that it doesn’t bake in the “single instruction at a time” constraint, making data-parallelism more manifestly accessible.
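Run-length fusion is the simplest of those opportunities: consecutive repeats of +, -, <, or > collapse into one counted operation. A sketch of that compiler pre-pass (Python; the helper is hypothetical):

```python
from itertools import groupby

def fuse(prg):
    """Collapse runs of fusible instructions into (op, count) pairs."""
    out = []
    for op, run in groupby(prg):
        n = len(list(run))
        if op in '+-<>':
            out.append((op, n))                    # one fused op per run
        else:
            out.extend((op, 1) for _ in range(n))  # brackets etc. stay single
    return out
```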

Despite the above being ostensibly about a toy implementation, the procedural vs. data-first architectural difference touches on some deep topics. More serious projects that lean heavily on these ideas include my dayaml parser, the Co-dfns compiler, and data-parallel proof verification.

All this merits much more discussion, but that’s a whole ’nother story.

Acknowledgements

Many thanks to Adám Brudzewsky for taking the time to read my code and suggest several spelling improvements in expressions. The depth of his knowledge, easy facility with APL, and sharp wit are always impressive and a joy.