Implementing Brainfuck
APL is a pretty great language for quickly iterating on program architecture. As a case in point, here is a quick comparison of a couple of brainfuck implementations.
The brainfuck virtual machine is essentially a minimal Harvard architecture: we have an instruction space, a data tape, and indices into each. If you are unfamiliar with the (tiny!) specification, take a look at brainfuck’s official website.
Anyway, we start our implementation by initializing state:
run←{
⎕IO∘←0 ⋄ n t b∘←0 (1e6⍴0) ⍬ ⋄ prg←⍵ ⍝ Data index, tape, and input buffer
_←2{p[⍵]←⍺[⍺⍸⍵]}⌿⊢∘⊂⌸¯1⌽+⍀1 ¯1 0['[]'⍳⍵⊣p∘←⍳≢⍵] ⍝ Parent vector for brackets
_←⍵ ⍺⍺⍣{⍵=¯1+≢prg}0 ⍝ Run instructions from 0 to last
}
This operator runs single-instruction executors recursively over the full instruction input, which lets us plug and play different implementations of the execution unit.
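For readers who don't speak APL, here is a hypothetical Python transliteration of the driver (a sketch, not the author's code): a stack-built bracket-match table stands in for the parent vector p, and a pluggable `step` function returns the index of the next instruction to execute.

```python
def run(step, prg):
    # Precompute bracket pairs (the APL builds a parent vector with a
    # depth scan instead; a match table gives the same jump targets).
    match, stack = {}, []
    for i, c in enumerate(prg):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            match[i], match[j] = j, i
    # State: data index, tape, input buffer, collected output, jump table.
    state = {'n': 0, 'tape': [0] * 30000, 'buf': [], 'out': [], 'match': match}
    ip = 0
    while ip < len(prg):
        ip = step(prg, ip, state)  # executor returns the next instruction index
    return ''.join(state['out'])
```

As in the APL, the executor is a parameter, so different execution units can be swapped in without touching the driver.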
Version 0
One obvious implementation mirrors the language specification directly:
v0←{
⍺[⍵]='>':⍵+1⊣n+←1
⍺[⍵]='<':⍵+1⊣n-←1
⍺[⍵]='+':⍵+1⊣t[n]+←1
⍺[⍵]='-':⍵+1⊣t[n]-←1
⍺[⍵]='.':⍵+1⊣⍞←⎕UCS t[n]
⍺[⍵]=',':⍵+1⊣t[n]←⍞
(⍺[⍵]='[')∧t[n]≠0:⍵+1
(⍺[⍵]='[')∧t[n]=0:⊃⌽⍸p=⍵
(⍺[⍵]=']')∧t[n]=0:⍵+1
(⍺[⍵]=']')∧t[n]≠0:p[⍵]
}
This essentially gives us a big switch statement that executes each instruction. Squinting a bit, this architecture is spiritually in the same family as an LR parser, since it is fundamentally limited to a small (one-token) parse window.
Since brainfuck has jumps with [ and ] instructions, our single-instruction executors also need to update the instruction pointer. We manage this state by taking the address in ⍵ and returning the address of the next instruction to execute.
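The same big-switch shape is easy to see in a runnable Python sketch (my own transliteration, not the author's code; the explicit match table replaces the APL parent vector). Each branch handles one instruction and yields the next instruction pointer:

```python
def bf_v0(prg, read_line=input):
    match, stack = {}, []
    for i, c in enumerate(prg):            # precompute bracket pairs
        if c == '[': stack.append(i)
        elif c == ']': match[i] = stack.pop(); match[match[i]] = i
    n, tape, out, ip = 0, [0] * 30000, [], 0
    while ip < len(prg):
        c = prg[ip]
        if   c == '>': n += 1; ip += 1
        elif c == '<': n -= 1; ip += 1
        elif c == '+': tape[n] += 1; ip += 1
        elif c == '-': tape[n] -= 1; ip += 1
        elif c == '.': out.append(chr(tape[n])); ip += 1
        elif c == ',': tape[n] = ord(read_line()[0]); ip += 1  # naive one-char read
        elif c == '[': ip = ip + 1 if tape[n] else match[ip] + 1
        elif c == ']': ip = match[ip] + 1 if tape[n] else ip + 1
        else: ip += 1                       # non-instructions are comments
    return ''.join(out)
```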
The repetition of ⍵+1 is a tad ugly, and it really seems like we should be able to merge the n+←1 and n-←1 cases. Ditto for some of the other structural repetitions. For example, a strategy like the following lets us merge the < and > cases:
2>i←'<>'⍳⍺[⍵]:⍵+1⊣n+←¯1 1[i]
Version 1
Very similar code works for + and - as well; however, the similarity between the <> and +- cases begs for further fusion. And what about the bracket instructions and the ⍵+1 repetition?
Staring at the problem for a bit, we may notice that fusing instruction cases sometimes requires updating multiple state variables in a single case. This suggests an approach that updates all state in a single “case”. Said another way, we can transpose how we think about the relationship between instructions and state. Instead of dispatching a specific state update per instruction, we can holistically update all state at once:
v1←{
n+←¯1 1 0['<>'⍳⍺[⍵]] ⍝ Update data index
t[n]+←¯1 1 0['-+'⍳⍺[⍵]] ⍝ Update data cell
_←{⍞←⎕UCS⍵}⍣(⍺[⍵]='.')⊢t[n] ⍝ Handle output
_←{t[n] b∘←1 1⊂{⎕UCS⍞}⍣(b≡⍬)⊢b}⍣(⍺[⍵]=',')⊢⍬ ⍝ Handle input
1+(3 2⍴⍵(⊃⌽⍸p=⍵)(p[⍵])⍵ ⍵ ⍵)['[]'⍳⍺[⍵];t[n]=0] ⍝ Update instruction index
}
Not only is the code shorter, but now it has us thinking in a data-first paradigm, compared to the more procedural flavor of v0. Indeed, each line of v1 corresponds to a single state variable, modulo input/output handling.
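The transposed architecture also translates to Python (again a hedged sketch of my own, not the author's code): every pass through the loop applies every state update, with no-op table entries covering the instructions that don't touch a given piece of state.

```python
def bf_v1(prg, read_line=input):
    match, stack = {}, []
    for i, c in enumerate(prg):            # precompute bracket pairs
        if c == '[': stack.append(i)
        elif c == ']': match[i] = stack.pop(); match[match[i]] = i
    n, tape, out, ip = 0, [0] * 30000, [], 0
    while ip < len(prg):
        c = prg[ip]
        n += {'<': -1, '>': 1}.get(c, 0)              # update data index
        tape[n] += {'-': -1, '+': 1}.get(c, 0)        # update data cell
        if c == '.': out.append(chr(tape[n]))          # handle output
        if c == ',': tape[n] = ord(read_line()[0])     # handle input
        zero = tape[n] == 0                            # update instruction index
        ip = 1 + {('[', True): match.get(ip, ip),
                  (']', False): match.get(ip, ip)}.get((c, zero), ip)
    return ''.join(out)
```

Each line of the loop body corresponds to one state variable, mirroring the line-per-variable layout of v1.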
A Desideratum
Speaking of input/output, notice the change in our input handler. Brainfuck’s I/O instructions respectively read and write single characters at a time. However, ⍞ accepts input up until a newline, and brainfuck applications in the wild also typically process line-oriented input. By maintaining a buffer of one input line in b and only calling ⍞ when b is empty, we get much improved input UX.
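The b-buffer idea is language-agnostic; a small Python sketch of the same pattern (my own; appending the newline is my convention, not something the APL version does):

```python
def make_reader(read_line=input):
    # Line-buffered single-character input: call read_line only when the
    # buffer is empty, then hand out one character code per call.
    buf = []
    def read_char():
        if not buf:
            buf.extend(read_line() + '\n')  # terminate each line explicitly
        return ord(buf.pop(0))
    return read_char
```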
Backporting this to v0 changes the line
⍺[⍵]=',':⍵+1⊣t[n]←⍞
into
⍺[⍵]=',':⍵+1⊣t[n] b∘←1 1⊂{⎕UCS⍞}⍣(b≡⍬)⊢b
Performance
How do these implementations stack up in terms of execution performance?
On the one hand, apart from the case detection, v0 executes only the code needed for the current instruction, while v1 takes virtually constant time for every instruction. On the other hand, perhaps the guard statements in v0 overwhelm any such advantage.
As a simple first-order comparison, we use the Hello World example provided on Wikipedia:
PRG ←'++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++.'
PRG⍪←'.+++.>>.<-.<.+++.------.--------.>>+.>++.'
cmpx 'v0 run PRG' 'v1 run PRG'
  v0 run PRG → 1.2E¯1 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  v1 run PRG → 3.7E¯1 | +198% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
So, at least on this small program, the procedural v0 comes out roughly 3× faster.
Future Exploration
It’s also tempting to write a brainfuck compiler. In fact, there are obvious opportunities for instruction fusion and parallel execution. Apart from its parsimony, what stands out to me with the v1 architecture is that it doesn’t bake in the “single instruction at a time” constraint, making data-parallelism more manifestly accessible.
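As a taste of the compiler direction, here is a sketch of one obvious fusion pass (my own illustration, in Python): run-length-encode the arithmetic and movement instructions so that a later execution stage can apply '++++' as a single add of 4.

```python
from itertools import groupby

def fuse(prg):
    # Run-length-encode fusible instructions; brackets and I/O stay unit-width.
    ops = []
    for c, grp in groupby(prg):
        k = len(list(grp))
        if c in '+-<>':
            ops.append((c, k))
        else:
            ops.extend([(c, 1)] * k)
    return ops
```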
Although the above is ostensibly about a toy implementation, the procedural vs. data-first architectural difference touches on some deep topics. More serious projects that lean heavily on these ideas include my dayaml parser, the Co-dfns compiler, and data-parallel proof verification.
All this merits much more discussion, but that’s a whole ’nother story.
Acknowledgements
Many thanks to Adám Brudzewsky for taking the time to read my code and suggest several spelling improvements in expressions. The depth of his knowledge, his easy facility with APL, and his sharp wit are always impressive and a joy.