Easy Random Trees

Posted on February 27, 2026 by Brandon Wilson

Tags: math, apl, trees

Can you think of a way to efficiently generate a random plane tree?

Richard P. Stanley in his book Catalan Numbers has a really nifty combinatorial proof of why Catalan numbers have the formula

\[ C_n = {1 \over n+1}{2n \choose n} \]

The standard proof uses generating functions applied to an inductive definition of the Catalan numbers, which frankly does little to illumiate their connection with combinatorial objects, despite the fact that the appearance of a binomial coefficient gives a hint that it’s counting something, and the division suggests we might be looking at some kind of equivalence class.

Stanley’s proof is so direct and beautiful, I cannot help but share. However, let us instantiate the proof as a little program that generates a random tree with \(n\) nodes:

D←{n←⍵-1                 ⍝ The ⍵-1st Catalan number counts number of ⍵-node trees
 x←x[?⍨≢x←1⍪(n⍴1)⍪n⍴¯1]  ⍝ Random step vector of length 1+2×⍵
 d←0⍪¯1↓+⍀x              ⍝ Depth vector
 i←⊃⌽⍸(⌊⍀d)=d            ⍝ Righmost lowest node
 x⌽⍨←i                   ⍝ Strict ballot sequence
 d←¯1+(x=1)⌿+⍀x          ⍝ Corresponding tree node depths
}

At center stage here is the isomorphism between plane trees and strict ballot sequences, which is directly connected via the depth vector tree representation, which I’ll assume is already familiar to readers here. The image to have in mind is a depth-first search traversal pattern.

To construct a depth vector, we simply write down the current depth every time we first visit a node.

We can construct a related sequence by starting at the root and writing down a \(1\), then writing down a \(1\) every time we descend to a node, and finally, writing a \(¯1\) every time we ascend. Notice, in particular, that we both descend to and ascend from every node exactly once, except for the root. This guarantees that the partial sums are at least \(1\) and the total sum is exactly \(1\). Such a vector of \(1\)’s and \(¯1\)’s is called a strict ballot sequence, and given one, it’s straighforward enough to convert it into the depth vector of the corresponding tree:

      +⍀x←1 1 ¯1 1 1 1 ¯1 ¯1 ¯1 1 ¯1 1 1 ¯1 ¯1 1 1 ¯1 ¯1  ⍝ Strict ballot sequences
1 2 1 2 3 4 3 2 1 2 1 2 3 2 1 2 3 2 1                     ⍝  have positive partial sums

      ⊢d←¯1+(x=1)⌿+⍀x                                     ⍝ Depth vectors ignore the ¯1's
0 1 1 2 3 1 1 2 1 2

Now let’s think about random sequences of \(1\)’s and \(¯1\)’s. To have any hope of being a strict ballot sequence, the sum must be \(1\), so we know a priori that there must be one more \(1\) than \(¯1\)’s. We can parameterize this as seqeuences of length \(2n+1\), with \(n+1\) ones and \(n\) negative ones.

      ⊢x←x[?⍨≢x←1⍪(n⍴1)⍪n⍴¯1]  ⍝ Compare with D above
¯1 ¯1 ¯1 1 ¯1 ¯1 ¯1 1 ¯1 ¯1 1 1 1 1 ¯1 1 1 1 1 1 ¯1

Most likely, such a random vector will not be a strict ballot sequence, since partial sums can drop below \(1\). However, consider the righmost lowest partial sum, \(N\), at index \(i\):

      i←⍸msk←s=N←⌊⌿s←+⍀x
      s⍪⍉⍪msk
¯1 ¯2 ¯3 ¯2 ¯3 ¯4 ¯5 ¯4 ¯5 ¯6 ¯5 ¯4 ¯3 ¯2 ¯3 ¯2 ¯1 0 1 2 1
 0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0 0 0 0 0

Looking at the tail sequence starting at \(s_i\), we know it must end in \(1\). That means \(s_i\) is \(N+1\) below the tail. However, everything before \(s_i\) is at most \(N\) below the start. In other words, if we transplant the tail \(x_i ... x_{2n}\) to the start and recompute partial sums, they will remain positive. Said another way, rotating the ballot sequence by \(i\) produces a strict ballot sequence. Or more consicely,

 d←0⍪¯1↓+⍀x    ⍝ Depth vector
 i←⊃⌽⍸(⌊⍀d)=d  ⍝ Righmost lowest node
 x⌽⍨←i         ⍝ Strict ballot sequence

Combining with our conversion to a depth vector finishes the impelemntation of D.

Better yet, the Catalan number formula falls right out of the above by noting that there are exactly \(2n+1 \choose n\) strings of \(1\)’s and \(¯1\)’s and that since \(n\) and \(2n+1\) are coprime, no rotations produce equal sequences. This means that the sequence length, \(2n+1\), must divide \(2n+1 \choose n\). Thus the number of strict ballot sequences with length \(2n+1\) is

\[ {1 \over 2n+1}{2n+1 \choose n} \]

It is a small matter of algebra to check that this equals the value of \(C_n\) above. For \(n = 0\), we have a strict ballot sequence of length \(1\), which corresponds to a tree of just a since root node as per the isomorphism outlined above. This means that \(C_n\) must count plane trees with \(n+1\) nodes.

Stanley’s book is delightful and worth perusing. The above just uses one isomorphism, but the book contains a veritable cornucopia and is replete with nicely illustrative diagrams to boot. Highly recommended!