A Mundane Exotic Number System

Posted on August 29, 2023 by Brandon Wilson
Tags: ,

Numeral systems contain a surprising amount of depth. Our familiarity and monolingualism with the base 10 positional system runs so deep that we tend to think of numbers as being their base 10 representation. Doing this actually has merit, given the simple algorithms for arithmetic it offers. However, other representations often provide interesting insights that seem completely opaque otherwise.

Take base \(2\), for example. The correspondence between sets of things and base \(2\) numbers can be mined for all sorts of interesting gems, given their correspondence with subsets of some set. Also of note is that the rows of Pascal’s triangle sum to powers of two. If this is new to you, it’s tons of fun to puzzle out why and how it’s related to base \(2\).

A more exotic example is the factorial number system, wherein each position represents the multiple of a factorial—as opposed to the normal power of some base. Given that permutations are counted by the factorials, this numeral system provides a really direct enumeration for permutations.

Another system that pops up in surprisingly mundane places is the bijective numeration. Excel, for example, uses to label its columns with strings of letters, like A, AZ, GDJ, etc. At first blush, this appears to be a straightforward base \(26\) system, and indeed there’s a very simple APL expression to convert from column names to their index:

       (26⊥1+⎕A⍳⊢)¨'A' 'AZ' 'ZZ' 'GDJ'
 1 52 702 4846

which feels like it’s just doing base \(26\). However, the intuitive inverse expression fails:

       (⊂⌷⎕A⍨)∘(26⊥⍣¯1⊢)¨1 52 702 4846
  B  CA  BBA  HEK

There’s actually two issues with the above. We’re off by one in the index into ⎕A,

       (⊂⌷⎕A⍨)∘(¯1+26⊥⍣¯1⊢)¨1
  A

however, when ⊥⍣¯1 decodes into digits with 0, we get ¯1 indices:

       (⊂⌷⎕A⍨)∘(¯1+26⊥⍣¯1⊢)¨26
 INDEX ERROR
       (⊂⌷⎕A⍨)∘(¯1+26⊥⍣¯1⊢)¨26
                         ∧

In general, the standard representation in a base-\(k\) positional system has the same representation in the base-\(k\) bijection numeration as long as the former contains no zeroes. In the case when there are zeroes, the bijective system needs to (recursively) borrow from the left, for essentially the same reason that borrowing happens in the pen-and-paper subtraction algorithm.

      26{⎕A⌷⍨⊂¯1+⍺{
       n←⍵+(m×⍺)-0,⍨1↓m←0=⍵    ⍝ Borrow
       (∨\0≠n)/n               ⍝ Trim leading zeroes
      }⍣≡(⍺⊥⍣¯1⊢)⍵}¨1 52 702 4846
  A  AZ  ZZ  GDJ

While the above is a fully general solution, IMHO, its verbosity leaves a somewhat unsatisfying aftertaste. The borrow algorithm is ubiquitous, so there is likely some more parsimonious expression.

Paul Mansour shares a completely different approach in his article Excel Column Names, which leverages Excel’s maximum column count to enumerate all names and simply index into them. Our algorithm above is probably \(O(\log n)\), so if you were performing this conversion a whole lot, having a cached index likely performs vastly better.

Thanks for reading.