Writing an Immutable WASM VM
And the reason why
One of the most fascinating experiments I have come across is Percepta's attempt to run LLMs as a computer. In it, the language model itself becomes a mode of computation, as opposed to its usual role as a mathematical operation. This piqued my interest, so I've started this series, in which I'll implement their experiment while exploring any other interesting avenues I come upon. For now my goal is simply to explore, with the hope of extending the work in the near future!
Since there is already a vast amount of literature on WebAssembly and virtual machines, I will skip the explanations for those technologies. In any case, the WASM VM written in Python here is extremely simple and isn't designed to be fast (yet), so in-depth knowledge of either is not needed (though one should always explore!).
I.
Before I even begin designing a model which executes WASM instructions, I decided to implement a simple WASM VM which can:
Execute the 25 WASM instructions
Maintain immutability
The second requirement is an unusual constraint: mutability does not exist in the mathematical operations performed during the forward pass of a deep learning model, so we must express instruction execution without mutating existing variables. This forces us to design the VM as a loop around a pure transition function, one that leaves its input state untouched and returns a new state instead. [1]
The implementation is really straightforward: we must start with a simple state for the VM, which has a stack, a linear memory, a program counter and a trace.
While most of this is straightforward, a quick recap:
Stack is the LIFO structure that WASM operates on. It abstracts away physical registers, which keeps the code portable across different hardware. Here, it is simply an array of integers.
Trace is simply a log of every instruction that the VM executes.
Program Counter is the index which tells the VM which instruction to execute next. Its most important use here is branching.
Linear Memory is where our VM stores and loads values. The Heap is generally built on top of this structure, using memory allocation algorithms.
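A minimal sketch of such a state, assuming a frozen dataclass with tuples standing in for the arrays (so nothing can be modified in place). The field defaults, the 16-cell memory size and the `push` helper are my own illustrative choices, not necessarily what the repository uses:

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MachineState:
    """Immutable snapshot of the VM. Every field is an int or a tuple."""
    stack: Tuple[int, ...] = ()
    memory: Tuple[int, ...] = (0,) * 16  # tiny linear memory; size is arbitrary
    pc: int = 0
    trace: Tuple[str, ...] = ()


def push(state: MachineState, value: int) -> MachineState:
    """Hypothetical helper: return a NEW state with value on top of the stack."""
    return replace(state, stack=state.stack + (value,))


s0 = MachineState()
s1 = push(s0, 7)  # s0 is left untouched; s1 is a separate snapshot
```

Because the dataclass is frozen, an assignment such as `s0.pc = 1` raises an error, so any accidental mutation shows up immediately rather than silently corrupting an earlier state.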
With this structure out of the way, our transition function would be something like:
def transition(m: MachineState, i: List[Instruction]) -> MachineState
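One way to fill in that signature, as a sketch covering only five opcodes. The `(opcode, immediate)` encoding of `Instruction` is an assumption of mine, and two things are deliberately simplified compared to real WASM: memory holds one integer per cell rather than bytes, and `br_if` jumps to an absolute instruction index instead of a structured label.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]  # (opcode, immediate); hypothetical encoding


@dataclass(frozen=True)
class MachineState:
    stack: Tuple[int, ...] = ()
    memory: Tuple[int, ...] = (0,) * 8
    pc: int = 0
    trace: Tuple[str, ...] = ()


def transition(m: MachineState, i: List[Instruction]) -> MachineState:
    """Execute the instruction at m.pc and return a new state; m is never modified."""
    op, arg = i[m.pc]
    stack, memory, next_pc = m.stack, m.memory, m.pc + 1
    if op == "i32.const":
        stack = stack + (arg,)
    elif op == "i32.add":
        a, b = stack[-2], stack[-1]
        stack = stack[:-2] + ((a + b) & 0xFFFFFFFF,)  # wrap to 32 bits
    elif op == "i32.store":
        # Simplification: one integer per memory cell, not bytes.
        addr, value = stack[-2], stack[-1]
        stack = stack[:-2]
        memory = memory[:addr] + (value,) + memory[addr + 1:]
    elif op == "i32.load":
        addr = stack[-1]
        stack = stack[:-1] + (memory[addr],)
    elif op == "br_if":
        # Simplification: arg is an absolute instruction index.
        cond = stack[-1]
        stack = stack[:-1]
        if cond != 0:
            next_pc = arg
    else:
        raise ValueError(f"unsupported opcode: {op}")
    # Build a fresh state; tuples are concatenated, never appended to in place.
    return replace(m, stack=stack, memory=memory, pc=next_pc, trace=m.trace + (op,))


program: List[Instruction] = [("i32.const", 2), ("i32.const", 3), ("i32.add", None)]
before = MachineState()
after = transition(before, program)  # after.stack == (2,), before.stack == ()
```

Every branch only builds new tuples from slices of the old ones, which is what keeps the function pure: calling it twice on the same `before` always yields the same `after`.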
And our execution function becomes a very simple loop:
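A sketch of what such a loop could look like. To keep the example self-contained it is written against any step function, and demonstrated on a toy two-opcode machine (`dec`, `jump_if_nonzero`) that I made up for illustration; it is not WASM. The `max_steps` guard is also my addition, to stop runaway programs.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class ToyState(NamedTuple):
    """Stand-in for the VM state: one counter plus a program counter."""
    counter: int
    pc: int


ToyInstruction = Tuple[str, int]  # hypothetical (opcode, argument) pairs


def toy_step(s: ToyState, program: Sequence[ToyInstruction]) -> ToyState:
    """Pure transition function for the toy machine."""
    op, arg = program[s.pc]
    if op == "dec":
        return ToyState(s.counter - arg, s.pc + 1)
    if op == "jump_if_nonzero":
        return ToyState(s.counter, arg if s.counter != 0 else s.pc + 1)
    raise ValueError(f"unsupported opcode: {op}")


def run(state, program, step: Callable, max_steps: int = 10_000):
    """Apply step until the program counter leaves the program."""
    steps = 0
    while 0 <= state.pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached before the program halted")
        state = step(state, program)  # rebinds the name; the old state is intact
        steps += 1
    return state


countdown = [("dec", 1), ("jump_if_nonzero", 0)]
start = ToyState(counter=3, pc=0)
final = run(start, countdown, toy_step)  # final == ToyState(counter=0, pc=2)
```

The only thing that changes inside `run` is which state the local name points to. `start` is still `ToyState(3, 0)` afterwards, and every intermediate state would remain valid if we chose to keep it.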
With this, we have implemented a very crude, immutable WASM VM!
II.
To prove that our VM can indeed perform higher-level, complex operations, I asked an agent to write the instructions for implementing [2]:
Having verified that the WASM instructions are indeed Turing-complete, I could finally move on to the next step: designing a deep learning model which emulates this virtual machine within its weights, and which can execute the same code as above, but only on GPUs!
Code Repository here.
Thanks for reading this far! Subscribe to follow this series and other such experiments.
[1] The reason real-life systems do not follow immutability is simple: memory explosion and execution speed. If we force immutability, every single instruction must allocate memory for a new state while the previous states are kept around, so we run into both memory and speed limitations. This would be unacceptable in low-level programming, where programs routinely execute millions of instructions. In our case this is mitigated by the GPU's speed; more on that in a later part of the series!
[2] The low-level, instruction-by-instruction algorithms (especially for the Heap) were beyond my ability and the scope of this project, hence I resorted to using an agent. Everything else is hand-written.



