Demo: a source video (left) has 99% of its pixels randomly removed (middle). Our NCA model (right) forms a reconstruction of the original based on the damaged video. Each cell in the NCA model runs the same code, and communicates only with its immediately neighboring cells.


Neural cellular automata (NCA) are a fascinating class of computational models. I’m going to try convincing you of the following claims:

  1. It’s worthwhile emulating biological/neuronal computation.
  2. NCA are a strong algorithmic solution for designing neuromorphic algorithms.
  3. Demo: NCA doing active inference \(\to\) integration with LLMs.

What are Neural Cellular Automata? Like a conventional cellular automaton (e.g., Conway’s famous Game of Life), a neural cellular automaton is composed of a regular grid of “cells”, each of which possesses a “state”. The “state” of a given cell is updated over time based on its own value and the values of its neighbors. As the name suggests, neural cellular automata replace the traditionally discrete state space with a continuous one, and parameterize the update rule with a neural network.
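Schematically, one common (residual) form of the update is:

\[ s_{t+1}(x) = s_t(x) + f_\theta\big(\{\, s_t(x') : x' \in \mathcal{N}(x) \cup \{x\} \,\}\big) \]

where \(s_t(x)\) is the state vector of the cell at grid position \(x\) at time \(t\), \(\mathcal{N}(x)\) is its set of neighbors, and \(f_\theta\) is a small neural network whose weights \(\theta\) are shared by every cell.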

Acknowledgements: Thank you to my former PIs Erik Winfree and Milad Lankarany for teaching me and exploring with me the world of neuro/bio computation. Shoutout to Emre Alca for introducing me to Mordvintsev’s NCA paper in 2020, and to Salvador Buse for the many insightful conversations on these topics.

1: Why Emulate Biological Computation?

Emulating biological computation is one of the more convincing avenues for building artificial intelligence. The human brain, after all, is the one thing in the universe we all agree is intelligent.

“If only we could understand how the neurons in the brain work,” says the Neuro-AI researcher, “then we could build a computer that works in the same way and build true AI!”

Emulating biological computation is also an inviting way to improve our understanding (i.e., our models) of the brain. For instance, by simulating neural networks and perturbing them, perhaps we can learn more about the nature of the brain: if the behavior of our simulation differs from what we observe in real brains, we get to examine the differences to improve our models.

1.1: Challenges in Emulating Biological Computation

While emulating biocomputation sounds great, you might have noticed that there is a lot of “perhaps” and “if only” in our premise.

The main problem is that emulating biocomputation is hard. It’s so hard that it’s hard to even know if you’re asking the right question, let alone getting the right answer.

Biology computes using cells, which are tiny bags of biochemicals. The hard part is that we don’t really know exactly how the biochemicals inside each cell change over time (e.g., transcription regulation), and we also don’t know exactly how the cells communicate as a function of their state (e.g., synaptic plasticity).

We may not know exactly how cells (e.g., neurons) work, but we can still build models on top of imperfect assumptions. These models can be useful (e.g., for understanding and preventing seizures, or understanding how tumor cells move around), but they often fall short, especially when it comes to understanding how humans think and learn.

Due to the chaotic nature of cellular computation, small perturbations in model parameters can lead to large deviations in the final results. We may observe “rules of thumb”, but even a small deviation from reality (one missing biochemical reaction, one missing regulatory loop) can make a big difference. Moreover, it is unclear whether it is possible to simplify the behavior of cells to a physics-esque mathematical equation and retain delicate emergent properties like “intelligence”.

All we really know is the general appearance of biological computation: each cell has the same “source code” (DNA) but may have a radically different state or cell type. Cells generally interact locally or in a diffusive manner. And, assuming you believe in the data processing inequality, the DNA sequence expresses the normative rule set each cell follows (perhaps influenced by randomness and by interaction with the world and other cells) to “decide” how its internal state will change, and in turn how its communication with other cells will change.

2: NCA to the Rescue!

So, simulating our knowledge of biology alone isn’t going to cut it. Guessing at the missing pieces in our biology knowledge is also hard: it’s extremely difficult to predict the effects of even small changes in cellular (“complex”) systems.

A self-learning solution is extremely inviting. Deep learning is currently our most effective self-learning technology, and in fact it developed from the pursuit of computationally modelling neural networks. Its pioneers successfully “guessed at the missing pieces” of our biology knowledge and developed backpropagation! Unfortunately, much of the cellular nature of the computation was lost to matrix operations. Operations like convolutions are more “cellular”, but they lack the dynamical character of true biocomputation.

Neural cellular automata leverage deep learning to train cellular (dynamical) programs to perform an immense variety of tasks: any task with a differentiably expressible loss function. The best part is that advances in deep learning, including easy GPU integration and advances in Neural ODEs, can be leveraged for NCA. The key difference between NCA and convnets is that NCA use convolutions to express a dynamical update rule, whereas convnets use convolutions in a feed-forward manner.

Differentiably optimized neural cellular automata were popularized by Mordvintsev et al. in their 2020 paper (Growing Neural Cellular Automata). NCA are composed of a regular grid of cells where each cell’s state is a \(d\)-dimensional vector of real numbers. For a 2-dimensional grid, the full state of the system may be viewed as a rank-3 tensor with two spatial dimensions and one state dimension. The update to each cell at each point in time is based on the current cell state and the states of the neighboring cells. Importantly, the update rule is a neural network. NCA are trained using backpropagation through time (BPTT), similar to a recurrent neural network (RNN).
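A minimal sketch of such an update rule in PyTorch (the layer sizes and the depthwise-conv perception step are illustrative choices, not the exact architecture from the paper):

```python
import torch
import torch.nn as nn

class NCA(nn.Module):
    """Each cell reads its 3x3 neighborhood and updates its own d-dimensional state."""
    def __init__(self, d=16, hidden=128):
        super().__init__()
        # Depthwise 3x3 conv gathers each cell's local neighborhood (the only communication)
        self.perceive = nn.Conv2d(d, 3 * d, kernel_size=3, padding=1, groups=d)
        # 1x1 convs act as a small per-cell MLP; the same weights run in every cell
        self.update = nn.Sequential(
            nn.Conv2d(3 * d, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden, d, kernel_size=1),
        )

    def forward(self, state):
        # state: (batch, d, height, width) -- two spatial dims plus one state dim
        return state + self.update(self.perceive(state))  # residual dynamical update
```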

To optimize the ruleset, we need only to define a differentiable loss function that maps the state tensor at time \(T\) to a loss value (e.g., how much does it resemble a lizard?). Let your favourite deep learning library optimize the weights of the update-rule neural network, and you have your very own freshly optimized NCA ruleset! Thanks to all the labor put into making deep learning libraries fast and efficient with modern GPUs over the last decade, you can optimize over fairly lengthy time horizons very quickly.
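The training loop is then ordinary BPTT. A sketch, assuming the `NCA` module above and some differentiable `loss_fn` of your choosing (a placeholder, not a library function):

```python
import torch

model = NCA(d=16)
opt = torch.optim.Adam(model.parameters(), lr=2e-3)

for step in range(1000):
    state = torch.zeros(8, 16, 64, 64)  # a batch of seed states
    for t in range(48):                 # unroll the dynamics over the time horizon
        state = model(state)
    loss = loss_fn(state)               # your differentiable loss on the state at time T
    opt.zero_grad()
    loss.backward()                     # backpropagation through time
    opt.step()
```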

So, at a high level, we now have the ability to train these previously unwieldy bio-esque cellular programs. NCA retain the computational advantages of cellular automata (i.e., local information processing, high efficiency, easy realization in physical circuits). The only limitation is your cleverness at designing a differentiable loss function.

3: Fun and Games (Active Inference/Scotopic Vision NCA)

Active inference and predictive coding are promising theories of intelligence that appear to be here to stay. Both revolve around the idea that simply predicting subsequent input (the next word, the next visual input, the next sound) gets you pretty far in terms of being a sensible, intelligent agent in the world. The cortex, the outer layer of the brain where the unique “secret sauce” of human intelligence appears to lie, looks very much like a predictive coding machine (e.g., Jeff Hawkins, Karl Friston, Andy Clark, Rao & Ballard). Moreover, the same cortical cellular structure appears equally adept at learning to process vision, sound, touch, and motor control (sensorimotor masquerade).

Our most intelligent AI systems (e.g., ChatGPT) are largely based on predicting the next word from the previous context, and the representations learned by systems performing predictive coding are highly versatile and semantically rich. Active inference offers a unifying perspective on how predicting sensory input can be linked to (and is largely the same thing as) taking intelligent actions, but that’s a long and dark rabbit hole for another time.

The point is, learning to predict is a fundamental problem in neuroscience, AI, and intelligence in general. So let’s train a cellular automaton to do it!

Scotopic vision: Let’s train the NCA to predict the values of a video’s pixels over time. Since pixel values tend not to change too quickly, let’s make the prediction problem more challenging by removing 99% of the pixels at random.
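Concretely, the sparsification might look like this (a sketch; `frame` is assumed to be a float tensor of pixel values):

```python
import torch

def sparsify(frame, keep_prob=0.01):
    # Keep ~1% of pixel values at random; mask records which pixels survived
    mask = (torch.rand_like(frame) < keep_prob).float()
    return frame * mask, mask
```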

Scotopic NCA: Similar to Mordvintsev’s 2020 paper, we will allocate one “layer” (channel) of the NCA state to represent the model’s prediction of the video’s pixel value at that location. Whenever a cell receives a pixel value (i.e., one of the values that survives the 99% removal), we will compare the cell’s estimated value to the incoming value to compute the loss. We will use \(\ell_2\) loss for now. We will allocate another layer as the “input register”: when a pixel value is received, we will store it in that register so the cell “knows” what information arrived. The remaining cell state indices are learned/allocated by the cells themselves.
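Putting the pieces together for one frame, a sketch assuming the `NCA` model and `sparsify` helper from above (the channel assignments here are my own illustrative choice):

```python
def step_frame(model, state, frame, n_steps=1):
    # state: (B, d, H, W) cell states; frame: (B, H, W) ground-truth pixel values
    sparse, mask = sparsify(frame)
    # Masked l2 loss: compare each cell's prediction (channel 0) only where a
    # pixel value actually arrived
    loss = ((state[:, 0] - frame) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    # Write received pixel values into the "input register" (channel 1)
    reg = (sparse + state[:, 1] * (1 - mask)).unsqueeze(1)
    state = torch.cat([state[:, :1], reg, state[:, 2:]], dim=1)
    for _ in range(n_steps):  # information travels one cell per update step
        state = model(state)
    return state, loss
```

Summing the per-frame losses over the video and backpropagating trains the ruleset end to end, exactly as in the generic loop above.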

Results:

It’s extremely rare that ideas like this work on the first try, but it actually does a solid job reconstructing the original video based on the 99%-sparsified video! Since the model doesn’t have that many parameters, it even works well with new videos after only training on one. Any military contractors looking to build better night vision, shoot me an email ;)

Discussion & Next Steps

NCA offer a tractable path to building bio-esque cellular computation systems that are actually performant. Unlike with real cells, we can explore the inner workings of NCA at our leisure. It’s also straightforward to implement highly efficient NCA in silicon/physical circuitry. Thanks to their resemblance to convolutional neural networks (Cellular automata as convolutional neural networks by Gilpin) and neural ordinary differential equations, we are able to leverage a rich body of engineering and theoretical work to accelerate, improve, and interrogate these systems (i.e., deep learning libraries, GPU integration, optimization tricks, theory).

Next steps:

  1. Test out running more update steps between each frame; currently, information can only travel one cell per update step.
  2. Benchmark performance with different architectures, compare to a statistical baseline (e.g., caching the most recently observed pixel value, perhaps including a Gaussian prior on inter-pixel correlation).
  3. Experiment with different pixel representations. Currently, each cell predicts exactly 1 pixel worth of information. Perhaps it would be better for each cell to represent an \(n\times n\) group of pixels.
  4. Test non-grid architectures (e.g., Transformer cellular automata – see below).
  5. Make neuromorphic night-vision glasses(?) or a silicon retina with Piotr Dudek, extending PixelRNN (arXiv:2304.05440).
  6. Find the AGI cellular automata rule. Implement in silicon. Summon the thermodynamic godhead before Beff and his gang.

A simple transformer cellular automata particle field (Tweet)