Euan Ong

I've just finished my undergraduate degree in Computer Science at the University of Cambridge (where I ranked 1st / ~120 every year). I'm currently working on adversarial robustness at Anthropic for a year, before starting a Ph.D. in 'abstractions-first' mechanistic interpretability at MIT with Jacob Andreas and Armando Solar-Lezama.

My ambition is to develop powerful, yet safe and interpretable abstract reasoners, whose internal state and behaviour remain transparent to the end user.

To this end, I'm particularly interested in exploring how the mathematical toolkits we use to understand and structure programs ‒ such as formal methods, types and category theory ‒ can inspire new ways both to reverse-engineer existing neural networks and to build scalable neurosymbolic systems.

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  GitHub

profile photo

Research

So far, my research has broadly focused on studying the behaviour of neural networks in vitro: understanding both how they generalise when learning to perform abstract tasks, and what this tells us about the algorithms they've learned in order to do so.

Previously, I've probed the foundations of neural algorithmic reasoning, explored attacks on vision-language models, and poked language model representations with a stick.

Published work

Probing the Foundations of Neural Algorithmic Reasoning
Euan Ong
Technical Report, 2023; ICML Differentiable Almost Everything (Spotlight), 2024
abstract / full text / project page

I explored a fundamental claim of neural algorithmic reasoning and, through statistically robust ablations, found evidence refuting it. Based on these observations, I developed a way to parallelise differentiable algorithms that preserves their efficiency and correctness guarantees while alleviating their performance bottlenecks. This work formed part of my Bachelor's thesis, which won the CS department's Best Dissertation Award.

Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
ICLR, 2024; NeurIPS ATTRIB (Oral), 2023
arXiv / reviews / project page / tweeprint

We discovered successor heads: attention heads, present in a range of LLMs, that increment tokens from ordinal sequences (e.g. numbers, months and days). We isolated a common numeric subspace within embedding space that, for any given token (e.g. 'February'), encodes the index of that token within its ordinal sequence (e.g. months). We also found that numeric token representations can be decomposed into interpretable features representing the value of the token mod 10, which can be used to edit the numeric value of the representation via vector arithmetic.
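
A minimal, hypothetical sketch of the kind of vector-arithmetic edit this enables: here `mod10_directions` is a random stand-in for feature directions that would have to be extracted from a real model, and the vectors are placeholders for residual-stream activations.

```python
# Hypothetical sketch (PyTorch): editing the numeric value of a token representation
# by vector arithmetic over "value mod 10" feature directions. All quantities here are
# random placeholders, not directions extracted from a real model.
import torch

d_model = 768
torch.manual_seed(0)

# Stand-in for an orthonormal basis of the mod-10 feature subspace:
# mod10_directions[k] is the direction for tokens whose value is k (mod 10).
mod10_directions = torch.linalg.qr(torch.randn(d_model, 10)).Q.T  # (10, d_model)

def edit_mod10(resid: torch.Tensor, old_k: int, new_k: int) -> torch.Tensor:
    """Swap the 'value = old_k (mod 10)' feature for the 'value = new_k (mod 10)'
    feature by removing one direction's projection and adding it back along the other."""
    coeff = resid @ mod10_directions[old_k]  # how strongly the old feature fires
    return resid - coeff * mod10_directions[old_k] + coeff * mod10_directions[new_k]

# e.g. nudge a representation that encodes '4' towards behaving like '7':
resid_4 = torch.randn(d_model) + 3.0 * mod10_directions[4]
resid_7 = edit_mod10(resid_4, old_k=4, new_k=7)
```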

Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey*, Euan Ong*, Stuart Russell, Scott Emmons (* denotes equal contribution)
ICML, 2024
arXiv / project page + demo / tweeprint

We discovered that adversarial images can hijack the behaviour of vision-language models (VLMs) at runtime. We developed a general method for crafting these image hijacks, and trained hijacks that force VLMs to output arbitrary text, leak their context window, and comply with harmful instructions. We also derived an algorithm to train hijacks that make VLMs behave as though they were given an arbitrary prompt, which we used to convince them that the Eiffel Tower is in Rome.
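
As a rough illustration of the general recipe behind attacks of this kind (not the paper's exact training setup), the sketch below optimises an image perturbation with projected gradient steps so that a frozen model assigns high probability to a target token; `ToyVLM` is a stand-in for a real vision-language model and its loss.

```python
# Hedged sketch (PyTorch): optimise image pixels so a frozen model produces a target
# output. A real attack would use an actual VLM and its language-modelling loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVLM(nn.Module):
    """Stand-in for a frozen VLM: maps an image to next-token logits."""
    def __init__(self, vocab_size=1000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 8, 5, stride=4), nn.ReLU(), nn.Flatten())
        self.head = nn.LazyLinear(vocab_size)
    def forward(self, image):
        return self.head(self.encoder(image))

def train_hijack(model, target_tokens, steps=200, eps=8 / 255, lr=1e-2):
    """Projected-gradient-style optimisation of an image perturbation."""
    image = torch.rand(1, 3, 64, 64)                   # base image (could be any photo)
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)             # only the perturbation is optimised
    for _ in range(steps):
        logits = model(torch.clamp(image + delta, 0, 1))
        loss = F.cross_entropy(logits, target_tokens)  # push towards the target output
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                    # keep the perturbation small
    return torch.clamp(image + delta, 0, 1).detach()

model = ToyVLM().eval()
hijack = train_hijack(model, target_tokens=torch.tensor([42]))
```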

Learnable Commutative Monoids for Graph Neural Networks
Euan Ong, Petar Veličković
Learning on Graphs Conference, 2022
arXiv / reviews / project page / tweeprint

Using ideas from abstract algebra and functional programming, we built a new GNN aggregator that beats the state of the art on complex aggregation problems (especially out-of-distribution), while remaining efficient and parallelisable on large graphs.
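
A hedged sketch of the core idea (not the paper's exact architecture): learn a binary operation and a padding element, then aggregate neighbour features by reducing with that operation over a balanced binary tree, giving logarithmic rather than linear depth. Symmetrising over argument order enforces commutativity by construction; associativity is only encouraged by training, not guaranteed.

```python
# Illustrative sketch (PyTorch) of a tree-reduction aggregator built from a learned binary op.
import torch
import torch.nn as nn

class BinaryOp(nn.Module):
    """Learned candidate for a commutative binary operation on feature vectors."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, a, b):
        # Summing over both argument orders makes the operation commutative by construction.
        return self.net(torch.cat([a, b], -1)) + self.net(torch.cat([b, a], -1))

class TreeAggregator(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.op = BinaryOp(dim)
        self.identity = nn.Parameter(torch.zeros(dim))  # learned padding element, intended as an identity
    def forward(self, xs):                              # xs: (n, dim) neighbour features
        while xs.shape[0] > 1:
            if xs.shape[0] % 2 == 1:                    # pad odd levels with the identity element
                xs = torch.cat([xs, self.identity.unsqueeze(0)], 0)
            xs = self.op(xs[0::2], xs[1::2])            # combine adjacent pairs in parallel
        return xs[0]

agg = TreeAggregator(dim=16)
out = agg(torch.randn(7, 16))                           # aggregate 7 neighbour features
```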

Other projects

Personality Machine
Euan Ong*, Jamie Chen*, Kyra Zhou*, Marcus Handley*, Mingle Chen*, Ori Vasilescu*, Eleanor Drage (* denotes equal contribution)
CST Group Project, 2022

In collaboration with the Centre for Gender Studies at Cambridge, we built a tool highlighting the questionable logic behind the AI-driven personality assessments often used in hiring. Our tool demonstrates how arbitrary changes in facial expression, clothing, lighting and background can give radically different personality readings, and was featured by the BBC and the Telegraph.

Dissecting Deep Learning for Systematic Generalisation
Euan Ong, Etaash Katiyar, Kai-En Chong, Albert Qiaochu Jiang
Informal research, 2021

We investigated the ability of transformers to generalise systematically when learning to recognise formal languages (such as Parity and 2-Dyck), empirically corroborating several theoretical claims about transformer generalisation. Inspired by our observations, we derived a parallel, stackless algorithm for recognising 2-Dyck that could (in principle) be implemented by a transformer with a constant number of attention layers.
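
As an illustration of what a stackless, parallel-in-principle recogniser can look like (a NumPy reconstruction of the idea, not the project's exact algorithm): compute bracket depths with a prefix sum, match each closing bracket to the latest earlier opening bracket one level deeper via batched binary searches, and check that the bracket types agree.

```python
# Illustrative sketch: stackless 2-Dyck recognition from prefix sums and binary searches.
import numpy as np

PAIRS = {'(': ')', '[': ']'}

def is_2dyck(s: str) -> bool:
    if not s:
        return True
    toks = np.array(list(s))
    is_open = np.isin(toks, list(PAIRS))
    depth = np.cumsum(np.where(is_open, 1, -1))       # bracket depth after each position
    if depth[-1] != 0 or (depth < 0).any():           # the Dyck-1 "shape" must be valid
        return False
    ok = True
    # A closer at position i matches the latest opener j < i with depth[j] == depth[i] + 1.
    for d in np.unique(depth[is_open]):
        openers = np.flatnonzero(is_open & (depth == d))
        closers = np.flatnonzero(~is_open & (depth == d - 1))
        match = openers[np.searchsorted(openers, closers) - 1]
        ok &= all(PAIRS[s[j]] == s[i] for i, j in zip(closers, match))
    return ok

assert is_2dyck("([()])") and not is_2dyck("([)]")
```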

Object Detection in Thermal Imagery via Convolutional Neural Networks
Euan Ong, Niki Trigoni, Pedro Porto Buarque de Gusmão
Technical report, 2019

We trained a Faster R-CNN object detection network to identify landmarks (e.g. doors and windows) in thermal images of indoor environments, with applications in the development of navigational aids for search and rescue operations.
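
For context, the sketch below shows the standard torchvision recipe for this kind of fine-tuning: load a pretrained Faster R-CNN and swap its box predictor for one sized to the landmark classes. The class count and targets are illustrative, and this generic recipe is not necessarily the report's exact setup.

```python
# Hedged sketch: fine-tuning a torchvision Faster R-CNN for a small set of landmark classes.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 3  # background + door + window (illustrative)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained weights
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# In training mode the model takes a list of images and a list of target dicts
# ('boxes': (N, 4) in xyxy format, 'labels': (N,)) and returns a dict of losses.
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 120.0, 200.0]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)
sum(losses.values()).backward()
```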


Design inspired by Jon Barron's site.