Euan Ong
I've just finished my undergraduate degree in Computer Science at the University of Cambridge (where I ranked 1st / ~120 every year). I'm currently working on adversarial robustness at Anthropic for a year, before starting a Ph.D. in 'abstractions-first' mechanistic interpretability at MIT with Jacob Andreas and Armando Solar-Lezama.
My ambition is to develop powerful, yet safe and interpretable abstract reasoners whose internal state and behaviour remain transparent to the end user. To this end, I'm particularly interested in exploring how the mathematical toolkits we use to understand and structure programs ‒ such as formal methods, types and category theory ‒ can inspire new ways both to reverse-engineer existing neural networks and to build scalable neurosymbolic systems.
Email / CV / Google Scholar / Twitter / LinkedIn / Github
Research
So far, my research has broadly focused on studying the behaviour of neural networks in vitro: understanding both how they generalise when learning to perform abstract tasks, and what this tells us about the algorithms they've learned in order to do so. Previously, I've probed the foundations of neural algorithmic reasoning, explored attacks on vision-language models, and poked language model representations with a stick.
Published work
Probing the Foundations of Neural Algorithmic Reasoning
Euan Ong
Technical Report, 2023; ICML Differentiable Almost Everything (Spotlight), 2024
abstract / full text / project page
I explored a fundamental claim of neural algorithmic reasoning, and found evidence to refute it through statistically robust ablations. Based on my observations, I developed a way to parallelise differentiable algorithms that preserves their efficiency and correctness guarantees while alleviating their performance bottlenecks. This work formed part of my Bachelor's thesis, which won the CS department's Best Dissertation Award.
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
ICLR, 2024; NeurIPS ATTRIB (Oral), 2023
arXiv / reviews / project page / tweeprint
We discovered successor heads: attention heads present in a range of LLMs that increment tokens from ordinal sequences (e.g. numbers, months and days). We isolated a common numeric subspace within embedding space that, for any given token (e.g. 'February'), encodes the index of that token within its ordinal sequence (e.g. months). We also found that numeric token representations can be decomposed into interpretable features representing the value of the token mod 10, which can be used to edit the numeric value of the representation via vector arithmetic.
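As a toy illustration of the kind of edit this enables (using random stand-in directions rather than the features we actually extracted), one can move a token representation from one mod-10 value to another by plain vector arithmetic:

```python
import torch

d_model = 768
# Stand-ins for learned 'value mod 10 == k' feature directions; here they are
# random placeholders purely for illustration.
mod10_dirs = torch.randn(10, d_model)
mod10_dirs = mod10_dirs / mod10_dirs.norm(dim=-1, keepdim=True)

def edit_mod10(rep: torch.Tensor, old: int, new: int, scale: float = 1.0) -> torch.Tensor:
    """Shift a token representation from 'value mod 10 == old' towards '== new'
    by subtracting one feature direction and adding another."""
    return rep - scale * mod10_dirs[old] + scale * mod10_dirs[new]

rep_12 = torch.randn(d_model)                  # placeholder representation of the token '12'
rep_15_ish = edit_mod10(rep_12, old=2, new=5)  # nudge its value from ...2 towards ...5
```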
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey*, Euan Ong*, Stuart Russell, Scott Emmons (* denotes equal contribution)
ICML, 2024
arXiv / project page + demo / tweeprint
We discovered that adversarial images can hijack the behaviour of vision-language models (VLMs) at runtime. We developed a general method for crafting these image hijacks, and trained image hijacks forcing VLMs to output arbitrary text, leak their context window and comply with harmful instructions. We also derived an algorithm to train hijacks forcing VLMs to behave as though they were given an arbitrary prompt, which we used to make them believe the Eiffel Tower is in Rome.
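For intuition, the sketch below shows the general shape of such an attack: gradient descent on the input pixels against a differentiable target loss. It is a generic sketch of gradient-based image optimisation rather than the exact procedure from the paper, and `vlm_loss` is an assumed placeholder (e.g. the cross-entropy a frozen VLM assigns to a target string given the image), not a real library function.

```python
import torch

def craft_hijack(image, target, vlm_loss, steps=500, step_size=1/255, eps=8/255):
    """Hypothetical sketch of a projected-gradient image attack; `vlm_loss` is an
    assumed differentiable callable, not a real library API."""
    original = image.clone()
    adv = image.clone().requires_grad_(True)
    for _ in range(steps):
        loss = vlm_loss(adv, target)
        (grad,) = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv -= step_size * grad.sign()                           # descend on the target loss
            adv.copy_(original + (adv - original).clamp(-eps, eps))  # optional L-infinity budget
            adv.clamp_(0.0, 1.0)                                     # keep pixels in a valid range
    return adv.detach()
```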
Learnable Commutative Monoids for Graph Neural Networks
Euan Ong, Petar Veličković
Learning on Graphs Conference, 2022
arXiv / reviews / project page / tweeprint
Using ideas from abstract algebra and functional programming, we built a new GNN aggregator that beats the state of the art on complex aggregation problems (especially out-of-distribution), while remaining efficient and parallelisable on large graphs.
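A rough sketch of the underlying idea (meant only to convey the flavour, not the construction from the paper): aggregate a multiset of node features with a learned binary operator and a learned identity element, reducing over a balanced binary tree so that the depth of the computation is logarithmic in the number of neighbours.

```python
import torch
import torch.nn as nn

class BinaryOpAggregator(nn.Module):
    """Illustrative sketch of a tree-reduction aggregator. Commutativity is imposed
    here by symmetrising the operator; the remaining monoid laws, and the actual
    construction used in the paper, are not captured by this sketch."""
    def __init__(self, dim: int):
        super().__init__()
        self.identity = nn.Parameter(torch.zeros(dim))  # learned identity element
        self.op = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def binop(self, a, b):
        # Symmetrised so that binop(a, b) == binop(b, a) by construction.
        return 0.5 * (self.op(torch.cat([a, b], -1)) + self.op(torch.cat([b, a], -1)))

    def forward(self, xs: torch.Tensor) -> torch.Tensor:  # xs: (num_neighbours, dim)
        while xs.shape[0] > 1:
            if xs.shape[0] % 2 == 1:                       # pad odd levels with the identity
                xs = torch.cat([xs, self.identity.unsqueeze(0)], dim=0)
            xs = self.binop(xs[0::2], xs[1::2])            # one level of the reduction tree
        return xs[0]
```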
Other projects
Personality Machine
Euan Ong*, Jamie Chen*, Kyra Zhou*, Marcus Handley*, Mingle Chen*, Ori Vasilescu*, Eleanor Drage (* denotes equal contribution)
CST Group Project, 2022
In collaboration with the Centre for Gender Studies at Cambridge, we built a tool highlighting the questionable logic behind the AI-driven personality assessments often used in hiring. Our tool demonstrates how arbitrary changes in facial expression, clothing, lighting and background can give radically different personality readings, and was featured by the BBC and the Telegraph.
Dissecting Deep Learning for Systematic Generalisation
Euan Ong, Etaash Katiyar, Kai-En Chong, Albert Qiaochu Jiang
Informal research, 2021
We investigated the ability of transformers to generalise systematically when learning to recognise formal languages (such as Parity and 2-Dyck), empirically corroborating various theoretical claims about transformer generalisation. Inspired by our observations, we derived a parallel, stackless algorithm for recognising 2-Dyck that could (in principle) be implemented by a transformer with a constant number of attention layers.
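To illustrate the flavour of a stackless, parallel-friendly recogniser (a generic construction, not necessarily the algorithm we derived): balance can be checked with prefix sums, and matching opener/closer pairs can be recovered by grouping brackets by nesting depth, after which type-checking each pair is embarrassingly parallel.

```python
import numpy as np

def is_2dyck(s: str) -> bool:
    """Stackless 2-Dyck check over the alphabet '()[]' (illustrative sketch only)."""
    opens, closes = "([", ")]"
    assert all(c in opens + closes for c in s)
    if not s:
        return True
    delta = np.array([1 if c in opens else -1 for c in s])
    depth_after = np.cumsum(delta)                   # nesting depth after each bracket
    if depth_after[-1] != 0 or (depth_after < 0).any():
        return False                                 # not even balanced as a Dyck-1 word
    # A bracket's own depth: openers sit at depth_after, closers at the depth just before them.
    depth = np.where(delta == 1, depth_after, depth_after + 1)
    order = np.lexsort((np.arange(len(s)), depth))   # sort by (depth, position)
    # Within each depth, brackets alternate opener/closer and adjacent pairs match,
    # so it only remains to check that their types agree.
    return all(opens.index(s[i]) == closes.index(s[j])
               for i, j in zip(order[0::2], order[1::2]))
```

For example, is_2dyck('([()])[]') returns True, while is_2dyck('([)]') returns False.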
Object Detection in Thermal Imagery via Convolutional Neural Networks
Euan Ong, Niki Trigoni, Pedro Porto Buarque de Gusmão
Technical Report, 2019
We trained a Faster R-CNN object detection network to identify landmarks (e.g. doors and windows) in thermal images of indoor environments, with applications in the development of navigational aids for search and rescue operations.
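For reference, a comparable setup with today's torchvision looks roughly like the sketch below; this is a hypothetical snippet rather than the original 2019 training code, and NUM_LANDMARK_CLASSES is an assumed value.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_LANDMARK_CLASSES = 3  # hypothetical: background + door + window

# Start from a COCO-pretrained Faster R-CNN and swap in a box predictor sized
# for the landmark classes, then fine-tune on (thermal) images.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_LANDMARK_CLASSES)

# Training step (targets use torchvision's detection format: boxes + labels):
#   losses = model(images, targets)
#   sum(losses.values()).backward()
```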