Euan Ong

I've just finished my undergraduate degree in Computer Science at the University of Cambridge (where I ranked 1st / ~120 every year). I'm currently working on interpretability (previously adversarial robustness) at Anthropic.

My ambition is to develop powerful, yet safe and interpretable abstract reasoners, whose internal state and behaviour remain transparent to the end user.

To this end, I'm interested both in pragmatic, scaling-friendly approaches to eliciting latent knowledge from neural networks, and in longer-term bets exploring how the mathematical toolkits we use to understand and structure programs (such as formal methods, types and category theory) could help us discover the right abstractions for reverse-engineering neural networks.

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  Github


Research

So far, my research has broadly focused on studying the behaviour of neural networks in vitro: understanding both how they generalise when learning to perform abstract tasks, and what this tells us about the algorithms they've learned in order to do so.

Previously, I've probed the foundations of neural algorithmic reasoning, explored attacks on vision-language models, and poked language model representations with a stick.

More recently, I've also done some work on adversarial robustness, mostly at Anthropic.

Featured work (reasoning & interpretability)

Probing the Foundations of Neural Algorithmic Reasoning
Euan Ong
Technical Report, 2023; ICML Differentiable Almost Everything (Spotlight), 2024
abstract / full text / project page

I examined a fundamental claim of neural algorithmic reasoning and, through statistically robust ablations, found evidence refuting it. Based on these observations, I developed a way to parallelise differentiable algorithms that preserves their efficiency and correctness guarantees while alleviating their performance bottlenecks. This work formed part of my Bachelor's thesis, which won the CS department's Best Dissertation Award.

Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
ICLR, 2024; NeurIPS ATTRIB (Oral), 2023
arXiv / reviews / project page / tweeprint

We discovered successor heads: attention heads, present in a range of LLMs, that increment tokens from ordinal sequences (e.g. numbers, months and days). We isolated a common numeric subspace within embedding space that, for any given token (e.g. 'February'), encodes the index of that token within its ordinal sequence (e.g. months). We also found that numeric token representations can be decomposed into interpretable features representing the value of the token mod 10, which can be used to edit the numeric value of a representation via vector arithmetic.
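
As an illustration of the kind of vector-arithmetic edit this enables, here is a minimal sketch (hypothetical names; not the paper's exact feature-extraction or editing pipeline):

```python
import torch

# Hypothetical setup: `rep` is the representation of a numeric token, and
# `mod10_features[d]` is a learned direction encoding "value ≡ d (mod 10)".
def edit_numeric_value(rep: torch.Tensor,
                       old_digit: int,
                       new_digit: int,
                       mod10_features: torch.Tensor) -> torch.Tensor:
    """Swap the mod-10 component of a numeric token representation."""
    # Remove the feature for the current value mod 10, then add the target one.
    return rep - mod10_features[old_digit] + mod10_features[new_digit]

# e.g. nudging the representation of '14' towards behaving like '15':
# edited = edit_numeric_value(rep_14, old_digit=4, new_digit=5, mod10_features=F)
```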

Learnable Commutative Monoids for Graph Neural Networks
Euan Ong, Petar Veličković
Learning on Graphs Conference, 2022
arXiv / reviews / project page / tweeprint

Using ideas from abstract algebra and functional programming, we built a new GNN aggregator that beats the state of the art on complex aggregation problems (especially out-of-distribution), while remaining efficient and parallelisable on large graphs.
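
The core idea is to treat the aggregator as a commutative monoid: a learnable binary operation (with an identity element) folded over neighbour messages via a balanced binary tree, giving logarithmic depth. A rough sketch of that idea, with architecture details that are my own placeholders rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class LearnableMonoidAggregator(nn.Module):
    """Sketch: a learnable commutative binary op, reduced tree-style over messages."""

    def __init__(self, dim: int):
        super().__init__()
        self.op = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.identity = nn.Parameter(torch.zeros(dim))  # learnable identity element

    def binop(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Symmetrise so the learned operation is commutative by construction.
        return 0.5 * (self.op(torch.cat([a, b], dim=-1)) + self.op(torch.cat([b, a], dim=-1)))

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (num_messages, dim); pairwise reduction gives O(log n) depth.
        x = messages
        while x.shape[0] > 1:
            if x.shape[0] % 2 == 1:  # pad odd-length levels with the identity
                x = torch.cat([x, self.identity.unsqueeze(0)], dim=0)
            x = self.binop(x[0::2], x[1::2])
        return x.squeeze(0) if x.shape[0] == 1 else self.identity
```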

Featured work (robustness)

Constitutional Classifiers: Defending against universal jailbreaks across thousands of hours of red teaming
Mrinank Sharma*, Meg Tong*, Jesse Mu*, Jerry Wei*, Jorrit Kruthoff*, Scott Goodfriend*, Euan Ong*, Alwin Peng, ..., Jan Leike, Jared Kaplan, Ethan Perez (* denotes equal contribution)
arXiv, 2025
arXiv / Anthropic article / tweeprint

We built a system of constitutional classifiers to prevent jailbreaks. A prototype version of our system withstood over 3,000 hours of expert red teaming with no universal jailbreaks found. Newer versions of our system also have minimal over-refusals and moderate run-time overhead.

Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey*, Euan Ong*, Stuart Russell, Scott Emmons (* denotes equal contribution)
ICML, 2024
arXiv / project page + demo / tweeprint

We discovered that adversarial images can hijack the behaviour of vision-language models (VLMs) at runtime. We developed a general method for crafting these image hijacks, and trained image hijacks forcing VLMs to output arbitrary text, leak their context window and comply with harmful instructions. We also derived an algorithm to train hijacks forcing VLMs to behave as though they were given an arbitrary prompt, which we used to make them believe the Eiffel Tower is in Rome.
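
In spirit, crafting a hijack is just gradient descent on the image pixels against a target behaviour. A heavily simplified sketch, assuming a generic differentiable VLM interface (the `vlm.loss` helper is hypothetical, and this omits the behaviour-matching details from the paper):

```python
import torch

def train_image_hijack(vlm, image, prompt_ids, target_ids,
                       steps=500, lr=1e-2, eps=8 / 255):
    """Optimise a perturbation so the VLM, shown (image + delta, prompt),
    assigns high likelihood to the target text (teacher-forced)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        hijacked = (image + delta).clamp(0, 1)
        loss = vlm.loss(images=hijacked, prompt=prompt_ids, target=target_ids)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # optional: keep the perturbation imperceptible
    return (image + delta).detach().clamp(0, 1)
```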

Other projects

Building and Evaluating Alignment Auditing Agents
Trenton Bricken, Rowan Wang, Sam Bowman, Euan Ong, Johannes Treutlein, Jeff Wu, Evan Hubinger, Samuel Marks
Anthropic Alignment Science Blog, 2025
paper / tweeprint

We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors. I helped to build the infrastructure for the experiments in this paper.

Cost-Effective Constitutional Classifiers via Representation Re-use
Hoagy Cunningham, Alwin Peng, Jerry Wei, Euan Ong, Fabien Roger, Linda Petrini, Misha Wagner, Vladimir Mikulik, Mrinank Sharma
Anthropic Alignment Science Blog, 2025
paper

We benchmark approaches to re-using LLM internals to make LLM monitoring more cost-effective. I helped to build the infrastructure for the evaluations in this paper.

Auditing Language Models for Hidden Objectives
Samuel Marks*, Johannes Treutlein*, Trenton Bricken, Jack Lindsey, Jonathan Marcus, Siddharth Mishra-Sharma, Daniel Ziegler, ..., Euan Ong, ... , Kelley Rivoire, Adam Jermyn, Monte MacDiarmid, Tom Henighan, Evan Hubinger (* denotes equal contribution)
arXiv, 2025
arXiv / Anthropic article / tweeprint

We deliberately train a language model with a hidden objective and use it as a testbed for studying alignment audits. I participated in the auditing game, on the team with only black-box model access.

Personality Machine
Euan Ong*, Jamie Chen*, Kyra Zhou*, Marcus Handley*, Mingle Chen*, Ori Vasilescu*, Eleanor Drage (* denotes equal contribution)
CST Group Project, 2022

In collaboration with the Centre for Gender Studies at Cambridge, we built a tool highlighting the questionable logic behind the AI-driven personality assessments often used in hiring. Our tool demonstrates how arbitrary changes in facial expression, clothing, lighting and background can give radically different personality readings, and was covered by the BBC and the Telegraph.

Dissecting Deep Learning for Systematic Generalisation
Euan Ong, Etaash Katiyar, Kai-En Chong, Albert Qiaochu Jiang
Informal research, 2021

We investigated the ability of transformers to generalise systematically when learning to recognise formal languages (such as Parity and 2-Dyck), empirically corroborating various theoretical claims about transformer generalisation. Inspired by our observations, we derived a parallel, stackless algorithm for recognising 2-Dyck that could (in principle) be implemented by a transformer with a constant number of attention layers.

Object Detection in Thermal Imagery via Convolutional Neural Networks
Euan Ong, Niki Trigoni, Pedro Porto Barque de Gusmão
Technical report, 2019

We trained a Faster R-CNN object detection network to identify landmarks (e.g. doors and windows) in thermal images of indoor environments, with applications in the development of navigational aids for search and rescue operations.
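
For reference, the standard torchvision fine-tuning recipe for such a detector looks roughly like this (a generic sketch with placeholder classes, not the report's actual code):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 3  # placeholder: background + door + window

# Start from a COCO-pretrained Faster R-CNN and swap in a new box predictor.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Training then follows the usual torchvision detection loop:
# for images, targets in data_loader:
#     losses = model(images, targets)   # dict of RPN + ROI-head losses
#     sum(losses.values()).backward()
```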

miscellanea

research aside, I love communities and spaces that help people live more meaningful lives.

during my time at Cambridge, I started [scale down], a co-working group to help students make space for the things they care about; over the years, this sprouted more nodes across the UK, and grew into a national student-run collective called ensemble.

more recently, I wrote some perhaps-unorthodox career advice for undergrads (reviews).


Design inspired by Jon Barron's site.