Dat Nguyen

Short bio

I am a Joint Postdoctoral Fellow at Harvard’s Programming Languages and Formal Methods groups and the Basis Research Institute.

My research focuses on program synthesis and probabilistic programming, with a track record in graph-based learning for code and documents. I completed my PhD at the University of Melbourne and previously worked at Cinnamon AI Lab on visually rich document information extraction.

At Harvard, I work on proof automation in Lean and causal systems for drug repurposing. At Basis, I contribute to MARA and R-ADA.

Research interests

Program synthesis and probabilistic programming
Graph-based learning for code and documents
Neuro-symbolic systems with LLMs and SMT
Reliable and explainable ML for software

News

May 24, 2026
Awarded Gold Reviewer at ICML’26. Thanks to the area chairs and to the authors whose submissions were a pleasure to read.
May 22, 2026
Our work, WorldTest, has been accepted at ICML’26! WorldTest frames the evaluation of world-model learning around environment-level queries, which pose general questions about an environment, and we instantiate it with AutumnBench. See you in Korea! arXiv, project.
May 7, 2026
New preprint, a follow-up to NeuroSymbolicDG. We reformulate image classification as spatial predicate induction over learned image primitives! arXiv.
Jan 26, 2026
ExoPredicator learns symbolic state and causal processes (agent actions plus exogenous mechanisms) via variational Bayesian inference with LLM proposals. Accepted at ICLR’26. arXiv, openreview.
Jul 27, 2025
AutumnBench featured on the Basis Research Institute blog.

Technical blogs

Grammars that generalize. Combining a small DSL with a neural network for domain-invariant bird recognition.
Bayesian Synthesis. Probabilistic programs for automatic data modeling.
All posts.

Project demos

	NeuroSymbolicDG Domain-invariant classifier head for fine-grained bird recognition, via a PCFG over spatial layouts. code · paper · blog · checkpoints
	VRDSynth (ISSTA '24) Program synthesis for multilingual document information extraction. code · paper
	Autumn.cpp (ICML '26) Autumn interpreter in C++. Powers MARA and AutumnBench. code · AutumnBench paper · blog · playground
	ExoPredicator (ICLR '26) Learning abstract models of dynamic worlds for robot planning. paper · openreview
	VirDA (TMLR '25) Unsupervised domain adaptation by reusing the backbone with visual reprogramming. code · paper
	GNNInfer (ICSE '22, arXiv '24) Inferring properties of graph neural networks. paper
	FFL (ICSME '22) Fine-grained fault localization for student programs. code · paper

Positions

2025 to present

Joint Postdoctoral Fellow

Harvard SEAS & Basis Research Institute

2021 to 2024

PhD, School of Computing & Information Systems

University of Melbourne · Melbourne Research Scholarship

2016 to 2021

AI Research Engineer

Cinnamon AI Lab

Selected Publications

arXiv '26 Domain Generalization through Spatial Relation Induction over Visual Primitives. Dat Nguyen, Duc-Duy Nguyen.
ICML '26 Benchmarking World-Model Learning with Environment-Level Queries. Archana Warrier, Dat Nguyen, Michelangelo Naim, Moksh Jain, Yichao Liang, Karen Schroeder, Cambridge Yang, Joshua B. Tenenbaum, Sebastian Vollmer, Kevin Ellis, Zenna Tavares.
ICLR '26 ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning. Yichao Liang, Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, Kevin Ellis.
arXiv '25 A Systematic Survey on Debugging Techniques for Machine Learning Systems. Dat Nguyen, Haoye Tian, Bach Le, Patanamon Thongtanunam, Shane McIntosh.
TMLR '25 VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming. Duc-Duy Nguyen, Dat Nguyen.
arXiv '24 Inferring Properties of Graph Neural Networks. Dat Nguyen, Hieu M. Vu, Cong-Thanh Le, Bach Le, David Lo, ThanhVu Nguyen, Corina Pasareanu.
ISSTA '24 VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction. Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, Bach Le, Patanamon Thongtanunam.
arXiv '24 Combining Induction and Transduction for Abstract Reasoning. Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer M. Dunn, Hao Tang, Michelangelo Naim, Dat Nguyen, Wei-Long Zheng, Zenna Tavares, Yewen Pu, Kevin Ellis.
arXiv '23 Adversarial Attacks on Code Models with Discriminative Graph Patterns. Dat Nguyen, Yang Zhou, Xuan Bach D. Le, Patanamon Thongtanunam, David Lo.
ICSME '22 FFL: Fine grained Fault Localization for Student Programs via Syntactic and Semantic Reasoning. Dat Nguyen, Thanh Le-Cong, Duc-Minh Luong, Van-Hai Duong, Xuan Bach Le Dinh, David Lo, Thang Huynh-Quyet.
ICSE '22 Toward the Analysis of Graph Neural Networks. Dat Nguyen, Thanh Le-Cong*, ThanhVu H. Nguyen, Xuan-Bach D. Le, Quyet-Thang Huynh.
ICPR '20 End-to-End Hierarchical Relation Extraction for Generic Form Understanding. Tuan-Anh Nguyen Dang, Duc Thanh Hoang, Quang Bach Tran, Chih-wei Pan, Dat Nguyen.
MAPR '20 PCA-based 3D Facial Reenactment From Single Image. Dat Nguyen, Tuan-Anh Nguyen Dang, Viet Sang Dinh.
BMVC '19 End-to-End Information Extraction by Character-Level Embedding and Multi-Stage Attentional UNet. Tuan Anh Nguyen Dang, Dat Nguyen.