Dat Nguyen
Postdoctoral Fellow in Computer Science, Harvard SEAS · Basis Research Institute
Short bio
I am a Postdoctoral Fellow in Harvard’s Programming Languages and Formal Methods groups and an incoming Postdoctoral Scientist at the Basis Research Institute.
My research focuses on program synthesis and probabilistic programming, with a track record in graph-based learning for code and documents. I completed my PhD at the University of Melbourne and previously worked at Cinnamon AI Lab on visually rich document information extraction.
At Harvard, I work on proof automation in Lean and causal systems for drug repurposing. At Basis, I contribute to MARA and R-ADA.
Research interests
- Program synthesis and probabilistic programming
- Graph-based learning for code and documents
- Neuro-symbolic systems with LLMs and SMT
- Reliable and explainable ML for software
Technical blogs
- Grammars that generalize – Combining a small DSL with a neural network for domain-invariant bird recognition
- Bayesian Synthesis – Bayesian synthesis of probabilistic programs for automatic data modeling
- All posts
Project demos
| Project | Description | Links |
|---|---|---|
| NeuroSymbolicDG | PCFG over a spatial layout DSL as a domain-invariant classifier head for fine-grained bird recognition. | code · blog · checkpoints |
| VRDSynth (ISSTA '24) | Synthesizing programs for multilingual visually rich document information extraction. | code · paper |
| Autumn.cpp | An Autumn interpreter in C++ for MARA. | code |
| ExoPredicator (ICLR '26) | Learning abstract models of dynamic worlds for robot planning. | paper · openreview |
| VirDA (TMLR '25) | Reusing backbone for unsupervised domain adaptation with visual reprogramming. | code · paper |
| GNNInfer (ICSE '22, arXiv '24) | Inferring properties of graph neural networks. | paper |
| FFL (ICSME '22) | Fine-grained fault localization for student programs via syntactic and semantic reasoning. | code · paper |
Positions
| Period | Role & Affiliation |
|---|---|
| 2025 – present | Postdoctoral Fellow, Harvard SEAS & Basis Research Institute |
| 2021 – 2024 | PhD, School of Computing & Information Systems, University of Melbourne (Melbourne Research Scholarship) |
| 2016 – 2021 | AI Research Engineer, Cinnamon AI Lab |
News
| Date | News |
|---|---|
| Jan 26, 2026 | Co-authored paper “ExoPredicator” accepted at ICLR’26. Authors: Yichao Liang, Thanh Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, Kevin Ellis. |
| Jul 27, 2025 | AutumnBench featured on the Basis Research Institute blog. |
| Mar 5, 2025 | ArXiv: “A Systematic Survey on Debugging Techniques for Machine Learning Systems”. |
| Jan 15, 2025 | Paper “VirDA” published in TMLR’25. Authors: Duc-Duy Nguyen, Dat Nguyen. |
| Dec 2, 2024 | “Combining Induction and Transduction for Abstract Reasoning” won Best Paper at the ARC contest (arXiv). |
Selected Publications [Full List]
- [ICLR] ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning. 2026.
- [arXiv] A Systematic Survey on Debugging Techniques for Machine Learning Systems. arXiv preprint arXiv:2503.03158, Mar 2025.
- [TMLR] VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming. Transactions on Machine Learning Research, Mar 2025.
- [arXiv] Inferring Properties of Graph Neural Networks. arXiv preprint arXiv:2401.03790, Jan 2024.
- [ISSTA] VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction. arXiv preprint arXiv:2407.06826, Jul 2024.
- [arXiv] Combining Induction and Transduction for Abstract Reasoning. arXiv preprint arXiv:2411.02272, Nov 2024.
- [arXiv] Adversarial Attacks on Code Models with Discriminative Graph Patterns. arXiv preprint arXiv:2308.11161, Aug 2023.
- [ICSME] FFL: Fine-grained Fault Localization for Student Programs via Syntactic and Semantic Reasoning. In 2022 IEEE 38th International Conference on Software Maintenance and Evolution, Research Track, Aug 2022.
- [ICSE] Toward the Analysis of Graph Neural Networks. In 2022 IEEE/ACM 44th International Conference on Software Engineering: New Ideas and Emerging Results, Aug 2022.
- [ICPR] End-to-End Hierarchical Relation Extraction for Generic Form Understanding. ICPR 2020, Aug 2020.
- [MAPR] PCA-based 3D Facial Reenactment From Single Image. Aug 2020.
- [BMVC] End-to-End Information Extraction by Character-Level Embedding and Multi-Stage Attentional UNet. British Machine Vision Conference (BMVC), Aug 2019.
- Non-local DenseNet for plant CLEF 2019 contest. CEUR-Workshop, Aug 2019.