Resume

Academic and industry experience in program synthesis, ML, and software engineering.

General Information

Name Dat Nguyen (Marc)
Email (work) dat@basis.ai, datnguyen@seas.harvard.edu
Email (personal) datnt.hust59@gmail.com
Location Cambridge, MA

Summary

  • Postdoctoral Fellow at Harvard University and incoming Postdoctoral Scientist at the Basis Research Institute, focusing on LLM-based program synthesis.
  • PhD research on mining patterns in images, graphs, and code to analyze and improve graph neural networks and transformer-based language models.
  • Engineering experience in C++, Java, and Python, and with libraries for computer vision, graphics, code analysis, and graph mining.

Work Experience

  • Jan 2025 – Present
    Joint Postdoc – LLM and Program Synthesis
    Harvard University & Basis Research Institute
    • Truth‑Maintaining Agent: LLM exploration with knowledge graphs and causal rules.
    • CASP: causal abstractions for proof search in Lean.
    • Effectful: effect‑handler framework for LLM inference.
    • MARA: agents that build and maintain world‑model code.
    • R‑ADA: probabilistic inference for robot morphology sampling.
  • Feb 2017 – Feb 2021
    Senior Research Engineer
    Cinnamon AI Lab
    • Segmentation-based information extraction from visually rich documents (MSAU, MSAU-PAF).
    • Deployed projects using Docker, Flask, and FastAPI.
    • Solution architect for client projects (Sony, Toyota, Prudential).

Education

  • Sep 2021 – Nov 2024
    PhD in Computer Science
    University of Melbourne, Melbourne, Australia
    • Advisors: Dr. Bach Le, Dr. Patanamon Thongtanunam.
    • Projects: VRDSynth, GraphCodeAttack, GNNInfer, FFL.
    • Skills: Bash, C++, Java, Python, PyTorch, Scikit-learn, ONNX, OpenCV, Z3, Gurobi.

Selected Projects

  • VRDSynth: DSL and program synthesis for visually rich document extraction.
  • GraphCodeAttack: mining AST patterns to attack code LMs.
  • GNNInfer: GNN-to-FNN conversion and constraint inference.
  • FFL: fault localization on graph-based code representations.

Side Projects

  • TensorFlowC: interface to load trained TensorFlow models in C for embedded devices.
  • Origami: graph mining maintenance and Python bindings.
  • LLVM dataflow instrumentation: LLVM pass for dataflow extraction.
  • OpenGL tutorials: lighting, shaders, and rendering basics.
  • 3D morphable model face swapping app (C++, Dlib, Eigen, 3DDFA).
  • Algorithms in C/C++ for competitive programming.

Skills

  • Languages
    • C++
    • Java
    • Python
    • Bash
  • ML/DS
    • PyTorch
    • Scikit-learn
    • ONNX
  • CV/Graphics
    • OpenCV
    • OpenGL
  • Systems/Tools
    • LLVM
    • CMake
    • Docker
    • Flask
    • FastAPI
  • Formal/Optimization
    • Z3
    • Gurobi

Publications (Selected)

  • Selected publications are listed on the Publications page: /publications/.