Curriculum Vitae

Collin Zoeller

Pre-doctoral RA, CMU Tepper · Incoming MSEC, Duke University · Fall 2026

Financial economist working at the intersection of machine learning, quantum computing, and empirical research. Uses IRS administrative tax data to study firms, labor markets, and entrepreneurship.

Education

Carnegie Mellon University (non-degree)

Graduate Coursework

  • Intro to Machine Learning (10-601, Graduate)
  • Quantum Integer Programming and Machine Learning (Graduate)

Brigham Young University

Bachelor of Science — Economics (Emphasis in Econometrics)

Minors: Mathematics, Spanish  ·  GPA: 3.71  ·  Honors Program

Relevant Coursework

  • Advanced Econometrics, Advanced Macroeconomics, Advanced Linear Algebra
  • Machine Learning for Economists

Research Experience

Tepper School of Business, Carnegie Mellon University

Pre-doctoral Research Associate

  • Lead IRS administrative data project (~1.5B observations) involving high-dimensional econometrics
  • Build pipelines for data engineering: SQL data lake, file storage, project workflow (SQL, Linux, Stata)
  • Develop ANN/NLP methods for extracting structure from 1B+ open-text fields (Python)
  • Provide technical training and assistance to 3 other RAs in Linux, Python, Stata, and computational optimization

Harvard Business School

Computer Vision Programmer & Research Assistant for HBS Ph.D. Candidate

  • Constructed a custom image segmentation, OCR, and text-recognition pipeline for a dissertation using 25k+ images
  • Extracted and structured data from 1M+ data points for downstream economic and ML analysis
  • Provided econometric design support and generalized OCR tools for new research applications

Brigham Young University Record Linking Lab

Research Assistant · Computer Vision Team Lead

  • Managed 6+ CV projects for FamilySearch and historical data partners (Python, AWS, CV/DL)
  • Delivered key insights and research to support genealogical data transcription using CV techniques

Working Papers & Research Projects

GAMA-IV: Quantum Annealing for Globally Optimal Instrument Selection

Duke MSEC Thesis · Working Paper

  • Introduces a quantum-classical hybrid methodology for instrument selection in IV estimation
  • Applies GAMA framework to the disaggregated Bartik setting of Goldsmith-Pinkham et al. (2020)
  • Casts GPS information criterion as an integer program solved to global optimality on D-Wave hardware

GAMA Quantum Optimizations for High-Dimensional Least Squares

  • Studies conditions under which the GAMA hybrid-quantum optimization solves the OLS problem
  • Examines the theory and implements it on D-Wave simulated annealers

Artificial Intelligence in Economic Research: Applications, Promises, Pitfalls

  • Meta-analysis examining how applications of AI can harm and help researchers
  • Produced slides and paper for future presentation
  • Intended for general use and non-specialized researchers in the social sciences

Efficient Large-Scale Text Classification with ANN

  • Develops a scalable pipeline for clustering large text corpora using ANNOY + SLM
  • Demonstrates improved computational efficiency for feature engineering over standard clustering
  • Authored a technical paper and an open-source implementation, available on GitHub

The Big Book of Estimators

  • Conspectus of estimators, assumptions, and use cases
  • Intended for non-specialized researchers as a non-textbook reference for model selection

Contributions: RA Work

Evolution of the Relationship between the Gig Economy and Entrepreneurship: The Heterogeneous Effects of Labor Market Disruptions

Research Associate · Denes, Lagaras, and Tsoutsoura

  • Estimated difference-in-differences (DiD) models using IRS taxpayer microdata to study entrepreneurial responses to labor disruptions
  • Built ANN text clustering pipeline for 1B+ deduction fields
  • Constructed a framework to parallelize Stata (Python/Stata; precursor to StataHelper)
  • Provided data engineering, replication, and publication support

Entrepreneurship and the Gig Economy: Evidence from U.S. Tax Returns

Journal of Financial Economics · Research Associate · Denes, Lagaras, and Tsoutsoura

  • Estimated DiD models using IRS microdata to identify entrepreneurial transitions among platform workers
  • Cleaned and structured datasets, maintained empirical codebase for journal submission

First Come, First Served: The Timing of Government Support and Its Impact on Firms

Research Associate · Denes, Lagaras, and Tsoutsoura

  • Studies small businesses receiving Paycheck Protection Program (PPP) loans during the COVID-19 pandemic
  • Estimated DiD effects of PPP timing on firm financial health using credit bureau data
  • Handled data cleaning, workflows, and publication preparation

Take-up of Flexible Labor

Research Associate · Denes, Lagaras, and Tsoutsoura

  • Studies employment of flexible labor through tax deduction claims
  • Trained an RNN text classifier on 1B+ taxpayer-entered deductions

Computational Tools & Research Software

'StataAgent' — Agentic AI for Natural Language Stata Interfacing (in progress)

Author

  • Manipulates and analyzes data within Stata in response to user-entered questions
  • Useful for simple tasks (summary statistics, regressions) when correct syntax is unclear

'Explain' — Stata Package for LLM Interfacing

Author  ·  Open-source on GitHub

  • Interfaces Stata with LLMs for code debugging, explanation, and error analysis
  • Useful for learning Stata and for commenting and digesting code

'StataHelper' — Python Package for PyStata Parallelization

Author  ·  Open-source on PyPI & GitHub

  • Systematically parallelizes variable Stata processes (simulations, tests, etc.)
  • Wraps the PyStata API and bridges syntax for simpler implementation

Presentations & Conferences

Family History Technology Workshop

City Directories Automated Indexing

  • Presented custom OCR/NLP pipeline extracting data from city directories for economic research

BYU President's Leadership Council

Growing Together: Growing the Tree

  • Presented research and methodological implementations of computer vision in genealogical research

Teaching Experience

Brigham Young University Economics Department

Econ 110 Teaching Assistant

  • Taught 500+ students in introductory macroeconomics and microeconomics
  • Hosted 15 lab hours each week and a 1-hour lecture review for exam preparation
  • Crafted test questions and graded all assignments for the instructor

Technical Skills

Languages & Platforms
Python · Stata · R · MATLAB · SQL · Linux · LaTeX · Git / GitHub
Cloud & Infrastructure
AWS · Google SDK · ArcGIS
Quantum Computing
Quantum ML · Quantum DL · Quantum Annealing · Qiskit · D-Wave