Hi 👋
I’m a master’s student in Computer Science at Stanford, advised by the amazing Chris Ré in the Hazy Research lab and supported by the Kwanjeong Scholarship. I am also a part-time Research Scientist at Cursor.
My research focuses on ML systems. I design hardware-aware abstractions and methods to accelerate training and inference of large language models. Some of my work includes megakernels, efficient MXFP8 training, GPU networking, and ThunderKittens.
Prior to Stanford and Cursor, I co-founded Blux, an AI startup specializing in empowering Korean e-commerce through AI-driven personalization. Blux raised $3M+ and is personalizing over 10 million Korean users’ online journeys monthly.
In addition, I am an electric guitarist and composer. I have released two albums and performed as lead guitarist and producer for multiple rock bands since 2008.
Experience
Research Scientist @ Cursor
June 2025 - Present
Building in-house models and optimizing AI kernels for large-scale training and inference. Check out my recent blog post on MXFP8 MoE kernels.
Research Assistant @ Stanford AI Lab
December 2024 - Present
Advised by Prof. Chris Ré at Hazy Research. Working on ThunderKittens, low-latency and high-throughput megakernels, and GPU networking.
Co-Founder and CTO @ Blux
July 2021 - August 2023
Blux provides real-time recommender systems for e-commerce. Blux raised $3M+ and is personalizing over 10 million Korean users’ online journeys monthly.
I was the only technical co-founder until we acquired our first paying customer. I built everything (server, infrastructure, ML) from scratch. Over time, I recruited and led a team of 15+ engineers.
Research Assistant @ Seoul National University
June 2020 - September 2020
Worked with Prof. Jae W. Lee at the Architecture and Code Optimization Lab to design and implement a novel embedding clustering algorithm in C++, reducing main memory access by up to 44% in commercial deep learning recommendation models (DLRMs). This research resulted in a paper accepted at ASPLOS 2021.
Research Assistant @ Seoul National University
December 2019 - February 2020
Worked with Prof. Kyogu Lee at the Music and Audio Research Group on an audio processing architecture for commercial singing voice synthesis, using CNNs, cGANs, super-resolution, and the Griffin-Lim algorithm. Also performed millisecond-precision labeling of audio and MIDI data for 30+ K-pop songs using Logic Pro and Python.
Sergeant, Combat Medic @ US Army
November 2017 - August 2019
Served as a Combat Medic (68W) in a U.S. Army cavalry unit for 21 months through the Korean Augmentation to the United States Army (KATUSA) program, fulfilling South Korea’s mandatory military service requirement.
Projects
Megakernels
Runs FlashMLA, Llama 3 1B, and Llama 3 70B in a single fused megakernel. 500+ stars on GitHub.
ThunderKittens
Helps you write speedy GPU kernels for AI. 2.7k+ stars on GitHub and adopted by Cursor, Together AI, Jump Trading, Modular, TileLang, and Nvidia CuTe 4.0.
MERCI
Fast embedding reduction algorithm for deep learning recommendation models (DLRMs) and other systems with very large embedding tables.
ELF32 Dynamic Linker for Raspberry Pi
ELF dynamic linker on bare metal, allowing you to port shared libraries.
Taught as a lab at Stanford University: CS 240LX: ELF and Dynamic Linker.
Co-Chuck
Collaborative online environment that allows you to “code” your music.
SampyoNet
Deep learning model, inference engine, and mobile app combined for gravel quality assessment. Provided to Sampyo for production concrete manufacturing.
LLVM Compiler Optimization
Achieved 2nd place among 13 teams in an LLVM optimization competition at Seoul National University.
Writing
We Bought the Whole GPU, So We’re Damn Well Going to Use the Whole GPU
Hazy Research
How Many Llamas Can Dance in the Span of a Kernel?
Hazy Research
One Kernel for All Your GPUs
Hazy Research
1.5x Faster MoE Training with Custom MXFP8 Kernels Built From Scratch
Cursor
Look Ma, No Bubbles! Designing a Low-Latency Megakernel for Llama-1B
Hazy Research
Education
Stanford University
M.S. in Computer Science
- Recipient of the merit-based Kwanjeong Scholarship ($60,000)
Seoul National University
B.S. in Computer Science and Engineering
- GPA: 4.21/4.3 (class rank: 1st)
- Recipient of the National Science and Engineering Scholarship for Gifted Students (full scholarship)
- Member of the College of Engineering Honor Society