Bio
I am a PhD student in the Department of Automation, Tsinghua University, advised by Prof. Qianchuan Zhao, where I started my PhD study in 2022. Before that, I received my Bachelor’s degree from Nanjing University in 2022. I am also fortunate to collaborate with Prof. Chongjie Zhang.
My research interests lie in reinforcement learning (RL), with a focus on exploration, self-play, LLM reasoning, and agentic RL. I am currently a research intern at ByteDance Seed, working on RL scaling for LLM reasoning and agents with Yufeng Yuan, Yu Yue, and Lin Yan.
Publications
Technical Reports
Papers
- OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration (ICLR 2026)
- NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning (NeurIPS 2024)
- Maximum Next-State Entropy for Efficient Reinforcement Learning (RA-L 2025)