Yuhua Jiang

PhD Student
Tsinghua University

Email: jiangyh22@mails.tsinghua.edu.cn

Bio

I am a PhD student in the Department of Automation at Tsinghua University, advised by Prof. Qianchuan Zhao. I began my PhD in 2022, after receiving my Bachelor’s degree from Nanjing University the same year. I am also fortunate to collaborate with Prof. Chongjie Zhang.

My research interests lie in reinforcement learning (RL), with a focus on exploration, self-play, LLM reasoning, and agentic RL. I am currently a research intern at ByteDance Seed, working on RL scaling for LLM reasoning and agents with Yufeng Yuan, Yu Yue, and Lin Yan.

Publications

Technical Reports

  1. Seed2.1: A Next-Generation Agent for Real-World Productivity
    ByteDance Seed
    Tech Report
  2. Seed2.0: A Large-Scale Production-Ready Foundation Model Series
    ByteDance Seed
    Tech Report
  3. Seed1.8 Model Card: Towards Generalized Real-World Agency
    ByteDance Seed
    Tech Report
  4. Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
    ByteDance Seed
    Tech Report

Papers

  1. GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent Systems
    Yiqin Yang, Xinyu Yang, Yuhua Jiang, N. Mu, Hao Hu, R. Xie, Z. Zhang, S. Li, Y. Ni, Qianchuan Zhao, and others
    ICLR 2026
  2. OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
    Yiqin Yang, Hao Hu, Y. Mao, J. Zhang, C. Wu, Yuhua Jiang, Xinyu Yang, R. Xie, Y. Fan, and others
    ICLR 2026
  3. Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
    Yuhua Jiang, J. Huang, Y. Yuan, X. Mao, Y. Yue, Qianchuan Zhao, and L. Yan
    ICLR 2026
  4. DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
    X. Li, Y. Ding, Yuhua Jiang, Y. Zhao, R. Xie, S. Xu, Y. Ni, Yiqin Yang, and Bo Xu
    CogSci 2025
  5. PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
    Yuhua Jiang, Y. Xiong, Y. Yuan, C. Xin, W. Xu, Y. Yue, Qianchuan Zhao, and L. Yan
    MATH-AI’25
  6. Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
    Yiqin Yang, Q. Wang, Chenghao Li, Hao Hu, C. Wu, Yuhua Jiang, Dianyu Zhong, Z. Zhang, Qianchuan Zhao, and others
    ICLR 2025
  7. Episodic Novelty Through Temporal Distance
    Yuhua Jiang*, Qihan Liu*, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, and Qianchuan Zhao
    ICLR 2025
    Oral @ IMOL@NeurIPS 2024 (3/47, Top 7%)
  8. NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning
    C. Xue, Qihan Liu, Xiaoteng Ma, X. Qin, Yuhua Jiang, G. Ning, Y. Qi, J. Ren, Bin Liang, and Jun Yang
    NeurIPS 2024
  9. Learning Diverse Risk Preferences in Population-based Self-play
    Yuhua Jiang*, Qihan Liu*, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, and Qianchuan Zhao
    AAAI 2024
    Oral Presentation (Top 3%)
  10. Maximum Next-State Entropy for Efficient Reinforcement Learning
    Dianyu Zhong, Yiqin Yang, Z. Zhang, Yuhua Jiang, Bo Xu, and Qianchuan Zhao
    RA-L 2025