Jiang Yuhua

Hi there! I am Yuhua Jiang, a PhD student in the Department of Automation, Tsinghua University, advised by Prof. Qianchuan Zhao. I am also fortunate to collaborate with Prof. Chongjie Zhang.

My research interests lie in reinforcement learning (RL), with a focus on exploration, self-play, LLM reasoning, and agentic RL. I am currently a research intern at ByteDance Seed, working on LLM reasoning and agentic RL with Yufeng Yuan, Yu Yue, and Lin Yan.

Selected Publications

  1. Tech Report
    Seed2.0: A Large-Scale Production-Ready Foundation Model Series
    ByteDance Seed
    2026
  2. Tech Report
    Seed1.8 Model Card: Towards Generalized Real-World Agency
    ByteDance Seed
    2026
  3. ICLR 2026
    Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
    Yuhua Jiang, J. Huang, Y. Yuan, X. Mao, Y. Yue, Qianchuan Zhao, and L. Yan
    In ICLR 2026
  4. MATH-AI’25
    PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
    Yuhua Jiang, Y. Xiong, Y. Yuan, C. Xin, W. Xu, Y. Yue, Qianchuan Zhao, and L. Yan
    In NeurIPS 2025 MATH-AI Workshop
  5. Tech Report
    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
    ByteDance Seed
    2025
  6. ICLR 2025
    Episodic Novelty Through Temporal Distance
    Yuhua Jiang*, Qihan Liu*, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, and Qianchuan Zhao
    In ICLR 2025
    Oral @ IMOL@NeurIPS 2024 (3/47, Top 7%)
  7. AAAI 2024
    Learning Diverse Risk Preferences in Population-based Self-play
    Yuhua Jiang*, Qihan Liu*, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, and Qianchuan Zhao
    In AAAI 2024
    Oral Presentation (Top 3%)