Title: Continuous-time q-learning for mean-field control with common noise
Speaker: Associate Professor Xiaoli Wei (Harbin Institute of Technology)
Time: 10:00–11:00, Friday, August 8, 2025
Venue: Room 114 (Small Lecture Hall), School of Mathematical Sciences
Campus contact: Professor Na Li, Tel: 84708354
Abstract: This talk investigates continuous-time entropy-regularized reinforcement learning (RL) for mean-field control problems with controlled common noise. We study the continuous-time counterpart of the Q-function in the mean-field model, coined the q-function in the single-agent model. It is shown that the controlled common noise gives rise to a double integral term in the exploratory dynamic programming equation, which makes the policy improvement iteration intricate. The policy improvement at each iteration can be characterized by a first-order condition using the notion of the partial linear derivative with respect to the policy. To devise model-free RL algorithms, we introduce the integrated q-function (Iq-function) on the distributions of both state and action, and an optimal policy can be identified as a two-layer fixed point of the soft argmax operator of the Iq-function. The martingale characterization of the value function and the Iq-function is established by exhausting all test policies. This allows us to propose several algorithms, including an Actor-Critic q-learning algorithm in which the policy is updated in the Actor step according to the policy improvement rule induced by the partial linear derivative of the Iq-function, while the value function and the Iq-function are updated simultaneously in the Critic step based on the martingale orthogonality condition. In two examples, one within and one beyond the LQ-control framework, we implement and compare our algorithms with satisfactory performance.
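For orientation, the display below is a minimal LaTeX sketch of two objects behind such algorithms, written in the spirit of the single-agent continuous-time q-learning framework (Jia and Zhou) referenced in the abstract: the Gibbs-form soft argmax policy induced by a q-function, and a martingale orthogonality condition of the kind used in the Critic step. The symbols γ (temperature), μ_t (conditional law of the state given the common noise), and r (running reward) are illustrative assumptions, not the talk's exact definitions; the talk's Iq-function on joint state-action distributions is not reproduced here.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% (i) Soft argmax (Gibbs) policy induced by a q-function: the policy
% improvement step selects, at each (t, x, mu), the Gibbs measure with
% temperature gamma over the action space A. All symbols are illustrative.
\begin{equation}
\pi^{*}(a \mid t, x, \mu)
  = \frac{\exp\{q(t, x, a, \mu)/\gamma\}}
         {\int_{\mathcal{A}} \exp\{q(t, x, a', \mu)/\gamma\}\,\mathrm{d}a'},
  \qquad \gamma > 0.
\end{equation}

% (ii) Martingale orthogonality condition (Critic step): for every bounded
% adapted test process xi, the estimates (J, q) of the value function and
% q-function must satisfy (finite horizon, no discounting assumed here)
\begin{equation}
\mathbb{E}\!\left[\int_{0}^{T} \xi_{t}
  \Bigl(\mathrm{d}J(t, X_{t}, \mu_{t})
        + \bigl(r(t, X_{t}, a_{t}, \mu_{t})
        - q(t, X_{t}, a_{t}, \mu_{t})\bigr)\,\mathrm{d}t\Bigr)\right] = 0.
\end{equation}

\end{document}
```

In the mean-field setting of the talk, the role of q above is played by the Iq-function, and the optimal policy arises as a two-layer fixed point of the soft argmax operator rather than the single-layer Gibbs map shown in (i).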
About the speaker: Xiaoli Wei is an Associate Professor at the Institute for Advanced Study in Mathematics, Harbin Institute of Technology. She received her bachelor's degree from the University of Science and Technology of China, her master's degree from Université Paris-Dauphine (Paris IX), and her Ph.D. from Université Paris Diderot (Paris VII, now Université Paris Cité). From 2019 to 2021 she was a postdoctoral researcher at the University of California, Berkeley, and from 2021 to 2023 she was an Assistant Professor at the Tsinghua Shenzhen International Graduate School, Tsinghua University. Her research focuses on stochastic differential games, mean-field theory, and reinforcement learning. Her papers have appeared in journals such as Operations Research, Mathematical Finance, and SIAM Journal on Control and Optimization.