Title: Continuous-time q-learning for mean-field control with common noise
Speaker: Associate Professor Xiaoli Wei (Harbin Institute of Technology)
Time: 10:00–11:00, Friday, August 8, 2025
Venue: Room 114 (Small Lecture Hall), School of Mathematical Sciences
Campus contact: Professor Na Li, Tel: 84708354
Abstract: This talk investigates continuous-time entropy-regularized reinforcement learning (RL) for mean-field control problems with controlled common noise. We study the continuous-time counterpart of the Q-function in the mean-field model, coined the q-function in the single-agent model. It is shown that the controlled common noise gives rise to a double integral term in the exploratory dynamic programming equation, which makes the policy improvement iteration intricate. The policy improvement at each iteration can be characterized by a first-order condition using the notion of the partial linear derivative with respect to the policy. To devise model-free RL algorithms, we introduce the integrated q-function (Iq-function) on the distributions of both state and action, and an optimal policy can be identified as a two-layer fixed point of the soft argmax operator of the Iq-function. The martingale characterization of the value function and the Iq-function is established by exhausting all test policies. This allows us to propose several algorithms, including an Actor-Critic q-learning algorithm in which the policy is updated in the Actor step according to the policy improvement rule induced by the partial linear derivative of the Iq-function, while the value function and the Iq-function are updated simultaneously in the Critic step based on the martingale orthogonality condition. In two examples, one within and one beyond the LQ-control framework, we implement and compare our algorithms with satisfactory performance.
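For orientation, the display below is a minimal LaTeX sketch of two objects behind such algorithms, written in the spirit of the single-agent continuous-time q-learning framework (Jia and Zhou) referenced in the abstract: the Gibbs-form soft argmax policy induced by a q-function, and a martingale orthogonality condition of the kind used in the Critic step. The symbols γ (temperature), μ_t (conditional law of the state given the common noise), and r (running reward) are illustrative assumptions, not the talk's exact definitions; the talk's Iq-function on joint state-action distributions is not reproduced here.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% (i) Soft argmax (Gibbs) policy induced by a q-function: the policy
% improvement step selects, at each (t, x, mu), the Gibbs measure with
% temperature gamma over the action space A. All symbols are illustrative.
\begin{equation}
\pi^{*}(a \mid t, x, \mu)
  = \frac{\exp\{q(t, x, a, \mu)/\gamma\}}
         {\int_{\mathcal{A}} \exp\{q(t, x, a', \mu)/\gamma\}\,\mathrm{d}a'},
  \qquad \gamma > 0.
\end{equation}

% (ii) Martingale orthogonality condition (Critic step): for every bounded
% adapted test process xi, the estimates (J, q) of the value function and
% q-function must satisfy (finite horizon, no discounting assumed here)
\begin{equation}
\mathbb{E}\!\left[\int_{0}^{T} \xi_{t}
  \Bigl(\mathrm{d}J(t, X_{t}, \mu_{t})
        + \bigl(r(t, X_{t}, a_{t}, \mu_{t})
        - q(t, X_{t}, a_{t}, \mu_{t})\bigr)\,\mathrm{d}t\Bigr)\right] = 0.
\end{equation}

\end{document}
```

In the mean-field setting of the talk, the role of q above is played by the Iq-function, and the optimal policy arises as a two-layer fixed point of the soft argmax operator rather than the single-layer Gibbs map shown in (i).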
About the speaker: Xiaoli Wei is an Associate Professor at the Institute for Advanced Study in Mathematics, Harbin Institute of Technology. She received her bachelor's degree from the University of Science and Technology of China, her master's degree from Université Paris-Dauphine (Paris IX), and her Ph.D. from Université Paris Diderot (Paris VII, now Université Paris Cité). From 2019 to 2021 she was a postdoctoral researcher at the University of California, Berkeley, and from 2021 to 2023 she was an Assistant Professor at the Tsinghua Shenzhen International Graduate School, Tsinghua University. Her research focuses on stochastic differential games, mean-field theory, and reinforcement learning. Her papers have appeared in journals such as Operations Research, Mathematical Finance, and SIAM Journal on Control and Optimization.