Title: Doubly Divided Massive Data for Prediction Using Model Aggregation
Speaker: Prof. Yanyan Liu (刘妍岩), Wuhan University
Time: Thursday, May 12, 2022, 14:00-15:00
Tencent Meeting ID: 686-105-874
Host: Assoc. Prof. Xiaoguang Wang (王晓光) Contact phone: 84708351-8213
Abstract: Massive data often feature high dimensionality as well as a huge sample size, so they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of the response, which overcomes the storage limitation of a single machine and the curse of high dimensionality. Specifically, on each local machine, which stores part of the data with a relatively moderate sample size, we develop the model aggregation approach by splitting the predictors, for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed and communication-efficient algorithm that only requires minimizing a shifted and penalized quadratic loss function on the master machine, while each local machine computes the gradient of the loss function and transfers it back to the master. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Theoretically, we establish the prediction error bound of the DGMA method, which can be explicitly expressed in terms of the local sample size and the number of communication rounds. We further show that if the local sample size or the number of communication rounds is sufficiently large, the proposed method can reach the prediction error bound of the oracle global method that has access to the full data. Extensive numerical experiments are carried out on both simulated and real datasets to demonstrate the feasibility of the DGMA method.
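The master/local communication pattern described in the abstract can be illustrated with a minimal sketch. This is not the DGMA algorithm itself: it is a generic gradient-shifted surrogate loss iteration for a quadratic loss, and several details are assumptions made only for illustration. The penalty is taken to be a ridge penalty (the abstract does not specify it), all machines are assumed to hold the same number of samples, the columns of each `X` stand in for predictions from candidate models, and the names `local_gradient`, `distributed_weights`, `lam`, and `rounds` are hypothetical.

```python
import numpy as np

def local_gradient(X, y, w):
    """Gradient of the local quadratic loss (1/2n)||y - Xw||^2 at w."""
    return X.T @ (X @ w - y) / len(y)

def distributed_weights(parts, lam=0.1, rounds=10):
    """Illustrative sketch, not the speaker's method.

    parts: list of (X_k, y_k) pairs, one per machine, with equal sample
    sizes (assumption); parts[0] plays the role of the master's data.
    lam: ridge penalty level (assumed form of the penalty).
    """
    X1, y1 = parts[0]
    n1, p = X1.shape
    H1 = X1.T @ X1 / n1 + lam * np.eye(p)  # penalized Hessian on the master
    b1 = X1.T @ y1 / n1
    w = np.zeros(p)
    for _ in range(rounds):
        # Each machine sends back only a p-dimensional gradient.
        grads = [local_gradient(X, y, w) for X, y in parts]
        # Shift = master's local gradient minus the averaged (global) gradient.
        shift = grads[0] - np.mean(grads, axis=0)
        # Minimizer of the shifted, penalized quadratic loss on the master.
        w = np.linalg.solve(H1, b1 + shift)
    return w
```

Under these assumptions, each round transfers one vector of length p per machine instead of the raw data, and the iterates approach the ridge solution computed from the pooled data when the master's local sample is large enough for its Hessian to approximate the global one.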
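About the speaker: Yanyan Liu is a professor and doctoral supervisor at the School of Mathematics and Statistics, Wuhan University. She received her Ph.D. in science from Wuhan University in 2001. Her main research interests include survival analysis, semiparametric statistical inference, model structure selection for complex high-dimensional data, and statistical analysis techniques for big data. She has made short-term visits to and worked at the University of North Carolina at Chapel Hill (USA), Simon Fraser University (Canada), The Hong Kong Polytechnic University, The Chinese University of Hong Kong, and the University of Greifswald (Germany), among others. She has led and completed six projects funded by the National Natural Science Foundation of China and the Ministry of Education, and currently leads one General Program project of the National Natural Science Foundation of China. The work "Statistical Methods in Risk Models and Related Theory and Applications", which she took part in completing, received the Third Prize of the 2013 Hubei Province Natural Science Award (ranked first). She has published more than sixty SCI-indexed research papers in journals including Journal of Machine Learning Research, Biometrics, Biostatistics, Genetics, and Lifetime Data Analysis. She currently serves as an associate editor of Statistical Papers, a standing member of the 11th Council of the Chinese Association for Applied Statistics, a member of the Working Committee for Women Experts of the Chinese Mathematical Society, and a member of the National Steering Committee for Graduate Education in the Applied Statistics Professional Degree.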