Shanghai Maritime University Dishui Lake Economics and Management Forum Lecture Series, No. 366: Model-Free Feature Screening via Subsampling: A Unified Framework and Two-Step Strategy for Correlated Covariates
May 20, 2026

I. Time

Tuesday, June 2, 2026, 15:30

II. Format

Lecture Hall 101, School of Economics and Management

III. Speaker

Qihua Wang (王启华), Academy of Mathematics and Systems Science, Chinese Academy of Sciences

IV. Title

Model-Free Feature Screening via Subsampling: A Unified Framework and Two-Step Strategy for Correlated Covariates

V. Abstract

Feature screening is widely used in ultrahigh-dimensional data analysis. However, in the era of big data, classic feature screening methods face two main challenges. First, very large sample sizes impose computational and storage burdens on classic screening methods. Second, covariates are often highly correlated in ultrahigh-dimensional data, in which case most existing screening methods may fail to identify important covariates that are marginally uncorrelated with the response. In this paper, we first develop a general subsampling-based feature screening framework via a sampling-with-replacement scheme. This framework accommodates a wide range of correlation measures, including model-free screening measures. The proposed general method enjoys the sure screening property under mild conditions and is computationally attractive. Furthermore, when strong dependence exists between covariates, we propose a two-step subsampling method based on the proposed general framework. In the first step, uniform sampling is used to select covariates with strong marginal correlation with the response. In the second step, kernel-based non-uniform sampling is designed to recruit important features from the remaining covariates. The two-step method helps recover important covariates that have no marginal correlation with the response. The sure screening property is also established for the two-step method. Simulation studies and an empirical analysis of a news dataset demonstrate the effectiveness of the proposed methods.
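
As a rough illustration of the general idea in the abstract (screening on subsamples drawn with replacement instead of on the full data), the following is a minimal sketch and not the speaker's algorithm. Everything specific in it is an assumption made for illustration: the function name `subsample_screen`, the use of absolute Pearson correlation as the marginal measure, uniform sampling probabilities, averaging over `n_rep` subsamples, and keeping the `top_d` highest-scoring covariates. The kernel-based non-uniform second step is not sketched, since the abstract does not specify it.

```python
import numpy as np


def subsample_screen(X, y, n_sub, n_rep, top_d, seed=0):
    """Rank covariates by a marginal utility averaged over subsamples.

    Illustrative assumptions (not taken from the talk): the utility is the
    absolute Pearson correlation, rows are drawn uniformly with replacement,
    and scores are averaged over n_rep independent subsamples.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(p)
    for _ in range(n_rep):
        # Uniform sampling of row indices, with replacement.
        idx = rng.integers(0, n, size=n_sub)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        denom = np.linalg.norm(Xs, axis=0) * np.linalg.norm(ys)
        # Columns that are constant within a subsample get correlation 0.
        corr = np.divide(Xs.T @ ys, denom, out=np.zeros(p), where=denom > 0)
        scores += np.abs(corr)
    scores /= n_rep
    # Indices of the top_d covariates, highest score first.
    keep = np.argsort(scores)[::-1][:top_d]
    return keep, scores
```

Each pass touches only `n_sub` rows, so the cost per pass is proportional to `n_sub * p` rather than `n * p`, which is the computational appeal the abstract refers to. A covariate that is marginally uncorrelated with the response would score near zero here, which is the failure case the two-step method in the abstract is meant to address.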

VI. About the Speaker

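Qihua Wang is a research professor and doctoral supervisor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, a recipient of the National Science Fund for Distinguished Young Scholars, a Distinguished Professor under the Ministry of Education's Changjiang Scholars Program, and a member of the Chinese Academy of Sciences "Hundred Talents Program". Professor Wang previously taught at Peking University and the University of Hong Kong, and has visited more than ten world-class universities in Canada, the United States, Germany, and Australia. Professor Wang's research focuses on empirical likelihood inference for complex data, missing data analysis, statistical analysis of high-dimensional data, and large-scale data analysis. Professor Wang has published three monographs and more than 150 papers in leading international journals, including the Journal of the Royal Statistical Society Series B (JRSSB), The Annals of Statistics, the Journal of the American Statistical Association (JASA), and Biometrika, and some of this work has had a lasting academic impact. Professor Wang has served as principal investigator on a National Science Fund for Distinguished Young Scholars project, a Key Program project, and several General Program projects, and has participated as a core member in two National Natural Science Foundation of China Innovative Research Group projects and one National Key R&D Program project.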


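  • Lingang Campus: 1550 Haigang Avenue, Pudong New Area, Shanghai (near Hucheng Ring Road and Guzong Road)  Postal code: 201306
  • Gangwan Campus: 2600 Pudong Avenue, Pudong New Area, Shanghai (near Jinqiao Road)  Postal code: 200136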