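Updated through October 2022; for the latest information, visit http://nkcs.iops.ai/shenglinzhang/
Shenglin Zhang (张圣林)
Email: zhangsl at nankai.edu.cn
Title: Associate Professor; doctoral and master's supervisor
Degree: Ph.D.
Research interests: AIOps (intelligent IT operations), service management, machine learning
Homepage: http://nkcs.iops.ai/shenglinzhang/
Biography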
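Shenglin Zhang is an associate professor at the College of Software, Nankai University, and a jointly appointed researcher at the Haihe Laboratory of Advanced Computing and Critical Software (Xinchuang). His main research area is machine-learning-based AIOps, including anomaly detection, failure localization, root cause analysis, and failure prediction. His honors include the ISSRE 2018 Best Research Paper Award, the Tsinghua University Outstanding Doctoral Dissertation Award, the Tianjin Science and Technology Progress Award, First Prize (ranked 11th), Nankai University's 9th “Good Mentor and Helpful Friend” title, and the “Best Technical Cooperation Professor” award from Huawei's Computing Product Line; he was also selected for the third tier of Tianjin's “131” Innovative Talent Training Project. He has published more than 40 papers at international conferences including ATC, WWW, VLDB, SIGMETRICS, CoNEXT, INFOCOM, and IJCAI, and in international journals including JSAC, TC, TSC, TNSM, and JSS. He has led two National Natural Science Foundation of China (NSFC) projects, one China Postdoctoral Science Foundation project, and eight industry-funded projects (with Huawei, ByteDance, MYbank, ZTE, and others), and has participated in two National Key R&D Program projects and more than ten industry-funded projects (with Alibaba, Tencent, Kuaishou, China Mobile Research Institute, China Construction Bank, Baidu, WeBank, Sogou, Huya, and others). He received his Ph.D. in engineering from Tsinghua University in 2017 (Computer Science and Technology) and his B.Eng. from Xidian University in 2012 (Network Engineering, School of Computer Science). During his doctoral studies he visited the Georgia Institute of Technology. He interned in Baidu's Operations Department from 2014 to 2017 and was a visiting scholar at Alibaba from 2018 to 2019. He serves as an executive member of the CCF Technical Committees on the Internet, Software Engineering, and Services Computing, an AC member of YOCSEF Tianjin, a program committee member of WWW 2022, ICNP 2022, IWQoS 2022, and ISSRE 2019/2020/2021/2022, and a reviewer for journals including TON, JSAC, TDSC, IoTJ, ASUR, TNSM, and JCST. From 2017 to 2018 he was a core member of the team that prepared the first AIOps Challenge, and he organized the first AIOps Workshop.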
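The research group collaborates with leading Chinese IT companies and organizations, including Alibaba, Tencent, Huawei, ByteDance, ZTE, Baidu, Huya, Yunzhanghu (云账户), and CERNET. It analyzes application-level and machine-level data from Internet services to solve problems that affect user experience. The group welcomes outstanding master's and undergraduate students to join in tackling world-class problems and improving the experience of millions of users!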
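Education
2012.9-2017.7, Tsinghua University, Ph.D. in Engineering, Computer Science and Technology (advisors: Ying Liu and Dan Pei)
2016.1-2016.5, Georgia Institute of Technology (Georgia Tech), USA, visiting scholar (advisor: Prof. Jun (Jim) Xu)
2008.9-2012.7, Xidian University, B.Eng., Network Engineering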
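Research Projects, Achievements, Awards, and Patents
Research projects:
Research on Fault Diagnosis Mechanisms for Large-Scale Cloud Platforms Based on Multimodal Data, National Natural Science Foundation of China (NSFC) General Program, 2023.1-2026.12, principal investigator
Research on Anomaly Detection Mechanisms for Datacenter Network Devices Based on Logs with Multiple Syntaxes and Semantics, NSFC Young Scientists Fund, 2020.1-2022.12, principal investigator
Research on Log-Based Anomaly Detection Mechanisms for Datacenter Network Devices, China Postdoctoral Science Foundation General Program, 2019.6-2021.5, principal investigator
Research on Cluster Communication Fault Diagnosis Techniques, Huawei collaboration project, 2021.11-2022.11, principal investigator
Research on Graph-Reasoning-Based Fault Localization Techniques for Distributed Systems, MYbank collaboration project, 2021.11-2022.11, principal investigator
Intelligent Anomaly Detection for Datacenter Network Devices, ZTE collaboration project, 2021.9-2022.9, principal investigator
OS Fault Diagnosis, Huawei collaboration project, 2020.8-2021.8, principal investigator
Intelligent Change Assessment Technology Cooperation, Huawei collaboration project, 2020.4-2021.4, principal investigator
Unsupervised Machine Clustering and Multi-KPI Anomaly Detection Models for Machine-Level Anomalies, ByteDance collaboration project, 2019.7-2020.6, principal investigator
Research on Failure Prediction Mechanisms for Next-Generation Internet Switches, CERNET Corporation Next-Generation Internet Technology Innovation Project, 2018.12-2019.12, principal investigator
Research on Log-Based Failure Prediction Mechanisms for Datacenter Switches, Fundamental Research Funds for the Central Universities, 2018.1-2019.12, principal investigator
AI Operations Joint Innovation Technology Project, Baidu, 2017.10-2022.10, principal investigator
Research on a Performance Management Architecture for Internet Services Based on Big Data Analysis, NSFC General Program, 2015.1-2018.12, participant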
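Student competitions and projects supervised:
文雨晨, 朱博林, 贾雪莹, 张嘉诚, 王家驹. Microservice Trace Anomaly Detection Based on Deep Learning. 2021 Tianjin Undergraduate Innovation and Entrepreneurship Training Program.
郭洲蕊, 刘炼, 常阔. Second prize, Tianjin Division, 2019 China Undergraduate Mathematical Contest in Modeling.
Chang Liu, Siyuan Teng, Yao Wang, Finalist, 2019 Interdisciplinary Contest in Modeling (ICM)
Chenyu Zhao, Can Wang, Jincheng Zhang, Honorable Mention, 2019 Interdisciplinary Contest in Modeling (ICM)
Congzheng Chen, Xu Chen, Feixiang Li, Honorable Mention, 2019 Interdisciplinary Contest in Modeling (ICM)
黄翰林, 李浩哲, 胡智龙, 滕思远, 张心怡. Service Metric Anomaly Detection Based on Semi-Supervised Learning. 2019 Tianjin Undergraduate Innovation and Entrepreneurship Training Program.
Quan Ding, Pengbo Yan, Dadi Peng, Honorable Mention, 2018 Interdisciplinary Contest in Modeling (ICM)
陈戌, 鲍阿勇, 陈彬, 钟震宇, 柳郁青. Log Processing and Switch Failure Prediction Based on Machine Learning. 2018 Tianjin Undergraduate Innovation and Entrepreneurship Training Program.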
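Personal awards:
2022 Huawei “Best Technical Cooperation Professor”
2021 Tianjin Science and Technology Progress Award, First Prize (ranked 11th)
2021 Nankai University 9th “Good Mentor and Helpful Friend” title
2018 Tianjin “131” Innovative Talent Training Project, third tier
2018 Best Research Paper Award, IEEE ISSRE
2017 Tsinghua University Outstanding Doctoral Dissertation Award, Second Prize
2016 Friends of Tsinghua - Sohu R&D Scholarship
2015 Tsinghua University Comprehensive Excellence Scholarship (China Aerospace Science and Technology Corporation (CASC) Scholarship)
2014 Guanghua Scholarship, First Class
2011 National Scholarship
2011 U.S. Mathematical Contest in Modeling (MCM/ICM), Second Prize
2010 National Scholarship
2010 China Undergraduate Mathematical Contest in Modeling, First Prize in Shaanxi Province
2009 National Encouragement Scholarship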
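Invited talks:
Failure Detection, Diagnosis, and Prediction for Large-Scale Cloud Services, Asia Pacific Advanced Network, 2022.3.9, online
Failure Detection, Diagnosis, and Prediction for Large-Scale Cloud Services, MS-AIOps workshop (co-located with ISSRE 2021), 2021.10.28, Wuhan
Intelligent Failure Prediction, Diagnosis, and Tracing for Datacenters, 2021 CCF Young Elite Conference (CCF青年精英大会), 2021.5.14, Shenyang
PreFix: Switch Failure Prediction in Datacenter Networks, 7th China Internet Academic Annual Conference, Outstanding Young Scholars Forum, 2018.9.9, Enshi
“Research Problems in Intelligent Network Operations”, Huawei Network World (网络天下) Data Center Technology Forum, 2018.6.7, Nanjing
“Research Problems in AIOps”, Storage Alliance technical salon on “Intelligent Storage and Intelligent Operations”, 2018.4.26, Beijing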
Papers, Monographs, and Textbooks
Publications:
2022
Yiran Cheng, Bo Cheng, Pengxiang Jin, Yongqian Sun*, Xiaohui Nie, Nengwen Zhao, Shenglin Zhang, Dan Pei. Effective Attribute Selection for Multi-dimensional Root Cause Analysis. IEEE International Symposium on Software Reliability Engineering (ISSRE), Charlotte, North Carolina, USA, October 31 - November 3, 2022 (CCF B).
Xuanrun Wang, Kanglin Yin, Qianyu Ouyang, Xidao Wen, Shenglin Zhang, Wenchi Zhang, Li Cao, Jiuxue Han, Xing Jin, Dan Pei. Identifying Erroneous Software Changes through Self-Supervised Contrastive Learning on Time Series Data. IEEE International Symposium on Software Reliability Engineering (ISSRE), Charlotte, North Carolina, USA, October 31 - November 3, 2022 (CCF B).
Shenglin Zhang, Zhenyu Zhong, Dongwen Li, Qiliang Fan, Yongqian Sun*, Man Zhu, Yuzhi Zhang, Dan Pei, Jiyan Sun, Yinlong Liu, Hui Yang, Yongqiang Zou. Efficient KPI Anomaly Detection Through Transfer Learning for Large-Scale Web Services. IEEE Journal on Selected Areas in Communications (JSAC), Accepted (CCF A; SCI, CAS Tier 1; Impact Factor: 9.144).
Yongqian Sun, Kunlin Jian, Liyue Cui, Guifei Jiang, Shenglin Zhang*, Yuzhi Zhang, Dan Pei. Online Malicious Domain Name Detection with Partial Labels for Large-Scale Dependable Systems. The Journal of Systems & Software, 190: 1-12, 2022 (CCF B; SCI, CAS Tier 2; Impact Factor: 2.829).
Yongqian Sun, Daguo Cheng, Pengxiang Jin, Quan Ding, Shenglin Zhang*, Xu Chen, Yuzhi Zhang, Minghan Liang, Dan Pei, Jianyan Zheng, Sen Luo, Xinyu Tang. Robust Anomaly Clue Localization of Multi-dimensional Derived Measure for Online Video Services. IEEE Transactions on Services Computing, Accepted (CCF B; SCI, CAS Tier 1; Impact Factor: 8.216).
Xianglin Lu, Zhe Xie, Zeyan Li, Mingjie Li, Xiaohui Nie, Nengwen Zhao, Qingyang Yu, Shenglin Zhang, Kaixin Sui, Lin Zhu and Dan Pei. Generic and Robust Performance Diagnosis via Causal Inference for OLTP Database systems. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 16-19, 2022 (CCF C).
Shenglin Zhang, Dongwen Li, Zhenyu Zhong, Jun Zhu, Minghan Liang, Jiexi Luo, Yongqian Sun*, Ya Su, Sibo Xia, Zhongyou Hu, Yuzhi Zhang, Dan Pei, Jiyan Sun, Yinlong Liu. Robust System Instance Clustering for Large-Scale Web Services. The Web Conference (WWW), Virtual Conference, April 25-29, 2022 (CCF A).
Xiaolei Hua, Lin Zhu, Shenglin Zhang, Zeyan Li, Su Wang, Dong Zhou, Shuo Wang, Chao Deng. GenAD: General Representations of Multivariate Time Series for Anomaly Detection. Artificial Intelligence for Cyber Security (AICS), AAAI-22 Workshop, Vancouver, BC, Canada, February 2022.
2021
Shenglin Zhang, Chenyu Zhao, Yicheng Sui, Ya Su*, Yongqian Sun, Yuzhi Zhang, Dan Pei, Yizhe Wang. “Robust KPI Anomaly Detection for Large-Scale Software Services with Partial Labels”. IEEE International Symposium on Software Reliability Engineering (ISSRE), October 25-28, 2021, Wuhan, China (CCF B).
Minghua Ma, Shenglin Zhang*, Junjie Chen, Haozhe Li, Yongliang Lin, Jim Xu, Xiaohui Nie, Bo Zhu, Yong Wang. “Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems”. USENIX Annual Technical Conference (USENIX ATC), Virtual Conference, July 14-16, 2021 (CCF A).
Ya Su, Youjian Zhao, Ming Sun, Shenglin Zhang*, Xidao Wen, Yongsu Zhang, Xian Liu, Xiaozhou Liu, Junliang Tang, Wenfei Wu, Dan Pei. “Detecting Outlier Machine Instances through Gaussian Mixture Variational Autoencoder with One Dimensional CNN”. IEEE Transactions on Computers (TC), 71(4): 892-905 (CCF A; SCI, CAS Tier 2; Impact Factor: 2.711).
Weibin Meng, Ying Liu, Shenglin Zhang*, Federico Zaiter, Yuzhe Zhang, Yuheng Huang, Zhaoyang Yu, Yuzhi Zhang, Lei Song, Ming Zhang, Dan Pei. “LogClass: Anomalous Log Identification and Classification with Partial Labels”. IEEE Transactions on Network and Service Management (TNSM), Volume 18, Issue 2, pp 1870-1884, June 2021 (SCI, CAS Tier 2; Impact Factor: 3.878).
Ming Sun, Ya Su, Shenglin Zhang, Yuanpu Cao, Yuqing Liu, Dan Pei, Wenfei Wu, Yongsu Zhang, Xiaozhou Liu, Junliang Tang. “CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Model Transfer”. IEEE International Conference on Computer Communications (INFOCOM) 2021, Virtual Conference, May 2021 (CCF A)
2020
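苏金树, 赵宝康, 董德尊, 吕高锋, 文梅, 魏亮, 彭伟, 李福亮, 张圣林, 孙永谦. 新一代数据中心网络技术研究进展 [Research Progress on Next-Generation Datacenter Network Technologies]. In 《CCF 2019-2020中国计算机科学技术发展报告》 [CCF 2019-2020 Report on the Development of Computer Science and Technology in China], China Machine Press, 2020.10 (in Chinese).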
Rui Chen, Shenglin Zhang, Dongwen Li, Yuzhe Zhang, Fangrui Guo, Weibin Meng, Dan Pei, Yuzhi Zhang, Xu Chen, Yuqing Liu. "Cross-System Log Anomaly Detection for Software Systems". IEEE International Symposium on Software Reliability Engineering (ISSRE), Virtual Conference, October 2020 (CCF B).
Ping Liu, Haowen Xu, Qianyu Ouyang, Rui Jiao, Zhekang Chen, Xiaoying Bai, Shenglin Zhang, Jiahai Yang, Linlin Mo, Jice Zeng, Wenman Xue, Dan Pei. “Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks”. IEEE International Symposium on Software Reliability Engineering (ISSRE), Virtual Conference, October 2020 (CCF B).
Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, Dan Pei. “Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases”. International Conference on Very Large Data Bases (VLDB), Virtual Conference, August 2020 (CCF A).
Weibin Meng, Ying Liu, Federico Zaiter, Shenglin Zhang*, Yihao Chen, Yuzhe Zhang, Yichen Zhu, En Wang, Ruizhi Zhang, Shimin Tao, Dian Yang, Rong Zhou, Dan Pei. “LogParse: Making Log Parsing Adaptive through Word Classification”. IEEE International Conference on Computer Communications and Networks (ICCCN) 2020, Virtual Conference, August 3-6, 2020 (CCF C).
Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang*, Federico Zaiter, Bingjin Chen, Dan Pei. “A Semantic-aware Representation Framework for Online Log Analysis”. IEEE International Conference on Computer Communications and Networks (ICCCN) 2020, Virtual Conference, August 3-6, 2020 (CCF C).
Yuan Meng, Shenglin Zhang*, Yongqian Sun, Ruru Zhang, Zhilong Hu, Yiyin Zhang, Chenyang Jia, Zhaogang Wang, Dan Pei. “Localizing Failure Root Causes in a Microservice through Causality Inference”. International Symposium on Quality of Service (IWQoS), Virtual Conference, June 2020 (CCF B)
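张圣林, 林潇霏, 孙永谦, 张玉志, 裴丹. 基于深度学习的无监督KPI异常检测 [Unsupervised KPI Anomaly Detection Based on Deep Learning]. 《数据与计算发展前沿》 (Frontiers of Data & Computing), 2(3): 87-100, 2020.6 (invited paper, in Chinese).
张圣林, 李东闻, 孙永谦, 孟伟彬, 张宇哲, 张玉志, 刘莹, 裴丹. 面向云数据中心多语法日志通用异常检测机制 [A General Anomaly Detection Mechanism for Multi-Syntax Logs in Cloud Datacenters]. 《计算机研究与发展》 (Journal of Computer Research and Development), 57(4): 778-790, 2020 (in Chinese).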
2019
Ping Liu, Yu Chen, Xiaohui Nie, Jing Zhu, Shenglin Zhang, Kaixin Sui, Ming Zhang, Dan Pei. “FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation”. IEEE International Symposium on Software Reliability Engineering (ISSRE), Berlin, Germany, October 2019 (CCF B)
Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang*, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, Rong Zhou. “LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs”. International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, August 2019 (CCF A).
Yuan Meng, Shenglin Zhang*, Zijie Ye, Benliang Wang, Zhi Wang, Yongqian Sun, Qitong Liu, Shuai Wang, Dan Pei. “Causal Analysis of the Unsatisfying Experience in Realtime Mobile Multiplayer Games in the Wild”. IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 2019 (CCF B)
2018
Shenglin Zhang, Ying Liu, Dan Pei, Yu Chen, Xianping Qu, Shimin Tao, Zhi Zang, Xiaowei Jing, Mei Feng. “FUNNEL: Assessing Software Changes in Web-based Services”. IEEE Transactions on Services Computing, Volume 11, Issue 1, January-February 2018 (SCI indexed; Impact Factor: 5.82; CAS Tier 2).
Minghua Ma, Shenglin Zhang*, Dan Pei, Xin Huang, Hongwei Dai. “Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection”. IEEE International Symposium on Software Reliability Engineering (ISSRE), Memphis, TN, USA, October 2018 (Best Research Paper Award, CCF B).
Shenglin Zhang, Ying Liu, Weibin Meng, Zhiling Luo, Jiahao Bu, Sen Yang, Peixian Liang, Dan Pei, Jun (Jim) Xu, Yuzhi Zhang, Yu Chen, Hui Dong, Xianping Qu, Lei Song. “PreFix: Switch Failure Prediction in Datacenter Networks”. ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 2018, Irvine, California, USA, June 18-22, 2018 (CCF B; acceptance rate 20%, 54/270; one of only two papers from institutions in mainland China).
Weibin Meng, Ying Liu, Shenglin Zhang*, Dan Pei, Hui Dong, Lei Song, Xulong Luo. “Device-Agnostic Log Anomaly Classification with Partial Labels”. IEEE/ACM International Symposium on Quality of Service (IWQoS) 2018, Banff, Alberta, Canada, June 2018 (CCF B).
Jiahao Bu, Ying Liu, Shenglin Zhang*, Weibin Meng, Qitong Liu, Xiaotian Zhu, Dan Pei. “Rapid Deployment of Anomaly Detection Models for Large Number of Emerging KPI Streams”. International Performance Computing and Communications Conference (IPCCC), Orlando, Florida, USA, November 2018 (CCF C)
Shenglin Zhang, Ying Liu, Dan Pei, and Baojun Liu. “Measuring BGP AS Path Looping (BAPL) and Private AS Number Leaking (PANL)”. Journal of Tsinghua University (Science and Technology), Volume 23, Number 1, pp 22-34, February 2018 (SCI indexed, IF 1.328).
2017
裴丹, 张圣林, 裴昶华. 《基于机器学习的智能运维》 [Machine-Learning-Based AIOps]. Communications of the CCF (中国计算机学会通讯), column article, 2017, No. 12 (in Chinese).
Shenglin Zhang, Weibin Meng, Jiahao Bu, Sen Yang, Ying Liu, Dan Pei, Jun (Jim) Xu, Yu Chen, Hui Dong, Xianping Qu, Lei Song. “Syslog Processing for Switch Failure Diagnosis and Prediction in Datacenter Networks”. IEEE/ACM International Symposium on Quality of Service (IWQoS) 2017, Vilanova i la Geltrú, Spain, June 2017 (CCF B).
2016 and earlier
Shenglin Zhang, Ying Liu, Dan Pei, Yu Chen, Xianping Qu, Shimin Tao, and Zhi Zang. “Rapid and Robust Impact Assessment of Software Changes in Large Internet-based Services”. ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), Heidelberg, Germany, December 2015, 13 pages (CCF B).
Ying Liu, Shenglin Zhang*, Hongying Liu. “A bottleneck-free model for P4P”. SCIENCE CHINA Information Sciences, Volume 58, Issue 10, pp 1-15, October 2015 (SCI indexed; Impact Factor: 3.304; CAS Tier 2).
Ying Liu, Gang Ren, Jianping Wu, Shenglin Zhang, Lin He, Yihao Jia. “Building An IPv6 Address Generation and Traceback System With NIDTGA in Address Driven Network”. SCIENCE CHINA Information Sciences, Volume 58, Issue 12, pp 1-14, December 2015 (SCI indexed; Impact Factor: 3.304; CAS Tier 2).
Shenglin Zhang, Ying Liu, Dan Pei. “A Measurement Study on BGP AS Path Looping Behavior”. IEEE International Conference on Computer Communications and Networks (ICCCN), Shanghai, China, August 4, 2014, 7 pages (CCF C).
Ying Liu, Shenglin Zhang*, Hongying Liu. “An Improved Cooperative Model of P2P and ISP”. Frontiers in Internet Technologies, 85-96, LNCS, Springer, 2013 (EI indexed).
张圣林, 刘莹. AS 路径环路的研究 [A Study of AS Path Loops]. 《通信学报》 (Journal on Communications), 2013, (Z2): 17-22 (EI indexed, in Chinese).
Patents:
张圣林. 应用上线指标的检测方法及装置 [Method and apparatus for detecting application launch metrics]. Chinese patent CN104809059B, 2018.2, granted
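任罡, 刘莹, 吴建平, 张圣林, 贾溢豪, 何林. 用户互联网身份标识及生成方法和系统 [User Internet identity identifier, and method and system for generating it]. Chinese patent CN105262848B, 2018.2, granted
张圣林, 李东闻, 陈锐, 孙永谦, 张玉志. 基于迁移学习的日志异常检测方法 [Log anomaly detection method based on transfer learning]. Chinese patent application 202010813538X, 2020.8, application accepted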
Courses Taught
Computer Networks
Computer Algorithm Design and Analysis
Software Testing
Introduction to Algorithms
High-Level Language Programming Practicum
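Professional Service
Executive member, CCF Technical Committee on the Internet
Executive member, CCF Technical Committee on Software Engineering
Executive member, CCF Technical Committee on Services Computing
AC member, CCF YOCSEF Tianjin
TPC Member of WSDM 2023, WWW 2022, IEEE ICNP 2022, IEEE/ACM IWQoS 2022, IEEE ISSRE 2019/2020/2021/2022, HDR-Net 2019/2020, AIOps workshop 2020
Reviewer for journals including TON, JSAC, TDSC, IoTJ, ASUR, TNSM, and JCST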
IEEE Member
ACM Member