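Dan Pei, Shenglin Zhang, Yongqian Sun, Changhua Pei. AIOps in the Era of Large Language Models. ZTE Technology Journal, Vol. 30, No. 2, pp. 56-62, April 2024 (in Chinese).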
Yilun Liu, Yuhe Ji, Shimin Tao, Minggui He, Weibin Meng, Shenglin Zhang, Yongqian Sun, Yuming Xie, Boxing Chen, Hao Yang. LogLM: From Task-based to Instruction-based Automated Log Analysis. International Conference on Software Engineering (ICSE), Ontario, Canada, April 27-May 3, 2025 (CCF A).
Shenglin Zhang, Sibo Xia, Wenzhao Fan, Binpeng Shi, Xiao Xiong, Zhenyu Zhong, Minghua Ma, Yongqian Sun*, Dan Pei. Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis. ACM Transactions on Software Engineering and Methodology (TOSEM), 2025 (CCF A)
Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang*, Yuhe Ji, Lu Zhang, Wen Long, Yongnan Luo, Hengmao Chen, Dan Pei. AIOpsArena: Scenario-Oriented Evaluation and Leaderboard for AIOps Algorithms in Microservices. IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2025 (CCF B).
Shenglin Zhang, Ting Xu, Jun Zhu, Yongqian Sun*, Pengxiang Jin, Binpeng Shi, Dan Pei. Privacy-preserving MTS anomaly detection for network devices through federated learning. Information Sciences, 2024: 121590 (CCF B, JCR Q1, CAS Tier 1 Top, IF: 8.1).
Yongqian Sun, Minghan Liang, Shenglin Zhang*, Zeyu Che, Zhiyao Luo, Dongwen Li, Yuzhi Zhang, Dan Pei, Lemeng Pan, and Liping Hou. Efficient Multivariate Time Series Anomaly Detection Through Transfer Learning for Large-Scale Software Systems. ACM Transactions on Software Engineering and Methodology (TOSEM), 2024 (CCF A).
Shenglin Zhang, Yongxin Zhao, Sibo Xia, Shirui Wei, Yongqian Sun*, Chenyu Zhao, Shiyu Ma, Junhua Kuang, Bolin Zhu, Lemeng Pan, Yicheng Guo, Dan Pei. No More Data Silos: Unified Microservice Failure Diagnosis with Temporal Knowledge Graph. IEEE Transactions on Services Computing (TSC), 2024, 17(6): 4013-4026 (CCF A).
Yongqian Sun, Zihan Lin, Binpeng Shi, Shenglin Zhang*, Shiyu Ma, Pengxiang Jin, Zhenyu Zhong, Lemeng Pan, Yicheng Guo, Dan Pei. Interpretable Failure Localization for Microservice Systems Based on Graph Autoencoder. ACM Transactions on Software Engineering and Methodology (TOSEM), 2024, 34(2): 1-28 (CCF A).
Yuan Yuan*, Tongqing Zhou*, Xiuhong Tan, Yongqian Sun, Yuqi Li, Zhixing Li, Zhiping Cai, and Tiejun Li. Exploring Hierarchical Patterns for Alert Aggregation in Supercomputers. 2024 International Symposium on Software Reliability Engineering (ISSRE), Tsukuba, Japan, October 28-31, 2024 (Best Paper Award, CCF B).
Yongqian Sun, Yang Guo, Minghan Liang, Xidao Wen, Junhua Kuang, Shenglin Zhang*, Hongbo Li, Kaixu Xia, and Dan Pei. Multivariate Time Series Anomaly Detection based on Pre-trained Models with Dual-Attention Mechanism. 2024 International Symposium on Software Reliability Engineering (ISSRE), Tsukuba, Japan, October 28-31, 2024 (CCF B).
Shenglin Zhang, Zeyu Che, Zhongjie Pan, Xiaohui Nie, Yongqian Sun*, Lemeng Pan, Dan Pei. LabelEase: A Semi-Automatic Tool for Efficient and Accurate Trace Labeling in Microservices. 2024 International Symposium on Software Reliability Engineering (ISSRE), Tsukuba, Japan, October 28-31, 2024 (CCF B).
Shenglin Zhang, Xiao Xiong, Mengyao Li, Yongqian Sun*, Yongxin Zhao, Xia Chen, Bowen Deng and Dan Pei. Auto-PIP: Real-time Identification of Critical Performance Inflection Points in Software Stress Testing. 2024 International Symposium on Software Reliability Engineering (ISSRE), Tsukuba, Japan, October 28-31, 2024 (Best Industry Paper Award, CCF B).
Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun*, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei. Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A model Based on Large Language Models. 2024 International Symposium on Software Reliability Engineering (ISSRE), Tsukuba, Japan, October 28-31, 2024 (CCF B).
Yongqian Sun, Binpeng Shi, Mingyu Mao, Minghua Ma, Sibo Xia, Shenglin Zhang*, Dan Pei. ART: A Unified Unsupervised Framework for Incident Management in Microservice Systems. 2024 IEEE/ACM Automated Software Engineering Conference (ASE), Sacramento, California, United States, October 27 – November 1, 2024 (CCF A).
Lei Tao, Shenglin Zhang, Zedong Jia, Jinrui Sun, Minghua Ma, Zhengdan Li*, Yongqian Sun, Canqun Yang, Yuzhi Zhang, Dan Pei. Giving Every Modality a Voice in Microservice Failure Diagnosis via Multimodal Adaptive Optimization. 2024 IEEE/ACM Automated Software Engineering Conference (ASE), Sacramento, California, United States, October 27 – November 1, 2024 (CCF A).
Shenglin Zhang, Yuhe Ji, Jiaqi Luan, Xiaohui Nie, Zi'ang Chen, Minghua Ma, Yongqian Sun*, Dan Pei. End-to-End AutoML for Unsupervised Log Anomaly Detection. 2024 IEEE/ACM Automated Software Engineering Conference (ASE), Sacramento, California, United States, October 27 – November 1, 2024 (CCF A).
Zhe Xie, Shenglin Zhang, Yitong Geng, Yao Zhang, Minghua Ma, Xiaohui Nie, Zhenhe Yao, Longlong Xu, Yongqian Sun, Wentao Li, Dan Pei. Microservice Root Cause Analysis With Limited Observability Through Intervention Recognition in the Latent Space. 2024 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Barcelona, Spain, August 2024 (CCF A).
Shenglin Zhang, Jun Zhu, Bowen Hao, Yongqian Sun*, Xiaohui Nie, Jingwen Zhu, Xilin Liu, Xiaoqian Li, Yuchi Ma, Dan Pei. Fault Diagnosis for Test Alarms in Microservices Through Multi-source Data. ACM International Conference on the Foundations of Software Engineering (FSE), Industry Track. Porto de Galinhas, Brazil, July 15-19, 2024 (CCF A).
Shenglin Zhang, Yongxin Zhao, Xiao Xiong, Yongqian Sun*, Xiaohui Nie, Jiacheng Zhang, Fenglai Wang, Xian Zheng, Yuzhi Zhang, Dan Pei. Illuminating the Gray Zone: Non-Intrusive Gray Failure Localization in Server Operating Systems. ACM International Conference on the Foundations of Software Engineering (FSE), Industry Track. Porto de Galinhas, Brazil, July 15-19, 2024 (CCF A).
Sibo Xia, Minghua Ma, Pengxiang Jin, Liyue Cui, Shenglin Zhang*, Wa Jin, Yongqian Sun, Dan Pei. Response Time Anomaly Diagnosis for Search Service. Journal of Computer Research and Development, 2024, 61(6): 1573-1584 (in Chinese).
Shenglin Zhang, Zhongjie Pan, Heng Liu, Pengxiang Jin, Yongqian Sun*, Qianyu Ouyang, Jiaju Wang, Xueying Jia, Yuzhi Zhang, Hui Yang, Yongqiang Zou, and Dan Pei. Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems. The 34th IEEE International Symposium on Software Reliability Engineering (ISSRE 2023). Florence, Italy, October 2023 (CCF B).
Dongwen Li, Shenglin Zhang, Yongqian Sun*, Yang Guo, Zeyu Che, Shiqi Chen, Zhenyu Zhong, Minghan Liang, Minyi Shao, Mingjie Li, Shuyang Liu, Yuzhi Zhang, and Dan Pei. An Empirical Analysis of Anomaly Detection Issues for Multivariate Time Series. The 34th IEEE International Symposium on Software Reliability Engineering (ISSRE 2023). Florence, Italy, October 2023 (CCF B).
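Sibo Xia, Minghua Ma, Pengxiang Jin, Liyue Cui, Shenglin Zhang, Wa Jin, Yongqian Sun, Dan Pei. Response Time Anomaly Diagnosis for Search Service. Journal of Computer Research and Development, 2023 (in Chinese, CCF T1).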
Yicheng Sui, Yuzhe Zhang, Jianjun Sun, Ting Xu, Shenglin Zhang*, Zhengdan Li, Yongqian Sun, Fangrui Guo, Junyu Shen, Yuzhi Zhang, Dan Pei, Xiao Yang, Li Yu. LogKG: Log Failure Diagnosis through Knowledge Graph. IEEE Transactions on Services Computing, 2023 (CCF A).
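Ling Ma, Qiliang Fan, Ting Xu, Guanchen Guo, Shenglin Zhang, Yongqian Sun, Yuzhi Zhang. A Reinforcement Learning-Based Scheduling Strategy for Co-located Online and Offline Workloads. Journal on Communications, 2023 (in Chinese, CCF T1).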
Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun*, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, Dai Zhang, Zhenyu Zhu, Dan Pei. Robust Failure Diagnosis of Microservice System through Multimodal Data. IEEE Transactions on Services Computing, 2023 (CCF A, accepted, to appear).
Chenyu Zhao, Minghua Ma, Zhenyu Zhong, Shenglin Zhang*, Zhiyuan Tan, Xiao Xiong, Lulu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, Dan Pei, Qingwei Lin, Dongmei Zhang. Robust Multimodal Failure Detection for Microservice Systems. The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Long Beach, CA, USA, August 2023 (CCF A).
Yongqian Sun, Minghan Liang, Zeyu Che, Dongwen Li, Tinghua Zheng, Shenglin Zhang*, Pengtian Zhu, Yuzhi Zhang, Dan Pei. Efficient Multivariate Time Series Anomaly Detection Through Transfer Learning for Large-Scale Web services. The 2023 IEEE International Conference on Web Services (ICWS), Chicago, USA, July 2023 (CCF B).
Yongqian Sun, Daguo Cheng, Tiankai Yang, Shenglin Zhang*, Man Zhu, Xiao Xiong, Qiliang Fan, Minghan Liang, Dan Pei, Tianchi Ma, Yu Chen. Efficient and Robust KPI Outlier Detection for Large-Scale Datacenters. IEEE Transactions on Computers, 2023 (CCF A).
Zeyan Li, Junjie Chen, Yihao Chen, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, Dan Pei. Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems. The Journal of Systems & Software (JSS), 2023 (CCF B).
Yiran Cheng, Bo Cheng, Pengxiang Jin, Yongqian Sun*, Xiaohui Nie, Nengwen Zhao, Shenglin Zhang, Dan Pei. Effective Attribute Selection for Multi-dimensional Root Cause Analysis. IEEE International Symposium on Software Reliability Engineering (ISSRE), Charlotte, North Carolina, USA, October 31 – November 3, 2022 (CCF B).
Shenglin Zhang, Zhenyu Zhong, Dongwen Li, Qiliang Fan, Yongqian Sun*, Man Zhu, Yuzhi Zhang, Dan Pei, Jiyan Sun, Yinlong Liu, Hui Yang, Yongqiang Zou. Efficient KPI Anomaly Detection Through Transfer Learning for Large-Scale Web Services. IEEE Journal on Selected Areas in Communications (JSAC), vol. 40, no. 8, pp. 2440-2455, Aug. 2022 (CCF A, SCI Indexed, Impact Factor: 9.144).
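Yongqian Sun, Ruru Zhang, Zihan Lin, Shenglin Zhang, Zhiyuan Tan, Yuzhi Zhang. Evaluation of KPI Anomaly Detection Methods. Frontiers of Data and Computing, Special Issue on Advanced Intelligent Computing Platforms and Applications, Vol. 4, No. 3, pp. 46-65, June 2022 (in Chinese, CCF T3 journal).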
Yongqian Sun, Kunlin Jian, Liyue Cui, Guifei Jiang, Shenglin Zhang*, Yuzhi Zhang, Dan Pei. Online Malicious Domain Name Detection with Partial Labels for Large-Scale Dependable Systems. The Journal of Systems & Software (JSS), 190: 1-12, 2022 (CCF B, SCI Indexed, CAS Tier 2, Impact Factor: 2.829).
Yongqian Sun, Daguo Cheng, Pengxiang Jin, Quan Ding, Shenglin Zhang*, Xu Chen, Yuzhi Zhang, Minghan Liang, Dan Pei, Jianyan Zheng, Sen Luo, Xinyu Tang. Robust Anomaly Clue Localization of Multi-dimensional Derived Measure for Online Video Services. IEEE Transactions on Services Computing (TSC), 2022 (CCF A, SCI Indexed, CAS Tier 1, Impact Factor: 8.21).
Shenglin Zhang, Dongwen Li, Zhenyu Zhong, Jun Zhu, Minghan Liang, Jiexi Luo, Yongqian Sun*, Ya Su, Sibo Xia, Zhongyou Hu, Yuzhi Zhang, Dan Pei, Jiyan Sun and Yinlong Liu. Robust System Instance Clustering for Large-Scale Web Services. The Web Conference (WWW), Virtual Conference, April 25-29, 2022 (CCF A).
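Siyi Li, Shiyu Ma, Liyue Cui, Shenglin Zhang, Yongqian Sun, Yuzhi Zhang. A Survey of Root Cause Localization Methods in Microservice Architectures. Frontiers of Data and Computing, 2022, 4(3): 78-89 (in Chinese, CCF T3).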
Shenglin Zhang, Chenyu Zhao, Yicheng Sui, Ya Su*, Yongqian Sun, Yuzhi Zhang, Dan Pei, Yizhe Wang. "Robust KPI Anomaly Detection for Large-Scale Software Services with Partial Labels". IEEE International Symposium on Software Reliability Engineering (ISSRE), October 25-28, 2021, Wuhan, China (CCF B).
Ruming Tang, Cheng Huang, Yanti Zhou, Hanwen Wu, Xianglin Lu, Yongqian Sun, Qi Li, Jinjin Lin, Weiyao Huang, Siyuan Sun, Dan Pei. "A Practical Machine Learning-Based Framework to Detect DNS Covert Communication in Enterprises". SecureComm 2020. Online, October, 2020. (CCF C)
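Jinshu Su, Baokang Zhao, Dezun Dong, Gaofeng Lv, Mei Wen, Liang Wei, Wei Peng, Fuliang Li, Shenglin Zhang, Yongqian Sun. Research Progress on Next-Generation Data Center Network Technologies. In: CCF 2019-2020 China Computer Science and Technology Development Report, China Machine Press, October 2020 (in Chinese).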
Yuan Meng, Shenglin Zhang*, Yongqian Sun, Ruru Zhang, Zhilong Hu, Yiyin Zhang, Chenyang Jia, Zhaogang Wang, Dan Pei. "Localizing Failure Root Causes in a Microservice through Causality Inference". International Symposium on Quality of Service (IWQoS), Hangzhou, China, June 2020 (CCF B).
Shenglin Zhang, Ying Liu, Weibin Meng, Jiahao Bu, Sen Yang, Yongqian Sun*, Dan Pei, Jun Xu, Yuzhi Zhang, Lei Song, Ming Zhang. "Efficient and Robust Syslog Parsing for Network Devices in Datacenter Networks". IEEE Access, Volume 8, pp 30245-30261, February 2020 (JCR Zone 2, SCI Indexed, IF: 4.098)
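Shenglin Zhang, Xiaofei Lin, Yongqian Sun*, Yuzhi Zhang, Dan Pei. Unsupervised KPI Anomaly Detection Based on Deep Learning. Frontiers of Data and Computing, 2(3): 87-100, June 2020 (in Chinese, CCF T3 journal).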
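Shenglin Zhang, Dongwen Li, Yongqian Sun*, Weibin Meng, Yuzhe Zhang, Yuzhi Zhang, Ying Liu, Dan Pei. A General Anomaly Detection Mechanism for Multi-Syntax Logs in Cloud Data Centers. Journal of Computer Research and Development, 57(4): 778-790, 2020 (in Chinese, CCF A Chinese journal).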
Ruming Tang, Zheng Yang, Zeyan Li, Weibin Meng, Haixin Wang, Qi Li, Yongqian Sun, Dan Pei, Tao Wei, Yanfei Xu, Yan Liu. "ZeroWall: Detecting Zero-Day Web Attacks through Encoder-Decoder Recurrent Neural Networks". IEEE International Conference on Computer Communications (INFOCOM), Beijing, China, Apr 27-30, 2020 (CCF A)
Zeyan Li, Chengyang Luo, Yiwei Zhao, Yongqian Sun*, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, Dan Pei. "Generic and Robust Localization of Multi-Dimensional Root Causes". IEEE International Symposium on Software Reliability Engineering (ISSRE), Berlin, Germany, October 2019 (CCF B)
Yuan Meng, Shenglin Zhang*, Zijie Ye, Benliang Wang, Zhi Wang, Yongqian Sun, Qitong Liu, Shuai Wang, Dan Pei. "Causal Analysis of the Unsatisfying Experience in Realtime Mobile Multiplayer Games in the Wild". IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 2019 (CCF B)
Dapeng Liu, Youjian Zhao, Haowen Xu, Yongqian Sun, Dan Pei, Jiao Luo, Xiaowei Jing, Mei Feng. "Opprentice: Towards Practical and Automatic Anomaly Detection Through Machine Learning". ACM Internet Measurement Conference (IMC), Tokyo, Japan, Oct 2015 (CCF B conference).
Guo Chen, Dan Pei, Youjian Zhao and Yongqian Sun. "Designing Buffer Capacity of Crosspoint-Queued Switch". The 11th IFIP International Conference on Network and Parallel Computing (NPC), Ilan, Taiwan, Sep 2014 (CCF C conference).