[Top Conferences in Computer Science]

Drafts

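  • Junzhou Zhao, Jianglong Li, Tao Duan, Pinghui Wang, and Jing Tao. "An Encrypted VoIP Traffic Identification Method Based on Aligning Voice Spectra with Packet Lengths" (original title: 基于语音频谱与数据包长对齐的 VoIP 加密流量识别方法). 2025-06. Network traffic [draft]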

    [Introduction]

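    With the rapid spread of smartphones and other mobile devices, Voice over Internet Protocol (VoIP) applications such as WeChat voice calls have become increasingly popular. Because VoIP applications carry privacy-sensitive voice content over the open Internet, protecting users' personal data is critical. This paper collects and analyzes the voice traffic generated by four popular VoIP applications: WeChat, TIM, Tencent Meeting, and DingTalk. We find that although VoIP applications commonly rely on proprietary voice codecs and encrypted communication for security, the transmission patterns of encrypted VoIP traffic can still leak sensitive information such as user attributes, user identity, and even call content, which poses a privacy risk. By measuring how the encrypted traffic patterns of the four applications relate to user attributes and call content, we find a clear correlation between voice frequency and packet length. Building on this finding, we design VPrint, an encrypted VoIP traffic identification method that aligns the voice spectrum with packet lengths. VPrint identifies encrypted VoIP traffic more accurately than existing encrypted traffic identification methods. Taking WeChat as an example, VPrint achieves F1 scores of 0.77, 0.99, 0.88, and 0.92 on user gender identification, user identity identification, call language identification, and phrase identification, respectively. Our results show that popular VoIP applications such as WeChat have security weaknesses, and we recommend that vendors adopt measures such as packet padding to improve security and prevent leaks of user privacy.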

    voip.png
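As a rough illustration of the packet-padding countermeasure mentioned in this entry, the sketch below rounds every payload length up to the next multiple of a fixed block size, so that many different voice frames share the same on-the-wire length. The function name `pad_length`, the default `block_size` of 64 bytes, and the sample lengths are all hypothetical choices for illustration; the paper does not specify a padding scheme.

```python
def pad_length(payload_len: int, block_size: int = 64) -> int:
    """Round a payload length up to a multiple of block_size (hypothetical scheme)."""
    if payload_len < 0 or block_size <= 0:
        raise ValueError("payload_len must be >= 0 and block_size must be > 0")
    # Ceiling division; always emit at least one full block.
    blocks = max(1, -(-payload_len // block_size))
    return blocks * block_size


# Made-up payload lengths (bytes) standing in for encrypted voice packets.
raw_lengths = [87, 93, 101, 120, 131, 64]
padded_lengths = [pad_length(n) for n in raw_lengths]
print(sorted(set(raw_lengths)), "->", sorted(set(padded_lengths)))
```

Larger blocks hide more of the length variation but cost more bandwidth, so the block size is a trade-off rather than a fixed recommendation.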

Publications

[Full List]

  • [KDD'25] Chenxu Wang, Jinfeng Chen, Junzhou Zhao, and Pinghui Wang. "Task Negative Sampling Enhanced Graph Few-shot Learning". In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), 2025. Graph learning

    [Introduction]

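    Graph few-shot node classification (GFSNC) has emerged as a promising approach to learning from limited labeled data in graph-structured networks. Although graph neural networks have succeeded at node classification, their performance depends heavily on the availability of large amounts of labeled data, which is often hard to obtain in practice. To address this, GFSNC adopts the episodic paradigm of meta-learning, in which a model is trained on a series of meta-tasks. However, existing methods face two key limitations: (i) they focus on the local distribution within a single meta-task and ignore the global data distribution; (ii) they optimize the model to minimize intra-class distance without adequately addressing inter-class separability, which leads to suboptimal performance. This paper proposes TaskNS, a novel GFSNC framework that addresses these limitations by introducing task negative samples into the meta-training tasks. By incorporating samples from classes outside the current meta-task, the framework lets the model gradually learn the global distribution of the graph data. We also design a novel loss function that strengthens the model's ability to distinguish query samples of different classes. The loss not only keeps intra-class compactness high but also maximizes inter-class separation by exploiting the task negative samples. To further improve the quality of task negative samples, we propose an h-hop-neighbor-based sampling method that exploits the graph topology. It selects task negative samples that are structurally close to the query samples, so that they are informative and challenging for the model. Extensive experiments on four benchmark datasets show that TaskNS is effective, improving average accuracy by 4.6% and F1 score by 4.9% over state-of-the-art methods.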

    graph_few_shot.png
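The idea of picking task negatives from a query node's h-hop neighborhood can be sketched as follows. This is a minimal illustration under stated assumptions, not the TaskNS implementation: the graph is a plain adjacency dict, candidates are chosen uniformly at random, and the names `h_hop_neighbors` and `sample_task_negatives` are hypothetical.

```python
import random
from collections import deque


def h_hop_neighbors(adj, source, h):
    """Return all nodes within h hops of source (excluding source), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond h hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return set(dist)


def sample_task_negatives(adj, labels, query, task_classes, h, n, seed=0):
    """Sample up to n nearby nodes whose class lies outside the current meta-task."""
    candidates = sorted(
        v for v in h_hop_neighbors(adj, query, h) if labels[v] not in task_classes
    )
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Toy path graph 0-1-2-3-4 with made-up class labels.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {0: "A", 1: "B", 2: "C", 3: "D", 4: "C"}
negatives = sample_task_negatives(adj, labels, query=0, task_classes={"A", "B"}, h=3, n=2)
print(negatives)
```

Restricting candidates to the h-hop neighborhood keeps the negatives structurally close to the query, which is what makes them hard examples; uniform sampling within that set is only one possible choice.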

  • [ICSE'25] Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao*, Jiongzhou Liu, Shujie Han, Yi Liu, and Fan Xu. "BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems". In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), 2025. AIOps [arXiv][slides]

    [Introduction]

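    Failures in cloud infrastructure can severely affect the stability and availability of cloud services, and a batch servers outage makes all upstream services completely unavailable. Diagnosing a batch servers outage means analyzing its root cause accurately and promptly to support troubleshooting. This is challenging for two reasons. First, the single-modality, coarse-grained failure monitoring data collected in cloud infrastructure is insufficient to describe a failure comprehensively. Second, because of complex dependencies among devices, an outage is often the cumulative result of multiple failures, and the correlations among those failures are hard to determine. To address these problems, this paper proposes BSODiag, an unsupervised and lightweight diagnosis framework for batch servers outages. BSODiag provides a global analysis perspective: it thoroughly explores failure information from multi-source monitoring data, models the spatio-temporal correlations among failures, and delivers accurate and interpretable diagnosis results. Experiments on Alibaba's cloud infrastructure show that BSODiag achieves 87.5% on PR@3 and 46.3% on PCR, outperforming baseline methods by 10.2% and 3.7%, respectively.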

    BSODiag.png

  • [ACL'24] Shuo Zhang, Liangming Pan, Junzhou Zhao*, and William Yang Wang. "The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models". Findings of ACL, 2024. LLMs, Hallucination [arXiv][slides]

    [Introduction]

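    Large language models often need to ground their answers in external knowledge to be truthful and reliable. Yet even when the external knowledge base contains the correct evidence, a model may ignore it and instead rely on incorrect knowledge or its own biases to make things up, which produces hallucinations. Because users mostly do not know what the knowledge base contains, hallucinations arise when a user's question is not directly related to the retrieved evidence. This work formulates the knowledge alignment problem and presents the MixAlign framework, which interacts with both the user and the knowledge base to obtain and integrate clarifications about how the user's question relates to the stored information. MixAlign uses a language model for automatic knowledge alignment and, when necessary, further strengthens this alignment through user clarification. Experimental results show that knowledge alignment plays a key role in improving model performance and reducing hallucination, with gains of 22.2% and 27.1%, respectively.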

    MixAlign.png

  • [ICDE'24] Tao Duan, Junzhou Zhao*, Shuo Zhang, Jing Tao, and Pinghui Wang. "Representation Learning of Tangled Key-Value Sequence Data for Early Classification". In Proceedings of the 41st IEEE International Conference on Data Engineering (ICDE), 2024. Sequence data, Network traffic [arXiv][slides][poster]

    [Introduction]

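    Key-value sequence data appears in many real-world applications, from user shopping records in e-commerce to packet sequences in network traffic. Classifying these key-value sequences matters in many scenarios, such as user profiling and malicious traffic identification. In many time-sensitive scenarios, besides classifying key-value sequences accurately, it is also desirable to classify them as early as possible so as to respond quickly. These two goals, however, are inherently in conflict. This work formulates a new problem, early classification of tangled key-value sequences, where a tangled key-value sequence is a mixture of several concurrent key-value sequences with different keys. The goal is to classify each individual key-value sequence, that is, the items sharing the same key, both accurately and early. To solve this problem, the paper proposes a key-value sequence early co-classification framework, which exploits the intra- and inter-correlations among items in a tangled key-value sequence, through key correlation and value correlation, to learn a better sequence representation. Meanwhile, a time-aware halting policy decides when to stop observing a key-value sequence and classify it based on its current representation. Experiments on real-world and synthetic datasets show that the method significantly outperforms state-of-the-art baselines. At the same prediction earliness, it improves prediction accuracy by 4.7% to 17.5% and improves the harmonic mean of accuracy and earliness by 3.7% to 14.0%.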

    KVEC.png
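The harmonic mean of accuracy and earliness reported in this entry can be computed as in the sketch below. The definition of earliness used here (the fraction of a sequence left unobserved at decision time) is an assumption made for illustration, and the function names and sample numbers are hypothetical; the paper's exact definition may differ.

```python
def earliness(observed_items: int, total_items: int) -> float:
    """Assumed definition: fraction of the sequence NOT yet observed at decision time."""
    if total_items <= 0 or not 0 <= observed_items <= total_items:
        raise ValueError("need 0 <= observed_items <= total_items and total_items > 0")
    return 1.0 - observed_items / total_items


def harmonic_mean(accuracy: float, earl: float) -> float:
    """Harmonic mean of two scores in [0, 1]; zero if either score is zero."""
    if accuracy + earl == 0:
        return 0.0
    return 2 * accuracy * earl / (accuracy + earl)


# Made-up numbers: classify after 40 of 100 items with 90% accuracy.
score = harmonic_mean(0.9, earliness(40, 100))
print(round(score, 3))
```

Because the harmonic mean is dominated by the smaller of the two scores, a classifier cannot score well by being very early but inaccurate, or very accurate but late.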

  • [TON'24] Junzhou Zhao, Pinghui Wang, Wei Zhang, Zhaosong Zhang, Maoli Liu, Jing Tao, and John C.S. Lui. "Tracking Influencers in Decaying Social Activity Streams with Theoretical Guarantees". IEEE/ACM Transactions on Networking (TON), 32(2):1461-1476, 2024. Social networks, Online public opinion

    [Introduction]

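    Influence maximization in social networks is the optimization problem underlying many practical applications, such as viral marketing, political campaigning, and network monitoring. The problem has been studied extensively, but most work assumes that influence is static, whereas in practice users' influence changes over time. The K most influential nodes in the current network therefore need to be discovered in real time, which requires solving the problem of tracking node influence in social networks in real time. To keep the solution up to date and to forget outdated data smoothly, this paper proposes a probabilistic-decay data stream (PDSAS) model, in which the probability that each data point in the stream still exists decays over time. Based on the PDSAS model, the paper proposes an online algorithm for streaming submodular optimization. The algorithm produces an approximate solution online with a guaranteed quality lower bound of (1/2−ϵ). To further improve efficiency, the paper refines this method and proposes an efficient online algorithm with a quality lower bound of (1/4−ϵ). Experiments show that the method finds high-quality solutions at a much lower computational cost than the baselines.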

    influence.png
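For readers unfamiliar with how a one-pass algorithm can reach a (1/2−ϵ) guarantee, the sketch below shows the classic threshold ("sieve") technique for streaming monotone submodular maximization under a size constraint. It is a swapped-in textbook technique, not the paper's algorithm: it ignores the decay model entirely, and the toy coverage objective and all names are hypothetical.

```python
import math


def sieve_streaming(stream, f, k, eps):
    """One-pass threshold ("sieve") selection of at most k items.

    f maps a list of items to a non-negative number and is assumed to be
    monotone submodular; eps in (0, 1) trades solution quality for memory.
    """
    max_single = 0.0  # largest single-item value seen so far
    sieves = {}       # exponent i -> items kept for threshold v = (1 + eps) ** i
    for item in stream:
        max_single = max(max_single, f([item]))
        if max_single <= 0:
            continue
        # Keep only thresholds v with max_single <= v <= 2 * k * max_single.
        lo = math.ceil(math.log(max_single, 1 + eps))
        hi = math.floor(math.log(2 * k * max_single, 1 + eps))
        sieves = {i: sieves.get(i, []) for i in range(lo, hi + 1)}
        for i, kept in sieves.items():
            if len(kept) >= k:
                continue
            v = (1 + eps) ** i
            gain = f(kept + [item]) - f(kept)
            # Accept the item if its marginal gain is large enough for this guess v.
            if gain >= (v / 2 - f(kept)) / (k - len(kept)):
                kept.append(item)
    return max(sieves.values(), key=f, default=[])


# Toy objective: number of distinct users reached by the chosen nodes.
activity = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4}}


def coverage(nodes):
    covered = set()
    for node in nodes:
        covered |= activity[node]
    return len(covered)


picked = sieve_streaming(["a", "b", "c", "d"], coverage, k=2, eps=0.5)
print(picked, coverage(picked))
```

Each threshold acts as a guess of the optimal value, and keeping a geometric grid of guesses is what turns an unknown optimum into a bounded-memory, single-pass procedure.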

  • [AAAI'23] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Tianxiang Wang, Zi Liang, Jing Tao, Yi Huang, and Junlan Feng. "Multi-Action Dialog Policy Learning from Logged User Feedback". In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), 2023. Dialog systems, NLP [arXiv]
  • [SIGMOD'23] Pinghui Wang, Chengjin Yang, Dongdong Xie, Junzhou Zhao, Hui Li, Jing Tao, and Xiaohong Guan. "An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation". In Proceedings of the ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD), 2023. Privacy-preserving computation, MPC, Sketch [slides]
  • [IJCAI'22] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Yu Li, Yi Huang, and Junlan Feng. "Think Before You Speak: Improving Multi-action Dialog Policy by Planning Single-Action Dialogs". In Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022. NLP, Dialog systems [arXiv][slides]
  • [KDD'21] Junzhou Zhao, Pinghui Wang, Chao Deng, and Jing Tao. "Temporal Biased Streaming Submodular Optimization". In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), 2021. Data streams, Online optimization [slides]
  • [AAAI'21] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Nuo Xu, Yang Yang, Yiting Liu, Yi Huang, and Junlan Feng. "Learning to Check Contract Inconsistencies". In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 2021. NLP [arXiv][poster]
  • [KAIS'21] Junzhou Zhao, Pinghui Wang, Zhouguo Chen, Jianwei Ding, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Tracking Triadic Cardinality Distributions for Burst Detection in High-Speed Graph Streams". Knowledge and Information Systems (KAIS), 63:939-969, 2021. Graph streams, Sampling and estimation [arXiv]
  • [ICDE'20] Junzhou Zhao, Pinghui Wang, Jing Tao, Shuo Zhang, and John C.S. Lui. "Continuously Tracking Core Items in Data Streams with Probabilistic Decays". In Proceedings of the 36th IEEE International Conference on Data Engineering (IEEE ICDE), 2020. Data streams, Online optimization [slides][poster]
  • [INS'20] Lin Lan, Pinghui Wang, Junzhou Zhao, Jing Tao, John C.S. Lui, and Xiaohong Guan. "Improving Network Embedding with Partially Available Vertex and Edge Content". Information Sciences, 2020. Graph learning
  • [TOIS'20] Xiaoying Zhang, Hong Xie, Junzhou Zhao, and John C.S. Lui. "Understanding Assimilation-Contrast Effects in Online Rating Systems: Modeling, Debiasing and Applications". ACM Transactions on Information Systems (TOIS), 2020. Recommender systems
  • [AAAI'19] Junzhou Zhao, S. Shang, Pinghui Wang, John C.S. Lui, and Xiangliang Zhang. "Submodular Optimization over Streams with Inhomogeneous Decays". In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), 2019. Data streams, Online optimization [arXiv]
  • [ICDE'19] Junzhou Zhao, S. Shang, Pinghui Wang, John C.S. Lui, and Xiangliang Zhang. "Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks". In Proceedings of the 35th IEEE International Conference on Data Engineering (IEEE ICDE), 2019. Graph streams, Online public opinion [arXiv][poster]
  • [INS'19] Junzhou Zhao, Pinghui Wang, and John C.S. Lui. "Optimizing Node Discovery on Networks: Problem Definitions, Fast Algorithms, and Observations". Information Sciences (INS), 477:161-185, 2019. Graph data, Online public opinion [arXiv]
  • [DMKD'19] Junzhou Zhao, Pinghui Wang, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Sampling Online Social Networks by Random Walk with Indirect Jumps". Data Mining and Knowledge Discovery (DMKD), 33:24-57, 2019. Graph data, Sampling and estimation
  • [TKDE'18] Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Zhenhua Li, Jiefeng Cheng, John C.S. Lui, Don Towsley, Jing Tao, and Xiaohong Guan. "MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018. Graph data, Sampling and estimation [poster]
  • [KAIS'18] Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Jing Tao, and Xiaohong Guan. "SNOD: A Fast Sampling Method of Exploring Node Orbit Degrees for Large Graphs". Knowledge and Information Systems (KAIS), 2018. Graph data, Sampling and estimation
  • [KAIS'18] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Fast Crawling Methods of Exploring Content Distributed Over Large Graphs". Knowledge and Information Systems (KAIS), 2018. Graph data, Sampling and estimation
  • [KAIS'18] Pinghui Wang, Junzhou Zhao, Bruno Ribeiro, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Practical Characterization of Large Networks Using Neighborhood Information". Knowledge and Information Systems (KAIS), 2018. Graph data, Sampling and estimation
  • [INS'17] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "I/O-Efficient Calculation of Group Closeness Centrality over Disk-Resident Graphs". Information Sciences (INS), 2017. Graph data, Optimization methods
  • [RecSys'17] Xiaoying Zhang, Junzhou Zhao, and John C.S. Lui. "Modeling the Assimilation-Contrast Effects in Online Product Rating Systems: Debiasing and Recommendations". In Proceedings of the 11th ACM Conference on Recommender Systems (RecSys), 2017. Recommender systems. Awarded Best Paper
  • [COSN'15] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams". In Proceedings of ACM Conference on Online Social Networks (COSN), 2015. Graph data, Anomaly detection, Sampling and estimation [arXiv]
  • [ICDE'15] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "A Tale of Three Graphs: Sampling Design on Hybrid Social-Affiliation Networks". In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), 2015. Graph data, Sampling and estimation
  • [SIMPLEX'14] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Measuring and Maximizing Group Closeness Centrality over Disk-Resident Graphs". In WWW SIMPLEX workshop, 2014. Graph data, Optimization methods. Awarded Best Paper
  • [COMNET'14] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "WTF: Efficient Followee Selection for Cascading Outbreak Detection on Online Social Networks". Computer Networks, Special Issue on Online Social Networks, 2014. Graph data, Anomaly detection
  • [TKDD'14] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Unbiased Characterization of Node Pairs over Large Graphs". ACM Transactions on Knowledge Discovery from Data (TKDD), 2014. Graph data, Sampling and estimation
  • [CrowdRec'13] Junzhou Zhao, Xiaohong Guan, and Jing Tao. "On Analyzing Estimation Errors due to Constrained Connections in Online Review Systems". RecSys CrowdRec workshop, 2013. Recommender systems, Estimation methods [arXiv]
  • [ICDE'13] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Sampling Node Pairs Over Large Graphs". In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE), 2013. Graph data, Sampling and estimation
  • [NetSciCom'11] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Empirical Analysis of the Evolution of Follower Network: A Case Study on Douban". IEEE INFOCOM NetSciCom workshop, 2011. Social networks, Network measurement