Drafts
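Junzhou Zhao, Jianglong Li, Tao Duan, Pinghui Wang, and Jing Tao. "基于语音频谱与数据包长对齐的 VoIP 加密流量识别方法" (An Encrypted VoIP Traffic Identification Method Based on Aligning Speech Spectra with Packet Lengths). 2025-06. [draft]
[Introduction]
With the rapid spread of smartphones and other mobile devices, Voice over Internet Protocol (VoIP) applications such as WeChat voice calls have become increasingly popular. VoIP applications carry privacy-sensitive speech over the open Internet, so protecting users' personal data is essential. This paper collects and analyzes the voice traffic generated by four popular VoIP applications: WeChat, TIM, Tencent Meeting, and DingTalk. We find that although VoIP applications commonly rely on proprietary speech codecs and encrypted communication for security, the transmission patterns of encrypted VoIP traffic can still leak sensitive information such as user attributes, user identity, and even call content, posing a privacy risk. By measuring how the encrypted traffic patterns of the four applications relate to user attributes and call content, we find a clear correlation between speech frequency and packet length. Based on this finding, we design VPrint, an encrypted VoIP traffic identification method that aligns the speech spectrum with packet lengths. VPrint identifies encrypted VoIP traffic more accurately than existing encrypted traffic identification methods. Taking WeChat as an example, VPrint achieves F1 scores of 0.77, 0.99, 0.88, and 0.92 on user gender identification, user identity identification, call language identification, and phrase identification, respectively. Our results show that popular VoIP applications such as WeChat have security weaknesses, and we recommend that vendors adopt countermeasures such as packet padding to improve security and prevent leakage of user privacy.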
Publications
[Top Conferences in Computer Science]
[ICSE'25] Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao*, Jiongzhou Liu, Shujie Han, Yi Liu, and Fan Xu. "BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems". In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), 2025. [arXiv][slides]
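[Introduction]
Failures in cloud infrastructure can severely affect the stability and availability of cloud services, and a batch servers outage makes all upstream services completely unavailable. Diagnosing batch servers outages aims to identify the root cause accurately and promptly to support troubleshooting. The task is challenging for two reasons. First, the single-modal, coarse-grained monitoring data collected in cloud infrastructure is insufficient to describe a failure comprehensively. Second, because of complex dependencies among devices, an outage is often the cumulative result of multiple failures whose correlations are hard to determine. To address these issues, this paper proposes BSODiag, an unsupervised and lightweight diagnosis framework for batch servers outages. BSODiag takes a global analytical perspective: it thoroughly explores failure information from multi-source monitoring data, models the spatio-temporal correlations among failures, and delivers accurate and interpretable diagnosis results. Experiments on Alibaba's cloud infrastructure show that BSODiag achieves 87.5% on PR@3 and 46.3% on PCR, outperforming baseline methods by 10.2% and 3.7%, respectively.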
[ACL'24] Shuo Zhang, Liangming Pan, Junzhou Zhao*, and William Yang Wang. "The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models". Findings of ACL, 2024. [arXiv][slides]
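[Introduction]
Large language models often need to ground their answers in external knowledge to be truthful and reliable. However, even when the external knowledge base contains the correct evidence, a model may ignore it and fall back on incorrect knowledge or its own biases, fabricating content and producing hallucinations. Because users are mostly unaware of what the knowledge base contains, hallucination arises when a user's question is not directly related to the retrieved evidence. This work formulates the knowledge alignment problem and introduces MixAlign, a framework that interacts with both the user and the knowledge base to obtain and integrate clarifications on how the user's question relates to the stored information. MixAlign uses a language model to achieve automatic knowledge alignment and, when necessary, further enhances this alignment through user clarification. Experimental results show that knowledge alignment plays a key role in improving model performance and reducing hallucination, with improvements of 22.2% and 27.1%, respectively.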
[ICDE'24] Tao Duan, Junzhou Zhao*, Shuo Zhang, Jing Tao, and Pinghui Wang. "Representation Learning of Tangled Key-Value Sequence Data for Early Classification". In Proceedings of the 41st IEEE International Conference on Data Engineering (ICDE), 2024. [arXiv][slides][poster]
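[Introduction]
Key-value sequence data arises in many real-world applications, from user shopping records in e-commerce to packet sequences in network traffic. Classifying these sequences matters in many scenarios, such as user profiling and malicious traffic identification. In many time-sensitive scenarios, a key-value sequence must be classified not only accurately but also as early as possible so that a rapid response can follow. These two goals, however, are inherently in conflict. This work formulates a new problem, early classification of tangled key-value sequences, where a tangled key-value sequence is a mixture of several concurrent key-value sequences with different keys. The goal is to classify each individual key-value sequence sharing the same key both accurately and early. To solve this problem, the paper proposes a key-value sequence early co-classification framework that exploits the intra- and inter-sequence correlations among items in a tangled key-value sequence, through key correlation and value correlation, to learn better sequence representations. A time-aware halting policy then decides when to stop observing a key-value sequence and classifies it based on the current sequence representation. Experiments on real-world and synthetic datasets show that the method significantly outperforms state-of-the-art baselines. Under the same prediction earliness condition, it improves prediction accuracy by 4.7% to 17.5% and improves the harmonic mean of accuracy and earliness by 3.7% to 14.0%.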
[TON'24] Junzhou Zhao, Pinghui Wang, Wei Zhang, Zhaosong Zhang, Maoli Liu, Jing Tao, and John C.S. Lui. "Tracking Influencers in Decaying Social Activity Streams with Theoretical Guarantees". IEEE/ACM Transactions on Networking (TON), 32(2):1461-1476, 2024.
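[Introduction]
Influence maximization in social networks is the optimization problem underlying many practical applications, such as viral marketing, political campaigning, and network monitoring. The problem has been studied extensively, but most work assumes that influence is static, whereas in practice a user's influence changes over time and the K most influential nodes in the current network must be found in real time. This calls for solving the problem of tracking node influence in social networks in real time. To keep the optimal solution up to date and to forget outdated data smoothly, this paper proposes a probabilistic-decaying data stream (PDSAS) model in which the probability that each data point in the stream exists decays over time. Based on the PDSAS model, the paper proposes an online algorithm for streaming submodular optimization that produces approximate solutions online with a guaranteed solution-quality lower bound of (1/2−ϵ). To further improve efficiency, the paper refines this method into an efficient online algorithm with a solution-quality lower bound of (1/4−ϵ). Experiments show that the proposed methods find high-quality solutions at a much lower computational cost than the baselines.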
- [AAAI'23] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Tianxiang Wang, Zi Liang, Jing Tao, Yi Huang, and Junlan Feng. "Multi-Action Dialog Policy Learning from Logged User Feedback". In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), 2023. [arXiv]
- [SIGMOD'23] Pinghui Wang, Chengjin Yang, Dongdong Xie, Junzhou Zhao, Hui Li, Jing Tao, and Xiaohong Guan. "An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation". In Proceedings of the ACM SIGMOD/PODS International Conference on Data Management (SIGMOD), 2023. [slides]
- [IJCAI'22] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Yu Li, Yi Huang, and Junlan Feng. "Think Before You Speak: Improving Multi-action Dialog Policy by Planning Single-Action Dialogs". In Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022. [arXiv][slides]
- [KDD'21] Junzhou Zhao, Pinghui Wang, Chao Deng, and Jing Tao. "Temporal Biased Streaming Submodular Optimization". In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), 2021. [slides]
- [AAAI'21] Shuo Zhang, Junzhou Zhao*, Pinghui Wang, Nuo Xu, Yang Yang, Yiting Liu, Yi Huang, and Junlan Feng. "Learning to Check Contract Inconsistencies". In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 2021. [arXiv][poster]
- [KAIS'21] Junzhou Zhao, Pinghui Wang, Zhouguo Chen, Jianwei Ding, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Tracking Triadic Cardinality Distributions for Burst Detection in High-Speed Graph Streams". Knowledge and Information Systems (KAIS), 63:939-969, 2021. [arXiv]
- [ICDE'20] Junzhou Zhao, Pinghui Wang, Jing Tao, Shuo Zhang, and John C.S. Lui. "Continuously Tracking Core Items in Data Streams with Probabilistic Decays". In Proceedings of the 36th IEEE International Conference on Data Engineering (IEEE ICDE), 2020. [slides][poster]
- [INS'20] Lin Lan, Pinghui Wang, Junzhou Zhao, Jing Tao, John C.S. Lui, and Xiaohong Guan. "Improving Network Embedding with Partially Available Vertex and Edge Content". Information Sciences, 2020.
- [TOIS'20] Xiaoying Zhang, Hong Xie, Junzhou Zhao, and John C.S. Lui. "Understanding Assimilation-Contrast Effects in Online Rating Systems: Modeling, Debiasing and Applications". ACM Transactions on Information Systems (TOIS), 2020.
- [AAAI'19] Junzhou Zhao, S. Shang, Pinghui Wang, John C.S. Lui, and Xiangliang Zhang. "Submodular Optimization over Streams with Inhomogeneous Decays". In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), 2019. [arXiv]
- [ICDE'19] Junzhou Zhao, S. Shang, Pinghui Wang, John C.S. Lui, and Xiangliang Zhang. "Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks". In Proceedings of the 35th IEEE International Conference on Data Engineering (IEEE ICDE), 2019. [arXiv][poster]
- [INS'19] Junzhou Zhao, Pinghui Wang, and John C.S. Lui. "Optimizing Node Discovery on Networks: Problem Definitions, Fast Algorithms, and Observations". Information Sciences (INS), 477:161-185, 2019. [arXiv]
- [DMKD'19] Junzhou Zhao, Pinghui Wang, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Sampling Online Social Networks by Random Walk with Indirect Jumps". Data Mining and Knowledge Discovery (DMKD), 33:24-57, 2019.
- [TKDE'18] Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Zhenhua Li, Jiefeng Cheng, John C.S. Lui, Don Towsley, Jing Tao, and Xiaohong Guan. "MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018. [poster]
- [KAIS'18] Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Jing Tao, and Xiaohong Guan. "SNOD: A Fast Sampling Method of Exploring Node Orbit Degrees for Large Graphs". Knowledge and Information Systems (KAIS), 2018.
- [KAIS'18] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Fast Crawling Methods of Exploring Content Distributed Over Large Graphs". Knowledge and Information Systems (KAIS), 2018.
- [KAIS'18] Pinghui Wang, Junzhou Zhao, Bruno Ribeiro, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Practical Characterization of Large Networks Using Neighborhood Information". Knowledge and Information Systems (KAIS), 2018.
- [INS'17] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "I/O-Efficient Calculation of Group Closeness Centrality over Disk-Resident Graphs". Information Sciences (INS), 2017.
- [RecSys'17] Xiaoying Zhang, Junzhou Zhao, and John C.S. Lui. "Modeling the Assimilation-Contrast Effects in Online Product Rating Systems: Debiasing and Recommendations". In Proceedings of the 11th ACM Conference on Recommender Systems (RecSys), 2017.
Awarded Best Paper
- [COSN'15] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams". In Proceedings of ACM Conference on Online Social Networks (COSN), 2015. [arXiv]
- [ICDE'15] Junzhou Zhao, John C.S. Lui, Don Towsley, Pinghui Wang, and Xiaohong Guan. "A Tale of Three Graphs: Sampling Design on Hybrid Social-Affiliation Networks". In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), 2015.
- [SIMPLEX'14] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Measuring and Maximizing Group Closeness Centrality over Disk-Resident Graphs". In WWW SIMPLEX workshop, 2014.
Awarded Best Paper
- [COMNET'14] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "WTF: Efficient Followee Selection for Cascading Outbreak Detection on Online Social Networks". Computer Networks, Special Issue on Online Social Networks, 2014.
- [TKDD'14] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Unbiased Characterization of Node Pairs over Large Graphs". ACM Transactions on Knowledge Discovery from Data (TKDD), 2014.
- [CrowdRec'13] Junzhou Zhao, Xiaohong Guan, and Jing Tao. "On Analyzing Estimation Errors due to Constrained Connections in Online Review Systems". RecSys CrowdRec workshop, 2013. [arXiv]
- [ICDE'13] Pinghui Wang, Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Sampling Node Pairs Over Large Graphs". In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE), 2013.
- [NetSciCom'11] Junzhou Zhao, John C.S. Lui, Don Towsley, and Xiaohong Guan. "Empirical Analysis of the Evolution of Follower Network: A Case Study on Douban". IEEE INFOCOM NetSciCom workshop, 2011.