*Result*: 一种高效的分布式 FDR 假阳性控制算法.

Title:
一种高效的分布式 FDR 假阳性控制算法.
Alternate Title:
An Efficient Distributed False Positive Control Algorithm for FDR.
Authors:
刘旭泽1, 王慧颖2, 褚良宇3, 赵宇海1 zhaoyuhai@mail.neu.edu.cn
Source:
Journal of Northeastern University (Natural Science). May 2025, Vol. 46 Issue 5, p37-45. 9p.
Database:
Academic Search Index

*Further Information*

*Multiple hypothesis testing in big data mining produces false positives, and computing the theoretical results needed to control the false discovery rate (FDR) is extremely time-consuming. To improve the efficiency of computing theoretical FDR values, a distributed false-positive control algorithm, DPFDR (distributed permutation testing-based false discovery rate), is proposed. The algorithm first mines representative patterns using the conditional frequent pattern tree (CFP) method and uses them to compress the pattern space. Next, the workload of each task is estimated from its representative patterns, the data are partitioned according to that workload, and tasks are assigned to compute nodes through a load-balancing policy. Finally, the effective FDR false-positive control threshold is obtained by merging and sorting the results computed on each node. A series of experiments on real data sets shows that DPFDR greatly improves the efficiency of computing the FDR false-positive control threshold. [ABSTRACT FROM AUTHOR]*
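
The final step of the abstract (merging and sorting per-node results to obtain an FDR threshold) can be illustrated with a minimal Python sketch. It is not the authors' DPFDR implementation: the function name `fdr_threshold`, its parameters, and the estimator used (average number of permutation p-values at or below a cutoff, divided by the number of observed p-values at or below it) are assumptions chosen as one common form of permutation-based FDR estimation.

```python
from bisect import bisect_right
from heapq import merge


def fdr_threshold(observed, node_results, n_permutations, alpha=0.05):
    """Return the largest observed p-value whose estimated FDR is <= alpha.

    observed       : p-values of the patterns on the real data (hypothetical input)
    node_results   : one list of permutation (null) p-values per compute node
    n_permutations : total number of permutations behind node_results
    Returns None when no cutoff satisfies the bound.
    """
    # Sort each node's output locally, then merge the sorted runs into one list.
    null_sorted = list(merge(*(sorted(r) for r in node_results)))
    obs_sorted = sorted(observed)

    best = None
    for t in obs_sorted:
        # Patterns declared significant at cutoff t (ties included).
        discoveries = bisect_right(obs_sorted, t)
        # Average number of null p-values <= t per permutation.
        expected_false = bisect_right(null_sorted, t) / n_permutations
        if expected_false / discoveries <= alpha:
            best = t
    return best
```

Sorting on each node before merging keeps the combining step linear in the total number of null p-values, which is the part of the computation that a distributed setting would benefit from.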

*为了解决大数据挖掘中多重假设检验导致的假阳性问题, 以及控制伪发现率 (false discovery rate, FDR) 理论结果计算过程极其耗时的问题, 针对理论 FDR 值的计算效率问题, 提出了一种分布式假阳性控制算法 DPFDR (distributed permutation testing-based false discovery rate, DPFDR). 该算法首先基于条件频繁模式树 (conditional frequent pattern tree, CFP) 方法进行代表模式挖掘, 利用代表模式对模式空间进行压缩. 然后, 根据代表模式对相应任务的工作量进行预估, 按照工作量进行数据划分, 并通过负载均衡策略将任务分配到各计算结点上. 最后, 通过合并、排序各结点的计算结果, 获得有效的 FDR 假阳性控制阈值. 真实数据集上的一系列实验结果表明, 提出的 DPFDR 算法能极大提升 FDR 假阳性控制阈值的计算效率. [ABSTRACT FROM AUTHOR]*
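
The workload-based allocation step described in the abstract can be sketched as follows. The abstract does not say which load-balancing policy DPFDR uses, so this example assumes a greedy "heaviest task first, least-loaded node next" rule; the function name `assign_tasks` and the task labels are hypothetical.

```python
import heapq


def assign_tasks(workloads, n_nodes):
    """Assign tasks to nodes so that estimated loads stay roughly even.

    workloads : dict mapping a task id to its estimated cost (assumed input)
    n_nodes   : number of compute nodes
    Returns a dict mapping node index to the list of task ids it receives.
    """
    # Min-heap of (current load, node index): the root is the least-loaded node.
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}

    # Place the heaviest tasks first so small tasks can even out the loads.
    for task, cost in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return assignment
```

This greedy rule is a heuristic and does not guarantee an optimal split, but it needs only the estimated costs, which matches the abstract's idea of predicting each task's workload before distributing it.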