Result: Fast T-overlap query algorithms using graphics processor units and its applications in web data query.
Given a collection of sets and a query set, a T-Overlap query identifies all sets that share at least T elements with the query. The T-Overlap query is the foundation of set similarity queries and joins, and it plays an important role in web data querying and processing, such as behavior analysis of web users and near-duplicate detection of web documents. To answer T-Overlap queries efficiently, we design GPU-based algorithms rather than relying on traditional CPU-based ones. We first design an inverted index on the GPU, then choose ScanCount, a straightforward but efficient T-Overlap algorithm, as the underlying algorithm for our GPU-based T-Overlap algorithms. Depending on whether queries are processed serially or in parallel, we propose three new algorithms built on our GPU-based inverted index. Of the three, GS-Parallel-Group processes a group of queries in parallel and supports a high degree of parallelism. Extensive experiments compare our GPU-based algorithms with state-of-the-art CPU-based algorithms. The results show that GS-Parallel-Group significantly outperforms the CPU-based algorithms. [ABSTRACT FROM AUTHOR]
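To make the ScanCount idea named in the abstract concrete, here is a minimal single-threaded CPU sketch in Python. It is not the authors' GPU implementation and does not reproduce any of their three algorithms. It only illustrates the general technique: build an inverted index from elements to set ids, scan the posting lists of the query's elements while incrementing one counter per set, and report every set whose counter reaches T. The function names `build_inverted_index` and `scan_count` and the sample data are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the list of ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, elements in enumerate(sets):
        # Deduplicate so each set id appears at most once per posting list.
        for element in set(elements):
            index[element].append(set_id)
    return index


def scan_count(index, num_sets, query, t):
    """Return the ids of all sets sharing at least t elements with the query."""
    counts = [0] * num_sets  # one counter per set in the collection
    for element in set(query):
        # Scan the posting list of this query element (empty if unseen).
        for set_id in index.get(element, ()):
            counts[set_id] += 1
    return [set_id for set_id, count in enumerate(counts) if count >= t]


# Hypothetical sample collection of four sets.
sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}, {"a", "d", "e"}]
index = build_inverted_index(sets)
result = scan_count(index, len(sets), {"b", "c", "d"}, 2)
print(result)
```

With T = 2 and the query {b, c, d}, sets 0 and 1 qualify because they share two and three elements with the query, while set 3 shares only one. The counter increments for different posting lists are independent of each other, which is presumably what makes the approach a natural fit for parallel hardware; how the authors actually map this work onto GPU threads is not stated in the abstract.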
Copyright of World Wide Web is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)