Treffer: HPC-T-Assembly: a pipeline for de novo transcriptome assembly of large multi-specie datasets.

Title:
HPC-T-Assembly: a pipeline for de novo transcriptome assembly of large multi-specie datasets.
Source:
BMC Bioinformatics; 4/28/2025, Vol. 26 Issue 1, p1-14, 14p
Database:
Complementary Index

Weitere Informationen

Background: Recent years have seen a substantial increase in RNA-seq data production, with this technique becoming the primary approach for gene expression studies across a wide range of non-model organisms. The majority of these organisms lack a well-annotated reference genome to serve as a basis for studying differentially expressed genes (DEGs). As an alternative cost-effective protocol to using a reference genome, the assembly of RNA-seq raw reads is performed to produce what is referred to as a 'de novo transcriptome,' serving as a reference for subsequent DEGs' analysis. This assembly step for conventional DEGs analysis pipelines for non-model organisms is a computationally expensive task. Furthermore, the complexity of the de novo transcriptome assembly workflows poses a challenge for researchers in implementing best-practice techniques and the most recent software versions, particularly when applied to various organisms of interest. Results: To address computational challenges in transcriptomic analyses of non-model organisms, we present HPC-T-Assembly, a tool for de novo transcriptome assembly from RNA-seq data on high-performance computing (HPC) infrastructures. It is designed for straightforward setup via a Web-oriented interface, allowing analysis configuration for several species. Once configuration data is provided, the entire parallel computing software for assembly is automatically generated and can be launched on a supercomputer with a simple command line. Intermediate and final outputs of the assembly pipeline include additional post-processing steps, such as assembly quality control, ORF prediction, and transcript count matrix construction. Conclusion: HPC-T-Assembly allows users, through a user-friendly Web-oriented interface, to configure a run for simultaneous assemblies of RNA-seq data from multiple species. The parallel pipeline, launched on HPC infrastructures, significantly reduces computational load and execution times, enabling large-scale transcriptomic and meta-transcriptomics analysis projects. [ABSTRACT FROM AUTHOR]

Copyright of BMC Bioinformatics is the property of BioMed Central and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)