SRA-Toolkit and fasterq-dump on the HPC cluster

Creation date: 9/1/2023 4:57 PM    Updated: 9/1/2023 4:57 PM   elzar hpc software
The SRA-Toolkit is a set of tools and libraries for accessing data in the Sequence Read Archive (SRA) format.
fastq-dump and fasterq-dump both extract data in FASTQ or FASTA format from SRA accessions.

fasterq-dump is the successor to fastq-dump. It is faster (surprise) and uses temporary files and multi-threading to speed up extraction.

For best performance, use prefetch to download the accession before running fasterq-dump. The prefetch tool downloads all necessary files and can resume interrupted downloads.
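The prefetch-then-extract workflow described above can be sketched as a small bash function. This is a minimal sketch, not the cluster's official procedure: the function name `fetch_fastq`, the `DRY_RUN` switch, the default thread count, and the accession `SRR000001` are illustrative assumptions. Only the `prefetch` command and the `fasterq-dump` options `--threads` and `--outdir` come from the SRA-Toolkit itself.

```shell
#!/usr/bin/env bash
# Sketch: prefetch an accession, then extract reads with fasterq-dump.
# fetch_fastq and DRY_RUN are hypothetical names used for illustration only.

fetch_fastq() {
    local acc="$1"            # accession to fetch, e.g. SRR000001 (placeholder)
    local threads="${2:-4}"   # number of threads for fasterq-dump (assumed default)
    local outdir="${3:-.}"    # directory that receives the extracted files

    # With DRY_RUN=1 the commands are printed instead of executed.
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

    # Step 1: download the accession; prefetch can resume an interrupted download.
    $run prefetch "$acc" || return 1

    # Step 2: extract the reads from the accession fetched in step 1.
    $run fasterq-dump "$acc" --threads "$threads" --outdir "$outdir"
}
```

A dry run such as `DRY_RUN=1 fetch_fastq SRR000001 8 ./fastq` prints the two commands without contacting the archive, which is a cheap way to check the arguments before starting a large download.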

Use on CSHL HPC 

From either bamdev1 or bamdev2:


Quick test:


Setup before use

1. Create an empty directory (any name will do) to serve as your local repository.
2. Follow the instructions here


Download data


Tips

👉 Some of the data sets are large; make sure you have enough disk space before downloading
👉 When changing the local repository, be sure to re-run `vdb-config`
👉 Use `vdb-validate` to test downloaded data for integrity
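
The space and integrity tips above can be scripted. The following is a rough sketch under stated assumptions: the helper names `has_free_gib` and `validate_accession`, the `DRY_RUN` switch, and the example threshold are hypothetical; only `df`, `awk`, and `vdb-validate` are real commands.

```shell
#!/usr/bin/env bash
# Sketch of a pre-download space check and a post-download integrity check.
# has_free_gib, validate_accession and DRY_RUN are hypothetical names.

# Succeeds if the filesystem holding directory $1 has at least $2 GiB available.
has_free_gib() {
    local dir="$1" need_gib="$2"
    local avail_kib
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kib="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kib" -ge $(( need_gib * 1024 * 1024 )) ]
}

# Runs vdb-validate on a downloaded accession
# (prints the command instead of running it when DRY_RUN=1).
validate_accession() {
    local acc="$1"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    $run vdb-validate "$acc"
}
```

For example, `has_free_gib "$PWD" 50 || echo "less than 50 GiB free"` warns before a large download, and `validate_accession SRR000001` checks the data afterwards. Both the 50 GiB figure and the accession are placeholders.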

Robert Petkus, 9/1/23
