Refer to the latest documentation
The SRA Toolkit and SDK from NCBI are a collection of tools and libraries for accessing data in the Sequence Read Archive (SRA) format.
fastq-dump and fasterq-dump both extract data in FASTQ or FASTA format from SRA accessions.
fasterq-dump is the successor to fastq-dump. It is (you guessed it) faster and uses temporary files and multi-threading to accelerate file extraction.
module load EB5
module load SRA-Toolkit
Quick test:
fastq-dump --stdout -X 2 SRR390728
Read 2 spots for SRR390728
Written 2 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262
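As a quick sanity check on output like the above, the number of reads in a FASTQ stream can be counted. This is a minimal sketch that assumes the default four-line record layout and that the "Read/Written ... spots" status lines are not part of the piped stream. The name count_fastq_records is a hypothetical helper, not part of the SRA Toolkit.

```bash
# Hypothetical helper (not an SRA Toolkit command):
# count reads in a 4-line-per-record FASTQ stream on stdin.
count_fastq_records() {
    # Line 2 of every 4-line record is the sequence line; count those.
    awk 'NR % 4 == 2 { n++ } END { print n + 0 }'
}

# Usage (requires the SRA-Toolkit module to be loaded):
#   fastq-dump --stdout -X 2 SRR390728 | count_fastq_records
```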
Taken from these instructions.
Create an empty directory to use as your local repository. This example uses ~/sra-cache:
mkdir -p ~/sra-cache
Configure the location of the user repository in the interactive configuration tool (CACHE tab):
vdb-config -i
#!/bin/bash
#SBATCH --job-name=sra
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=12:00:00
#SBATCH --output=sra-slurm-%j.out
module load EB5
module load SRA-Toolkit
prefetch SRR390728
fasterq-dump SRR390728 --threads "$SLURM_CPUS_PER_TASK" --temp "$TMPDIR"
Verify your configured cache and repository paths:
vdb-config -o n
Large datasets can exceed your quota. Some accessions (e.g., PacBio/ONT) can be multiple terabytes, so make sure you have enough space. Look for errors such as "storage exhausted while writing file within file system module".
fasterq-dump uses temporary space. According to its help text, temporary files go to the current working directory unless --temp is given; /tmp is shared and not especially large, so neither is a safe default for big accessions. Point --temp to a path with plenty of space:
fasterq-dump SRRXXXX --temp /path/to/big/scratch
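Before starting a large conversion it can help to check how much space the scratch path actually has. The following is a minimal sketch using POSIX df; the helper name has_free_kb and the 100 GB threshold in the usage comment are illustrative assumptions. Note that df reports free space on the file system and may not reflect a per-user quota.

```bash
# Hypothetical helper: succeed if DIR has at least NEEDED_KB kilobytes free.
has_free_kb() {
    dir=$1
    needed_kb=$2
    # df -Pk prints a header plus one line per file system;
    # column 4 of the second line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Usage (104857600 KB is roughly 100 GB, an arbitrary placeholder):
#   has_free_kb /path/to/big/scratch 104857600 \
#       && fasterq-dump SRRXXXX --temp /path/to/big/scratch
```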
For best performance, use prefetch to download the accession before running fasterq-dump (prefetch → local .sra → fasterq-dump). prefetch downloads all necessary files and can resume interrupted downloads.
# Download the SRA object into your configured repository
prefetch SRR390728
# Convert to FASTQ, writing output to the current directory
fasterq-dump SRR390728 --threads 8 --temp /path/to/scratch
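The two steps above can be wrapped in a small function so that conversion only starts after a successful download. This is a sketch, not an official workflow: the function name sra_to_fastq and the SRA_DRY_RUN switch are hypothetical, and only the prefetch and fasterq-dump options already shown above are used.

```bash
# Hypothetical wrapper: prefetch an accession, then convert it to FASTQ.
# Set SRA_DRY_RUN=1 to print the commands instead of running them.
sra_to_fastq() {
    acc=$1
    threads=${2:-8}
    tmp=${3:-${TMPDIR:-/tmp}}
    if [ "${SRA_DRY_RUN:-0}" = "1" ]; then
        echo "prefetch $acc"
        echo "fasterq-dump $acc --threads $threads --temp $tmp"
        return 0
    fi
    # Stop here if the download fails rather than moving on to conversion.
    prefetch "$acc" || return 1
    fasterq-dump "$acc" --threads "$threads" --temp "$tmp"
}

# Usage:
#   SRA_DRY_RUN=1 sra_to_fastq SRR390728 8 /path/to/scratch
#   sra_to_fastq SRR390728 8 /path/to/scratch
```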
Cleaning up: after converting to FASTQ, you may remove the cached .sra files and any leftover .cache files:
rm -f ~/sra-cache/sra/*.sra ~/sra-cache/sra/*.cache
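Since deletion is irreversible, it may be worth listing what the cache holds first. The sketch below only prints matching files and deletes nothing. The helper name list_sra_cache is hypothetical, and the default path assumes the ~/sra-cache layout used in this example.

```bash
# Hypothetical helper: list cached files under DIR without deleting anything.
list_sra_cache() {
    dir=${1:-$HOME/sra-cache/sra}
    # Match downloaded .sra files and leftover .cache files only.
    find "$dir" -type f \( -name '*.sra' -o -name '*.cache' \) | sort
}

# Usage:
#   list_sra_cache ~/sra-cache/sra
```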
When changing the local repository, be sure to re-run vdb-config -i.
Use vdb-validate to test the integrity of downloaded data.
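For several accessions, the check can be looped. This sketch assumes that vdb-validate exits with a non-zero status when validation fails. The function name validate_accessions and the SRA_VALIDATOR override are hypothetical and exist only so the loop can be tried without the toolkit.

```bash
# Hypothetical helper: validate each accession and report failures.
# SRA_VALIDATOR is an illustrative override; it defaults to vdb-validate.
validate_accessions() {
    failed=0
    for acc in "$@"; do
        # Assumption: the validator returns non-zero on a failed check.
        if "${SRA_VALIDATOR:-vdb-validate}" "$acc" >/dev/null 2>&1; then
            echo "OK $acc"
        else
            echo "FAILED $acc"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   validate_accessions SRR390728
```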