Cold Spring Harbor Laboratory | Information Technology

Using nudup on Elzar

Creation date: 5/11/2023 2:53 PM | Updated: 6/2/2023 11:16 AM | Tags: elzar, hpc, nudup, software

nudup.py is a Python script that marks or removes PCR-introduced duplicate molecules, based on the molecular tagging technology used in Tecan products. It can be used for both single-end and paired-end reads. It takes a SAM/BAM file as input, plus a FASTQ file containing the molecular tags (the -f option) when the tags are not already part of the read names.

nudup.py can be obtained from GitHub:

> git clone https://github.com/tecangenomics/nudup.git

The README indicates that nudup.py has only been tested against legacy software versions.

To get the expected behavior, I recommend configuring your environment to match the stated prerequisites:

> module load EBModules
> module load SAMtools/1.14-GCC-10.3.0
> module load Python/2.7.16-GCCcore-8.3.0

> which python
/grid/it/data/elzar/easybuild/software/Python/2.7.16-GCCcore-8.3.0/bin/python
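Because nudup.py depends on a legacy interpreter, it can help to confirm that the python on your PATH really is a 2.7 release before going further. The following is only a minimal sketch: the function name is_python27 is hypothetical and is not part of nudup or the module system.

```shell
# Hypothetical helper (not part of nudup): succeeds only when the
# version string passed as $1 looks like "Python 2.7.x".
is_python27() {
    case "$1" in
        "Python 2.7."*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Python 2 prints its version on stderr, so merge stderr into stdout.
if command -v python >/dev/null 2>&1; then
    if is_python27 "$(python --version 2>&1)"; then
        echo "OK: python on PATH is 2.7"
    else
        echo "WARNING: python on PATH is not 2.7; reload the modules" >&2
    fi
fi
```

If the warning appears, another module or a conda environment is probably shadowing the Python module loaded above.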

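Try running nudup.py without any options to confirm that it starts. The usage message and the "too few arguments" error shown here are expected; an import error or traceback would point to an environment problem: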

> python nudup.py
usage: nudup.py [-2] [-f INDEX.fq|READ.fq] [-o OUT_PREFIX] [-s START]
                [-l LENGTH] [-T TEMP_DIR] [--old-samtools] [--rmdup-only] [-v]
                [-h]
                IN.sam|IN.bam

nudup.py: error: too few arguments

nudup.py works with aligned and sorted BAM or SAM files from next-generation sequencing (NGS).
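One way to check the sorting requirement before a run is to inspect the SAM header. The sketch below assumes that "sorted" here means coordinate-sorted, which is an assumption rather than something stated on this page, and the function name is_coordinate_sorted is hypothetical. It only reports what the header declares; it does not verify the order of the records themselves.

```shell
# Hypothetical helper (not part of nudup or SAMtools): reads SAM header
# text on stdin and succeeds if the @HD line declares SO:coordinate.
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Usage sketch (IN.bam is a placeholder for your alignment file):
#   samtools view -H IN.bam | is_coordinate_sorted && echo "sorted by coordinate"
```

If the header does not declare a coordinate sort, running samtools sort on the file before nudup.py is the usual remedy.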

As a test, I cloned a repository that contains some test BAM files (checked out alongside nudup at ../rnaseq) and ran nudup.py on one of them:

> python nudup.py -o test_bam_dedup ../rnaseq/test/hisat2_k20.bam
2023-05-11 14:52:36,599 [     INFO] - Deduplicating NuGEN single end reads...
2023-05-11 14:52:36,654 [     INFO] - Processing sorted SAM/BAM with molecular tag sequence in read name (assumes sorted)
2023-05-11 14:52:36,929 [     INFO] -            Aligned count:         24290
2023-05-11 14:52:36,929 [     INFO] -          Unaligned count:             0
2023-05-11 14:52:36,929 [     INFO] - Molecular tag dups count:             0 (0.0000 rate)
2023-05-11 14:52:36,929 [     INFO] - Deduplication success.
2023-05-11 14:52:36,929 [     INFO] - Created output file test_bam_dedup.sorted.markdup.bam with duplicates marked
2023-05-11 14:52:36,929 [     INFO] - Created output file test_bam_dedup.sorted.dedup.bam with duplicates removed
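The duplicate count in the log can be cross-checked against the marked output file. This is a sketch under the assumption that nudup marks duplicates with the standard SAM duplicate flag (0x400, decimal 1024); the log does not say so explicitly, and the function name count_marked_dups is hypothetical.

```shell
# Hypothetical helper: count reads in a BAM that carry the SAM duplicate
# flag (0x400 = 1024). -c prints a count instead of the reads, and
# -f keeps only reads that have the given flag bits set.
count_marked_dups() {
    samtools view -c -f 1024 "$1"
}

# Usage sketch with the output file named in the log above:
#   count_marked_dups test_bam_dedup.sorted.markdup.bam
```

For the test run above, the result should agree with the "Molecular tag dups count" line if the flag assumption holds.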

Robert Petkus 5/11/23
