FASTQ headers sometimes carry a forward/reverse pair identifier. In my files, it sits at the end of the header as .1 for fastq1 and .2 for fastq2:
# F1
@A66666:555:HTK7JDSX2:4:1128:30879:19867.1
# F2
@A66666:555:HTK7JDSX2:4:1128:30879:19867.2
According to this article on FASTQ format, Illumina reads can also use "/1" or "/2" to indicate the pair.
For my fastq files, the pipeline broke at the stage:
#### extract split reads
samtools view -h $sample.unique.bam \
| python3 $dir/extractSplitReads_BwaMem.py -i stdin \
| samtools view -Sb > $sample.unsort.splitters.bam
samtools sort -@ $sort_t -o $sample.splitters.bam $sample.unsort.splitters.bam
samtools index $sample.splitters.bam
samtools index $sample.unique.bam
No .splitters.bam was produced, and therefore no acc.csv either.
It worked when I removed the .1 and .2 from the end of the headers.
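For anyone hitting the same problem, here is a minimal sketch of that workaround in Python. It is not part of the pipeline; the function names `strip_pair_suffix` and `clean_fastq` are my own, and it assumes plain four-line FASTQ records (no wrapped sequences) where a trailing .1, .2, /1 or /2 on the read name really is a pair tag. Do not use it on files where the trailing number is part of the read ID itself.

```python
import re

# Trailing ".1", ".2", "/1" or "/2" on the read name (assumed to be a pair tag).
_PAIR_SUFFIX = re.compile(r"[./][12]$")


def strip_pair_suffix(header):
    """Remove a trailing pair tag from the read name of a FASTQ header line."""
    header = header.rstrip("\n")
    if not header.startswith("@"):
        return header
    # Only the read name (text before the first space) is touched;
    # any description after it is kept unchanged.
    name, sep, rest = header.partition(" ")
    return _PAIR_SUFFIX.sub("", name) + sep + rest


def clean_fastq(lines):
    """Yield FASTQ lines with every header line (1st of each 4) cleaned.

    Headers are located by position rather than by a leading "@",
    because quality strings may also start with "@".
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_pair_suffix(line) if i % 4 == 0 else line
```

To use it, iterate `clean_fastq` over an open FASTQ file handle and write each yielded line, followed by a newline, to a new file, once for each file of the pair.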
My suggestion is to add a recommendation that the paired files should have identical headers. It would also be good if the pipeline could handle such cases :).
Thanks for the cool tool!
Cheers,