Deal with non-identical paired-end FASTQ headers

FASTQ headers sometimes contain a forward/reverse pair identifier. In my case, it sits at the end of the header, as `.1` for fastq1 and `.2` for fastq2:

```
# F1
@A66666:555:HTK7JDSX2:4:1128:30879:19867.1
# F2
@A66666:555:HTK7JDSX2:4:1128:30879:19867.2
```

According to the Wikipedia article on the [FASTQ format](https://en.wikipedia.org/wiki/FASTQ_format), Illumina read names can also end in `/1` or `/2` to indicate the pair member.

For my fastq files, the pipeline broke at the stage:

```bash
#### extract split reads
samtools view -h $sample.unique.bam \
| python3 $dir/extractSplitReads_BwaMem.py -i stdin \
| samtools view -Sb > $sample.unsort.splitters.bam
samtools sort -@ $sort_t -o $sample.splitters.bam $sample.unsort.splitters.bam
samtools index $sample.splitters.bam
samtools index $sample.unique.bam
```

No `.splitters.bam` was produced, and therefore no `acc.csv` either.

It worked when I removed the `.1` and `.2` from the end of the headers.
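
In case it helps others, here is a minimal sketch of one way to strip such suffixes before running the pipeline. It is not part of the tool, and the function names `strip_pair_suffix` and `normalize_fastq` are hypothetical. It assumes plain four-line FASTQ records and that a trailing `.1`, `.2`, `/1` or `/2` on the read name really is a pair marker (this is not true for every naming scheme, so check your headers first).

```python
import re

# Matches a trailing ".1", ".2", "/1" or "/2" at the end of the read name.
_PAIR_SUFFIX = re.compile(r"[./][12]$")


def strip_pair_suffix(header: str) -> str:
    """Remove a trailing pair marker from the read name of one header line.

    Only the first space-delimited token (the read name) is modified;
    any description after the first space is kept unchanged.
    """
    name, sep, rest = header.rstrip("\n").partition(" ")
    return _PAIR_SUFFIX.sub("", name) + sep + rest


def normalize_fastq(lines):
    """Yield FASTQ lines (without newlines) with pair markers removed.

    Assumes strict four-line records, so every fourth line starting
    at index 0 is treated as a header; all other lines pass through.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_pair_suffix(line) if i % 4 == 0 else line
```

To use it, open an uncompressed FASTQ file, pass the file object to `normalize_fastq`, and write each yielded line plus a newline to a new file; do this for both fastq1 and fastq2 so the headers end up identical.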

I suggest adding a recommendation to the documentation that the paired files should have identical headers. It would also be good if the pipeline could deal with such cases :).

Thanks for the cool tool!
Cheers,
