deepomicslab/PALACE

PALACE

PALACE is a computational framework that combines deep learning models with conjugate graph theory to assemble high-quality, high-confidence phage genomes from metagenomic sequencing data. PALACE currently supports standard paired-end reads. The assembled phage genomes analyzed in the manuscript are available on Zenodo.

Installation

Approach 1: install with mamba/conda (recommended)

conda create -n palace_env
conda activate palace_env
conda install -c delta2cityu -c conda-forge -c bioconda palace
or
# mamba is recommended
mamba create -n palace_env
mamba activate palace_env
mamba install -c delta2cityu -c conda-forge -c bioconda palace

Approach 2: install from source

Prerequisites

Python packages

  • pysam==0.17.0
  • numpy==1.20.2
  • scikit-learn==1.1.1 (imported as sklearn)
  • biopython==1.78
  • matplotlib==3.4.2

Torch packages (CPU or GPU)

See https://pytorch.org/get-started/previous-versions/ for installation instructions.

  • torch==1.7.1
  • torch-cluster==1.5.9
  • torch-geometric==1.7.0
  • torch-scatter==2.0.6
  • torch-sparse==0.6.9
  • torch-spline-conv==1.2.1
  • torch-summary==1.4.5
  • torchvision==0.8.0a0
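Because these pins are strict, it can help to confirm that the environment actually matches them before running the pipeline. The sketch below is not part of PALACE; it uses the standard-library importlib.metadata (Python 3.8+) and assumes the pip distribution names shown in the dictionary (for example scikit-learn rather than sklearn).

```python
from importlib import metadata

# Pins copied from the prerequisite lists above; pip distribution names are assumptions.
PINNED = {
    "pysam": "0.17.0",
    "numpy": "1.20.2",
    "scikit-learn": "1.1.1",
    "biopython": "1.78",
    "matplotlib": "3.4.2",
    "torch": "1.7.1",
    "torch-cluster": "1.5.9",
    "torch-geometric": "1.7.0",
    "torch-scatter": "2.0.6",
    "torch-sparse": "0.6.9",
    "torch-spline-conv": "1.2.1",
    "torch-summary": "1.4.5",
    "torchvision": "0.8.0a0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    # Older Pythons do not normalise '-' vs '_' in names, so try both forms.
    for candidate in (name, name.replace("-", "_")):
        try:
            return metadata.version(candidate)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_versions(pins):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        installed = installed_version(name)
        # Ignore local build tags such as "+cu110" or "+cpu" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in sorted(check_versions(PINNED).items()):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty report means every pinned package is present at the listed version; anything printed is a candidate cause of import or runtime errors.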

Other packages

  • bwa: software package for read mapping.
  • samtools: reading, writing, editing, indexing, and viewing SAM/BAM/CRAM files.
  • fastp: fast all-in-one preprocessing of FASTQ files.
  • spades: initial (pre-)assembly.
  • ncbi-blast: sequence alignment tool.
  • htslib: C library for reading and writing high-throughput sequencing data formats.
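A quick way to confirm that these command-line tools are reachable is to look them up on PATH. This is a minimal sketch, not part of PALACE; the executable names are assumptions (SPAdes installs spades.py, ncbi-blast provides blastn, and htslib provides bgzip), so adjust the list to whatever your installation exposes.

```python
import shutil

# Assumed executable names for the packages listed above.
REQUIRED_TOOLS = ("bwa", "samtools", "fastp", "spades.py", "blastn", "bgzip")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which cannot find on PATH, in order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```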

Install the prerequisites first, then clone the repository and enter the directory:

git clone https://github.com/deepomicslab/PALACE
# create a new mamba (conda) env
mamba create -n palace ## or conda create -n palace
mamba activate palace ## or conda activate palace
cd ./PALACE/
cd bin
chmod u+x ./*
cd ../share/palace/scripts/
python setup.py build_ext --inplace

Using PALACE

  1. Edit the config.txt file (a demo file is provided). Fields:
  • fastq1: read 1 of the paired-end FASTQ files.
  • fastq2: read 2 of the paired-end FASTQ files.
  • phagedb: phage reference database, which can be downloaded from Google Drive.
  • protein_db: directory containing the phage protein database, which can be downloaded from Google Drive.
  • gcn_model: deep learning model used to predict phage contigs, which can be downloaded from Google Drive.
  • threads: number of threads to use.
  • out_dir: output directory.
  • prefix: prefix for intermediate files, e.g. the sample name.
  • ENV_PREFIX: path to the conda environment; can be left empty if the environment is already activated.
  2. Run PALACE:
  • palace --config config.txt
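Since a missing database or a mistyped FASTQ path typically surfaces only partway through a run, a small pre-flight check can save time. The sketch below is hypothetical: it ASSUMES config.txt holds one key=value pair per line (check the demo file for the real syntax), checks the fields listed above, and only then calls palace --config config.txt.

```python
import subprocess
from pathlib import Path

# Field names from the list above; ENV_PREFIX is optional and not checked.
REQUIRED_KEYS = ["fastq1", "fastq2", "phagedb", "protein_db", "gcn_model",
                 "threads", "out_dir", "prefix"]
PATH_KEYS = ["fastq1", "fastq2", "phagedb", "protein_db", "gcn_model"]


def parse_config(text):
    """Parse ASSUMED `key=value` lines; blank lines and '#' comments are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def validate(config):
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not config.get(k)]
    for key in PATH_KEYS:
        if config.get(key) and not Path(config[key]).exists():
            problems.append(f"{key} path does not exist: {config[key]}")
    threads = config.get("threads", "")
    if threads and not (threads.isdigit() and int(threads) > 0):
        problems.append(f"threads must be a positive integer, got {threads!r}")
    return problems


if __name__ == "__main__":
    cfg_path = Path("config.txt")
    if cfg_path.exists():
        problems = validate(parse_config(cfg_path.read_text()))
    else:
        problems = ["config.txt not found"]
    if problems:
        print("\n".join(problems))
    else:
        subprocess.run(["palace", "--config", str(cfg_path)], check=True)
```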

Output

  • 01-qc/: fastp output.
  • 02-assembly/: raw assembly produced by SPAdes with --meta.
  • 03-search/: three main intermediate files: hit_seqs.out, the contigs that carry phage proteins; node_scores.out, whose second column is the score predicted by the deep learning network; and {prefix}_ref_names.txt, the phage references identified by k-mer alignment.
  • 04-match/: the conjugate graph structure ({prefix}_filtered_graph.txt) and the graph decomposition results ({prefix}_all_result.txt).
  • 05-furth/: local matching results based on the phage references.
  • final_result/: the final results, i.e. the final contig paths for phages ({prefix}_final.txt) and the phage sequences in FASTA format ({prefix}_final.fasta).
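For a first look at the results, the sketch below reads node_scores.out and the final FASTA using plain Python (Biopython's SeqIO would also work). It is an illustration, not part of PALACE: it ASSUMES node_scores.out is whitespace-separated with the node ID in column 1 and the score in column 2, that the output directories sit directly under out_dir, and the 0.5 threshold, out_dir and prefix values are hypothetical.

```python
from pathlib import Path


def read_node_scores(path, threshold=0.5):
    """Return {node: score} for rows whose score (column 2) is >= threshold.

    ASSUMPTION: whitespace-separated columns; the threshold is illustrative.
    """
    scores = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            score = float(fields[1])
        except ValueError:  # skip a possible header line
            continue
        if score >= threshold:
            scores[fields[0]] = score
    return scores


def fasta_lengths(path):
    """Return {record_id: sequence_length} for a FASTA file."""
    lengths, current = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    # Hypothetical values: use the out_dir and prefix from your config.txt.
    out_dir, prefix = Path("out"), "sample1"
    scores_file = out_dir / "03-search" / "node_scores.out"
    final_fasta = out_dir / "final_result" / f"{prefix}_final.fasta"
    if scores_file.exists():
        print(f"{len(read_node_scores(scores_file))} nodes with score >= 0.5")
    if final_fasta.exists():
        for record, length in fasta_lengths(final_fasta).items():
            print(f"{record}\t{length}")
```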

Author

PALACE is developed by the DeepOmics lab under the supervision of Dr. Li Shuaicheng, City University of Hong Kong, Hong Kong, China. If you have any questions, please feel free to contact us at gzpan2-c@my.cityu.edu.hk or ruohawang2-c@my.cityu.edu.hk.

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.
