GECCO¶
Biosynthetic Gene Cluster prediction with Conditional Random Fields.
Overview¶
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
GECCO is developed in the Zeller group and is part of the suite of computational microbiome analysis tools hosted at EMBL.
Quickstart¶
Setup¶
GECCO is implemented in Python, and supports all versions from Python 3.6. Install GECCO with pip:
$ pip install gecco-tool
Or with Conda, using the bioconda channel:
$ conda install -c bioconda gecco
Predictions¶
GECCO works with DNA sequences, and loads them using Biopython, allowing it to support a large variety of formats, including the common FASTA and GenBank files.
Run a prediction on a FASTA file named sequence.fna and output the predictions to the current directory:
$ gecco -v run --genome sequence.fna
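For scripted pipelines, the same command can be launched from Python. The sketch below only wraps the invocation shown above with the standard `subprocess` module; it does not use GECCO's own library interface. The helper names `build_gecco_command` and `run_gecco` are hypothetical, and the code assumes the `gecco` executable is available on the `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def build_gecco_command(genome):
    """Return the argument list for the command shown above.

    Hypothetical helper: it only reproduces `gecco -v run --genome <file>`
    and adds no other options.
    """
    return ["gecco", "-v", "run", "--genome", str(Path(genome))]


def run_gecco(genome):
    """Run GECCO on `genome`, writing predictions to the current directory.

    Raises FileNotFoundError if `gecco` is not installed, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("gecco") is None:
        raise FileNotFoundError("the `gecco` executable was not found on PATH")
    return subprocess.run(build_gecco_command(genome), check=True)
```

Calling `run_gecco("sequence.fna")` should then behave like the shell command, with `check=True` turning a failed run into a Python exception instead of a silently ignored exit code.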
Output¶
GECCO will create the following files once done (using the same prefix as the input file):
{sequence}.genes.tsv: The genes file, containing the genes found by Pyrodigal and the per-gene BGC probabilities predicted by the CRF.
{sequence}.features.tsv: The features file, containing the domains identified in the predicted genes.
{sequence}.clusters.tsv: If any BGCs were found, a clusters file, containing the coordinates of the predicted clusters, along with their putative biosynthetic type.
{sequence}_cluster_{N}.gbk: If any BGCs were found, one GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains. These files can be opened with a standard GenBank viewer, such as UGENE.
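As an illustration of the naming scheme above, the following sketch locates the output files for a given input and loads a table with the standard `csv` module. It is not part of GECCO: the helper names are hypothetical, the prefix is assumed to be the input file name without its final extension, and the tables are assumed to start with a header row. No column names are assumed.

```python
import csv
from pathlib import Path


def expected_outputs(genome, output_dir="."):
    """Map each table kind to the path GECCO is expected to write.

    Assumption: the prefix is the input file name minus its last extension,
    so `sequence.fna` gives `sequence.genes.tsv`, and so on.
    """
    prefix = Path(genome).stem
    out = Path(output_dir)
    return {
        "genes": out / f"{prefix}.genes.tsv",
        "features": out / f"{prefix}.features.tsv",
        "clusters": out / f"{prefix}.clusters.tsv",
    }


def cluster_files(genome, output_dir="."):
    """List the per-cluster GenBank files, if any were written."""
    prefix = Path(genome).stem
    return sorted(Path(output_dir).glob(f"{prefix}_cluster_*.gbk"))


def read_tsv(path):
    """Load a tab-separated table as a list of dicts keyed by its header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Because the clusters file and the GenBank files are only written when BGCs are found, a caller would typically check `expected_outputs("sequence.fna")["clusters"].exists()` before calling `read_tsv` on it.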
Reference¶
GECCO can be cited using the following preprint:
Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509
Feedback¶
Contact¶
If you have any questions about GECCO, run into an issue, or would like to request a feature, please create an issue in the GitHub repository. You can also contact Martin Larralde directly via email.
Contributing¶
If you want to contribute to GECCO, please have a look at the contribution guide first, and feel free to open a pull request on the GitHub repository.
Documentation¶
Guides¶
Library¶
License¶
GECCO is released under the GNU General Public License v3 or later, and is fully open-source. The LICENSE file distributed with the software contains the complete license text.
About¶
GECCO is developed by the Zeller Team at the European Molecular Biology Laboratory in Heidelberg. The following individuals contributed to the development of GECCO: