Algorithm to smooth contiguous BGC predictions into single regions.
- class gecco.refine.ClusterRefiner(object)
A post-processor to extract contiguous BGCs from CRF predictions.
- __init__(threshold: float = 0.8, criterion: str = 'gecco', n_cds: int = 5, n_biopfams: int = 5, average_threshold: float = 0.6, edge_distance: int = 10) -> None
Create a new ClusterRefiner instance.
threshold (float) – The probability threshold used to consider a protein part of a BGC region.
criterion (str) – The criterion to use when checking for BGC validity. See the gecco.bgc.BGC.is_valid documentation for allowed values and expected behaviours.
n_cds (int) – The minimum number of genes a gene cluster must contain to be considered valid. If criterion is gecco, then this is the minimum number of annotated CDS.
n_biopfams (int) – The minimum number of biosynthetic Pfam domains a gene cluster must contain to be considered valid (only when the criterion is …).
average_threshold (float) – The average probability threshold used to consider a BGC valid (only when the criterion is …).
edge_distance (int) – The minimum distance, in number of annotated genes, between the BGC and the edge of the sequence (a BGC may start at an edge, but must then span more than edge_distance genes). Lowering this number will increase the number of false positives on very short sequences (only when the criterion is …).
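To make the n_cds and average_threshold parameters more concrete, here is a minimal sketch of two validity checks written in plain Python. It is not GECCO's implementation: the function names has_enough_genes and mean_probability_ok are hypothetical, a candidate region is represented as a bare list of per-gene probabilities, the use of >= for both comparisons is an assumption, and the choice of which check applies under which criterion (decided by gecco.bgc.BGC.is_valid) is not modelled.

```python
from statistics import mean
from typing import Sequence


def has_enough_genes(probabilities: Sequence[float], n_cds: int = 5) -> bool:
    """Hypothetical check: the region holds at least `n_cds` genes.

    The default of 5 mirrors the documented default; `>=` is an assumption.
    """
    return len(probabilities) >= n_cds


def mean_probability_ok(
    probabilities: Sequence[float], average_threshold: float = 0.6
) -> bool:
    """Hypothetical check: the mean probability reaches `average_threshold`.

    An empty region is rejected so that `mean` is never called on no data.
    """
    return len(probabilities) > 0 and mean(probabilities) >= average_threshold


# Illustrative values only, not real CRF output.
candidate = [0.9, 0.7, 0.5, 0.8, 0.6]
enough = has_enough_genes(candidate)
confident = mean_probability_ok(candidate)
```

With these illustrative values the candidate has five genes and a mean probability of about 0.7, so both checks pass under the documented defaults.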
- iter_clusters(genes: List[gecco.model.Gene]) -> Iterator[gecco.model.Cluster]
Find all clusters in a table of CRF predictions.
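The idea of extracting contiguous regions from per-gene probabilities can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used by iter_clusters: the function iter_regions is hypothetical, genes are reduced to a plain sequence of probabilities, a gene is counted as part of a region when its probability is strictly greater than the threshold (the strictness is an assumption), and no validity filtering is applied.

```python
from itertools import groupby
from typing import Iterator, Sequence, Tuple


def iter_regions(
    probabilities: Sequence[float], threshold: float = 0.8
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs, end exclusive, for each run of
    consecutive genes whose probability exceeds `threshold`.

    Hypothetical helper; the strict `>` comparison is an assumption.
    """
    index = 0
    # groupby splits the sequence into maximal runs sharing the same
    # above/below-threshold flag, preserving the original order.
    for above, run in groupby(probabilities, key=lambda p: p > threshold):
        length = sum(1 for _ in run)
        if above:
            yield index, index + length
        index += length


# Illustrative values only, not real CRF output.
probs = [0.1, 0.9, 0.95, 0.85, 0.2, 0.99, 0.3]
regions = list(iter_regions(probs))
```

Here regions holds two index ranges, (1, 4) and (5, 6). Writing the helper as a generator mirrors the Iterator return type in the documented signature, so callers can consume regions lazily.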