BGC Detection

BGC prediction using a conditional random field.

class gecco.crf.ClusterCRF(object)

A wrapper for sklearn_crfsuite.CRF to work with the GECCO data model.

classmethod trained(model_path: Optional[str] = None) → gecco.crf.ClusterCRF

Create a new pre-trained ClusterCRF instance from a model path.


Parameters

model_path (str, optional) – The path to the model directory obtained with the gecco train command. If None is given, the embedded model is used.

Returns

ClusterCRF – A CRF model that can be used to perform predictions without training first.

Raises

ValueError – If the model data does not match its hash.
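The hash check behind this ValueError can be pictured with a short, self-contained sketch. It is not GECCO's implementation: the function name load_verified, the file layout (a model file with its hex digest stored in a separate file), and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def load_verified(model_file: str, hash_file: str) -> bytes:
    """Return the raw model bytes, or raise ValueError on a hash mismatch.

    Hypothetical helper: the file layout and the SHA-256 digest are
    assumptions, not the format used by GECCO.
    """
    with open(model_file, "rb") as f:
        data = f.read()
    with open(hash_file) as f:
        expected = f.read().strip()
    # Recompute the digest of the bytes actually read from disk.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(
            f"model data does not match its hash: expected {expected}, got {actual}"
        )
    return data
```

Checking the digest before deserializing means a truncated or modified model file is rejected early, rather than failing later in a less obvious way.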

__init__(feature_type: str = 'protein', algorithm: str = 'lbfgs', window_size: int = 5, window_step: int = 1, **kwargs: Dict[str, object]) → None

Create a new ClusterCRF instance.


Any additional keyword argument is passed as-is to the internal CRF constructor.

Raises

  • ValueError – If feature_type has an invalid value.

  • TypeError – If one of the *_columns arguments is not iterable.
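The constructor behaviour described here (validate the wrapper's own arguments, forward everything else untouched) can be sketched without GECCO installed. InnerModel and Wrapper below are hypothetical stand-ins, and the set of accepted feature_type values is an assumption, since the accepted values are not listed here.

```python
from typing import Any, Dict


class InnerModel:
    """Hypothetical stand-in for the wrapped CRF class."""

    def __init__(self, algorithm: str = "lbfgs", **options: Any) -> None:
        self.algorithm = algorithm
        self.options: Dict[str, Any] = dict(options)


class Wrapper:
    """Hypothetical wrapper: checks its own arguments, forwards the rest."""

    # Assumed set of valid values, for illustration only.
    _FEATURE_TYPES = frozenset({"protein", "domain"})

    def __init__(
        self,
        feature_type: str = "protein",
        algorithm: str = "lbfgs",
        **kwargs: Any,
    ) -> None:
        if feature_type not in self._FEATURE_TYPES:
            raise ValueError(f"invalid feature type: {feature_type!r}")
        self.feature_type = feature_type
        # Extra keyword arguments reach the inner model unchanged.
        self.model = InnerModel(algorithm=algorithm, **kwargs)
```

With this sketch, Wrapper(c1=0.15, c2=0.15) stores both options on the inner model; c1 and c2 are used as examples because sklearn_crfsuite.CRF accepts them as regularization coefficients.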

predict_probabilities(genes: Iterable[gecco.model.Gene], *, pad: bool = True) → List[gecco.model.Gene]

Predict how likely each given gene is to be part of a gene cluster.
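One way to picture how window_size, window_step and pad could interact is a plain sliding window over a sequence. The sketch below is an assumption about their meaning, not GECCO's code: it supposes that pad controls whether a sequence shorter than one window is padded (here with None) instead of being skipped. The function name sliding_windows is hypothetical.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sliding_windows(
    items: Iterable[T],
    window_size: int = 5,
    window_step: int = 1,
    pad: bool = True,
) -> List[List[Optional[T]]]:
    """Split a sequence into overlapping windows (illustrative sketch)."""
    seq: List[Optional[T]] = list(items)
    if len(seq) < window_size:
        if not pad:
            # Too short for a single window and padding is disabled.
            return []
        # Fill the missing positions with None placeholders.
        seq = seq + [None] * (window_size - len(seq))
    last_start = len(seq) - window_size
    return [
        seq[start:start + window_size]
        for start in range(0, last_start + 1, window_step)
    ]
```

For seven items and the default parameters this yields three windows of five items each; a three-item sequence yields one padded window when pad is True and no window at all when it is False.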