Type Prediction

Supervised classifier to predict the biosynthetic type of a cluster.

class gecco.types.TypeBinarizer(sklearn.preprocessing.MultiLabelBinarizer)[source]

A MultiLabelBinarizer working with ProductType instances.

transform()[source]

Transform the given label sets.

Parameters

y (iterable of iterables) – A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns

y_indicator (array or CSR matrix, shape (n_samples, n_classes)) – A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.
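The indicator-matrix definition above can be illustrated with a minimal pure-Python sketch. This is not the GECCO or scikit-learn implementation: the `binarize` helper is hypothetical, and the string labels stand in for ProductType instances.

```python
from typing import Hashable, Iterable, List, Sequence


def binarize(
    y: Iterable[Iterable[Hashable]], classes: Sequence[Hashable]
) -> List[List[int]]:
    """Build rows where row[j] is 1 iff classes[j] is in the sample's labels."""
    matrix = []
    for labels in y:
        label_set = set(labels)  # fast membership checks for this sample
        matrix.append([1 if c in label_set else 0 for c in classes])
    return matrix


# Hypothetical labels standing in for ProductType instances.
classes = ["NRP", "Polyketide", "Terpene"]
samples = [{"Polyketide"}, {"NRP", "Polyketide"}, set()]

y_indicator = binarize(samples, classes)
print(y_indicator)
```

Each row has one column per class, so a sample with no labels maps to a row of zeros.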

inverse_transform()[source]

Transform the given indicator matrix into label sets.

Parameters

yt ({ndarray, sparse matrix} of shape (n_samples, n_classes)) – A matrix containing only 1s and 0s.

Returns

y (list of tuples) – The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
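The reverse mapping can be sketched the same way. This is an illustration of the documented semantics only, assuming a plain list of lists as the indicator matrix; the `debinarize` helper and the string labels are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def debinarize(
    yt: Sequence[Sequence[int]], classes: Sequence[Hashable]
) -> List[Tuple[Hashable, ...]]:
    """Return, for each row, the tuple of classes[j] where row[j] == 1."""
    labels = []
    for row in yt:
        if len(row) != len(classes):
            raise ValueError("each row must have one column per class")
        labels.append(tuple(c for c, flag in zip(classes, row) if flag == 1))
    return labels


# Hypothetical labels standing in for ProductType instances.
classes = ["NRP", "Polyketide", "Terpene"]
yt = [[0, 1, 0], [1, 1, 0], [0, 0, 0]]

y = debinarize(yt, classes)
print(y)
```

A row of zeros yields an empty tuple, which corresponds to a sample with no assigned label.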

class gecco.types.TypeClassifier(object)[source]

A wrapper to predict the type of a Cluster.

classmethod trained(model_path: Optional[str] = None) → gecco.types.TypeClassifier[source]

Create a new TypeClassifier pre-trained with embedded data.

Parameters

model_path (str, optional) – The path to the model directory obtained with the gecco train command. If None is given, use the embedded training data.

Returns

TypeClassifier – A random forest model that can be used to perform BGC type predictions without training first.

__init__(**kwargs: object) → None[source]

Instantiate a new type classifier.

Keyword Arguments
Any additional keyword argument is passed as argument to the internal RandomForestClassifier constructor.
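The keyword forwarding described above follows a common wrapper pattern, sketched below. Both classes are hypothetical: `StubForest` only stands in for the internal estimator, mimicking two of its options, and `ClassifierSketch` is not the actual TypeClassifier code.

```python
from typing import Optional


class StubForest:
    """Hypothetical stand-in for the internal estimator; stores its options."""

    def __init__(
        self, n_estimators: int = 100, random_state: Optional[int] = None
    ) -> None:
        self.n_estimators = n_estimators
        self.random_state = random_state


class ClassifierSketch:
    """Hypothetical wrapper that forwards every keyword argument unchanged."""

    def __init__(self, **kwargs: object) -> None:
        # Unknown keywords raise TypeError in the wrapped constructor.
        self.model = StubForest(**kwargs)


clf = ClassifierSketch(n_estimators=10, random_state=0)
print(clf.model.n_estimators, clf.model.random_state)
```

Because the wrapper does not inspect the keywords itself, validation is left entirely to the wrapped constructor.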

predict_types(clusters: _S) → _S[source]

Predict types for each of the given clusters.