Type Prediction

Supervised classifier to predict the type of a cluster.

class gecco.types.TypeBinarizer(sklearn.preprocessing.MultiLabelBinarizer)

A MultiLabelBinarizer working with ClusterType instances.

transform(y: List[ClusterType]) → Iterable[Iterable[int]]

Transform the given label sets.

Parameters:

y (iterable of iterables) – A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns:

y_indicator (array or CSR matrix, shape (n_samples, n_classes)) – A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.
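As a minimal sketch of what transform computes, the snippet below builds the indicator matrix in plain Python. It uses frozensets of type names as stand-ins for ClusterType instances and a hypothetical binarize helper; neither is part of the GECCO API. When no classes are given, MultiLabelBinarizer orders classes_ by sorting the labels seen during fit, which the sketch mimics with sorted().

```python
from typing import FrozenSet, List, Sequence


def binarize(label_sets: Sequence[FrozenSet[str]], classes: Sequence[str]) -> List[List[int]]:
    """Return a matrix whose cell [i][j] is 1 iff classes[j] is in label_sets[i]."""
    return [[1 if cls in labels else 0 for cls in classes] for labels in label_sets]


# Frozensets of type names stand in for ClusterType instances (hypothetical data).
types = [frozenset({"NRP"}), frozenset({"NRP", "Polyketide"}), frozenset()]

# Sorted union of all labels, like MultiLabelBinarizer.fit does without `classes`.
classes = sorted({name for labels in types for name in labels})

print(classes)                   # ['NRP', 'Polyketide']
print(binarize(types, classes))  # [[1, 0], [1, 1], [0, 0]]
```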

inverse_transform(yt: NDArray[numpy.bool_]) → Iterable[ClusterType]

Transform the given indicator matrix into label sets.

Parameters:

yt ({ndarray, sparse matrix} of shape (n_samples, n_classes)) – A matrix containing only 1s and 0s.

Returns:

y (list of tuples) – The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
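The inverse operation can be sketched the same way, again with frozensets of names as hypothetical stand-ins for ClusterType and a hypothetical inverse_binarize helper. Each row of the indicator matrix maps back to the set of classes whose column is set. Boolean rows, as in the NDArray[numpy.bool_] signature, work as well because True and False are truthy and falsy.

```python
from typing import FrozenSet, List, Sequence


def inverse_binarize(indicator: Sequence[Sequence[int]], classes: Sequence[str]) -> List[FrozenSet[str]]:
    """Map each indicator row back to the set of classes whose column is set."""
    return [frozenset(cls for cls, bit in zip(classes, row) if bit) for row in indicator]


classes = ["NRP", "Polyketide"]

# Integer and boolean rows are both accepted, since 1 and True are truthy.
labels = inverse_binarize([[1, 0], [True, True], [0, 0]], classes)
print(labels[0])  # frozenset({'NRP'})
```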

class gecco.types.TypeClassifier(object)

A wrapper to predict the type of a Cluster.

classmethod trained(model_path: Optional[str] = None) → TypeClassifier

Create a new TypeClassifier pre-trained with the embedded data, or with the model stored at model_path.

Parameters:

model_path (str, optional) – The path to a model directory created with the gecco train command. If None is given, the embedded training data is used.

Returns:

TypeClassifier – A random forest model that can be used to perform cluster type predictions without training first.

__init__(classes: Iterable[str] = (), **kwargs: object) → None

Instantiate a new type classifier.

Keyword Arguments:

Any additional keyword argument is passed as an argument to the constructor of the internal RandomForestClassifier.
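The keyword forwarding described above can be illustrated with a small sketch. ForestStub and ClassifierSketch are hypothetical names, with ForestStub standing in for RandomForestClassifier so that the example runs without scikit-learn. In this sketch, because extra keywords go straight to the model constructor, a misspelled option fails with a TypeError as soon as the wrapper is created.

```python
from typing import Any, Iterable, Optional


class ForestStub:
    """Hypothetical stand-in for RandomForestClassifier with two of its options."""

    def __init__(self, n_estimators: int = 100, random_state: Optional[int] = None) -> None:
        self.n_estimators = n_estimators
        self.random_state = random_state


class ClassifierSketch:
    """Hypothetical wrapper mirroring the documented __init__ signature."""

    def __init__(self, classes: Iterable[str] = (), **kwargs: Any) -> None:
        self.classes = list(classes)
        # Every extra keyword goes straight to the model constructor, so an
        # unknown option raises TypeError here, when the wrapper is created.
        self.model = ForestStub(**kwargs)


clf = ClassifierSketch(["NRP", "Polyketide"], n_estimators=200, random_state=0)
print(clf.model.n_estimators)  # 200

try:
    ClassifierSketch(n_estimator=10)  # misspelled option
except TypeError as err:
    print("rejected:", err)
```

With the real class, this means RandomForestClassifier options such as n_estimators or random_state can be given directly when constructing a TypeClassifier.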

predict_types(clusters: _S) → _S

Predict types for each of the given clusters.
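How predicted probabilities become cluster types is not described here. One plausible step, shown as a sketch under the assumption that the model yields one probability per type and that a fixed cutoff of 0.5 decides membership, is to threshold each row into an indicator row, which inverse_transform would then map to a ClusterType. The sketch folds both steps into one by returning frozensets of names; the probabilities_to_types helper, the cutoff, and the probabilities are hypothetical and not taken from GECCO.

```python
from typing import FrozenSet, List, Sequence


def probabilities_to_types(
    probabilities: Sequence[Sequence[float]],
    classes: Sequence[str],
    threshold: float = 0.5,  # assumed cutoff, not taken from GECCO
) -> List[FrozenSet[str]]:
    """Keep, for each cluster, every class whose probability reaches the threshold."""
    return [
        frozenset(cls for cls, p in zip(classes, row) if p >= threshold)
        for row in probabilities
    ]


classes = ["NRP", "Polyketide", "Terpene"]
probas = [
    [0.9, 0.2, 0.1],  # hypothetical cluster 1: clearly NRP
    [0.6, 0.7, 0.0],  # hypothetical cluster 2: hybrid NRP/polyketide
    [0.3, 0.4, 0.2],  # hypothetical cluster 3: no type passes the cutoff
]
predicted = probabilities_to_types(probas, classes)
print(predicted[0])  # frozenset({'NRP'})
```

In this sketch a cluster with no type above the cutoff gets an empty set; a real pipeline may handle that case differently.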