Domain Annotation

Compatibility wrapper for HMMER binaries and output.

gecco.hmmer.embedded_hmms() -> Iterator[HMM]

Iterate over the embedded HMMs that are shipped with GECCO.

class gecco.hmmer.HMM(object)

A Hidden Markov Model library to use with HMMER.

id: str

Alias for field number 0

version: str

Alias for field number 1

url: str

Alias for field number 2

path: str

Alias for field number 3

size: Optional[int]

Alias for field number 4

relabel_with: Optional[str]

Alias for field number 5

md5: Optional[str]

Alias for field number 6

relabel(domain: str) -> str

Rename a domain obtained by this HMM to the right accession.

This method can be used with HMM libraries that have separate HMMs for the same domain, such as Pfam.
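
As a minimal sketch of how such relabeling could work, the stand-in below mirrors the fields listed above as a named tuple and assumes that relabel_with holds a sed-style substitution of the form s/pattern/replacement/. That rule format, the class name HMMSketch, and all example values are assumptions made for illustration; this reference does not confirm them.

```python
import re
from typing import NamedTuple, Optional


class HMMSketch(NamedTuple):
    """Hypothetical stand-in for gecco.hmmer.HMM (same field order)."""

    id: str
    version: str
    url: str
    path: str
    size: Optional[int] = None
    relabel_with: Optional[str] = None
    md5: Optional[str] = None

    def relabel(self, domain: str) -> str:
        # Without a relabeling rule, the domain name is returned unchanged.
        if self.relabel_with is None:
            return domain
        # ASSUMPTION: the rule is written as "s/<pattern>/<replacement>/".
        match = re.fullmatch(r"s/(.*)/(.*)/", self.relabel_with)
        if match is None:
            raise ValueError(f"unsupported rule: {self.relabel_with!r}")
        pattern, replacement = match.groups()
        return re.sub(pattern, replacement, domain)


# Made-up library whose HMM names carry a version suffix after the accession.
pfam = HMMSketch(
    id="Pfam",
    version="35.0",
    url="https://example.org/Pfam-A.hmm.gz",
    path="Pfam-A.hmm",
    relabel_with=r"s/(PF\d+)\.\d+/\1/",
)
relabeled = pfam.relabel("PF00005.30")
```

Because the stand-in is a named tuple, each field is also reachable by position, which is what the "Alias for field number N" entries above describe: pfam[0] and pfam.id return the same value.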

class gecco.hmmer.DomainAnnotator(object)

An abstract class for annotating genes with protein domains.

__init__(hmm: HMM, cpus: Optional[int] = None, whitelist: Optional[Container[str]] = None) -> None

Prepare a new HMMER annotation handler with the given HMM library.

Parameters:
  • hmm (HMM) – The HMM library containing the HMMs to annotate with.

  • cpus (int, optional) – The number of CPUs to allocate for the hmmsearch command. Give None to use the default.

  • whitelist (container of str, optional) – If given, the accessions of the individual HMMs to annotate with. If None, annotate with the entire file.
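
The whitelist behaviour described above can be sketched without HMMER: keep every profile when no whitelist is given, otherwise keep only those whose accession is in the container. The helper name select_accessions and the accessions used here are hypothetical, and this is not GECCO's own implementation.

```python
from typing import Container, Iterable, List, Optional


def select_accessions(
    accessions: Iterable[str],
    whitelist: Optional[Container[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep only the accessions allowed by `whitelist`."""
    if whitelist is None:
        # No whitelist given: annotate with the entire file.
        return list(accessions)
    # Any object supporting the `in` operator works (set, list, tuple, ...).
    return [acc for acc in accessions if acc in whitelist]


library = ["PF00005", "PF00106", "PF00109"]  # made-up library contents
everything = select_accessions(library)
subset = select_accessions(library, whitelist={"PF00106", "PF00109"})
```

Passing a set or frozenset is a reasonable choice for large whitelists, since membership tests on them take constant time on average.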

abstract run(genes: Iterable[Gene]) -> List[Gene]

Run annotation on proteins of genes and update their domains.

Parameters:

genes (iterable of Gene) – An iterable that yields genes to annotate with self.hmm.
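
To illustrate the contract of an abstract run method, here is a minimal sketch built only on the standard library. GeneSketch, AnnotatorSketch, and ConstantAnnotator are hypothetical stand-ins, not GECCO classes, and updating genes in place is an assumption drawn from the phrase "update their domains".

```python
import abc
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class GeneSketch:
    """Hypothetical stand-in for a gene carrying a list of domain names."""

    name: str
    domains: List[str] = field(default_factory=list)


class AnnotatorSketch(abc.ABC):
    """Hypothetical stand-in mirroring the abstract `run` contract."""

    @abc.abstractmethod
    def run(self, genes: Iterable[GeneSketch]) -> List[GeneSketch]:
        ...


class ConstantAnnotator(AnnotatorSketch):
    """Toy subclass that attaches the same made-up domain to every gene."""

    def run(self, genes: Iterable[GeneSketch]) -> List[GeneSketch]:
        annotated = []
        for gene in genes:  # the iterable is consumed exactly once
            gene.domains.append("PF00005")  # ASSUMPTION: in-place update
            annotated.append(gene)
        return annotated


genes = (GeneSketch(f"gene_{i}") for i in range(3))  # a generator is accepted
result = ConstantAnnotator().run(genes)
```

Since the argument is typed as an iterable and not a list, a subclass written this way should avoid iterating over it more than once; returning a list gives the caller a concrete collection to reuse.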

class gecco.hmmer.PyHMMER(DomainAnnotator)

A domain annotator that uses pyhmmer.

run(genes: Iterable[Gene], progress: Optional[Callable[[HMM, int], None]] = None, bit_cutoffs: Optional[str] = None) -> List[Gene]

Run annotation on proteins of genes and update their domains.

Parameters:

genes (iterable of Gene) – An iterable that yields genes to annotate with self.hmm.