ORF Extraction

Generic protocol for ORF detection in DNA sequences.

class gecco.orf.ORFFinder(object)

An abstract base class to provide a generic ORF finder.

abstract find_genes(records: Iterable[Bio.SeqRecord.SeqRecord], progress: Optional[Callable[[Bio.SeqRecord.SeqRecord, int], None]] = None) → Iterable[gecco.model.Gene]

Find all genes contained in a sequence of DNA records.
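As a sketch of what this protocol asks of a subclass, the code below restates the abstract base class in plain Python and adds a deliberately naive concrete finder. `Record` and `Gene` are hypothetical stand-ins for `Bio.SeqRecord.SeqRecord` and `gecco.model.Gene`, so the example runs without Biopython or GECCO. `NaiveORFFinder` (forward strand only, from an ATG to the first in-frame stop codon) is an illustration of the interface, not how GECCO or Prodigal actually predicts genes.

```python
import abc
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


# Hypothetical stand-ins for Bio.SeqRecord.SeqRecord and gecco.model.Gene.
@dataclass
class Record:
    id: str
    seq: str


@dataclass
class Gene:
    record_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive (includes the stop codon)


class ORFFinder(abc.ABC):
    """Sketch of the generic protocol: subclasses must implement find_genes."""

    @abc.abstractmethod
    def find_genes(
        self,
        records: Iterable[Record],
        progress: Optional[Callable[[Record, int], None]] = None,
    ) -> Iterable[Gene]:
        raise NotImplementedError


class NaiveORFFinder(ORFFinder):
    """Hypothetical finder: forward strand, ATG to first in-frame stop codon."""

    STOPS = {"TAA", "TAG", "TGA"}

    def __init__(self, min_codons: int = 2) -> None:
        self.min_codons = min_codons  # minimum ORF length, stop codon included

    def find_genes(
        self,
        records: Iterable[Record],
        progress: Optional[Callable[[Record, int], None]] = None,
    ) -> Iterator[Gene]:
        records = list(records)  # materialize so the total is known
        for record in records:
            seq = record.seq.upper()
            for frame in range(3):
                start = None
                # Step through complete codons in this reading frame.
                for i in range(frame, len(seq) - 2, 3):
                    codon = seq[i:i + 3]
                    if start is None and codon == "ATG":
                        start = i
                    elif start is not None and codon in self.STOPS:
                        if (i + 3 - start) // 3 >= self.min_codons:
                            yield Gene(record.id, start + 1, i + 3)
                        start = None
            # Report progress once the whole record has been scanned.
            if progress is not None:
                progress(record, len(records))
```

Because `ORFFinder` declares `find_genes` abstract, instantiating it directly raises `TypeError`; only subclasses that implement the method can be created.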

class gecco.orf.PyrodigalFinder(ORFFinder)

An ORFFinder that uses the Pyrodigal bindings to Prodigal.

Prodigal is a fast and reliable protein-coding gene prediction tool for prokaryotic genomes, with support for draft genomes and metagenomes.

__init__(metagenome: bool = True, mask: bool = False, cpus: int = 0) → None

Create a new PyrodigalFinder instance.

Parameters
  • metagenome (bool) – Whether or not to run Prodigal in metagenome mode, defaults to True.

  • mask (bool) – Whether or not to prevent genes from running across regions containing unknown nucleotides, defaults to False.

  • cpus (int) – The number of threads to use to run Pyrodigal in parallel. Pass 0 to use the number of CPUs on the machine.
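To make the `mask` and `cpus` options more concrete, the sketch below shows the behaviour they describe, assuming that "unknown nucleotides" means runs of `N` and that a gene is masked when its span overlaps such a run. The helpers `unknown_regions`, `crosses_unknown` and `resolve_cpus` are hypothetical and not part of GECCO or Pyrodigal; Prodigal applies masking internally while building genes, so treat this post-hoc filter only as an approximation of the idea.

```python
import os
import re
from typing import List, Tuple


def unknown_regions(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive spans of runs of unknown nucleotides (N)."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def crosses_unknown(start: int, end: int, regions: List[Tuple[int, int]]) -> bool:
    """True if the 0-based, end-exclusive span [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def resolve_cpus(cpus: int) -> int:
    """Map cpus=0 to the machine's CPU count, falling back to 1 if unknown."""
    if cpus < 0:
        raise ValueError("cpus must be non-negative")
    # os.cpu_count() may return None, hence the final fallback.
    return cpus or os.cpu_count() or 1
```

For example, in `"ATGNNNAAATAG"` the run of `N` occupies `(3, 6)`, so a candidate spanning the whole sequence would be masked while one starting at position 6 would not.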

find_genes(records: Iterable[Bio.SeqRecord.SeqRecord], progress: Optional[Callable[[Bio.SeqRecord.SeqRecord, int], None]] = None, *, pool_factory: Union[Type[multiprocessing.pool.Pool], Callable[[Optional[int]], multiprocessing.pool.Pool]] = multiprocessing.pool.ThreadPool) → Iterator[gecco.model.Gene]

Find all genes contained in a sequence of DNA records.

Parameters
  • records (iterable of SeqRecord) – An iterable of DNA records in which to find genes.

  • progress (callable, optional) – A progress callback of signature progress(record, total) that is called every time a record has been processed successfully, with record being the SeqRecord instance and total being the total number of records to process.
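Any callable accepting `(record, total)` can serve as the progress callback. The sketch below shows a stateful callback class, driven by a hypothetical `process_records` function that stands in for `find_genes` by calling the callback once after each record is processed. Both names are illustrative assumptions; records are plain strings here instead of `SeqRecord` objects.

```python
from typing import Callable, Iterable, List, Optional


class ProgressCounter:
    """A progress(record, total) callback that logs "done/total" messages."""

    def __init__(self) -> None:
        self.done = 0
        self.total = 0
        self.messages: List[str] = []

    def __call__(self, record: object, total: int) -> None:
        self.done += 1
        self.total = total
        self.messages.append(f"{self.done}/{total}")


def process_records(
    records: Iterable[str],
    progress: Optional[Callable[[str, int], None]] = None,
) -> List[int]:
    """Hypothetical driver: does per-record work, then reports progress."""
    records = list(records)
    results = []
    for record in records:
        results.append(len(record))  # placeholder for real per-record work
        if progress is not None:
            progress(record, len(records))
    return results
```

With GECCO installed, the same kind of object would be passed as the `progress` argument of `PyrodigalFinder().find_genes(records, progress=counter)`.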

Keyword Arguments

pool_factory (type or callable) – The callable used to create pools, defaults to the multiprocessing.pool.ThreadPool class, but multiprocessing.pool.Pool is also supported.

Yields

Gene – An iterator over all the genes found in the given records.
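The `pool_factory` signature accepts either a pool class or any callable taking an optional worker count and returning a pool. The sketch below shows how such a factory can be consumed, assuming (not confirmed by the source) that a CPU count of 0 is translated into `None`, which lets `multiprocessing.pool` choose `os.cpu_count()` workers. `map_records` and `gc_content` are hypothetical; the per-record task is a placeholder rather than gene prediction.

```python
from multiprocessing.pool import Pool, ThreadPool
from typing import Callable, Iterable, Iterator, Optional


def gc_content(seq: str) -> float:
    """Placeholder per-record task: fraction of G/C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def map_records(
    seqs: Iterable[str],
    cpus: int = 0,
    pool_factory: Callable[[Optional[int]], Pool] = ThreadPool,
) -> Iterator[float]:
    """Hypothetical driver: run a task per record in a pool from pool_factory."""
    seqs = list(seqs)
    # Passing None lets the pool default to os.cpu_count() workers.
    with pool_factory(cpus or None) as pool:
        # imap yields results lazily, in the same order as the input.
        for result in pool.imap(gc_content, seqs):
            yield result
```

Thread pools avoid pickling and suit work that releases the GIL; a process pool such as `multiprocessing.pool.Pool` requires the task and its arguments to be picklable. With GECCO itself, the equivalent choice would be `find_genes(records, pool_factory=multiprocessing.pool.Pool)`.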