ORF Extraction

Generic protocol for ORF detection in DNA sequences.

class gecco.orf.ORFFinder(object)

An abstract base class to provide a generic ORF finder.

abstract find_genes(records: Iterable[SeqRecord], progress: Optional[Callable[[SeqRecord, int], None]] = None) → Iterable[Gene]

Find all genes in an iterable of DNA records.
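The shape of this protocol can be pictured with a small stand-in. The sketch below does not import GECCO: `SketchFinder`, `NaiveFinder`, `Record` and `Hit` are hypothetical placeholders for `ORFFinder`, a concrete subclass, `SeqRecord` and `Gene`, and the toy scan is only an illustration, not the method used by any real finder.

```python
import abc
from typing import Callable, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]    # hypothetical stand-in for SeqRecord: (id, sequence)
Hit = Tuple[str, int, int]  # hypothetical stand-in for Gene: (record id, start, end)


class SketchFinder(abc.ABC):
    """Hypothetical mirror of the abstract interface: one abstract method."""

    @abc.abstractmethod
    def find_genes(
        self,
        records: Iterable[Record],
        progress: Optional[Callable[[Record, int], None]] = None,
    ) -> Iterable[Hit]:
        ...


class NaiveFinder(SketchFinder):
    """Toy finder: scans the forward strand in three frames for ATG...stop."""

    STOPS = {"TAA", "TAG", "TGA"}

    def find_genes(self, records, progress=None) -> Iterator[Hit]:
        batch = list(records)
        for rec_id, seq in batch:
            for frame in range(3):
                start = None
                for i in range(frame, len(seq) - 2, 3):
                    codon = seq[i:i + 3]
                    if start is None and codon == "ATG":
                        start = i  # open a candidate ORF
                    elif start is not None and codon in self.STOPS:
                        yield (rec_id, start, i + 3)  # end is exclusive
                        start = None
            # Report the record once it has been fully scanned.
            if progress is not None:
                progress((rec_id, seq), len(batch))
```

Because `find_genes` is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.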

class gecco.orf.PyrodigalFinder(ORFFinder)

An ORFFinder that uses the Pyrodigal bindings to Prodigal.

Prodigal is a fast and reliable protein-coding gene prediction tool for prokaryotic genomes, with support for draft genomes and metagenomes.

__init__(metagenome: bool = True, mask: bool = False, cpus: int = 0) → None

Create a new PyrodigalFinder instance.

Parameters:
  • metagenome (bool) – Whether or not to run Prodigal in metagenome mode, defaults to True.

  • mask (bool) – Whether or not to mask genes running across regions containing unknown nucleotides, defaults to False.

  • cpus (int) – The number of threads to use to run Pyrodigal in parallel. Pass 0 to use the number of CPUs on the machine.
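The `cpus=0` convention described above can be sketched as follows. `resolve_thread_count` is a hypothetical helper written for illustration; it is an assumption about how such a value might be resolved, not GECCO's actual code.

```python
import os


def resolve_thread_count(cpus: int = 0) -> int:
    """Hypothetical helper: turn the `cpus` argument into a thread count.

    A positive value is used as given; 0 falls back to the number of CPUs
    reported by the operating system (or 1 if that cannot be determined).
    """
    if cpus > 0:
        return cpus
    return os.cpu_count() or 1
```

Under this assumption, `resolve_thread_count(4)` gives 4, while `resolve_thread_count(0)` depends on the machine.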

find_genes(records: Iterable[SeqRecord], progress: Optional[Callable[[SeqRecord, int], None]] = None, *, pool_factory: Union[Type[Pool], Callable[[Optional[int]], Pool]] = multiprocessing.pool.ThreadPool) → Iterator[Gene]

Find all genes contained in a sequence of DNA records.

Parameters:
  • records (iterable of SeqRecord) – An iterable of DNA records in which to find genes.

  • progress (callable, optional) – A progress callback with signature progress(record, total), called every time a record has been processed successfully. record is the SeqRecord instance that was just processed, and total is the total number of records to process.

Keyword Arguments:
  • pool_factory (type or callable) – The callable used to create the worker pool. Defaults to the multiprocessing.pool.ThreadPool class; multiprocessing.pool.Pool is also supported.

Yields:
  • Gene – Each gene found in the given records.
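The callback and pool contract described above can be illustrated without GECCO installed. In this sketch, `process_records`, `count_start_codons` and `Record` are hypothetical stand-ins: the function only mimics the documented shape (a `progress(record, total)` callback and a `pool_factory` called with an optional worker count) and makes no claim about how `PyrodigalFinder.find_genes` is implemented.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # hypothetical stand-in for SeqRecord: (id, sequence)


def count_start_codons(record: Record) -> Tuple[str, int]:
    """Toy per-record task standing in for gene prediction."""
    rec_id, seq = record
    return rec_id, seq.count("ATG")


def process_records(
    records: Iterable[Record],
    progress: Optional[Callable[[Record, int], None]] = None,
    *,
    pool_factory: Callable[[Optional[int]], ThreadPool] = ThreadPool,
    cpus: Optional[int] = None,
) -> Iterator[Tuple[str, int]]:
    batch = list(records)  # materialize so the total is known up front
    total = len(batch)
    with pool_factory(cpus) as pool:
        # imap returns results in input order, so zip pairs them correctly.
        for record, result in zip(batch, pool.imap(count_start_codons, batch)):
            if progress is not None:
                progress(record, total)  # one call per processed record
            yield result


seen = []
results = list(
    process_records(
        [("a", "ATGATG"), ("b", "CCC")],
        progress=lambda rec, total: seen.append((rec[0], total)),
    )
)
```

A thread pool keeps the example free of pickling constraints; a process-based pool would additionally require the task function and its arguments to be picklable.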