ORF Extraction
Generic protocol for ORF detection in DNA sequences.
- class gecco.orf.PyrodigalFinder(ORFFinder)
An ORFFinder that uses the Pyrodigal bindings to Prodigal. Prodigal is a fast and reliable protein-coding gene predictor for prokaryotic genomes, with support for draft genomes and metagenomes.
- __init__(metagenome: bool = True, mask: bool = False, cpus: int = 0) -> None
Create a new PyrodigalFinder instance.
- Parameters:
metagenome (bool) – Whether or not to run Prodigal in metagenome mode, defaults to True.
mask (bool) – Whether or not to mask genes running across regions containing unknown nucleotides, defaults to False.
cpus (int) – The number of threads to use to run Pyrodigal in parallel. Pass 0 to use the number of CPUs on the machine.
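The cpus=0 convention described above can be illustrated with a minimal sketch. The helper resolve_cpus is hypothetical and is not part of GECCO or Pyrodigal; it only shows one plausible way to map 0 to the number of CPUs on the machine, assuming negative values should be rejected.

```python
import os


def resolve_cpus(cpus: int = 0) -> int:
    """Hypothetical helper, not GECCO code: turn a `cpus` argument into a thread count."""
    if cpus < 0:
        raise ValueError("cpus must be zero or a positive integer")
    # 0 means "use every CPU"; os.cpu_count() may return None, so fall back to 1.
    return cpus if cpus > 0 else (os.cpu_count() or 1)


print(resolve_cpus(4))  # an explicit value is used as-is
print(resolve_cpus(0))  # machine-dependent
```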
- find_genes(records: Iterable[SeqRecord], progress: Optional[Callable[[SeqRecord, int], None]] = None, *, pool_factory: Union[Type[Pool], Callable[[Optional[int]], Pool]] = multiprocessing.pool.ThreadPool) -> Iterator[Gene]
Find all genes contained in a sequence of DNA records.
- Parameters:
records (iterable of SeqRecord) – An iterable of DNA records in which to find genes.
progress (callable, optional) – A progress callback of signature progress(record, total) that will be called every time a record has been processed successfully, with record being the SeqRecord instance, and total being the total number of records to process.
- Keyword Arguments:
pool_factory (type) – The callable for creating pools, defaults to the multiprocessing.pool.ThreadPool class, but multiprocessing.pool.Pool is also supported.
- Yields:
Gene – An iterator over all the genes found in the given records.
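The calling convention of find_genes (a progress(record, total) callback plus a keyword-only pool_factory) can be sketched without GECCO installed. Everything in the block below is a stand-in: find_genes_sketch and toy_find are hypothetical names, records are plain strings rather than SeqRecord objects, the extra cpus keyword is an assumption, and the "gene prediction" merely reports the positions of ATG triplets. It is not GECCO's implementation.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def toy_find(record: str) -> List[Tuple[str, int]]:
    """Stand-in for gene prediction: report every position where 'ATG' occurs."""
    return [(record, i) for i in range(len(record) - 2) if record[i:i + 3] == "ATG"]


def find_genes_sketch(
    records: Iterable[str],
    progress: Optional[Callable[[str, int], None]] = None,
    *,
    pool_factory: Callable[[Optional[int]], ThreadPool] = ThreadPool,
    cpus: Optional[int] = 2,
) -> Iterator[Tuple[str, int]]:
    """Hypothetical analogue of find_genes: process records in a pool and yield hits."""
    records = list(records)  # materialise so the total is known up front
    total = len(records)
    with pool_factory(cpus) as pool:
        # imap preserves input order, so results line up with the records.
        for record, hits in zip(records, pool.imap(toy_find, records)):
            yield from hits
            if progress is not None:
                progress(record, total)  # called once per processed record


seen = []
genes = list(
    find_genes_sketch(
        ["ATGAAATGA", "CCC"],
        progress=lambda record, total: seen.append((record, total)),
    )
)
print(genes)
print(seen)
```

Because the pool is created through pool_factory, a process-based multiprocessing.pool.Pool could be passed in the same way, provided the worker function and the records are picklable.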