presto.Applications
External application wrappers
- presto.Applications.getMuscleVersion(exec='muscle')
Gets the version of the MUSCLE executable
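Every wrapper in this module takes the name or path of an external executable as a keyword argument. Before calling them it can help to check which executables are actually available. The following is a minimal sketch using only the standard library; `find_executables` and `DEFAULT_EXECS` are hypothetical names and not part of presto, and the list of names is simply copied from the default arguments in the signatures documented here.

```python
import shutil

# Default executable names, copied from the function signatures in this module.
DEFAULT_EXECS = ["muscle", "makeblastdb", "blastn", "usearch", "cd-hit-est"]

def find_executables(names):
    """Map each executable name to its resolved path, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of presto.
    """
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in find_executables(DEFAULT_EXECS).items():
        print(f"{name}: {path or 'not found'}")
```

A resolved path can then be passed as `exec`, `db_exec`, `aligner_exec` or `cluster_exec` to the corresponding wrapper.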
- presto.Applications.makeBlastnDb(ref_file, db_exec='makeblastdb')
Makes a blastn database file
- Parameters:
ref_file – the path to the reference database file
db_exec – the path to the makeblastdb executable
- Returns:
(name and location of the database, handle of the tempfile.TemporaryDirectory)
- Return type:
tuple
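The second element returned by makeBlastnDb is a tempfile.TemporaryDirectory handle. In the standard library, such a directory is removed when the handle is cleaned up, so the handle presumably needs to stay referenced for as long as the database is in use. The sketch below illustrates that lifetime with the standard library only; `make_fake_db` is a hypothetical stand-in that writes a placeholder file and does not call makeblastdb.

```python
import os
import tempfile

def make_fake_db(ref_name="reference"):
    """Hypothetical stand-in for a database builder.

    Returns (database path, TemporaryDirectory handle), mirroring the
    tuple shape described above. No external tool is invoked.
    """
    db_handle = tempfile.TemporaryDirectory()
    db_path = os.path.join(db_handle.name, ref_name)
    # Write a placeholder where a real tool would write its database files.
    with open(db_path, "w") as handle:
        handle.write("placeholder\n")
    return db_path, db_handle

db_path, db_handle = make_fake_db()
exists_while_held = os.path.exists(db_path)    # present while the handle is alive
db_handle.cleanup()                            # explicit cleanup removes the directory
exists_after_cleanup = os.path.exists(db_path) # gone once the handle is cleaned up
```

Under this assumption, unpacking the tuple into two names and keeping both in scope until the last alignment call finishes avoids losing the database mid-run.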
- presto.Applications.makeUBlastDb(ref_file, db_exec='usearch')
Makes a ublast database file
- Parameters:
ref_file – path to the reference database file.
db_exec – path to the usearch executable.
- Returns:
(location of the database, handle of the tempfile.NamedTemporaryFile)
- Return type:
tuple
- presto.Applications.runBlastn(seq, database, evalue=1e-05, max_hits=100, aligner_exec='blastn')
Aligns a sequence against a reference database using BLASTN
- Parameters:
seq – a list of SeqRecord objects to align.
database – the path and name of the blastn database.
evalue – the E-value cut-off.
max_hits – the maximum number of hits returned.
aligner_exec – the path to the blastn executable.
- Returns:
Alignment results.
- Return type:
pandas.DataFrame
- presto.Applications.runCDHit(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, max_memory=3000, threads=1, cluster_exec='cd-hit-est', min_word_match=12)
Clusters a set of sequences using CD-HIT
- Parameters:
seq_list (list) – a list of SeqRecord objects to cluster.
ident (float) – the sequence identity cutoff to be passed to cd-hit-est.
length_ratio (float) – cd-hit-est parameter defining the minimum short/long length ratio allowed within a cluster.
seq_start (int) – the start position to trim sequences at before clustering.
seq_end (int) – the end position to trim sequences at before clustering.
max_memory (int) – cd-hit-est maximum memory limit (MB).
threads (int) – number of threads for cd-hit-est.
cluster_exec (str) – the path to the cd-hit-est executable.
min_word_match (int) – (not used for this function, see runUClust)
- Returns:
{cluster id: list of sequence ids}.
- Return type:
dict
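The seq_start, seq_end and length_ratio parameters can be pictured with plain strings. This is a sketch under two assumptions that the text above does not confirm: that trimming follows Python slice semantics (seq_end=None keeps everything through the end of the sequence), and that the length ratio is the shorter length divided by the longer one. `trim_sequences` and `length_ratio` are hypothetical helpers, not presto functions.

```python
def trim_sequences(seqs, seq_start=0, seq_end=None):
    """Slice each sequence to [seq_start:seq_end] (assumed slice semantics)."""
    return {seq_id: seq[seq_start:seq_end] for seq_id, seq in seqs.items()}

def length_ratio(seq_a, seq_b):
    """Return the short/long length ratio of two sequences (0.0 if both are empty)."""
    short, long_ = sorted((len(seq_a), len(seq_b)))
    return short / long_ if long_ else 0.0

seqs = {"read1": "ACGTACGTAC", "read2": "TTGACCA"}

# Keep positions 2 through 7 of each read; shorter reads are simply cut at their end.
trimmed = trim_sequences(seqs, seq_start=2, seq_end=8)

# Ratio that would be compared against the length_ratio cutoff.
ratio = length_ratio(trimmed["read1"], trimmed["read2"])
```

With the default length_ratio=0.0 no pair would be excluded on length alone, since every ratio is at least 0.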
- presto.Applications.runMuscle(seq_list, aligner_exec='muscle')
Performs a multiple alignment of a set of sequences using MUSCLE
- Parameters:
seq_list – a list of SeqRecord objects to align
aligner_exec – the MUSCLE executable
- Returns:
Multiple alignment results.
- Return type:
Bio.Align.MultipleSeqAlignment
- presto.Applications.runUBlast(seq, database, evalue=1e-05, max_hits=100, aligner_exec='usearch')
Aligns a sequence against a reference database using the usearch_local algorithm of USEARCH
- Parameters:
seq – a list of SeqRecord objects to align.
database – the path to the ublast database or a fasta file.
evalue – the E-value cut-off.
max_hits – the maximum number of hits returned.
aligner_exec – the path to the usearch executable.
- Returns:
Alignment results.
- Return type:
pandas.DataFrame
- presto.Applications.runUClust(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, max_memory=3000, threads=1, cluster_exec='usearch', min_word_match=12)
Clusters a set of sequences using the UCLUST algorithm from USEARCH
- Parameters:
seq_list (list) – a list of SeqRecord objects to cluster.
ident (float) – the sequence identity cutoff to be passed to usearch.
length_ratio (float) – usearch parameter defining the minimum short/long length ratio allowed within a cluster.
seq_start (int) – the start position to trim sequences at before clustering.
seq_end (int) – the end position to trim sequences at before clustering.
max_memory (int) – currently ignored.
threads (int) – number of threads for usearch.
cluster_exec (str) – the path to the usearch executable.
min_word_match (int) – minimum number of words that must match to be clustered.
- Returns:
{cluster id: list of sequence ids}.
- Return type:
dict
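Both clustering wrappers return a mapping of {cluster id: list of sequence ids}. Downstream code often needs the reverse lookup, from a sequence id to its cluster. The sketch below shows one way to derive it; `invert_clusters` is a hypothetical helper and the cluster ids and read names are made-up illustration data, not output from either tool.

```python
def invert_clusters(cluster_dict):
    """Turn {cluster id: [sequence ids]} into {sequence id: cluster id}."""
    return {seq_id: cluster_id
            for cluster_id, seq_ids in cluster_dict.items()
            for seq_id in seq_ids}

# Made-up example in the documented return shape.
clusters = {1: ["read1", "read3"], 2: ["read2"]}

membership = invert_clusters(clusters)                     # sequence id -> cluster id
sizes = {cid: len(ids) for cid, ids in clusters.items()}   # cluster id -> member count
```

Because the return shape is the same for runCDHit and runUClust, code written against this mapping should not need to know which tool produced it.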