Bio.Phylo Cookbook. · Biopython (2024)

  • Cookbook

Here are some examples of using Bio.Phylo for somelikely tasks. Some of these functions might be added to Biopython in alater release, but you can use them in your own code with Biopython1.54.

Convenience functions

Get the parent of a clade

The Tree data structures in Bio.Phylo don’t store parent references foreach clade. Instead, the get_path method can be used to trace the pathof parent-child links from the tree root to the clade of choice:

def get_parent(tree, child_clade): node_path = tree.get_path(child_clade) return node_path[-2]# Select a clademyclade = tree.find_clades("foo").next()# Test the functionparent = get_parent(tree, myclade)assert myclade in parent

Note that get_path has a linear run time with respect to the size ofthe tree – i.e. for best performance, don’t call get_parent orget_path inside a time-critical loop. If possible, call get_pathoutside the loop, and look up parents in the list returned by thatfunction.

Alternately, if you need to repeatedly look up the parents of arbitrarytree elements, create a dictionary mapping all nodes to their parents:

def all_parents(tree): parents = {} for clade in tree.find_clades(order="level"): for child in clade: parents[child] = clade return parents# Exampleparents = all_parents(tree)myclade = tree.find_clades("foo").next()parent_of_myclade = parents[myclade]assert myclade in parent_of_myclade

Index clades by name

For large trees it can be useful to be able to select a clade by name,or some other unique identifier, rather than searching the whole treefor it during each operation.

def lookup_by_names(tree): names = {} for clade in tree.find_clades(): if clade.name: if clade.name in names: raise ValueError("Duplicate key: %s" % clade.name) names[clade.name] = clade return names

Now you can retrieve a clade by name in constant time:

tree = Phylo.read("ncbi_taxonomy.xml", "phyloxml")names = lookup_by_names(tree)for phylum in ("Apicomplexa", "Euglenozoa", "Fungi"): print("Phylum size: %d" % len(names[phylum].get_terminals()))

A potential issue: The above implementation of lookup_by_names doesn’tinclude unnamed clades, generally internal nodes. We can fix this byadding a unique identifier for each clade. Here, all clade names areprefixed with a unique number (which can be useful for searching, too):

def tabulate_names(tree): names = {} for idx, clade in enumerate(tree.find_clades()): if clade.name: clade.name = "%d_%s" % (idx, clade.name) else: clade.name = str(idx) names[clade.name] = clade return names

Calculate distances between neighboring terminals

Suggested by Joel Berendzen

import itertoolsdef terminal_neighbor_dists(self): """Return a list of distances between adjacent terminals.""" def generate_pairs(self): pairs = itertools.tee(self) next(pairs[1]) return zip(pairs[0], pairs[1]) return [self.distance(*i) for i in generate_pairs(self.find_clades(terminal=True))]

Test for “semi-preterminal” clades

Suggested by Joel Berendzen

The existing tree method is_preterminal returns True if all of thedirect descendants are terminal. This snippet will instead return Trueif any direct descendent is terminal, but still False if the givenclade itself is terminal.

def is_semipreterminal(clade): """True if any direct descendent is terminal.""" for child in clade: if child.is_terminal(): return True return False

In Python 2.5 and later, this is simplified with the built-in anyfunction:

def is_semipreterminal(clade): return any(child.is_terminal() for child in clade)

Comparing trees

TODO:

  • Symmetric difference / partition metric, a.k.a. topologicaldistance (Robinson-Foulds)
  • Quartets distance
  • Nearest-neighbor interchange
  • Path-length-difference

Consensus methods

TODO:

Rooting methods

The basic method on the Tree class (not TreeMixin) isroot_with_outgroup:

tree = Phylo.read("example.nwk", "newick")print(tree)# ...tree.root_with_outgroup({"name": "A"}) # Operates in-placeprint(tree)

Normally you’ll want the outgroup to be a monophyletic group, ratherthan a single taxon. This isn’t automatically checked, but you can do ityourself with the is_monophyletic method.

To save some typing, try keeping the query in a separate list andreusing it:

outgroup = [{"name": taxon_name} for taxon_name in ("E", "F", "G")]if tree.is_monophyletic(outgroup): tree.root_with_outgroup(*outgroup)else: raise ValueError("outgroup is paraphyletic")

TODO:

  • Root at the midpoint between the two most distant nodes (or “center”of all tips)

Graphics

TODO:

  • Party tricks with draw methods, covering each keyword argument

Exporting to other types

Convert to an ‘ape’ tree, via Rpy2

The R statistical programming environment provides support forphylogenetics through the ape package andseveral others that build on top of ape. The Python packagerpy2 provides an interfacebetween R and Python, so it’s possible to convert a Bio.Phylo tree intoan ape tree object:

import tempfilefrom rpy2.robjects import rdef to_ape(tree): """Convert a tree to the type used by the R package `ape`, via rpy2. Requirements: - Python package `rpy2` - R package `ape` """ with tempfile.NamedTemporaryFile() as tmpf: Phylo.write(tree, tmpf, "newick") tmpf.flush() rtree = r( """ library('ape') read.tree('%s') """ % tmpf.name ) return rtree

See that it works:

>>> from StringIO import StringIO>>> from Bio import Phylo>>> tree = Phylo.read(StringIO("(A,(B,C),(D,E));"), "newick")>>> rtree = to_ape(tree)>>> len(rtree)3>>> print(r.summary(rtree))Phylogenetic tree: structure(list(edge = structure(c(6, 6, 7, 7, 6, 8, 8, 1, 7, 2, 3, 8, 4, 5), .Dim = c(7L, 2L)), tip.label = c("A", "B", "C", "D", "E"), Nnode = 3L), .Names = c("edge", "tip.label", "Nnode" ), class="phylo") Number of tips: 5 Number of nodes: 3 No branch lengths. No root edge. Tip labels: A B C D E No node labels.NULL>>> r.plot(rtree)

See the rpy2 documentation for further guidance.

Convert to a DendroPy or PyCogent tree

The tree objects used by Biopython, DendroPy and PyCogent are different.Nonetheless, all three toolkits support the Newick file format, sointeroperability is straightforward at that level by writing to atemporary file or StringIO object with one library, then reading thesame string again with another.

from Bio import Phyloimport cogentPhylo.write(bptree, "mytree.nwk", "newick") # Biopython treectree = cogent.LoadTree("mytree.nwk") # PyCogent tree
import dendropy# Create or load a tree in DendroPydtree = dendropy.Tree.get_from_string("(A, (B, C), (D, E))", "newick")dtree.write_to_path("tmp.nwk", "newick", suppress_rooting=True)# Load the same tree in Biopythonbptree = Phylo.read("tmp.nwk", "newick")

Convert to a NumPy array or matrix

Adjacency matrix: cells are 1 (true) if a parent-child relationshipexists, otherwise 0 (false).

import numpy as npdef to_adjacency_matrix(tree): """Create an adjacency matrix (NumPy array) from clades/branches in tree. Also returns a list of all clades in tree ("allclades"), where the position of each clade in the list corresponds to a row and column of the numpy array: a cell (i,j) in the array is 1 if there is a branch from allclades[i] to allclades[j], otherwise 0. Returns a tuple of (allclades, adjacency_matrix) where allclades is a list of clades and adjacency_matrix is a NumPy 2D array. """ allclades = list(tree.find_clades(order="level")) lookup = {} for i, elem in enumerate(allclades): lookup[elem] = i adjmat = np.zeros((len(allclades), len(allclades))) for parent in tree.find_clades(terminal=False, order="level"): for child in parent.clades: adjmat[lookup[parent], lookup[child]] = 1 if not tree.rooted: # Branches can go from "child" to "parent" in unrooted trees adjmat = adjmat + adjmat.transpose() return (allclades, np.matrix(adjmat))

Distance matrix: cell values are branch lengths if a branch exists,otherwise infinity (this plays well with graph algorithms).

import numpy as npdef to_distance_matrix(tree): """Create a distance matrix (NumPy array) from clades/branches in tree. A cell (i,j) in the array is the length of the branch between allclades[i] and allclades[j], if a branch exists, otherwise infinity. Returns a tuple of (allclades, distance_matrix) where allclades is a list of clades and distance_matrix is a NumPy 2D array. """ allclades = list(tree.find_clades(order="level")) lookup = {} for i, elem in enumerate(allclades): lookup[elem] = i distmat = np.repeat(np.inf, len(allclades) ** 2) distmat.shape = (len(allclades), len(allclades)) for parent in tree.find_clades(terminal=False, order="level"): for child in parent.clades: if child.branch_length: distmat[lookup[parent], lookup[child]] = child.branch_length if not tree.rooted: distmat += distmat.transpose return (allclades, np.matrix(distmat))

Enhancements:

  • Use an OrderedDict for allclades, so the separate dictionarylookup isn’t needed. (Python 2.7+)
  • Use NumPy’s structured arrays toassign clade names to rows and columns of the matrix, so allcladesisn’t needed either (this works nicely along with thetabulate_names function given earlier).

TODO:

Bio.Phylo Cookbook. · Biopython (2024)

References

Top Articles
Who Has The Best Power Play Percentage In A Single Season | StatMuse
Rollag 2023 Dates
Swissport Timecard
Happel Real Estate
Brett Cooper Wikifeet
T Mobile Rival Crossword Clue
Jodie Sweetin Breast Reduction
Chris Wragge Illness
Omega Pizza-Roast Beef -Seafood Middleton Menu
Craigs List High Rockies
Thomas the Tank Engine
At 25 Years, Understanding The Longevity Of Craigslist
National Weather Denver Co
Ingersoll Greenwood Funeral Home Obituaries
Chittenden County Family Court Schedule
Five Guys Calorie Calculator
James And Lisa Goy Obituary
Best Chinese Rome Ny
Free 120 Step 2 Correlation
Natasha Tillotson
Walmart Phone Number Auto Center
Advance Auto Parts Near Me Open Now
Pair sentenced for May 2023 murder of Roger Driesel
Hca Florida Middleburg Emergency Reviews
Killing Self Gif
Brublackvip
Megan Hall Bikini
Terraria Water Gun
David Mayries
What Happened To Doublelist? Unveiling The Mystery | Men's Venture
Jan Markell Net Worth
Body Rubs Austin Texas
Waive Upgrade Fee
Pick N Pull Near Me [Locator Map + Guide + FAQ]
7066642123
Marie Anne Thiebaud 2019
Seats 3D Ubs Arena
Brian Lizer Life Below Zero Next Generation
Aces Login Palo Alto
Grupos De Cp Telegram
o2 Störung? Netzausfall & Netzprobleme im Live-Check prüfen
Congdon Heart And Vascular Center
Faze Teeqo Wiki
John Deere 7 Iron Deck Parts Diagram
Mtb Com Online
Damaged car, damaged cars for sale
Ces 2023 Badge Pickup
Craigslist Nokomis Fl
The Battle Gimmick for the Gen 10 Pokémon Games Will Be...
Grand Rapids, Michigan Aviation Weather Report and Forecast
Mt Sinai Walk In Clinic
Highplainsobserverperryton
Latest Posts
Article information

Author: Sen. Emmett Berge

Last Updated:

Views: 6145

Rating: 5 / 5 (60 voted)

Reviews: 91% of readers found this page helpful

Author information

Name: Sen. Emmett Berge

Birthday: 1993-06-17

Address: 787 Elvis Divide, Port Brice, OH 24507-6802

Phone: +9779049645255

Job: Senior Healthcare Specialist

Hobby: Cycling, Model building, Kitesurfing, Origami, Lapidary, Dance, Basketball

Introduction: My name is Sen. Emmett Berge, I am a funny, vast, charming, courageous, enthusiastic, jolly, famous person who loves writing and wants to share my knowledge and understanding with you.