Phylogenetic Trees — A Python Environment for (phylogenetic) Tree Exploration (2024)

Contents

  • Phylogenetic Trees
    • Overview
    • Linking Phylogenetic Trees with Multiple Sequence Alignments
    • Visualization of phylogenetic trees
    • Adding taxonomic information
      • Automatic control of species info
      • Automatic (and custom) control of the species info
      • Manual control of the species info
    • Detecting evolutionary events
      • Species Overlap (SO) algorithm
      • Tree reconciliation algorithm
      • A closer look to the evolutionary event object
    • Relative dating phylogenetic nodes
      • Implementation
    • Automatic rooting (outgroup detection)
    • Working with duplicated gene families
      • Treeko (splitting gene trees into species trees)
      • Splitting gene trees by duplication events
      • Collapse species specific duplications

Overview

Phylogenetic trees are the result of most evolutionary analyses. Theyrepresent the evolutionary relationships among a set of species or, inmolecular biology, a set of hom*ologous sequences.

The PhyloTree class is an extension of the base Treeobject, providing a appropriate way to deal with phylogenetic trees.Thus, while leaves are considered to represent species (or sequencesfrom a given species genome), internal nodes are considered ancestralnodes. A direct consequence of this is, for instance, that every splitin the tree will represent a speciation or duplication event.

Linking Phylogenetic Trees with Multiple Sequence Alignments

PhyloTree instances allow molecular phylogenies to be linkedto the Multiple Sequence Alignments (MSA). To associate a MSA with aphylogenetic tree you can use the PhyloNode.link_to_alignment()method. You can use the alg_format argument to specify itsformat (See SeqGroup documentation for available formats)

Given that Fasta format are not only applicable for MSA but also forUnaligned Sequences, you may also associate sequences of differentlengths with tree nodes.

from ete2 import PhyloTreefasta_txt = """>seqAMAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAH>seqBMAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSYYAYQTDDQKDRREEAH>seqCMAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH>seqDMAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAH"""# Load a tree and link it to an alignment.t = PhyloTree("(((seqA,seqB),seqC),seqD);")t.link_to_alignment(alignment=fasta_txt, alg_format="fasta")

The same could be done at the same time the tree is being loaded, byusing the alignment and alg_format arguments ofPhyloTree.

# Load a tree and link it to an alignment.t = PhyloTree("(((seqA,seqB),seqC),seqD);", alignment=fasta_txt, alg_format="fasta")

As currently implemented, sequence linking process is not strict,which means that a perfect match between all node names and sequencesnames is not required. Thus, if only one match is found betweensequences names within the MSA file and tree node names, only one treenode will contain an associated sequence. Also, it is important tonote that sequence linking is not limited to terminal nodes. Ifinternal nodes are named, and such names find a match within theprovided MSA file, their corresponding sequences will be also loadedinto the tree structure. Once a MSA is linked, sequences will beavailable for every tree node through its node.sequenceattribute.

from ete2 import PhyloTreefasta_txt = """ >seqA MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAH >seqB MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSYYAYQTDDQKDRREEAH >seqC MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH >seqD MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAH"""iphylip_txt = """ 4 76 seqA MAEIPDETIQ QFMALT---H NIAVQYLSEF GDLNEALNSY YASQTDDIKD RREEAHQFMA seqB MAEIPDATIQ QFMALTNVSH NIAVQY--EF GDLNEALNSY YAYQTDDQKD RREEAHQFMA seqC MAEIPDATIQ ---ALTNVSH NIAVQYLSEF GDLNEALNSY YASQTDDQPD RREEAHQFMA seqD MAEAPDETIQ QFMALTNVSH NIAVQYLSEF GDLNEAL--- ---------- -REEAHQ--- LTNVSHQFMA LTNVSH LTNVSH---- ------ LTNVSH---- ------ -------FMA LTNVSH"""# Load a tree and link it to an alignment. As usual, 'alignment' can# be the path to a file or data in text format.t = PhyloTree("(((seqA,seqB),seqC),seqD);", alignment=fasta_txt, alg_format="fasta")#We can now access the sequence of every leaf nodeprint "These are the nodes and its sequences:"for leaf in t.iter_leaves(): print leaf.name, leaf.sequence#seqD MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAH#seqC MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH#seqA MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAH#seqB MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSYYAYQTDDQKDRREEAH## The associated alignment can be changed at any timet.link_to_alignment(alignment=iphylip_txt, alg_format="iphylip")# Let's check that sequences have changedprint "These are the nodes and its re-linked sequences:"for leaf in t.iter_leaves(): print leaf.name, leaf.sequence#seqD MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAHQ----------FMALTNVSH#seqC MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAHQFMALTNVSH----------#seqA MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAHQFMALTNVSHQFMALTNVSH#seqB MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSYYAYQTDDQKDRREEAHQFMALTNVSH----------## The sequence attribute is considered as node feature, so you can# even include sequences in your extended newick format!print t.write(features=["sequence"], format=9)### (((seqA[&&NHX:sequence=MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAHQF# MALTNVSHQFMALTNVSH],seqB[&&NHX:sequence=MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSY# YAYQTDDQKDRREEAHQFMALTNVSH----------]),seqC[&&NHX:sequence=MAEIPDATIQ---ALTNVSHNIA# VQYLSEFGDLNEALNSYYASQTDDQPDRREEAHQFMALTNVSH----------]),seqD[&&NHX:sequence=MAEAPD# ETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAHQ----------FMALTNVSH]);## And yes, you can save this newick text and reload it into a PhyloTree instance.sametree = PhyloTree(t.write(features=["sequence"]))print "Recovered tree with sequence features:"print sametree## /-seqA# /--------|# /--------| \-seqB# | |#---------| \-seqC# |# \-seqD#print "seqA sequence:", (t&"seqA").sequence# MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAHQFMALTNVSHQFMALTNVSH

Visualization of phylogenetic trees

PhyloTree instances can benefit from all the features of theprogrammable drawing engine. However, a built-in phylogenetic layoutis provided for convenience.

All PhyloTree instances are, by default, attached to such layout fortree visualization, thus allowing for in-place alignment visualizationand evolutionary events labeling.

Phylogenetic Trees — A Python Environment for (phylogenetic) Tree Exploration (1)

from ete2 import PhyloTree, TreeStylealg = """ >Dme_001 MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEAL--YYASQTDDIKDRREEAH >Dme_002 MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSYYAYQTDDQKDRREEAH >Cfa_001 MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH >Mms_001 MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL--------------REEAH >Hsa_001 MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAH >Ptr_002 MAEIPDATIQ-FMALTNVSHNIAVQY--EFGDLNEALNSY--YQTDDQKDRREEAH >Mmu_002 MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH >Hsa_002 MAEAPDETIQQFM-LTNVSHNIAVQYLSEFGDLNEAL--------------REEAH >Mmu_001 MAEIPDETIQQFMALT---HNIAVQYLSEFGDLNEALNSYYASQTDDIKDRREEAH >Ptr_001 MAEIPDATIQ-FMALTNVSHNIAVQY--EFGDLNEALNSY--YQTDDQKDRREEAH >Mmu_001 MAEIPDATIQ---ALTNVSHNIAVQYLSEFGDLNEALNSYYASQTDDQPDRREEAH"""def get_example_tree(): # Performs a tree reconciliation analysis  gene_tree_nw = '((Dme_001,Dme_002),(((Cfa_001,Mms_001),((Hsa_001,Ptr_001),Mmu_001)),(Ptr_002,(Hsa_002,Mmu_002))));' species_tree_nw = "((((Hsa, Ptr), Mmu), (Mms, Cfa)), Dme);" genetree = PhyloTree(gene_tree_nw) sptree = PhyloTree(species_tree_nw) recon_tree, events = genetree.reconcile(sptree) recon_tree.link_to_alignment(alg) return recon_tree, TreeStyle()if __name__ == "__main__": # Visualize the reconciled tree t, ts = get_example_tree() t.show(tree_style=ts) #recon_tree.render("phylotree.png", w=750)

Adding taxonomic information

PhyloTree instances allow to deal with leaf names and speciesnames separately. This is useful when working with molecularphylogenies, in which node names usually represent sequenceidentifiers. Species names will be stored in thePhyloNode.species attribute of each leaf node. The methodPhyloNode.get_species() can be used obtain the set of speciesnames found under a given internal node (speciation or duplicationevent). Often, sequence names do contain species information as apart of the name, and ETE can parse this information automatically.

There are three ways to establish the species of the different treenodes:

  • Default: The three first letters of node’s name represent the species
  • The species code of each node is dynamically created based on node’s name
  • The species code of each node is manually set.

Automatic control of species info

from ete2 import PhyloTree# Reads a phylogenetic tree (using default species name encoding)t = PhyloTree("(((Hsa_001,Ptr_001),(Cfa_001,Mms_001)),(Dme_001,Dme_002));")# /-Hsa_001# /--------|# | \-Ptr_001# /--------|# | | /-Cfa_001# | \--------|#---------| \-Mms_001# |# | /-Dme_001# \--------|# \-Dme_002## Prints current leaf names and species codesprint "Deafult mode:"for n in t.get_leaves(): print "node:", n.name, "Species name:", n.species# node: Dme_001 Species name: Dme# node: Dme_002 Species name: Dme# node: Hsa_001 Species name: Hsa# node: Ptr_001 Species name: Ptr# node: Cfa_001 Species name: Cfa# node: Mms_001 Species name: Mms

Automatic (and custom) control of the species info

The default behavior can be changed by using thePhyloNode.set_species_naming_function() method or by using thesp_naming_function argument of the PhyloTree class.Note that, using the sp_naming_function argument, the wholetree structure will be initialized to use the provided parsingfunction to obtain species nameinformation. PhyloNode.set_species_naming_function() (present inall tree nodes) can be used to change the behavior in a previouslyloaded tree, or to set different parsing function to different partsof the tree.

from ete2 import PhyloTree# Reads a phylogenetic treet = PhyloTree("(((Hsa_001,Ptr_001),(Cfa_001,Mms_001)),(Dme_001,Dme_002));")# Let's use our own leaf name parsing function to obtain species# names. All we need to do is create a python function that takes# node's name as argument and return its corresponding species name.def get_species_name(node_name_string): # Species code is the first part of leaf name (separated by an # underscore character) spcode = node_name_string.split("_")[0] # We could even translate the code to complete names code2name = { "Dme":"Drosophila melanogaster", "Hsa":"hom*o sapiens", "Ptr":"Pan troglodytes", "Mms":"Mus musculus", "Cfa":"Canis familiaris" } return code2name[spcode]# Now, let's ask the tree to use our custom species naming functiont.set_species_naming_function(get_species_name)print "Custom mode:"for n in t.get_leaves(): print "node:", n.name, "Species name:", n.species# node: Dme_001 Species name: Drosophila melanogaster# node: Dme_002 Species name: Drosophila melanogaster# node: Hsa_001 Species name: hom*o sapiens# node: Ptr_001 Species name: Pan troglodytes# node: Cfa_001 Species name: Canis familiaris# node: Mms_001 Species name: Mus musculus

Manual control of the species info

To disable the automatic generation of species names based on nodenames, a None value can be passed to thePhyloNode.set_species_naming_function() function. From then on,species attribute will not be automatically updated based on the nameof nodes and it could be controlled manually.

from ete2 import PhyloTree# Reads a phylogenetic treet = PhyloTree("(((Hsa_001,Ptr_001),(Cfa_001,Mms_001)),(Dme_001,Dme_002));")# Of course, you can disable the automatic generation of species# names. To do so, you can set the species naming function to# None. This is useful to set the species names manually or for# reading them from a newick file. Other wise, species attribute would# be overwritenmynewick = """(((Hsa_001[&&NHX:species=Human],Ptr_001[&&NHX:species=Chimp]),(Cfa_001[&&NHX:species=Dog],Mms_001[&&NHX:species=Mouse])),(Dme_001[&&NHX:species=Fly],Dme_002[&&NHX:species=Fly]));"""t = PhyloTree(mynewick, sp_naming_function=None)print "Disabled mode (manual set)"for n in t.get_leaves(): print "node:", n.name, "Species name:", n.species# node: Dme_001 Species name: Fly# node: Dme_002 Species name: Fly# node: Hsa_001 Species name: Human# node: Ptr_001 Species name: Chimp# node: Cfa_001 Species name: Dog# node: Mms_001 Species name: Mouse

Full Example: Species aware trees.

Detecting evolutionary events

There are several ways to automatically detect duplication andspeciation nodes. ETE provides two methodologies: One implements thealgorithm described in Huerta-Cepas (2007) and is based on the speciesoverlap (SO) between partitions and thus does not depend on theavailability of a species tree. The second, which requires thecomparison between the gene tree and a previously defined speciestree, implements a strict tree reconciliation algorithm (Page andCharleston, 1997). By detecting evolutionary events, orthology andparalogy relationships among sequences can also be inferred. Find acomparison of both methods in Marcet-Houben and Gabaldon (2009).

Species Overlap (SO) algorithm

In order to apply the SO algorithm, you can use thePhyloNode.get_descendant_evol_events() method (it will detectall evolutionary events under the current node) or thePhyloNode.get_my_evol_events() method (it will detect only theevolutionary events in which current node, a leaf, is involved).

By default the species overlap score (SOS) threshold is set to0.0, which means that a single species in common between two nodebranches will rise a duplication event. This has been shown to performthe best with real data, however you can adjust the threshold usingthe sos_thr argument present in both methods.

from ete2 import PhyloTree# Loads an example treenw = """((Dme_001,Dme_002),(((Cfa_001,Mms_001),((Hsa_001,Ptr_001),Mmu_001)),(Ptr_002,(Hsa_002,Mmu_002))));"""t = PhyloTree(nw)print t# /-Dme_001# /--------|# | \-Dme_002# |# | /-Cfa_001# | /--------|#---------| | \-Mms_001# | /--------|# | | | /-Hsa_001# | | | /--------|# | | \--------| \-Ptr_001# \--------| |# | \-Mmu_001# |# | /-Ptr_002# \--------|# | /-Hsa_002# \--------|# \-Mmu_002## To obtain all the evolutionary events involving a given leaf node we# use get_my_evol_events methodmatches = t.search_nodes(name="Hsa_001")human_seq = matches[0]# Obtains its evolutionary eventsevents = human_seq.get_my_evol_events()# Print its orthology and paralogy relationshipsprint "Events detected that involve Hsa_001:"for ev in events: if ev.etype == "S": print ' ORTHOLOGY RELATIONSHIP:', ','.join(ev.in_seqs), "<====>", ','.join(ev.out_seqs) elif ev.etype == "D": print ' PARALOGY RELATIONSHIP:', ','.join(ev.in_seqs), "<====>", ','.join(ev.out_seqs)# Alternatively, you can scan the whole tree topologyevents = t.get_descendant_evol_events()# Print its orthology and paralogy relationshipsprint "Events detected from the root of the tree"for ev in events: if ev.etype == "S": print ' ORTHOLOGY RELATIONSHIP:', ','.join(ev.in_seqs), "<====>", ','.join(ev.out_seqs) elif ev.etype == "D": print ' PARALOGY RELATIONSHIP:', ','.join(ev.in_seqs), "<====>", ','.join(ev.out_seqs)# If we are only interested in the orthology and paralogy relationship# among a given set of species, we can filter the list of sequences## fseqs is a function that, given a list of sequences, returns only# those from human and mousefseqs = lambda slist: [s for s in slist if s.startswith("Hsa") or s.startswith("Mms")]print "Paralogy relationships among human and mouse"for ev in events: if ev.etype == "D": # Prints paralogy relationships considering only human and # mouse. Some duplication event may not involve such species, # so they will be empty print ' PARALOGY RELATIONSHIP:', \ ','.join(fseqs(ev.in_seqs)), \ "<====>",\ ','.join(fseqs(ev.out_seqs))# Note that besides the list of events returned, the detection# algorithm has labeled the tree nodes according with the# predictions. We can use such lables as normal node features.dups = t.search_nodes(evoltype="D") # Return all duplication nodes

Tree reconciliation algorithm

Tree reconciliation algorithm uses a predefined species tree to inferall the necessary genes losses that explain a given gene treetopology. Consequently, duplication and separation nodes will strictlyfollow the species tree topology.

To perform a tree reconciliation analysis over a given node in amolecular phylogeny you can use the PhyloNode.reconcile()method, which requires a species PhyloTree as its firstargument. Leaf node names in the the species are expected to be thesame species codes in the gene tree (seetaxonomic_info). All species codes present in thegene tree should appear in the species tree.

As a result, the PhyloNode.reconcile() method will label theoriginal gene tree nodes as duplication or speciation, will return thelist of inferred events, and will return a new reconcilied tree(PhyloTree instance), in which inferred gene losses arepresent and labeled.

from ete2 import PhyloTree# Loads a gene tree and its corresponding species tree. Note that# species names in sptree are the 3 firs letters of leaf nodes in# genetree.gene_tree_nw = '((Dme_001,Dme_002),(((Cfa_001,Mms_001),((Hsa_001,Ptr_001),Mmu_001)),(Ptr_002,(Hsa_002,Mmu_002))));'species_tree_nw = "((((Hsa, Ptr), Mmu), (Mms, Cfa)), Dme);"genetree = PhyloTree(gene_tree_nw)sptree = PhyloTree(species_tree_nw)print genetree# /-Dme_001# /--------|# | \-Dme_002# |# | /-Cfa_001# | /--------|#---------| | \-Mms_001# | /--------|# | | | /-Hsa_001# | | | /--------|# | | \--------| \-Ptr_001# \--------| |# | \-Mmu_001# |# | /-Ptr_002# \--------|# | /-Hsa_002# \--------|# \-Mmu_002## Let's reconcile our genetree with the species treerecon_tree, events = genetree.reconcile(sptree)# a new "reconcilied tree" is returned. As well as the list of# inferred events.print "Orthology and Paralogy relationships:"for ev in events: if ev.etype == "S": print 'ORTHOLOGY RELATIONSHIP:', ','.join(ev.inparalogs), "<====>", ','.join(ev.orthologs) elif ev.etype == "D": print 'PARALOGY RELATIONSHIP:', ','.join(ev.inparalogs), "<====>", ','.join(ev.outparalogs)# And we can explore the resulting reconciled treeprint recon_tree# You will notice how the reconcilied tree is the same as the gene# tree with some added branches. They are inferred gene losses.### /-Dme_001# /--------|# | \-Dme_002# |# | /-Cfa_001# | /--------|# | | \-Mms_001#---------| /--------|# | | | /-Hsa_001# | | | /--------|# | | \--------| \-Ptr_001# | | |# | | \-Mmu_001# \--------|# | /-Mms# | /--------|# | | \-Cfa# | |# | | /-Hsa# \--------| /--------|# | /--------| \-Ptr_002# | | |# | | \-Mmu# \--------|# | /-Ptr# | /--------|# \--------| \-Hsa_002# |# \-Mmu_002## And we can visualize the trees using the default phylogeny# visualization layoutgenetree.show()recon_tree.show()

A closer look to the evolutionary event object

Both methods, species overlap and tree reconciliation, can be used tolabel each tree node as a duplication or speciation event. Thus, thePhyloNode.evoltype attribute of every node will be set to oneof the following states: D (Duplication), S (Speciation) orL gene loss.

Additionally, a list of all the detected events is returned. Eachevent is a python object of type phylo.EvolEvent, containingsome basic information about each event ( etype,in_seqs, out_seqs, node):

If an event represents a duplication, in_seqs are allparalogous to out_seqs. Similarly, if an event represents aspeciation, in_seqs are all orthologous to out_seqs.

Relative dating phylogenetic nodes

In molecular phylogeny, nodes can be interpreted as evolutionaryevents. Therefore, they represent duplication or speciation events. Inthe case of gene duplication events, nodes can also be assigned to acertain point in a relative temporal scale. In other words, you canobtain a relative dating of all the duplication events detected.

Although absolute dating is always preferred and more precise,topological dating provides a faster approach to compare the relativeage of paralogous sequences (read this fora comparison with other methods, such as the use of synonymoussubstitution rates as a proxy to the divergence time).

Some applications of topological dating can be found in Huerta-Cepaset al, 2007 or, morerecently, in Huerta-Cepas et al, 2011 orKalinka et al, 2001.

Implementation

The aim of relative dating is to establish a gradient of ages amongsequences. For this, a reference species needs to be fixed, so thegradient of ages will be referred to that referent point.

Thus, if our reference species is Human, we could establish thefollowing gradient of species:

  • (1) Human -> (2) Other Primates -> (3) Mammals -> (4) Vertebrates

So, nodes in a tree can be assigned to one of the above categoriesdepending on the sequences grouped. For instance:

  • A node with only human sequences will be mapped to (1).
  • A node with human and orangutan sequences will be mapped to (2)
  • A node with human a fish sequences will be mapped to (4)

This simple calculation can be done automatically by encoding thegradient of species ages as Python dictionary.

relative_dist = { "human": 0, # human "chimp": 1, # Primates non human "rat": 2, # Mammals non primates "mouse": 2, # Mammals non primates "fish": 3 # Vertebrates non mammals }

Once done, ETE can check the relative age of any tree node. ThePhyloNode.get_age() method can be used to that purpose.

For example, let’s consider the following gene tree:

# /-humanA# /---|# | \-chimpA# /Dup1# | | /-humanB# /---| \---|# | | \-chimpB# /---| |# | | \-mouseA# | |# | \-fish#-Dup3# | /-humanC# | /---|# | /---| \-chimpC# | | |# \Dup2 \-humanD# |# | /-ratC# \---|# \-mouseC

the expected node dating would be:

  • Dup1 will be assigned to primates (most distant species ischimp). Dup1.get_age(relative_distances) will return 1
  • Dup2 will be assigned to mammals [2] (most distant species are ratand mouse). Dup2.get_age(relative_distances) will return 2
  • Dup3 will be assigned to mammals [3] (most distant species isfish). Dup3.get_age(relative_distances) will return 3
from ete2 import PhyloTree# Creates a gene phylogeny with several duplication events at# different levels. Note that we are using the default method for# detecting the species code of leaves (three first lettes in the node# name are considered the species code).nw = """((Dme_001,Dme_002),(((Cfa_001,Mms_001),((((Hsa_001,Hsa_003),Ptr_001),Mmu_001),((Hsa_004,Ptr_004),Mmu_004))),(Ptr_002,(Hsa_002,Mmu_002))));"""t = PhyloTree(nw)print "Original tree:",print t## /-Dme_001# /--------|# | \-Dme_002# |# | /-Cfa_001# | /--------|# | | \-Mms_001# | |#--| | /-Hsa_001# | | /--------|# | /--------| /--------| \-Hsa_003# | | | | |# | | | /--------| \-Ptr_001# | | | | |# | | | | \-Mmu_001# | | \--------|# \--------| | /-Hsa_004# | | /--------|# | \--------| \-Ptr_004# | |# | \-Mmu_004# |# | /-Ptr_002# \--------|# | /-Hsa_002# \--------|# \-Mmu_002# Create a dictionary with relative ages for the species present in# the phylogenetic tree. Note that ages are only relative numbers to# define which species are older, and that different species can# belong to the same age.species2age = { 'Hsa': 1, # hom*o sapiens (Hominids) 'Ptr': 2, # P. troglodytes (primates) 'Mmu': 2, # Macaca mulata (primates) 'Mms': 3, # Mus musculus (mammals) 'Cfa': 3, # Canis familiaris (mammals) 'Dme': 4 # Drosophila melanogaster (metazoa)}# We can translate each number to its correspondig taxonomic numberage2name = { 1:"hominids", 2:"primates", 3:"mammals", 4:"metazoa"}event1= t.get_common_ancestor("Hsa_001", "Hsa_004")event2=t.get_common_ancestor("Hsa_001", "Hsa_002")printprint "The duplication event leading to the human sequences Hsa_001 and "+\ "Hsa_004 is dated at: ", age2name[event1.get_age(species2age)]print "The duplication event leading to the human sequences Hsa_001 and "+\ "Hsa_002 is dated at: ", age2name[event2.get_age(species2age)]# The duplication event leading to the human sequences Hsa_001 and Hsa_004# is dated at: primates## The duplication event leading to the human sequences Hsa_001 and Hsa_002# is dated at: mammals

Warning

Note that relative distances will vary depending on your referencespecies.

Automatic rooting (outgroup detection)

Two methods are provided to assist in the automatic rooting ofphylogenetic trees. Since tree nodes contain relative age information(based on the species code autodetection), the same relative agedictionaries can be used to detect the farthest and oldest node in atree to given sequences.

PhyloNode.get_farthest_oldest_node() andPhyloNode.get_farthest_oldest_leaf() can be used for thatpurpose.

Working with duplicated gene families

Treeko (splitting gene trees into species trees)

Comparisons between tree topologies provide important information formany evolutionary studies. Treeko(Marcet and Gabaldon, 2011 ) is a novel methodthat allows the comparison of any two tree topologies, even those withmissing leaves and duplications. This is important in genome-wideanalysis since many trees do not have exact leaf pairings andtherefore most tree comparison methods are rendered useless.

Although Treeko is available as a standalone package, it uses ETE togenerate all possible species tree topologies within a duplicated genefamily tree.

Thus, the ETE method PhyloNode.get_speciation_trees() isexpected to provide the core functionality required to perform aTreeko analysis. When used, the method will return a list of allpossible species trees observed after combining the differentnon-duplicated subparts under a gene family tree node.

Duplication events will be automatically identified using the speciesoverlap algorithm described within this manual. However, duplicationnodes can be manually labeled and used by disabling theautodetect_duplication flag.

Because of the combinatorial background of the Treeko method, thenumber of speciation trees generated by this function may varyenormously (ranging from few hundreds to tens of thousands topologies).

Here is a basic example on how to use it:

from ete2 import PhyloTreet = PhyloTree("((((Human_1, Chimp_1), (Human_2, (Chimp_2, Chimp_3))), ((Fish_1, (Human_3, Fish_3)), Yeast_2)), Yeast_1);")t.set_species_naming_function(lambda node: node.name.split("_")[0] )print t.get_ascii(attributes=["name", "species"], show_internal=False )# /-Human_1, Human# /-|# | \-Chimp_1, Chimp# /-|# | | /-Human_2, Human# | \-|# | | /-Chimp_2, Chimp# | \-|# /-| \-Chimp_3, Chimp# | |# | | /-Fish_1, Fish# | | /-|# | | | | /-Human_3, Human# --| \-| \-|# | | \-Fish_3, Fish# | |# | \-Yeast_2, Yeast# |# \-Yeast_1, Yeast# We obtain a list of species trees inferred from the duplication# events. Note that species specific duplications are ignored.ntrees, ndups, sptrees = t.get_speciation_trees()print "Found %d species trees and %d duplication nodes" %(ntrees, ndups)for spt in sptrees: print spt# Found 5 species trees and 4 duplication nodes## /-Human_1# --|# \-Chimp_1## /-Human_2# --|# | /-Chimp_2# \-|# \-Chimp_3## /-Fish_1# --|# \-Yeast_2## /-Human_3# /-|# --| \-Fish_3# |# \-Yeast_2## --Yeast_1

Note

For performance reasons, species trees are created without any linkto the original gene family tree, rather than the species name ofeach node. However, the map_features attribute can be usedto keep certain attributes of the original tree into the generatedspecies trees.

Note

Although the efficiency of the method to generate all possibletrees has been significantly improved from ETE version 2.2,creating thousands of new PhyloTree objects could affectperformance. The flag newick_only is now available to limitthe output to a newick string per generated tree, thus improvingthe speed they can be processed or dumped into a file.

Splitting gene trees by duplication events

A much simpler approach to separate duplicates within the same genefamily tree is to split the topology by their duplication nodes. Forthis, the method PhyloNode.split_by_dups() is provided.

from ete2 import PhyloTreet = PhyloTree("((((Human_1, Chimp_1), (Human_2, (Chimp_2, Chimp_3))), ((Fish_1, (Human_3, Fish_3)), Yeast_2)), Yeast_1);")t.set_species_naming_function(lambda node: node.name.split("_")[0] )print t.get_ascii(attributes=["name", "species"], show_internal=False )# /-Human_1, Human# /-|# | \-Chimp_1, Chimp# /-|# | | /-Human_2, Human# | \-|# | | /-Chimp_2, Chimp# | \-|# /-| \-Chimp_3, Chimp# | |# | | /-Fish_1, Fish# | | /-|# | | | | /-Human_3, Human# --| \-| \-|# | | \-Fish_3, Fish# | |# | \-Yeast_2, Yeast# |# \-Yeast_1, Yeast# Again, species specific duplications are ignoredfor node in t.split_by_dups(): print node# /-Human_1# --|# \-Chimp_1## /-Human_2# --|# | /-Chimp_2# \-|# \-Chimp_3## --Yeast_2## --Fish_1## /-Human_3# --|# \-Fish_3## --Yeast_1

Collapse species specific duplications

The method PhyloNode.collapse_lineage_specific_expansions()method, which returns a pruned version of a tree, where nodesrepresenting lineage specific expansions are converted into a singleleaf node is also available.

From the previous examples, the lineage specific duplication ofChimp_1 and Chimp_2 could be easily collapsed into a single node.

from ete2 import PhyloTreet = PhyloTree("((((Human_1, Chimp_1), (Human_2, (Chimp_2, Chimp_3))), ((Fish_1, (Human_3, Fish_3)), Yeast_2)), Yeast_1);")t.set_species_naming_function(lambda node: node.name.split("_")[0] )print t.get_ascii(attributes=["name", "species"], show_internal=False )# /-Human_1, Human# /-|# | \-Chimp_1, Chimp# /-|# | | /-Human_2, Human# | \-|# | | /-Chimp_2, Chimp# | \-|# /-| \-Chimp_3, Chimp# | |# | | /-Fish_1, Fish# | | /-|# | | | | /-Human_3, Human# --| \-| \-|# | | \-Fish_3, Fish# | |# | \-Yeast_2, Yeast# |# \-Yeast_1, Yeastt2 = t.collapse_lineage_specific_expansions()print t2.get_ascii(attributes=["name", "species"], show_internal=False )# /-Human_1, Human# /-|# | \-Chimp_1, Chimp# /-|# | | /-Human_2, Human# | \-|# | \-Chimp_2, Chimp ***# /-|# | | /-Fish_1, Fish# | | /-|# | | | | /-Human_3, Human# --| \-| \-|# | | \-Fish_3, Fish# | |# | \-Yeast_2, Yeast# |# \-Yeast_1, Yeast
Phylogenetic Trees — A Python Environment for (phylogenetic) Tree Exploration (2024)

References

Top Articles
Freightliner Business Class M2 Fault Codes List - ABS | Truckmanualshub.com
How To Clear Freightliner Fault Codes? (Solved For Newbies!) – FreightViking.com
Obituary for Mark E. Rimer at Hudson-Rimer Funeral Chapel
Wym Urban Dictionary
Aflac on LinkedIn: Aflac Supplemental Insurance | 22 comments
Mychart.texaschildrens.org.mychart/Billing/Guest Pay
Celebrity Guest Tape Free
How to cancel subscriptions on your iPhone through the Settings app
Joann Ally Employee Portal
Craigslist Free Stuff Columbus Ga
Bekijk hier het rouwregister van Uitvaartzorg FSK
Behind The Scenes Of White Christmas (1954) - Casting, Choreography, Costumes, And Music | TrainTracksHQ
National Weather Denver Co
20 Cozy and Creative Fall Front Porch Ideas to Welcome the Season in Style
Ironman Kona Tracker
Roilog Com Payment
Employment Vacancies - Find Jobs with our recruitment team
Minnesota Gophers Highlights
Dovob222
Nissan Rogue Tire Size
Hood County Buy Sell And Trade
Hannah Nichole Kast Twitter
Cal Poly San Luis Obispo Catalog
Accuweather Mold Count
Dimbleby Funeral Home
Meine Erfahrung mit Textbroker als Autor (inkl. Beispiel zu Verdienst)
Idaho Falls Temple Prayer Roll
Elfqrindiscard
Melanin - Altmeyers Enzyklopädie - Fachbereich Dermatologie
Kickflip Seeds
Dumb Money Showtimes Near Showcase Cinema De Lux Legacy Place
Culver's Flavor Of The Day Taylor Dr
Realidades 2 Capitulo 2B Answers
Alex Galindo And Leslie Quezada Net Worth 2022
Phasmophobia Do As I Command Challenge
Nationsotc.com/Bcbsri
Tighe Hamilton Hudson Ma Obituary
Wells Fargo Arena Des Moines Seating Chart Virtual View
Tattered Paws And Golden Hearts Rescue
Clarksburg Wv Craigslist Personals
Montrose Colorado Sheriff's Department
South Carolina Craigslist Motorcycles
Craigslist Sf Bay Free Stuff
marie claire Australia January 2016 Robyn Lawley, Rachel Taylor, Sarah Snook • EUR 11,50
Intel Core i3-4130 - CM8064601483615 / BX80646I34130
Cetaphil Samples For Providers
Neo Geo Bios Raspberry Pi 3
Katmovie.hs
Autozone On 7 Mile And Hubbell
Azpeople Self Service
Craigslist Pets Olympia
The Ultimate Guide to Newquay Surf - Surf Atlas
Latest Posts
Article information

Author: Edmund Hettinger DC

Last Updated:

Views: 6141

Rating: 4.8 / 5 (58 voted)

Reviews: 89% of readers found this page helpful

Author information

Name: Edmund Hettinger DC

Birthday: 1994-08-17

Address: 2033 Gerhold Pine, Port Jocelyn, VA 12101-5654

Phone: +8524399971620

Job: Central Manufacturing Supervisor

Hobby: Jogging, Metalworking, Tai chi, Shopping, Puzzles, Rock climbing, Crocheting

Introduction: My name is Edmund Hettinger DC, I am a adventurous, colorful, gifted, determined, precious, open, colorful person who loves writing and wants to share my knowledge and understanding with you.