A Frequently asked questions

A.1 Installation

ggtree is released within the Bioconductor project, so you need to use BiocManager to install it.

## you need to install BiocManager before using it
## install.packages("BiocManager")
library(BiocManager)
install("ggtree")

Each Bioconductor release is tied to a specific R version. Please make sure you are using the latest version of R if you want to install the latest release of Bioconductor packages, including ggtree. Beware that bugs will only be fixed in the current release and development branches. If you find a bug, please follow the guide21 to report it.

A.2 Basic R related

A.2.1 Use your local file

If you are new to R and want to use ggtree for tree visualization, please do learn some basic R and ggplot2.

A very common issue is that users copy and paste commands without looking at what the functions do. system.file() is used in the treeio and ggtree package documentation to find files shipped with the packages.

system.file                package:base                R Documentation

Find Names of R System Files

Description:

     Finds the full file names of files in packages etc.

Usage:

     system.file(..., package = "base", lib.loc = NULL,
                 mustWork = FALSE)

Users who want to use their own files should simply use a relative or absolute file path (e.g. f = "your/folder/filename").
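As a minimal sketch of the difference, the base R snippet below shows that system.file() only searches inside installed packages (and returns an empty string when nothing is found), whereas your own file is addressed by an ordinary path. The file name my_tree.nwk and the temporary file are stand-ins for illustration, not files used elsewhere in this document.

```r
# system.file() looks inside an installed package; here the base package
pkg_file <- system.file("DESCRIPTION", package = "base")
file.exists(pkg_file)

# A file that is not shipped with the package yields an empty string
missing <- system.file("my_tree.nwk", package = "base")
identical(missing, "")

# Your own file: use its path directly (a temporary file stands in here)
f <- tempfile(fileext = ".nwk")
writeLines("((A,B),(C,D));", f)
file.exists(f)
readLines(f)
```

With ggtree loaded, such a path can be passed straight to read.tree(f).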

A.3 Aesthetic mapping

A.3.1 Inherit aes

ggtree(rtree(30)) + geom_point()

For example, we can add symbolic points to nodes with geom_point() directly. The magic here is that we don’t need to map the x and y positions of the points by providing aes(x, y) to geom_point(), since they were already mapped by the ggtree function, which serves as a global mapping for all layers.

But what if we provide a dataset in a layer and the dataset doesn’t contain the x and/or y columns? The layer function will still try to map x and y, along with any other variables you mapped in the ggtree function. As these variables are not available in your dataset, you will get the following error:

Error in eval(expr, envir, enclos) : object 'x' not found

This can be fixed by setting the parameter inherit.aes=FALSE, which disables inheriting the mapping from the ggtree function.
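A minimal sketch of such a layer is shown below. The data frame d and its columns xpos and ypos are hypothetical; the point is only that the layer supplies its own data and its own mapping, and inherit.aes = FALSE ensures that nothing mapped in ggtree() is looked up in d.

```r
library(ggtree)
library(ggplot2)

set.seed(123)
tree <- ape::rtree(10)

# Hypothetical extra data with its own column names (no x or y columns)
d <- data.frame(xpos = c(0.5, 1.0), ypos = c(2, 7))

# The layer uses only its own mapping; the global mapping is not inherited
p <- ggtree(tree) +
  geom_point(data = d, aes(x = xpos, y = ypos),
             inherit.aes = FALSE, colour = "red", size = 3)
```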

A.3.2 Never use $ in aes

NEVER DO THIS22.

See the explanation in the second edition of the ggplot2 book:

Never refer to a variable with $ (e.g., diamonds$carat) in aes(). This breaks containment, so that the plot no longer contains everything it needs, and causes problems if ggplot2 changes the order of the rows, as it does when facetting.

A.4 Text & Label

A.4.1 Tip label truncated

ggplot2 can’t auto adjust xlim based on added text23.

library(ggtree)
## example tree from https://support.bioconductor.org/p/72398/
tree <- read.tree(text= paste("(Organism1.006G249400.1:0.03977,(Organism2.022118m:0.01337,", 
             "(Organism3.J34265.1:0.00284,Organism4.G02633.1:0.00468)0.51:0.0104):0.02469);"))
p <- ggtree(tree) + geom_tiplab()  

In this example, the tip labels displayed in Figure A.1A are truncated. This is because the units are in two different spaces (data and pixel). Users can use xlim to allocate more space for tip labels (Figure A.1B).

p + xlim(0, 0.08)

Another solution is to set clip = "off" to allow drawing outside of the plot panel. We may also need to set plot.margin to allocate more space for the margin (Figure A.1C).

p + coord_cartesian(clip = 'off') + 
  theme_tree2(plot.margin=margin(6, 120, 6, 6))

Figure A.1: Allocating more space for truncated tip labels. Long tip labels may be truncated (A). One solution is to allocate more space for the plot panel (B) and another solution is to allow plotting labels outside the plot panel (C).

A.4.2 Modify (tip) labels

If you want to modify tip labels of the tree, you can use treeio::rename_taxa() to rename a phylo or treedata object.

tree <- read.tree(text = "((A, B), (C, D));")
d <- data.frame(label = LETTERS[1:4], 
                label2 = c("sunflower", "tree", "snail", "mushroom"))

## rename_taxa uses the 1st column as key and the 2nd column as value by default
## rename_taxa(tree, d)
rename_taxa(tree, d, label, label2) %>% write.tree
## [1] "((sunflower,tree),(snail,mushroom));"

If the input tree object is a treedata instance, you can use write.beast() to export the tree with its associated data to a BEAST-compatible NEXUS file.

Renaming phylogeny tip labels is usually not a good idea, since it may introduce problems when mapping the original sequence alignment to the tree. Personally, I recommend storing the new labels as a tip annotation in a treedata object.

tree2 <- full_join(tree, d, by = "label")
tree2
## 'treedata' S4 object'.
## 
## ...@ phylo: 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
## [1] "A" "B" "C" "D"
## 
## Rooted; no branch lengths.
## 
## with the following features available:
##  'label2'.

If you just want to show different or additional information when plotting the tree, you don’t need to modify tip labels. This can easily be done by using the %<+% operator to attach the modified version of the labels and then using geom_tiplab to display the modified version (Figure A.2).

p <- ggtree(tree) + xlim(NA, 3)
p1 <- p + geom_tiplab()

## the following command will produce a figure identical to p2
## ggtree(tree2) + geom_tiplab(aes(label = label2))
p2 <- p %<+% d + geom_tiplab(aes(label=label2))
cowplot::plot_grid(p1, p2, ncol=2, labels = c("A", "B"))

Figure A.2: Alternative tip labels. Original tip labels (A) and modified version (B).

A.4.3 Formatting (tip) labels

If you want to format labels, you need to set parse=TRUE in geom_text/geom_tiplab, and the label should be a string that can be parsed into an expression and displayed as described in ?plotmath.

For example, if the tip labels contain two parts, a species name and an accession number, and we want to display the species name in italics, we can use a command like this to format a specific tip/node label (Figure A.3A):

set.seed(2019-06-24)
tree <- rtree(30)
p1 <- ggtree(tree) + 
  geom_tiplab(aes(subset=node==35), 
                  label='paste(italic("species name"), 
                              " accession number")', 
              parse=TRUE) + xlim(0, 6)

Another example, which formats all tip labels, is demonstrated in Figure A.3B:

p2 <- ggtree(tree) + 
  geom_tiplab(aes(label=paste0('bold(', label, 
                  ')~italic(', node, ')')), 
            parse=TRUE) + xlim(0, 5)

The label can be provided by a data.frame that contains related information of the taxa (Figure A.3C).

tree <- read.tree(text = "((a,(b,c)),d);")
genus <- c("Gorilla", "Pan", "Homo", "Pongo")
species <- c("gorilla", "spp.", "sapiens", "pygmaeus")
geo <- c("Africa", "Africa", "World", "Asia")
d <- data.frame(label = tree$tip.label, genus = genus,
                species = species, geo = geo)

p3 <- ggtree(tree) %<+% d + xlim(NA, 6) +
    geom_tiplab(aes(label=paste0('italic(', genus, 
                  ')~bolditalic(', species, ')~', geo)), 
                parse=TRUE)
cowplot::plot_grid(p1, p2, p3, ncol=3, labels = LETTERS[1:3])                

Figure A.3: Formatting labels. Formatting specific tip/node label (A), all tip labels (B & C).
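Because parse=TRUE hands each label string to R's parser, a label that is not a valid expression will cause an error. The base R sketch below checks label strings before plotting; the helper parses_ok() is a hypothetical name introduced here for illustration and is not part of ggtree.

```r
# Hypothetical helper: TRUE if the string can be parsed as an R expression
parses_ok <- function(s) {
  tryCatch({
    parse(text = s)
    TRUE
  }, error = function(e) FALSE)
}

genus <- c("Gorilla", "Pan", "Homo", "Pongo")
species <- c("gorilla", "spp.", "sapiens", "pygmaeus")

# Same construction as the labels used for Figure A.3C (without geo)
lab <- paste0("italic(", genus, ")~bolditalic(", species, ")")
all_ok <- all(vapply(lab, parses_ok, logical(1)))
all_ok

# A space inside an unquoted name is not valid; quoting the text fixes it
bad <- parses_ok("italic(Homo sapiens)")
good <- parses_ok('italic("Homo sapiens")')
c(bad = bad, good = good)
```

Labels that contain spaces or other special characters can be wrapped in double quotes inside the plotmath call, as in the first example of this section.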

A.4.4 Avoid overlapping text labels

Users can use the ggrepel package to repel overlapping text labels24.

For example:

library(ggrepel)
library(ggtree)
raxml_file <- system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3", package="treeio")
raxml <- read.raxml(raxml_file)
ggtree(raxml) + geom_label_repel(aes(label=bootstrap, fill=bootstrap)) + 
  theme(legend.position = c(.1, .8)) + scale_fill_viridis_c()

Figure A.4: Repel labels. Repel labels to avoid overlapping.

A.4.5 Bootstrap values from newick format

It’s quite common to store bootstrap values as node labels in newick format. Visualizing node labels is easy using geom_text2(aes(subset = !isTip, label=label)).

If you only want to display a subset of bootstrap values (e.g. bootstrap > 80), you can’t simply use geom_text2(aes(subset = (label > 80), label=label)), since label is a character vector that contains both node labels (bootstrap values) and tip labels (taxon names). Using geom_text2(aes(subset = (as.numeric(label) > 80), label=label)) also fails, because the coercion introduces NAs. We need to convert the NAs to logical FALSE, which can be done with the following code:

nwk <- system.file("extdata/RAxML","RAxML_bipartitions.H3", package='treeio')
tr <- read.tree(nwk)
ggtree(tr) + geom_text2(aes(label=label, subset = !is.na(as.numeric(label)) & as.numeric(label) > 80))
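The coercion behaviour behind that subset expression can be checked in base R. In this sketch the label vector is made up to mimic a mix of tip names, an empty root label, and bootstrap values.

```r
# Node labels from a newick tree mix tip names and bootstrap values
label <- c("Tip_A", "Tip_B", "Tip_C", "", "95", "62")

# Coercion turns non-numeric labels into NA (warning suppressed here)
boot <- suppressWarnings(as.numeric(label))

# NA > 80 is NA, so combine with !is.na() to get a clean logical vector
keep <- !is.na(boot) & boot > 80
keep
label[keep]
```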

Another solution is converting the bootstrap value outside ggtree.

q <- ggtree(tr)
d <- q$data
d <- d[!d$isTip,]
d$label <- as.numeric(d$label)
## drop NA values (e.g. an empty root label) before filtering
d <- d[!is.na(d$label) & d$label > 80,]

q + geom_text(data=d, aes(label=label))

A.5 Different x labels for different facet panels

This is not supported by ggplot2 in general. However, we can draw a text label for each panel and place the labels outside the plot panels, as demonstrated in Figure A.5.

library(ggtree)
library(ggplot2)
set.seed(2019-05-02)
x <- rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label, 
                value = rnorm(30))
p2 <- facet_plot(p, panel = "Dot", data = d, 
            geom = geom_point, mapping = aes(x = value)) 
            
p2 <- p2 + theme_bw() + 
    xlim_tree(5) + xlim_expand(c(-5, 5), 'Dot') 

d = data.frame(.panel = c('Tree', 'Dot'), 
                lab = c("Distance", "Dot Units"), 
                x=c(2.5,0), y=-2)

p2 + scale_y_continuous(limits=c(0, 31), 
                        expand=c(0,0), 
                        oob=function(x, ...) x) +
    geom_text(aes(label=lab), data=d) + 
    coord_cartesian(clip='off')  + 
    theme(plot.margin=margin(6, 6, 40, 6))

Figure A.5: X-axis titles for different facet panels.

A.6 Plot something behind the phylogeny

The ggtree function plots the tree structure, and normally we add layers on top of the tree.

set.seed(1982)
x <- rtree(5)
p <- ggtree(x) + geom_hilight(7, alpha=1)

If we want the layers behind the tree layer, we can reverse the order of all the layers.

p$layers <- rev(p$layers)

Another solution is to use ggplot() instead of ggtree() and + geom_tree() to add the layer of tree structure at the correct position of layer stack.

ggplot(x) + geom_hilight(7, alpha=1) + geom_tree() + theme_tree()     

Figure A.6: Add layers behind tree structure. A layer on top of the tree structure (A). Reverse layer order of A (B). Add layer behind the tree layer (C).

A.7 Enlarge center space in circular/fan layout tree

This question has been asked several times25, and a published example can be found in https://www.ncbi.nlm.nih.gov/pubmed/27605062. Increasing the proportion of white space at the center of a circular tree helps to avoid overlapping tip labels and improves the readability of the tree by moving all nodes and branches further out. This can be done simply by using + xlim() to allocate more space, as in Figure 4.3G, or by assigning a long root branch, which is similar to the “Root Length” parameter in FigTree.

set.seed(1982)
tree <- rtree(30)
cowplot::plot_grid(
  ggtree(tree, layout='circular') + xlim(-10, NA),
  ggtree(tree, layout='circular') + geom_rootedge(5),
  labels = c("A", "B"), ncol=2
)

Figure A.7: Enlarge center space in circular tree. Allocate more space by xlim (A) or long root branch (B).

A.8 Use the most distant tip from the root as the origin of the time scale

The revts() function reverses the x-axis by setting the most recent tip to 0. We can use scale_x_continuous(labels=abs) to label the x-axis using absolute values.

tr <- rtree(10)
p <- ggtree(tr) + theme_tree2()
p2 <- revts(p) + scale_x_continuous(labels=abs)
cowplot::plot_grid(p, p2, ncol=2, labels=c("A", "B"))

Figure A.8: Origin of the time scale. Forward: from the root to tips (A). Backward: from the most distant tip to the root (B).
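The effect can be illustrated with plain numbers. This is a sketch of the idea under the assumption that the reversal amounts to shifting x so that the tip farthest from the root sits at 0; it is not taken from the revts() source, and the x values are made up.

```r
# Made-up x positions (distance from the root) for a few nodes
x <- c(root = 0, node = 1.2, tip1 = 2.5, tip2 = 3.0)

# Shift so that the most distant tip is at 0; everything else is negative
x_rev <- x - max(x)
x_rev

# abs() turns the negative positions into positive axis labels
abs(x_rev)
```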

A.9 Changing branch length of outgroup

When outgroups are on very long branches (Figure A.9A), we may want to keep the outgroups in the tree but ignore their branch lengths (Figure A.9B)26. This can easily be done by modifying the coordinates of the outgroups.

x <- read.tree("data/long-branch-example.newick")
m <- MRCA(x, 75, 76)
y <- groupClade(x, m)
p <- p1 <- ggtree(y, aes(linetype = group)) + 
  geom_tiplab(size = 2) +
  theme(legend.position = 'none')
p$data[p$data$node %in% c(75, 76), "x"] <- mean(p$data$x)
cowplot::plot_grid(p1, p, ncol=2)

Figure A.9: Changing branch length of outgroup. Original tree (A) and reduced outgroup branch length version (B).

A.10 Edit tree graphic details

It can be hard for ordinary users to modify plot details using ggplot2/ggtree. We recommend using the export package to export ggtree output to a ‘Microsoft Office’ document and editing the tree graphic in ‘PowerPoint’.