PhyKIT is a toolkit for the UNIX shell environment with numerous functions for processing multiple sequence alignments and phylogenies across a broad range of applications.
If you find PhyKIT useful, please cite "PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data." Bioinformatics. doi: 10.1093/bioinformatics/btab096.
Quick Start
The two commands below are the quickest way to install and run PhyKIT.
# install
pip install phykit
# run
phykit -h
1) Installation
To install using pip, we strongly recommend creating a virtual environment to avoid software dependency conflicts. To do so, execute the following commands:
# create virtual environment
python -m venv venv
# activate virtual environment
source venv/bin/activate
# install phykit
pip install phykit
Note that the virtual environment must be activated whenever you use PhyKIT.
When you are done using PhyKIT, you can deactivate the virtual environment with the following command:
# deactivate virtual environment
deactivate
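Because PhyKIT is only available while the virtual environment is active, a quick check before running it can save some confusion. The following is a minimal sketch, not part of PhyKIT: the function name `check_phykit_env` is hypothetical, and it assumes the environment was created with `venv`, whose activation script sets the `VIRTUAL_ENV` variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of PhyKIT): reports whether a Python
# virtual environment is active and whether phykit is on the PATH.
check_phykit_env() {
    # venv's activate script exports VIRTUAL_ENV; deactivate unsets it.
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "virtual environment active: ${VIRTUAL_ENV}"
    else
        echo "no virtual environment active"
    fi

    # command -v succeeds only if the phykit executable can be found.
    if command -v phykit >/dev/null 2>&1; then
        echo "phykit found: $(command -v phykit)"
    else
        echo "phykit not found on PATH"
    fi
}

check_phykit_env
```

If the check reports that no virtual environment is active, running `source venv/bin/activate` from the directory that contains `venv` should make `phykit` available again.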
Similarly, when installing from source, we strongly recommend using a virtual environment. To do so, use the following commands:
# download
git clone https://github.com/JLSteenwyk/PhyKIT.git
cd PhyKIT/
# create virtual environment
python -m venv venv
# activate virtual environment
source venv/bin/activate
# install
make install
To deactivate your virtual environment, use the following command:
# deactivate virtual environment
deactivate
Note that the virtual environment must be activated again before you use PhyKIT.
To install via Anaconda, execute the following command:
conda install bioconda::phykit
For more information, see: https://anaconda.org/bioconda/phykit
2) Usage
To display the PhyKIT help message, run:
phykit -h
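Beyond the help message, PhyKIT functions are run as subcommands of `phykit`. As a rough sketch, the commands below write a toy alignment and, if `phykit` is installed, pass it to the alignment length function. The file name `toy.fa` and its sequences are invented for illustration, and the subcommand name `alignment_length` is an assumption that should be confirmed against `phykit -h` for your installed version.

```shell
#!/bin/sh
# Toy alignment for illustration only; the sequences are invented.
cat > toy.fa <<'EOF'
>taxon_a
ATGCATGC-A
>taxon_b
ATGCATGCTA
>taxon_c
ATG-ATGCTA
EOF

# Run PhyKIT only if it is installed (e.g., inside the activated venv).
if command -v phykit >/dev/null 2>&1; then
    # Assumed subcommand name; check `phykit -h` to confirm it.
    phykit alignment_length toy.fa
else
    echo "phykit not found; activate the environment or install it first" >&2
fi
```

Guarding the call with `command -v` keeps the script from failing with an unhelpful "command not found" error when the virtual environment has not been activated.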
- About
- Usage
- Quick start
- General usage
- Functions by analytical category
- Alignment quality & statistics
- Alignment & dataset utilities
- Tree summary statistics
- Tree manipulation & utilities
- Tree comparison & consensus
- Introgression & gene flow
- Phylogenetic signal
- Trait evolution
- Phylogenetic comparative methods
- Evolutionary rate analysis
- Homology assessment
- Saturation & model adequacy
- Alignment-based functions
- Alignment entropy
- Alignment length
- Alignment length no gaps
- Alignment outlier taxa
- Alignment recoding
- Alignment subsampling
- Column score
- Composition per taxon
- Compositional bias per site
- Create concatenation matrix
- Evolutionary rate per site
- Faidx
- Guanine-cytosine (GC) content
- Identity matrix
- Mask alignment
- Occupancy filter
- Occupancy per taxon
- Pairwise identity
- Parsimony informative sites
- Phylo GWAS
- Plot alignment QC
- Protein-to-nucleotide alignment
- Relative composition variability
- Relative composition variability, taxon
- Rename FASTA entries
- Sum-of-pairs score
- Taxon groups
- Variable sites
- Tree-based functions
- Ancestral state reconstruction
- Bipartition support statistics
- Branch length multiplier
- Character map (synapomorphy/homoplasy mapping)
- Chronogram
- Collapse bipartitions
- Concordance-aware ancestral state reconstruction
- Consensus network
- Consensus tree
- Continuous trait evolution model comparison (fitContinuous)
- Continuous trait mapping (contMap)
- Cophylogenetic plot (tanglegram)
- Covarying evolutionary rates
- D-statistic (ABBA-BABA test)
- Degree of violation of the molecular clock
- Density map
- DFOIL test (Pease & Hahn 2015)
- Discordance asymmetry
- Discrete trait evolution model comparison (fitDiscrete)
- Disparity through time (DTT)
- Evolutionary rate
- Evolutionary tempo mapping
- Faith's phylogenetic diversity
- Hidden paralogy check
- Hybridization analysis
- Independent contrasts (PIC)
- Internal branch statistics
- Internode labeler
- Kuhner-Felsenstein distance
- Last common ancestor subtree
- Lineage-through-time plot and gamma statistic
- Long branch score
- Monophyly check
- Multi-regime OU models (OUwie)
- Nearest neighbor interchange
- NeighborNet
- Network signal
- OU shift detection (l1ou)
- Parsimony score
- Patristic distances
- Phenogram (traitgram)
- Phylogenetic ANOVA / MANOVA
- Phylogenetic GLM
- Phylogenetic heatmap
- Phylogenetic imputation
- Phylogenetic logistic regression
- Phylogenetic ordination
- Phylogenetic path analysis
- Phylogenetic regression (PGLS)
- Phylogenetic signal
- Phylomorphospace
- Polytomy testing
- Print tree
- Prune tree
- Quartet network
- Quartet pie chart
- Rate heterogeneity test (multi-rate Brownian motion)
- Rename tree tips
- Robinson-Foulds distance
- Root tree
- SIMMAP summary
- Spectral discordance decomposition
- Spurious homolog identification
- Stochastic character mapping (SIMMAP)
- Subtree pruning and regrafting
- Terminal branch statistics
- Threshold model
- Tip labels
- Tip-to-tip distance
- Tip-to-tip node distance
- Total tree length
- Trait correlation
- Trait rate map
- Transfer annotations
- Tree space visualization
- Treeness
- Alignment- and tree-based functions
- Tutorials
- 1. Summarizing information content
- 2. Evaluating gene-gene covariation
- 3. Identifying signatures of rapid radiations
- 4. Evaluating the accuracy of a multiple sequence alignment
- 5. Mapping the evolutionary history of discrete traits
- 6. Testing for phylogenetic signal in continuous traits
- 7. Phylogenetic ordination for multivariate trait analysis
- 8. Visualizing trait evolution with phylomorphospace
- 9. Phylogenetic regression (PGLS)
- 10. Phylogenetic GLM for binary and count data
- 11. Reconstructing ancestral trait values and mapping them onto a phylogeny
- Step 0: Prepare data
- Step 1: Run fast ancestral reconstruction with confidence intervals
- Step 2: Use the VCV-based ML method
- Step 3: Generate a contMap plot
- Step 4: Use a multi-trait file
- Step 5: Export results as JSON
- Step 6: Reconstruct discrete traits
- Step 7: Choose a discrete model
- Step 8: Plot discrete ancestral states
- Summary
- 12. Spectral discordance decomposition
- 13. Testing for rate heterogeneity across phylogenetic regimes
- 14. Visualization commands
- 15. Comparing continuous trait evolution models
- 16. Multi-regime OU models (OUwie)
- 17. Automatic detection of adaptive shifts on a phylogeny
- 18. Visualizing conflicting phylogenetic signal with splits networks
- 19. End-to-end comparative methods workflow
- 20. Gene tree discordance analysis pipeline
- Scenario
- Overview
- Data files used in this tutorial:
- Step 1: Visualize gene tree conflict with a splits network
- Step 2: Test for rate-topology associations with evolutionary tempo mapping
- Step 3: Test for asymmetric discordance (gene flow detection)
- Step 4: Identify diversification patterns with LTT
- Step 5: Concordance-aware ancestral state reconstruction
- Step 6: Discordance-aware comparative methods
- Putting it all together
- 21. Customizing phylogenetic plots
- Change log
- Other software
- FAQ