Change log
Major changes to PhyKIT are summarized here.
2.1.31:
Added evolutionary tempo mapping command (evo_tempo_map / etm) for
detecting rate-topology associations in phylogenomic datasets:
Compares branch length distributions between concordant and discordant gene trees at each species tree branch via bipartition matching
Tests for differences using Mann-Whitney U and permutation tests with Benjamini-Hochberg FDR correction across branches
Reports global treeness (internal/total branch length ratio) comparison between concordant and discordant gene trees
Optional
--plotflag generates a grouped box/strip plot showing concordant vs. discordant branch lengths per species tree branchSupports
--jsonand-v/--verboseoutput modesCross-validated against existing PhyKIT tools: bipartition matching matches
concordance_asrgCF, treeness matchestreenesscommand, FDR correction matchesrelative_rate_testUpdated tutorial 19 (gene tree discordance pipeline) with a new evolutionary tempo mapping step including expected output and figures
Enhanced command reference documentation with example output and plot
2.1.30: Added phylogenetic effect size (R² variance decomposition) to all phylogenetic comparative method commands:
phylogenetic_signal: reports R²_phylo = 1 - (σ²_BM / σ²_WN), the fraction of trait variance explained by phylogenetic structurephylogenetic_regression(PGLS): reports three-way decomposition: R²_total (phylo + predictor), R²_pred (predictor given phylogeny), and R²_phylo (phylogeny's unique contribution)phylogenetic_glm: reports McFadden's pseudo-R² for both Poisson GEE and Logistic MPLE, computed from full vs. intercept-only model log-likelihoodsfit_continuous: reports per-model R² = 1 - (σ²_model / σ²_White), measuring how much each evolutionary model reduces unexplained variance compared to white noiserate_heterogeneity: reports R²_regime, the variance reduction from regime-specific rates vs. a single rate, weighted by tips per regimeouwie: reports per-model R² = 1 - (σ²_model / σ²_BM1), measuring improvement over the simplest Brownian motion baselineAll effect sizes appear in both text and JSON output
R validation scripts provided in
tests/r_validation/
2.1.29: Added discordance-aware VCV matrix support for phylogenetic comparative methods:
When gene trees are provided via
-g/--gene-trees, all five phylogenetic comparative method commands now compute a genome-wide average VCV matrix from per-gene-tree VCVs instead of using the species tree aloneThis accounts for incomplete lineage sorting (ILS) and introgression, giving more accurate covariance estimates for downstream analyses
Affected commands:
phylogenetic_signal,phylogenetic_regression(PGLS),fit_continuous,phylogenetic_ordination, andphylogenetic_glmAlgorithm: parse gene trees from a multi-Newick file, prune to shared taxa, build a VCV matrix from each gene tree, average them, and correct to nearest positive semi-definite matrix via eigenvalue clipping
Auto-prunes gene trees to the intersection of taxa shared across the species tree and all gene trees; errors if fewer than 3 shared taxa
JSON output includes
vcv_metadatawith number of gene trees used, number of shared taxa, and whether PSD correction was appliedWhen no gene trees are provided, behavior is unchanged (full backward compatibility)
Consolidated duplicated
_build_vcv_matrixcode from 5 service files into a sharedvcv_utilsmodule
2.1.28: Added Felsenstein (2012) threshold model for trait correlation via MCMC:
Added new
threshold_modelcommand (aliases:threshold,thresh,threshbayes,thresh_bayes) for estimating evolutionary correlations between binary discrete and/or continuous traits using a latent-liability Brownian motion modelImplements the Gibbs/Metropolis-Hastings MCMC sampler following phytools::threshBayes (Revell 2014): binary characters are modelled as continuous liabilities crossing a threshold at 0
Supports all trait combinations: discrete+continuous, discrete+discrete, and continuous+continuous
Adaptive proposal tuning during burn-in targeting ~23% acceptance rate
Output includes posterior summary (mean, median, 95% HPD) for the correlation (r), rate parameters (sigma2), and ancestral values
Optional
--plotgenerates a 3x2 figure: trace plots (left column) and posterior density histograms with 95% HPD shading (right column) for convergence diagnostics and posterior visualizationJSON output (
--json) with full posterior samples for custom analysisReproducible results via
--seedfor random number generationCross-validated against R's phytools::threshBayes: posterior means and 95% intervals agree within expected MCMC sampling variation
2.1.27: Added lineage-through-time (LTT) plot and Pybus & Harvey gamma statistic:
Added new
lttcommand (aliases:gamma_stat,gamma) for testing temporal variation in diversification ratesImplements the Pybus & Harvey (2000) gamma statistic: under constant-rate pure-birth, gamma ~ N(0,1); negative = decelerating diversification, positive = accelerating
Optional
--plot-outputgenerates a step-function LTT plot with log-scaled y-axis showing lineage accumulation through timeVerbose mode (
-v) prints branching times and full LTT dataJSON output support via
--jsonValidated against R's
ape::gammaStat()(ape v5.8.1, R 4.4.0): gamma values match to 10 decimal places across 4 test topologies (balanced 8-tip: -1.4142135624, ladder 5-tip: -0.7142857143, recent burst 10-tip: 2.2824790785, early burst 7-tip: -3.5362021857)
2.1.26: Added phylogenetic signal on networks:
Added new
network_signalcommand (aliases:netsig,net_signal) for computing Blomberg's K and Pagel's lambda on phylogenetic networks using the Bastide et al. (2018) variance-covariance algorithmAccounts for hybridization/introgression when estimating how strongly a continuous trait tracks evolutionary history
Accepts explicit hybrid edge specifications (
--hybrid donor:recipient:gamma) or auto-infers fromquartet_networkJSON output (--quartet-json)Blomberg's K on phylogenetic networks is a novel capability not available in any other tool; Pagel's lambda on networks was previously only available in Julia (PhyloNetworks.jl)
K and lambda formulas are identical to
phylogenetic_signal; only the VCV matrix differs (network VCV vs tree VCV)
2.1.25: Added Tajima's relative rate test:
Added new
relative_rate_testcommand (aliases:rrt,tajima_rrt) for testing whether two lineages evolved at equal ratesImplements Tajima's (1993) chi-squared test on unique substitution counts (m1/m2) relative to an outgroup
Single alignment mode (
-a) and batch mode (-l) for multi-gene analysisAutomatic outgroup inference from rooted tree
Bonferroni and Benjamini-Hochberg FDR multiple testing correction
JSON output support via
--jsonValidated against R's
pegas::rr.test(): chi-squared statistics and p-values match to machine precision (< 1e-8 difference)
2.1.24: Added quartet-based network inference (NANUQ-style) for distinguishing ILS from hybridization:
Added new
quartet_networkcommand (aliases:quartet_net,qnet,nanuq) for computing quartet concordance factors from gene trees and classifying each quartet as tree-like, hybrid, or unresolvedImplements the NANUQ algorithm (Allman, Baños & Rhodes 2019): star test (Pearson chi-squared against uniform 1/3) followed by T3 tree model test (G-test / likelihood ratio with conservative chi-squared df=1 p-value)
Separate
--alpha(tree test threshold, default 0.05) and--beta(star test threshold, default 0.95) parameters matching MSCquartets--plot-outputoption to generate a species tree with reticulation arcs overlaid for hybrid quartets--missing-taxa sharedsupport for trees with different taxon setsJSON output support via
--jsonAdded new CLI entry points:
pk_quartet_network,pk_quartet_net,pk_qnet,pk_nanuqValidated against R's MSCquartets v3.2
NANUQ()function:Star test (p_star) p-values match R exactly
T3 tree test (p_tree) p-values match R exactly for large samples; small-sample values are slightly conservative (e.g., 0.096 vs 0.214 for counts 8,0,2) but yield identical classifications
All 15 quartets from the sample gene tree file classified identically to R's NANUQ (100% agreement)
Counts
p_star (PK)
p_star (R)
p_tree (PK)
p_T3 (R)
(70, 15, 15)
0.0000
0.0000
1.0000
1.000
(45, 35, 20)
0.0087
0.0087
0.0418
0.042
(10, 0, 0)
0.0000
0.0000
1.0000
1.000
(8, 0, 2)
0.0055
0.0055
0.0959
0.214
Classifications agree in all cases. The p_tree difference for small samples (8,0,2) is due to MSCquartets using a specialized T3 density integration for the p-value; the conservative chi-squared(df=1) approach is a well-established approximation that improves with sample size.
2.1.22: Added consensus splits network for visualizing conflicting phylogenetic signal:
Added new
consensus_networkcommand (aliases:consnet,splitnet,splits_network) for extracting bipartition splits from gene trees and summarizing conflicting phylogenetic signalCounts frequency of each non-trivial bipartition across input trees
--thresholdoption to filter splits by minimum frequency (default: 0.1)--plot-outputoption to generate a circular splits network diagram--missing-taxa sharedsupport for trees with different taxon setsJSON output support via
--jsonAdded new CLI entry points:
pk_consensus_network,pk_consnet,pk_splitnet,pk_splits_network
2.1.21: Added automatic OU shift detection (l1ou):
Added new
ou_shift_detectioncommand (aliases:ou_shifts,l1ou,detect_shifts) for automatic detection of adaptive optimum shifts on a phylogeny using the LASSO-based approach of Khabbazian et al. (2016)No regime file needed — only a tree and trait data
Model selection via pBIC (default), BIC, or AICc
--max-shiftsoption to limit number of candidate shiftsJSON output support via
--jsonAdded new CLI entry points:
pk_ou_shift_detection,pk_ou_shifts,pk_l1ou,pk_detect_shiftsValidated against R's l1ou package: same shift count, alpha, and pBIC on a 100-tip Anolis dataset
2.1.20:
Added taxon occupancy threshold to create_concatenation_matrix:
New
--thresholdoption (default 0) excludes taxa whose effective representation (fraction of informative, non-gap/non-ambiguous characters) falls below the specified valueFiltering is disabled by default; set
--threshold 0.5(for example) to exclude poorly represented taxaExcluded taxa are reported to stderr with their effective occupancy
JSON output includes
thresholdandexcluded_taxafields
2.1.19: Added multi-regime Ornstein-Uhlenbeck models (OUwie):
Added new
ouwiecommand (aliases:fit_ouwie,multi_regime_ou) for fitting multi-regime OU models of continuous trait evolutionSeven models: BM1, BMS, OU1, OUM, OUMV, OUMA, OUMVA (Beaulieu et al. 2012)
Regime assignments to internal branches via Fitch parsimony
Model comparison via AIC, AICc, BIC, and AICc weights
JSON output support via
--jsonAdded new CLI entry points:
pk_ouwie,pk_fit_ouwie,pk_multi_regime_ouResults validated against R 4.4.0 (
OUwiev2.10 withroot.station=FALSE):Model
PhyKIT LL
R OUwie LL
Diff
Notes
BM1
-11.5697
-11.5697
< 1e-4
Exact match
BMS
-11.2046
-11.1357
0.069
Rooting artifact (R adds 1e-6 branch)
OU1
-10.2890
-11.5697
1.281
R stuck at alpha=0 (BM boundary)
OUM
-8.6297
-10.9823
2.353
R stuck at alpha=0 (BM boundary)
OUMV
-6.9859
-10.2705
3.285
R stuck at alpha=0 (BM boundary)
OUMA
-6.9859
-6.9892
0.003
Excellent match
OUMVA
-6.9859
-7.0063
0.020
Very close
BM1 matches R to machine precision. OUMA and OUMVA agree within 0.003-0.02 log-likelihood units. For OU1, OUM, and OUMV, R's OUwie optimizer converges to alpha=0 (the Brownian motion boundary), while PhyKIT's multi-interval search finds genuinely better OU optima with positive alpha. BMS shows a small difference (0.07 LL units) attributable to R's
resolve.root=TRUEadding a 1e-6 length branch and optimizer convergence differences.
2.1.18: Added phylogenetic generalized linear models for binary and count data:
Added new
phylogenetic_glmcommand (aliases:phylo_glm,pglm) for fitting phylogenetic GLMsBinomial family: logistic regression via Maximum Penalized Likelihood Estimation (logistic_MPLE; Ives & Garland 2010) with Firth's penalty. Log-likelihood computed via pruning algorithm for a 2-state CTMC on the phylogeny. Fisher information computed via O(n) tree-based three-point algorithm.
Poisson family: Poisson regression via Generalized Estimating Equations (poisson_GEE; Paradis & Claude 2002) with overdispersion estimation
Jointly estimates phylogenetic signal parameter alpha (binomial) or overdispersion phi (Poisson)
JSON output support via
--jsonAdded new CLI entry points:
pk_phylogenetic_glm,pk_phylo_glm,pk_pglmResults validated against R 4.4.0 (
phylolm::phyloglm()):Poisson GEE (
count_trait ~ body_mass):Parameter
PhyKIT
R
phyloglmDifference
Intercept
0.6741
0.6741
< 1e-4
body_mass
0.5968
0.5968
< 1e-4
SE(Intercept)
0.1678
0.1678
< 1e-4
SE(body_mass)
0.0877
0.0877
< 1e-4
Overdispersion (phi)
0.1730
0.1730
< 1e-4
Poisson GEE matches R to within numerical precision.
Logistic MPLE (
binary_trait ~ body_mass):Parameter
PhyKIT
R
phyloglmDifference
Intercept
-2.2374
-2.1210
0.116
body_mass
2.2432
2.2158
0.027
alpha
0.0215
0.0274
0.006
Log-likelihood
-1.833
-1.870
0.037
AIC
9.665
9.740
0.075
Logistic MPLE coefficients agree to within ~5%. Small differences arise from how R's
ape::branching.times()computes node heights for non-ultrametric trees, which slightly affects the Ives & Garland branch length transformation for the Fisher information penalty. Both implementations use the same 2-state CTMC pruning log-likelihood (verified to match R exactly at -1.870 when evaluated at R's optimal parameters).
2.1.17:
Unified phylogenetic PCA and dimensionality reduction into a single
phylogenetic_ordination command, and added continuous trait evolution
model comparison:
Merged
phylogenetic_pcaandphylogenetic_dimreduceinto a unifiedphylogenetic_ordinationcommand (aliases:phylo_ordination,ordination,ord) supporting PCA, t-SNE, and UMAP via--methodAll previous aliases remain functional:
phylo_pca,phyl_pca,ppca,phylo_dimreduce,dimreduce,pdrPCA's old
-m/--method(BM/lambda) is now--correctionGLS-centering via phylogenetic VCV matrix for all methods
Auto-adjusted parameters for small datasets (t-SNE/UMAP)
Optional Pagel's lambda correction via
--correction lambdaOptional scatter plot with phylogeny overlay (
--plot,--plot-tree)JSON output support via
--jsonAdded new CLI entry points:
pk_phylogenetic_ordination,pk_phylo_ordination,pk_ordination,pk_ord
Added continuous trait evolution model comparison:
Added new
fit_continuouscommand (aliases:fitcontinuous,fc) for comparing models of continuous trait evolution on a phylogeny, analogous to R'sgeiger::fitContinuous()Fits 7 models: BM, OU, EB, Lambda, Delta, Kappa, White
Ranks models by AIC, BIC, and AIC weights
Optional subset of models via
--modelsJSON output support via
--jsonAdded new CLI entry points:
pk_fit_continuous,pk_fitcontinuous,pk_fc
2.1.16: Added visualization commands and rate heterogeneity test:
Added new
cont_mapcommand (aliases:contmap,cmap) for plotting a phylogram with branches colored by continuous trait values via ML ancestral reconstruction (analogous to R'sphytools::contMap())Added new
density_mapcommand (aliases:densitymap,dmap) for plotting posterior probabilities of discrete character states along phylogeny branches from stochastic character mapping (analogous to R'sphytools::densityMap())Added new
phenogramcommand (aliases:traitgram,tg) for plotting a phenogram (traitgram) showing trait evolution along a phylogeny with X-axis = distance from root, Y-axis = trait value (analogous to R'sphytools::phenogram())Added new
cophylocommand (aliases:tanglegram,tangle) for plotting cophylogenetic tanglegrams of two phylogenies with connecting lines between matching taxa and node rotation to minimize crossings (analogous to R'sphytools::cophylo())
Added rate heterogeneity test:
Added new
rate_heterogeneitycommand (aliases:brownie,rh) for testing whether continuous trait evolution rates differ across phylogenetic regimes using multi-rate Brownian motion (O'Meara et al. 2006), analogous to R'sphytools::brownie.lite()Fits single-rate vs. multi-rate BM models and performs a likelihood ratio test (chi-squared)
Optional parametric bootstrap via
-n/--nsimRegime assignments to internal branches inferred via Fitch parsimony
Optional
--plotargument for regime-colored phylogramJSON output support via
--jsonAdded new CLI entry points:
pk_rate_heterogeneity,pk_brownie,pk_rhResults validated against R 4.4.0 (
phytools::brownie.litewithpaintSubTree(stem=TRUE)):PhyKIT uses Fitch parsimony for regime assignment, which matches R's
paintSubTree(stem=TRUE)behavior.Parameter
PhyKIT
R
brownie.liteDifference
Single-rate sigma2
0.03841
0.03841
< 1e-11
Single-rate LL
-11.56968
-11.56968
< 1e-14
Single-rate anc. state
1.64469
1.64469
< 1e-15
Multi-rate sigma2 (terrestrial)
0.05002
0.04998
3.9e-05
Multi-rate sigma2 (aquatic)
0.00881
0.00889
8.1e-05
Multi-rate LL
-11.20459
-11.20461
1.6e-05
Chi-squared p-value
0.39283
0.39284
1.1e-05
Single-rate model matches to machine precision. Multi-rate model matches to within optimizer convergence tolerance (both converge to the same flat likelihood plateau).
2.1.15: Added ancestral state reconstruction:
Added new
ancestral_state_reconstructioncommand (aliases:asr,anc_recon) for estimating ancestral states of continuous traits using maximum likelihood, analogous to R'sphytools::fastAnc()andape::ace(type="ML")Two methods:
fast(two-pass Felsenstein's algorithm, O(n)) andml(full VCV-based ML with exact conditional CIs, O(n^3))Optional
--ciflag to include 95% confidence intervalsOptional
--plotargument to generate a contMap plot showing continuous trait values mapped onto the phylogeny with a color gradientSupports both two-column single-trait files and multi-trait files with
-cflag to select a trait columnJSON output support via
--jsonAdded new CLI entry points:
pk_ancestral_state_reconstruction,pk_asr,pk_anc_reconResults validated against R 4.4.0 (
phytools::fastAncwithvars=TRUE, CI=TRUE):Point estimates —
fastmethod vs R'sphytools::fastAnc()Node
Descendants
PhyKIT
R
fastAncError
N1 (root)
all 8 tips
1.6446924
1.6446924
0.0000000
N2
bear, raccoon
1.7012405
1.7012405
0.0000000
N3
5 taxa
1.4564597
1.4564597
0.0000000
N4
sea_lion, seal
1.8090745
1.8090745
0.0000000
N5
cat, monkey, weasel
1.2565917
1.2565917
0.0000000
N6
cat, monkey
0.9894725
0.9894725
0.0000000
95% CIs —
fastmethod vs R'sfastAnc(CI=TRUE)Node
PhyKIT CI
R CI
Error
N1 (root)
[0.894, 2.396]
[0.894, 2.396]
0.000
N2
[0.970, 2.433]
[0.970, 2.433]
0.000
N3
[0.639, 2.274]
[0.639, 2.274]
0.000
N4
[0.976, 2.642]
[0.976, 2.642]
0.000
N5
[0.355, 2.158]
[0.355, 2.158]
0.000
N6
[-0.565, 2.544]
[-0.565, 2.544]
0.000
fastandmlmethods produce identical point estimates (within 1e-6)Sigma-squared (BM rate) = 0.04389 (matches R's PIC-based estimate within 1e-3)
2.1.13: Added stochastic character mapping (SIMMAP):
Added new
stochastic_character_mapcommand (aliases:simmap,scm) for performing Stochastic Character Mapping of discrete traits onto a phylogeny (Huelsenbeck et al. 2003; Bollback 2006), analogous to R'sphytools::make.simmap()Fits a continuous-time Markov chain (CTMC) rate matrix Q via maximum likelihood using Felsenstein's pruning algorithm
Three substitution models: ER (equal rates), SYM (symmetric), ARD (all rates differ)
Simulates character histories conditioned on tip states via rejection sampling
Reports mean dwelling times, mean transition counts, and posterior node probabilities across simulations
Optional
--plotargument to generate a horizontal phylogram with branches colored by mapped character stateReproducible simulations via
--seedargumentJSON output support via
--jsonResults validated against R 4.4.0 (
phytools::fitMkandphytools::make.simmap):ER log-likelihood: R = -8.7889, PhyKIT = -8.7874 (within 0.002); both are valid ML estimates on a flat likelihood surface
ARD log-likelihood: R = -8.4305, PhyKIT = -8.3845; PhyKIT's multi-start optimizer finds a slightly better local optimum
Total tree length conserved exactly (277.2772)
Dwelling times sum to total tree length across all simulations
Q matrix structural properties verified: rows sum to zero, off-diagonal elements positive, diagonal elements negative
Model nesting confirmed: ARD loglik >= SYM loglik >= ER loglik
Added new CLI entry points:
pk_stochastic_character_map,pk_simmap,pk_scm
2.1.12: Added phylogenetic regression (PGLS):
Added new
phylogenetic_regressioncommand (aliases:phylo_regression,pgls) for fitting Phylogenetic Generalized Least Squares regression while accounting for phylogenetic non-independence among speciesSupports Brownian motion (BM) and Pagel's lambda estimation methods
Outputs coefficient estimates, standard errors, t-values, p-values, R-squared, F-statistic, log-likelihood, and AIC
Supports multiple predictor variables
JSON output support via
--jsonResults validated against R 4.4.0 (manual GLS with
ape::vcv(), matchingcaper::pgls()behavior); coefficients, standard errors, t-values, p-values, R-squared, F-statistic, log-likelihood, and AIC match to at least four decimal placesUses the raw phylogenetic VCV matrix (not the normalized correlation matrix used by
nlme::glswithcorBrownian)Added new CLI entry points:
pk_phylogenetic_regression,pk_phylo_regression,pk_pgls
2.1.11: Maintenance:
Added
matplotlib>=3.7.0as a required dependency
2.1.10: Added phylomorphospace:
Added new
phylomorphospacecommand (aliases:phylomorpho,phmo) for plotting raw traits with the phylogeny overlaid via ML-reconstructed ancestral states at internal nodesTree edges colored by distance from root (coolwarm colormap with colorbar)
Optional
--color-byfor tip point coloring (continuous or discrete)Auto-selects first two trait columns when the file has exactly 2 traits and
--trait-x/--trait-yare omittedJSON output support via
--jsonAdded new CLI entry points:
pk_phylomorphospace,pk_phylomorpho,pk_phmo
2.1.9: Added phylogenetic PCA:
Added new
phylogenetic_pcacommand (aliases:phylo_pca,phyl_pca,ppca) implementing Revell (2009) phylogenetic PCA for multi-trait dataTwo methods:
BM(Brownian motion) andlambda(joint Pagel's lambda estimation across all traits)Two modes:
cov(covariance PCA) andcorr(correlation PCA)Tab-delimited multi-trait file input with header row, comment/blank-line support, and automatic taxon-mismatch handling (intersection with stderr warnings)
Optional
--plotargument to generate PCA scatter plot (PC1 vs PC2) with taxon labels and variance-explained axesJSON output support via
--jsonAdded new CLI entry points:
pk_phylogenetic_pca,pk_phylo_pca,pk_phyl_pca,pk_ppcaBenchmarked against R phytools
phyl.pca()(Revell 2012): eigenvalues, loadings, and scores match across all method/mode combinations within 1e-4
2.1.8: Added phylogenetic signal analysis:
Added new
phylogenetic_signalcommand (aliases:phylo_signal,ps) with support for Blomberg's K and Pagel's lambdaBlomberg's K includes permutation-based p-value (configurable
--permutations)Pagel's lambda uses ML optimization with likelihood ratio test p-value
Tab-delimited trait file input with comment/blank-line support and automatic taxon-mismatch handling (intersection with stderr warnings)
JSON output support via
--jsonAdded new CLI entry points:
pk_phylogenetic_signal,pk_phylo_signal,pk_psValidated against R phytools
phylosig()across 95 simulated datasets (5-50 tips; pure-birth, coalescent; random, BM, and known-lambda traits): Pearson r = 1.0 for K, r > 0.999 for lambda, log-likelihood, and LRT p-value
2.1.7: Added a missing-taxa-aware consensus tree utility:
Added new
consensus_treecommand (aliases:consensus,ctree) with strict and majority-rule modesAdded
--missing-taxahandling:error(default) orshared(prune all trees to the shared taxa set)Added JSON output support for consensus metadata and Newick output
Added new CLI entry points:
pk_consensus_tree,pk_consensus,pk_ctreeAdded unit/integration test coverage for parsing, alias dispatch, and missing-taxa behavior
Updated usage documentation and top-level command help text
2.1.6: Expanded plotting support for alignment/tree QC workflows:
Added optional plotting to:
pairwise_identity,saturation,covarying_evolutionary_rates,compositional_bias_per_site,evolutionary_rate_per_site, andalignment_entropyAdded
tip_to_tip_distance --all-pairsmode with optional clustered heatmap output (--plot)Standardized plotting arguments and JSON metadata:
--plot,--plot-output, andplot_outputin JSON payloadsUpdated CLI help text and online usage documentation for all newly plotted commands
Expanded integration test coverage for plotting and JSON+plot behavior and reran full regression and documentation build checks
Removed
cythonfrom runtime dependencies; repository has no Cython build pipeline (no.pyx/cythonizeusage), so this was unnecessarySunset Python 3.9 and 3.10 support; CI, packaging classifiers, and
python_requiresnow target Python 3.11+
2.1.5: JSON output expansion and harmonization:
Added
--jsonsupport to the remaining CLI commands, includingcreate_concatenation_matrixandnearest_neighbor_interchangeStandardized JSON metadata key naming for improved consistency across commands
Added canonical
rowspayloads for list-style JSON outputs while preserving legacy keys for backward compatibilityExpanded integration test coverage for JSON payloads and performed full regression verification
2.1.4: New alignment utilities, masking support, and composition/RCV correctness updates:
Added new alignment commands: -
alignment_entropy(site entropy reporting) -occupancy_per_taxon(per-taxon valid-site occupancy) -composition_per_taxon(per-taxon symbol composition, excluding invalid symbols) -mask_alignment(column masking by gap fraction, occupancy, and optional entropy)Updated saturation slope calculation to use NumPy no-intercept least-squares (fit constrained through the origin), replacing sklearn in this code path
Updated
rcvandrcvthandling to be case-insensitive and to exclude gaps/ambiguous symbols from counts and normalizationClarified valid-length normalization in RCV calculations and related docs/help text
Documentation maintenance updates to improve rendering consistency and remove duplicate changelog maintenance burden
2.1.0: Major performance improvements, expanded Python support, and bug fixes:
Compatibility:
Added support for Python 3.12 and 3.13
Maintains compatibility with Python 3.9, 3.10, and 3.11
Performance Optimizations:
Added multiprocessing support for computationally intensive functions (up to 8x faster): patristic distances, pairwise identity, polytomy test, LB score, bipartition support stats, sum of pairs score, saturation analysis, hidden paralogy check, and covarying evolutionary rates
Implemented tree caching with LRU cache to avoid re-parsing files
Added NumPy vectorization for alignment operations (5-10x faster)
Optimized file I/O with streaming for large concatenation operations
Added pickle-based fast tree copying for NNI operations
Bug Fixes:
Fixed tree caching side effects that caused tree modifications to persist
Fixed spurious sequence detection to correctly use only terminal branches
Fixed DNA threader array broadcasting issues
Standardized error exit codes to 2
Fixed test infrastructure issues
Updated saturation slope fitting to use NumPy no-intercept least-squares (fit constrained through the origin), replacing sklearn in this code path
Updated
rcvandrcvtto be case-insensitive and to exclude gaps/ambiguous symbols from composition counts and normalization; RCV now normalizes each taxon by valid (non-excluded) sequence length
2.0.2: Fixed bug in dna threading associated with how gaps were introduced in codons.
2.0.1: Added arguments to exclude sites with gaps in the pairwise identities and saturation functions.
2.0.0: Codebase overhaul to make PhyKIT more mem efficient and faster. For example, using list comprehension when appropriate.
1.21.0: The partition file outputted from the create_concat function has been updated to the following format: - column 1: alignment name - column 2: # of taxa present - column 3: # of taxa missing - column 4: fraction of occupancy - column 5: names of missing taxa (; separated)
1.20.0: Fixed bug for thread_dna function when using a ClipKIT log file. Input protein alignment must be the untrimmed alignment.
1.19.9: Saturation function now also reports the absolute value of 1-saturation. Lower values are indicative of less saturation.
1.19.4: Saturation function forces y-intercept to be zero when calculating slope
1.19.3: Saturation function now uses uncorrected distances instead of pairwise identities
1.19.2: Verbose pairwise identity reporting separates pairwise identities by tabs and not a dash
1.19.0: Added function to test for site-wise compositional biases in an alignment. See function compositional_bias_per_site.
1.18.0: Added function to estimate site-wise evolutionary rate in an alignment. See function evolutionary_rate_per_site.
1.15.0: Added function to recode alignments based on 8 different recoding schemes (7 for amino acids; 1 for nucleotides). See function recode.
1.14.0: Added an optional argument to the thread_dna function. Now, PhyKIT can thread nucleotide sequences onto a trimmed amino acid alignment. To do so, point PhyKIT to the ClipKIT outputted log file using the -c argument. The ClipKIT log file can be generated when trimming an alignment with ClipKIT by adding the -l argument (see here for more details: https://jlsteenwyk.com/ClipKIT/).
1.12.6: relative composition variability is now adapted for calculating compositional biases in individual taxa. The new function in rcvt (relative composition variability, taxon).
1.12.4: calculations of pairwise identity in alignment now supports excluding pairwise combinations with gaps.
1.12.3: hidden paralogy check now simply looks for monophyly or lack thereof for a set of taxa. Hidden paralogy check still reports insufficient taxon representation.
1.12.2: removed root.txt file from DVMC function. User's are now recommended to trim outgroup taxa beforehand
1.11.3: Added an optional argument to the prune_tree function wherein instead of pruning tips specified in the input file, those tips will be kept.
1.11.1: Modified sum of pairs score to divide the correct number of pairs by the number of pairs in the reference alignment rather than the query alignment alignment
1.11.0: Added terminal_branch_stats (alias: tbs) function to examine terminal branch lengths
1.10.1: Modified column score and sum of pairs score to divide the correct number of columns or pairs by the number of columns or pairs in the query alignment rather than the reference alignment
1.10.0: Added tip_to_tip_node_distance (alias: t2t_node_dist; t2t_nd) function to calculate the phylogenetic distance between two leaves in a phylogeny. Distance is measured in nodes between two leaves
1.9.0: Added monophyly_check (alias: is_monophyletic) function to examine monophyly among a specified set of taxa
1.8.0: Added hidden_paralogy_check (alias: clan_check) function to examine phylogenetic tree for issues of hidden paralogy
1.7.0: Added nearest_neighbor_interchange (alias: nni) function to generate all NNI moves for a binary rooted phylogeny
1.6.0: Added tip_to_tip_distance (alias: t2t_dist; t2t) function to calculate phylogenetic distance between two leaves in a phylogeny
1.5.0: Added root_tree (alias: root; rt) function to root a phylogenetic tree
1.4.0: PhyKIT is now Python version 3.9 and BioPython 1.79 compatible
1.3.0: Added function that estimates the evolutionary rate of a gene using tree-based properties. Function name is 'evolutionary_rate' or 'evo_rate'
1.2.2: added function to get the subtree of the last common ancestor among a set of taxa
1.2.0: added command line interfaces for all functions so that each command can easily be executed. For example, 'phykit aln_len -h' can now be called using 'pk_aln_len -h'
1.1.0: added faidx (alias: get_entry; ge) function to extract fasta entries from a multi-fasta file
1.0.3: added rooting procedure before calculating RF to handle comparing unrooted and rooted trees
1.0.2: function that calculates Robinson Foulds distance (robinson_foulds_distance; rf_distance; rf_dist; rf) now can take trees that differ in topology. PhyKIT will first determine shared tips between the two trees and prune both trees to a common set of tips. Next, PhyKIT will calculate the Robinson Foulds distance.
0.1.3: Added function (column_score; cs) to calculate the quality of an alignment given an input query alignment and a reference alignment to compare it to
0.1.2: Added function (sum_of_pairs_score; sops; sop) to calculate the quality of an alignment given an input query alignment and a reference alignment to compare it to
0.0.9: PhyKIT now handles error stemming from piping output