OrthoHMM uses highly sensitive and specific Hidden Markov Models for orthology inference.
If you found OrthoHMM useful, please cite OrthoHMM: Improved Inference of Ortholog Groups using Hidden Markov Models. Steenwyk et al. 2024, bioRxiv. doi: 10.1101/2024.12.07.627370.
Performance
As of v0.2.0, OrthoHMM ships a built-in profile HMM + k-mer prefilter
search engine that replaces the phmmer subprocess. It scales to
100 bacterial proteomes (~352K total proteins) on a single 32-core node:
| proteomes | wall time | peak RAM | orthogroups |
|---|---|---|---|
| 5 | 13 s | 0.29 GB | 1,196 |
| 20 | 4 min | 0.44 GB | 8,680 |
| 60 | 28 min | 1.65 GB | 19,029 |
| 100 | 77 min | 4.67 GB | 27,328 |
Numbers from the bacterial scaling benchmark (RefSeq, 32 threads).
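As a back-of-the-envelope reading of these numbers (an estimate derived here, not a figure reported by the benchmark), the 5- and 100-proteome rows can be used to gauge how wall time grows with input size. The sketch below fits t ∝ n^k through those two rows; the variable names are illustrative only.

```shell
# Wall times from the 5- and 100-proteome rows, converted to seconds.
n_small=5
t_small=13
n_large=100
t_large=$((77 * 60))   # 77 min

# Two-point power-law fit: k = log(t_large / t_small) / log(n_large / n_small).
exponent=$(awk -v t1="$t_small" -v t2="$t_large" -v n1="$n_small" -v n2="$n_large" \
  'BEGIN { printf "%.2f", log(t2 / t1) / log(n2 / n1) }')

echo "approximate scaling exponent: $exponent"
```

The exponent comes out close to 2, which would be consistent with an all-vs-all search dominating run time. That interpretation is an assumption, and a two-point fit says little about behavior beyond 100 proteomes.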
The legacy phmmer path is still available via
--search_mode phmmer but is no longer the default.
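A minimal sketch of opting into the legacy path is shown below. It assumes a hypothetical input directory `./proteomes` and first checks that both `phmmer` (from HMMER) and `orthohmm` are on `PATH`; only the `--search_mode phmmer` flag is taken from the text above.

```shell
# Hypothetical input directory; replace with your own directory of FASTA files.
proteomes_dir="./proteomes"

if ! command -v phmmer >/dev/null 2>&1; then
  # The legacy path needs HMMER; without it, stay on the default engine.
  legacy_status="phmmer not found: install HMMER or use the default search engine"
elif ! command -v orthohmm >/dev/null 2>&1; then
  legacy_status="orthohmm not found: install it first"
else
  legacy_status="running orthohmm with the legacy phmmer search"
  orthohmm "$proteomes_dir" --search_mode phmmer
fi

echo "$legacy_status"
```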
Quick Start
1. Install external dependencies
OrthoHMM has one required external binary — mcl — used for the
default clustering step. Install via your package manager
(apt install mcl, brew install mcl, conda install -c bioconda
mcl) or from source.
HMMER is optional and only required if you opt into the legacy
--search_mode phmmer pipeline. If you’d rather avoid the mcl
external dependency, --clustering leiden --cpm_resolution auto uses
pure-Python igraph/leidenalg and is competitive on most inputs.
2. Install OrthoHMM
# install
pip install orthohmm
# run
orthohmm <path_to_directory_of_FASTA_files>
Below are more detailed instructions, including alternative installation methods.
1) Installation
If you are having trouble installing OrthoHMM, please contact the lead developer, Jacob L. Steenwyk, via |contactSteenwyk|_ or |blueskySteenwyk|_ to get help.
1. Install external dependencies
OrthoHMM has one required external binary — mcl — used for the
default clustering step. Install via your package manager
(apt install mcl, brew install mcl, conda install -c bioconda
mcl) or from source.
HMMER is optional and only required if you opt into the legacy
--search_mode phmmer pipeline. If you’d rather avoid the mcl
external dependency, --clustering leiden --cpm_resolution auto uses
pure-Python igraph/leidenalg and is competitive on most inputs.
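The choice between the two clustering options can be sketched as a small shell check. This is an illustration rather than part of OrthoHMM: the directory `./proteomes` and the variable names are hypothetical, and only the `--clustering leiden --cpm_resolution auto` flags come from the text above.

```shell
# Hypothetical input directory; replace with your own directory of FASTA files.
proteomes_dir="./proteomes"

if command -v mcl >/dev/null 2>&1; then
  # mcl is on PATH, so the default clustering step needs no extra flags.
  cluster_args=""
else
  # No mcl binary available: use the Leiden alternative instead.
  cluster_args="--clustering leiden --cpm_resolution auto"
fi

# Print the resulting command so it can be reviewed before running it.
echo "orthohmm $proteomes_dir $cluster_args"
```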
2a. Install OrthoHMM from pip
To install using pip, we recommend creating a virtual environment to avoid software dependency issues. To do so, execute the following commands:
# create virtual environment
python -m venv venv
# activate virtual environment
source venv/bin/activate
# install orthohmm
pip install orthohmm
Note, the virtual environment must be activated to use orthohmm.
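If orthohmm cannot be found after installation, a quick check along the following lines may help. It relies on the `VIRTUAL_ENV` variable that the venv `activate` script sets; the variable names `venv_state` and `orthohmm_path` are illustrative.

```shell
# VIRTUAL_ENV is set by the venv "activate" script and unset by "deactivate".
if [ -n "${VIRTUAL_ENV:-}" ]; then
  venv_state="active ($VIRTUAL_ENV)"
else
  venv_state="inactive"
fi
echo "virtual environment: $venv_state"

# Report where the orthohmm executable resolves from, if anywhere.
if orthohmm_path=$(command -v orthohmm); then
  echo "orthohmm found at: $orthohmm_path"
else
  orthohmm_path=""
  echo "orthohmm is not on PATH; activate the environment and retry"
fi
```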
To deactivate your virtual environment, use the following command:
# deactivate virtual environment
deactivate
Note, the virtual environment must be activated to use orthohmm.
2b. Install OrthoHMM from source
As with the pip installation, we recommend installing from source inside a virtual environment. With one active, use the following commands:
git clone https://github.com/JLSteenwyk/orthohmm.git
cd orthohmm/
make install
If you run into permission errors when executing make install, create a virtual environment for your installation:
git clone https://github.com/JLSteenwyk/orthohmm.git
cd orthohmm/
python -m venv venv
source venv/bin/activate
make install
Note, the virtual environment must be activated to use orthohmm.
2) Usage
To use OrthoHMM in its simplest form, execute the following command:
orthohmm <path_to_directory_of_FASTA_files>