Installation

MacSyFinder works with models for macromolecular systems that are not shipped with it; you have to install them separately. See the msf_data section below. We also provide containers so you can use macsyfinder directly.

MacSyFinder dependencies

Python version >=3.10 is required to run MacSyFinder: https://docs.python.org/3.10/index.html

MacSyFinder has one program dependency:

The hmmsearch program should be installed (e.g., in the PATH) in order to use MacSyFinder. Otherwise, the path to this executable must be specified on the command line: see the command-line options.

MacSyFinder also relies on six Python library dependencies:

  • colorlog

  • colorama

  • pyyaml

  • packaging

  • networkx

  • pandas

These dependencies will be automatically retrieved and installed when using pip for installation (see below).
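As a quick sanity check of the requirements listed above, the following minimal sketch tests the interpreter version and looks for the six libraries without importing them. It is not part of MacSyFinder; the function names are hypothetical, and the only assumption about the libraries is their import names (pyyaml is imported as yaml).

```python
import importlib.util
import sys

# Import names for the six libraries listed above.
# The pyyaml distribution is imported under the name "yaml".
REQUIRED_MODULES = ["colorlog", "colorama", "yaml", "packaging", "networkx", "pandas"]


def python_is_recent_enough(minimum=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that the import system cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.10:", python_is_recent_enough())
    print("Missing libraries:", missing_modules() or "none")
```

An empty list from missing_modules() only means the libraries can be located; it does not check their versions.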

Note

If you intend to build and distribute new models, you will need some other dependencies; see the modeler guide for installation.

Note

If you want to contribute to the MacSyFinder code, check the guidelines (CONTRIBUTING) and the specific procedure for developer installation.

MacSyFinder Installation procedure

It is recommended to use pip to install the MacSyFinder package.

Archive overview

  • doc => The documentation in html and pdf

  • test => Everything needed for unit tests

  • macsypy => The macsyfinder python library

  • setup.py => The installation script

  • setup.cfg => The installation configuration

  • pyproject.toml => The project installation build tool

  • COPYING => The licensing

  • COPYRIGHT => The copyright

  • README.md => Very brief macsyfinder overview

  • CONTRIBUTORS => List of people who contributed to the code

  • CONTRIBUTING => The guidelines to contribute to the code

Installation steps:

Make sure every required dependency/software is present.

By default MacSyFinder will try to use hmmsearch in your PATH. If hmmsearch is not in the PATH, you have to set the absolute path to hmmsearch in a configuration file or on the command line upon execution. In that case, some tests will also be skipped and a warning will be raised.
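The lookup order described above (an explicitly given path first, otherwise the PATH) can be illustrated with the standard library. This is a sketch under assumptions, not MacSyFinder's own code: the function name resolve_hmmsearch and its parameters are hypothetical.

```python
import os
import shutil


def resolve_hmmsearch(explicit_path=None, program="hmmsearch"):
    """Return an absolute path to an executable, or None if none is usable.

    An explicitly given path takes precedence; otherwise the PATH is searched.
    """
    if explicit_path is not None:
        # Accept the explicit path only if it is an existing, executable file.
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return os.path.abspath(explicit_path)
        return None
    found = shutil.which(program)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    path = resolve_hmmsearch()
    print(path if path else "hmmsearch not found in PATH; give its absolute path explicitly")
```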

Perform the installation.

python3 -m pip install macsyfinder

If you do not have the privileges to perform a system-wide installation, you can either install it in your home directory or use a virtual environment.

# installation in your home directory
python3 -m pip install --user macsyfinder
# installation in a virtualenv
python3 -m venv macsyfinder
cd macsyfinder
source bin/activate
python3 -m pip install macsyfinder

To exit the virtualenv just execute the deactivate command. To run macsyfinder, you need to activate the virtualenv:

source macsyfinder/bin/activate

Then run macsyfinder or msf_data.
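To tell whether a virtual environment is currently active, and to assemble the matching pip command, something like the following may help. It is a minimal sketch: both function names are hypothetical, and the command is only built, not executed.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def pip_install_command(package="macsyfinder", user=False):
    """Build the pip command as an argument list, without running it."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")
    command.append(package)
    return command


if __name__ == "__main__":
    # Inside a virtualenv a plain install is enough; outside, fall back to --user.
    print(pip_install_command(user=not in_virtualenv()))
```

The resulting list can be passed to subprocess.run(command, check=True). Using sys.executable ensures that pip installs into the same interpreter that is running the script.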

Note

Super-user privileges (i.e., sudo) are necessary if you want to install the program system-wide.

Note

If you do not have the privileges, or if you do not want to install MacSyFinder in the Python libraries of your system, you can install MacSyFinder in a virtual environment (http://www.virtualenv.org/).

Warning

When installing a new version of MacSyFinder, do not forget to uninstall the previously installed version!

Uninstalling MacSyFinder

To uninstall MacSyFinder (the last version installed), run

(sudo) pip uninstall macsyfinder

If you installed it in a virtualenv, just delete the virtual environment. For instance, if you created a virtualenv named macsyfinder with:

python3 -m venv macsyfinder

To delete it, remove the directory

rm -R macsyfinder

From Conda/Mamba

From version 2.0, MacSyFinder is packaged for Conda/Mamba:

mamba install -c bioconda macsyfinder=x.x

where x.x is the macsyfinder version you want to install.

From container

With Docker

The Docker image is available on Docker Hub (https://hub.docker.com/repository/docker/gempasteur/macsyfinder). Computations are performed as the msf user in /home/msf inside the container, so you have to mount a directory from the host into the container to exchange data (inputs and results) between the host and the container. The shared directory must be writable by the msf user; alternatively, override the container user with your own user id (see the example below).

Furthermore, the models are no longer packaged with macsyfinder, so you have to install them yourself. For that we provide a command-line tool, msf_data, which is inspired by pip.

msf_data search PACKNAME
msf_data install PACKNAME== or >=, or ... VERSION
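The version constraint in the second command has the same shape as a pip requirement (a package name, a comparison operator, a version). Purely as an illustration, such a string can be split with the standard library. This is not msf_data's own parser: the function name and the set of operators accepted here are assumptions.

```python
import re

# Package name, comparison operator, version; the operator list is an assumption.
_SPEC = re.compile(r"^(?P<name>[\w.+-]+?)\s*(?P<op>==|>=|<=|>|<)\s*(?P<version>[\w.]+)$")


def split_requirement(text):
    """Split 'NAME<op>VERSION' into (name, operator, version).

    A bare package name yields (name, None, None).
    """
    text = text.strip()
    match = _SPEC.match(text)
    if match is None:
        return text, None, None
    return match["name"], match["op"], match["version"]


if __name__ == "__main__":
    print(split_requirement("TXSScan>=1.1.3"))  # ('TXSScan', '>=', '1.1.3')
    print(split_requirement("TXSScan"))         # ('TXSScan', None, None)
```

On a shell command line, a requirement containing > or < has to be quoted so that it is not interpreted as a redirection.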

To work with Docker you have to install models in a directory which will be mounted in the image at run time

mkdir shared_dir
cd shared_dir

# install desired models in my_models directory

docker run -v ${PWD}/:/home/msf -u $(id -u ${USER}):$(id -g ${USER})  gempasteur/macsyfinder:<tag> msf_data install --target /home/msf/my_models <MODELS_PACK>

# run msf against all models contained in <MODELS_PACK>

docker run -v ${PWD}/:/home/msf -u $(id -u ${USER}):$(id -g ${USER})  gempasteur/macsyfinder:<tag> macsyfinder --db-type unordered_replicon --models-dir=/home/msf/my_models/ --models  <MODELS_PACK>  all --sequence-db my_genome.fasta -w 12
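Both docker commands above share the same prefix: a volume mount onto /home/msf and a user override with the caller's ids. The sketch below assembles that prefix from Python. The helper name docker_command is hypothetical, and os.getuid/os.getgid restrict it to POSIX hosts; the image name and mount point are taken from the commands above.

```python
import os
import shlex


def docker_command(tag, tool_args, shared_dir=None):
    """Build a `docker run` argument list that mounts shared_dir on /home/msf
    and runs the container with the caller's user and group ids (POSIX only)."""
    shared_dir = os.path.abspath(shared_dir or os.getcwd())
    return [
        "docker", "run",
        "-v", f"{shared_dir}:/home/msf",
        "-u", f"{os.getuid()}:{os.getgid()}",
        f"gempasteur/macsyfinder:{tag}",
        *tool_args,
    ]


if __name__ == "__main__":
    # "<tag>" and "<MODELS_PACK>" are placeholders, as in the commands above.
    install = docker_command(
        "<tag>",
        ["msf_data", "install", "--target", "/home/msf/my_models", "<MODELS_PACK>"],
    )
    print(shlex.join(install))
```

Building the command as a list rather than a single string avoids quoting problems when the shared directory contains spaces; the list can be handed to subprocess.run as is.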

With Apptainer (formerly Singularity)

As the Docker image is registered on Docker Hub, you can also use it directly with Apptainer (https://apptainer.org/). Unlike with Docker, you do not have to worry about a shared directory: your HOME and /tmp are automatically shared.

# install desired models in my_models directory
apptainer run -H ${HOME} docker://gempasteur/macsyfinder:<tag> msf_data install --target my_models <MODELS_PACK>

# run msf against all models contained in <MODELS_PACK>
apptainer run -H ${HOME} docker://gempasteur/macsyfinder:<tag> macsyfinder --db-type unordered_replicon --models-dir=my_models --models <MODELS_PACK> all --sequence-db my_genome.fasta -w 12

If you intend to run Apptainer from a host that cannot access the internet (a cluster node, for instance), you have to:

  1. download the image locally

  2. transfer the image file to the right file system

  3. and then use it.

apptainer build msf-<tag>.simg docker://gempasteur/macsyfinder:<tag>
cp msf-<tag>.simg <cluster_file_system>
apptainer run -H ${HOME} msf-<tag>.simg macsyfinder --db-type unordered_replicon --models-dir=my_models --models <MODELS_PACK> all --sequence-db my_genome.fasta -w 12

Models installation with msf_data

Once MacSyFinder is installed, you have access to a utility program to manage the models: msf_data

This script allows you to search, download, install and get information about MacSyFinder models stored on GitHub (https://github.com/macsy-models) or installed locally. The general syntax for msf_data is:

msf_data <general options> <subcommand> <sub command options> <arguments>

To list all models available on macsy-models:

msf_data available

To search for models on macsy-models:

msf_data search TXSS

You can also search in the model descriptions:

msf_data search -S secretion

To install a model package:

msf_data install <model name>

If you do not have the rights to install a model system-wide, you have two options.

To install it in your home (./macsyfinder/data):

msf_data install --user <model name>

To install it in any directory:

msf_data install --target <model dir> <model_name>
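The three installation variants above (system-wide, --user, --target) differ only in one option. A small helper can make that choice explicit when msf_data is driven from a script. This is a sketch: the function name is hypothetical, and treating --user and --target as mutually exclusive is an assumption of this example rather than documented msf_data behaviour.

```python
def msf_data_install_command(model, user=False, target=None):
    """Build an `msf_data install` argument list for one of the three variants."""
    if user and target is not None:
        # Assumption of this sketch: the two options are alternatives.
        raise ValueError("choose either a user-level install or a target directory, not both")
    command = ["msf_data", "install"]
    if user:
        command.append("--user")
    elif target is not None:
        command.extend(["--target", str(target)])
    command.append(model)
    return command


if __name__ == "__main__":
    print(msf_data_install_command("TXSScan"))
    print(msf_data_install_command("TXSScan", user=True))
    print(msf_data_install_command("TXSScan", target="my_models"))
```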

To know how to cite a model package:

msf_data cite <model name>

To show the names of the models and the structure of an installed model package:

msf_data show <model package name>

For instance, msf_data show TXSScan:

TXSScan
    ├-archaea
    │   └-Archaeal-T4P
    └-bacteria
         ├-diderm
         │   ├-Flagellum
         │   ├-MSH
         │   ├-T1SS
         │   ├-T2SS
         │   ├-T3SS
         │   ├-T4aP
         │   ├-T4bP
         │   ├-T5aSS
         │   ├-T5bSS
         │   ├-T5cSS
         │   ├-T6SSi
         │   ├-T6SSii
         │   ├-T6SSiii
         │   ├-T9SS
         │   ├-Tad
         │   ├-pT4SSi
         │   └-pT4SSt
         └-monoderm
              └-ComM

TXSScan (1.1.3) : 19 models
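The tree above can be read as nested subpackages whose leaves are the models. The sketch below mirrors it as nested dictionaries and walks it to list every model with its slash-separated path, which reproduces the count of 19. The data structure and the function name are illustrative only and say nothing about how MacSyFinder stores packages internally.

```python
# The TXSScan tree shown above, as nested dictionaries (None marks a model).
TXSSCAN = {
    "archaea": {"Archaeal-T4P": None},
    "bacteria": {
        "diderm": dict.fromkeys([
            "Flagellum", "MSH", "T1SS", "T2SS", "T3SS", "T4aP", "T4bP",
            "T5aSS", "T5bSS", "T5cSS", "T6SSi", "T6SSii", "T6SSiii",
            "T9SS", "Tad", "pT4SSi", "pT4SSt",
        ]),
        "monoderm": {"ComM": None},
    },
}


def model_paths(tree, prefix=""):
    """Yield the slash-separated path of every model (leaf) in the tree."""
    for name, subtree in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if subtree is None:
            yield path
        else:
            yield from model_paths(subtree, path)


paths = list(model_paths(TXSSCAN, "TXSScan"))
print(len(paths))  # 19
print(paths[0])    # TXSScan/archaea/Archaeal-T4P
```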

To show the model definition:

msf_data definition <package or subpackage> model1 [model2, ...]

For instance, to show the definitions of the T6SSii and T6SSiii models in the TXSS+/bacterial subpackage:

msf_data definition TXSS+/bacterial T6SSii T6SSiii

To show all model definitions in the TXSS+/bacterial subpackage:

msf_data definition TXSS+/bacterial

To create a skeleton for your own model package (to get access to the init subcommand, see the modeler installation):

msf_data init --pack-name <MY_PACK_NAME> --maintainer <"maintainer name"> --email <maintainer email> --authors <"author1, author2, ..">

The command above shows msf_data init with the required options only. The one below adds optional but recommended options.

msf_data init --pack-name <MY_PACK_NAME> --maintainer <"maintainer name"> --email <maintainer email> --authors <"author1, author2, .."> \
--license cc-by-nc-sa --holders <"the copyright holders"> --desc <"one line package description">

To list all msf_data subcommands:

msf_data --help

To list all available options for a subcommand:

msf_data <subcommand> --help

For models not stored in macsy-models, the available and search commands, as well as installation or upgrade from a remote source, are NOT available.

For models NOT stored in macsy-models, you have to manage them semi-manually. Download the archive (do not unarchive it), then use msf_data to install the archive.