. . . . "WorkflowHub" . "https://about.workflowhub.eu/" . . "Workflow RO-Crate Profile" . "0.2.0" . . "Franck Dedeine" . . "Vincent Hervé" . . "Małgorzata Wolniewicz" . . "Nachida Tadrent" . . "https://doi.org/10.1093/bioinformatics/bts480" . "Snakemake" . "https://snakemake.readthedocs.io/" . . "Intel" . . "https://www.wikidata.org/wiki/Q248" . "genome" . . "5.541871921182266" . "4.5" . "SnakeMAGs" . . "8.497536945812808" . "6.9" . "Hervé V. SnakeMAGs" . . "5.541871921182266" . "4.5" . "GUNC database" . . "5.402843601895735" . "5.7" . "Information science" . . "Science and technology/Social sciences/Information science" . "channels defaults conda config" . . "3.6018957345971563" . "3.8" . "data" . . "3.9000609384521634" . "6.4" . "termite genome" . . "3.2227488151658767" . "3.4" . "config" . . "5.850091407678245" . "9.6" . "dataset" . . "3.9609993906154783" . "6.5" . "# Install and activate GUNC environment\nconda create --prefix /path/to/gunc_env\nconda install -c bioconda metabat2 --prefix /path/to/gunc_env\nsource activate /path/to/gunc_env" . . "44.01709401709402" . "10.3" . "genome" . . "6.337599024984765" . "10.4" . "path" . . "4.433497536945812" . "3.6" . "http" . . "5.172413793103448" . "4.2" . "database" . . "7.758620689655172" . "6.3" . "Genetics" . . "Science and technology/Natural science/Biology/Genetics" . "Durbin" . . "3.0788177339901477" . "2.5" . "P. M." . . "3.817733990147783" . "3.1" . "2012" . . "2021, 10:33" . . "2020" . . "lignocellulose" . . "3.9000609384521634" . "6.4" . "2013" . . "GUNC" . . "9.113300492610836" . "7.4" . "2019" . . "H. G." . . "4.1871921182266005" . "3.4" . "plant cell" . . "3.8391224862888484" . "6.3" . "data set" . . "2.559241706161137" . "2.7" . "termite microbiomes" . . "5.023696682464455" . "5.3" . "system adaptation" . . "9.66824644549763" . "10.2" . "database" . . "13.204951856946355" . "9.6" . "taxonomy database" . . "4.644549763033176" . "4.9" . "hindgut" . . "3.595368677635588" . "5.9" . "botany" . . 
"6.464924346629986" . "4.7" . "novel plant cell cell wall" . . "4.265402843601896" . "4.5" . "2021" . . "2023" . . "analysis Trimmomatic" . . "3.981042654028436" . "4.2" . "Map format" . . "3.1279620853080567" . "3.3" . "termite" . . "8.12807881773399" . "6.6" . "New Brunswick" . . "https://www.wikidata.org/wiki/Q1965" . "data" . . "5.665024630541871" . "4.6" . "lignocellulose degradation enzyme" . . "11.943127962085308" . "12.6" . "mathematical and computer sciences" . . "100.0" . "0.2981753945350647" . "software" . . "36.31361760660248" . "26.4" . "computer science" . . "32.599724896836314" . "23.7" . "D. D. Li" . . "3.571428571428571" . "2.9" . "config file" . . "5.118483412322274" . "5.4" . "hemicellulose" . . "4.064039408866995" . "3.3" . "earth sciences" . . "100.0" . "0.3184818625450134" . "All you have to do now is to indicate the path to the database folder (in our example, the folder is called release207_v2) in the config file, Classification section." . . "15.384615384615385" . "3.6" . "computer programming and software" . . "100.0" . "0.2981753945350647" . "2015" . . "GUNC environment conda" . . "4.265402843601896" . "4.5" . "enzyme" . . "4.143814747105424" . "6.8" . "configuration" . . "3.229737964655698" . "5.3" . "All you have to do now is to indicate the path to the GUNC database file in the config file, Bins quality section." . . "23.931623931623932" . "5.6" . "path" . . "3.351614868982328" . "5.5" . "2014" . . "cell membrane" . . "3.8391224862888484" . "6.3" . "hemicellulose degradation" . . "6.4454976303317535" . "6.8" . "file" . . "9.079829372333943" . "14.9" . "United States of America" . . "https://www.wikidata.org/wiki/Q30" . "http" . . "5.3016453382084086" . "8.7" . "genome Research" . . "2.938388625592417" . "3.1" . "computer programming" . . "6.052269601100413" . "4.4" . "license" . . "3.5344302254722733" . "5.8" . "enzyme" . . "4.064039408866995" . "3.3" . 
"# First, set up your channel priorities\nconda config --add channels defaults\nconda config --add channels bioconda\nconda config --add channels conda-forge" . . "16.666666666666668" . "3.9" . "hemicellulose" . . "4.448507007921999" . "7.3" . "2:40:15" . . "IT-computer sciences" . . "Science and technology/Technology and engineering/IT-computer sciences" . "2018" . . "index file" . . "3.3175355450236967" . "3.5" . "config" . . "8.374384236453201" . "6.8" . "fastq file" . . "3.1279620853080567" . "3.3" . "National Academy of Sciences" . . "https://www.wikidata.org/wiki/Q270794" . "termite" . . "8.714198659354052" . "14.3" . "2010" . . "channels bioconda conda config" . . "2.4644549763033177" . "2.6" . "database" . . "5.484460694698355" . "9.0" . "genome biology" . . "5.023696682464455" . "5.3" . "service-account-enrichment" . . . . "52444141"^^ . "https://api.rohub.org/api/ros/ea4e5a1d-3ce7-4438-af08-15fdd453600a/crate/download/" . "Stable" . . "2023-09-08 12:14:52.655671+00:00" . "2024-03-05 12:23:17.061318+00:00" . "2023-09-08 12:14:52.655671+00:00" . "[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0.0-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)\r\n\r\n\r\n# About SnakeMAGs\r\nSnakeMAGs is a workflow to reconstruct prokaryotic genomes from metagenomes. The main purpose of SnakeMAGs is to process Illumina data from raw reads to metagenome-assembled genomes (MAGs).\r\nSnakeMAGs is efficient, easy to handle and flexible to different projects. 
The workflow is CeCILL licensed, implemented in Snakemake (run on multiple cores) and available for Linux.\r\nSnakeMAGs performs eight main steps:\r\n- Quality filtering of the reads\r\n- Adapter trimming\r\n- Filtering of the host sequences (optional)\r\n- Assembly\r\n- Binning\r\n- Evaluation of the quality of the bins\r\n- Classification of the MAGs\r\n- Estimation of the relative abundance of the MAGs\r\n\r\n\r\n![scheme of workflow](SnakeMAGs_schema.jpg?raw=true)\r\n\r\n# How to use SnakeMAGs\r\n## Install conda\r\nThe easiest way to install and run SnakeMAGs is to use [conda](https://www.anaconda.com/products/distribution). This package manager makes it easy to install [Snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).\r\n\r\n## Install and activate Snakemake environment\r\nNote: The workflow was developed with Snakemake 7.0.0\r\n```\r\nconda activate\r\n\r\n# First, set up your channel priorities\r\nconda config --add channels defaults\r\nconda config --add channels bioconda\r\nconda config --add channels conda-forge\r\n\r\n# Then, create a new environment for the Snakemake version you require\r\nconda create -n snakemake_7.0.0 snakemake=7.0.0\r\n\r\n# And activate it\r\nconda activate snakemake_7.0.0\r\n```\r\n\r\nAlternatively, you can also install Snakemake via mamba:\r\n```\r\n# If you do not have mamba yet on your machine, you can install it with:\r\nconda install -n base -c conda-forge mamba\r\n\r\n# Then you can install Snakemake\r\nconda activate base\r\nmamba create -c conda-forge -c bioconda -n snakemake snakemake\r\n\r\n# And activate it\r\nconda activate snakemake\r\n\r\n```\r\n\r\n## SnakeMAGs executable\r\nThe easiest way to obtain SnakeMAGs and its related files is to clone the repository using git:\r\n```\r\ngit clone https://github.com/Nachida08/SnakeMAGs.git\r\n```\r\nAlternatively, you can download the relevant files:\r\n```\r\nwget 
https://raw.githubusercontent.com/Nachida08/SnakeMAGs/main/SnakeMAGs.smk https://raw.githubusercontent.com/Nachida08/SnakeMAGs/main/config.yaml\r\n```\r\n\r\n## SnakeMAGs input files\r\n- Illumina paired-end reads in FASTQ.\r\n- Adapter sequence file ([adapters.fa](https://github.com/Nachida08/SnakeMAGs/blob/main/adapters.fa)).\r\n- Host genome sequences in FASTA (if host_genome: \"yes\"), in case you work with host-associated metagenomes (e.g. human gut metagenome).\r\n\r\n## Download Genome Taxonomy Database (GTDB)\r\nGTDB-Tk requires ~66 GB or more of external data (GTDB) that needs to be downloaded and unarchived. Because this database is voluminous, we let you decide where you want to store it.\r\nSnakeMAGs does not download GTDB automatically; you have to do it yourself:\r\n\r\n```\r\n#Download the latest release (tested with release207)\r\n#Note: SnakeMAGs uses GTDB-Tk v2.1.0 and therefore requires release 207 as the minimum version. See https://ecogenomics.github.io/GTDBTk/installing/index.html#installing for details.\r\nwget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz\r\n#Decompress\r\ntar -xzvf *tar.gz\r\n#This will create a folder called release207_v2\r\n```\r\nAll you have to do now is to indicate the path to the database folder (in our example, the folder is called release207_v2) in the config file, Classification section.\r\n\r\n## Download the GUNC database (required if gunc: \"yes\")\r\nGUNC accepts either a proGenomes-based or a GTDB-based reference database. Both can be downloaded using the ```gunc download_db``` command. For our study we used the default proGenome-derived GUNC database. 
It requires fewer resources with similar performance.\r\n\r\n```\r\nconda activate\r\n# Install and activate GUNC environment\r\nconda create --prefix /path/to/gunc_env\r\nconda install -c bioconda gunc --prefix /path/to/gunc_env\r\nsource activate /path/to/gunc_env\r\n\r\n#Download the proGenome-derived GUNC database (tested with gunc_db_progenomes2.1)\r\n#Note: SnakeMAGs uses GUNC v1.0.5\r\ngunc download_db -db progenomes /path/to/GUNC_DB\r\n```\r\nAll you have to do now is to indicate the path to the GUNC database file in the config file, Bins quality section.\r\n\r\n## Edit config file\r\nYou need to edit the config.yaml file. In particular, you need to set the correct paths: the working directory, the directory containing your fastq files, the location where the conda environments should be placed (they will be created using the provided .yaml files available in [SnakeMAGs_conda_env directory](https://github.com/Nachida08/SnakeMAGs/tree/main/SnakeMAGs_conda_env)), the adapters file, the GTDB data and, optionally, the GUNC database and your host genome reference.\r\n\r\nLastly, you need to allocate the proper computational resources (threads, memory) for each of the main steps. 
These can be optimized according to your hardware.\r\n\r\n\r\n\r\nHere is an example of a config file:\r\n\r\n```\r\n#####################################################################################################\r\n##### _____ ___ _ _ _ ______ __ __ _______ _____ #####\r\n##### / ___| | \\ | | /\\ | | / / | ____| | \\ / | /\\ / _____| / ___| #####\r\n##### | (___ | |\\ \\ | | / \\ | |/ / | |____ | \\/ | / \\ | | __ | (___ #####\r\n##### \\___ \\ | | \\ \\| | / /\\ \\ | |\\ \\ | ____| | |\\ /| | / /\\ \\ | | |_ | \\___ \\ #####\r\n##### ____) | | | \\ | / /__\\ \\ | | \\ \\ | |____ | | \\/ | | / /__\\ \\ | |____|| ____) | #####\r\n##### |_____/ |_| \\__| /_/ \\_\\ |_| \\_\\ |______| |_| |_| /_/ \\_\\ \\______/ |_____/ #####\r\n##### #####\r\n#####################################################################################################\r\n\r\n############################\r\n### Execution parameters ###\r\n############################\r\n\r\nworking_dir: /path/to/working/directory/ #The main directory for the project\r\nraw_fastq: /path/to/raw_fastq/ #The directory that contains all the fastq files of all the samples (e.g. sample1_R1.fastq & sample1_R2.fastq, sample2_R1.fastq & sample2_R2.fastq...)\r\nsuffix_1: \"_R1.fastq\" #Main type of suffix for forward reads file (e.g. _1.fastq or _R1.fastq or _r1.fastq or _1.fq or _R1.fq or _r1.fq )\r\nsuffix_2: \"_R2.fastq\" #Main type of suffix for reverse reads file (e.g. _2.fastq or _R2.fastq or _r2.fastq or _2.fq or _R2.fq or _r2.fq )\r\n\r\n##########################\r\n### Conda environments ###\r\n##########################\r\n\r\nconda_env: \"/path/to/SnakeMAGs_conda_env/\" #Path to the provided SnakeMAGs_conda_env directory which contains the yaml file for each conda environment\r\n\r\n#########################\r\n### Quality filtering ###\r\n#########################\r\nemail: name.surname@your-univ.com #Your e-mail address\r\nthreads_filter: 10 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_filter: 150 #Memory according to tools need (in GB)\r\n\r\n########################\r\n### Adapter trimming ###\r\n########################\r\nadapters: /path/to/working/directory/adapters.fa #A fasta file containing a set of various Illumina adapters (this file is provided and is also available on GitHub)\r\ntrim_params: \"2:40:15\" #For further details, see the Trimmomatic documentation\r\nthreads_trim: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_trim: 150 #Memory according to tools need (in GB)\r\n\r\n######################\r\n### Host filtering ###\r\n######################\r\nhost_genome: \"yes\" #yes or no. An optional step for host-associated samples (e.g. termite, human, plant...)\r\nthreads_bowtie2: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nhost_genomes_directory: /path/to/working/host_genomes/ #The directory where the host genome is stored\r\nhost_genomes: /path/to/working/host_genomes/host_genomes.fa #A fasta file containing the DNA sequences of the host genome(s)\r\nthreads_samtools: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_host_filtering: 150 #Memory according to tools need (in GB)\r\n\r\n################\r\n### Assembly ###\r\n################\r\nthreads_megahit: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nmin_contig_len: 1000 #Minimum length (in bp) of the assembled contigs\r\nk_list: \"21,31,41,51,61,71,81,91,99,109,119\" #Kmer size (for further details, see the MEGAHIT documentation)\r\nresources_megahit: 250 #Memory according to tools need (in GB)\r\n\r\n###############\r\n### Binning ###\r\n###############\r\nthreads_bwa: 50 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_bwa: 150 #Memory according to tools need (in GB)\r\nthreads_samtools: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_samtools: 150 #Memory according to tools need (in GB)\r\nseed: 19860615 #Seed number for reproducible results\r\nthreads_metabat: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nminContig: 2500 #Minimum length (in bp) of the contigs\r\nresources_binning: 250 #Memory according to tools need (in GB)\r\n\r\n####################\r\n### Bins quality ###\r\n####################\r\n#checkM\r\nthreads_checkm: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_checkm: 250 #Memory according to tools need (in GB)\r\n#bins_quality_filtering\r\ncompletion: 50 #The minimum completion rate of bins\r\ncontamination: 10 #The maximum contamination rate of bins\r\nparks_quality_score: \"yes\" #yes or no. If yes bins are filtered according to the Parks quality score (completion-5*contamination >= 50)\r\n#GUNC\r\ngunc: \"yes\" #yes or no. An optional step to detect and discard chimeric and contaminated genomes using the GUNC tool\r\nthreads_gunc: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_gunc: 250 #Memory according to tools need (in GB)\r\nGUNC_db: /path/to/GUNC_DB/gunc_db_progenomes2.1.dmnd #Path to the downloaded GUNC database (see the readme file)\r\n\r\n######################\r\n### Classification ###\r\n######################\r\nGTDB_data_ref: /path/to/downloaded/GTDB #Path to uncompressed GTDB-Tk reference data (GTDB)\r\nthreads_gtdb: 10 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_gtdb: 250 #Memory according to tools need (in GB)\r\n\r\n##################\r\n### Abundances ###\r\n##################\r\nthreads_coverM: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_coverM: 150 #Memory according to tools need (in GB)\r\n```\r\n# Run SnakeMAGs\r\nIf you are using a workstation with Ubuntu (tested on Ubuntu 22.04):\r\n```{bash}\r\nsnakemake --cores 30 --snakefile SnakeMAGs.smk --use-conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --configfile /path/to/config.yaml --keep-going --latency-wait 180\r\n```\r\n\r\nIf you are working on a cluster with Slurm (tested with version 18.08.7):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\nIf you are working on a cluster with SGE (tested with version 8.1.9):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster \"qsub -cwd -V -q -pe thread {threads} -e cluster_logs/{rule}.e{jobid} -o cluster_logs/{rule}.o{jobid}\" --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\n\r\n# Test\r\nWe provide a small data set in the [test](https://github.com/Nachida08/SnakeMAGs/tree/main/test) directory, which allows you to validate your installation and take your first steps with SnakeMAGs. 
This data set is a subset of the [ZymoBiomics Mock Community](https://www.zymoresearch.com/blogs/blog/zymobiomics-microbial-standards-optimize-your-microbiomics-workflow) (250K reads) used in this tutorial: [metagenomics_tutorial](https://github.com/pjtorres/metagenomics_tutorial).\r\n\r\n1. Before getting started, make sure you have cloned the SnakeMAGs repository or downloaded all the necessary files (SnakeMAGs.smk, config.yaml, chr19.fa.gz, insub732_2_R1.fastq.gz, insub732_2_R2.fastq.gz). See the [SnakeMAGs executable](#snakemags-executable) section.\r\n2. For better organisation, put all the read files in the same directory (e.g. fastqs) and the host sequences file in a separate directory (e.g. host_genomes)\r\n3. Unzip the fastq files and the host sequences file.\r\n```\r\ngunzip fastqs/insub732_2_R1.fastq.gz fastqs/insub732_2_R2.fastq.gz host_genomes/chr19.fa.gz\r\n```\r\n4. Edit the config file (see [Edit config file](#edit-config-file) section)\r\n5. Run the test (see [Run SnakeMAGs](#run-snakemags) section)\r\n\r\nNote: the analysis of these files took 1159.32 seconds to complete on an Ubuntu 22.04 LTS machine with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz x 40 processor and 96GB of RAM.\r\n\r\n# Genome reference for host reads filtering\r\nFor host-associated samples, one can remove host sequences from the metagenomic reads by mapping these reads against a reference genome. In the case of termite gut metagenomes, we provide [here](https://zenodo.org/record/6908287#.YuAdFXZBx8M) the relevant files (fasta and index files) from termite genomes.\r\n\r\nUpon request, we can help you to generate these files for your own reference genome and make them available to the community.\r\n\r\nNB: The mapping steps generate voluminous files such as .bam and .sam. 
Depending on your disk space, you might want to delete these files after use.\r\n\r\n\r\n# Use case\r\nDuring the test phase of the development of SnakeMAGs, we used this workflow to process 10 publicly available termite gut metagenomes generated by Illumina sequencing, to ultimately reconstruct prokaryotic MAGs. These metagenomes were retrieved from the NCBI database using the following accession numbers: SRR10402454; SRR14739927; SRR8296321; SRR8296327; SRR8296329; SRR8296337; SRR8296343; DRR097505; SRR7466794; SRR7466795. They come from five different studies: Waidele et al, 2019; Tokuda et al, 2018; Romero Victorica et al, 2020; Moreira et al, 2021; and Calusinska et al, 2020.\r\n\r\n## Download the Illumina paired-end reads\r\nWe use the fasterq-dump tool to extract data in FASTQ format from SRA accessions. It is a command-line tool which offers a faster solution for downloading those large files.\r\n\r\n```\r\n# Install and activate sra-tools environment\r\n## Note: For this study we used sra-tools 2.11.0\r\n\r\nconda activate\r\nconda create -n sra-tools -c bioconda sra-tools\r\nconda activate sra-tools\r\n\r\n# Download fastqs in a single directory\r\nmkdir raw_fastq\r\ncd raw_fastq\r\nfasterq-dump --threads --skip-technical --split-3\r\n```\r\n\r\n## Download Genome reference for host reads filtering\r\n```\r\nmkdir host_genomes\r\ncd host_genomes\r\nwget https://zenodo.org/record/6908287/files/termite_genomes.fasta.gz\r\ngunzip termite_genomes.fasta.gz\r\n```\r\n\r\n## Edit the config file\r\nSee [Edit config file](#edit-config-file) section.\r\n\r\n## Run SnakeMAGs\r\n```\r\nconda activate snakemake_7.0.0\r\nmkdir cluster_logs\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile 
/path/to/config.yaml --keep-going\r\n```\r\n\r\n## Study results\r\nThe MAGs reconstructed from each metagenome and their taxonomic classification are available in this [repository](https://doi.org/10.5281/zenodo.7661004).\r\n\r\n# Citations\r\n\r\nIf you use SnakeMAGs, please cite:\r\n> Tadrent N, Dedeine F and Hervé V. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]. F1000Research 2023, 11:1522 (https://doi.org/10.12688/f1000research.128091.2)\r\n\r\n\r\nPlease also cite the dependencies:\r\n- [Snakemake](https://doi.org/10.12688/f1000research.29032.2) : Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. *F1000Research* 2021, 10:33.\r\n- [illumina-utils](https://doi.org/10.1371/journal.pone.0066643) : Murat Eren, A., Vineis, J. H., Morrison, H. G., & Sogin, M. L. (2013). A Filtering Method to Generate High Quality Short Reads Using Illumina Paired-End Technology. *PLoS ONE*, 8(6), e66643.\r\n- [Trimmomatic](https://doi.org/10.1093/bioinformatics/btu170) : Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. *Bioinformatics*, 30(15), 2114-2120.\r\n- [Bowtie2](https://doi.org/10.1038/nmeth.1923) : Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. *Nature Methods*, 9(4), 357–359.\r\n- [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) : Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., & Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. *Bioinformatics*, 25(16), 2078–2079.\r\n- [BEDtools](https://doi.org/10.1093/bioinformatics/btq033) : Quinlan, A. 
R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. *Bioinformatics*, 26(6), 841–842.\r\n- [MEGAHIT](https://doi.org/10.1093/bioinformatics/btv033) : Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. *Bioinformatics*, 31(10), 1674–1676.\r\n- [bwa](https://doi.org/10.1093/bioinformatics/btp324) : Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. *Bioinformatics*, 25(14), 1754–1760.\r\n- [MetaBAT2](https://doi.org/10.7717/peerj.7359) : Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. *PeerJ*, 2019(7), 1–13.\r\n- [CheckM](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n- [GTDB-Tk](https://doi.org/10.1093/BIOINFORMATICS/BTAC672) : Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P., Parks, D. H. (2022). GTDB-Tk v2: memory friendly classification with the genome taxonomy database. *Bioinformatics*.\r\n- [CoverM](https://github.com/wwood/CoverM)\r\n- [Waidele et al, 2019](https://doi.org/10.1101/526038) : Waidele, L., Korb, J., Voolstra, C. R., Dedeine, F., & Staubach, F. (2019). Ecological specificity of the metagenome in a set of lower termite species supports contribution of the microbiome to adaptation of the host. *Animal Microbiome*, 1(1), 1–13.\r\n- [Tokuda et al, 2018](https://doi.org/10.1073/pnas.1810550115) : Tokuda, G., Mikaelyan, A., Fukui, C., Matsuura, Y., Watanabe, H., Fujishima, M., & Brune, A. (2018). 
Fiber-associated spirochetes are major agents of hemicellulose degradation in the hindgut of wood-feeding higher termites. *Proceedings of the National Academy of Sciences of the United States of America*, 115(51), E11996–E12004.\r\n- [Romero Victorica et al, 2020](https://doi.org/10.1038/s41598-020-60850-5) : Romero Victorica, M., Soria, M. A., Batista-García, R. A., Ceja-Navarro, J. A., Vikram, S., Ortiz, M., Ontañon, O., Ghio, S., Martínez-Ávila, L., Quintero García, O. J., Etcheverry, C., Campos, E., Cowan, D., Arneodo, J., & Talia, P. M. (2020). Neotropical termite microbiomes as sources of novel plant cell wall degrading enzymes. *Scientific Reports*, 10(1), 1–14.\r\n- [Moreira et al, 2021](https://doi.org/10.3389/fevo.2021.632590) : Moreira, E. A., Persinoti, G. F., Menezes, L. R., Paixão, D. A. A., Alvarez, T. M., Cairo, J. P. L. F., Squina, F. M., Costa-Leonardo, A. M., Rodrigues, A., Sillam-Dussès, D., & Arab, A. (2021). Complementary contribution of Fungi and Bacteria to lignocellulose digestion in the food stored by a neotropical higher termite. *Frontiers in Ecology and Evolution*, 9(April), 1–12.\r\n- [Calusinska et al, 2020](https://doi.org/10.1038/s42003-020-1004-3) : Calusinska, M., Marynowska, M., Bertucci, M., Untereiner, B., Klimek, D., Goux, X., Sillam-Dussès, D., Gawron, P., Halder, R., Wilmes, P., Ferrer, P., Gerin, P., Roisin, Y., & Delfosse, P. (2020). Integrative omics analysis of the termite gut system adaptation to Miscanthus diet identifies lignocellulose degradation enzymes. *Communications Biology*, 3(1), 1–12.\r\n- [Orakov et al, 2021](https://doi.org/10.1186/s13059-021-02393-0) : Orakov, A., Fullam, A., Coelho, L. P., Khedkar, S., Szklarczyk, D., Mende, D. R., Schmidt, T. S. B., & Bork, P. (2021). GUNC: detection of chimerism and contamination in prokaryotic genomes. *Genome Biology*, 22(1).\r\n- [Parks et al, 2015](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. 
(2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n# License\r\nThis project is licensed under the CeCILL License - see the [LICENSE](https://github.com/Nachida08/SnakeMAGs/blob/main/LICENCE) file for details.\r\n\r\nDeveloped by Nachida Tadrent at the Insect Biology Research Institute ([IRBI](https://irbi.univ-tours.fr/)), under the supervision of Franck Dedeine and Vincent Hervé.\r\n" . "[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0.0-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)\r\n\r\n\r\n# About SnakeMAGs\r\nSnakeMAGs is a workflow to reconstruct prokaryotic genomes from metagenomes. The main purpose of SnakeMAGs is to process Illumina data from raw reads to metagenome-assembled genomes (MAGs).\r\nSnakeMAGs is efficient, easy to handle and flexible to different projects. The workflow is CeCILL licensed, implemented in Snakemake (run on multiple cores) and available for Linux.\r\nSnakeMAGs performed eight main steps:\r\n- Quality filtering of the reads\r\n- Adapter trimming\r\n- Filtering of the host sequences (optional)\r\n- Assembly\r\n- Binning\r\n- Evaluation of the quality of the bins\r\n- Classification of the MAGs\r\n- Estimation of the relative abundance of the MAGs\r\n\r\n\r\n![scheme of workflow](SnakeMAGs_schema.jpg?raw=true)\r\n\r\n# How to use SnakeMAGs\r\n## Install conda\r\nThe easiest way to install and run SnakeMAGs is to use [conda](https://www.anaconda.com/products/distribution). 
These package managers will help you to easily install [Snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).\r\n\r\n## Install and activate Snakemake environment\r\nNote: The workflow was developed with Snakemake 7.0.0\r\n```\r\nconda activate\r\n\r\n# First, set up your channel priorities\r\nconda config --add channels defaults\r\nconda config --add channels bioconda\r\nconda config --add channels conda-forge\r\n\r\n# Then, create a new environment for the Snakemake version you require\r\nconda create -n snakemake_7.0.0 snakemake=7.0.0\r\n\r\n# And activate it\r\nconda activate snakemake_7.0.0\r\n```\r\n\r\nAlternatively, you can also install Snakemake via mamba:\r\n```\r\n# If you do not have mamba yet on your machine, you can install it with:\r\nconda install -n base -c conda-forge mamba\r\n\r\n# Then you can install Snakemake\r\nconda activate base\r\nmamba create -c conda-forge -c bioconda -n snakemake snakemake\r\n\r\n# And activate it\r\nconda activate snakemake\r\n\r\n```\r\n\r\n## SnakeMAGs executable\r\nThe easiest way to procure SnakeMAGs and its related files is to clone the repository using git:\r\n```\r\ngit clone https://github.com/Nachida08/SnakeMAGs.git\r\n```\r\nAlternatively, you can download the relevant files:\r\n```\r\nwget https://github.com/Nachida08/SnakeMAGs/blob/main/SnakeMAGs.smk https://github.com/Nachida08/SnakeMAGs/blob/main/config.yaml\r\n```\r\n\r\n## SnakeMAGs input files\r\n- Illumina paired-end reads in FASTQ.\r\n- Adapter sequence file ([adapter.fa](https://github.com/Nachida08/SnakeMAGs/blob/main/adapters.fa)).\r\n- Host genome sequences in FASTA (if host_genome: \"yes\"), in case you work with host-associated metagenomes (e.g. human gut metagenome).\r\n\r\n## Download Genome Taxonomy Database (GTDB)\r\nGTDB-Tk requires ~66G+ of external data (GTDB) that need to be downloaded and unarchived. 
Because this database is voluminous, we let you decide where you want to store it.\r\nSnakeMAGs do not download automatically GTDB, you have to do it:\r\n\r\n```\r\n#Download the latest release (tested with release207)\r\n#Note: SnakeMAGs uses GTDBtk v2.1.0 and therefore require release 207 as minimum version. See https://ecogenomics.github.io/GTDBTk/installing/index.html#installing for details.\r\nwget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz\r\n#Decompress\r\ntar -xzvf *tar.gz\r\n#This will create a folder called release207_v2\r\n```\r\nAll you have to do now is to indicate the path to the database folder (in our example, the folder is called release207_v2) in the config file, Classification section.\r\n\r\n## Download the GUNC database (required if gunc: \"yes\")\r\nGUNC accepts either a progenomes or GTDB based reference database. Both can be downloaded using the ```gunc download_db``` command. For our study we used the default proGenome-derived GUNC database. It requires less resources with similar performance.\r\n\r\n```\r\nconda activate\r\n# Install and activate GUNC environment\r\nconda create --prefix /path/to/gunc_env\r\nconda install -c bioconda metabat2 --prefix /path/to/gunc_env\r\nsource activate /path/to/gunc_env\r\n\r\n#Download the proGenome-derived GUNC database (tested with gunc_db_progenomes2.1)\r\n#Note: SnakeMAGs uses GUNC v1.0.5\r\ngunc download_db -db progenomes /path/to/GUNC_DB\r\n```\r\nAll you have to do now is to indicate the path to the GUNC database file in the config file, Bins quality section.\r\n\r\n## Edit config file\r\nYou need to edit the config.yaml file. 
In particular, you need to set the correct paths: the working directory, the location of your fastq files, where you want to place the conda environments (that will be created using the provided .yaml files available in the [SnakeMAGs_conda_env directory](https://github.com/Nachida08/SnakeMAGs/tree/main/SnakeMAGs_conda_env)), the location of the adapters and of GTDB, and optionally the location of the GUNC database and of your host genome reference.\r\n\r\nLastly, you need to allocate the proper computational resources (threads, memory) for each of the main steps. These can be optimized according to your hardware.\r\n\r\n\r\n\r\nHere is an example of a config file:\r\n\r\n```\r\n#####################################################################################################\r\n##### _____ ___ _ _ _ ______ __ __ _______ _____ #####\r\n##### / ___| | \\ | | /\\ | | / / | ____| | \\ / | /\\ / _____| / ___| #####\r\n##### | (___ | |\\ \\ | | / \\ | |/ / | |____ | \\/ | / \\ | | __ | (___ #####\r\n##### \\___ \\ | | \\ \\| | / /\\ \\ | |\\ \\ | ____| | |\\ /| | / /\\ \\ | | |_ | \\___ \\ #####\r\n##### ____) | | | \\ | / /__\\ \\ | | \\ \\ | |____ | | \\/ | | / /__\\ \\ | |____|| ____) | #####\r\n##### |_____/ |_| \\__| /_/ \\_\\ |_| \\_\\ |______| |_| |_| /_/ \\_\\ \\______/ |_____/ #####\r\n##### #####\r\n#####################################################################################################\r\n\r\n############################\r\n### Execution parameters ###\r\n############################\r\n\r\nworking_dir: /path/to/working/directory/ #The main directory for the project\r\nraw_fastq: /path/to/raw_fastq/ #The directory that contains all the fastq files of all the samples (eg. sample1_R1.fastq & sample1_R2.fastq, sample2_R1.fastq & sample2_R2.fastq...)\r\nsuffix_1: \"_R1.fastq\" #Main type of suffix for forward reads file (eg. 
_1.fastq or _R1.fastq or _r1.fastq or _1.fq or _R1.fq or _r1.fq )\r\nsuffix_2: \"_R2.fastq\" #Main type of suffix for reverse reads file (eg. _2.fastq or _R2.fastq or _r2.fastq or _2.fq or _R2.fq or _r2.fq )\r\n\r\n##########################\r\n### Conda environments ###\r\n##########################\r\n\r\nconda_env: \"/path/to/SnakeMAGs_conda_env/\" #Path to the provided SnakeMAGs_conda_env directory which contains the yaml file for each conda environment\r\n\r\n#########################\r\n### Quality filtering ###\r\n#########################\r\nemail: name.surname@your-univ.com #Your e-mail address\r\nthreads_filter: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_filter: 150 #Memory required by the tool (in GB)\r\n\r\n########################\r\n### Adapter trimming ###\r\n########################\r\nadapters: /path/to/working/directory/adapters.fa #A fasta file containing a set of various Illumina adapters (this file is provided and is also available on github)\r\ntrim_params: \"2:40:15\" #For further details, see the Trimmomatic documentation\r\nthreads_trim: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_trim: 150 #Memory required by the tool (in GB)\r\n\r\n######################\r\n### Host filtering ###\r\n######################\r\nhost_genome: \"yes\" #yes or no. An optional step for host-associated samples (eg. termite, human, plant...)\r\nthreads_bowtie2: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nhost_genomes_directory: /path/to/working/host_genomes/ #The directory where the host genome is stored\r\nhost_genomes: /path/to/working/host_genomes/host_genomes.fa #A fasta file containing the DNA sequences of the host genome(s)\r\nthreads_samtools: 50 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_host_filtering: 150 #Memory according to tools need (in GB)\r\n\r\n################\r\n### Assembly ###\r\n################\r\nthreads_megahit: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nmin_contig_len: 1000 #Minimum length (in bp) of the assembled contigs\r\nk_list: \"21,31,41,51,61,71,81,91,99,109,119\" #Kmer size (for further details, see the megahit documentation)\r\nresources_megahit: 250 #Memory according to tools need (in GB)\r\n\r\n###############\r\n### Binning ###\r\n###############\r\nthreads_bwa: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_bwa: 150 #Memory according to tools need (in GB)\r\nthreads_samtools: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_samtools: 150 #Memory according to tools need (in GB)\r\nseed: 19860615 #Seed number for reproducible results\r\nthreads_metabat: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nminContig: 2500 #Minimum length (in bp) of the contigs\r\nresources_binning: 250 #Memory according to tools need (in GB)\r\n\r\n####################\r\n### Bins quality ###\r\n####################\r\n#checkM\r\nthreads_checkm: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_checkm: 250 #Memory according to tools need (in GB)\r\n#bins_quality_filtering\r\ncompletion: 50 #The minimum completion rate of bins\r\ncontamination: 10 #The maximum contamination rate of bins\r\nparks_quality_score: \"yes\" #yes or no. If yes bins are filtered according to the Parks quality score (completion-5*contamination >= 50)\r\n#GUNC\r\ngunc: \"yes\" #yes or no. An optional step to detect and discard chimeric and contaminated genomes using the GUNC tool\r\nthreads_gunc: 50 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_gunc: 250 #Memory according to tools need (in GB)\r\nGUNC_db: /path/to/GUNC_DB/gunc_db_progenomes2.1.dmnd #Path to the downloaded GUNC database (see the readme file)\r\n\r\n######################\r\n### Classification ###\r\n######################\r\nGTDB_data_ref: /path/to/downloaded/GTDB #Path to uncompressed GTDB-Tk reference data (GTDB)\r\nthreads_gtdb: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_gtdb: 250 #Memory according to tools need (in GB)\r\n\r\n##################\r\n### Abundances ###\r\n##################\r\nthreads_coverM: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_coverM: 150 #Memory according to tools need (in GB)\r\n```\r\n# Run SnakeMAGs\r\nIf you are using a workstation with Ubuntu (tested on Ubuntu 22.04):\r\n```{bash}\r\nsnakemake --cores 30 --snakefile SnakeMAGs.smk --use-conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --configfile /path/to/config.yaml --keep-going --latency-wait 180\r\n```\r\n\r\nIf you are working on a cluster with Slurm (tested with version 18.08.7):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\nIf you are working on a cluster with SGE (tested with version 8.1.9):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster \"qsub -cwd -V -q -pe thread {threads} -e cluster_logs/{rule}.e{jobid} -o cluster_logs/{rule}.o{jobid}\" --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml 
--keep-going\r\n```\r\n\r\n\r\n# Test\r\nWe provide a small data set in the [test](https://github.com/Nachida08/SnakeMAGs/tree/main/test) directory which will allow you to validate your installation and take your first steps with SnakeMAGs. This data set is a subset of the [ZymoBiomics Mock Community](https://www.zymoresearch.com/blogs/blog/zymobiomics-microbial-standards-optimize-your-microbiomics-workflow) (250K reads) used in this tutorial: [metagenomics_tutorial](https://github.com/pjtorres/metagenomics_tutorial).\r\n\r\n1. Before getting started, make sure you have cloned the SnakeMAGs repository or downloaded all the necessary files (SnakeMAGs.smk, config.yaml, chr19.fa.gz, insub732_2_R1.fastq.gz, insub732_2_R2.fastq.gz). See the [SnakeMAGs executable](#snakemags-executable) section.\r\n2. Unzip the fastq files and the host sequences file.\r\n```\r\ngunzip fastqs/insub732_2_R1.fastq.gz fastqs/insub732_2_R2.fastq.gz host_genomes/chr19.fa.gz\r\n```\r\n3. For better organisation, put all the read files in the same directory (eg. fastqs) and the host sequences file in a separate directory (eg. host_genomes)\r\n4. Edit the config file (see [Edit config file](#edit-config-file) section)\r\n5. Run the test (see [Run SnakeMAGs](#run-snakemags) section)\r\n\r\nNote: the analysis of these files took 1159.32 seconds to complete on an Ubuntu 22.04 LTS machine with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz x 40 processor and 96GB of RAM.\r\n\r\n# Genome reference for host reads filtering\r\nFor host-associated samples, one can remove host sequences from the metagenomic reads by mapping these reads against a reference genome. In the case of termite gut metagenomes, we provide [here](https://zenodo.org/record/6908287#.YuAdFXZBx8M) the relevant files (fasta and index files) from termite genomes.\r\n\r\nUpon request, we can help you to generate these files for your own reference genome and make them available to the community.\r\n\r\nNB. 
These mapping steps generate large files such as .bam and .sam. Depending on your disk space, you might want to delete these files after use.\r\n\r\n\r\n# Use case\r\nDuring the test phase of the development of SnakeMAGs, we used this workflow to process 10 publicly available termite gut metagenomes generated by Illumina sequencing, to ultimately reconstruct prokaryotic MAGs. These metagenomes were retrieved from the NCBI database using the following accession numbers: SRR10402454; SRR14739927; SRR8296321; SRR8296327; SRR8296329; SRR8296337; SRR8296343; DRR097505; SRR7466794; SRR7466795. They come from five different studies: Waidele et al, 2019; Tokuda et al, 2018; Romero Victorica et al, 2020; Moreira et al, 2021; and Calusinska et al, 2020.\r\n\r\n## Download the Illumina paired-end reads\r\nWe use the fasterq-dump tool to extract data in FASTQ format from SRA accessions. It is a command-line tool that offers a faster way to download these large files.\r\n\r\n```\r\n# Install and activate sra-tools environment\r\n## Note: For this study we used sra-tools 2.11.0\r\n\r\nconda activate\r\n# Create a dedicated environment so that it can be activated by name\r\nconda create -n sra-tools -c bioconda sra-tools\r\nconda activate sra-tools\r\n\r\n# Download fastqs in a single directory\r\nmkdir raw_fastq\r\ncd raw_fastq\r\nfasterq-dump --threads --skip-technical --split-3\r\n```\r\n\r\n## Download Genome reference for host reads filtering\r\n```\r\nmkdir host_genomes\r\ncd host_genomes\r\nwget https://zenodo.org/record/6908287/files/termite_genomes.fasta.gz\r\ngunzip termite_genomes.fasta.gz\r\n```\r\n\r\n## Edit the config file\r\nSee [Edit config file](#edit-config-file) section.\r\n\r\n## Run SnakeMAGs\r\n```\r\nconda activate snakemake_7.0.0\r\nmkdir cluster_logs\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname 
\"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\n## Study results\r\nThe MAGs reconstructed from each metagenome and their taxonomic classification are available in this [repository](https://doi.org/10.5281/zenodo.7661004).\r\n\r\n# Citations\r\n\r\nIf you use SnakeMAGs, please cite:\r\n> Tadrent N, Dedeine F and Hervé V. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]. F1000Research 2023, 11:1522 (https://doi.org/10.12688/f1000research.128091.2)\r\n\r\n\r\nPlease also cite the dependencies:\r\n- [Snakemake](https://doi.org/10.12688/f1000research.29032.2) : Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021) Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. *F1000Research* 2021, 10:33.\r\n- [illumina-utils](https://doi.org/10.1371/journal.pone.0066643) : Murat Eren, A., Vineis, J. H., Morrison, H. G., & Sogin, M. L. (2013). A Filtering Method to Generate High Quality Short Reads Using Illumina Paired-End Technology. *PloS ONE*, 8(6), e66643.\r\n- [Trimmomatic](https://doi.org/10.1093/bioinformatics/btu170) : Bolger, A. M., Lohse, M., & Usadel, B. (2014). Genome analysis Trimmomatic: a flexible trimmer for Illumina sequence data. *Bioinformatics*, 30(15), 2114-2120.\r\n- [Bowtie2](https://doi.org/10.1038/nmeth.1923) : Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. *Nature Methods*, 9(4), 357–359.\r\n- [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) : Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., & Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. 
*Bioinformatics*, 25(16), 2078–2079.\r\n- [BEDtools](https://doi.org/10.1093/bioinformatics/btq033) : Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. *Bioinformatics*, 26(6), 841–842.\r\n- [MEGAHIT](https://doi.org/10.1093/bioinformatics/btv033) : Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. *Bioinformatics*, 31(10), 1674–1676.\r\n- [bwa](https://doi.org/10.1093/bioinformatics/btp324) : Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. *Bioinformatics*, 25(14), 1754–1760.\r\n- [MetaBAT2](https://doi.org/10.7717/peerj.7359) : Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. *PeerJ*, 2019(7), 1–13.\r\n- [CheckM](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n- [GTDB-Tk](https://doi.org/10.1093/BIOINFORMATICS/BTAC672) : Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P., Parks, D. H. (2022). GTDB-Tk v2: memory friendly classification with the genome taxonomy database. *Bioinformatics*.\r\n- [CoverM](https://github.com/wwood/CoverM)\r\n- [Waidele et al, 2019](https://doi.org/10.1101/526038) : Waidele, L., Korb, J., Voolstra, C. R., Dedeine, F., & Staubach, F. (2019). Ecological specificity of the metagenome in a set of lower termite species supports contribution of the microbiome to adaptation of the host. 
*Animal Microbiome*, 1(1), 1–13.\r\n- [Tokuda et al, 2018](https://doi.org/10.1073/pnas.1810550115) : Tokuda, G., Mikaelyan, A., Fukui, C., Matsuura, Y., Watanabe, H., Fujishima, M., & Brune, A. (2018). Fiber-associated spirochetes are major agents of hemicellulose degradation in the hindgut of wood-feeding higher termites. *Proceedings of the National Academy of Sciences of the United States of America*, 115(51), E11996–E12004.\r\n- [Romero Victorica et al, 2020](https://doi.org/10.1038/s41598-020-60850-5) : Romero Victorica, M., Soria, M. A., Batista-García, R. A., Ceja-Navarro, J. A., Vikram, S., Ortiz, M., Ontañon, O., Ghio, S., Martínez-Ávila, L., Quintero García, O. J., Etcheverry, C., Campos, E., Cowan, D., Arneodo, J., & Talia, P. M. (2020). Neotropical termite microbiomes as sources of novel plant cell wall degrading enzymes. *Scientific Reports*, 10(1), 1–14.\r\n- [Moreira et al, 2021](https://doi.org/10.3389/fevo.2021.632590) : Moreira, E. A., Persinoti, G. F., Menezes, L. R., Paixão, D. A. A., Alvarez, T. M., Cairo, J. P. L. F., Squina, F. M., Costa-Leonardo, A. M., Rodrigues, A., Sillam-Dussès, D., & Arab, A. (2021). Complementary contribution of Fungi and Bacteria to lignocellulose digestion in the food stored by a neotropical higher termite. *Frontiers in Ecology and Evolution*, 9(April), 1–12.\r\n- [Calusinska et al, 2020](https://doi.org/10.1038/s42003-020-1004-3) : Calusinska, M., Marynowska, M., Bertucci, M., Untereiner, B., Klimek, D., Goux, X., Sillam-Dussès, D., Gawron, P., Halder, R., Wilmes, P., Ferrer, P., Gerin, P., Roisin, Y., & Delfosse, P. (2020). Integrative omics analysis of the termite gut system adaptation to Miscanthus diet identifies lignocellulose degradation enzymes. *Communications Biology*, 3(1), 1–12.\r\n- [Orakov et al, 2021](https://doi.org/10.1186/s13059-021-02393-0) : Orakov, A., Fullam, A., Coelho, L. P., Khedkar, S., Szklarczyk, D., Mende, D. R., Schmidt, T. S. B., & Bork, P. (2021). 
GUNC: detection of chimerism and contamination in prokaryotic genomes. *Genome Biology*, 22(1).\r\n- [Parks et al, 2015](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n# License\r\nThis project is licensed under the CeCILL License - see the [LICENSE](https://github.com/Nachida08/SnakeMAGs/blob/main/LICENCE) file for details.\r\n\r\nDeveloped by Nachida Tadrent at the Insect Biology Research Institute ([IRBI](https://irbi.univ-tours.fr/)), under the supervision of Franck Dedeine and Vincent Hervé." . "application/ld+json" . . . . . . . . . . "https://w3id.org/ro-id/ea4e5a1d-3ce7-4438-af08-15fdd453600a" . "https://github.com/Nachida08/SnakeMAGs.git" . . "SnakeMAGs.smk" . "Research Object Crate for SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes" . "https://workflowhub.eu/workflows/554/ro_crate?version=1" . . . . "https://w3id.org/ro-id/635e700e-d0d4-44e3-86c1-cc6d1be9692f" . "https://w3id.org/ro-id/6818ab24-054d-4361-a874-66f8eadcdcfb" . "https://w3id.org/ro-id/79953e99-cb51-47ee-92f2-e153597557cc" . "https://w3id.org/ro-id/7a2c9019-ce99-4114-b043-59c555f6a675" . "https://w3id.org/ro-id/b1e226f2-7d2f-4fe9-986d-764713fb1367" . "https://w3id.org/ro-id/ed00d922-02c4-43ea-92fb-8370919fd3ba" . "https://w3id.org/ro-id/0117b55d-e90e-4e79-9563-53af932769bc" . "https://w3id.org/ro-id/caa86656-c17a-4a74-b434-13559f4fdb9a" . "https://w3id.org/ro-id/7643560e-07af-467d-9264-0dc4822271cc" . "https://w3id.org/ro-id/a802c9dc-d9c6-4e34-a39e-64169e862ca3" . "https://w3id.org/ro-id/17b77132-61af-4022-bb05-d8dcdd1629d8" . "https://w3id.org/ro-id/1bb54e24-2510-4650-b3ee-693506c81df9" . "https://w3id.org/ro-id/207dde09-f94a-4c1e-ac69-e12a018a2a54" . 
"https://w3id.org/ro-id/219ac64a-68b3-4600-89d7-892dac1b800a" . "https://w3id.org/ro-id/425f8800-6707-498e-95e7-7a07d6566c45" . "https://w3id.org/ro-id/5854d68d-3739-4763-b1f4-efb6bb33831a" . "https://w3id.org/ro-id/66e3c089-3752-499d-abbe-9f4229a57da4" . "https://w3id.org/ro-id/8aa43439-c880-4ebf-b6e0-425641410be2" . "https://w3id.org/ro-id/8b9de606-a58b-41d5-8d9c-6f5abb63763c" . "https://w3id.org/ro-id/964efa79-beaa-4933-ab64-741c20d6e74b" . "https://w3id.org/ro-id/9c85b21d-0945-40ed-a42a-c528171ded71" . "https://w3id.org/ro-id/a6ec004c-b2b8-461e-9db1-f0646a8a0f54" . "https://w3id.org/ro-id/aaa473f3-3e15-4492-b70c-5728f09ea390" . "https://w3id.org/ro-id/b2b2e1e4-1b56-4466-b081-2b735ab21aa9" . "https://w3id.org/ro-id/c043fdf5-0682-4f07-bab4-8d65eb709011" . "https://w3id.org/ro-id/d2d800f1-563d-4a32-b307-adf71714af88" . "https://w3id.org/ro-id/dc7f6540-5a7c-49e8-a20b-888b4dddd2cd" . "https://w3id.org/ro-id/eb227f58-9f6c-41b2-861a-8745cb2d0b84" . "https://w3id.org/ro-id/ffc5195e-f8e1-4f31-a169-4eb0d8f5902c" . "https://w3id.org/ro-id/7f8d90d3-686c-4c55-9674-f7a543556a4a" . "https://w3id.org/ro-id/f9310a1c-e153-4306-a690-7bc38d542374" . "https://w3id.org/ro-id/0fa4f9f3-5b02-4461-8a7e-8c0bc9f69530" . "https://w3id.org/ro-id/39cc26d1-6e94-40b7-b703-83c7779e442d" . "https://w3id.org/ro-id/c5a4842b-7866-4ea4-ac14-9b71fcc45510" . "https://w3id.org/ro-id/034c6461-3a06-46db-b7c1-3e95baa11d72" . "https://w3id.org/ro-id/03ec855b-4577-4303-beaa-0185ec7e1ffa" . "https://w3id.org/ro-id/04948723-5999-40da-9ba7-1bcb48307a49" . "https://w3id.org/ro-id/22fe22b6-4110-41a6-8d44-689ce3000808" . "https://w3id.org/ro-id/33f09808-1448-49db-8ee9-3e719b7e0f19" . "https://w3id.org/ro-id/36203ca3-3e4c-4114-afd6-8a724d8a3d48" . "https://w3id.org/ro-id/3a704fc1-5976-4c29-9ef7-950e61db1301" . "https://w3id.org/ro-id/3c9f4c17-2072-4143-ba39-810474de08a3" . "https://w3id.org/ro-id/4d8428b9-7947-48f5-a3b4-140569d5d1d1" . "https://w3id.org/ro-id/57904cc8-ab5a-424f-9131-7674c210b089" . 
"https://w3id.org/ro-id/758c8645-6d01-418a-9e20-7c8eb76b5c96" . "https://w3id.org/ro-id/775706eb-a1f7-4537-93c6-cae84d779a36" . "https://w3id.org/ro-id/7cee9a67-aa6c-4afa-b4d3-e3f91232323d" . "https://w3id.org/ro-id/7ef79ae7-4bf3-4bac-9b22-7bfc8fe7244d" . "https://w3id.org/ro-id/b602637d-20fc-49a5-ad4f-7bd03f2b1f16" . "https://w3id.org/ro-id/ca434270-dd98-4e5a-b559-79f326ec22a1" . "https://w3id.org/ro-id/ef9ec23c-7135-4bb8-820e-d970be7eb7dc" . "https://w3id.org/ro-id/f7b9b274-711a-4f91-92bf-7d1325c00e9c" . "https://w3id.org/ro-id/799089ef-100e-4e3e-aaa6-19f1f60d61c2" . "https://w3id.org/ro-id/81f08d93-2571-4519-9ea6-c700be6c2649" . "https://w3id.org/ro-id/095a3747-f378-4558-8ec0-e9d6099c6b15" . "https://w3id.org/ro-id/14df34d3-c0f1-41b0-bc4c-cadae39eac6b" . "https://w3id.org/ro-id/1a20e20f-8304-44b4-b602-4b61a0d0e11a" . "https://w3id.org/ro-id/5c6eefea-99d3-4d7c-bd4a-c5359c82f5cc" . "https://w3id.org/ro-id/5cfab39f-d6cd-499f-a959-35db2fad6ac7" . "https://w3id.org/ro-id/5d6b6eee-e3ac-4a0a-9240-677c2bf9ab9d" . "https://w3id.org/ro-id/655b2e90-c869-4319-b452-06d5a8d400b3" . "https://w3id.org/ro-id/6898fcfc-ec2a-4e8b-9c20-b8d0d2a2afec" . "https://w3id.org/ro-id/71414ad6-b3d1-4462-a560-e52197281bd7" . "https://w3id.org/ro-id/75373bfc-e8a1-43b1-9cf7-a39ada17e3c5" . "https://w3id.org/ro-id/77853e27-dc8a-4b1a-b34c-02e7a9c0bf00" . "https://w3id.org/ro-id/7d6124bd-3cbb-468f-a57a-6eedd11aa21c" . "https://w3id.org/ro-id/87f85a41-253b-48ce-8e71-9ac2bed2dc8b" . "https://w3id.org/ro-id/9d98528f-1d57-40ff-b1af-46fa543a3ee6" . "https://w3id.org/ro-id/ac30161b-497f-457b-828b-22cc344da5a1" . "https://w3id.org/ro-id/c8a2f439-6202-4114-b0e7-8df97e875941" . "https://w3id.org/ro-id/ca62b88b-2b0e-4d5e-9a8c-846992030d16" . "https://w3id.org/ro-id/d6e16f12-42b0-4151-bba5-16bb40ee68ef" . "https://w3id.org/ro-id/e6c3e635-ac19-4cdd-a9c6-daba3b09b7cb" . "https://w3id.org/ro-id/216aefca-4616-44c8-8c13-2584ad555e14" . "https://w3id.org/ro-id/80c095cc-0b47-4108-b2ca-3ebd5762d28c" . 
"https://w3id.org/ro-id/95386af5-b9bb-4698-acfa-48385e0d1ef7" . "https://w3id.org/ro-id/b932a393-337b-4633-9ac7-0f874c7c3b9b" . "https://w3id.org/ro-id/3fef4c50-e761-440a-bd90-875910aa3bea" . "https://w3id.org/ro-id/41405e04-8a90-4140-a0e2-72be499339c1" . "https://w3id.org/ro-id/42257f30-22cf-45a4-a2da-f12565ff8519" . "https://w3id.org/ro-id/42e1f717-6e1e-41f3-bc28-997de4a67494" . "https://w3id.org/ro-id/4fc089af-d937-4749-a121-ba5ef056a808" . "https://w3id.org/ro-id/6a713bec-572f-4e65-9857-c7ec122c65ea" . "https://w3id.org/ro-id/6ed2bfd0-531b-4945-a839-4174e9e29182" . "https://w3id.org/ro-id/8605625b-8054-404a-8a65-631793685d28" . "https://w3id.org/ro-id/97873799-96ba-4ea7-afe9-e9a2107a9e42" . "https://w3id.org/ro-id/c334ee6b-4f5d-4f0f-adda-4c95d227f1b1" . "https://w3id.org/ro-id/c62eb665-cc7c-427f-9f48-b153884ed7c8" . "https://w3id.org/ro-id/d48cbf6d-6b8f-4422-b077-1d4dedfa2711" . "https://w3id.org/ro-id/f1d4517b-70ff-4ef9-822a-391802e82fee" . "https://w3id.org/ro-id/ff5351ce-7f15-4e52-b0e6-a99b9760ad76" . "Franck Dedeine. \"Research Object Crate for SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes.\" ROHub. Sep 08 ,2023. https://w3id.org/ro-id/ea4e5a1d-3ce7-4438-af08-15fdd453600a." . . . . "test" . . . . "host_genomes" . . . . . "fastqs" . . . . . . . . . . . . . . . "SnakeMAGs_conda_env" . . . . "17606811"^^ . "https://api.rohub.org/api/resources/08e0d3e2-f08c-4351-9c05-bcd819415ca1/download/" . . "2023-09-08 12:14:53.971686+00:00" . "2023-09-08 12:15:02.531809+00:00" . . "chr19.fa.gz" . "2023-09-08 12:14:53.971686+00:00" . . . . "2366"^^ . "https://api.rohub.org/api/resources/0c0da1c3-4a8c-433a-90ff-2e2c73df4698/download/" . . "2023-09-08 12:14:53.805739+00:00" . "2023-09-08 12:14:57.512865+00:00" . . "IU.yaml" . "2023-09-08 12:14:53.805739+00:00" . . . . "17733118"^^ . "https://api.rohub.org/api/resources/0d714d3f-9bad-4a8e-aa5e-e8209e644df5/download/" . . "2023-09-08 12:14:53.925313+00:00" . 
"2023-09-08 12:15:01.136362+00:00" . . "insub732_2_R2.fastq.gz" . "2023-09-08 12:14:53.925313+00:00" . . . . "865"^^ . "https://api.rohub.org/api/resources/0e302d55-8dd6-4d8d-ad7e-43f32951db1e/download/" . . "2023-09-08 12:14:53.807242+00:00" . "2023-09-08 12:14:58.126946+00:00" . . "METABAT2.yaml" . "2023-09-08 12:14:53.807242+00:00" . . . . "906"^^ . "https://api.rohub.org/api/resources/16e462fa-4cd2-4da4-b21d-0dd924bbce08/download/" . . "2023-09-08 12:14:53.806514+00:00" . "2023-09-08 12:14:57.679923+00:00" . . "MEGAHIT.yaml" . "2023-09-08 12:14:53.806514+00:00" . . . . "16954448"^^ . "https://api.rohub.org/api/resources/3023d11f-c833-4bc1-a1e7-f3025849fdcd/download/" . . "2023-09-08 12:14:53.880406+00:00" . "2023-09-08 12:14:59.764373+00:00" . . "insub732_2_R1.fastq.gz" . "2023-09-08 12:14:53.880406+00:00" . . . . "102488" . "https://api.rohub.org/api/resources/32373097-ac81-4ab2-b955-d6a0eeccc257/download/" . . "2023-09-08 12:14:53.810032+00:00" . "2023-09-08 12:14:58.734789+00:00" . "image/jpeg" . . "SnakeMAGs_schema.jpg" . "2023-09-08 12:14:53.810032+00:00" . . . . . . "7807"^^ . "https://api.rohub.org/api/resources/37f0314d-a81b-4e97-900c-c5e316529390/download/" . . "2023-09-08 12:14:53.811612+00:00" . "2023-09-08 12:14:59.122989+00:00" . . "config.yaml" . "2023-09-08 12:14:53.811612+00:00" . . . . "377"^^ . "https://api.rohub.org/api/resources/4ad8b10b-80c5-4681-a146-a4420b85bbf2/download/" . . "2023-09-08 12:14:53.802014+00:00" . "2023-09-08 12:14:55.995604+00:00" . . "BWA.yaml" . "2023-09-08 12:14:53.802014+00:00" . . . . "912"^^ . "https://api.rohub.org/api/resources/62f91d61-decd-438b-a8ef-92e0b1bca441/download/" . . "2023-09-08 12:14:53.808733+00:00" . "2023-09-08 12:14:58.559136+00:00" . . "TRIMMOMATIC.yaml" . "2023-09-08 12:14:53.808733+00:00" . . . . "1594"^^ . "https://api.rohub.org/api/resources/73bc538b-17a9-47c7-96c6-89041e1af50a/download/" . . "2023-09-08 12:14:53.804238+00:00" . "2023-09-08 12:14:57.023879+00:00" . . "GTDBTK.yaml" . 
"2023-09-08 12:14:53.804238+00:00" . . . . "1319"^^ . "https://api.rohub.org/api/resources/857d9978-6c7b-40b3-9092-94f338b16034/download/" . . "2023-09-08 12:14:53.803518+00:00" . "2023-09-08 12:14:56.809487+00:00" . . "COVERM.yaml" . "2023-09-08 12:14:53.803518+00:00" . . . . "7596"^^ . "https://api.rohub.org/api/resources/8a4f77c7-7b79-4795-9e75-733f3bf74ef9/download/" . . "2023-09-08 12:14:53.812720+00:00" . "2023-09-08 12:14:59.561998+00:00" . . "config.yaml" . "2023-09-08 12:14:53.812720+00:00" . . . . "1023"^^ . "https://api.rohub.org/api/resources/8bab6cad-93da-4362-ae84-4c69d1dffafd/download/" . . "2023-09-08 12:14:53.801278+00:00" . "2023-09-08 12:14:55.753669+00:00" . . "BOWTIE2.yaml" . "2023-09-08 12:14:53.801278+00:00" . . . "https://ror.org/https://workflowhub.eu/workflows/554?version=1" . . "81107"^^ . "https://api.rohub.org/api/resources/91ad0de4-d0d4-4dbf-af64-c7b3e266ebf6/download/" . . "2023-09-08 12:14:53.973230+00:00" . "2023-09-08 12:15:07.247971+00:00" . "text/html" . . "ro-crate-preview.html" . "2023-09-08 12:14:53.973230+00:00" . . . . . "21781"^^ . "https://api.rohub.org/api/resources/a8209885-306a-464c-9936-6e8e7ff616d1/download/" . . "2023-09-08 12:14:53.797588+00:00" . "2023-09-08 12:14:54.933784+00:00" . . "LICENCE" . "2023-09-08 12:14:53.797588+00:00" . . . . "1971"^^ . "https://api.rohub.org/api/resources/a9adc721-5f39-4836-b613-c3c336261de6/download/" . . "2023-09-08 12:14:53.804981+00:00" . "2023-09-08 12:14:57.309048+00:00" . . "GUNC.yaml" . "2023-09-08 12:14:53.804981+00:00" . . . "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/" . . "17575" . "https://api.rohub.org/api/resources/b1322e11-1226-4817-8a5e-a283a8def83a/download/" . . "2023-08-02 11:41:06+00:00" . "2023-09-08 12:15:07.067874+00:00" . 
"[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0.0-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)\r\n\r\n\r\n# About SnakeMAGs\r\nSnakeMAGs is a workflow to reconstruct prokaryotic genomes from metagenomes. The main purpose of SnakeMAGs is to process Illumina data from raw reads to metagenome-assembled genomes (MAGs).\r\nSnakeMAGs is efficient, easy to handle and flexible to different projects. The workflow is CeCILL licensed, implemented in Snakemake (run on multiple cores) and available for Linux.\r\nSnakeMAGs performed eight main steps:\r\n- Quality filtering of the reads\r\n- Adapter trimming\r\n- Filtering of the host sequences (optional)\r\n- Assembly\r\n- Binning\r\n- Evaluation of the quality of the bins\r\n- Classification of the MAGs\r\n- Estimation of the relative abundance of the MAGs\r\n\r\n\r\n![scheme of workflow](SnakeMAGs_schema.jpg?raw=true)\r\n\r\n# How to use SnakeMAGs\r\n## Install conda\r\nThe easiest way to install and run SnakeMAGs is to use [conda](https://www.anaconda.com/products/distribution). 
This package manager will help you easily install [Snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).\r\n\r\n## Install and activate Snakemake environment\r\nNote: The workflow was developed with Snakemake 7.0.0\r\n```\r\nconda activate\r\n\r\n# First, set up your channel priorities\r\nconda config --add channels defaults\r\nconda config --add channels bioconda\r\nconda config --add channels conda-forge\r\n\r\n# Then, create a new environment for the Snakemake version you require\r\nconda create -n snakemake_7.0.0 snakemake=7.0.0\r\n\r\n# And activate it\r\nconda activate snakemake_7.0.0\r\n```\r\n\r\nAlternatively, you can also install Snakemake via mamba:\r\n```\r\n# If you do not have mamba yet on your machine, you can install it with:\r\nconda install -n base -c conda-forge mamba\r\n\r\n# Then you can install Snakemake\r\nconda activate base\r\nmamba create -c conda-forge -c bioconda -n snakemake snakemake\r\n\r\n# And activate it\r\nconda activate snakemake\r\n\r\n```\r\n\r\n## SnakeMAGs executable\r\nThe easiest way to obtain SnakeMAGs and its related files is to clone the repository using git:\r\n```\r\ngit clone https://github.com/Nachida08/SnakeMAGs.git\r\n```\r\nAlternatively, you can download the relevant files:\r\n```\r\n# Use the raw file URLs (the /blob/ pages are HTML, not the files themselves)\r\nwget https://raw.githubusercontent.com/Nachida08/SnakeMAGs/main/SnakeMAGs.smk https://raw.githubusercontent.com/Nachida08/SnakeMAGs/main/config.yaml\r\n```\r\n\r\n## SnakeMAGs input files\r\n- Illumina paired-end reads in FASTQ.\r\n- Adapter sequence file ([adapters.fa](https://github.com/Nachida08/SnakeMAGs/blob/main/adapters.fa)).\r\n- Host genome sequences in FASTA (if host_genome: \"yes\"), in case you work with host-associated metagenomes (e.g. human gut metagenome).\r\n\r\n## Download Genome Taxonomy Database (GTDB)\r\nGTDB-Tk requires about 66 GB or more of external data (GTDB), which needs to be downloaded and unarchived. 
Because this database is large, you decide where to store it.\r\nSnakeMAGs does not download GTDB automatically; you have to do it yourself:\r\n\r\n```\r\n#Download the latest release (tested with release207)\r\n#Note: SnakeMAGs uses GTDBtk v2.1.0 and therefore requires release 207 as the minimum version. See https://ecogenomics.github.io/GTDBTk/installing/index.html#installing for details.\r\nwget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz\r\n#Decompress\r\ntar -xzvf gtdbtk_v2_data.tar.gz\r\n#This will create a folder called release207_v2\r\n```\r\nAll you have to do now is to indicate the path to the database folder (in our example, the folder is called release207_v2) in the Classification section of the config file.\r\n\r\n## Download the GUNC database (required if gunc: \"yes\")\r\nGUNC accepts either a proGenomes-based or a GTDB-based reference database. Both can be downloaded using the ```gunc download_db``` command. For our study we used the default proGenomes-derived GUNC database, which requires fewer resources for similar performance.\r\n\r\n```\r\nconda activate\r\n# Install and activate GUNC environment\r\nconda create --prefix /path/to/gunc_env\r\nconda install -c bioconda gunc --prefix /path/to/gunc_env\r\nconda activate /path/to/gunc_env\r\n\r\n#Download the proGenomes-derived GUNC database (tested with gunc_db_progenomes2.1)\r\n#Note: SnakeMAGs uses GUNC v1.0.5\r\ngunc download_db -db progenomes /path/to/GUNC_DB\r\n```\r\nAll you have to do now is to indicate the path to the GUNC database file in the Bins quality section of the config file.\r\n\r\n## Edit config file\r\nYou need to edit the config.yaml file. 
In particular, you need to set the correct paths: for the working directory, to specify where are your fastq files, where you want to place the conda environments (that will be created using the provided .yaml files available in [SnakeMAGs_conda_env directory](https://github.com/Nachida08/SnakeMAGs/tree/main/SnakeMAGs_conda_env)), where are the adapters, where is GTDB and optionally where is the GUNC database and where is your host genome reference.\r\n\r\nLastly, you need to allocate the proper computational resources (threads, memory) for each of the main steps. These can be optimized according to your hardware.\r\n\r\n\r\n\r\nHere is an example of a config file:\r\n\r\n```\r\n#####################################################################################################\r\n##### _____ ___ _ _ _ ______ __ __ _______ _____ #####\r\n##### / ___| | \\ | | /\\ | | / / | ____| | \\ / | /\\ / _____| / ___| #####\r\n##### | (___ | |\\ \\ | | / \\ | |/ / | |____ | \\/ | / \\ | | __ | (___ #####\r\n##### \\___ \\ | | \\ \\| | / /\\ \\ | |\\ \\ | ____| | |\\ /| | / /\\ \\ | | |_ | \\___ \\ #####\r\n##### ____) | | | \\ | / /__\\ \\ | | \\ \\ | |____ | | \\/ | | / /__\\ \\ | |____|| ____) | #####\r\n##### |_____/ |_| \\__| /_/ \\_\\ |_| \\_\\ |______| |_| |_| /_/ \\_\\ \\______/ |_____/ #####\r\n##### #####\r\n#####################################################################################################\r\n\r\n############################\r\n### Execution parameters ###\r\n############################\r\n\r\nworking_dir: /path/to/working/directory/ #The main directory for the project\r\nraw_fastq: /path/to/raw_fastq/ #The directory that contains all the fastq files of all the samples (eg. sample1_R1.fastq & sample1_R2.fastq, sample2_R1.fastq & sample2_R2.fastq...)\r\nsuffix_1: \"_R1.fastq\" #Main type of suffix for forward reads file (eg. 
_1.fastq or _R1.fastq or _r1.fastq or _1.fq or _R1.fq or _r1.fq )\r\nsuffix_2: \"_R2.fastq\" #Main type of suffix for reverse reads file (eg. _2.fastq or _R2.fastq or _r2.fastq or _2.fq or _R2.fq or _r2.fq )\r\n\r\n##########################\r\n### Conda environments ###\r\n##########################\r\n\r\nconda_env: \"/path/to/SnakeMAGs_conda_env/\" #Path to the provided SnakeMAGs_conda_env directory which contains the yaml file for each conda environment\r\n\r\n#########################\r\n### Quality filtering ###\r\n#########################\r\nemail: name.surname@your-univ.com #Your e-mail address\r\nthreads_filter: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_filter: 150 #Memory according to tools need (in GB)\r\n\r\n########################\r\n### Adapter trimming ###\r\n########################\r\nadapters: /path/to/working/directory/adapters.fa #A fasta file containing a set of various Illumina adaptors (this file is provided and is also available on github)\r\ntrim_params: \"2:40:15\" #For further details, see the Trimmomatic documentation\r\nthreads_trim: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_trim: 150 #Memory according to tools need (in GB)\r\n\r\n######################\r\n### Host filtering ###\r\n######################\r\nhost_genome: \"yes\" #yes or no. An optional step for host-associated samples (eg. termite, human, plant...)\r\nthreads_bowtie2: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nhost_genomes_directory: /path/to/working/host_genomes/ #the directory where the host genome is stored\r\nhost_genomes: /path/to/working/host_genomes/host_genomes.fa #A fasta file containing the DNA sequences of the host genome(s)\r\nthreads_samtools: 50 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_host_filtering: 150 #Memory according to tools need (in GB)\r\n\r\n################\r\n### Assembly ###\r\n################\r\nthreads_megahit: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nmin_contig_len: 1000 #Minimum length (in bp) of the assembled contigs\r\nk_list: \"21,31,41,51,61,71,81,91,99,109,119\" #Kmer size (for further details, see the megahit documentation)\r\nresources_megahit: 250 #Memory according to tools need (in GB)\r\n\r\n###############\r\n### Binning ###\r\n###############\r\nthreads_bwa: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_bwa: 150 #Memory according to tools need (in GB)\r\nthreads_samtools: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_samtools: 150 #Memory according to tools need (in GB)\r\nseed: 19860615 #Seed number for reproducible results\r\nthreads_metabat: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nminContig: 2500 #Minimum length (in bp) of the contigs\r\nresources_binning: 250 #Memory according to tools need (in GB)\r\n\r\n####################\r\n### Bins quality ###\r\n####################\r\n#checkM\r\nthreads_checkm: 50 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_checkm: 250 #Memory according to tools need (in GB)\r\n#bins_quality_filtering\r\ncompletion: 50 #The minimum completion rate of bins\r\ncontamination: 10 #The maximum contamination rate of bins\r\nparks_quality_score: \"yes\" #yes or no. If yes bins are filtered according to the Parks quality score (completion-5*contamination >= 50)\r\n#GUNC\r\ngunc: \"yes\" #yes or no. An optional step to detect and discard chimeric and contaminated genomes using the GUNC tool\r\nthreads_gunc: 50 #The number of threads to run this process. 
To be adjusted according to your hardware\r\nresources_gunc: 250 #Memory according to tools need (in GB)\r\nGUNC_db: /path/to/GUNC_DB/gunc_db_progenomes2.1.dmnd #Path to the downloaded GUNC database (see the readme file)\r\n\r\n######################\r\n### Classification ###\r\n######################\r\nGTDB_data_ref: /path/to/downloaded/GTDB #Path to uncompressed GTDB-Tk reference data (GTDB)\r\nthreads_gtdb: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_gtdb: 250 #Memory according to tools need (in GB)\r\n\r\n##################\r\n### Abundances ###\r\n##################\r\nthreads_coverM: 10 #The number of threads to run this process. To be adjusted according to your hardware\r\nresources_coverM: 150 #Memory according to tools need (in GB)\r\n```\r\n# Run SnakeMAGs\r\nIf you are using a workstation with Ubuntu (tested on Ubuntu 22.04):\r\n```{bash}\r\nsnakemake --cores 30 --snakefile SnakeMAGs.smk --use-conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --configfile /path/to/config.yaml --keep-going --latency-wait 180\r\n```\r\n\r\nIf you are working on a cluster with Slurm (tested with version 18.08.7):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\nIf you are working on a cluster with SGE (tested with version 8.1.9):\r\n```{bash}\r\nsnakemake --snakefile SnakeMAGs.smk --cluster \"qsub -cwd -V -q -pe thread {threads} -e cluster_logs/{rule}.e{jobid} -o cluster_logs/{rule}.o{jobid}\" --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname \"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml 
--keep-going\r\n```\r\n\r\n\r\n# Test\r\nWe provide a small data set in the [test](https://github.com/Nachida08/SnakeMAGs/tree/main/test) directory which will allow you to validate your installation and take your first steps with SnakeMAGs. This data set is a subset of the [ZymoBiomics Mock Community](https://www.zymoresearch.com/blogs/blog/zymobiomics-microbial-standards-optimize-your-microbiomics-workflow) (250K reads) used in this tutorial: [metagenomics_tutorial](https://github.com/pjtorres/metagenomics_tutorial).\r\n\r\n1. Before getting started, make sure you have cloned the SnakeMAGs repository or downloaded all the necessary files (SnakeMAGs.smk, config.yaml, chr19.fa.gz, insub732_2_R1.fastq.gz, insub732_2_R2.fastq.gz). See the [SnakeMAGs executable](#snakemags-executable) section.\r\n2. Unzip the fastq files and the host sequences file.\r\n```\r\ngunzip fastqs/insub732_2_R1.fastq.gz fastqs/insub732_2_R2.fastq.gz host_genomes/chr19.fa.gz\r\n```\r\n3. For better organisation, put all the read files in the same directory (eg. fastqs) and the host sequences file in a separate directory (eg. host_genomes)\r\n4. Edit the config file (see [Edit config file](#edit-config-file) section)\r\n5. Run the test (see [Run SnakeMAGs](#run-snakemags) section)\r\n\r\nNote: the analysis of these files took 1159.32 seconds to complete on an Ubuntu 22.04 LTS machine with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz x 40 processor and 96GB of RAM.\r\n\r\n# Genome reference for host reads filtering\r\nFor host-associated samples, one can remove host sequences from the metagenomic reads by mapping these reads against a reference genome. In the case of termite gut metagenomes, we provide [here](https://zenodo.org/record/6908287#.YuAdFXZBx8M) the relevant files (fasta and index files) from termite genomes.\r\n\r\nUpon request, we can help you generate these files for your own reference genome and make them available to the community.\r\n\r\nNB. 
These mapping steps generate large files such as .bam and .sam files. Depending on your disk space, you might want to delete these files after use.\r\n\r\n\r\n# Use case\r\nDuring the test phase of the development of SnakeMAGs, we used this workflow to process 10 publicly available termite gut metagenomes generated by Illumina sequencing, to ultimately reconstruct prokaryotic MAGs. These metagenomes were retrieved from the NCBI database using the following accession numbers: SRR10402454; SRR14739927; SRR8296321; SRR8296327; SRR8296329; SRR8296337; SRR8296343; DRR097505; SRR7466794; SRR7466795. They come from five different studies: Waidele et al, 2019; Tokuda et al, 2018; Romero Victorica et al, 2020; Moreira et al, 2021; and Calusinska et al, 2020.\r\n\r\n## Download the Illumina paired-end reads\r\nWe used the fasterq-dump tool to extract data in FASTQ format from SRA accessions. It is a command-line tool that offers a faster way to download these large files.\r\n\r\n```\r\n# Install and activate sra-tools environment\r\n## Note: For this study we used sra-tools 2.11.0\r\n\r\nconda activate\r\n# Create a dedicated environment so that it can be activated by name\r\nconda create -n sra-tools -c bioconda sra-tools\r\nconda activate sra-tools\r\n\r\n# Download fastqs in a single directory\r\nmkdir raw_fastq\r\ncd raw_fastq\r\nfasterq-dump --threads --skip-technical --split-3\r\n```\r\n\r\n## Download Genome reference for host reads filtering\r\n```\r\nmkdir host_genomes\r\ncd host_genomes\r\nwget https://zenodo.org/record/6908287/files/termite_genomes.fasta.gz\r\ngunzip termite_genomes.fasta.gz\r\n```\r\n\r\n## Edit the config file\r\nSee [Edit config file](#edit-config-file) section.\r\n\r\n## Run SnakeMAGs\r\n```\r\nconda activate snakemake_7.0.0\r\nmkdir cluster_logs\r\nsnakemake --snakefile SnakeMAGs.smk --cluster 'sbatch -p --mem -c -o \"cluster_logs/{wildcards}.{rule}.{jobid}.out\" -e \"cluster_logs/{wildcards}.{rule}.{jobid}.err\" ' --jobs --use-conda --conda-frontend conda --conda-prefix /path/to/SnakeMAGs_conda_env/ --jobname 
\"{rule}.{wildcards}.{jobid}\" --latency-wait 180 --configfile /path/to/config.yaml --keep-going\r\n```\r\n\r\n## Study results\r\nThe MAGs reconstructed from each metagenome and their taxonomic classification are available in this [repository](https://doi.org/10.5281/zenodo.7661004).\r\n\r\n# Citations\r\n\r\nIf you use SnakeMAGs, please cite:\r\n> Tadrent N, Dedeine F and Hervé V. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes [version 2; peer review: 2 approved]. F1000Research 2023, 11:1522 (https://doi.org/10.12688/f1000research.128091.2)\r\n\r\n\r\nPlease also cite the dependencies:\r\n- [Snakemake](https://doi.org/10.12688/f1000research.29032.2) : Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021) Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. *F1000Research* 2021, 10:33.\r\n- [illumina-utils](https://doi.org/10.1371/journal.pone.0066643) : Eren, A. M., Vineis, J. H., Morrison, H. G., & Sogin, M. L. (2013). A Filtering Method to Generate High Quality Short Reads Using Illumina Paired-End Technology. *PLoS ONE*, 8(6), e66643.\r\n- [Trimmomatic](https://doi.org/10.1093/bioinformatics/btu170) : Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. *Bioinformatics*, 30(15), 2114-2120.\r\n- [Bowtie2](https://doi.org/10.1038/nmeth.1923) : Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. *Nature Methods*, 9(4), 357–359.\r\n- [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) : Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., & Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. 
*Bioinformatics*, 25(16), 2078–2079.\r\n- [BEDtools](https://doi.org/10.1093/bioinformatics/btq033) : Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. *Bioinformatics*, 26(6), 841–842.\r\n- [MEGAHIT](https://doi.org/10.1093/bioinformatics/btv033) : Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. *Bioinformatics*, 31(10), 1674–1676.\r\n- [bwa](https://doi.org/10.1093/bioinformatics/btp324) : Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. *Bioinformatics*, 25(14), 1754–1760.\r\n- [MetaBAT2](https://doi.org/10.7717/peerj.7359) : Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. *PeerJ*, 2019(7), 1–13.\r\n- [CheckM](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n- [GTDB-Tk](https://doi.org/10.1093/BIOINFORMATICS/BTAC672) : Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P., Parks, D. H. (2022). GTDB-Tk v2: memory friendly classification with the genome taxonomy database. *Bioinformatics*.\r\n- [CoverM](https://github.com/wwood/CoverM)\r\n- [Waidele et al, 2019](https://doi.org/10.1101/526038) : Waidele, L., Korb, J., Voolstra, C. R., Dedeine, F., & Staubach, F. (2019). Ecological specificity of the metagenome in a set of lower termite species supports contribution of the microbiome to adaptation of the host. 
*Animal Microbiome*, 1(1), 1–13.\r\n- [Tokuda et al, 2018](https://doi.org/10.1073/pnas.1810550115) : Tokuda, G., Mikaelyan, A., Fukui, C., Matsuura, Y., Watanabe, H., Fujishima, M., & Brune, A. (2018). Fiber-associated spirochetes are major agents of hemicellulose degradation in the hindgut of wood-feeding higher termites. *Proceedings of the National Academy of Sciences of the United States of America*, 115(51), E11996–E12004.\r\n- [Romero Victorica et al, 2020](https://doi.org/10.1038/s41598-020-60850-5) : Romero Victorica, M., Soria, M. A., Batista-García, R. A., Ceja-Navarro, J. A., Vikram, S., Ortiz, M., Ontañon, O., Ghio, S., Martínez-Ávila, L., Quintero García, O. J., Etcheverry, C., Campos, E., Cowan, D., Arneodo, J., & Talia, P. M. (2020). Neotropical termite microbiomes as sources of novel plant cell wall degrading enzymes. *Scientific Reports*, 10(1), 1–14.\r\n- [Moreira et al, 2021](https://doi.org/10.3389/fevo.2021.632590) : Moreira, E. A., Persinoti, G. F., Menezes, L. R., Paixão, D. A. A., Alvarez, T. M., Cairo, J. P. L. F., Squina, F. M., Costa-Leonardo, A. M., Rodrigues, A., Sillam-Dussès, D., & Arab, A. (2021). Complementary contribution of Fungi and Bacteria to lignocellulose digestion in the food stored by a neotropical higher termite. *Frontiers in Ecology and Evolution*, 9(April), 1–12.\r\n- [Calusinska et al, 2020](https://doi.org/10.1038/s42003-020-1004-3) : Calusinska, M., Marynowska, M., Bertucci, M., Untereiner, B., Klimek, D., Goux, X., Sillam-Dussès, D., Gawron, P., Halder, R., Wilmes, P., Ferrer, P., Gerin, P., Roisin, Y., & Delfosse, P. (2020). Integrative omics analysis of the termite gut system adaptation to Miscanthus diet identifies lignocellulose degradation enzymes. *Communications Biology*, 3(1), 1–12.\r\n- [Orakov et al, 2021](https://doi.org/10.1186/s13059-021-02393-0) : Orakov, A., Fullam, A., Coelho, L. P., Khedkar, S., Szklarczyk, D., Mende, D. R., Schmidt, T. S. B., & Bork, P. (2021). 
GUNC: detection of chimerism and contamination in prokaryotic genomes. *Genome Biology*, 22(1).\r\n- [Parks et al, 2015](https://doi.org/10.1101/gr.186072.114) : Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. *Genome Research*, 25(7), 1043–1055.\r\n# License\r\nThis project is licensed under the CeCILL License - see the [LICENSE](https://github.com/Nachida08/SnakeMAGs/blob/main/LICENCE) file for details.\r\n\r\nDeveloped by Nachida Tadrent at the Insect Biology Research Institute ([IRBI](https://irbi.univ-tours.fr/)), under the supervision of Franck Dedeine and Vincent Hervé." . "SnakeMAGs_schema.jpg" . "Bioinformatics, Metagenomics, binning, MAG" . . "SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes" . "https://workflowhub.eu/projects/183" . "#snakemake" . "2023-08-02 11:41:06+00:00" . "https://about.workflowhub.eu/" . "https://workflowhub.eu/workflows/554?version=1" . "1" . . . . . . "13954"^^ . "https://api.rohub.org/api/resources/b60fea1e-3999-4edf-b020-64565bcbe305/download/" . . "2023-09-08 12:14:53.810850+00:00" . "2023-09-08 12:14:58.933443+00:00" . . "adapters.fa" . "2023-09-08 12:14:53.810850+00:00" . . . . "418"^^ . "https://api.rohub.org/api/resources/c6f9ce5e-7de9-49f7-948b-4c13cfbd663e/download/" . . "2023-09-08 12:14:53.800549+00:00" . "2023-09-08 12:14:55.555873+00:00" . . "BEDTOOLS.yaml" . "2023-09-08 12:14:53.800549+00:00" . . . . "2017"^^ . "https://api.rohub.org/api/resources/dc248e07-f7db-4afd-a2f9-6d677f924882/download/" . . "2023-09-08 12:14:53.802785+00:00" . "2023-09-08 12:14:56.620353+00:00" . . "CHECKM.yaml" . "2023-09-08 12:14:53.802785+00:00" . . . . "24254"^^ . "https://api.rohub.org/api/resources/f13b9a0d-00a5-4966-8d06-86106dcfbc2b/download/" . . "2023-09-08 12:14:53.798590+00:00" . "2023-09-08 12:14:55.137185+00:00" . 
"text/markdown" . . "README.md" . "2023-09-08 12:14:53.798590+00:00" . . . . "808"^^ . "https://api.rohub.org/api/resources/f6d59238-3ae4-4372-a5ad-2a1d32c86570/download/" . . "2023-09-08 12:14:53.807992+00:00" . "2023-09-08 12:14:58.380563+00:00" . . "SAMTOOLS.yaml" . "2023-09-08 12:14:53.807992+00:00" . . . . . . "spirochete" . . "3.473491773308958" . "5.7" . "biology" . . "5.364511691884457" . "3.9" . "lignocellulose" . . "3.9408866995073892" . "3.2" . "2009" . . "file" . . "5.049261083743842" . "4.1" . "atmospheric sciences" . . "100.0" . "0.3184818625450134" . "2022" . . "route" . . "4.448507007921999" . "7.3" . "Metagenomic tools" . . . . "2025-11-11T16:09:13.448+01:00"^^ . . . "Research Object Crate for SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes" . "RSA" . "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA4pPaESKwmC6l37P86K6TNLq6yeQtc7m9CvcqauLs/1FC0viHvQnFBgxj0a+loPDv/Egwe6OqFpa0iW9Ypnyz9YPoh+pxbRXonbuMOb+8Ry9hXZ+TEKfWjhjVDGEaClwfRwglh2HI/xfV4CD9AgvDOEoZQiyta8a90PYwJ3G6e70oCHTn61+OWTkI9KRYHOYgg3btdy2Z7q/30PTFawb2ZT5aIfIJYobUYv2a7yhtcqWCHZeKv0bxGnRjTFNx1rscBMlLJSzvRtpQc1cCRVEPFZHo1adaXCI9tGvn4cxeNQ96y8dxkN1XhpaJairde+23MDzf42Oe97KG2HYzKiyVnQIDAQAB" . "f45dWiPFSgsMl4FVMk/f1/ksvvc9Snpuw0mGeoFzjsDP9qN2fXp1WUEFex3tFQeTwCOBHIdvVK6B/5zxWpbzCZR+peSZAIMRPwrZUSTdoFjqvvAkRnWy1n3OsFkTixtKIlJLS86GGMEcpxPfodULBgo6FGE5+VgPmiCfbbDr+eY+A7Hk325SdLy44wND5vqB4BR7oRtWoGgrOW1X7luCw5+N0IPvnnXfMnaZWsloDt50c84I93G2b56obj/QBjoACRyXMzbL1L1Ez79hrB0Zyp11HZHJMwqvlKaqeGY8x1/05jx0zRrcwc9LxuxXB38cbfFY3V/26voL9i3a6QAVpw==" . . .