yongxinliu / easymetagenome

Easy Metagenome Pipeline

License: GNU General Public License v3.0

Languages: HTML 95.24%, Roff 3.35%, Shell 1.10%, TeX 0.16%, R 0.15%

easymetagenome's Introduction

EasyMetagenome: the easy-to-use metagenome analysis pipeline

Version: v1.21

Updated: 2024/5/17

Figure 1. The EasyMetagenome pipeline for analyzing metagenomic sequencing data, from raw data to taxonomic and functional composition tables.

File introduction

0Install.sh: Software and database installation
1Pipeline.sh: Analysis pipeline
2StatPlot.sh: Statistics and visualization

The Shell scripts (.sh) are compatible with the Markdown format and can be viewed as Markdown in tools such as Youdao Cloud Notes or VSCode, where the outline navigation makes them easier to browse and read.

The appendix of each document lists frequently asked questions for reference.

Instructions

On a 64-bit Linux system, such as Ubuntu 20.04+ / CentOS 7.7+, run the scripts step by step in the order 0, 1, 2.

Use them in a terminal command line or in RStudio's Terminal pane. Youdao Cloud Notes can display the code outline, which is convenient for previewing the structure.

Citation

If you use this script, please cite:

Yong-Xin Liu, Lei Chen, Tengfei Ma, Xiaofang Li, Maosheng Zheng, Xin Zhou, Liang Chen, Xubo Qian, Jiao Xi, Hongye Lu, Huiluo Cao, Xiaoya Ma, Bian Bian, Pengfan Zhang, Jiqiu Wu, Ren-You Gan, Baolei Jia, Linyang Sun, Zhicheng Ju, Yunyun Gao, Tao Wen, Tong Chen. 2023. EasyAmplicon: An easy-to-use, open-source, reproducible, and community-based pipeline for amplicon data analysis in microbiome research. iMeta 2: e83. https://doi.org/10.1002/imt2.83

Yong-Xin Liu, Yuan Qin, Tong Chen, Meiping Lu, Xubo Qian, Xiaoxuan Guo, Yang Bai. 2021. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein & Cell 12: 315-330. https://doi.org/10.1007/s13238-020-00724-8 (Highly Cited)

easymetagenome's Issues

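Question about using MEGAHIT

Hello!
In your code, MEGAHIT assembly is always performed as a co-assembly of all samples. In practice, however, with a large number of samples the machine may not have enough memory for co-assembly, so each sample has to be assembled separately. How should the downstream steps (redundancy removal, gene quantification, functional annotation, etc.) be carried out in that case? After assembling each sample separately, should the contigs from all samples be concatenated with cat into one large contig file before redundancy removal and the later steps? I hope you can answer my question. Thank you very much!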
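
One practical detail of the pooling idea raised above is that per-sample assemblies reuse the same contig IDs, so the IDs need to be made unique before concatenation. The following is a minimal sketch on mock data, not the pipeline's confirmed method: the directory layout, the sample names `C1` and `C2`, and the output name `all_contigs.fa` are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Mock per-sample MEGAHIT outputs (hypothetical layout, made-up sequences)
work=$(mktemp -d)
for s in C1 C2; do
  mkdir -p "$work/megahit/$s"
  printf '>k141_1 flag=1 multi=2.0000 len=8\nACGTACGT\n>k141_2 flag=1 multi=1.0000 len=4\nGGCC\n' \
    > "$work/megahit/$s/final.contigs.fa"
done

# Prefix every contig ID with its sample name, then pool all samples
pooled="$work/all_contigs.fa"
: > "$pooled"
for s in C1 C2; do
  awk -v s="$s" '/^>/ { sub(/^>/, ">" s "_") } { print }' \
    "$work/megahit/$s/final.contigs.fa" >> "$pooled"
done

# Report the number of pooled contigs and list any duplicated IDs (none expected)
grep -c '^>' "$pooled"
grep '^>' "$pooled" | cut -d' ' -f1 | sort | uniq -d
```

Redundancy removal, gene prediction and quantification would then run on the pooled file, with the sample prefix keeping track of where each contig came from.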

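Where is the Script directory?

Dr. Liu, the pipeline mentions a Script directory, but I cannot find it in the project files. Is this directory private? How can I obtain it? Thank you!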

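Bracken abundance estimates: result files have different numbers of rows across samples

Following the code in 1.Pipeline, I re-estimated the abundances of C1 and C2 with Bracken, but when merging the result files I found that they have different numbers of rows. The Kraken2 results, by contrast, have the same number of rows and can be merged directly with the code in 1.Pipeline. The Bracken command is below, run with the parameters -r 100 -l S.
bracken -d ${kraken2_db} -i $work_dir/02_Kraken2/${Sample}.report \
  -r ${len} -l ${tax} -o $work_dir/02_Kraken2/${Sample}.bracken \
  -w $work_dir/02_Kraken2/${Sample}.bracken.report
What causes this? Is there a problem with results like these?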
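
When per-sample tables list different sets of taxa, they can be merged by taxon name instead of being pasted line by line, filling absent taxa with 0. The sketch below does this with awk on mock files. The column order follows Bracken's standard output table (column 6 is `new_est_reads`); the file names, species and counts are made up, and this is an illustration rather than the pipeline's own merge step.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
# Mock Bracken outputs with different row counts (values are made up)
hdr='name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\tadded_reads\tnew_est_reads\tfraction_total_reads\n'
printf "${hdr}Escherichia coli\t562\tS\t90\t10\t100\t0.5\nBacillus subtilis\t1423\tS\t80\t20\t100\t0.5\n" > "$work/C1.bracken"
printf "${hdr}Escherichia coli\t562\tS\t45\t5\t50\t1.0\n" > "$work/C2.bracken"

# Join on taxon name (column 1), keep new_est_reads (column 6), fill gaps with 0
awk -F'\t' -v OFS='\t' '
  FNR == 1 {                      # header line of each file: register the sample
    f = FILENAME; sub(/.*\//, "", f); sub(/\.bracken$/, "", f)
    sample[++n] = f; next
  }
  {
    if (!($1 in seen)) { seen[$1] = 1; order[++m] = $1 }
    count[$1, n] = $6
  }
  END {
    line = "name"
    for (j = 1; j <= n; j++) line = line OFS sample[j]
    print line
    for (i = 1; i <= m; i++) {
      line = order[i]
      for (j = 1; j <= n; j++)
        line = line OFS (((order[i], j) in count) ? count[order[i], j] : 0)
      print line
    }
  }' "$work/C1.bracken" "$work/C2.bracken" > "$work/bracken.merged.tsv"

cat "$work/bracken.merged.tsv"
```

Taxa are written in the order they are first seen, so the merged table has one row per taxon found in any sample.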

Help with running humann3 in parallel

Hello, humann3 runs normally on a single sample, but it reports an error when I run multiple tasks with rush. How can I solve this?

If the server is powerful, set --threads to 8/16/32.

tail -n+2 result/metadata.txt | cut -f1 | rush -j 2 \
  "humann --input temp/concat/{1}.fq \
  --output temp/humann3/ --threads 3 --metaphlan-options '--bowtie2db /db/metaphlan4 --index mpa_vOct22_CHOCOPhlAnSGB_202212 --offline'"
The error is as follows:
(humann3) xumingyuan@cn5:meta$tail -n+2 result/metadata.txt|cut -f1|rush -j 2 \

"humann --input temp/concat/${i}.fq --output temp/humann3/ --threads 32 -- metaphlan-options '--bowtie2db /home/xumingyuan/data_HD/database/metaphlan4 --index mpa_vOct22_CHOCOPhlAnSGB_202212 --offline'"
/bin/bash: line 1: humann: command not found
/bin/bash: line 1: humann: command not found
[ERRO]
I then supplied the full path to humann:
tail -n+2 result/metadata.txt|cut -f1|rush -j 2 \
"/home/xumingyuan/data_HD/micromamba/envs/humann3/bin/humann --input temp/concat/${i}.fq --output temp/humann3/ --threads 32 -- metaphlan-options '--bowtie2db /home/xumingyuan/data_HD/database/metaphlan4 --index mpa_vOct22_CHOCOPhlAnSGB_202212 --offline'"
humann: error: unrecognized arguments: -- metaphlan-options --bowtie2db /home/xumingyuan/data_HD/database/metaphlan4 --index mpa_vOct22_CHOCOPhlAnSGB_202212 --offline
[ERRO]
Thank you very much.
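
The "unrecognized arguments" message above is consistent with the space in `-- metaphlan-options`, which splits the option into two tokens. Separately, `${i}` inside double quotes is expanded by the calling shell before rush sees it, unlike the `{1}` placeholder in the original pipeline command. One way to catch both is a dry run that prints the per-sample commands for inspection. This sketch does not call humann or rush; the metadata contents and the `mpa_db` path are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
# Mock metadata: header line plus one sample ID per row in column 1 (made up)
printf 'SampleID\tGroup\nC1\tA\nN1\tB\n' > "$work/metadata.txt"

# Hypothetical database location; replace with your own path
mpa_db=/db/metaphlan4

# Dry run: print each command instead of executing it.
# "--metaphlan-options" is written as a single token (no space after "--").
tail -n+2 "$work/metadata.txt" | cut -f1 | while read -r id; do
  printf '%s\n' "humann --input temp/concat/${id}.fq --output temp/humann3/ --threads 3 --metaphlan-options '--bowtie2db ${mpa_db} --index mpa_vOct22_CHOCOPhlAnSGB_202212 --offline'"
done > "$work/commands.txt"

cat "$work/commands.txt"
```

Once the printed commands look right, the same command string can go back into rush with `{1}` in place of `${id}`.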

snakemake

Hello, could you provide a Snakemake version that can run on a cluster?

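Location of the db files

Hello! Where can I find the *EasyMicrobiome/dbcan2/CAZyDB.08062022 database? I could not find it.
Would a database downloaded with the following steps be the same?

Visit the CAZy website: https://www.cazy.org/;
Click the "Download" button at the top of the page to open the download page;
On the download page, find the CAZy Database entry and click the Download link on the right;
On the CAZy Database download page, choose the file that matches your system, for example cazy_2022-08-06.tar.gz for Linux;
Extract the file: tar zxvf cazy_2022-08-06.tar.gz;
Move the extracted files to wherever you need to use them.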

Issue regarding the output of lefse-plot_features.py from LEfSe

Hi,
I'm new here and found your project interesting and helpful!

But I'm a bit confused about the LEfSe output for a single feature produced by lefse-plot_features.py. There is no subclass (SampleID) annotation under the figure it yields, unlike in your own example result; there is only the group classification above the bars.

Is it because the input file doesn't include the subclass (SampleID) information? If so, how should I add that information to the input file? With format2lefse? I tried to modify the format2lefse Rscript to do this but failed. Can you help me with this?

How to select reads to a fixed level

Hi Dr. Liu,

I am unsure how to select a fixed number of reads across all samples in the metagenomic analysis. Is there any software or script suitable for the humann2 pipeline?

Thank you!
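
As a rough illustration of what subsampling to a fixed depth involves, the sketch below uses only GNU coreutils: each four-line FASTQ record is collapsed onto one line, `shuf -n` draws a fixed number of records, and the records are expanded again. The file names and the target depth `n` are hypothetical and the reads are mock data. It handles a single FASTQ file only; paired-end files would need both mates drawn in the same order, which dedicated tools such as seqkit or seqtk are commonly used for.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
# Mock single-end FASTQ with 5 reads (made-up data)
for i in 1 2 3 4 5; do
  printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > "$work/sample.fq"

n=3  # hypothetical target number of reads per sample

# One record per line -> random draw of n records -> back to 4-line records
paste - - - - < "$work/sample.fq" | shuf -n "$n" | tr '\t' '\n' > "$work/sample.sub.fq"

echo "$(( $(wc -l < "$work/sample.sub.fq") / 4 )) reads kept"
```

Applying the same `n` to every sample gives all samples the same depth, provided each sample has at least `n` reads.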

Data cannot be downloaded

I cannot access http://210.75.224.110/, so the data cannot be downloaded (for example http://210.75.224.110/share/Resfams-proteins.dmnd).