microgenomics / tutorials

In this repo you will find demos and tutorials prepared by members of the CBIB. Feel free to use them for non-commercial activities, provided that you give proper credit. Have fun!

License: MIT License

Python 100.00%

tutorials's People

enzoandree snashraf palc fw1121 wanjinchang tomrconnor abremges mostafaya pingpi357 russellj7 inambioinfoku training-resources asmmhossain nfellaby avrajit hshcao roman-g1 francescafanelli80 rpucheq jingmingxia herricjb hyinli wangdi2014 kartikafauzia pxhhappy jasonzhao0307 taojianchang golden75 parisache tauqeer9 menickname yiyanyang0728 ryotag stephenda thiduyendo marietteek rajukoorakula markk86 advbactres zagrosman ariasamin daraghhill kfwins2022 slipa17 mattoslmp

tutorials's Issues

Interpretation of roary output

Hello All,
I would be grateful if you could help me interpret the attached image, which is an output of your Roary tutorial.

In particular, I am unsure about the number of genomes and the corresponding number of genes. If the value is, say, 2 genomes, which genomes does it refer to? Should I assume it is an average?

Pangenome sequence analysis

Hello,

I started working with the DBGenerator.py script a few days ago. It's a great help, and I got it to work with Python 2.7 and with genomes that still have their version numbers attached (e.g. NC_008253.1). It worked well on my example data, but not on the whole dataset.

The problem appears to be duplicated genes in the gene_presence_absence.csv table. In that table, a sample can have multiple gene IDs for one gene, separated by tabs. In genomas_locus.csv, I then get multiple entries as well, like this:
NZ_CP027766.1|['NZ_CP027766.1_00163', 'NZ_CP027766.1_00164']
NZ_CP027766.1|['NZ_CP027766.1_00163', 'NZ_CP027766.1_00164']

I am not sure how to proceed with this. Did this happen in your analysis as well? You have the

else:
    print(locus)
    raise

lines in the get_locus_sequence() function, so maybe you were looking at this already?

Thanks!
@LilithElina
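
One way to handle cells that hold several locus IDs is to split each cell on tabs, so that every locus ends up on its own row. The sketch below is only an illustration and is not taken from DBGenerator.py: the function name `iter_genome_loci` is hypothetical, and the default `n_meta_cols=14` (the number of annotation columns that precede the per-genome columns) is an assumption to check against your own gene_presence_absence.csv.

```python
import csv
import io


def iter_genome_loci(handle, n_meta_cols=14):
    """Yield (genome, locus) pairs, one per locus ID.

    Assumes the first `n_meta_cols` columns are annotation columns and
    every remaining column is one genome (check this against your file).
    """
    reader = csv.reader(handle)
    header = next(reader)
    genomes = header[n_meta_cols:]
    seen = set()
    for row in reader:
        for genome, cell in zip(genomes, row[n_meta_cols:]):
            # A cell may hold several locus IDs separated by tabs.
            for locus in cell.split("\t"):
                locus = locus.strip()
                if locus and (genome, locus) not in seen:
                    seen.add((genome, locus))
                    yield genome, locus


# Tiny made-up table with two annotation columns, for illustration only.
demo = io.StringIO(
    '"Gene","Annotation","NZ_CP027766.1"\n'
    '"geneA","hypothetical protein","NZ_CP027766.1_00163\tNZ_CP027766.1_00164"\n'
)
pairs = list(iter_genome_loci(demo, n_meta_cols=2))
for genome, locus in pairs:
    print(f"{genome}|{locus}")
```

Each pair is yielded only once, so repeated rows like the two shown above would collapse into one row per locus.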

Possible modifications or aids for the user

Hi.
I really appreciate the tutorial. I worked through it and noticed a few corrections and missing details that would improve it. I hope this is not too obvious.

In the "Determining the Pangenome" and "roary_plots" sections, you show a core alignment tree, but the program only provides the accessory tree (at least for me), so you need to run "FastTree -nt -gtr core_gene_alignment.aln > core_alignment.newick" to obtain and plot it.
Also, in the "Pangenome sequence analysis" section, the Python script is named DBGenerator.py, but the command line above it says ...GeneratorDB.py... This causes an error when running the script, and some people may not notice it.

Thanks for all.
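
The FastTree step mentioned above can also be scripted. This is a sketch under assumptions: FastTree is installed and on the PATH, the alignment file exists in the working directory, and the helper names `fasttree_command` and `build_core_tree` are hypothetical.

```python
import shutil
import subprocess


def fasttree_command(alignment="core_gene_alignment.aln"):
    """Return the FastTree call for a nucleotide alignment with the GTR model."""
    return ["FastTree", "-nt", "-gtr", alignment]


def build_core_tree(alignment="core_gene_alignment.aln",
                    tree="core_alignment.newick"):
    """Run FastTree and write its stdout (the Newick tree) to `tree`."""
    if shutil.which("FastTree") is None:
        raise FileNotFoundError("FastTree was not found on PATH")
    with open(tree, "w") as out:
        subprocess.run(fasttree_command(alignment), stdout=out, check=True)
    return tree


# Show the command without running it.
print(" ".join(fasttree_command()))
```

Calling `build_core_tree()` from the Roary output directory would write core_alignment.newick next to the alignment.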

About sqlite3

Hi,
I'm sorry to bother you. I am having trouble with the database: I have created it, but I can't select any results from it.
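
When a SELECT returns nothing, a useful first check is whether the tables exist and contain rows. The sketch below uses Python's built-in sqlite3 module on an in-memory database. The table name `pangenoma` and its columns are placeholders and may not match the schema used in the tutorial; `table_row_counts` is a hypothetical helper.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )]
    # Table names come from sqlite_master itself, so quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Placeholder schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pangenoma (gene TEXT, annotation TEXT)")
conn.execute("INSERT INTO pangenoma VALUES ('geneA', 'hypothetical protein')")
conn.commit()

counts = table_row_counts(conn)
print(counts)
rows = conn.execute("SELECT gene, annotation FROM pangenoma").fetchall()
print(rows)
```

To inspect a database file on disk, pass its path to `sqlite3.connect` instead of `":memory:"`. A count of 0 for a table suggests the import step did not load any rows.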

Pangenome sequence analysis using the python script

Has anyone ever had a problem with the Python script that adds the sequences from Prokka's .ffn files to the Roary analysis results?
I've used the script from: https://github.com/EnzoAndree/tutorials/blob/patch-1/DBGenerator.py
The first three parts of the script work fine and produce three .csv files. But "get_locus_sequence" does not, and it produces an empty table. I have tried with different numbers of sequences and with the demo set, without any luck.
Does anyone know what might be wrong?
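
An empty sequence table can result when the IDs in the .ffn headers do not match the locus IDs taken from Roary's table. That may or may not be the cause here; the sketch below only shows one way to test for it. It assumes the sequence ID is the first whitespace-delimited token after `>`, and `ffn_ids` and `missing_loci` are hypothetical helper names.

```python
import io


def ffn_ids(handle):
    """Collect sequence IDs from FASTA headers (first token after '>')."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_loci(expected, handle):
    """Return the expected locus IDs that have no matching FASTA header."""
    return sorted(set(expected) - ffn_ids(handle))


# Made-up records, for illustration only.
demo_ffn = io.StringIO(
    ">ABC_00001 hypothetical protein\nATGAAA\n"
    ">ABC_00002 DNA gyrase subunit B\nATGCCC\n"
)
missing = missing_loci(["ABC_00001", "ABC_00003"], demo_ffn)
print(missing)
```

If every expected locus turns up as missing, the two ID formats probably differ, for example in a prefix or version suffix.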

Disk quota exceeded when running Roary

Hi,

I was following this tutorial and ran into problems when running Roary on the .gff files produced with Prokka.
Here's the message:

Warning: unable to close filehandle $bed_fh properly: Disk quota exceeded at /home/nickolas/anaconda3/lib/site_perl/5.26.2/Bio/Roary/BedFromGFFRole.pm line 41.
sh: line 1: 22825 Aborted

The demo directory is created, but every file in it is empty.
I installed all the dependencies and checked my disk quota with the quota command.

Can you help me? Thanks in advance,

Nicolas
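
Before rerunning, it can help to confirm how much space is left on the filesystem you run Roary from. The sketch uses `shutil.disk_usage`, which reports free space for the whole filesystem and not a per-user quota, so a quota limit can still be hit even when this number looks fine. `free_gib` is a hypothetical helper.

```python
import shutil


def free_gib(path="."):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


print(f"Free space in current directory: {free_gib():.2f} GiB")
```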

unable to run DBGenerator

NameError: lista is not defined

DBGenerator.py

Hi there,

I'm following your very useful pipeline to analyse 43 bacterial samples.
But when running the DBGenerator.py script, I get the four output files with information for only 2 of the 43 samples!
The script, the gene_presence_absence.csv file and the ffn folder containing all the .ffn files from Prokka are in the same folder.
This is the command with its output:

python3 DBGenerator.py ffn
Starting get_genomas_locus
Starting get_pangenoma
Starting get_pangenoma_locus
END get_pangenoma
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "DBGenerator.py", line 24, in get_genomas_locus
    for locus in loci: # separalos!
NameError: name 'loci' is not defined
Starting get_locus_sequence
END get_pangenoma_locus
END get_locus_sequence

(I'm on a server running CentOS 7).

Suggestions?

Edit.
Solved thanks to the other closed issues.
Please update the main code in the front page: https://github.com/microgenomics/tutorials/blob/master/DBGenerator.py with the one in the zip file here: #2 (comment)
