eltebioinformatics / gmt_files_for_mulea
GMT files for the mulea R package
Home Page: https://www.biorxiv.org/content/10.1101/2024.02.28.582444v1
Please change these URLs in the headers to the ones after the arrow, because the current values fail the muleaData checks:
www.ensembl.org -> https://www.ensembl.org
www.yeastract.com -> http://www.yeastract.com
(the second one is http, not https!)
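If the headers are patched with a script, a minimal Python sketch could look like this. It assumes each URL sits on a header line of the form "# source_url: <url>", as in the example header further down; URL_FIXES and fix_source_url are hypothetical names, not part of mulea or muleaData.

```python
import re

# Replacements requested above; yeastract deliberately stays on plain http.
URL_FIXES = {
    "www.ensembl.org": "https://www.ensembl.org",
    "www.yeastract.com": "http://www.yeastract.com",
}

# Assumed header layout: "# source_url: <url>" (lines passed without "\n").
SOURCE_URL_RE = re.compile(r"^(# source_url:\s*)(\S+)\s*$")


def fix_source_url(lines):
    """Return a copy of the header lines with bare hosts replaced by full URLs."""
    fixed = []
    for line in lines:
        match = SOURCE_URL_RE.match(line)
        if match and match.group(2) in URL_FIXES:
            line = match.group(1) + URL_FIXES[match.group(2)]
        fixed.append(line)
    return fixed


print(fix_source_url(["# ID_type: EntrezID", "# source_url: www.ensembl.org"]))
# ['# ID_type: EntrezID', '# source_url: https://www.ensembl.org']
```

Lines that already carry a full URL, and all other header lines, pass through unchanged.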
There is a GMT file called "Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt 2" in addition to the one without the " 2" suffix.
The contents of the 2 files differ.
Please delete one of them and make sure the remaining file has the .gmt extension.
diff Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt\ 2 Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt
57c57
< ANAC012 ANAC012 AT1G16490 AT1G17950 AT1G62990 AT1G66230 AT1G73410 AT1G79180 AT4G12350 AT5G12870 AT5G16600 AT5G56110
---
> ANAC012 ANAC012 AT1G16490 AT1G17950 AT1G62990 AT1G66230 AT1G73410 AT1G79180 AT4G12350 AT4G22680 AT4G33450 AT5G12870 AT5G16600 AT5G56110
62,63c62,63
< BZIP60 BZIP60 AT1G09080 AT2G31955 AT5G28540 AT5G42020
< ABF2 ABF2 AT1G77120 AT2G18050 AT5G20830 AT5G52300 AT5G52310
---
> BZIP60 BZIP60 AT1G09080 AT2G31955 AT5G20990 AT5G28540 AT5G42020
> ABF2 ABF2 AT1G77120 AT5G20830 AT5G52300 AT5G52310
108c108
< AMS AMS AT1G13140 AT1G59740 AT1G66850 AT1G67990 AT1G73220 AT1G75790 AT1G75920 AT3G13220 AT3G28740 AT3G51590 AT4G00040 AT5G17050 AT5G49070
---
> AMS AMS AT1G13140 AT1G59740 AT1G66850 AT1G67990 AT1G73220 AT1G75920 AT3G13220 AT3G28740 AT3G51590 AT4G00040 AT5G17050 AT5G49070
148c148
< EMB2301 EMB2301 AT1G16490 AT1G62990 AT1G79180 AT5G12870 AT5G56110
---
> EMB2301 EMB2301 AT1G16490 AT1G62990 AT1G79180 AT4G33450 AT5G12870 AT5G56110
267c267
< DREB2A DREB2A AT1G01470 AT1G52690 AT2G41190 AT2G42540 AT3G12580 AT3G17520 AT3G50970 AT4G33720 AT5G52300 AT5G52310
---
> DREB2A DREB2A AT1G01470 AT1G52690 AT2G41190 AT2G42540 AT3G12580 AT3G17520 AT3G50970 AT5G52300 AT5G52310
273c273
< WRKY26 WRKY26 AT1G63650 AT5G60890
---
> WRKY26 WRKY26 AT1G63650 AT3G55730 AT5G60890
282c282
< MYB46 MYB46 AT1G16490 AT1G62990 AT1G79180
---
> MYB46 MYB46 AT1G16490 AT1G62990 AT1G79180 AT4G22680
296c296
< ARF7 ARF7 AT1G04240 AT1G19220 AT2G42430 AT2G45420 AT3G15540 AT3G20840 AT3G50340 AT3G58190 AT4G14550 AT4G14560 AT4G37390 AT4G37650
---
> ARF7 ARF7 AT1G04240 AT1G19220 AT2G42430 AT2G45420 AT3G15540 AT3G20840 AT3G58190 AT4G14550 AT4G14560 AT4G37390 AT4G37650
315c315
< ERF2 ERF2 AT1G06160 AT1G72260 AT5G44420
---
> ERF2 ERF2 AT1G06160 AT1G72260 AT3G55730 AT5G44420
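A line-based diff only shows that whole lines differ, so a per-gene-set comparison may make it easier to decide which copy to keep. The sketch below uses hypothetical helper names and assumes standard tab-separated GMT lines (set name, description, then gene IDs) with "#"-prefixed header lines; if a set name occurs twice in one file, the later line wins.

```python
def read_gmt_entries(lines):
    """Parse GMT lines into {set_name: set_of_gene_ids}, skipping '#' headers and blanks."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        entries[fields[0]] = {gene for gene in fields[2:] if gene}
    return entries


def diff_gmt(lines_a, lines_b):
    """Return {set_name: (genes_only_in_a, genes_only_in_b)} for sets that differ."""
    a, b = read_gmt_entries(lines_a), read_gmt_entries(lines_b)
    report = {}
    for name in sorted(a.keys() | b.keys()):
        only_a = sorted(a.get(name, set()) - b.get(name, set()))
        only_b = sorted(b.get(name, set()) - a.get(name, set()))
        if only_a or only_b:
            report[name] = (only_a, only_b)
    return report


# Usage with the WRKY26 and ABF2 lines from the diff above (tab-separated here).
file_2 = ["WRKY26\tWRKY26\tAT1G63650\tAT5G60890",
          "ABF2\tABF2\tAT1G77120\tAT2G18050\tAT5G20830\tAT5G52300\tAT5G52310"]
file_1 = ["WRKY26\tWRKY26\tAT1G63650\tAT3G55730\tAT5G60890",
          "ABF2\tABF2\tAT1G77120\tAT5G20830\tAT5G52300\tAT5G52310"]
print(diff_gmt(file_2, file_1))
# {'ABF2': (['AT2G18050'], []), 'WRKY26': ([], ['AT3G55730'])}
```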
Can you delete the .DS_Store files from the GMT folders and the __MACOSX folder from the zip archive?
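If the archive is rebuilt in Python, a sketch like the following could copy every member except macOS Finder metadata. It uses only the standard zipfile module; clean_zip and is_macos_junk are hypothetical helper names, and the in-memory archive at the end is only an illustration.

```python
import io
import zipfile


def is_macos_junk(name):
    """True for Finder metadata: any .DS_Store file or anything under __MACOSX/."""
    parts = name.rstrip("/").split("/")
    return parts[0] == "__MACOSX" or parts[-1] == ".DS_Store"


def clean_zip(src, dst):
    """Copy all members of src into a new archive dst, skipping macOS junk.

    src and dst may be paths or binary file objects; returns the kept member names.
    """
    kept = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if is_macos_junk(info.filename):
                continue
            # Reusing the original ZipInfo keeps names, timestamps and compression type.
            zout.writestr(info, zin.read(info.filename))
            kept.append(info.filename)
    return kept


# Usage sketch with an in-memory archive.
src, dst = io.BytesIO(), io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("KEGG/a.gmt", "set1\tdesc\tg1\tg2\n")
    zf.writestr("KEGG/.DS_Store", b"")
    zf.writestr("__MACOSX/KEGG/._a.gmt", b"")
kept = clean_zip(src, dst)
print(kept)  # ['KEGG/a.gmt']
```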
Please delete GMT files with fewer than 5 entries (lines), e.g.:
Also delete GMT files in which every entry contains only a single gene (allowing for 1 or 2 exceptions), e.g.:
Please update the 3rd and 4th sheets of the "mulea Supplementary Table 1" accordingly.
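The two deletion rules can be checked mechanically. This is a sketch of one reading of them: drop a file with fewer than 5 entries, or one in which at most 2 entries contain more than one gene. The threshold constants and function names are hypothetical, and GMT lines are assumed to be tab-separated with "#" headers.

```python
# Hypothetical thresholds mirroring the request above.
MIN_ENTRIES = 5             # files with fewer entries are dropped
MAX_MULTI_GENE_ENTRIES = 2  # "except for 1 or 2 entries"


def entry_sizes(lines):
    """Return the number of gene IDs in each non-header entry of a GMT file."""
    sizes = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        genes = [gene for gene in line.split("\t")[2:] if gene]
        sizes.append(len(genes))
    return sizes


def should_delete(lines):
    """True if the file is too small or nearly all of its entries are single genes."""
    sizes = entry_sizes(lines)
    if len(sizes) < MIN_ENTRIES:
        return True
    multi_gene = sum(1 for size in sizes if size > 1)
    return multi_gene <= MAX_MULTI_GENE_ENTRIES
```

Empty gene fields (for example from a trailing tab) are ignored, so they do not inflate an entry's size.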
@olbeimarton can you add the script to download KEGG data to the scripts_to_create_GMT_files/KEGG folder?
Please create 2 main folders:
The headers of the miRTarBase GMTs contain 2 gmt_download_date rows with 2 different dates.
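Duplicated header keys like this can be found across all files with a short check. The sketch assumes mulea-style header lines of the form "# key: value"; duplicated_header_keys is a hypothetical name and the dates in the usage example are made up.

```python
from collections import Counter


def duplicated_header_keys(lines):
    """Return header keys that appear more than once, e.g. 'gmt_download_date'.

    Assumes header lines of the form '# key: value'; lines without a colon are ignored.
    """
    keys = [line[2:].split(":", 1)[0].strip()
            for line in lines
            if line.startswith("# ") and ":" in line]
    return sorted(key for key, count in Counter(keys).items() if count > 1)


header = [
    "# Gene set GMT file for mulea Bioconductor R package",
    "# gmt_download_date: 01-01-2023",  # made-up dates for illustration
    "# source_url: https://www.ensembl.org",
    "# gmt_download_date: 01-02-2023",
]
print(duplicated_header_keys(header))  # ['gmt_download_date']
```

Splitting on the first colon only keeps values such as URLs from being misread as keys.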
muleaData/GO_MF_Caenorhabditis_elegans_EntrezID 3.rds
muleaData/GO_MF_Caenorhabditis_elegans_EntrezID.rds
There are empty GMT files, e.g.:
Please check why these are empty: is it a mapping error, or are the IDs missing? If the IDs cannot be mapped, please delete the empty GMTs. If they can be mapped, please remap them and check the result.
Please update the 4th sheet of the "mulea Supplementary Table 1" accordingly.
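Before deciding between remapping and deletion, it may help to list every affected file automatically. In this sketch a line counts as an entry only if it is not a "#" header and has a name, a description and at least one gene ID, so stray "NA" placeholder lines do not count. The function names are hypothetical, and files are assumed to use a lowercase .gmt extension.

```python
from pathlib import Path


def is_gmt_entry(line):
    """A GMT entry needs a name, a description and at least one gene ID (tab-separated)."""
    if line.startswith("#"):
        return False
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 3 and any(fields[2:])


def is_empty_gmt(lines):
    """True if no line of the file is a real GMT entry (headers only, or blank)."""
    return not any(is_gmt_entry(line) for line in lines)


def find_empty_gmts(root):
    """Return the paths of all *.gmt files under root that contain no entries."""
    empty = []
    for path in sorted(Path(root).rglob("*.gmt")):
        with open(path, encoding="utf-8") as handle:
            if is_empty_gmt(handle):
                empty.append(str(path))
    return empty
```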
For example:
# Gene set GMT file for mulea Bioconductor R package
NA
NA
# ID_type: EntrezID
# source_url: www.ensembl.org
# source_PMID: 34791404
# source_primary_ID: EnsemblID / LocusID
# source_version: 109
# source_last_update: 2023
# gmt_download_date: 02-12-2022
# gmt_version: 1
# gmt_entry_names: chromosome location
# chromosome location chromosome location Genes
NA
NA
NA
NA
The first 2 NAs should be:
# taxon_name: Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
# taxonomy_ID: 99287
The remaining NAs should be deleted.
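The fix can be applied per file. This sketch assumes the placeholders are lines consisting of the literal text "NA" (valid GMT entries are tab-separated and never equal "NA"); fix_na_placeholders is a hypothetical name, and the taxon values must still be supplied per organism.

```python
def fix_na_placeholders(lines, taxon_name, taxonomy_id):
    """Fill the first two 'NA' lines with taxon headers and drop any later 'NA' lines."""
    replacements = [f"# taxon_name: {taxon_name}", f"# taxonomy_ID: {taxonomy_id}"]
    fixed = []
    for line in lines:
        if line.strip() == "NA":
            if replacements:
                fixed.append(replacements.pop(0))
            continue  # placeholders beyond the first two are dropped
        fixed.append(line)
    return fixed


header = ["# Gene set GMT file for mulea Bioconductor R package", "NA", "NA",
          "# ID_type: EntrezID", "# gmt_version: 1", "NA", "NA"]
for line in fix_na_placeholders(
        header, "Salmonella enterica subsp. enterica serovar Typhimurium str. LT2", 99287):
    print(line)
```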
The folder is called Bacteroides_thetaiotaomicron_VPI_5482_226186,
while the GMT headers contain:
# taxon_name: Bacteroides thetaiotaomicron
# taxonomy_ID: 818
Please rewrite the headers like this:
# taxon_name: Bacteroides thetaiotaomicron VPI-5482
# taxonomy_ID: 226186
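A sketch of how the headers could be rewritten so they match the folder name. It assumes folder names end in "_<taxonomy_ID>"; the strain name itself is typed by hand, since "VPI_5482" to "VPI-5482" cannot be derived reliably. Both helper names are hypothetical.

```python
def taxonomy_id_from_folder(folder_name):
    """Assumes folder names end in '_<taxonomy_ID>', e.g. '..._VPI_5482_226186'."""
    return folder_name.rsplit("_", 1)[1]


def set_header_values(lines, updates):
    """Rewrite '# key: value' header lines whose key appears in updates."""
    fixed = []
    for line in lines:
        if line.startswith("# ") and ":" in line:
            key = line[2:].split(":", 1)[0].strip()
            if key in updates:
                line = f"# {key}: {updates[key]}"
        fixed.append(line)
    return fixed


folder = "Bacteroides_thetaiotaomicron_VPI_5482_226186"
header = ["# taxon_name: Bacteroides thetaiotaomicron", "# taxonomy_ID: 818"]
updates = {
    # Strain name entered by hand; the ID is taken from the folder name.
    "taxon_name": "Bacteroides thetaiotaomicron VPI-5482",
    "taxonomy_ID": taxonomy_id_from_folder(folder),
}
print(set_header_values(header, updates))
# ['# taxon_name: Bacteroides thetaiotaomicron VPI-5482', '# taxonomy_ID: 226186']
```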
In the Homo_sapiens_9606 folder, TFLink_ALL_LS_human.zip cannot be unzipped. The command
unzip TFLink_ALL_LS_human.zip
gives the following error message:
Archive: TFLink_ALL_LS_human.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of TFLink_ALL_LS_human.zip or
TFLink_ALL_LS_human.zip.zip, and cannot find TFLink_ALL_LS_human.zip.ZIP, period.
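Because this error means the end-of-central-directory record is missing, it may be worth checking every archive in the repository before release. The sketch below relies on the standard zipfile functions is_zipfile and ZipFile.testzip; archive_problem is a hypothetical helper, and the truncated in-memory archive only imitates the reported failure.

```python
import io
import zipfile
import zlib


def archive_problem(source):
    """Return None if source is a readable zip archive, otherwise a short description.

    source may be a path or a binary file object.
    """
    if not zipfile.is_zipfile(source):
        return "no end-of-central-directory record (truncated or not a zip file)"
    try:
        with zipfile.ZipFile(source) as zf:
            bad_member = zf.testzip()  # reads every member and checks its CRC
    except (zipfile.BadZipFile, OSError, zlib.error) as exc:
        return f"unreadable archive: {exc}"
    if bad_member is not None:
        return f"CRC mismatch in member {bad_member}"
    return None


# Usage sketch: a valid in-memory archive and a truncated copy of it.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("TFLink.tsv", "a\tb\n")
truncated = io.BytesIO(good.getvalue()[:30])
print(archive_problem(good))       # None
print(archive_problem(truncated))  # no end-of-central-directory record (...)
```

On the command line, unzip -t TFLink_ALL_LS_human.zip runs a comparable integrity test without extracting anything.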