eltebioinformatics / gmt_files_for_mulea

GMT files for the mulea R package

Home Page: https://www.biorxiv.org/content/10.1101/2024.02.28.582444v1

Python 76.81% R 3.41% Shell 19.78%
gene-set-enrichment gene-sets ontologies

gmt_files_for_mulea's Issues

Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt 2

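There is a GMT file called "Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt 2" besides "Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt".
The contents of the 2 files differ (see the diff output below).
Please delete one of them and make sure the remaining file has the .gmt extension.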

diff Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt\ 2 Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt

57c57
< ANAC012 ANAC012 AT1G16490 AT1G17950 AT1G62990 AT1G66230 AT1G73410 AT1G79180 AT4G12350 AT5G12870 AT5G16600 AT5G56110
---
> ANAC012 ANAC012 AT1G16490 AT1G17950 AT1G62990 AT1G66230 AT1G73410 AT1G79180 AT4G12350 AT4G22680 AT4G33450 AT5G12870 AT5G16600 AT5G56110
62,63c62,63
< BZIP60 BZIP60 AT1G09080 AT2G31955 AT5G28540 AT5G42020
< ABF2 ABF2 AT1G77120 AT2G18050 AT5G20830 AT5G52300 AT5G52310
---
> BZIP60 BZIP60 AT1G09080 AT2G31955 AT5G20990 AT5G28540 AT5G42020
> ABF2 ABF2 AT1G77120 AT5G20830 AT5G52300 AT5G52310
108c108
< AMS AMS AT1G13140 AT1G59740 AT1G66850 AT1G67990 AT1G73220 AT1G75790 AT1G75920 AT3G13220 AT3G28740 AT3G51590 AT4G00040 AT5G17050 AT5G49070
---
> AMS AMS AT1G13140 AT1G59740 AT1G66850 AT1G67990 AT1G73220 AT1G75920 AT3G13220 AT3G28740 AT3G51590 AT4G00040 AT5G17050 AT5G49070
148c148
< EMB2301 EMB2301 AT1G16490 AT1G62990 AT1G79180 AT5G12870 AT5G56110
---
> EMB2301 EMB2301 AT1G16490 AT1G62990 AT1G79180 AT4G33450 AT5G12870 AT5G56110
267c267
< DREB2A DREB2A AT1G01470 AT1G52690 AT2G41190 AT2G42540 AT3G12580 AT3G17520 AT3G50970 AT4G33720 AT5G52300 AT5G52310
---
> DREB2A DREB2A AT1G01470 AT1G52690 AT2G41190 AT2G42540 AT3G12580 AT3G17520 AT3G50970 AT5G52300 AT5G52310
273c273
< WRKY26 WRKY26 AT1G63650 AT5G60890
---
> WRKY26 WRKY26 AT1G63650 AT3G55730 AT5G60890
282c282
< MYB46 MYB46 AT1G16490 AT1G62990 AT1G79180
---
> MYB46 MYB46 AT1G16490 AT1G62990 AT1G79180 AT4G22680
296c296
< ARF7 ARF7 AT1G04240 AT1G19220 AT2G42430 AT2G45420 AT3G15540 AT3G20840 AT3G50340 AT3G58190 AT4G14550 AT4G14560 AT4G37390 AT4G37650
---
> ARF7 ARF7 AT1G04240 AT1G19220 AT2G42430 AT2G45420 AT3G15540 AT3G20840 AT3G58190 AT4G14550 AT4G14560 AT4G37390 AT4G37650
315c315
< ERF2 ERF2 AT1G06160 AT1G72260 AT5G44420
---
> ERF2 ERF2 AT1G06160 AT1G72260 AT3G55730 AT5G44420
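
A per-gene-set comparison may be easier to review than the raw diff when deciding which copy to keep. The following is a minimal Python sketch, assuming the files are tab-delimited GMT (ID, description, then one gene per column) and that header lines start with "#". The function names `read_gmt` and `compare_gmt` are hypothetical and not part of the repository.

```python
from pathlib import Path


def read_gmt(path):
    """Parse a GMT file into {set_id: set_of_genes}; blank and '#' lines are skipped."""
    gene_sets = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed GMT layout: ID, description, then one gene per column.
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets


def compare_gmt(path_a, path_b):
    """Return {set_id: (genes_only_in_a, genes_only_in_b)} for sets that differ."""
    sets_a, sets_b = read_gmt(path_a), read_gmt(path_b)
    report = {}
    for set_id in sorted(sets_a.keys() | sets_b.keys()):
        genes_a = sets_a.get(set_id, set())
        genes_b = sets_b.get(set_id, set())
        if genes_a != genes_b:
            report[set_id] = (sorted(genes_a - genes_b), sorted(genes_b - genes_a))
    return report


# Example call (paths taken from the diff command above):
# compare_gmt("Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt 2",
#             "Transcription_factor_ATRM_Arabidopsis_thaliana_EnsemblID.gmt")
```

Because genes are compared as sets, this ignores gene order within a line and reports only the IDs that are present in one copy and missing from the other.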

GMTs with a single entry

Please delete GMT files that have fewer than 5 entries (lines), i.e.:

  • Pathways_SignaLink_Drosophila_melanogaster_EntrezID.gmt
  • Pathways_SignaLink_Drosophila_melanogaster_UniprotID.gmt
  • Pathways_SignaLink_Drosophila_melanogaster_EnsemblID.gmt
  • miRNA_regulation_miRTarBase_Gallus_gallus_EntrezID.gmt
  • GO_CC_Saccharomyces_cerevisiae_EnsemblID.gmt

Or files whose entries contain only a single gene each (apart from 1 or 2 entries), i.e.:

  • Transcription_factor_TFLink_Rattus_norvegicus_SS_GeneSymbol.gmt
  • Transcription_factor_TFLink_Rattus_norvegicus_SS_EntrezID.gmt
  • Transcription_factor_TFLink_Rattus_norvegicus_SS_UniprotID.gmt
  • Transcription_factor_TFLink_Rattus_norvegicus_SS_EnsemblID.gmt
  • GO_CC_Saccharomyces_cerevisiae_entrezID.gmt

Please update the 3rd and the 4th sheets of the "mulea Supplementary Table 1" accordingly.
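
Both criteria could be checked automatically across the repository. Below is a minimal Python sketch, assuming tab-delimited GMT files whose header lines start with "#". The thresholds mirror the wording above (fewer than 5 entries, or at most 2 entries with more than one gene). The helper names `gmt_entries` and `should_flag` are hypothetical.

```python
from pathlib import Path


def gmt_entries(path):
    """Return the gene list of every entry, skipping blank and '#' header lines."""
    entries = []
    for line in Path(path).read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Columns 1 and 2 are the ID and description; genes start at column 3.
        entries.append([gene for gene in line.split("\t")[2:] if gene])
    return entries


def should_flag(path, min_entries=5, max_multi_gene=2):
    """Flag files with too few entries, or with (almost) only single-gene entries."""
    entries = gmt_entries(path)
    multi_gene = sum(1 for genes in entries if len(genes) > 1)
    return len(entries) < min_entries or multi_gene <= max_multi_gene


# Example call, assuming the GMT files live under a GMT_files folder:
# flagged = [p for p in Path("GMT_files").rglob("*.gmt") if should_flag(p)]
```

The flagged list is only a set of candidates for deletion; each file would still need a manual look before it is removed and the supplementary table is updated.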

KEGG script

@olbeimarton can you add the script to download KEGG data to the scripts_to_create_GMT_files/KEGG folder?

Folder structure

Please create 2 main folders:

  • GMT_files for the GMT folders and files
  • scripts_to_create_GMT_files for the scripts

Duplicated files

muleaData/GO_MF_Caenorhabditis_elegans_EntrezID 3.rds
muleaData/GO_MF_Caenorhabditis_elegans_EntrezID.rds
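
Copies of this kind carry a space and a number either before the extension ("... 3.rds") or after it ("....gmt 2"), so they can be searched for by name. Below is a minimal Python sketch under that naming assumption. `canonical_name` and `find_copy_groups` are hypothetical helpers, and the sketch compares file names only, not file contents.

```python
import re
from collections import defaultdict
from pathlib import Path

# " 3" directly before the extension ("name 3.rds") or " 2" at the very end ("name.gmt 2").
COPY_MARK = re.compile(r" \d+(?=\.\w+$)| \d+$")


def canonical_name(filename):
    """Strip the copy number from a file name."""
    return COPY_MARK.sub("", filename)


def find_copy_groups(root):
    """Group files in each folder that share the same canonical name."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[path.parent / canonical_name(path.name)].append(path.name)
    return {str(key): sorted(names) for key, names in groups.items() if len(names) > 1}


# Example call: find_copy_groups("muleaData")
```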

Delete empty GMT files

There are empty GMT files, i.e.:

  • Protein_domain_PFAM_Bacteroides_thetaiotaomicron_VPI_5482_EntrezID.gmt
  • Protein_domain_PFAM_Bacteroides_thetaiotaomicron_VPI_5482_LocusID.gmt
  • Genomic_location_Ensembl_Daphnia_pulex_5genes_EntrezID.gmt
  • Genomic_location_Ensembl_Daphnia_pulex_10genes_EntrezID.gmt
  • Genomic_location_Ensembl_Daphnia_pulex_20genes_EntrezID.gmt
  • GO_BP_Saccharomyces_cerevisiae_entrezID.gmt

Please check why these are empty: is it because of a mapping error or missing IDs? If the IDs cannot be mapped, please delete the empty GMTs. If they can be mapped, please remap and check.

Please update the 4th sheet of the "mulea Supplementary Table 1" accordingly.

NAs in the headers of the GMT files of Salmonella_enterica_subsp_enterica_serovar_Typhimurium_str_LT2_99287

For example:

# Gene set GMT file for mulea Bioconductor R package
NA
NA
# ID_type: EntrezID
# source_url: www.ensembl.org 
# source_PMID: 34791404
# source_primary_ID: EnsemblID / LocusID 
# source_version: 109
# source_last_update: 2023
# gmt_download_date: 02-12-2022
# gmt_version: 1
# gmt_entry_names: chromosome location
# chromosome location chromosome location Genes

NA
NA

NA
NA

The first 2 NAs should be:

# taxon_name: Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
# taxonomy_ID: 99287

The further NAs should be deleted.
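
The fix described above can be sketched in Python as follows. The sketch assumes that a stray value is a line consisting of "NA" only, which a real tab-delimited gene set line would never be. `fix_na_header` is a hypothetical helper, not an existing script in the repository.

```python
def fix_na_header(lines, taxon_name, taxonomy_id):
    """Replace the first two bare 'NA' lines with taxon headers and drop the rest."""
    replacements = [
        f"# taxon_name: {taxon_name}",
        f"# taxonomy_ID: {taxonomy_id}",
    ]
    fixed = []
    for line in lines:
        if line.strip() == "NA":
            # The first two NAs become the taxon headers; later NAs are removed.
            if replacements:
                fixed.append(replacements.pop(0))
            continue
        fixed.append(line)
    return fixed


# Example call on the lines of one GMT file:
# fix_na_header(lines,
#               "Salmonella enterica subsp. enterica serovar Typhimurium str. LT2",
#               99287)
```

All other lines, including blank ones, are passed through unchanged.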

Bacteroides thetaiotaomicron taxon issues

The folder is called Bacteroides_thetaiotaomicron_VPI_5482_226186,
while the headers of the GMT files say:

# taxon_name: Bacteroides thetaiotaomicron
# taxonomy_ID: 818

Please rewrite the headers like this:

# taxon_name: Bacteroides thetaiotaomicron VPI-5482
# taxonomy_ID: 226186
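
Mismatches of this kind could be found for every taxon by comparing the number at the end of the folder name with the taxonomy_ID header line. Below is a minimal Python sketch, assuming each GMT file sits directly inside its taxon folder and that the folder name ends in "_<taxonomy ID>". The function names are hypothetical.

```python
import re
from pathlib import Path


def header_taxonomy_id(gmt_path):
    """Return the ID from the '# taxonomy_ID:' header line, or None if absent."""
    for line in Path(gmt_path).read_text().splitlines():
        match = re.match(r"#\s*taxonomy_ID:\s*(\d+)", line)
        if match:
            return match.group(1)
    return None


def folder_taxonomy_id(gmt_path):
    """Return the trailing number of the parent folder name, or None if absent."""
    match = re.search(r"_(\d+)$", Path(gmt_path).parent.name)
    return match.group(1) if match else None


def taxonomy_mismatch(gmt_path):
    """True when the header ID is missing or differs from the folder ID."""
    return header_taxonomy_id(gmt_path) != folder_taxonomy_id(gmt_path)
```

A file whose header has no taxonomy_ID line at all is also reported, as long as its folder name carries an ID.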

TFLink_ALL_LS_human.zip unzipping error

In the Homo_sapiens_9606 folder, TFLink_ALL_LS_human.zip cannot be unzipped. The command
unzip TFLink_ALL_LS_human.zip
gives the following error message:

Archive: TFLink_ALL_LS_human.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of TFLink_ALL_LS_human.zip or
TFLink_ALL_LS_human.zip.zip, and cannot find TFLink_ALL_LS_human.zip.ZIP, period.
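
Archives could be validated before they are committed. Below is a minimal sketch using Python's standard `zipfile` module; `check_zip` is a hypothetical helper name. A missing end-of-central-directory record, as in the message above, often points to a truncated or incomplete upload, although the sketch cannot tell the cause.

```python
import zipfile


def check_zip(path):
    """Return None if the archive is readable, otherwise a short problem description."""
    if not zipfile.is_zipfile(path):
        return "not a zip file (no end-of-central-directory record found)"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first one with a bad CRC.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as error:
        return f"unreadable archive: {error}"
    return f"corrupt member: {bad_member}" if bad_member else None


# Example call: check_zip("Homo_sapiens_9606/TFLink_ALL_LS_human.zip")
```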
