
peakabro's People

Contributors

stanstrup

Forkers

jorainer

peakabro's Issues

Define a use case

It would be nice if you could define a simple, common use case, i.e. define a list of compounds that you would like to annotate and provide the code showing how this can be done.

This would also showcase your tidyverse approach, so that I could check whether an S4 approach could be implemented (in parallel?).
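A minimal sketch of such a use case, written in Python for illustration only. It assumes features are matched by m/z as [M+H]+ ions within a ppm tolerance; the compound list, the IDs and the `annotate_features` helper are hypothetical and not part of the package:

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da; used for the assumed [M+H]+ adduct


@dataclass(frozen=True)
class Compound:
    id: str
    name: str
    mass: float  # monoisotopic neutral mass in Da


def annotate_features(feature_mzs, compounds, ppm=10.0):
    """Return (feature_index, compound_id) pairs where the compound's
    [M+H]+ m/z lies within `ppm` of the observed feature m/z."""
    hits = []
    for i, mz in enumerate(feature_mzs):
        for cmp in compounds:
            expected_mz = cmp.mass + PROTON_MASS
            if abs(mz - expected_mz) / expected_mz * 1e6 <= ppm:
                hits.append((i, cmp.id))
    return hits


# Hypothetical example data: three compounds and three observed features.
compounds = [
    Compound("C1", "glucose", 180.063388),
    Compound("C2", "caffeine", 194.080376),
    Compound("C3", "tryptophan", 204.089878),
]
features = [181.0707, 195.0877, 300.0]
print(annotate_features(features, compounds))  # [(0, 'C1'), (1, 'C2')]
```

A real use case would also need other adducts and a way to report the matched compounds per feature; the single adduct keeps the sketch short.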

Where to distribute the package?

@stanstrup, I was wondering where you plan to distribute the package. Do you want to submit it to CRAN or Bioconductor? (I would not host it exclusively on GitHub.)
I personally would prefer Bioconductor, because of the regular release cycles, the possibility to add annotation data, and because other related packages are already in Bioc (e.g. xcms, ChemmineR).

I would like to see PeakABro added to Bioconductor, with the package not containing compound data, but only the code to generate and use these. The compound data (being versioned!) could ideally be added to AnnotationHub.

Note that if it's going to be Bioconductor, it has to fulfill some criteria (http://www.bioconductor.org/developers/package-guidelines/) and also conform (more or less) to their coding style (http://bioconductor.org/developers/how-to/coding-style/); specifically, make sure that the lines in the R sources, including comments and documentation, are not longer than 80 characters.
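The 80-character rule is easy to check locally before submission. A small hypothetical helper, sketched in Python; it assumes the standard R package layout (`R/`, `man/`, `vignettes/`) and counts a tab as one character:

```python
from pathlib import Path

MAX_WIDTH = 80  # maximum line length recommended by the Bioconductor style guide


def long_lines(text, max_width=MAX_WIDTH):
    """Return (line_number, width) for every line of `text` wider than max_width."""
    result = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        width = len(line)
        if width > max_width:
            result.append((lineno, width))
    return result


def check_package(root=".", patterns=("R/*.R", "man/*.Rd", "vignettes/*.Rmd")):
    """Print `file:line: width` for over-long lines in sources, docs and vignettes."""
    for pattern in patterns:
        for path in sorted(Path(root).glob(pattern)):
            text = path.read_text(encoding="utf-8")
            for lineno, width in long_lines(text):
                print(f"{path}:{lineno}: {width} characters")


if __name__ == "__main__":
    check_package()
```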

Mark hits that have multiple hits

Note when a compound has been annotated to several features.
It is useful to know whether a compound has been annotated to other features too.
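One way to mark such compounds is to count, for each compound, how many distinct features it was assigned to. A Python sketch, assuming the annotation result is a list of (feature, compound) pairs; `mark_multiple_hits` and the IDs are hypothetical:

```python
from collections import Counter


def mark_multiple_hits(hits):
    """Annotate (feature_id, compound_id) pairs with the number of distinct
    features each compound was assigned to and a boolean multi_hit flag."""
    # set() drops duplicate pairs so a compound is counted once per feature.
    features_per_compound = Counter(cmp for _, cmp in set(hits))
    return [
        {
            "feature": feat,
            "compound": cmp,
            "n_features": features_per_compound[cmp],
            "multi_hit": features_per_compound[cmp] > 1,
        }
        for feat, cmp in hits
    ]


# Hypothetical hits: compound C1 matched two features, C2 only one.
for row in mark_multiple_hits([("F1", "C1"), ("F2", "C1"), ("F3", "C2")]):
    print(row)
```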

License issues

  1. Which databases can I include data from?
  2. If there are ones I cannot include, they will need to be downloaded and the table generated by the user. Is there such a thing as an "in-package cache"?
  3. Which license can the package have if it includes database data?
  4. Is licensing a concern at all? As far as I know, data cannot be copyrighted, so is there any concern at all?

The information I extract is: id, name, InChI, formula, and mass.
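The resulting table can be pictured as one fixed record type per compound. A Python sketch using the five fields listed above; the `CompoundRecord` name and the tab-separated output are assumptions for illustration, and the water entry is only example data:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class CompoundRecord:
    """One row of the compound table, holding the fields extracted per database."""
    id: str
    name: str
    inchi: str
    formula: str
    mass: float  # monoisotopic mass in Da


def to_tsv(records):
    """Serialise CompoundRecord objects to a tab-separated table with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(CompoundRecord)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


print(to_tsv([CompoundRecord("X1", "water", "InChI=1S/H2O/h1H2", "H2O", 18.010565)]))
```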

For the moment I force-removed the files until this is settled.

Databases to convert to tables

Functions added to package:

  • LipidMaps
  • LipidBlast
  • HMDB
  • MyCompoundDB
  • PhenolExplorer
  • PubChem. Too big? Not really useful?
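For databases distributed as SD files (LIPID MAPS, for example), the conversion to a table reduces to reading the `> <NAME>` data items of each record. A Python sketch; the mapping in `ASSUMED_LIPIDMAPS_FIELDS` is an assumption about the LIPID MAPS item names and should be checked against an actual download:

```python
# Assumed mapping from table columns to LIPID MAPS SDF data-item names;
# verify these against the actual file before relying on them.
ASSUMED_LIPIDMAPS_FIELDS = {
    "id": "LM_ID",
    "name": "COMMON_NAME",
    "inchi": "INCHI",
    "formula": "FORMULA",
    "mass": "EXACT_MASS",
}


def parse_sdf_fields(text):
    """Yield one dict of data items (`> <NAME>` blocks) per SDF record."""
    record = {}
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("$$$$"):  # end of the current record
            yield record
            record = {}
        elif line.startswith(">"):
            start = line.find("<") + 1
            end = line.find(">", start)
            if start == 0 or end == -1:
                continue  # not a well-formed data header; skip it
            values = []
            for value_line in lines:  # a value runs until the next blank line
                if not value_line.strip():
                    break
                values.append(value_line)
            record[line[start:end]] = "\n".join(values)


def sdf_record_to_row(record, field_map=ASSUMED_LIPIDMAPS_FIELDS):
    """Map one parsed SDF record onto the table columns; missing items -> None."""
    row = {col: record.get(key) for col, key in field_map.items()}
    if row["mass"] is not None:
        row["mass"] = float(row["mass"])
    return row
```

The molfile block of each record is skipped entirely, since only the listed fields are needed for the table. Other sources (HMDB as XML, LipidBlast as MSP) would each need their own reader feeding the same row layout.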

License situation clarified

  • LipidMaps
  • LipidBlast - Confirmed CC BY, so OK with attribution.
  • HMDB
  • MyCompoundDB
  • PhenolExplorer
  • PubChem. Too big? Not really useful?

Please suggest.

package design suggestion

@stanstrup, very nice work indeed! I always wanted to have such a package and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too seriously.

My suggestion(s):

Keep functionality separate from the data:
Have dedicated data packages. This allows to have data packages from different sources or from different versions. See e.g. ensembldb and the EnsDb.Hsapiens.v75 package, or GenomicFeatures and the separate TxDb packages.

Define a CompoundDb class with the main methods to query and access the database, e.g. a method compound that retrieves compounds from the database and supports multiple filters. I know that's a somewhat different setup (Bioconductor's rich, S4-based one) from the tidyverse one; still, I think one could combine both worlds.

One could then define e.g. a HMDB class that simply extends the CompoundDb to accommodate HMDB-specific fields and attributes.
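The class layout can be sketched in Python (the package itself would use S4 classes and methods in R). `Filter`, `CompoundDb.compound` and `HmdbDb` are hypothetical names chosen for illustration; they are not the ensembldb or AnnotationFilter API, although the filter conditions loosely mirror AnnotationFilter's:

```python
import operator
from dataclasses import dataclass

# Supported filter conditions (a subset, loosely modelled on AnnotationFilter).
_CONDITIONS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "startsWith": lambda x, v: str(x).startswith(v),
}


@dataclass(frozen=True)
class Filter:
    """Hypothetical filter: keep rows where `row[field] <condition> value`."""
    field: str
    condition: str
    value: object

    def matches(self, row):
        x = row.get(self.field)
        return x is not None and _CONDITIONS[self.condition](x, self.value)


class CompoundDb:
    """Generic compound database: versioned rows plus one query method."""

    def __init__(self, rows, source, version):
        self._rows = list(rows)
        self.source = source
        self.version = version

    def compound(self, filters=()):
        """Return the rows matching all filters (combined with logical AND)."""
        return [r for r in self._rows if all(f.matches(r) for f in filters)]


class HmdbDb(CompoundDb):
    """Extends CompoundDb with HMDB-specific behaviour (hypothetical)."""

    def __init__(self, rows, version):
        super().__init__(rows, source="HMDB", version=version)

    def by_id(self, compound_id):
        return self.compound([Filter("id", "==", compound_id)])


rows = [
    {"id": "C1", "name": "glucose", "mass": 180.063388},
    {"id": "C2", "name": "caffeine", "mass": 194.080376},
]
db = CompoundDb(rows, source="example", version="0.1")
print(db.compound([Filter("mass", ">=", 190.0), Filter("mass", "<=", 200.0)]))
```

Because every resource subclasses the same base, user code written against `compound()` and the filters works unchanged whichever data package is loaded.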

It might also be interesting to implement these filters in AnnotationFilter (or create filters that extend it), e.g. MzFilter or MassFilter. The MzFilter would have to calculate the mass for the provided m/z. Ideally, there would be a MassFilter constructor that takes a MzFilter as input, calculates the theoretical mass for its m/z and returns a MassFilter.
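The MzFilter-to-MassFilter conversion can be sketched in Python, assuming single-molecule adducts so that M = m/z × |z| − Δ, where Δ is the adduct's mass shift. The adduct table is a small illustrative subset, and `MzFilter`/`MassFilter` here are hypothetical stand-ins, not the AnnotationFilter classes:

```python
from dataclasses import dataclass

# (|charge|, mass shift in Da) per adduct; a small illustrative subset.
ADDUCTS = {
    "[M+H]+": (1, 1.007276),
    "[M+Na]+": (1, 22.989221),
    "[M-H]-": (1, -1.007276),
}


@dataclass(frozen=True)
class MzFilter:
    mz: float
    ppm: float = 5.0


@dataclass(frozen=True)
class MassFilter:
    low: float
    high: float

    @classmethod
    def from_mz_filter(cls, mz_filter, adduct="[M+H]+"):
        """Neutral-mass window from an m/z window: M = mz * |z| - shift."""
        charge, shift = ADDUCTS[adduct]
        mass = mz_filter.mz * charge - shift
        tol = mz_filter.mz * mz_filter.ppm * 1e-6 * charge  # ppm of m/z, in Da
        return cls(mass - tol, mass + tol)

    def matches(self, mass):
        return self.low <= mass <= self.high


mf = MassFilter.from_mz_filter(MzFilter(181.070664, ppm=5), "[M+H]+")
print(mf.matches(180.063388))  # True: glucose, C6H12O6
```

Since the adduct is usually unknown, a real implementation would probably return one MassFilter per candidate adduct.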

The big advantages of having this setup would be:

  • Versioned data packages.
  • Data packages could be added to AnnotationHub.
  • Using the same interface (methods and filters) for different data resources
    simplifies usage (e.g. the same method retrieves compounds from any of them).
  • Integration into Bioconductor. Common concept for annotation resources.
  • You don't have to worry about licensing of the data resource in the main
    package. Each data package could/should have its own version. For those that
    don't allow sharing the data, you could just provide the functionality to
    create the resource in the package and leave it to the user to create the
    package themselves (if they have the license to do so).

I would be happy to contribute here (especially related to the database class,
interface methods and filters as I did all this already in ensembldb).

open for discussion

View button only works once
