uptake / cran-server
Self-hosted R package repository
License: BSD 3-Clause "New" or "Revised" License
Unit tests should run first, then integration.
You shouldn't have to reach into lib to get classes. All the relevant public ones should be exported.
This project should be documented on RTD!
This involves:
.readthedocs.yml, e.g. https://github.com/jameslamb/doppel-cli/blob/master/.readthedocs.yml
The FileStorage class currently assumes the existence of a ./src/contrib folder. It should create the folder on init if it doesn't exist.
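A minimal sketch of the requested behaviour, assuming a simplified stand-in for FileStorage (the `root` parameter and `contrib_dir` attribute are illustrative names, not the project's actual API):

```python
import os


class FileStorage:
    """Hypothetical stand-in for the project's FileStorage class."""

    def __init__(self, root="."):
        self.root = root
        self.contrib_dir = os.path.join(root, "src", "contrib")
        # Create src/contrib (and any missing parents). With exist_ok=True
        # this is a no-op when the folder is already there.
        os.makedirs(self.contrib_dir, exist_ok=True)
```

Because `os.makedirs(..., exist_ok=True)` is idempotent, constructing the storage twice against the same root is safe.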
@ngparas what do you think about swapping out Vue for straight Jinja2 templates? We could probably achieve the same effect.
I don't think we need to inherit from the conda image.
See this line
https://github.com/UptakeOpenSource/cran-server/blob/999b7b8ff5d5b0418f8feccf76711d11912a8541/Dockerfile#L1
Add pycodestyle to the CI setup for this project to prevent PRs from introducing style issues and to document the preferred style for this project.
See this example PR for how to add this type of check to your build. Basically, you need to run the pycodestyle command on the repo, and you can optionally configure what it checks with a tox.ini file.
Docs here: http://pycodestyle.pycqa.org/en/latest/intro.html
Consider adding a /delete route to the server. Currently the only way to remove packages is to manually edit the PACKAGES file and manually remove the artifact from s3/the file system.
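One piece such a route would need is removing an entry from the PACKAGES index. The sketch below assumes the index is the usual DCF text (blank-line-separated stanzas with `Package:` and `Version:` fields); `remove_package_entry` is a hypothetical helper, not an existing function in cran-server, and deleting the artifact itself is left out.

```python
import re


def remove_package_entry(packages_text, name, version):
    """Return PACKAGES text without the stanza matching name and version."""
    kept = []
    # Stanzas are separated by one or more blank lines.
    for stanza in re.split(r"\n\s*\n", packages_text.strip()):
        if not stanza.strip():
            continue
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line and not line[0].isspace() and ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields.get("Package") == name and fields.get("Version") == version:
            continue  # drop the matching stanza
        kept.append(stanza)
    return "\n\n".join(kept) + "\n" if kept else ""
```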
Add support for authentication and roles. As an added bonus, it would be nice if we could support OAuth.
Issue #9 is dependent on this.
@ntdef @ngparas I think it would be valuable to create one or more "milestones" here and attach open issues to them. This has been really helpful for @jayqi, @bburns632, and me on pkgnet.
Because the fileobj isn't copied into a buffer local to the Package instance, we require resets outside of the class, e.g. https://github.com/UptakeOpenSource/cran-server/blob/master/cranserver/server.py#L88
The constructor should copy the fileobj into a new BytesIO
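A rough sketch of that idea, using a simplified hypothetical Package (the real class carries more state, and the `fileobj` property here is an assumption about how callers would read the data):

```python
import io


class Package:
    """Hypothetical simplified Package that owns a private copy of the upload."""

    def __init__(self, fileobj):
        # Copy the remaining bytes of the caller's stream into our own buffer.
        self._buffer = io.BytesIO(fileobj.read())

    @property
    def fileobj(self):
        # Return an independent reader positioned at 0, so no caller ever
        # has to seek() a shared stream back to the start.
        return io.BytesIO(self._buffer.getvalue())
```

With this shape, every consumer gets its own cursor and the reset in server.py would no longer be necessary. The trade-off is that the whole tarball is held in memory.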
There are no R-related tags for this project.
Boto3 seems rather large when all we need to do is touch S3. I've been looking at tinyS3 which seems pretty neat.
The default configuration (local filesystem) used in the quickstart can have inconsistent behavior when installing packages from R.
Following the instructions in the quickstart, I uploaded httr version 1.3.1 to test installs. I can start an R session and run
> install.packages('httr', repos = c('http://localhost:8080'))
and it will, seemingly at random, either succeed or fail after not finding the bin/macosx PACKAGES file.
A few other projects let you specify a subset of CRAN to mirror. This might be a desirable feature.
Support other cloud storage besides AWS S3.
Need to add a LICENSE file to the root level.
Tests right now are running at the package root and creating a /src/contrib directory there.
I hooked up our gh-pages to look at the docs/ folder on master. My suspicions were confirmed... markdown files don't get automagically rendered there.
https://uptakeopensource.github.io/cran-server/why_cran_server.md opens as a download link :/
I think we need to render the HTML equivalent of that file and check it into the repo in the docs/ folder. Thoughts?
Test the S3 storage backend against S3.
Right now this project only supports packages stored under /src/. To offer a full-featured CRAN repo we need to support binary packages.
I don't have a Windows environment to test with but we should procure one.
It would go a long way to add a "benefits / why this is awesome" section to the README.
Some questions might be:
A flashy example would add value too.
It would be great to get the UI stuff working with a build tool like Webpack
Can you add a note in the README that makes it explicit that there's a UI bundled with this? Right now if I just read the README I'd think this just runs the CRAN service.
Can you add something like "navigate to <host>:8080 in your browser to see the UI..."
Consider supporting a sqlite backend instead of writing to the PACKAGES text file directly
Need to add requests to this list.
See docs here.
Locking logic is currently implemented in server.py. We should consider pushing this logic into the storage instead of the server itself. For example, suppose we used something like sqlite or redis to handle the package metadata instead of writing to the PACKAGES file directly. We would prefer those handle locking over doing it ourselves, and it would allow multiple instances of the server container to run concurrently.
This issue addresses the TODO left in the code here:
https://github.com/UptakeOpenSource/cran-server/blob/master/cran-server/server.py#L102
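To illustrate what delegating locking to the storage layer could look like, here is a minimal sketch built on the standard-library sqlite3 module. The class name, table layout, and 30-second timeout are all assumptions made for illustration. SQLite serializes writers through its own file locking, so the server would not need a lock of its own for metadata updates when instances share the same database file.

```python
import sqlite3
from contextlib import closing


class SqliteMetadataStore:
    """Hypothetical metadata store; the schema is illustrative only."""

    def __init__(self, path):
        self.path = path
        self._execute(
            "CREATE TABLE IF NOT EXISTS packages ("
            "name TEXT NOT NULL, version TEXT NOT NULL, "
            "PRIMARY KEY (name, version))"
        )

    def _execute(self, sql, params=()):
        # timeout: wait for SQLite's lock instead of failing immediately
        # when another process is writing.
        with closing(sqlite3.connect(self.path, timeout=30)) as conn:
            with conn:  # commit on success, roll back on error
                return conn.execute(sql, params).fetchall()

    def add(self, name, version):
        self._execute(
            "INSERT OR REPLACE INTO packages (name, version) VALUES (?, ?)",
            (name, version),
        )

    def versions(self, name):
        rows = self._execute(
            "SELECT version FROM packages WHERE name = ? ORDER BY version",
            (name,),
        )
        return [row[0] for row in rows]
```

This only covers metadata. Artifact uploads to the file system or S3 would still need their own consistency story.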
Need to add some more description before jumping right into the Quick Start section.
Add build status badge to README.
This repo needs some kind of CI. I'm not sure the best way to test it, but we should at least set up a skeleton with Travis to test that the service can be run and basic functionality works.
Will make it easier to welcome new contributors to help out!
The code is modular enough to support other backends, but as-is you would have to modify the codebase to swap one out. Ideally, you should be able to point to a Python class at startup and use that as the storage backend.
It's probably worthwhile to look into how Flask/Gunicorn does this when looking for the app object.
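Gunicorn locates the application from a `module:variable` string. A similar convention could be sketched for storage backends as follows; `load_storage_class` is a hypothetical helper, and how the string reaches the server (an environment variable, a CLI flag) is left open.

```python
import importlib


def load_storage_class(spec):
    """Resolve a 'package.module:ClassName' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    # Import the module by its dotted path, then look the class up on it.
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For example, `load_storage_class("collections:OrderedDict")` returns the `OrderedDict` class. A third-party backend would only need to be importable, with no changes to the cran-server codebase.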
Consider support for installing old versions of packages with devtools: https://github.com/r-lib/devtools/blob/master/R/install-version.r#L45
Currently the only way to do it is to go to the path directly,
install.packages("http://cran.mycompany.com/src/contrib/mypackage_0.3.2.tar.gz", repos = NULL)
Right now some files lying around in the repo will be copied in when building the container.
For background see Do Not Ignore .dockerignore
It would be great if cran-server supported something like MRAN snapshots https://mran.microsoft.com/documents/rro/reproducibility
There are some environment variables in server.py that affect deployment but aren't properly documented.
server.py relies on the R CMD build tarball naming convention when receiving uploads and doesn't do much in the way of validation. We should at least check that uploads conform to this convention.
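A possible shape for that check, assuming the `<name>_<version>.tar.gz` convention and the usual R rules (package names start with a letter, contain only letters, digits, and dots, and don't end in a dot; versions are at least two integers separated by `.` or `-`). The function name is hypothetical and the pattern is a sketch, not the project's validation logic.

```python
import re

# name: letter first, then letters/digits/dots, ending in a letter or digit
# version: two or more integers separated by "." or "-"
TARBALL_RE = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9])"
    r"_(?P<version>[0-9]+(?:[.-][0-9]+)+)"
    r"\.tar\.gz"
)


def parse_tarball_name(filename):
    """Return (name, version) or raise ValueError for a non-conforming name."""
    match = TARBALL_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a valid source tarball name: {filename!r}")
    return match.group("name"), match.group("version")
```

Using `fullmatch` rejects anything with a path prefix or trailing characters, which also guards against names such as `../evil_1.0.tar.gz`.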
First tests should target the file system storage backend.
The library should be properly documented.
Per TODO removed in #71.
The __iter__ method on FileStorage should exclude PACKAGES when listing the main directory.
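A minimal sketch of the filtering, using a hypothetical stand-in for FileStorage that simply lists one directory (the `directory` attribute and `INDEX_FILES` constant are illustrative names):

```python
import os


class FileStorage:
    """Hypothetical stand-in that iterates over package files in a directory."""

    # Index files that are not packages. PACKAGES.gz and PACKAGES.rds could
    # be added here if the repository ever writes them.
    INDEX_FILES = frozenset({"PACKAGES"})

    def __init__(self, directory):
        self.directory = directory

    def __iter__(self):
        for name in sorted(os.listdir(self.directory)):
            if name in self.INDEX_FILES:
                continue  # skip the index, yield only package artifacts
            yield name
```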
Dockerfile installs are happening separately from setup.py.
I think it would be awesome to have a vignette describing how to use this on AWS's Elastic Container Service. Ideally this would cover:
Thoughts?
Need to get a code coverage badge on here.
Needs some tests that go beyond unit tests.
Consider recording metrics on package downloads
We need some good library api docs.
Apparently install.packages supports multicore processing. We should test that cran-server doesn't break when a client uses the option.
Looks like the AWS S3 version was not updated to meet the new storage API.