cs3org / reva

WebDAV/gRPC/HTTP high performance server to link high level clients to storage backends

Home Page: https://reva.link

License: Apache License 2.0

Go 98.81% Makefile 0.16% PHP 0.04% Gherkin 0.07% Shell 0.89% Dockerfile 0.02%
golang storage sync interoperability-platform cloud cloud-storage synchronization share application opensource

reva's Introduction

Reva is an interoperability platform consisting of several daemons written in Go. It acts as a bridge between high-level clients (mobile, web, desktop) and the underlying storage (CephFS, EOS, local filesystems). It exports well-known APIs, like WebDAV, to facilitate access from these devices. It also exports a high-performance gRPC API, codenamed CS3APIS, to easily integrate with other systems. Reva is meant to be a high-performance, customizable HTTP and gRPC server.

Installation

Head to the Documentation to learn more, or to download to get the latest available release.

Documentation & Support

Read the getting started guide and the other feature guides.

Contributing: Build and run it yourself

You need to have Go (version 1.21 or higher), git and make installed. Some of these commands may require sudo, depending on your system setup.

# build
$ git clone https://github.com/cs3org/reva
$ cd reva
$ make revad
$ cmd/revad/revad --version

You can also read the build from sources guide and the setup tutorial.

Contributing: Run tests

To run unit tests do: make test-go

To run gRPC integration tests do: make test-integration. You can get more verbose output with ginkgo -v -r tests/integration/.

To run EOS tests you need to have an up and running Docker system: make docker-eos-full-tests

Versioning

There are currently two major versions in active development.

1.x versions

The master branch is the stable development branch. Releases from master are tagged as 1.x.x versions following semver. Use this version for standalone deployment.

2.x versions

The edge branch is used as a dependency for ownCloud's OCIS product and differs from the 1.x versions. Please do not use 2.x for standalone deployments; always use it as part of the OCIS product.

Docker images

See https://hub.docker.com/r/cs3org/reva.

Plugin development

You can extend Reva without having to create PRs against this repo. To do so, you can create plugins; please check out the Tutorials.

License

To promote free and unrestricted adoption of CS3 APIs and the reference implementation Reva by all EFSS implementations and all platforms and application providers, both community and commercial, Open Source and Open Core, CERN released the source code repositories under Apache 2.0 license.

Further evolution of the CS3 APIs will be driven by the needs of the Educational and Research community with the goal of maximizing the portability of the applications and service extensions.

Reva is distributed under Apache 2.0 license.

Logo

The Reva logos were designed and contributed to the project by Eamon Maguire.

reva's People

Contributors

aduffeck, butonic, c0rby, daniel-wwu-it, dependabot-preview[bot], dependabot[bot], ffurano, glpatcern, gmgigi96, grgprarup, iljan, individual-it, ishank011, javfg, kiranparajuli589, labkode, lovisalugnegard, madsi1m, micbar, michielbdejong, mirekys, phil-davis, redblom, refs, sagargi, samualfageme, saw-jan, swikritit, vascoguita, wkloucek

reva's Issues

reva-comply tool

The idea is to develop a tool that will verify the compliance level of any CS3 service with the CS3 APIS.

Think of litmus for webdav.

This tool is needed to certify future CS3-based services deployments to ensure interoperability between entities and implementations.

The tool will also help to identify bugs and missing features on implementations of CS3 services.

Improve configuration

I had a lengthy discussion with @felixboehm about the toml config file ... while I like the declarative nature of it, Felix would like to see some form of dynamic configuration, so an admin can do

wget https://.../reva
reva
reva service enable ocdavsvc
reva service enable authsvc
reva service enable storageprovider --driver local

to get a working instance.

To me that is sugar on top. AFAICT the cli still needs to change the config and kick the reva process to reload the changes (or we add monitoring, but the less magic the better IMO). @labkode IIRC you mentioned reva or the os being able to queue any requests while reva is restarting / rereading the config. Could you elaborate and give an example? Maybe as a PR in the docs?

Anyway, I do agree that the single config file mixes two things that should be separate:

  1. the services that should be started
  2. the configuration for each service

I think we should use a config folder that contains a config file per service. Each service should write its default config if it is not present. That way the services are responsible for defining their configuration.
Reva itself should maybe write individual config files: core.toml, log.toml, http.toml and grpc.toml.
Maybe the services should have a subfolder in the reva config dir. That would allow them to freely choose config file names, without being able to overwrite another service's config.

When using docker containers we can then mount a config dir (passing in env vars should still be possible in order to override individual settings).

Wire grpc transport logging to main logger

gRPC transport errors are logged to stdout in a different format:

12:41PM INF cmd/revad/svcs/grpcsvcs/interceptors/log/log.go:69 > unary code=OK end="14/Aug/2019:12:41:57 +0200" from=tcp://[::1]:38948 pid=6337 pkg=grpcserver start="14/Aug/2019:12:41:57 +0200" time_ns=151380 traceid=9a56c6f3bc72caad71ec39c8601109dd uri=/cs3.storageproviderv0alpha.StorageProviderService/ListContainer user-agent=grpc-go/1.23.0
ERROR: 2019/08/14 12:41:57 grpc: server failed to encode response:  rpc error: code = Internal desc = grpc: error while marshaling: proto: repeated field Infos has nil element

We need to route the grpclog output to our logger.

replace datasvc with a tus.io capable endpoint

We currently use the datasvc to upload files. It directly streams the PUT request body sent from the ocdavsvc to the storage drivers. This is the good part. The rest is far from scalable:

  • The eos driver currently writes a temp file because the eosclient uses xrdcopy to copy that into eos. It should use the Range defined PUT requests to stream.
  • If we implement chunking (no matter if that is old chunking or new chunking) ocdavsvc needs to write a temporary file, because it can only send a PUT to the datasvc.
    • For eos that leads to writing the file 3 times:
      • first in ocdavsvc when receiving individual chunks,
      • then as a temp file by the eos driver and,
      • finally, when copying it with xrdcopy to eos.
    • For owncloud / local we still need to write 2 copies:
      • first in ocdavsvc when receiving individual chunks,
      • finally, when writing the file with the owncloud / local driver.
    • For s3:
      • first in ocdavsvc when receiving individual chunks,
      • then as a temp file by the s3 driver and,
      • finally, when sending it as multipart upload to s3.

The bottleneck is the datasvc which forces a single file to be transferred.

I propose to replace the datasvc with a tusd based implementation.

  • https://tus.io is an open protocol specification for resumable uploads
  • it supports extensions and currently describes creation, expiration, checksum, termination and concatenation which allows parallel uploads. We could define a bulk or batch extension to upload multiple small files in one request.
  • tusd is the go reference implementation (MIT licensed)
    • It supports single file uploads with a single request, even though that is not yet added to the spec: tus/tus-resumable-upload-protocol#88, protocol PR [just got merged]. This works and creates a file without having to send subsequent PATCH requests:
      curl -X POST localhost:9997/tus/random-0 -v \
        -u aaliyah_abernathy:secret \
        -H "Tus-Resumable: 1.0.0" \
        -H "Content-Type: application/offset+octet-stream" \
        -H "Upload-Metadata: filename d29ybGRfZG9taW5hdGlvbl9wbGFuLnBkZg==" \
        -H "Upload-Offset: 0" \
        -H "Upload-Length: 40" \
        -d "1234567890123456789012345678901234567890"
      
    • it supports parallel uploads which can be used to pass through the owncloud chunking from ocdavsvc and s3 multipart uploads
  • the handlers can be reused and we can use our own way of routing requests.
    • we could add a handler for old PUT requests as they are handled by the datasvc
  • It can be extended with custom Filestore implementations (eg for eos, owncloud / local or s3)
  • It has a hook system that we can use to trigger workflows when a file has finished
  • It supports different locking implementations
  • Supports HTTP/2

Really a compelling protocol, IMO.

But for reva it would have some consequences:

  • clients would need to use the tus protocol to upload files if they want to directly use CS3
    • this makes tus in effect the upload protocol for CS3
    • several client implementations exist: go, js, java, .net, android, ios, python, php and even bash
    • this is a good thing IMO
  • we can encapsulate workflows as part of the upload process. tusd already supports multiple ways of executing hooks, eg sending http requests or executing shell scripts...
    • This would contain the worker queue to the tusdsvc ... locking is also thought of.
  • the Upload() function of the current storage drivers has to be replaced with storage specific tusd Filestore implementations. As a result, file uploads will no longer go through the storage drivers, unless we make them aware of new files. Which is something we need to do anyway to pick up file changes that bypass reva, eg via ssh.

Open questions:

  • How do we get the CS3 fileid/reference after a file has finished uploading? We could use a hook to generate one ... or it can be done with the storage specific Filestore ...
  • How can we get progress information? It does support HEAD requests on the upload resource ... we could expose workflow progress? maybe describe a workflow or progress extension?
  • will cross storage moves be affected? yes, they will use the tus protocol as well ... parallel transfer would become possible...
  • do we need a throttling extension? maybe?
  • can we implement fetching of chunks while the file has not yet finished uploading? Maybe. The tusd service supports GET requests, but I don't know if it allows them for unfinished uploads. It should be possible if we do some bookkeeping of which bytes have been uploaded. We can prevent chunks with maybe a get-after-workflow extension that encrypts the stream before sending it to the server. The uploading client sends the decryption key so the server can execute the workflow and release the key to clients that have downloaded the encrypted file. They now only need to decrypt it. ... well ... only decreases latency anyway ... only works if the workflow does not change the file ... which is ok I guess ...
  • can we add zsync support to this? Yes, AFAICT.
  • What about encryption? e2e will pass through. server side / at rest should be handled by the storage drivers?

Overall, I think this will clarify the responsibilities and make a LOT of sense because it takes away many of the decisions we still need to make.

Can you elaborate on out of band file transfer?

I am struggling to rebase our changes on top of the review branch. In the review branch you are planning to move file upload and download out of the cs3 APIs. Can you elaborate on how you plan to do the actual file transfer?

We will need to send the file stream from the ocdavsvc service to the actual storage provider. Do you want to open another http2 connection for that? Or use the existing one to multiplex binary chunks over it?

AFAIR we will always have the ocdavsvc or another gateway component in front of the actual storage provider ... so what is your vision on this?

Refactor: make wider usage of `NewErrorFromCode`

The recently-introduced function NewErrorFromCode(code rpcpb.Code, pkgname string) error in svcs/grpcsvcs/status/status.go should be used more widely.

To be done once the code is run and we have a clearer idea of what the logs should look like. As discussed with @labkode, this issue is here so we do not forget about it.

Use Drone instead of Travis

I would like to suggest switching to https://cloud.drone.io instead of Travis CI. With Drone we get a lot more flexibility while testing and building the project: we can use any Docker container within the pipelines, and we get native agents for arm and arm64 for free, which could be nice to build releases automatically. The cloud offering is free for any open source project.

At ownCloud we have been using Drone for a long time.

ocssvc: better names for properties: ocs.go

Perhaps some of the names could be better and self-explanatory, e.g.:

type CapabilitiesDav struct {
	Chunking string `json:"chunking" xml:"chunking"`
}
 

ChunkingVersion instead of Chunking?

Otherwise it's going to be an endless guessing game in the source code to figure out what it means.
 

grpc: add uuid_manager to authsvc

Some storage providers and other services need a non-reassignable globally unique identifier for a user.

I propose to make adding uuids to the user optional, by

  • adding a uuid_manager to the authsvc
  • adding a uuid attribute to the user.User struct

See also owncloud-archive/nexus#2 (comment) for the thought process behind this.
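To make the proposal more concrete, here is a rough sketch of what an optional uuid manager could look like. The UUIDManager interface, its method name and the in-memory implementation are all hypothetical, and User is a reduced stand-in for user.User. A real manager would have to persist the mapping, otherwise the identifiers would not survive a restart.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// User is a reduced stand-in for user.User with the proposed field.
type User struct {
	Username string
	UUID     string // stays empty when no uuid manager is configured
}

// UUIDManager is a hypothetical interface for the proposed uuid_manager.
type UUIDManager interface {
	GetUUID(username string) (string, error)
}

// memoryUUIDManager keeps the mapping in memory only (illustration).
type memoryUUIDManager struct {
	mu    sync.Mutex
	uuids map[string]string
}

// newUUIDv4 builds a random (version 4) UUID from crypto/rand.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// GetUUID returns the stored uuid for a user, creating one on first use.
func (m *memoryUUIDManager) GetUUID(username string) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if id, ok := m.uuids[username]; ok {
		return id, nil
	}
	id, err := newUUIDv4()
	if err != nil {
		return "", err
	}
	m.uuids[username] = id
	return id, nil
}

func main() {
	var mgr UUIDManager = &memoryUUIDManager{uuids: map[string]string{}}
	id, err := mgr.GetUUID("einstein")
	if err != nil {
		panic(err)
	}
	u := User{Username: "einstein", UUID: id}
	fmt.Println(u.Username, len(u.UUID))
}
```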

http: fix prometheus metrics collector

As reported by the staticcheck:

[gonzalhu@labradorbox reva]$ staticcheck ./...
cmd/revad/httpserver/httpserver.go:208:25: prometheus.InstrumentHandler is deprecated: InstrumentHandler has several issues. Use the tooling provided in package promhttp instead. The issues are the following: (1) It uses Summaries rather than Histograms. Summaries are not useful if aggregation across multiple instances is required. (2) It uses microseconds as unit, which is deprecated and should be replaced by seconds. (3) The size of the request is calculated in a separate goroutine. Since this calculator requires access to the request header, it creates a race with any writes to the header performed during request handling.  httputil.ReverseProxy is a prominent example for a handler performing such writes. (4) It has additional issues with HTTP/2, cf. https://github.com/prometheus/client_golang/issues/272.  (SA1019)

Config generation: driver "local" not found

reva gen config by default writes:

[grpc.services.usershareprovidersvc]
driver = "local"

The local driver does not exist, so this produces an erroneous configuration. Setting the flag dd to one of the suggested options (local || owncloud) does not work either. We'd need to set it to memory, as it is the only usershare manager currently available.

ocdav: correctly marshal to XML and JSON

From @refs:

LGTM. Needs to be followed up with a proper fix for this, that is, the OCS api JSON and XML encodings differ.

https://cloud.owncloud.com/ocs/v1.php/cloud/capabilities?format=json

...
"user": {
"send_mail": true
},
"resharing": true,
"group_sharing": true,
"auto_accept_share": true,
"share_with_group_members_only": true,
"share_with_membership_groups_only": true,
"can_share": true,
...
https://cloud.owncloud.com/ocs/v1.php/cloud/capabilities?format=xml

...
<user>
<send_mail>1</send_mail>
</user>
<resharing>1</resharing>
<group_sharing>1</group_sharing>
<auto_accept_share>1</auto_accept_share>
<share_with_group_members_only>1</share_with_group_members_only>
<share_with_membership_groups_only>1</share_with_membership_groups_only>
<can_share>1</can_share>
...

open questions after reading CONTRIBUTING.md

Hi!

I read the CONTRIBUTING.md and stumbled across two things:

  1. revad does not have the option -v as mentioned
    The flag changed a few months ago.
    I submitted a pull request for that -> #184
  2. You mention a CHANGELOG.md at Point 7 in the section "Providing Patches"
    The link seems dead and I wasn't able to find this file.

Felix

Scheduler / queue and worker

Discussing chunked upload with @labkode we iterated over the concept of a scheduler to handle asynchronous tasks.

When the last chunk of a file has been added, an assemble task is added to a queue. A worker will take the task, assemble the file and produce the final metadata in the storage.

We can use the queue to implement metadata (etag and mtime) propagation for storages that do not support it natively (all but eos and another owncloud mounted via webdav).

But we need to clarify how to deal with worker crashes. One possibility is to not only push an event/task to the queue but also create an event/task file on disk. The worker can modify the events and even move them to an activity log. The activity log is tightly related: we want to be able to show the list of activities for a user. We could use these events on disk for persistence. Any crashed worker can just be restarted and its task can be re-added.

In any case we want to keep the queue implementation simple. It will crash. It is more important to be able to aggregate events and be resilient.

http: ocdavsvc - add If-None-Match header on PUT

If-Match is implemented - https://github.com/cernbox/reva/blob/a6be57bfedc01eddf38148bb7015c2fb8a7a0058/services/httpsvc/ocdavsvc/put.go#L151

To ensure that creating a new file cannot overwrite an existing one, the If-None-Match header with the special value * should be implemented.

How does it work:
the client sends If-None-Match: * with a request which shall create a new file. The server checks that the file does not exist; otherwise 412 is returned.

Questions about REVA

The goal is to scale development

  • onboard new developers (golang is young, few good people available)
  • make learning reva easier

What is reva?

  • Reference implementation for CS3 API, or
  • A framework to implement CS3 services?

How to do that?

Further benefits

  • battle hardened frameworks that are used in production
  • avoid not invented here syndrome
  • stop wasting time writing existing functionality
  • services can progress independently
  • keep dependencies minimal per services
  • reva as the core framework can keep dependencies minimal

Drawbacks

  • CS3 api changes force an update of all services
    • protobuf is versioned
    • when reaching v1 we need to move away from monorepo anyway
  • quality of dependencies might be sub par
    • umm there are no tests in reva, yet

Alternatives

  • reuse reva services in separate repos
    • http services are instantiated with New, http.Handler interface can be used with any mux
    • grpc services are registered with a grpc.Server in New() as well

status.php fails with 401 because of auth service

When I try to hook up the desktop client, it requests status.php and gets a 401 back.

reva        | 12:49PM INF cmd/revad/svcs/httpsvcs/handlers/auth/auth.go:93 > core access token not set pid=1 pkg=auth trace=fff307f3-af2b-44dc-a145-0992d7239bab
reva        | fff307f3-af2b-44dc-a145-0992d7239bab
caddy       | 172.20.0.5 - - [06/Mar/2019:12:49:31 +0000] "GET /.well-known/openid-configuration HTTP/1.1" 200 1095
reva        | 12:49PM ERR cmd/revad/svcs/grpcsvcs/authsvc/authsvc.go:129 > authsvc: error authenticating user: could not verify bearer token: oidc: malformed jwt: square/go-jose: compact JWS format must have three parts pid=1 pkg=authsvc trace=fff307f3-af2b-44dc-a145-0992d7239bab
reva        | 12:49PM INF cmd/revad/svcs/grpcsvcs/interceptors/log/log.go:69 > GRPC unary call code=OK end="06/Mar/2019:12:49:31 +0000" from=tcp://127.0.0.1:50468 pid=1 pkg=grpc-interceptor-log start="06/Mar/2019:12:49:31 +0000" time_ns=8627224 trace=fff307f3-af2b-44dc-a145-0992d7239bab uri=/cs3.authv0alpha.AuthService/GenerateAccessToken user-agent=grpc-go/1.18.0
reva        | 12:49PM ERR cmd/revad/svcs/httpsvcs/handlers/auth/auth.go:118 > code=9 pid=1 pkg=auth trace=fff307f3-af2b-44dc-a145-0992d7239bab
reva        | 12:49PM ERR cmd/revad/svcs/httpsvcs/handlers/log/log.go:108 > HTTP call end="06/Mar/2019:12:49:31 +0000" host=172.20.0.3 method=GET pid=1 pkg=log proto=HTTP/1.1 size=0 start="06/Mar/2019:12:49:31 +0000" status=401 time_ns=11598101 trace=fff307f3-af2b-44dc-a145-0992d7239bab uri=/status.php
caddy       | 172.20.0.1 - - [06/Mar/2019:12:49:31 +0000] "GET /status.php HTTP/2.0" 401 0

Fix panic when disabling http server

6:21PM WRN grace/grace.go:172 > error reading pidfile pid=24835 pkg=grace
6:21PM INF grace/grace.go:181 > pidfile written to gateway.pid pid=24835 pkg=grace
6:21PM INF main.go:88 > running on 4 cpus pid=24835
6:21PM INF grpcserver/grpcserver.go:239 > chainning grpc unary interceptor log with priority 200 pid=24835 pkg=grpcserver
6:21PM INF grpcserver/grpcserver.go:268 > chainning grpc streaming interceptor log with priority 200 pid=24835 pkg=grpcserver
panic: runtime error: index out of range

goroutine 14 [running]:
main.main.func2(0xc000244a90, 0x1, 0x1, 0xc000244ac0, 0x1, 0x1, 0xc00008cc00, 0xc000300240)
        /home/gonzalhu/Development/reva/cmd/revad/main.go:126 +0x14d
created by main.main
        /home/gonzalhu/Development/reva/cmd/revad/main.go:125 +0x3dc

storage: provide search API

AFAICT the cs3 apis do not talk about search.
For now we could implement searching in file metadata and content based on elasticsearch. However, I think we need a search api in cs3 as well, at least optionally, because the storage itself might have search capabilities.

@labkode what are your thoughts on search?

also see https://github.com/owncloud/nexus/issues/17
