ckanext-s3filestore's People

Contributors

amercader, brew, dumyan, goranmaxim, mbocevski, orihoch, tino097, visar, zoranpandovski

ckanext-s3filestore's Issues

Error while installing using pip

I am using CKAN 2.6.0 on Ubuntu 14.04.

If I execute pip install ckanext-s3filestore, it exits with this error.

I had already done the previous steps, which included activating the virtualenv.

  File "/usr/lib/ckan/default/lib/python2.7/site.py", line 703, in <module>
    main()
  File "/usr/lib/ckan/default/lib/python2.7/site.py", line 683, in main
    paths_in_sys = addsitepackages(paths_in_sys)
  File "/usr/lib/ckan/default/lib/python2.7/site.py", line 282, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/usr/lib/ckan/default/lib/python2.7/site.py", line 204, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/usr/lib/ckan/default/lib/python2.7/site.py", line 173, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
KeyError: 'ckanext'

Using ListAllMyBuckets prevents tightening the policies

It looks like the plugin uses the ListAllMyBuckets action.
I don't see a valid reason for it to do so, since the bucket name is already provided in the plugin configuration.
As you know, ListAllMyBuckets only accepts a wildcard as its resource argument, which means every bucket name in the account can be seen.

Any chance to fix that?
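Once ListAllMyBuckets is dropped, the policy could be scoped to the single configured bucket. This is only an illustrative sketch; the bucket name is a placeholder and the exact action list is an assumption about what the plugin needs:

```python
import json

# Hypothetical minimal IAM policy: scope every action to one bucket instead of
# granting ListAllMyBuckets, which requires a "*" resource.
BUCKET = "my-ckan-bucket"  # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Bucket-level actions operate on the bucket ARN itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::%s" % BUCKET,
        },
        {
            # Object-level actions operate on keys inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::%s/*" % BUCKET,
        },
    ],
}

print(json.dumps(policy, indent=2))
```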

AttributeError: 'S3ResourceUploader' object has no attribute 'old_filename'

Background

I'm seeing a range of simple exceptions coming out of datahub.io (running CKAN 2.4 with this extension for file storage).

Issue

AttributeError: 'S3ResourceUploader' object has no attribute 'old_filename'
  File "raven/middleware.py", line 35, in __call__
    iterable = self.application(environ, start_response)
  File "webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "fanstatic/publisher.py", line 234, in __call__
    return request.get_response(self.app)
  File "webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "fanstatic/injector.py", line 54, in __call__
    response = request.get_response(self.app)
  File "webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "ckan/config/middleware.py", line 389, in inner
    result = application(environ, start_response)
  File "beaker/middleware.py", line 73, in __call__
    return self.app(environ, start_response)
  File "beaker/middleware.py", line 155, in __call__
    return self.wrap_app(environ, session_start_response)
  File "routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "pylons/wsgiapp.py", line 125, in __call__
    response = self.dispatch(controller, environ, start_response)
  File "pylons/wsgiapp.py", line 324, in dispatch
    return controller(environ, start_response)
  File "ckan/lib/base.py", line 338, in __call__
    res = WSGIController.__call__(self, environ, start_response)
  File "pylons/controllers/core.py", line 221, in __call__
    response = self._dispatch_call()
  File "pylons/controllers/core.py", line 172, in _dispatch_call
    response = self._inspect_call(func)
  File "pylons/controllers/core.py", line 107, in _inspect_call
    result = self._perform_call(func, args)
  File "ckan/controllers/package.py", line 600, in resource_edit
    get_action('resource_update')(context, data)
  File "ckan/logic/__init__.py", line 429, in wrapped
    result = _action(context, data_dict, **kw)
  File "ckan/logic/action/update.py", line 164, in resource_update
    upload.upload(id, uploader.get_max_resource_size())
  File "{__PATH__}/ckanext-s3filestore/ckanext/s3filestore/uploader.py", line 246, in upload
    filepath = self.get_path(id, self.old_filename)
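A defensive sketch of one possible fix (not the project's actual patch): fall back gracefully when `old_filename` was never set on the uploader instance, e.g. for resources that never had a previous file. The `get_path` body here is a stand-in, not the extension's real implementation:

```python
class S3ResourceUploader(object):
    """Stripped-down stand-in to show the getattr() fallback."""

    def upload(self, id, max_size=10):
        # getattr with a default avoids the AttributeError when the
        # attribute was never assigned in __init__.
        old_filename = getattr(self, 'old_filename', None)
        if old_filename is not None:
            filepath = self.get_path(id, old_filename)
            # ... delete or replace the previous S3 object here ...
        # ... continue with the normal upload ...

    def get_path(self, id, filename):
        # Stand-in key layout for illustration only.
        return 'resources/%s/%s' % (id, filename)
```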

'Access to bucket denied' error when linking MinIO

Hi, I am trying to link my personal MinIO instance through this extension.
I had nowhere else to ask, so I'm posting it here.

I tried to test against the public MinIO playground first, before linking it with my private MinIO.

minio url : http://play.min.io/

And I made a bucket.

bucket name : leaguetest

I changed the bucket to public:

[root@m01 minio]# ./mc policy set public play/leaguetest/
Access permission for `play/leaguetest/` is set to `public`
[root@m01 minio]# ./mc policy get play/leaguetest/
Access permission for `play/leaguetest/` is `public`
[root@m01 minio]#
[root@m01 minio]#
[root@m01 minio]# ./mc policy list play/leaguetest/
leaguetest/* => readwrite

The .ini file is set as follows:

ckanext.s3filestore.aws_access_key_id = Q3AM3UQ867SPQQA43P2F
ckanext.s3filestore.aws_secret_access_key = zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
ckanext.s3filestore.aws_bucket_name = leaguetest/
ckanext.s3filestore.host_name = http://play.min.io/
ckanext.s3filestore.region_name= us-east-1
ckanext.s3filestore.signature_version = s3v4

But I got the following error.

Traceback (most recent call last):
  File "/usr/local/bin/ckan-paster", line 8, in <module>
    sys.exit(run())
  File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 357, in command
    self._load_config(cmd!='upgrade')
  File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 330, in _load_config
    self.site_user = load_config(self.options.config, load_site_user)
  File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 237, in load_config
    load_environment(conf.global_conf, conf.local_conf)
  File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 112, in load_environment
    p.load_all()
  File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 140, in load_all
    load(*plugins)
  File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 168, in load
    plugins_update()
  File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 122, in plugins_update
    environment.update_config()
  File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 288, in update_config
    plugin.configure(config)
  File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/plugin.py", line 38, in configure
    ckanext.s3filestore.uploader.BaseS3Uploader().get_s3_bucket(
  File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/uploader.py", line 29, in __init__
    self.bucket = self.get_s3_bucket(self.bucket_name)
  File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/uploader.py", line 59, in get_s3_bucket
    'Access to bucket {0} denied'.format(bucket_name))
ckanext.s3filestore.uploader.S3FileStoreException: Access to bucket leaguetest denied

Uploading with the MinIO client itself works, as you can see:

./mc cp nohup.out play/leaguetest/
nohup.out:       788 B / 788 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 6.01 KiB/s 0s

How do I set a permission policy so the extension can access my MinIO bucket?
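One likely culprit, though I can't be certain, is the trailing slash in `aws_bucket_name = leaguetest/`: slashes are valid in `mc` paths but not in S3 bucket names, so the extension is probably asking for a literal bucket named `leaguetest/`. A minimal normalization sketch (`normalize_bucket_name` is a hypothetical helper, not part of the extension):

```python
def normalize_bucket_name(name):
    """Strip whitespace and a trailing slash that is valid in `mc` paths
    but not in S3/MinIO bucket names."""
    return name.strip().rstrip('/')

print(normalize_bucket_name('leaguetest/'))  # prints "leaguetest"
```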

Does not work with requests served by Flask

The file upload object must be different from the one used in Pylons requests:

ckan-dev_1    |   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
ckan-dev_1    |     rv = self.dispatch_request()
ckan-dev_1    |   File "/usr/lib/python2.7/site-packages/flask_debugtoolbar/__init__.py", line 125, in dispatch_request
ckan-dev_1    |     return view_func(**req.view_args)
ckan-dev_1    |   File "/usr/lib/python2.7/site-packages/flask/views.py", line 84, in view
ckan-dev_1    |     return self.dispatch_request(*args, **kwargs)
ckan-dev_1    |   File "/usr/lib/python2.7/site-packages/flask/views.py", line 149, in dispatch_request
ckan-dev_1    |     return meth(*args, **kwargs)
ckan-dev_1    |   File "/srv/app/src/ckan/ckan/views/admin.py", line 124, in post
ckan-dev_1    |     }, data_dict)
ckan-dev_1    |   File "/srv/app/src/ckan/ckan/logic/__init__.py", line 464, in wrapped
ckan-dev_1    |     result = _action(context, data_dict, **kw)
ckan-dev_1    |   File "/srv/app/src_extensions/ckanext-lacounts/ckanext/lacounts/actions.py", line 16, in config_option_update
ckan-dev_1    |     'featured_image_upload', 'clear_featured_image_upload')
ckan-dev_1    |   File "/srv/app/src_extensions/ckanext-s3filestore/ckanext/s3filestore/uploader.py", line 182, in update_data_dict
ckan-dev_1    |     self.upload_file = self.upload_field_storage.file
ckan-dev_1    |   File "/usr/lib/python2.7/site-packages/werkzeug/datastructures.py", line 2745, in __getattr__
ckan-dev_1    |     return getattr(self.stream, name)
ckan-dev_1    | AttributeError: SpooledTemporaryFile instance has no attribute 'file'

As of CKAN 2.8 this affects uploads in the admin interface (e.g. the logo), but on 2.9 it will affect all uploads.
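One possible shim, assuming the fix belongs where the traceback points (`update_data_dict` grabbing the underlying file): werkzeug's `FileStorage` (Flask requests) exposes the data as `.stream`, while `cgi.FieldStorage` (Pylons requests) exposes it as `.file`. A sketch:

```python
def get_underlying_file(upload_field_storage):
    """Return the raw file object from either request style.

    werkzeug FileStorage proxies unknown attributes to its stream, and
    SpooledTemporaryFile has no `.file`, hence the AttributeError above;
    checking for `.stream` first sidesteps the proxying entirely.
    """
    if hasattr(upload_field_storage, 'stream'):
        return upload_field_storage.stream
    return upload_field_storage.file
```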

Extension still requires ckan.storage_path to be set up and be writable

This probably needs fixing in CKAN core, but I'm flagging it here for now. Because of various direct calls to the uploader.get_storage_path() method around CKAN, you still need to provide a ckan.storage_path config option even if you want to use S3 exclusively.

Eg:

  • In middleware.py, a storage folder is created under the main uploads dir
  • In helpers.py, uploads_enabled is used to decide whether to show the Upload button in the UI

There might be others. It would be good to refactor those call sites to use get_uploader (or similar) instead, so they take extensions into account.
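Until those call sites are refactored, the practical workaround is simply to keep a writable local path configured alongside the S3 settings (the path below is only an example):

```ini
# Still needed even when using S3 exclusively, because core CKAN calls
# uploader.get_storage_path() directly in several places.
ckan.storage_path = /var/lib/ckan/default
```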

Add support for CloudFront Signed URLs

CloudFront, AWS's CDN, improves the user experience.

It can also be deployed to help control bandwidth cost and stop scrapers/bots from abusing downloads.

By using CloudFront Signed URLs, we can add code to render signed URLs that expire after a given time (e.g. one hour).

This way, users can still download files without requiring a login and have the additional benefit of low-latency access, while minimizing scraper/bot abuse.
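The canned-policy half of this can be sketched with the stdlib alone; the actual RSA-SHA1 signing step with the CloudFront key pair (e.g. via botocore's CloudFrontSigner) is deliberately omitted, so treat this as a fragment of the full flow, not a working signer:

```python
import json
import time

def canned_policy(url, expires_in=3600):
    """Build the CloudFront "canned policy" JSON for a signed URL that
    expires `expires_in` seconds from now. The policy must then be signed
    with the CloudFront key pair's private key and appended to the URL
    along with the key-pair id."""
    expires = int(time.time()) + expires_in
    return json.dumps({
        "Statement": [{
            "Resource": url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
        }],
    }, separators=(',', ':'))  # compact form, as CloudFront expects
```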

Tests run slow with httpretty 0.8.3 installed

This extension doesn't use httpretty directly (it's pulled in by moto, the S3 mocking library), but the version used by CKAN seems to cause the ckanext-s3filestore tests to run very slowly (20+ minutes for six tests). Downgrading to the version previously used by CKAN, 0.6.2, makes the tests fast again. Upgrading httpretty to the most recent version, 0.8.12, causes other errors.

Need more config for accessing AWS S3

Hello. I am a user running CKAN 2.8.1 on Ubuntu 16.04.
To use S3 as the filestore for CKAN, I am trying this plugin, s3filestore.
But I found an error when accessing S3 via boto.
I tested a change and confirmed it works; see PR #21.

So I want to fix it with a pull request. Please let me know the result of the PR as soon as possible.
Thank you!

OSError: [Errno 13] Permission denied: 'data'

Background

I'm seeing a range of simple exceptions coming out of datahub.io (running CKAN 2.4 with this extension for file storage).

Issue

OSError: [Errno 13] Permission denied: 'data'
  File "raven/middleware.py", line 35, in __call__
    iterable = self.application(environ, start_response)
  File "webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "fanstatic/publisher.py", line 234, in __call__
    return request.get_response(self.app)
  File "webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "fanstatic/injector.py", line 39, in __call__
    return request.get_response(self.app)
  File "webob/request.py", line 1053, in get_response
    application, catch_exc_info=False)
  File "webob/request.py", line 1022, in call_application
    app_iter = application(self.environ, start_response)
  File "ckan/config/middleware.py", line 389, in inner
    result = application(environ, start_response)
  File "beaker/middleware.py", line 73, in __call__
    return self.app(environ, start_response)
  File "beaker/middleware.py", line 155, in __call__
    return self.wrap_app(environ, session_start_response)
  File "routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "pylons/wsgiapp.py", line 125, in __call__
    response = self.dispatch(controller, environ, start_response)
  File "pylons/wsgiapp.py", line 324, in dispatch
    return controller(environ, start_response)
  File "ckan/lib/base.py", line 338, in __call__
    res = WSGIController.__call__(self, environ, start_response)
  File "pylons/controllers/core.py", line 221, in __call__
    response = self._dispatch_call()
  File "pylons/controllers/core.py", line 172, in _dispatch_call
    response = self._inspect_call(func)
  File "pylons/controllers/core.py", line 107, in _inspect_call
    result = self._perform_call(func, args)
  File "ckan/controllers/storage.py", line 168, in file
    exists = self.ofs.exists(BUCKET, label)
  File "ckan/controllers/storage.py", line 115, in ofs
    StorageController._ofs_impl = get_ofs()
  File "ckan/controllers/storage.py", line 82, in get_ofs
    ofs = get_impl(storage_backend)(**kw)
  File "ofs/local/pairtreestore.py", line 30, in __init__
    self._open_store()
  File "ofs/local/pairtreestore.py", line 34, in _open_store
    self._store = PairtreeStorageClient(self.uri_base, self.storage_dir, shorty_length=self.shorty_length, hashing_type=self.hashing_type)
  File "pairtree/pairtree_client.py", line 91, in __init__
    self._init_store()
  File "pairtree/pairtree_client.py", line 243, in _init_store
    os.mkdir(self.store_dir)
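The traceback ends in the pairtree client failing to `os.mkdir()` inside the storage directory. A pre-flight check along these lines could surface the misconfiguration earlier; `storage_path_writable` is a hypothetical helper, not part of CKAN or this extension:

```python
import os
import tempfile

def storage_path_writable(path):
    """Return True if the local storage path exists and is writable
    (the pairtree store tries to create subdirectories inside it)."""
    return os.path.isdir(path) and os.access(path, os.W_OK)

# Example: a freshly created temp dir is writable by its owner.
print(storage_path_writable(tempfile.mkdtemp()))  # prints "True"
```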

Add Requester Pays option

Data publishers should not be penalized for success.

For data publishers that want to control their bandwidth costs, but still want to give "fast lane" access for high-volume, commercial users, S3's Requester Pays feature is a great option.

When this option is selected at publishing time, the file would also be published to a Requester Pays bucket.

Since this is an option for advanced users, I think there's no need to go beyond exposing the file's download link in the Requester Pays bucket, with instructions to use the s3cmd CLI or a tool like S3 Browser.

Note this is how Cornell university is distributing bulk data from arxiv.org - https://arxiv.org/help/bulk_data_s3
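On the request side, downloads from a Requester Pays bucket must explicitly acknowledge the charges via S3's `RequestPayer` parameter. A sketch that only builds the keyword arguments a boto3 `get_object` call would take (the helper name is an assumption for illustration):

```python
def requester_pays_get_kwargs(bucket, key):
    """Keyword arguments for boto3's s3.get_object() against a Requester
    Pays bucket; without RequestPayer the request is rejected."""
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}
```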

Uploader does not set file size

The standard CKAN filestore uploader (ckan/lib/uploader.py, ResourceUpload.__init__) detects the size of the uploaded file and adds it to the resource metadata. The S3 uploader does not.
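A sketch of how the S3 uploader could measure the upload before sending it, assuming the stream is seekable; this mirrors the seek-to-end approach rather than reproducing core CKAN's exact code:

```python
import io
import os

def get_upload_size(file_obj):
    """Seek to the end to measure the upload in bytes, then rewind so the
    stream can still be handed to S3 afterwards."""
    file_obj.seek(0, os.SEEK_END)
    size = file_obj.tell()
    file_obj.seek(0)
    return size

print(get_upload_size(io.BytesIO(b"hello")))  # prints "5"
```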

Possible to use own endpoint?

Hi,
As I see it, the extension is written to use Amazon S3. Is it possible to use another, custom endpoint? We are using HCP (Hitachi Content Platform) with S3, and I have read some documentation (http://boto.cloudhackers.com/en/latest/boto_config_tut.html) that suggests it's possible to use boto with a custom endpoint and credentials.

My question is whether it's currently possible, and if not, whether it's something you are planning to support. If I were to implement such a connection myself, is there anything I should be aware of?

I know this is not an "issue" per se, so if I'm posting in the wrong forum, let me know :)
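For what it's worth, the `host_name` option that appears in the MinIO report elsewhere in this tracker points the extension at a custom S3-compatible endpoint; whether it covers HCP depends on your version of the extension, and the hostname below is a placeholder:

```ini
ckanext.s3filestore.host_name = https://hcp.example.internal
ckanext.s3filestore.region_name = us-east-1
ckanext.s3filestore.signature_version = s3v4
```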

Error getting resource

The resource uploads to AWS successfully, but the download link is broken somehow.

Here the apache2 log error:

[Wed Mar 23 15:42:39.168997 2016] [:error] [pid 22405:tid 140562148165376] 2016-03-23 15:42:39,168 INFO  [ckan.lib.base]  /dataset/bffdfabb-21e3-4e4e-ba70-416b6c668cef/resource/bca4a3ec-c90e-4e74-aa11-f5d959b9853b/download/owls.jpg render time 0.310 seconds
[Wed Mar 23 15:42:39.170157 2016] [:error] [pid 22405:tid 140562148165376] [remote 127.0.0.1:18995] mod_wsgi (pid=22405): Exception occurred processing WSGI script '/etc/ckan/default/apache.wsgi'.
[Wed Mar 23 15:42:39.170194 2016] [:error] [pid 22405:tid 140562148165376] [remote 127.0.0.1:18995] TypeError: expected byte string object for header name, value of type unicode found

ckan == 2.5.1

The problem occurs on the server only; locally, in development mode, everything works fine.

error: SignatureDoesNotMatch on text preview

Reproduction steps

  • Upload a .txt file
  • Add a text preview to the resource views

Expected

  • Text preview of the resource is shown

Actual

SignatureDoesNotMatch
The request signature we calculated does not match the signature you provided. Check your key and signing

Notes

  • Clicking the resource download link works and downloads the resource; only the text preview fails
  • It seems to be related to how the text preview requests the resource

Don't store data locally

Do you think it would be a better idea to store data only on S3?
As far as I understand, right now the data is just copied to S3 and my dataset still serves the locally stored copy.

I need to protect some of my datasets with S3 encryption, and I can work on a PR if you'd like.
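If encryption at rest is the goal, a sketch of the `ExtraArgs` a boto3 upload call (e.g. `upload_fileobj`) would take; how the extension might expose the KMS key choice is an assumption, and the helper name is hypothetical:

```python
def encrypted_upload_args(kms_key_id=None):
    """ExtraArgs so S3 objects are encrypted at rest. With a KMS key id,
    use SSE-KMS; otherwise fall back to SSE-S3 (AES256)."""
    if kms_key_id:
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    return {"ServerSideEncryption": "AES256"}
```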

Data Explorer: HTTPError: HTTP Error 401: Unauthorized

I can only imagine the Data Explorer can't access the dataset because it's stored on S3. Any suggestions on where I can get started integrating the two? At the moment, I'm getting the following error in the Data Explorer view for TSV files:

This resource view is not available at the moment. Click here for more information.

Could not load view: DataProxy returned an error (Data transformation failed. HTTPError: HTTP Error 401: Unauthorized)

Support for IAM roles?

Can this extension rely on IAM roles instead of requiring a secret key? We're running on AWS EC2, so in theory it should be possible to access S3 without needing any keys.
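A sketch of how the extension could make keys optional: when none are configured, pass no credentials so boto3 falls back to its default credential chain (environment variables, shared config, EC2/ECS instance role). `s3_client_kwargs` is a hypothetical helper; the extension's actual config handling may differ:

```python
def s3_client_kwargs(access_key=None, secret_key=None, region=None):
    """Build kwargs for boto3.client('s3'). Omitting the key arguments
    lets boto3 resolve credentials from its default chain, which is how
    IAM instance roles are picked up."""
    kwargs = {}
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
    if region:
        kwargs["region_name"] = region
    return kwargs
```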
