This repository is no longer maintained.
Please use one of these forks instead:
Use Amazon S3 as a filestore for CKAN
License: GNU Affero General Public License v3.0
I am using CKAN version 2.6.0 on Ubuntu 14.04.
When I run pip install ckanext-s3filestore,
it exits with this error.
I had already done the previous step of activating the virtualenv.
File "/usr/lib/ckan/default/lib/python2.7/site.py", line 703, in <module>
main()
File "/usr/lib/ckan/default/lib/python2.7/site.py", line 683, in main
paths_in_sys = addsitepackages(paths_in_sys)
File "/usr/lib/ckan/default/lib/python2.7/site.py", line 282, in addsitepackages
addsitedir(sitedir, known_paths)
File "/usr/lib/ckan/default/lib/python2.7/site.py", line 204, in addsitedir
addpackage(sitedir, name, known_paths)
File "/usr/lib/ckan/default/lib/python2.7/site.py", line 173, in addpackage
exec(line)
File "<string>", line 1, in <module>
KeyError: 'ckanext'
It looks like the plugin uses the ListAllMyBuckets action.
I don't see a valid reason why it would do that, since the bucket name is already provided in the plugin configuration.
As you know, ListAllMyBuckets only accepts a wildcard as its resource, which means all bucket names in the account become visible.
Any chance of fixing that?
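A minimal sketch of a bucket check that avoids ListAllMyBuckets, assuming boto3: `head_bucket` only needs `s3:ListBucket` on the configured bucket, so the IAM policy can be scoped to that one bucket. `check_bucket_access` and `BucketAccessError` are hypothetical names, not the extension's API, and the client is passed in so the check can be exercised without AWS.

```python
class BucketAccessError(Exception):
    """Hypothetical error type for a bucket that cannot be used."""


def check_bucket_access(client, bucket_name):
    """Check access to one bucket without needing s3:ListAllMyBuckets.

    HeadBucket only requires s3:ListBucket on arn:aws:s3:::<bucket>,
    so no wildcard resource is needed in the IAM policy.
    """
    try:
        client.head_bucket(Bucket=bucket_name)
    except Exception as exc:
        # botocore's ClientError exposes the error code in exc.response.
        # Duck-typed here so the sketch runs without botocore installed;
        # real code would catch botocore.exceptions.ClientError instead.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "403":
            raise BucketAccessError("Access to bucket %s denied" % bucket_name)
        if code == "404":
            raise BucketAccessError("Bucket %s does not exist" % bucket_name)
        raise
    return True
```

With a real boto3 client this would be called as `check_bucket_access(boto3.client("s3"), bucket_name)` from the plugin's `configure` step.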
I'm seeing a range of simple exceptions coming out of datahub.io (running CKAN 2.4 with this extension for file storage).
AttributeError: 'S3ResourceUploader' object has no attribute 'old_filename'
File "raven/middleware.py", line 35, in __call__
iterable = self.application(environ, start_response)
File "webob/dec.py", line 147, in __call__
resp = self.call_func(req, *args, **self.kwargs)
File "webob/dec.py", line 208, in call_func
return self.func(req, *args, **kwargs)
File "fanstatic/publisher.py", line 234, in __call__
return request.get_response(self.app)
File "webob/request.py", line 1053, in get_response
application, catch_exc_info=False)
File "webob/request.py", line 1022, in call_application
app_iter = application(self.environ, start_response)
File "webob/dec.py", line 147, in __call__
resp = self.call_func(req, *args, **self.kwargs)
File "webob/dec.py", line 208, in call_func
return self.func(req, *args, **kwargs)
File "fanstatic/injector.py", line 54, in __call__
response = request.get_response(self.app)
File "webob/request.py", line 1053, in get_response
application, catch_exc_info=False)
File "webob/request.py", line 1022, in call_application
app_iter = application(self.environ, start_response)
File "ckan/config/middleware.py", line 389, in inner
result = application(environ, start_response)
File "beaker/middleware.py", line 73, in __call__
return self.app(environ, start_response)
File "beaker/middleware.py", line 155, in __call__
return self.wrap_app(environ, session_start_response)
File "routes/middleware.py", line 131, in __call__
response = self.app(environ, start_response)
File "pylons/wsgiapp.py", line 125, in __call__
response = self.dispatch(controller, environ, start_response)
File "pylons/wsgiapp.py", line 324, in dispatch
return controller(environ, start_response)
File "ckan/lib/base.py", line 338, in __call__
res = WSGIController.__call__(self, environ, start_response)
File "pylons/controllers/core.py", line 221, in __call__
response = self._dispatch_call()
File "pylons/controllers/core.py", line 172, in _dispatch_call
response = self._inspect_call(func)
File "pylons/controllers/core.py", line 107, in _inspect_call
result = self._perform_call(func, args)
File "ckan/controllers/package.py", line 600, in resource_edit
get_action('resource_update')(context, data)
File "ckan/logic/__init__.py", line 429, in wrapped
result = _action(context, data_dict, **kw)
File "ckan/logic/action/update.py", line 164, in resource_update
upload.upload(id, uploader.get_max_resource_size())
File "{__PATH__}/ckanext-s3filestore/ckanext/s3filestore/uploader.py", line 246, in upload
filepath = self.get_path(id, self.old_filename)
Hi, I am trying to connect my personal MinIO server through this extension.
I have nowhere else to ask, so I am posting here.
I tried testing against the public MinIO server first, before connecting the private one.
MinIO URL: http://play.min.io/
I created a bucket.
Bucket name: leaguetest
Then I made the bucket public:
[root@m01 minio]# ./mc policy set public play/leaguetest/
Access permission for `play/leaguetest/` is set to `public`
[root@m01 minio]# ./mc policy get play/leaguetest/
Access permission for `play/leaguetest/` is `public`
[root@m01 minio]# ./mc policy list play/leaguetest/
leaguetest/* => readwrite
The .ini file is set as follows:
ckanext.s3filestore.aws_access_key_id = Q3AM3UQ867SPQQA43P2F
ckanext.s3filestore.aws_secret_access_key = zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
ckanext.s3filestore.aws_bucket_name = leaguetest/
ckanext.s3filestore.host_name = http://play.min.io/
ckanext.s3filestore.region_name= us-east-1
ckanext.s3filestore.signature_version = s3v4
But I got the following error.
Traceback (most recent call last):
File "/usr/local/bin/ckan-paster", line 8, in <module>
sys.exit(run())
File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
result = self.command()
File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 357, in command
self._load_config(cmd!='upgrade')
File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 330, in _load_config
self.site_user = load_config(self.options.config, load_site_user)
File "/usr/lib/ckan/venv/src/ckan/ckan/lib/cli.py", line 237, in load_config
load_environment(conf.global_conf, conf.local_conf)
File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 112, in load_environment
p.load_all()
File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 140, in load_all
load(*plugins)
File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 168, in load
plugins_update()
File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 122, in plugins_update
environment.update_config()
File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 288, in update_config
plugin.configure(config)
File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/plugin.py", line 38, in configure
ckanext.s3filestore.uploader.BaseS3Uploader().get_s3_bucket(
File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/uploader.py", line 29, in __init__
self.bucket = self.get_s3_bucket(self.bucket_name)
File "/usr/lib/ckan/venv/lib/python2.7/site-packages/ckanext/s3filestore/uploader.py", line 59, in get_s3_bucket
'Access to bucket {0} denied'.format(bucket_name))
ckanext.s3filestore.uploader.S3FileStoreException: Access to bucket leaguetest denied
With the MinIO client, the upload itself works, as shown below.
./mc cp nohup.out play/leaguetest/
nohup.out: 788 B / 788 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 6.01 KiB/s 0s
How do I set a permission policy with my MinIO client?
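One thing worth ruling out in the configuration above: S3 bucket names cannot contain `/`, so `leaguetest/` is not a valid bucket name as written. A minimal sketch that normalizes and checks the bucket and host settings before they reach boto; the helper name is hypothetical, and the naming rule used is the basic S3 one (3 to 63 characters of lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import re
from urllib.parse import urlparse

# Basic S3 bucket-name rule: 3-63 chars of lowercase letters, digits, dots
# and hyphens, beginning and ending with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def normalize_s3_settings(bucket_name, host_name):
    """Hypothetical helper: return (bucket, host) cleaned up, or raise ValueError."""
    bucket = bucket_name.strip().rstrip("/")
    if not BUCKET_RE.match(bucket):
        raise ValueError("invalid bucket name: %r" % bucket_name)
    host = host_name.strip().rstrip("/")
    parsed = urlparse(host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("host_name must look like http(s)://host[:port]: %r" % host_name)
    return bucket, host
```

Applied to the settings above, this turns `leaguetest/` and `http://play.min.io/` into `leaguetest` and `http://play.min.io`; whether that alone resolves the access error is not confirmed.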
Is this a feature or a bug? Do I need to configure the plugin so that datasets are automatically deleted from S3 once they are deleted from CKAN?
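If removal from S3 is wanted, a minimal sketch with boto3 could delete every object under a resource's key prefix. The prefix layout is an assumption (it should be checked against the uploader's actual key scheme), and `delete_prefix` is hypothetical, not something the extension provides. Each `list_objects_v2` page holds at most 1000 keys, which matches the per-request limit of `delete_objects`.

```python
def delete_prefix(client, bucket, prefix):
    """Hypothetical helper: delete all objects whose key starts with `prefix`.

    Returns the number of keys submitted for deletion.
    """
    paginator = client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        # A page has at most 1000 keys, the delete_objects request limit.
        if keys:
            client.delete_objects(
                Bucket=bucket, Delete={"Objects": keys, "Quiet": True}
            )
            deleted += len(keys)
    return deleted
```

Deleting a dataset would then mean calling this once per resource id, for example with a prefix such as `resources/<resource id>/` (assumed layout).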
Commit 1a4944c, by changing from a direct file download to a redirect, broke the fallback handling that would serve a file from the filesystem (ckanext.s3filestore.filesystem_download_fallback).
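The order of checks a redirect-based download would need to preserve can be sketched as a small decision function. This assumes the object's existence in S3 is checked first and the fallback option has already been parsed into a boolean; the function and its return values are hypothetical and only illustrate the control flow.

```python
import os


def choose_download(s3_has_object, fallback_enabled, local_path):
    """Hypothetical helper deciding how to serve a resource download.

    Returns ("redirect", None) to redirect to S3, ("file", path) to stream
    the local copy, or ("404", None) when neither source has the file.
    """
    if s3_has_object:
        return ("redirect", None)
    # Only consult the filesystem when the fallback option is switched on.
    if fallback_enabled and local_path and os.path.isfile(local_path):
        return ("file", local_path)
    return ("404", None)
```

The point is that the "file" branch must be reached before any redirect is issued; once the handler redirects unconditionally, the fallback can never run.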
The file upload object on Flask requests is different from the one on Pylons requests:
ckan-dev_1 | File "/usr/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
ckan-dev_1 | rv = self.dispatch_request()
ckan-dev_1 | File "/usr/lib/python2.7/site-packages/flask_debugtoolbar/__init__.py", line 125, in dispatch_request
ckan-dev_1 | return view_func(**req.view_args)
ckan-dev_1 | File "/usr/lib/python2.7/site-packages/flask/views.py", line 84, in view
ckan-dev_1 | return self.dispatch_request(*args, **kwargs)
ckan-dev_1 | File "/usr/lib/python2.7/site-packages/flask/views.py", line 149, in dispatch_request
ckan-dev_1 | return meth(*args, **kwargs)
ckan-dev_1 | File "/srv/app/src/ckan/ckan/views/admin.py", line 124, in post
ckan-dev_1 | }, data_dict)
ckan-dev_1 | File "/srv/app/src/ckan/ckan/logic/__init__.py", line 464, in wrapped
ckan-dev_1 | result = _action(context, data_dict, **kw)
ckan-dev_1 | File "/srv/app/src_extensions/ckanext-lacounts/ckanext/lacounts/actions.py", line 16, in config_option_update
ckan-dev_1 | 'featured_image_upload', 'clear_featured_image_upload')
ckan-dev_1 | File "/srv/app/src_extensions/ckanext-s3filestore/ckanext/s3filestore/uploader.py", line 182, in update_data_dict
ckan-dev_1 | self.upload_file = self.upload_field_storage.file
ckan-dev_1 | File "/usr/lib/python2.7/site-packages/werkzeug/datastructures.py", line 2745, in __getattr__
ckan-dev_1 | return getattr(self.stream, name)
ckan-dev_1 | AttributeError: SpooledTemporaryFile instance has no attribute 'file'
As of CKAN 2.8, this affects uploads in the admin interface (e.g. the logo), but in 2.9 it will affect all uploads.
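A duck-typed sketch of unwrapping both kinds of upload object, assuming the Flask side hands over a werkzeug `FileStorage` (whose file-like object is `.stream`) and the Pylons side a `cgi.FieldStorage` (which exposes `.file`). The helper name is hypothetical. The order matters: as the traceback shows, `FileStorage.__getattr__` forwards unknown attributes to the stream, so asking it for `.file` ends up on the `SpooledTemporaryFile`.

```python
def get_underlying_file(upload):
    """Hypothetical helper: return the raw file object behind an upload wrapper.

    werkzeug's FileStorage keeps the data in `.stream` and forwards unknown
    attributes to it, so `.file` fails there. Check `.stream` first, then
    fall back to the `.file` attribute of cgi.FieldStorage.
    """
    stream = getattr(upload, "stream", None)
    if stream is not None:
        return stream
    return upload.file
```

In `update_data_dict`, the failing line would then become `self.upload_file = get_underlying_file(self.upload_field_storage)`.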
This probably needs fixing in CKAN core, but I am flagging it here for now. Because of various direct calls to the uploader.get_storage_path() method around CKAN, you still need to provide a ckan.storage_path config option even if you want to use S3 exclusively.
For example:
- A storage folder is created in the main uploads directory.
- uploads_enabled: this is used to show the Upload button in the UI.
There might be others. It would be good to refactor those so that they use get_uploader or similar instead, and therefore take extensions into account.
CloudFront, AWS's CDN, would improve the user experience.
It can also be deployed to help control bandwidth costs and stop scrapers/bots from abusing downloads.
With CloudFront signed URLs, we could add code to render URLs that expire after a given time (e.g. one hour).
This way, users can still download files without requiring a login and have the additional benefit of low-latency access, while minimizing scraper/bot abuse.
This extension doesn't use httpretty directly (moto, the S3 mocking library, does), but it seems the version of httpretty used by CKAN is causing the ckanext-s3filestore tests to run very slowly (20+ minutes for six tests). Downgrading to the version previously used by CKAN, 0.6.2, makes the tests run fast again. Upgrading httpretty to the most recent version, 0.8.12, causes other errors.
This extension does not currently provide a way to migrate existing filestore resources into S3. There is code to do this at https://github.com/datagovsg/ckanext-s3-resources/blob/master/ckanext/datagovsg_s3_resources/commands.py which could potentially be copied and adapted.
This is particularly important because of #28; if the fallback is not working, it becomes all the more important to be able to migrate resources into S3.
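Such a migration mostly comes down to mapping each resource's local file to its S3 key. A sketch, assuming CKAN's default local layout (`<ckan.storage_path>/resources/<id[0:3]>/<id[3:6]>/<id[6:]>`) and an S3 key of the form `<aws_storage_path>/resources/<id>/<filename>`; the S3 side should be checked against the uploader's own `get_path` before relying on it, and the function name is hypothetical.

```python
import os
import posixpath


def migration_paths(ckan_storage_path, s3_storage_path, resource_id, filename):
    """Hypothetical helper: return (local_path, s3_key) for one uploaded resource.

    Both layouts are assumptions (see the lead-in), not confirmed behaviour.
    """
    local_path = os.path.join(ckan_storage_path, "resources",
                              resource_id[0:3], resource_id[3:6], resource_id[6:])
    # S3 keys always use '/', whatever the local OS separator is.
    s3_key = posixpath.join(s3_storage_path, "resources", resource_id, filename)
    return local_path, s3_key
```

The filename would typically be the last segment of the resource's `url`, and each pair could then be uploaded with boto3's `client.upload_file(local_path, bucket, s3_key)`.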
Hello. I am using CKAN 2.8.1 on Ubuntu 16.04.
To use S3 as the file storage for CKAN, I am trying to use the s3filestore plugin.
However, I get an error when accessing S3 through boto.
I tested a change and confirmed that it works; it is in PR #21.
I would like to get this fixed through that pull request. Please let me know the outcome of the PR as soon as possible.
Thank you!
I'm seeing a range of simple exceptions coming out of datahub.io (running CKAN 2.4 with this extension for file storage).
OSError: [Errno 13] Permission denied: 'data'
File "raven/middleware.py", line 35, in __call__
iterable = self.application(environ, start_response)
File "webob/dec.py", line 147, in __call__
resp = self.call_func(req, *args, **self.kwargs)
File "webob/dec.py", line 208, in call_func
return self.func(req, *args, **kwargs)
File "fanstatic/publisher.py", line 234, in __call__
return request.get_response(self.app)
File "webob/request.py", line 1053, in get_response
application, catch_exc_info=False)
File "webob/request.py", line 1022, in call_application
app_iter = application(self.environ, start_response)
File "webob/dec.py", line 147, in __call__
resp = self.call_func(req, *args, **self.kwargs)
File "webob/dec.py", line 208, in call_func
return self.func(req, *args, **kwargs)
File "fanstatic/injector.py", line 39, in __call__
return request.get_response(self.app)
File "webob/request.py", line 1053, in get_response
application, catch_exc_info=False)
File "webob/request.py", line 1022, in call_application
app_iter = application(self.environ, start_response)
File "ckan/config/middleware.py", line 389, in inner
result = application(environ, start_response)
File "beaker/middleware.py", line 73, in __call__
return self.app(environ, start_response)
File "beaker/middleware.py", line 155, in __call__
return self.wrap_app(environ, session_start_response)
File "routes/middleware.py", line 131, in __call__
response = self.app(environ, start_response)
File "pylons/wsgiapp.py", line 125, in __call__
response = self.dispatch(controller, environ, start_response)
File "pylons/wsgiapp.py", line 324, in dispatch
return controller(environ, start_response)
File "ckan/lib/base.py", line 338, in __call__
res = WSGIController.__call__(self, environ, start_response)
File "pylons/controllers/core.py", line 221, in __call__
response = self._dispatch_call()
File "pylons/controllers/core.py", line 172, in _dispatch_call
response = self._inspect_call(func)
File "pylons/controllers/core.py", line 107, in _inspect_call
result = self._perform_call(func, args)
File "ckan/controllers/storage.py", line 168, in file
exists = self.ofs.exists(BUCKET, label)
File "ckan/controllers/storage.py", line 115, in ofs
StorageController._ofs_impl = get_ofs()
File "ckan/controllers/storage.py", line 82, in get_ofs
ofs = get_impl(storage_backend)(**kw)
File "ofs/local/pairtreestore.py", line 30, in __init__
self._open_store()
File "ofs/local/pairtreestore.py", line 34, in _open_store
self._store = PairtreeStorageClient(self.uri_base, self.storage_dir, shorty_length=self.shorty_length, hashing_type=self.hashing_type)
File "pairtree/pairtree_client.py", line 91, in __init__
self._init_store()
File "pairtree/pairtree_client.py", line 243, in _init_store
os.mkdir(self.store_dir)
Data publishers should not be penalized for success.
For data publishers that want to control their bandwidth costs, but still want to give "fast lane" access for high-volume, commercial users, S3's Requester Pays feature is a great option.
When this option is selected while publishing a file, the file would also be published to a Requester Pays bucket.
Since this is an option for advanced users, I think there's no need to go beyond exposing the file's download link in the Requester Pays bucket, along with instructions to use the s3cmd CLI or a tool like s3browser.
Note that this is how Cornell University distributes bulk data from arxiv.org: https://arxiv.org/help/bulk_data_s3
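With boto3, turning a bucket into a Requester Pays bucket and downloading from it as a requester look roughly like this. It is a sketch only: the bucket and key names are placeholders and the client is passed in. `put_bucket_request_payment` and the `RequestPayer="requester"` parameter are standard S3 API options, and requester-pays downloads must be authenticated.

```python
def enable_requester_pays(client, bucket):
    # Bucket owner: make downloaders pay for requests and data transfer.
    client.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )


def download_as_requester(client, bucket, key, dest_path):
    # Downloader: must explicitly acknowledge the charge, otherwise S3
    # refuses requests for objects in a Requester Pays bucket.
    response = client.get_object(Bucket=bucket, Key=key, RequestPayer="requester")
    with open(dest_path, "wb") as out:
        # Stream the body in 1 MiB chunks instead of loading it all at once.
        for chunk in iter(lambda: response["Body"].read(1024 * 1024), b""):
            out.write(chunk)
```

For the CLI route mentioned above, the equivalent is passing the requester-pays flag of whichever tool is used.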
AWS S3 has native support for torrents.
http://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent.html
This should help data publishers distribute large datasets without running up their bandwidth bill, while actively engaging the community to give back and help with file distribution.
The standard CKAN filestore uploader (ckan/lib/uploader.py:ResourceUpload:__init__) detects the size of the uploaded file and adds it to the resource metadata. The S3 uploader does not.
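The size could be detected the same way for S3, by seeking to the end of the upload's file object before sending it. A minimal sketch, assuming a seekable file object; storing the result under `size` mirrors the core resource field, but the helper itself is hypothetical.

```python
import os


def measure_and_rewind(fileobj):
    """Hypothetical helper: size in bytes of a seekable file, left at offset 0."""
    fileobj.seek(0, os.SEEK_END)
    size = fileobj.tell()
    # Rewind so the subsequent S3 upload sends the whole file.
    fileobj.seek(0, os.SEEK_SET)
    return size
```

The S3 uploader could then set `data_dict["size"] = measure_and_rewind(self.upload_file)` before uploading.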
The ckanext.s3filestore.uploader.S3ResourceUploader.get_path function signature is not compatible with the default ckan.lib.uploader.ResourceUpload.get_path signature. Any code that tries to retrieve ckan.lib.uploader and call get_path on it will fail as a result.
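One way to make the signatures compatible is to make the extra argument optional, so that callers using the core `get_path(id)` form still work. A hypothetical sketch: the class name, the `filename` attribute default and the key layout are assumptions, not the extension's code.

```python
import posixpath


class CompatibleS3ResourceUploader(object):
    """Hypothetical uploader whose get_path(id) matches the core signature."""

    def __init__(self, storage_path, filename=None):
        self.storage_path = storage_path
        self.filename = filename  # set when a file is being uploaded

    def get_directory(self, id):
        return posixpath.join(self.storage_path, id)

    def get_path(self, id, filename=None):
        # Core CKAN calls get_path(id); the S3 variant also needs a filename,
        # so fall back to the one recorded on the uploader.
        if filename is None:
            filename = self.filename
        if filename is None:
            raise ValueError("no filename known for resource %s" % id)
        return posixpath.join(self.get_directory(id), filename)
```

Keeping `filename` as a keyword argument with a default preserves existing two-argument calls inside the extension.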
Hi,
As I see it, the extension is written to use Amazon S3; is it possible to use another, custom endpoint? We are using HCP (Hitachi Content Platform) with S3, and I have read some documentation (http://boto.cloudhackers.com/en/latest/boto_config_tut.html) suggesting that it's possible to use boto with a custom endpoint and credentials.
My question is whether this is currently possible, and if not, whether it's something you are planning to support. If I were to implement such a connection myself, is there anything I should be aware of?
I know this is not an "issue" per se, so if I'm posting in the wrong forum, let me know :)
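With boto3 (rather than the older boto described in the linked page), a custom S3-compatible endpoint is usually a matter of passing `endpoint_url`; many non-AWS stores also expect path-style addressing. A sketch that builds the client arguments separately so they can be inspected; the endpoint URL is a placeholder, the function names are hypothetical, and whether HCP needs path-style addressing is an assumption to verify against its documentation.

```python
def s3_client_kwargs(endpoint_url, region_name=None, path_style=True):
    """Hypothetical helper: arguments for boto3.client('s3', ...) on a custom endpoint."""
    kwargs = {"endpoint_url": endpoint_url.rstrip("/")}
    if region_name:
        kwargs["region_name"] = region_name
    if path_style:
        # Imported lazily so the kwargs can be built without botocore installed.
        from botocore.config import Config
        kwargs["config"] = Config(s3={"addressing_style": "path"})
    return kwargs


def make_client(endpoint_url, **options):
    # Hypothetical convenience wrapper around boto3.client.
    import boto3
    return boto3.client("s3", **s3_client_kwargs(endpoint_url, **options))
```

Credentials would be supplied in the usual boto3 ways (explicit keys or the default credential chain).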
The resource was successfully uploaded to AWS, but the link is broken somehow.
Here is the apache2 error log:
[Wed Mar 23 15:42:39.168997 2016] [:error] [pid 22405:tid 140562148165376] 2016-03-23 15:42:39,168 INFO [ckan.lib.base] /dataset/bffdfabb-21e3-4e4e-ba70-416b6c668cef/resource/bca4a3ec-c90e-4e74-aa11-f5d959b9853b/download/owls.jpg render time 0.310 seconds
[Wed Mar 23 15:42:39.170157 2016] [:error] [pid 22405:tid 140562148165376] [remote 127.0.0.1:18995] mod_wsgi (pid=22405): Exception occurred processing WSGI script '/etc/ckan/default/apache.wsgi'.
[Wed Mar 23 15:42:39.170194 2016] [:error] [pid 22405:tid 140562148165376] [remote 127.0.0.1:18995] TypeError: expected byte string object for header name, value of type unicode found
ckan == 2.5.1
The problem occurs on the server only; locally (in development mode) everything works fine.
Does this work with private datasets? That is, are URLs inaccessible if the dataset they belong to is private?
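If objects are stored with a private ACL, one common approach is to hand out short-lived presigned URLs, and only after CKAN's own authorization check has passed. A sketch assuming boto3 and a dataset dict carrying CKAN's `private` flag; the function names are hypothetical, the public URL form is an assumption that depends on region and endpoint, and `generate_presigned_url` is boto3's client method.

```python
from urllib.parse import quote


def object_acl(package):
    # Hypothetical policy: public datasets readable by anyone, private ones not.
    return "private" if package.get("private") else "public-read"


def download_url(client, bucket, key, package, expires_in=3600):
    """Hypothetical helper: URL for a user who already passed CKAN's auth check."""
    if not package.get("private"):
        # Assumed virtual-hosted public URL; depends on region and endpoint.
        return "https://%s.s3.amazonaws.com/%s" % (bucket, quote(key))
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

The ACL would be applied at upload time (for example `ACL=object_acl(package)` in `put_object`) and would need updating if a dataset later changes visibility.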
SignatureDoesNotMatch
The request signature we calculated does not match the signature you provided. Check your key and signing method.
Do you think it would be a better idea to store data only on S3?
As far as I understand, the data is currently just copied to S3, and my dataset displays the locally stored data.
I need to protect some of my datasets with S3 encryption, and I can work on a PR if you like.
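Server-side encryption can be requested per upload through `put_object`'s `ServerSideEncryption` parameter (`AES256` for S3-managed keys, or `aws:kms` with an optional `SSEKMSKeyId`). A sketch of how a per-dataset setting might be turned into those arguments; the `encrypt` flag, the KMS key source and both function names are assumptions.

```python
def encryption_args(encrypt=False, kms_key_id=None):
    """Hypothetical helper: extra put_object arguments for server-side encryption."""
    if not encrypt:
        return {}
    if kms_key_id:
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    return {"ServerSideEncryption": "AES256"}


def upload_encrypted(client, bucket, key, body, **options):
    # `options` are passed straight to encryption_args (encrypt, kms_key_id).
    client.put_object(Bucket=bucket, Key=key, Body=body, **encryption_args(**options))
```

Downloads of SSE-S3 or SSE-KMS objects need no extra parameters, though KMS also requires the reader to have permission to use the key.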
I can only imagine that the Data Explorer can't access the dataset because it's stored on S3. Is there any suggestion on where I can get started integrating the two? At the moment, I'm getting the following error in the Data Explorer view for TSV files:
This resource view is not available at the moment. Click here for more information.
Could not load view: DataProxy returned an error (Data transformation failed. HTTPError: HTTP Error 401: Unauthorized)
Can this extension rely on IAM roles, instead of requiring a secret key? We're running on AWS EC2, so in theory it should be possible to access S3 without needing any keys.
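boto3 already supports this through its default credential chain: when no keys are passed explicitly, it looks at environment variables, shared config files and, on EC2, the instance profile's role. A sketch of how the plugin could pass keys only when they are configured; the option names match the `.ini` settings shown in the MinIO report above, and the function itself is hypothetical.

```python
def credential_kwargs(config):
    """Hypothetical helper: boto3 client kwargs, empty when keys are not configured."""
    key_id = config.get("ckanext.s3filestore.aws_access_key_id")
    secret = config.get("ckanext.s3filestore.aws_secret_access_key")
    if key_id and secret:
        return {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    # No keys: let boto3 fall back to env vars, ~/.aws files or the EC2 role.
    return {}
```

The client would then be created as `boto3.client("s3", **credential_kwargs(config))`, which works unchanged on an EC2 instance with a suitable role attached.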
There are a few changes fixing the MIME type in v0.1.1...master; might it be good to tag a new patch release?