Comments (17)
Hi @ulli-snyman, thank you for contributing to Kedro by opening a feature request and potentially adding support for new datasets. Adding support for GCS is something we would love to have as part of `kedro.contrib.io`,
and we will be more than happy to welcome contributions for the datasets. I will mark this with the `good first issue` label so anyone interested in doing it can pick it up.
from kedro.
Hi @ulli-snyman, thank you for your interest in contributing to Kedro! I'm the QA on the Kedro team. You should be able to mock out calls to the GCP library. An example is shown here. You can also see how the developers test their client code here. If you need any further assistance, please let us know.
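As a sketch of what mocking out the GCP library can look like, the test below replaces the storage client with `unittest.mock.MagicMock`, so no real bucket or credentials are needed. `GCSCSVLoader` is a hypothetical class written only for illustration, not a Kedro dataset. Its call chain `client.bucket(...).blob(...).download_as_bytes()` is modelled on the `google-cloud-storage` client, so check it against the version you use.

```python
import csv
import io
from unittest import mock


class GCSCSVLoader:
    """Hypothetical loader that reads a CSV object from GCS through an injected client."""

    def __init__(self, client, bucket: str, path: str):
        self._client = client
        self._bucket = bucket
        self._path = path

    def load(self):
        # Call chain modelled on google-cloud-storage: client -> bucket -> blob -> bytes.
        blob = self._client.bucket(self._bucket).blob(self._path)
        text = blob.download_as_bytes().decode("utf-8")
        return list(csv.DictReader(io.StringIO(text)))


def test_load_with_mocked_client():
    # MagicMock stands in for the real storage client; no network or credentials used.
    fake_client = mock.MagicMock()
    fake_blob = fake_client.bucket.return_value.blob.return_value
    fake_blob.download_as_bytes.return_value = b"a,b\n1,2\n"

    rows = GCSCSVLoader(fake_client, "my-bucket", "data/input.csv").load()

    # Check both the parsed result and that the client was called with the right names.
    assert rows == [{"a": "1", "b": "2"}]
    fake_client.bucket.assert_called_once_with("my-bucket")
    fake_client.bucket.return_value.blob.assert_called_once_with("data/input.csv")
```

Injecting the client (rather than creating it inside `load`) is what makes it easy to swap in a mock; `mock.patch` on the module that builds the client is the alternative when injection isn't possible.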
@plauto Looks like there's still no reply, so sure, go for it!
Thank you in advance for the contribution.
Hey @idanov, I'm getting ready for my first PR on this issue, covering the CSV dataset.
I am busy writing tests and have run into a bit of a roadblock: currently there is no way to mock a GCS bucket, as the `moto` package does for S3.
The common approach for testing with GCP services is to actually read from and write to the service.
I'm scratching my head a bit here: I can run the tests with my credentials in a testing project, but that setup is specific to me, and anyone else wanting to run these tests would need their own GCP project.
I've set the tests to take their GCP configuration from environment variables; this is the best way I can see this working out. Would you be fine with this, or do you have any other ideas for how we could test this?
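A minimal sketch of the environment-variable approach, assuming hypothetical variable names `GCS_TEST_PROJECT` and `GCS_TEST_BUCKET`: the integration tests are skipped, rather than failing, on machines that have no GCP test project configured.

```python
import os
import unittest
from typing import Dict, Mapping, Optional

# Hypothetical names; the real test suite may settle on different variables.
REQUIRED_VARS = ("GCS_TEST_PROJECT", "GCS_TEST_BUCKET")


def gcs_test_config(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, str]]:
    """Return the GCP test settings, or None if any required variable is unset or empty."""
    env = os.environ if env is None else env
    values = {name: env.get(name, "") for name in REQUIRED_VARS}
    if not all(values.values()):
        return None
    return values


@unittest.skipUnless(gcs_test_config() is not None, "GCP test project not configured")
class TestCSVDataSetOnGCS(unittest.TestCase):
    def test_bucket_is_configured(self):
        config = gcs_test_config()
        # A real integration test would write to and read back from this bucket.
        self.assertTrue(config["GCS_TEST_BUCKET"])
```

Contributors without a GCP project see these tests reported as skipped when running `python -m unittest`, while CI or a maintainer with the variables set runs them against a real bucket.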
Hey @lorenabalan,
Things have been busy on my side; I will try to wrap things up in the PR by the end of the month.
Hey there,
If that's OK, can I start working on it? I have some experience with GCP and I'd be happy to implement those features :)
I've updated the title with our internal ticket number to keep track of this more easily. :)
@ulli-snyman how is this coming along? Do you need any help from our side?
Things have been busy on my side; I will try to wrap things up in the PR by the end of the month.
Totally fine, just wanted to check in and make sure that you're not stuck on something from our end. :)
Hi @plauto! We would love the help! But it might be a good idea to just sync with @ulli-snyman as he mentioned that he has started working on a PR. Let's give him until the end of the week to reply about how far he's gotten and whether or not he needs help. If there's no status update then it's all yours.
Sounds good to me! Thanks @yetudada
Hey! If that’s ok, can I start working on it this week?
@plauto How's the development coming along? If you would like our early feedback/comments, feel free to open a draft PR so we can see if you are on the right track :)
@921kiyo I am going to push a draft PR. Sorry for being a bit late on this, but I only found some time to work on it at the end of last week. There are still a couple of things to finish (e.g. unit tests for the versioned dataset, which are a bit complex because of the way I have structured the unit tests). I look forward to getting your feedback when you have some time. After that it shouldn't take long to finish up the rest!
@ulli-snyman and everyone who has been watching this issue: we're excited to announce that Kedro 0.15.5 will have `CSVGCSDataSet`, `ParquetGCSDataSet` and `JSONGCSDataSet`.
In a following release of Kedro, we will have:
- Support for Google BigQuery
- A new series of file-storage-agnostic datasets for `CSVDataSet`, `ParquetDataSet`, `JSONDataSet`, `ExcelDataSet`, `HDFDataSet` and `PickleDataSet`, made possible because we stumbled into `fsspec` while we were looking at Dask integration; these datasets will support GCS, S3, etc. and simplify our data catalog
I'll close this issue when we have finished full support for GCS.
@ulli-snyman This PR has been addressed and full Google Cloud support will be available in the next release. The datasets are already available in the `develop` branch: https://github.com/quantumblacklabs/kedro/blob/develop/kedro/extras/datasets/
They all use `fsspec` to load `filepath`, and GCS is included in that series: https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/registry.html