Comments (8)

apyasic commented on June 15, 2024

@markheger right, the idea is to copy BigData.jar from $STREAMS_INSTALL/toolkits/com.ibm.streamsx.hdfs/impl/lib during build.
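
The copy step discussed in this thread could be sketched as a small build helper. This is only an illustration under stated assumptions: the function name `copy_bigdata_jar` is hypothetical, the jar is assumed to be named `BigData.jar` inside the quoted `impl/lib` directory, and the target is assumed to be the `opt` directory of the object storage toolkit. The actual build may well do this differently (for example in an Ant or Maven step).

```python
import os
import shutil
from pathlib import Path

# Location of the jar relative to $STREAMS_INSTALL, as quoted in the thread.
# The file name BigData.jar inside impl/lib is an assumption.
HDFS_JAR_RELATIVE = Path("toolkits/com.ibm.streamsx.hdfs/impl/lib/BigData.jar")


def copy_bigdata_jar(streams_install: str, toolkit_dir: str) -> Path:
    """Copy BigData.jar from the Streams install into <toolkit_dir>/opt.

    Hypothetical helper: returns the path of the copied jar.
    """
    source = Path(streams_install) / HDFS_JAR_RELATIVE
    if not source.is_file():
        raise FileNotFoundError(f"BigData.jar not found at {source}")

    # Create the toolkit's opt directory if the build has not done so yet.
    target_dir = Path(toolkit_dir) / "opt"
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir / source.name))


if __name__ == "__main__":
    # Assumes STREAMS_INSTALL is set and the script runs next to the toolkit.
    copy_bigdata_jar(os.environ["STREAMS_INSTALL"], "com.ibm.streamsx.objectstorage")
```

Copying at build time keeps the object storage toolkit free of a runtime dependency on the HDFS toolkit's location, at the cost of requiring the HDFS toolkit to be present in the build environment.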

from streamsx.objectstorage.

apyasic commented on June 15, 2024

@ddebrunner and @markheger
While working on the Stocator-based object store implementation, I've run into the following dilemma.

The ObjectScan/Source/Sink operator code mimics much of the logic of HDFSDirectoryScan, HDFSSource and HDFSSink.

In the initial implementation of the ObjectScan/Source/Sink operators,
I even used BigData.jar from the HDFS toolkit
(which contains the HDFS operator implementations) as a library.
However, it required some tweaking, such as hiding irrelevant parameters
like hdfsUri, hdfsUser, etc. In addition, BigData.jar contains
a Hadoop client implementation (based on the Apache Hadoop client),
which was replaced with a Stocator-based client.

What do you think is the best approach to take?
Is it better to duplicate the relevant HDFS classes in the object storage toolkit,
or to keep reusing the HDFS operator implementation as a library?

markheger commented on June 15, 2024

There are several branches of the HDFS toolkit. A good code base would be the "bluemix" branch. The branches differ mainly in how the jars are loaded. We should not depend on any HADOOP_HOME environment variable.
I would go with copying the classes, because it would be very confusing if this toolkit depended on the HDFS toolkit.

apyasic commented on June 15, 2024

@markheger what if we took an intermediate approach: an independent operator hierarchy on the one hand, as you proposed, and on the other hand use of "infrastructure" classes such as HDFSFile and AsyncBufferWriter from the HDFS toolkit's BigData.jar?

markheger commented on June 15, 2024

@apyasic I agree with your suggestion to take a few classes from BigData.jar.
Do you mean that during the toolkit build, BigData.jar is copied from $STREAMS_INSTALL/toolkits/com.ibm.streamsx.hdfs/impl/lib into the "opt" directory of the com.ibm.streamsx.objectstorage toolkit?
Or should we depend on the HDFS toolkit and leave the jar where it is in that toolkit?

alexpy11 commented on June 15, 2024

@markheger I'm looking for a collective name for "container" and "bucket".
Do you have any ideas?

markheger commented on June 15, 2024

Shall we keep the word "container" for both?
"A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket" http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html

alexpy11 commented on June 15, 2024

@markheger agreed. In the "wrapped" version, the "bucket" parameter will be used instead, to make it compliant with S3 vocabulary.
