Comments (8)
@markheger right, the idea is to copy BigData.jar from $STREAMS_INSTALL/toolkits/com.ibm.streamsx.hdfs/impl/lib during build.
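A minimal sketch of that build-time copy step, written as a standalone Python helper. The source path under `$STREAMS_INSTALL` is the one quoted above; the `opt` destination subdirectory, the function name `copy_bigdata_jar`, and the use of a Python script instead of the toolkit's actual build tooling are assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path

# Location of the jar relative to the Streams install, as quoted in this thread.
HDFS_LIB_REL = Path("toolkits") / "com.ibm.streamsx.hdfs" / "impl" / "lib"
JAR_NAME = "BigData.jar"


def copy_bigdata_jar(streams_install, toolkit_dir, dest_subdir="opt"):
    """Copy BigData.jar from the HDFS toolkit into toolkit_dir/dest_subdir.

    dest_subdir="opt" is an assumption, not a confirmed toolkit layout.
    Returns the path of the copied jar.
    """
    source = Path(streams_install) / HDFS_LIB_REL / JAR_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{JAR_NAME} not found at {source}")
    dest_dir = Path(toolkit_dir) / dest_subdir
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, dest_dir / JAR_NAME))


if __name__ == "__main__":
    # Requires STREAMS_INSTALL to be set; the toolkit directory is hypothetical.
    print(copy_bigdata_jar(os.environ["STREAMS_INSTALL"],
                           "com.ibm.streamsx.objectstorage"))
```

Failing fast when the jar is missing keeps the build from silently producing a toolkit without the HDFS classes.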
from streamsx.objectstorage.
@ddebrunner and @markheger
While working on the Stocator-based object store implementation, I've run into the following dilemma.
The ObjectScan/Source/Sink operator code mimics much of the logic of HDFSDirectoryScan, HDFSSource and HDFSSink.
In the initial implementation of the ObjectScan/Source/Sink operators I even used BigData.jar from the HDFS toolkit (which contains the HDFS operator implementations) as a library.
Yet it required some tweaking, such as hiding irrelevant parameters like hdfsUri, hdfsUser, etc.
In addition, BigData.jar contains a Hadoop client implementation (based on the Apache Hadoop client), which was replaced with a Stocator-based client.
What do you think is the best approach to take?
Is it better to duplicate the relevant HDFS classes in the object storage toolkit,
or to keep reusing the HDFS operator implementation as a library?
from streamsx.objectstorage.
There are several branches of the HDFS toolkit. A good code base would be the "bluemix" branch. The difference between the branches is mainly the loading of the jars. We should not depend on any HADOOP_HOME environment variable.
I would go with copying the classes, because it would be very confusing if this toolkit depended on the HDFS toolkit.
from streamsx.objectstorage.
@markheger what if we took an intermediate approach: an independent operator hierarchy on the one hand, as you proposed, and reuse of "infrastructure" classes like HDFSFile and AsyncBufferWriter from the HDFS toolkit's BigData.jar on the other?
from streamsx.objectstorage.
@apyasic I agree with your suggestion to take a few classes from BigData.jar.
Do you mean that during the toolkit build, BigData.jar is copied from $STREAMS_INSTALL/toolkits/com.ibm.streamsx.hdfs/impl/lib into the com.ibm.streamsx.objectstorage toolkit's "opt" directory?
Or should we depend on the HDFS toolkit and leave the jar in the HDFS toolkit?
from streamsx.objectstorage.
@markheger I'm looking for a collective name for "container" and "bucket".
Do you have any ideas?
from streamsx.objectstorage.
Shall we keep the word container for both?
"A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket" http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
from streamsx.objectstorage.
@markheger agreed. In the "wrapped" version, the "bucket" parameter will be used instead to make it compliant with S3 vocabulary.
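One way to picture that naming split is a thin translation layer between the S3 wrapper and the generic operator. The sketch below is illustrative only: the function name `to_generic_params`, the dictionary representation of parameters, and the validation rules are assumptions; only the "bucket" to "container" rename comes from this thread.

```python
def to_generic_params(wrapper_params):
    """Translate S3-wrapper parameters into generic operator parameters.

    Only the bucket -> container rename is taken from the discussion;
    all other parameters are passed through unchanged.
    """
    # Assumption: the wrapper exposes only the S3 term.
    if "container" in wrapper_params:
        raise ValueError("use 'bucket' in the S3 wrapper, not 'container'")
    # Assumption: the bucket name is mandatory.
    if "bucket" not in wrapper_params:
        raise ValueError("missing required parameter 'bucket'")
    generic = dict(wrapper_params)  # copy so the caller's dict is untouched
    generic["container"] = generic.pop("bucket")
    return generic
```

With this shape, the generic operators only ever see "container", and the S3 vocabulary stays confined to the wrapper.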
from streamsx.objectstorage.