ibmstreams / streamsx.inet
This toolkit supports common internet protocols, such as HTTP and WebSockets.
Home Page: http://ibmstreams.github.io/streamsx.inet/
License: Other
I forked the repository a few days ago and see that chanskw has since added some files that I need. How do I get the most current version that is checked into the "upstream" repository? This is obviously a newbie question.
"Corporation" spelled incorrectly in XML copyright statements.
It is strongly encouraged that artifacts in a toolkit are described and documented using SPLDOC so that users can understand how to use and parameterize the operator.
Add a copy of the license.md file to the com.ibm.streamsx.inet toolkit directory. This directory is the shippable part of the project and is covered under the same license, so the license needs to be present there.
Just wondering if it made sense to move the C++ code into the cgt files, or somehow reference it from them, so that it's compiled by the SPL compile, and thus the toolkit would become portable.
This then means we could have a single release of this toolkit, rather than potentially many.
In the inet toolkit, the HTTPTupleInjection operator provides a form at prefix/ports/output/<port index>/form for submitting tuples to the operator; however, the HTTPXMLInjection operator does not provide a similar form for the same purpose.
Operators should be named for the function they perform. There is no requirement to have "Sink" and "Source" in the name for such operators, so remove the terms "Source" and "Sink" from these operators.
The upside to renaming is having clean names in this new toolkit.
The downside is that existing applications may be using the existing names, though in a different namespace and different toolkit.
Some options:
Note in all cases existing code still has to adapt to new namespace & toolkit.
My vote is 1)
Thoughts?
Allow the URL to be general attribute-free expressions. This means they can include function calls. If a URL does include stateful function calls, the URL will be recomputed before every fetch.
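A minimal Python sketch of the proposed semantics, assuming a polling operator that re-evaluates its URL expression before every fetch. The names `poll`, `url_expr` and `fake_fetch` are hypothetical, and the polling period is a float in seconds, following the toolkit's timing convention.

```python
import itertools
import time
from typing import Callable, List


def poll(url_expr: Callable[[], str],
         fetch: Callable[[str], bytes],
         period: float,
         iterations: int) -> List[bytes]:
    """Fetch `iterations` times, re-evaluating url_expr before every fetch.

    url_expr stands in for an SPL expression: a constant expression always
    yields the same URL, while one calling a stateful function may yield a
    different URL on each evaluation.
    """
    results = []
    for i in range(iterations):
        url = url_expr()  # recomputed before every fetch
        results.append(fetch(url))
        if i < iterations - 1:
            time.sleep(period)  # period in seconds (float)
    return results


# Usage: a hypothetical "stateful" URL expression backed by a counter.
counter = itertools.count()
urls_seen = []


def fake_fetch(url: str) -> bytes:
    urls_seen.append(url)  # record which URL each fetch used
    return url.encode()


poll(lambda: f"http://example.com/feed?page={next(counter)}", fake_fetch, 0.0, 3)
```

A constant URL would simply be a lambda returning the same string, so both cases go through the same code path.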
ant spldoc gives
/homes/hny2/hildrum/github/IBMStreams/streamsx.inet/build.xml:18: Execute failed: java.io.IOException: Cannot run program "git": java.io.IOException: error=2, No such file or directory
The sample does not compile:
make
/homes/chanskw/streams//bin/sc -a -t ../../com.ibm.streamsx.inet -T -M com.ibm.streamsx.inet.sample::GetWeather
CDISP0727W WARNING: The ../../com.ibm.streamsx.inet input path is not a directory.
CDISP0385E ERROR: The InetSourceSample toolkit requires version 1.0.0 of the com.ibm.streamsx.inet toolkit, but version 1.0.0 of the com.ibm.streamsx.inet toolkit is not available.
CDISP0131E ERROR: Errors occurred while the toolkits were loading.
make: *** [standalone] Error 1
---- External builder for project InetSource completed in 1.692 seconds ----
Enhance the HTTPPost operator to allow the value of the 'Content-Type' request header to be specified as a parameter, and to fill the body of the request from a specified attribute. The attribute could contain an rstring or ustring already encoded upstream in the format named by the parameter, or it could be a composite type that the operator itself encodes in that format, such as JSON or CSV.
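A rough Python sketch of the requested behavior, assuming the content type arrives as a string parameter and a composite attribute is modeled as a dict. `encode_body` and `build_post` are hypothetical names, and the JSON and CSV encoders are illustrations only, not HTTPPost's actual implementation.

```python
import csv
import io
import json
import urllib.request
from typing import Any


def encode_body(value: Any, content_type: str) -> bytes:
    """Encode one attribute value as a request body for content_type."""
    if isinstance(value, bytes):
        return value  # already encoded upstream
    if isinstance(value, str):
        return value.encode("utf-8")  # rstring/ustring encoded upstream
    if content_type == "application/json":
        return json.dumps(value).encode("utf-8")
    if content_type == "text/csv":
        buf = io.StringIO()
        csv.writer(buf).writerow(value.values())  # one CSV row per tuple
        return buf.getvalue().encode("utf-8")
    raise ValueError(f"no encoder for {content_type}")


def build_post(url: str, value: Any, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose body comes from one attribute."""
    return urllib.request.Request(
        url,
        data=encode_body(value, content_type),
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending the request would then be a call to `urllib.request.urlopen(req)`. Keeping encoding separate from request building makes it easy to add further content types later.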
The com.ibm.streamsx.inet.http package is starting to build up a set of HTTP client utilities, including items like authentication support (see pull request #42). This seems like it is duplicating the Apache HTTP Client Components:
https://hc.apache.org/httpcomponents-client-ga/index.html
I propose this toolkit should use the Apache library to avoid duplicating work and also then to be able to support a much richer set of authentication options (e.g. see issue #40).
I'll start by converting the new HTTPGetXMLContent operator to use the library (I started down this path because I found I needed some capabilities in the Apache library).
Operators supporting streaming reads (chunked source), POST, etc.
Samples should be in a namespace rather than the default namespace.
Using the default namespace increases the chances of a name clash (e.g. multiple applications using Main as their sample composite name).
Users of Streams have requested that samples such as this are in their own namespace.
An operator that uses HTTP GET to fetch application/xml or text/xml content with the content being put into a SPL xml attribute. Thus a single output tuple is submitted for every successful request, containing the complete content.
This then can be easily streamed into the standard SPL toolkit XMLParse operator for downstream processing.
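A small Python sketch of the per-response logic described above, assuming the operator checks the media type and checks well-formedness before emitting one record holding the whole document. The function name, the output field name `content`, and the UTF-8 decoding are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_TYPES = {"application/xml", "text/xml"}


def to_xml_tuple(content_type: str, body: bytes) -> Optional[dict]:
    """Return one output record holding the complete XML document, or None.

    The media type is compared without parameters such as '; charset=UTF-8'.
    Decoding as UTF-8 is a simplifying assumption.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in XML_TYPES:
        return None  # not XML content: no tuple is submitted
    ET.fromstring(body)  # raises ET.ParseError if not well-formed
    return {"content": body.decode("utf-8")}
```

Each successful request yields exactly one record, which matches the intent of feeding the complete document to XMLParse downstream.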
Streams supports Java 6 so good to have the toolkits work with Java 6 (specific operators might require Java 7).
Note: the default JVM for SPL applications is Java 7.
Currently, the InetSource operator has trouble connecting to untrusted HTTPS connections. There should be a mechanism to tell the operator to accept certificates from untrusted connections.
Support common compression formats: gzip and deflate.
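A minimal Python sketch of decoding a response body according to its Content-Encoding header, using only the standard `gzip` and `zlib` modules. The function name is hypothetical; a client would also advertise support by sending `Accept-Encoding: gzip, deflate`.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo an HTTP Content-Encoding of gzip or deflate (or none)."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP 'deflate' is meant to be zlib-wrapped (RFC 1950), but some
        # servers send raw deflate data (RFC 1951), so try both.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

The raw-deflate fallback is a common interoperability workaround rather than part of the HTTP specification.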
Typically operator parameters that specify time use a float64 to represent seconds:
https://github.com/IBMStreams/toolkits/wiki/Conventions-for-Operators
Any thoughts on how we make a downloadable toolkit available for users who don't care about source, just want to download the toolkit and reference it from an SPL application?
I can see we would add download bundles to the "pages" site, but is it:
I've only released portable (Java & SPL code) toolkits before.
Get WsReceive() and WsSend() checked in
The SPLDoc for INetSource is minimal. The original product documentation seems to have more information; can it be brought into this toolkit?
To use the script as it currently is, you need a second clone of the repository. In a clone set to the master branch, run the script giving as an argument the location of a different clone that can be set to the gh-pages branch, e.g.:
./makeDocs.pl ../../IBMStreams-web/streamsx.inet/
What it does is then:
The make doc script needs improvement. Some that would be useful are:
Currently, the InetSource operator has trouble connecting to untrusted HTTPS connections. There should be a mechanism to tell the operator to accept certificates from untrusted connections.
I originally hit this problem while attempting to use the InetSource operator to connect to the InfoSphere Streams REST API.
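A Python sketch of the opt-in mechanism, using the standard `ssl` module; the flag name `accept_all_certificates` is hypothetical. Disabling verification exposes the connection to man-in-the-middle attacks, so a safer alternative for a self-signed server certificate is to keep verification on and trust that certificate explicitly via `ssl.create_default_context(cafile=...)`.

```python
import ssl


def make_ssl_context(accept_all_certificates: bool = False) -> ssl.SSLContext:
    """Return a client TLS context; optionally skip certificate verification."""
    ctx = ssl.create_default_context()  # verifies certificates by default
    if accept_all_certificates:
        ctx.check_hostname = False  # must be disabled before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The context would be passed per connection, e.g. `urllib.request.urlopen(url, context=make_ssl_context(True))`, so only the operator that opts in loses verification.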
Currently, the Jetty web server listens on all network interfaces. This caused an issue when some of the network interfaces were running at full throughput with traffic unrelated to the Jetty operators.
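The underlying idea, sketched in Python with the standard `http.server` module rather than Jetty: bind the listener to the address of one interface instead of the wildcard address. The handler and `make_server` are illustrative only; in the toolkit the equivalent would presumably be a host setting on the Jetty connector.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Trivial handler so the server has something to serve."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Bind only to the interface whose address is `host`.

    '0.0.0.0' (or '') listens on every IPv4 interface, which is the
    behavior described in the issue; a specific address restricts the
    listener to one interface. Port 0 asks the OS for a free port.
    """
    return HTTPServer((host, port), PingHandler)
```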
Hi all,
I have downloaded the streamsx.inet toolkit and configured in Streams Studio.
As with other Streams applications, I right-click on "Main.spl" and select the "Launch" option to execute the application. But for this ExchangeWebSocket example, that option is disabled.
Can anyone please guide me through this? How can I execute or start this sample application?
Do I need to install or configure a web server or something, like Apache Tomcat?
Your response would be much appreciated !
Regards,
Amir
Hi,
I have an issue which I think is related to TooTallNate/Java-WebSocket#225.
Whenever I run WebSocketSend for an extended period of time, the CPU usage of the Streams PEC in which the operator is running goes into overload, usually exhibiting 90 to 100% processor usage.
I haven't been able to produce a minimum working example as it seems to be somewhat random, and usually only crops up after a few hours of continuous use. My application generates a relatively low rate of one tuple every 1 to 4 seconds.
I was wondering if anyone else is seeing something similar?
Cillian
The documentation for the HTTPPost operator says 'Nested attributes are not individually accessed and serialized. Only the top level attributes are serialized individually.' What does that mean, exactly?
For example, will it serialize an attribute of type ' list < WordType > ', like this:
type TranscribedWordsType =
rstring mediaFilename, // copied from same attribute in input tuple
rstring streamLanguage, // copied from same attribute in input tuple
float64 wordStartTime, // from FIRST(attribute), in seconds
float64 wordEndTime, // from LAST(attribute), in seconds
list<WordType> words, // see below
uint64 sequenceNumber;
type WordType = rstring word, float64 start, float64 end, float32 confidence ;
For another example, will it serialize an attribute of type ' map < rstring, float32 > ', like this:
type ScoredFrameType =
VideoType, tuple< // see below
map<rstring, float32> modelScores // a score for each model
>;
type VideoType =
rstring mediaFilename, // full pathname of file
float64 mediaDuration, // playback duration of file
uint32 videoFrameIndex, // from VIDEO_FRAME_INDEX()
float64 videoFrameStartTime, // from VIDEO_FRAME_START_TIME()
float64 videoFrameEndTime, // from VIDEO_FRAME_END_TIME()
boolean videoKeyframe, // from VIDEO_KEYFRAME()
... and so on ...
Are tuples like these acceptable to HTTPPost, or do I need to flatten them out somehow?
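The issue does not say how HTTPPost renders a nested value. One way to sidestep the question, sketched in Python under the assumption that the receiving server can parse JSON, is to pre-serialize each nested attribute upstream so every top-level field is a plain string. `encode_top_level` is a hypothetical helper, not part of the toolkit.

```python
import json
import urllib.parse
from typing import Any, Dict


def encode_top_level(tuple_value: Dict[str, Any]) -> str:
    """Form-encode a tuple one top-level attribute at a time.

    Scalar attributes are sent as strings; list, map, and nested tuple
    attributes (modeled here as list/dict) are serialized to one JSON
    string each, so they arrive as a single field value.
    """
    fields = {}
    for name, value in tuple_value.items():
        if isinstance(value, (list, dict)):
            fields[name] = json.dumps(value)
        else:
            fields[name] = str(value)
    return urllib.parse.urlencode(fields)
```

For the TranscribedWordsType example, the words list would become one JSON-valued field rather than being expanded into separate fields.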
Put the Mail toolkit capabilities in this toolkit
Using the metadata from HTTPTupleView, add an HTML page that automatically provides links to a live (automatically updated) table view of any stream exposed by HTTPTupleView.
It would use the Dojo JavaScript libraries from $STREAMS_INSTALL/ext/dojo.
Basically allows anyone using HTTPTupleView to have instant live views of their feeds, rather than just a JSON static view. There's no requirement for any application to use Dojo, this is just an out of the box, easy to use, getting started feature.
What about being able to read and send messages using an instant messaging server such as Sametime?
The various descriptions are all somewhat different:
Summary
"This toolkit is focused on interacting with network hosted data."
README
"The IBMStreams/streamsx.inet toolkit project is an open source Streams toolkit project focused on the development of operators and functions that extend IBM InfoSphere Streams internet capabilities."
SPL DOC description
"Internet Adapters Toolkit"
It would be good to have a consistent one-line summary. I think the README is a better (but not great) description: the toolkit is not just about network hosted data but, in my view, about supporting common internet protocols. For example, HTTPTupleView makes streaming data available to JSON clients; is that "network hosted data"?
The InetSource sample doesn't import into Eclipse correctly.
Using these 2 operators in an SPL application with authenticationType set to "basic" and a valid authenticationFile results in HTTP error 401. Further debugging shows that these operators do not set the "Authorization" header in the HTTP request.
Using HTTPGetXMLContent with NextBus requires updating the URL's query parameters to include a last-time parameter, which is obtained from the content of the previous GET.
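For reference, a Python sketch of the header these operators would need to send for HTTP Basic authentication (RFC 7617): the word `Basic` followed by the Base64 encoding of `user:password`. The helper names are hypothetical, and how the credentials are read from authenticationFile is not shown.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Value for the Authorization header under HTTP Basic (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def authorized_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET that carries the Authorization header on the first request."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req
```

Sending the header preemptively avoids relying on a 401 challenge round trip, which some servers do not issue in a form clients handle well.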
I'm trying to setup the toolkit as a Streams Studio Project, here are the steps so far:
The curl library used by the InetSource operator supports proxies via the use of environment variables. However, there's no way to set the environment variables, so InetSource doesn't work with proxies.
To fix this, I will add an envVars parameter to the InetSource operator. It will be able to take multiple values. For example:
envVars: "http_proxy=http://your.proxy.server:port/"
will set the http_proxy environment variable to http://your.proxy.server:port/.
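A Python sketch of how such "NAME=value" entries might be parsed and applied; `apply_env_vars` is a hypothetical helper, and the proxy address is a placeholder. Python's `urllib`, like libcurl, picks up `http_proxy` from the process environment, which is what the usage lines check.

```python
import os
import urllib.request
from typing import Iterable


def apply_env_vars(assignments: Iterable[str]) -> None:
    """Apply "NAME=value" strings to this process's environment.

    Splits on the first '=' only, so values such as URLs may contain '='.
    """
    for assignment in assignments:
        name, sep, value = assignment.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=value, got {assignment!r}")
        os.environ[name] = value


# Usage: set the proxy before any transfer is started.
apply_env_vars(["http_proxy=http://your.proxy.server:8080/"])
proxies = urllib.request.getproxies()
```

Because the variables must be in place before the HTTP library reads them, the operator would apply them at initialization, before its first fetch.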
I'm getting errors compiling with ant:
[javac] /homes/hny2/hildrum/github/IBMStreams/streamsx.inet/com.ibm.streamsx.inet/impl/java/src/com/ibm/streamsx/inet/rest/test/ContextTest.java:10: package org.junit does not exist
[javac] import static org.junit.Assert.assertEquals;
[javac] ^
I must be missing the junit package--how do I get that setup?
The ant build.xml creates a jarfile with the name com.ibm.streamsx.inet.rest.jar.
Since it also includes Java packages and classes that have nothing to do with the REST code, it's better to name it "com.ibm.streamsx.inet.jar" instead.
Example use is the NextBus public XML feed, where the returned XML content includes a lastTime attribute. This returned value is to be used on the next request to get updates since that time.
Add support for changing an attribute based upon an XPath expression, example use:
stream<xml locationXMLDoc> RVLXML = HTTPGetXMLContent()
{
param
url : commandUrl("vehicleLocations", $params);
period : validatePollingTime($period);
updateParameter: "t";
updateParameterFromContent: "/body/lastTime/@time";
}
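A Python sketch of what updateParameter and updateParameterFromContent could do after each GET, assuming the root element is `body` as in the example expression. Python's ElementTree supports only a subset of XPath, so /body/lastTime/@time is expressed here as the element path `lastTime` plus the attribute name `time`; the helper name is hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def update_url_from_content(url: str, content: bytes, param: str,
                            element_path: str, attribute: str) -> str:
    """Copy an attribute found in the XML content into a URL query parameter."""
    root = ET.fromstring(content)  # root element, e.g. <body>
    element = root.find(element_path)
    if element is None or element.get(attribute) is None:
        return url  # nothing to update; keep the previous URL
    parts = urllib.parse.urlsplit(url)
    query = dict(urllib.parse.parse_qsl(parts.query))  # keeps parameter order
    query[param] = element.get(attribute)
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
```

Collapsing the query into a dict drops repeated parameters, which is acceptable only if the feed does not use them.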
You will get errors like this:
CDISP0127E ERROR: The following toolkit file is out of date: ../../com.ibm.streamsx.inet/toolkit.xml. This file is newer: ../../com.ibm.streamsx.inet/com.ibm.streamsx.inet.wsserver/WebSocketSend/WebSocketSend.xml.
It looks like the build.xml from the repository does not tar things up properly, or the tar and untar process altered the timestamps in the toolkit, leaving toolkit.xml older than the operator model XML.
The inet toolkit on GitHub includes JUnit test cases and a JUnit library dependency, which should be removed.
Also, the current JUnit dependency results in a compile error: the JUnit test cases require version 4, but the dependency currently specifies version 3.
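A rough Python check for this symptom before packaging, approximating the condition that CDISP0127E reports: any operator model newer than toolkit.xml. The function name is hypothetical, and treating `<Operator>/<Operator>.xml` files as operator models is a heuristic.

```python
from pathlib import Path
from typing import List


def newer_than_index(toolkit_dir: str) -> List[Path]:
    """List operator model files newer than the toolkit's toolkit.xml."""
    root = Path(toolkit_dir)
    index_mtime = (root / "toolkit.xml").stat().st_mtime
    return sorted(
        p for p in root.rglob("*.xml")
        if p.name != "toolkit.xml"
        and p.stem == p.parent.name  # e.g. WebSocketSend/WebSocketSend.xml
        and p.stat().st_mtime > index_mtime
    )
```

If the list is non-empty, regenerating the index (e.g. by re-running spl-make-toolkit) before archiving should avoid the error.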
I'd like to add the HTTP operators from the inet toolkit on Streams Exchange https://www.ibm.com/developerworks/community/forums/html/topic?id=66bd3284-a120-4104-b2e8-a07acab237d6
I've refactored the code recently to use the Java operator annotations added in Streams 3.2.
No documentation for the HTTPGetStream input port, so unclear how to use the operator.
Add a pages site for this project and a mechanism to publish the SPLDoc.
Please! X.509 support is needed for some clients. Need to support certificate-based access wherever possible.