ibmstreams / streamsx.inet

This toolkit supports common internet protocols, such as HTTP and WebSockets

Home Page: http://ibmstreams.github.io/streamsx.inet/

License: Other

Java 57.66% HTML 0.43% Makefile 1.22% Shell 16.39% Perl 1.97% C++ 20.94% Python 1.39%
ibm-streams inet ftp websockets http stream-processing toolkit

streamsx.inet's People

Contributors

alexpogue, bmwilli, bruceglassford, chanskw, conglisc, ddebrunner, ejpring, hildrum, jan-van-dieren, joergboe, lcawley, petenicholls, rohitsw, siegenth, siegenthibm, ulemanstreaming, zollnapa

streamsx.inet's Issues

How do I rebase?

I forked a few days ago and see that chanskw added some files that I need. How do I get the most current version that was checked into the "upstream"? This is obviously a newbie question.
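
One common way to do this is sketched below, assuming the fork is already cloned locally and the original repository is not yet registered as a remote: add it as a remote named upstream, fetch it, and rebase the local branch onto it. The function name sync_with_upstream, the default branch name master, and the example URL in the usage comment are illustrative assumptions, not something stated in this issue.

```shell
# Bring the current branch of a fork up to date with the original repository.
# $1: URL of the original (upstream) repository
# $2: upstream branch to rebase onto (assumed default: master)
sync_with_upstream() {
    upstream_url=$1
    branch=${2:-master}
    # Register the original repository once; skip if "upstream" already exists.
    git remote get-url upstream >/dev/null 2>&1 ||
        git remote add upstream "$upstream_url" || return 1
    # Download the new upstream commits, then replay local commits on top of them.
    git fetch upstream &&
        git rebase "upstream/$branch"
}

# Usage, run inside the fork's working tree (URL is an assumed example):
# sync_with_upstream https://github.com/IBMStreams/streamsx.inet.git
```

If there are no local commits, the rebase simply moves the branch forward to the upstream state.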

license missing in toolkit

Add a copy of the license.md file to the com.ibm.streamsx.inet toolkit directory. This directory is the shippable part of the project and is covered by the same license, so the license file needs to be present there.

INetSource C++ code - can it be moved into the cgt files?

Just wondering if it made sense to move the C++ code into the cgt files, or somehow reference it from them, so that it's compiled by the SPL compile, and thus the toolkit would become portable.

This then means we could have a single release of this toolkit, rather than potentially many.

Rename HTTPGetStreamSource and HTTPPostSink

Operators should be named for the function they perform. There is no requirement to have "Sink" and "Source" in the name for such operators, so remove the terms "Source" and "Sink" from these operators.

The upside to renaming is having clean names in this new toolkit.

The downside is that existing applications may be using the existing names, though in a different namespace and different toolkit.

Some options:

  1. Rename - existing code will have to adapt to new names.
  2. Don't rename
  3. Support both names, with the old names documented as deprecated (though it is strange to start a new toolkit with deprecated features)
  4. Rename but also have old names in a deprecated toolkit within the repo

Note in all cases existing code still has to adapt to new namespace & toolkit.

My vote is 1)

Thoughts?

Top level build file fails on machine with no git

ant spldoc gives

/homes/hny2/hildrum/github/IBMStreams/streamsx.inet/build.xml:18: Execute failed: java.io.IOException: Cannot run program "git": java.io.IOException: error=2, No such file or directory
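
The error comes from the build invoking git on a machine where it is not installed. A minimal sketch of a pre-flight check that reports the missing tool clearly before the build starts; require_command is a hypothetical helper and is not part of the toolkit's build files.

```shell
# Succeed if the named command can be found on PATH; otherwise print a
# clear message to stderr and return a non-zero status.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' was not found on PATH" >&2
    return 1
}

# Usage: only start the documentation build when git is available.
# require_command git && ant spldoc
```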

InetSource sample does not compile

  1. download release from here:
    https://github.com/IBMStreams/streamsx.inet/releases
  2. extract the release
  3. import InetSource sample as example
  4. Add streamsx.inet toolkit as toolkit location in studio

The sample does not compile:
make

/homes/chanskw/streams//bin/sc -a -t ../../com.ibm.streamsx.inet -T -M com.ibm.streamsx.inet.sample::GetWeather
CDISP0727W WARNING: The ../../com.ibm.streamsx.inet input path is not a directory.
CDISP0385E ERROR: The InetSourceSample toolkit requires version 1.0.0 of the com.ibm.streamsx.inet toolkit, but version 1.0.0 of the com.ibm.streamsx.inet toolkit is not available.
CDISP0131E ERROR: Errors occurred while the toolkits were loading.
make: *** [standalone] Error 1

---- External builder for project InetSource completed in 1.692 seconds ----

Support other Content-Type values in HTTPPost operator

Enhance the HTTPPost operator to allow the value of the 'Content-Type' request header to be specified as a parameter, and fill the body of the request from a specified attribute. The attribute could contain an rstring or ustring that was already encoded upstream in the type specified by the parameter, or it could be a composite type that the operator encodes in the specified format, such as JSON or CSV.

Use Apache HTTP Client Components

The com.ibm.streamsx.inet.http package is starting to build up a set of HTTP client utilities, including items like authentication support (see pull request #42). This seems like it is duplicating the Apache HTTP Client Components:

https://hc.apache.org/httpcomponents-client-ga/index.html

I propose this toolkit should use the Apache library to avoid duplicating work and also then to be able to support a much richer set of authentication options (e.g. see issue #40).

I'll start by converting the new HTTPGetXMLContent operator to use the library (I started down this path as I found I needed some capabilities in the Apache library).

Put InetSource sample application into a namespace

Samples should be in a namespace rather than the default namespace.

Using the default namespace increases the chances of a name clash (e.g. multiple applications using Main as their sample composite name).

Users of Streams have requested that samples such as this are in their own namespace.

Add new operator HTTPGetXMLContent

An operator that uses HTTP GET to fetch application/xml or text/xml content with the content being put into a SPL xml attribute. Thus a single output tuple is submitted for every successful request, containing the complete content.

This then can be easily streamed into the standard SPL toolkit XMLParse operator for downstream processing.

Ensure Java classes can run on Java 6

Streams supports Java 6, so it would be good for the toolkits to work with Java 6 (specific operators might require Java 7).

Note that the default JVM for SPL applications is Java 7.

How to release?

Any thoughts on how we make a downloadable toolkit available for users who don't care about source, just want to download the toolkit and reference it from an SPL application?

I can see we would add download bundles to the "pages" site, but is it:

  • a single download, where the end user must run make to build the platform-specific C++ code, with the Java code already built
  • or multiple downloads, as required for different CPU architectures/operating systems?

I've only released portable (Java & SPL code) toolkits before.

Document INetSource operator

The SPLDoc for INetSource is minimal; the original product documentation seems to have more information. Can it be brought into this toolkit?

makeDoc.pl script needs improvement

To use the script as it currently is, you need a second clone of the repository. In a clone set to the master branch, run the script, giving as an argument the location of a different clone that can be set to the gh-pages branch, e.g.:
./makeDocs.pl ../../IBMStreams-web/streamsx.inet/

The script then:

  • Searches the repository for directories containing info.xml (excluding the top-level test directory)
  • For each such directory, runs spl-make-doc, and then copies any new or updated files into the repository on the gh-pages branch. For new files, it runs git add.
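
The directory search in the first step can be sketched with find. This only illustrates the search rule described above and is not the script's actual code; find_toolkit_dirs is a hypothetical name, and the repository root is expected without a trailing slash.

```shell
# Print every directory under the repository root that contains an info.xml,
# skipping the top-level test directory.
# $1: repository root, without a trailing slash
find_toolkit_dirs() {
    repo_root=$1
    # -prune stops find from descending into the top-level test directory.
    find "$repo_root" -path "$repo_root/test" -prune -o -type f -name info.xml -print |
        while IFS= read -r info_file; do
            dirname "$info_file"
        done | sort
}

# Usage, from the root of a clone:
# find_toolkit_dirs "$PWD"
```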

The make doc script needs improvement. Some improvements that would be useful:

  • better documentation
  • automatically create gh-pages clone in /tmp if not supplied on command line
  • commit the files on the gh-pages branch
  • option to push after the commit
  • run make and spl-make-toolkit before running spl-make-doc
  • possibly update the samples section of index.html (?)

InetSource operator should be able to connect to untrusted HTTPS

Currently, the InetSource operator has trouble connecting to untrusted HTTPS connections. There should be a mechanism to tell the operator to accept certificates from untrusted connections.

I originally hit this problem while attempting to use the InetSource operator to connect to the InfoSphere Streams REST API.

How to Launch ExchangeWebSocket Example ?

Hi all,
I have downloaded the streamsx.inet toolkit and configured it in Streams Studio.
As with other Streams applications, right-clicking "Main.spl" and selecting the "Launch" option should execute the application. But for this ExchangeWebSocket example that option is disabled.

Can anyone please guide me through this? How can I execute / start this sample application?
Do I need to install / configure a web server or something, like Apache Tomcat?

Your response would be much appreciated !

Regards,
Amir

WebSocketSend CPU overload

Hi,

I have an issue which I think is related to TooTallNate/Java-WebSocket#225.

Whenever I run the WebSocketSend for an extended period of time, the CPU usage of the streams PEC in which the operator is running goes into overload - usually exhibiting 90 to 100 % processor usage.

I haven't been able to produce a minimum working example as it seems to be somewhat random, and usually only crops up after a few hours of continuous use. My application generates a relatively low rate of one tuple every 1 to 4 seconds.

I was wondering if anyone else is seeing something similar?

Cillian

[question] what types of tuple attributes does the HTTPPost operator serialize?

The documentation for the HTTPPost operator says 'Nested attributes are not individually accessed and serialized. Only the top level attributes are serialized individually.' What does that mean, exactly?

For example, will it serialize an attribute of type ' list < WordType > ', like this:

type TranscribedWordsType = 
    rstring mediaFilename, // copied from same attribute in input tuple
    rstring streamLanguage, // copied from same attribute in input tuple
    float64 wordStartTime, // from FIRST(attribute), in seconds
    float64 wordEndTime, // from LAST(attribute), in seconds
    list<WordType> words, // see below
    uint64 sequenceNumber;

type WordType = rstring word, float64 start, float64 end, float32 confidence ;

For another example, will it serialize an attribute of type ' map < rstring, float32 > ', like this:

type ScoredFrameType = 
    VideoType, tuple< // see below
    map<rstring, float32> modelScores // a score for each model
    >;

type VideoType = 
    rstring mediaFilename, // full pathname of file
    float64 mediaDuration, // playback duration of file
    uint32 videoFrameIndex, // from VIDEO_FRAME_INDEX()
    float64 videoFrameStartTime, // from VIDEO_FRAME_START_TIME()
    float64 videoFrameEndTime, // from VIDEO_FRAME_END_TIME()
    boolean videoKeyframe, // from VIDEO_KEYFRAME()
    ... and so on ...

Are tuples like these acceptable to HTTPPost, or do I need to flatten them out somehow?

Add live table support for HTTPTupleView using a Dojo grid.

Using the meta-data from HTTPTupleView, add an HTML page that automatically provides links to a live table (automatically updated) view of any stream exposed by HTTPTupleView.

This would use the Dojo JavaScript libraries from $STREAMS_INSTALL/ext/dojo

Basically allows anyone using HTTPTupleView to have instant live views of their feeds, rather than just a JSON static view. There's no requirement for any application to use Dojo, this is just an out of the box, easy to use, getting started feature.

Instant messaging

What about being able to read and send messages using an instant messaging server such as Sametime?

Have a consistent summary of this toolkit.

The various descriptions are all somewhat different:

Summary
"This toolkit is focused on interacting with network hosted data."

README

"The IBMStreams/streamsx.inet toolkit project is an open source Streams toolkit project focused on the development of operators and functions that extend IBM InfoSphere Streams internet capabilities."

SPL DOC description

"Internet Adapters Toolkit"

It would be good if we had a consistent one-line summary. I think the README is a better (but not great) description. The toolkit is not just about network hosted data but (in my view) about supporting common internet protocols; e.g. HTTPTupleView makes streaming data available to JSON clients. Is that "network hosted data"?

Setup the toolkit directory as a StreamsStudio SPL project.

I'm trying to set up the toolkit as a Streams Studio project. Here are the steps so far:

  1. Clone the Git project
  2. Create a new SPL project 'com.ibm.streamsx.inet', uncheck Use Default Location and browse to set the Location as the com.ibm.streamsx.inet folder in the cloned Git project.
  3. Click Finish
  4. Click on 'info.xml' in the project and select Replace from HEAD version to replace the info.xml that Streams Studio created, with the one that already existed in the project.

InetSource operator should support environment variables

The curl library used by the InetSource operator supports proxies via the use of environment variables. However, there's no way to set the environment variables, so InetSource doesn't work with proxies.

To fix this, I will add an envVars parameter to the InetSource operator. It will be able to take multiple values. For example:

envVars: "http_proxy=http://your.proxy.server:port/"

This will set the http_proxy environment variable to http://your.proxy.server:port/
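
How the operator would apply these values is not described in this issue. The name=value rule itself can be sketched in shell as follows, under the assumption that each entry is split at the first '=' so that the value may itself contain '='. apply_env_vars is a hypothetical helper, not the operator's implementation.

```shell
# Export each "name=value" argument into the environment.
# Entries are split at the first '='; entries without a name are skipped.
apply_env_vars() {
    for pair in "$@"; do
        case $pair in
            ?*=*)
                # ${pair%%=*} is the text before the first '=',
                # ${pair#*=} is everything after it.
                export "${pair%%=*}=${pair#*=}"
                ;;
            *)
                echo "ignoring malformed entry: $pair" >&2
                ;;
        esac
    done
}

# Usage, with the value from the example above:
# apply_env_vars "http_proxy=http://your.proxy.server:port/"
```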

Where do I get junit?

I'm getting errors compiling with ant:
[javac] /homes/hny2/hildrum/github/IBMStreams/streamsx.inet/com.ibm.streamsx.inet/impl/java/src/com/ibm/streamsx/inet/rest/test/ContextTest.java:10: package org.junit does not exist
[javac] import static org.junit.Assert.assertEquals;
[javac] ^

I must be missing the junit package. How do I get that set up?

rename streamsx.inet.rest.jar to streamsx.inet.jar

The ant build.xml creates a JAR file with the name com.ibm.streamsx.inet.rest.jar.
Since it also includes Java packages and classes that have nothing to do with the REST code, it's better to name it "com.ibm.streamsx.inet.jar" instead.

HTTPGetXMLContent support updating a URL query attribute from the returned content.

An example use is the NextBus public XML feed, where the returned XML content includes a lastTime attribute. This returned value is to be used on the next request to get updates since that time.

Add support for changing an attribute based upon an XPath expression. Example use:

    stream<xml locationXMLDoc> RVLXML = HTTPGetXMLContent()
    {
        param
            url : commandUrl("vehicleLocations", $params);
            period : validatePollingTime($period);
            updateParameter: "t";
            updateParameterFromContent: "/body/lastTime/@time";
    }

Release tar.gz file causes toolkit file to be out of date

  1. download the release *.tar.gz file
  2. gunzip
  3. tar -xvf to untar
  4. Go to the samples directory samples/InetSource.
  5. type make

You will get errors like this:

CDISP0127E ERROR: The following toolkit file is out of date: ../../com.ibm.streamsx.inet/toolkit.xml. This file is newer: ../../com.ibm.streamsx.inet/com.ibm.streamsx.inet.wsserver/WebSocketSend/WebSocketSend.xml.

It looks like the build.xml from the repository does not tar things up properly, or the tar and untar process altered the timestamp on the toolkit, causing toolkit.xml to be older than the operator model XML.
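
If the timestamp ordering is indeed the cause, one possible workaround after unpacking is to make toolkit.xml newer than the operator model files again. A minimal sketch, assuming the directory layout from the steps above; freshen_toolkit_index is a hypothetical helper, and it only addresses the symptom, not the packaging done by build.xml.

```shell
# Update the modification time of toolkit.xml in the given toolkit directory
# so that it is no longer older than the operator model files next to it.
# $1: toolkit directory containing toolkit.xml
freshen_toolkit_index() {
    toolkit_dir=$1
    if [ ! -f "$toolkit_dir/toolkit.xml" ]; then
        echo "error: no toolkit.xml in $toolkit_dir" >&2
        return 1
    fi
    # touch sets the modification time to the current time.
    touch "$toolkit_dir/toolkit.xml"
}

# Usage, from samples/InetSource in the unpacked release:
# freshen_toolkit_index ../../com.ibm.streamsx.inet && make
```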
