hortonworks / streamline
StreamLine - Streaming Analytics
License: Apache License 2.0
Currently the JDBC sink just uses JdbcInsertBolt and doesn't set a custom query, so the insertion query is always INSERT INTO <table> (<column list>) VALUES (<values>).
Because Storm guarantees at-least-once delivery, the same tuple can be received multiple times even when the input tuples of the JDBC sink contain no duplicate unique keys, and that produces duplicated unique keys. This means the basic query only works with tables that have no unique keys (including primary keys), which heavily limits functionality.
Moreover, Phoenix doesn't support tables without a primary key, and it supports UPSERT instead of INSERT.
So the JDBC sink should build an UPSERT-like query, whose format depends on the target DB type.
The JDBC sink needs a list of supported DB types, and should let users select the correct one.
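A minimal sketch of how such per-database UPSERT-like queries could be generated (the class and method names are hypothetical, not Streamline's actual code):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: one possible way the JDBC sink could build an idempotent
// upsert query per supported DB type.
class UpsertQueryBuilder {
    enum DbType { MYSQL, POSTGRESQL, PHOENIX }

    static String buildUpsert(DbType type, String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String placeholders = columns.stream().map(c -> "?").collect(Collectors.joining(", "));
        switch (type) {
            case MYSQL: {
                // MySQL has no standard UPSERT; ON DUPLICATE KEY UPDATE is the closest.
                String updates = columns.stream().map(c -> c + " = VALUES(" + c + ")")
                        .collect(Collectors.joining(", "));
                return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
                        + " ON DUPLICATE KEY UPDATE " + updates;
            }
            case POSTGRESQL:
                // PostgreSQL 9.5+: ON CONFLICT DO NOTHING drops at-least-once duplicates.
                return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
                        + " ON CONFLICT DO NOTHING";
            case PHOENIX:
                // Phoenix only supports UPSERT, never plain INSERT.
                return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")";
            default:
                throw new IllegalArgumentException("Unsupported DB type: " + type);
        }
    }
}
```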
This is a subtask of #535
Support for translating a StreamlineSource (KafkaSource) to the equivalent Flink source.
Refer: KafkaSpoutFluxComponent.java
Building the common module fails because the dependency com.hortonworks.registries:registry-common:jar:0.1.0-SNAPSHOT cannot be found.
[ERROR] Failed to execute goal on project common: Could not resolve dependencies for project com.hortonworks.streamline:common:jar:0.1.0-SNAPSHOT: Failure to find com.hortonworks.registries:registry-common:jar:0.1.0-SNAPSHOT in https://repository.apache.org/content/repositories/snapshots/
I checked both http://nexus-private.hortonworks.com/nexus/content/groups/public/com/hortonworks/registries/registry-common/ and http://repo1.maven.org/maven2/com/hortonworks/registries/ but neither of them has the dependency.
Sorting by either Name or Last Updated results in the "LOADING..." text, and the results never appear.
Some suggestions:
This is a subtask of #535
The topology DAG has a traverse method that accepts a TopologyDagVisitor and calls back the appropriate visit methods as the different components in the topology DAG are traversed. In the case of Storm this is StormTopologyFluxGenerator. A similar implementation will have to be provided for Flink (e.g. FlinkDataStreamGenerator).
Refer: StormTopologyFluxGenerator
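As a rough illustration of the visitor wiring described above (the trimmed-down types below are illustrative only; Streamline's real TopologyDagVisitor has many more visit() overloads, and FlinkDataStreamGenerator is just the name suggested in this task):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the visitor dispatch used by TopologyDag.traverse().
abstract class TopologyDagVisitor {
    void visit(Source source) { }
    void visit(Sink sink) { }
}

// Each component accepts the visitor and dispatches to the matching overload.
class Source { void accept(TopologyDagVisitor v) { v.visit(this); } }
class Sink   { void accept(TopologyDagVisitor v) { v.visit(this); } }

// A Flink-flavoured generator would collect translation results as it visits.
class FlinkDataStreamGenerator extends TopologyDagVisitor {
    final List<String> translated = new ArrayList<>();
    @Override void visit(Source source) { translated.add("flink-source"); }
    @Override void visit(Sink sink) { translated.add("flink-sink"); }
}
```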
Manually adding a service pool includes a description; in the case of auto-add, this is replaced by the Ambari URL. Do we want to include a description for auto-added service pools as well?
Hi, thanks for this great project and the presentation yesterday.
Which streaming engines besides Storm are supported? Are there any plans to integrate with Flink?
http://streamline.readthedocs.io/en/latest/
returns the ReadTheDocs 404 page ("SORRY, This page does not exist yet.").
When saving a new version, if a topology test case exists for the current version, saving the version fails with an integrity-constraint violation in the middle of removing topology dependencies.
Cloning the current version of the topology fails, and removing the current version of the topology also fails.
ERROR [06:44:14.978] [dw-2191 - POST /api/v1/catalog/topologies/381/versions/save] c.h.s.s.c.s.StreamCatalogService - Got exception while copying topology dependencies
com.hortonworks.streamline.storage.exception.StorageException: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case`, CONSTRAINT `topology_test_run_case_ibfk_1` FOREIGN KEY (`topologyId`) REFERENCES `topology` (`id`))
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor$QueryExecution.executeUpdate(AbstractQueryExecutor.java:223)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor.executeUpdate(AbstractQueryExecutor.java:180)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor.delete(AbstractQueryExecutor.java:85)
at com.hortonworks.streamline.storage.impl.jdbc.JdbcStorageManager.remove(JdbcStorageManager.java:73)
at com.hortonworks.streamline.storage.cache.writer.StorageWriteThrough.remove(StorageWriteThrough.java:41)
at com.hortonworks.streamline.storage.CacheBackedStorageManager.remove(CacheBackedStorageManager.java:60)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.removeOnlyTopologyEntity(StreamCatalogService.java:367)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.addTopology(StreamCatalogService.java:354)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.cloneTopologyVersion(StreamCatalogService.java:467)
at com.hortonworks.streamline.streams.service.TopologyCatalogResource.saveTopologyVersion(TopologyCatalogResource.java:333)
...
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case`, CONSTRAINT `topology_test_run_case_ibfk_1` FOREIGN KEY (`topologyId`) REFERENCES `topology` (`id`))
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3970)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3906)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5098)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1994)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor$QueryExecution.executeUpdate(AbstractQueryExecutor.java:221)
... 67 common frames omitted
ERROR [06:44:15.130] [dw-2191 - POST /api/v1/catalog/topologies/381/versions/save] c.h.s.s.s.TopologyCatalogResource - Got exception: [RuntimeException] / message [com.hortonworks.streamline.storage.exception.StorageException: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case_source`, CONSTRAINT `topology_test_run_case_source_ibfk_2` FOREIGN KEY (`sourceId`) REFERENCES `topology_source` (`id`))] / related resource location: [com.hortonworks.streamline.streams.service.TopologyCatalogResource.saveTopologyVersion](TopologyCatalogResource.java:333)
java.lang.RuntimeException: com.hortonworks.streamline.storage.exception.StorageException: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case_source`, CONSTRAINT `topology_test_run_case_source_ibfk_2` FOREIGN KEY (`sourceId`) REFERENCES `topology_source` (`id`))
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.removeTopology(StreamCatalogService.java:396)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.cloneTopologyVersion(StreamCatalogService.java:472)
at com.hortonworks.streamline.streams.service.TopologyCatalogResource.saveTopologyVersion(TopologyCatalogResource.java:333)
...
Caused by: com.hortonworks.streamline.storage.exception.StorageException: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case_source`, CONSTRAINT `topology_test_run_case_source_ibfk_2` FOREIGN KEY (`sourceId`) REFERENCES `topology_source` (`id`))
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor$QueryExecution.executeUpdate(AbstractQueryExecutor.java:223)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor.executeUpdate(AbstractQueryExecutor.java:180)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor.delete(AbstractQueryExecutor.java:85)
at com.hortonworks.streamline.storage.impl.jdbc.JdbcStorageManager.remove(JdbcStorageManager.java:73)
at com.hortonworks.streamline.storage.cache.writer.StorageWriteThrough.remove(StorageWriteThrough.java:41)
at com.hortonworks.streamline.storage.CacheBackedStorageManager.remove(CacheBackedStorageManager.java:60)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.removeTopologySource(StreamCatalogService.java:1284)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.removeTopologyDependencies(StreamCatalogService.java:446)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.removeTopology(StreamCatalogService.java:393)
... 60 common frames omitted
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails (`streamline`.`topology_test_run_case_source`, CONSTRAINT `topology_test_run_case_source_ibfk_2` FOREIGN KEY (`sourceId`) REFERENCES `topology_source` (`id`))
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3970)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3906)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5098)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1994)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
at com.hortonworks.streamline.storage.impl.jdbc.provider.sql.factory.AbstractQueryExecutor$QueryExecution.executeUpdate(AbstractQueryExecutor.java:221)
After receiving this message, the topology is not accessible because there are two CURRENT versions for the topology id. The stack trace is below:
ERROR [06:51:57.210] [dw-2232 - GET /api/v1/catalog/topologies/381?detail=true&latencyTopN=3] c.h.s.s.s.TopologyCatalogResource - Got exception: [IllegalStateException] / message [More than one 'CURRENT' version for topology id: 381] / related resource location: [com.hortonworks.streamline.streams.service.TopologyCatalogResource.getTopologyById](TopologyCatalogResource.java:156)
java.lang.IllegalStateException: More than one 'CURRENT' version for topology id: 381
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.getCurrentTopologyVersionInfo(StreamCatalogService.java:224)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.getCurrentVersionId(StreamCatalogService.java:317)
at com.hortonworks.streamline.streams.service.TopologyCatalogResource.getTopologyById(TopologyCatalogResource.java:156)
...
While starting the server in secure mode, we create an admin-role user but don't add metadata to it, which breaks the UI.
The metadata is added after running bootstrap.sh, but if by any chance the end user forgets to run the script, we are supposed to show a page conveying that; right now that page doesn't come up because of the missing metadata.
Currently the Ambari URL suggested text in the text box is not editable. It also cannot be copied out to change it in a separate editor. Let's make that text editable.
The URL shown in the text box should have "ambari_host:port" and "CLUSTER_NAME" styled differently to help the user understand what needs to be changed. Bold or a slight color change, either works.
Also, when I omitted "clusters" in the URL, I just got a generic error, "Not a valid URL". Instead, we could throw an error that says: "Not a valid Ambari URL. Please follow the convention - http://ambari_host:port/api/v1/clusters/CLUSTER_NAME"
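A sketch of what a more specific validation could look like (the regex and error wording are suggestions, not existing code):

```java
import java.util.regex.Pattern;

// Sketch of a stricter Ambari URL check that can produce a more helpful error.
class AmbariUrlValidator {
    private static final Pattern AMBARI_URL = Pattern.compile(
            "https?://[^/]+/api/v1/clusters/[^/]+/?");

    // Returns null when the URL is valid, otherwise the suggested error message.
    static String validate(String url) {
        if (url != null && AMBARI_URL.matcher(url).matches()) {
            return null;
        }
        return "Not a valid Ambari URL. Please follow the convention - "
                + "http://ambari_host:port/api/v1/clusters/CLUSTER_NAME";
    }
}
```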
Unlike its description, DRIVER CLASS NAME in the JDBC sink is in fact used for the data source class name, not the driver class name. Moreover, the JDBC sink doesn't allow providing a driver class name instead of a data source class name.
Since some storage drivers don't support a data source class (javax.sql.DataSource), we would be better off changing the usage of DRIVER CLASS NAME to the driver class name, as the name suggests.
This might be a breaking change for some users, since putting com.mysql.jdbc.jdbc2.optional.MysqlDataSource or org.postgresql.ds.PGSimpleDataSource will no longer work afterwards.
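For context, connecting via a driver class name rather than a DataSource class can be sketched as follows (the helper class is hypothetical; the MySQL class names in the comment are only examples):

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Loading by driver class name works for any JDBC driver, even ones that
// ship no javax.sql.DataSource implementation.
class DriverClassConnector {
    static Connection connect(String driverClassName, String jdbcUrl,
                              String user, String password) throws Exception {
        // Registers the driver with DriverManager as a side effect of class loading.
        Class.forName(driverClassName);
        return DriverManager.getConnection(jdbcUrl, user, password);
    }
}
// Example (would need the MySQL driver on the classpath):
//   connect("com.mysql.jdbc.Driver", "jdbc:mysql://host:3306/db", "user", "pass");
```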
Right now we let users deploy an application that has a Kafka sink without validating whether the topic has an associated schema in the Schema Registry. We need to add a validation that prevents users from deploying if the schema does not exist.
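One way to sketch such a pre-deploy check, with the registry lookup abstracted behind a hypothetical interface rather than the real Schema Registry client API:

```java
// Illustrative pre-deploy validation; not Streamline's actual code.
class KafkaSinkSchemaValidator {
    // Abstraction over the Schema Registry; a real implementation would
    // delegate to the registry client.
    interface SchemaLookup {
        boolean schemaExists(String schemaName);
    }

    // Fails fast before deployment when the sink's topic has no schema.
    static void validateBeforeDeploy(String topic, SchemaLookup registry) {
        if (!registry.schemaExists(topic)) {
            throw new IllegalStateException(
                    "Topic '" + topic + "' has no associated schema in the Schema Registry");
        }
    }
}
```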
Following up on #534, I am opening an issue here to discuss Flink support.
What API would Flink need to implement? How much work needs to be done?
In STREAMLINE-769 we added a topology security config to let users input a principal and keytab for each cluster.
This approach works, but the given principal needs to be able to impersonate across multiple services like HDFS, HBase, and Hive, and users need to know where the keytab is, which increases operational complexity.
Instead of doing that, we can just use the Nimbus principal and keytab information for Storm. With this approach we don't need to expose the principal and keytab path to end users.
(Operators just need to know them so that they can set up the Nimbus principal to be able to access and impersonate HDFS, HBase, Hive, and so on.)
The Kafka source has 'readerSchemaVersion' as an optional config key, and its default value is set to null.
But since Flux can't handle a null value and throws an error, the key is effectively required.
We need to either change readerSchemaVersion to be required, or add a new constructor for AvroKafkaSpoutTranslator that excludes readerSchemaVersion from its parameters so that it can be handled as null.
Moreover, we need to handle an empty string as null for readerSchemaVersion as well, since it currently throws a NumberFormatException.
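A minimal sketch of the null/empty-string handling (the helper name is hypothetical):

```java
// Sketch of treating both null and empty string as "no reader schema version",
// avoiding the NumberFormatException described above.
class ReaderSchemaVersionParser {
    static Integer parseReaderSchemaVersion(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null; // let the spout fall back to the writer schema version
        }
        return Integer.parseInt(value.trim());
    }
}
```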
After adding a manual service pool, we need to check other services as well.
The UI code most likely passes the value of clustersSecurityConfig in the topology-level config as [{}]. That causes an issue when deploying the topology, since we are not checking for an empty map.
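A sketch of guarding against such effectively-empty entries (the config shape, a list of maps, follows the description above; the helper is hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: drop null or empty maps from clustersSecurityConfig before
// the deployment code interprets them as real security entries.
class SecurityConfigSanitizer {
    static List<Map<String, Object>> dropEmptyEntries(List<Map<String, Object>> configs) {
        return configs.stream()
                .filter(m -> m != null && !m.isEmpty())
                .collect(Collectors.toList());
    }
}
```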
For other sources/sinks, "artifacts" handles dependencies easily because it's clear which external storage the source/sink connects to.
But the JDBC sink must support any JDBC-compatible external storage (MySQL, PostgreSQL, Oracle, Phoenix, and so on), so we can't and shouldn't add all of their drivers to the artifacts of the JDBC sink bundle.
Instead, we can let users input the artifact coordinates of the driver and add it to the topology's dependency list. This provides the flexibility of selecting the driver for the JDBC sink.
Having Docker support for Streamline would be convenient for both developers and customers. Since a proper environment requires an RDBMS, Storm, Kafka, etc., most customers would likely use a docker-compose script to orchestrate deployment. The scope of this would simply be to provide an isolated image for Streamline. It would be best to use the Docker Maven plugin so that a profile can be created for building Docker images; this would allow people not interested in Docker to ignore it and not require the Docker engine on their machine for building.
Implement a Storm hint provider class.
We need to know whether the Storm cluster is in secure mode or not, and the list of cluster names for the clusters' security configuration.
This is a subtask of #535
Create a Flink-specific topology actions implementation. This should implement the topology lifecycle actions like deploy, kill, and status (the main ones). Other APIs could be left unsupported for now and added as needed.
Refer: StormTopologyActionsImpl
This is a subtask of #535
Streamline has support for rules executed via RuleProcessorRuntime. The design-time (layout) representation of this is RulesProcessor.
This task is to translate and execute rules in Flink by reusing RuleProcessorRuntime.
StormMetadataResource does not use Subject.doAs when hitting the Storm UI REST API to get information about topologies; hence a 401 Unauthorized response is received. @hmcl Any clue about this?
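For reference, wrapping the call in Subject.doAs could look roughly like this (the HTTP call itself is elided; the wrapper class is hypothetical):

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

// Sketch of running the Storm UI REST call under the logged-in server principal
// so that Kerberos/SPNEGO-aware HTTP clients can authenticate instead of
// receiving a 401.
class SecureRestCaller {
    static String callAsSubject(Subject subject, PrivilegedAction<String> restCall) {
        // Executes restCall with the subject's credentials bound to the
        // access-control context for the duration of the call.
        return Subject.doAs(subject, restCall);
    }
}
```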
Running with the streamline-dev.yaml file, I can get the site up and running, but I get "Not authorized" errors, and when I try to connect to a remote system running HDP 2.6, it just adds another "Not authorized" error. Is there a public way to actually run a dev environment via the current Java self-hosting or Docker?
This is a subtask of #535
Support for translating a StreamlineSink for Kafka to the equivalent Flink Kafka sink.
Refer: KafkaBoltFluxComponent.java
Create a singleton LoginManager that uses the JAAS config to log in the StreamlineServer principal when the server starts. It should populate and return the Subject so that it can be used elsewhere as needed.
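A minimal sketch of such a singleton (the JAAS entry name "StreamlineServer" and the lazy login timing are assumptions):

```java
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

// Sketch of a lazily-initialized singleton around JAAS login.
final class LoginManager {
    private static final LoginManager INSTANCE = new LoginManager();
    private Subject subject;

    private LoginManager() { }

    static LoginManager getInstance() {
        return INSTANCE;
    }

    // Called once at server startup; the resulting Subject is reused afterwards.
    synchronized Subject login() throws LoginException {
        if (subject == null) {
            // "StreamlineServer" must match an entry in the JAAS config file.
            LoginContext context = new LoginContext("StreamlineServer");
            context.login();
            subject = context.getSubject();
        }
        return subject;
    }
}
```

Callers would use LoginManager.getInstance().login() wherever the Subject is needed (for example, inside the Subject.doAs wrapper discussed for StormMetadataResource).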
If a user creates an environment by picking HBase/Hive from an HDP cluster added via Ambari, then HDFS from the same cluster should automatically be added to the environment by the UI. The reason is that HBase has hbase.rootdir, which uses hdfs-site.xml and core-site.xml to connect to HDFS. Without HDFS being present in the environment, the Storm topology fails at runtime, throwing an exception from HBaseClient that it cannot connect to HDFS.
Comments from the other issue, ported here.
@shahsank3t Can you or someone familiar with the UI look into this?
@HeartSaVioR I see that we are packaging hdfs-site.xml and core-site.xml with the topology jar if we have just an HBase sink in the topology. Just curious about the logic used to pick hdfs-site and core-site if we have two HDFS services (both from different service pools) added to the same environment. Will it pick the right one?
@priyank5485
No, we can't determine which is the right one, so services that rely on configuration files, like HDFS, HBase, and Hive, should be selected only once.
It may be better to prevent selecting the same service multiple times in an environment until we support multi-clusters for all services properly. That may be an unnecessary limitation for other services, but it is the safest way to avoid the issue.
We might also want to force selecting a cluster for each component in order to provide the corresponding configurations when deploying.
(It may also be better to change fields that depend on the cluster to be read-only.)
cc. @harshach @shahsank3t
This allows the consumption of messages using a different schema version than the one that was used to write the data.
Create service pools named "aaa" and "Zzz". When you sort by name, "Zzz" appears first. Sorting by name should be case insensitive.
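The fix could be as simple as comparing names with a case-insensitive comparator (sketch; the helper class is hypothetical):

```java
import java.util.List;

// Sketch of sorting service pool names case-insensitively, so "aaa"
// sorts before "Zzz" rather than after it.
class ServicePoolSorter {
    static List<String> sortByName(List<String> names) {
        names.sort(String.CASE_INSENSITIVE_ORDER);
        return names;
    }
}
```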
Streamline caches the REST API output for metrics, so clicking "Refresh" sometimes doesn't update topology metrics; it doesn't force a metrics update.
It would be better to have a "force topology metrics update" for when it is necessary. Both backend and frontend changes are probably needed.
The stack trace below is thrown while exporting a topology with a projection processor. The projection processor is not handled in TopologyExportVisitor.
java.lang.RuntimeException: Unexpected exception thrown while trying to add the rule 8
at com.hortonworks.streamline.streams.catalog.topology.component.TopologyExportVisitor.visit(TopologyExportVisitor.java:101)
at com.hortonworks.streamline.streams.layout.component.impl.RulesProcessor.accept(RulesProcessor.java:99)
at com.hortonworks.streamline.streams.layout.component.TopologyDag.traverse(TopologyDag.java:180)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.doExportTopology(StreamCatalogService.java:575)
at com.hortonworks.streamline.streams.catalog.service.StreamCatalogService.exportTopology(StreamCatalogService.java:561)
at com.hortonworks.streamline.streams.service.TopologyCatalogResource.exportTopology(TopologyCatalogResource.java:581)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
While creating an environment that has either a Hive or HBase service from a service pool, we need to add validation satisfying the following conditions.
@sank
Let me know if you have any more questions regarding this