Comments (7)
@lakeofsand I believe this enhancement proposal is now obsolete given that we have the JDBC Sink Connector, which can do this directly. Feel free to reopen if you are talking about something other than the thrift server for Spark.
from kafka-connect-hdfs.
This is not exactly the same as the JDBC Sink connector.
With the HDFS sink connector, we also need a Hive metastore service to sync with Hive when a new partition's data comes in.
What we need is support for syncing with Hive through the Spark thrift server, not through a Hive metastore service.
@lakeofsand the Spark thrift server is akin to the HiveServer2 implementation and as such has no state to sync: http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
I'm not sure what the current implementation is lacking, but if you can lay out an example then that would be helpful.
Sorry for my poor explanation...
Let me put it this way:
Currently, kafka-connect-hdfs uses the class 'HiveMetastore' to perform Hive actions, for example adding partitions when new data comes in. It relies on 'org.apache.hadoop.hive.metastore.*' and needs a Hive metastore service in the cluster.
In our Spark 1.6 cluster, there is no Hive metastore service. We would need to deploy a new one just for 'kafka-connect-hdfs', which is heavyweight and not worthwhile.
So we added a thin implementation, 'Hive2Thrift', built only upon 'java.sql.*'. It can do the same thing, but only needs the standard 'java.sql.*' classes and a Spark thrift server.
I am not an expert, but in our Spark cluster it is really not worth deploying a heavyweight Hive metastore service.
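The thin java.sql-based approach described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the commenter's actual code: the class name Hive2Thrift and its execute method follow the names used in this thread, while the JDBC URL format and the addPartitionDDL helper are illustrative assumptions (the Spark thrift server speaks the HiveServer2 protocol, so the standard Hive JDBC driver should apply).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical sketch of a thin wrapper over java.sql that talks to a
// Spark thrift server. Only standard JDBC classes are needed at this
// layer; the Hive JDBC driver is loaded by name at connect time.
public class Hive2Thrift implements AutoCloseable {
  private final Connection conn;

  public Hive2Thrift(String jdbcUrl, String user, String password) throws Exception {
    // The Spark thrift server implements the HiveServer2 protocol, so the
    // standard Hive JDBC driver can be used, e.g. with a URL like
    // "jdbc:hive2://spark-thrift-host:10000/default" (host/port assumed).
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    this.conn = DriverManager.getConnection(jdbcUrl, user, password);
  }

  // Runs a single DDL statement (CREATE DATABASE, CREATE TABLE, ALTER TABLE, ...).
  public void execute(String ddl) throws Exception {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute(ddl);
    }
  }

  // Illustrative helper: builds the ADD PARTITION DDL that would be issued
  // when a new partition's data arrives, replacing the metastore call.
  public static String addPartitionDDL(String database, String table,
                                       String partitionSpec, String location) {
    return "ALTER TABLE " + database + "." + table
        + " ADD IF NOT EXISTS PARTITION (" + partitionSpec + ")"
        + " LOCATION '" + location + "'";
  }

  @Override
  public void close() throws Exception {
    conn.close();
  }
}
```

A caller would construct this with the thrift server's JDBC URL and then issue plain HiveQL DDL through execute(), with no metastore client on the classpath.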
@lakeofsand so are you suggesting an architectural change here to remove the HiveMetastore dependency of the connector for those HDFS instances that have no Hive service associated with them? I'll reopen this but I think we need more details here because that's a pretty non-trivial change.
Maybe there is no need for an 'architectural change'.
In our local implementation, we just extend a class named 'ThriftUtil' from HiveUtil (io.confluent.connect.hdfs.hive), like:
public class ThriftUtil extends HiveUtil {
  ...
  @Override
  public void createTable(String database, String tableName, Schema schema, Partitioner partitioner) throws Hive2ThriftException {
    // Create the database through plain DDL instead of a metastore call.
    StringBuilder createDBDDL = new StringBuilder();
    createDBDDL.append("CREATE DATABASE IF NOT EXISTS ").append(database);
    hive2Thrift.execute(createDBDDL.toString());

    // Build and run the CREATE TABLE statement the same way.
    String createTableDDL = getCreateTableDDL(database, tableName, schema, partitioner, this.lifeCycle);
    log.debug("create table ddl {}", createTableDDL);
    hive2Thrift.execute(createTableDDL);
  }
  ...
}
But I can't find an appropriate way to override 'alterSchema()'.
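For alterSchema(), one hypothetical option in the same spirit is to express the schema change as plain HiveQL DDL and send it through the thrift server, instead of calling the metastore API. The sketch below is an assumption, not connector code: the AlterSchemaDDL class is invented for illustration, and the name-to-Hive-type map stands in for deriving columns from the Connect Schema, which the real override would have to do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a schema change expressed as HiveQL DDL, which a
// thin JDBC wrapper could execute against the Spark thrift server.
public class AlterSchemaDDL {
  // "ALTER TABLE ... REPLACE COLUMNS" rewrites the table's full column
  // list, so a schema change can be applied through DDL alone. The column
  // map (name -> hive type) is an illustrative stand-in for columns
  // derived from the Connect Schema; insertion order is preserved.
  public static String replaceColumnsDDL(String database, String table,
                                         Map<String, String> columns) {
    StringBuilder ddl = new StringBuilder();
    ddl.append("ALTER TABLE ").append(database).append('.').append(table)
       .append(" REPLACE COLUMNS (");
    boolean first = true;
    for (Map.Entry<String, String> col : columns.entrySet()) {
      if (!first) {
        ddl.append(", ");
      }
      ddl.append(col.getKey()).append(' ').append(col.getValue());
      first = false;
    }
    return ddl.append(')').toString();
  }

  public static void main(String[] args) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put("id", "int");
    cols.put("name", "string");
    System.out.println(replaceColumnsDDL("mydb", "events", cols));
  }
}
```

Note that REPLACE COLUMNS only rewrites metadata, so whether this is safe depends on the file format and on the same compatibility checks the connector already applies before altering a schema.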