Comments (3)
The Batch.send() method is simply the one that actually sends the data to the Thrift server, hence it takes the most time (but it is I/O bound). With your approach all ingestion into HBase happens through a single Thrift server, which in turn has to communicate with all the region servers to actually store the data (which in turn triggers HDFS replication traffic). This is obviously slower than ingesting bulk data from within the cluster itself, as is the case with the MapReduce job, since in that case the work is actually distributed over the cluster.
Have you tried running multiple Thrift servers and using multiple connections, i.e. multiple HappyBase Connection instances to different remote hosts? Of course the splitting/chunking would move to your Python code in that case. The multiprocessing module (especially .imap_unordered() with an appropriate chunk size) may help you here.
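For illustration, the idea above could look roughly like the sketch below. The host names, table name, chunk size and the helper functions chunked(), assign_hosts(), write_chunk() and ingest() are all hypothetical and not part of HappyBase; only Connection, table(), batch() and put() are HappyBase calls. It assumes the rows arrive as (row_key, data) pairs.

```python
import itertools
from multiprocessing import Pool

# Hypothetical values: replace with your own Thrift server hosts and table.
THRIFT_HOSTS = ["thrift1.example.com", "thrift2.example.com"]
TABLE_NAME = "mytable"


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def assign_hosts(chunks, hosts):
    """Pair each chunk with a host, cycling through the hosts round-robin."""
    return zip(itertools.cycle(hosts), chunks)


def write_chunk(task):
    """Write one chunk of (row_key, data) pairs through one Thrift server."""
    # Imported here so that every worker process opens its own connection.
    import happybase

    host, rows = task
    connection = happybase.Connection(host)
    try:
        table = connection.table(TABLE_NAME)
        # The batch is sent to the Thrift server when the with block exits.
        with table.batch() as batch:
            for row_key, data in rows:
                batch.put(row_key, data)
    finally:
        connection.close()
    return len(rows)


def ingest(rows, hosts=THRIFT_HOSTS, chunk_size=1000, processes=4):
    """Spread chunks of rows over several Thrift servers; return rows written."""
    tasks = assign_hosts(chunked(rows, chunk_size), hosts)
    with Pool(processes) as pool:
        return sum(pool.imap_unordered(write_chunk, tasks))
```

Opening a connection per chunk keeps the sketch short; with small chunks it is probably worth reusing one connection per worker process instead. On platforms that spawn worker processes, call ingest() from under an `if __name__ == "__main__":` guard.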
from happybase.
Thank you very much for your help. Your suggestion works. With multiple Thrift servers it performs much better and takes about 15 minutes to write all the data.
Another optimization we tried is pre-creating the regions (reference: http://hbase.apache.org/book/perf.writing.html). I didn't find any API in happybase for creating regions, so I used the HBase shell instead. With the regions pre-created, it takes less than 4 minutes to write all the data.
Your work makes our life much easier. Thanks again.
Great, glad to hear. As far as I know there is no way to create regions using the Thrift API, so it seems you'll have to use the HBase shell for that.
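In case it helps others, here is a rough sketch of how the split points for the HBase shell could be generated from Python. It assumes row keys start with a uniformly distributed hex prefix, which may not match your data; the table name, column family and the helpers hex_split_keys() and create_command() are made up for this example. The output is only a string to paste into the HBase shell, nothing here talks to HBase.

```python
def hex_split_keys(num_regions, width=2):
    """Return num_regions - 1 evenly spaced hex split points of `width` chars."""
    if num_regions < 2:
        return []
    key_space = 16 ** width
    return [
        format(i * key_space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def create_command(table, family, split_keys):
    """Format an HBase shell `create` statement with explicit split points."""
    quoted = ", ".join("'{}'".format(key) for key in split_keys)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


if __name__ == "__main__":
    # Hypothetical table and column family names.
    print(create_command("mytable", "cf", hex_split_keys(4)))
    # prints: create 'mytable', 'cf', SPLITS => ['40', '80', 'c0']
```

If the keys are not evenly distributed, the split points should come from the actual key distribution instead.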