Comments (12)
Tests would be implemented by mocking out the Connection class so that no actual sockets need to be opened.
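A rough sketch of what such a test could look like. Everything here is an assumption for illustration: `TinyPool` and its `connection_factory` argument are hypothetical stand-ins, not HappyBase code, and the `Connection` class is replaced by a `unittest.mock.MagicMock` so that nothing touches the network.

```python
import contextlib
import queue
from unittest import mock


class TinyPool:
    """Hypothetical stand-in for a pool; not HappyBase's implementation."""

    def __init__(self, size, connection_factory, **kwargs):
        self._queue = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._queue.put(connection_factory(**kwargs))

    @contextlib.contextmanager
    def connection(self):
        conn = self._queue.get()
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the block raised.
            self._queue.put(conn)


def run_mocked_pool_check():
    # The MagicMock plays the role of the Connection class: calling it
    # returns a mock instance, so no socket is ever opened.
    fake_connection_cls = mock.MagicMock(name="Connection")
    fake_connection_cls.return_value.tables.return_value = [b"foobar"]

    pool = TinyPool(size=2, connection_factory=fake_connection_cls, host="example")
    with pool.connection() as conn:
        tables = conn.tables()

    return fake_connection_cls.call_count, tables
```

The check returns how often the fake class was instantiated and what the mocked `tables()` call produced, which a real test case could assert on.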
from happybase.
I have a prototype solution working. I'm going to battle-test it in production for a week before I come back with an official patch.
import time
import random
import contextlib

import happybase
from socketpool import ConnectionPool
from socketpool.conn import TcpConnector


class HappybaseConnectionPool(object):
    ''' singleton to share a connection pool per process '''

    pool = None
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            # object.__new__() takes no extra arguments, so pass only cls
            cls._instance = super(HappybaseConnectionPool, cls).__new__(cls)
        return cls._instance

    def __init__(self, host, **options):
        # __init__ runs on every instantiation; build the pool only once
        if not self.pool:
            options['host'] = host
            self.pool = ConnectionPool(
                factory=HappybaseConnector,
                max_size=options.get('max_size', 10),
                options=options,
            )

    def connection(self, **options):
        return self.pool.connection(**options)

    @contextlib.contextmanager
    def table(self, table_name):
        with self.pool.connection() as connector:
            yield connector.table(table_name)


class HappybaseConnector(TcpConnector):

    def __init__(self, host, port, pool=None, **kwargs):
        self.host = host
        self.port = port
        self.connection = happybase.Connection(self.host, self.port)
        self._connected = True
        # use a 'jiggle' value to make sure there is some
        # randomization to expiry, to avoid many conns expiring very
        # closely together.
        self._life = time.time() - random.randint(0, 10)
        self._pool = pool
        self.logging = kwargs.get('logging')

    def is_connected(self):
        if self._connected and self.connection.transport.isOpen():
            try:
                # isOpen is unreliable, actually try to do something
                self.connection.tables()
                return True
            except Exception:
                pass
        return False

    def handle_exception(self, exception):
        if self.logging:
            self.logging.error(exception)
        else:
            print(exception)

    def invalidate(self):
        self.connection.close()
        self._connected = False
        self._life = -1

    def open(self):
        # the connection is already opened in __init__
        pass

    def close(self):
        # hand the connector back to the pool instead of closing the socket
        self.release()

    def __getattr__(self, name):
        # proxy a fixed set of table-level calls to the wrapped connection
        if name in ['table', 'tables', 'create_table', 'delete_table',
                    'enable_table', 'disable_table', 'is_table_enabled',
                    'compact_table']:
            return getattr(self.connection, name)
        else:
            raise AttributeError(name)
You use it like this:
pool = HappybaseConnectionPool('localhost', port=9090)
with pool.connection() as connection:
    # 'cf' is a placeholder column family name
    connection.create_table('foobar', {'cf': dict()})
from happybase.
Shouldn't this support multiple Thrift servers? pycassa has support for that.
from happybase.
I'm hitting a bunch of Thrift instances behind a load balancer, which I think makes sense to run externally. If we did load balancing in process, it would mean implementing options like round-robin, least-connection, etc. I'm not sure how you would deal with least-connection across multiple Python processes; each would keep its own connection counts, independently of the others.
I think it's better left to an external load balancer.
from happybase.
Well, long term you could make it aware of region server splits for performance. Netflix has a Cassandra client that does this:
http://techblog.netflix.com/2012/01/announcing-astyanax.html
from happybase.
I think I agree with Chase. Connection pooling is hard, and it adds quite a bit of complexity. Other solutions like load balancers are actually designed to handle this problem on a network level (instead of a process level).
from happybase.
I actually had a go at this since it also seems the way to go for multi-threading support. I've pushed my current code to a feature branch, which can be seen here: https://github.com/wbolster/happybase/commits/connection-pool
Copy/paste from the (w-i-p) docs:
Thread-safe connection pool.
A connection pool allows multiple threads to share connections. The
`size` parameter specifies how many connections this pool manages.
The pool is lazy; it opens new connections when requested.
To ensure that connections are actually returned to the pool after
use, connections can only be obtained using Python's context manager
protocol, i.e. the ``with`` statement. Example::
    pool = ConnectionPool(size=3, host='...')
    with pool.connection() as connection:
        print(connection.tables())
When a thread asks for a connection using
:py:meth:`ConnectionPool.connection`, it is granted a lease, during
which the thread has exclusive access to the obtained connection. To
avoid starvation, connections should be returned as quickly as
possible. In practice this means that the amount of code included
inside the ``with`` block should be kept to an absolute minimum.
The connection pool is designed so that any thread can hold at most
one connection at a time. This does not require any coordination
from the application: when a thread holds a connection and asks for
a connection for a second time (e.g. because a called function also
wants to use a connection), the same connection instance it already
holds is returned. Ultimately, once the outer ``with`` block (which
may be in a function up in the call stack) terminates, the
connection is returned to the pool.
Additional keyword arguments are passed unmodified to the
:py:class:`happybase.Connection` constructor, with the exception of
the `autoconnect` argument, since maintaining connections is the
task of the pool.
:param int size: the maximum number of concurrently open connections
:param kwargs: keyword arguments passed to
:py:class:`happybase.Connection`
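The lease behaviour described in the docs above (one connection per thread, with nested requests returning the connection already held) can be sketched roughly as follows. This is not the code from the feature branch: `LeasePool` and `make_connection` are hypothetical names, and unlike the lazy pool described above, this sketch creates all connections up front to stay short.

```python
import contextlib
import queue
import threading


class LeasePool:
    """Illustrative pool; `make_connection` is a hypothetical factory."""

    def __init__(self, size, make_connection):
        self._queue = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._queue.put(make_connection())
        # Remembers, per thread, which connection that thread holds.
        self._local = threading.local()

    @contextlib.contextmanager
    def connection(self, timeout=None):
        held = getattr(self._local, "connection", None)
        if held is not None:
            # Nested request from the same thread: hand back the
            # connection it already holds; the outer block returns it.
            yield held
            return

        conn = self._queue.get(block=True, timeout=timeout)
        self._local.connection = conn
        try:
            yield conn
        finally:
            self._local.connection = None
            self._queue.put(conn)


# With a pool of size 1, the nested request would block forever if it
# did not reuse the connection the thread already holds.
pool = LeasePool(size=1, make_connection=object)
with pool.connection() as outer:
    with pool.connection() as inner:
        same = inner is outer
```

The thread-local lookup is what makes nested `with` blocks safe without any coordination from the application; only the outermost block puts the connection back on the queue.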
What do you think? I'd appreciate comments/flames/feedback!
from happybase.
Looks good to me. Probably makes more sense than including a dependency. It would be cool if there was a way of getting a single pool object w/o passing it around everywhere. That's what I'm using a singleton for; but I suppose you could always layer that on top of what you have.
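One way such a layer might look, as a sketch only: a module-level accessor that creates the pool on first use and returns the same object afterwards. The `get_pool` function and its `factory` argument are hypothetical; `dict` stands in for the real pool class so the snippet runs without an HBase server.

```python
import threading

_pool = None
_pool_lock = threading.Lock()


def get_pool(factory, **kwargs):
    """Return the process-wide pool, creating it on first use.

    `factory` is whatever builds the pool (hypothetical here); the
    arguments are only used by the call that creates the pool.
    """
    global _pool
    with _pool_lock:
        if _pool is None:
            _pool = factory(**kwargs)
        return _pool


# Stand-in factory so the sketch runs without an HBase server.
first = get_pool(dict, size=3, host="localhost")
second = get_pool(dict, size=5, host="elsewhere")
```

Both calls return the same object, and the arguments of the second call are ignored, which is the usual trade-off of a process-wide singleton.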
from happybase.
Okay, I have landed a Connection Pool implementation in the master branch. Please try it out. Comments on the design and API are most welcome.
See the API docs at https://happybase.readthedocs.org/en/latest/api.html#happybase.ConnectionPool for more information and example usage.
I'm leaving this ticket open since I need to refactor the tutorial/user guide to incorporate some information on the connection pool.
from happybase.
Fwiw, the feature branch is gone now that this feature has landed on master. I'll need to expand the docs (working on it already) before I consider this issue closed.
I'll also cook up a 0.5 release soonish with this feature and some other unreleased enhancements from the master branch.
from happybase.
Oh, I forgot to mention that I have (privately) received positive test reports about the connection pool, so I have confidence the current implementation is ready for public release. :-)
from happybase.
HappyBase 0.5 is out! https://twitter.com/wbolster/status/338034468780662784
from happybase.