boramalper / pydis
A redis clone in Python 3 to disprove some falsehoods about performance.
License: ISC License
According to the Redis docs:
MSET is atomic, so all given keys are set at once. It is not possible for clients to see that some of the keys were updated while others are unchanged.
Is this true in Pydis?
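Whether this holds depends on how pydis actually implements MSET, which is not shown here. As a general property of single-threaded asyncio servers, though, a handler that applies every pair without an intervening `await` cannot be interleaved with another client's handler. A minimal sketch, assuming a plain `dict` store and a hypothetical `mset` helper (not pydis's own code):

```python
import asyncio
from typing import Dict, List, Optional, Tuple

store: Dict[bytes, bytes] = {}  # hypothetical stand-in for the server's dictionary


def mset(*args: bytes) -> bytes:
    """Set all key/value pairs in one synchronous step (no await inside)."""
    if len(args) % 2 != 0:
        raise ValueError("MSET needs an even number of arguments")
    # dict.update runs to completion before the event loop can switch tasks.
    store.update(zip(args[0::2], args[1::2]))
    return b"+OK\r\n"


Snapshot = Tuple[Optional[bytes], Optional[bytes]]


async def reader(seen: List[Snapshot]) -> None:
    # Another "client": records a snapshot each time it gets scheduled.
    for _ in range(3):
        seen.append((store.get(b"a"), store.get(b"b")))
        await asyncio.sleep(0)


async def writer() -> None:
    await asyncio.sleep(0)
    mset(b"a", b"1", b"b", b"2")


async def demo() -> List[Snapshot]:
    seen: List[Snapshot] = []
    await asyncio.gather(reader(seen), writer())
    return seen


snapshots = asyncio.run(demo())
```

Every snapshot the reader takes is either "neither key set" or "both keys set". The guarantee would break if the handler awaited between individual writes.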
This is a great project!
I would like to see pydis compiled with Nuitka (https://nuitka.net/) and the performance re-tested.
On line 46 you use hiredis to read the data. The Python hiredis package is just a thin wrapper around the C code, so at least part of your benchmark results does not come from interpreted Python.
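One way to measure how much the C parser contributes would be to swap in a pure-Python one and re-run the benchmark. The following is a minimal sketch under stated assumptions: `parse_command` is a hypothetical name, and it handles only the request form that Redis clients send (a RESP array of bulk strings), not the full protocol.

```python
from typing import List, Optional, Tuple

CRLF = b"\r\n"


def parse_command(buf: bytes) -> Tuple[Optional[List[bytes]], int]:
    """Parse one RESP array of bulk strings from the start of buf.

    Returns (args, bytes_consumed), or (None, 0) if the command is incomplete.
    """
    if not buf:
        return None, 0
    if not buf.startswith(b"*"):
        raise ValueError("expected a RESP array")
    end = buf.find(CRLF)
    if end == -1:
        return None, 0
    count = int(buf[1:end])
    pos = end + 2
    args: List[bytes] = []
    for _ in range(count):
        end = buf.find(CRLF, pos)
        if end == -1:
            return None, 0  # length header not fully received yet
        if buf[pos:pos + 1] != b"$":
            raise ValueError("expected a bulk string")
        length = int(buf[pos + 1:end])
        start = end + 2
        stop = start + length
        if len(buf) < stop + 2:
            return None, 0  # payload or trailing CRLF still missing
        args.append(buf[start:stop])
        pos = stop + 2
    return args, pos


wire = b"*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n"
args, consumed = parse_command(wire)
```

The caller would keep any bytes past `consumed` in its buffer, and on `(None, 0)` wait for more data before retrying.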
Excellent work! Just wanted to mention, if straight-line performance is the goal, it may be worth testing a non-async version of the code. Most operations are going to be gated on the dictionary backing the application, so unless idle connections are part of the benchmark, the async component might be introducing more overhead than strictly necessary. What do you think?
Perhaps the original author tested this, but the pattern of:
deque = self.dictionary.get(key, collections.deque())
# process command relative to the deque
self.dictionary[key] = deque
results in the creation of a deque with every command, regardless of whether the key is present in the dictionary. It may be faster to initialize self.dictionary with a defaultdict like so:
self.dictionary = collections.defaultdict(collections.deque)
which then simplifies the previous code to:
deque = self.dictionary[key]
which also eliminates the need to re-store the deque into self.dictionary. While the logic in defaultdict to decide whether or not to call the factory might add some time, I suspect that it will be overshadowed by the time saved from the object creation/deletion and the additional dict lookup.
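The difference in object creation can be checked directly. This is a small sketch rather than a benchmark of pydis itself; `counting_deque` is a hypothetical factory added only to count constructions.

```python
import collections

constructed = 0


def counting_deque() -> collections.deque:
    """Factory that counts how many deques get built (illustration only)."""
    global constructed
    constructed += 1
    return collections.deque()


# Pattern 1: dict.get with an eagerly evaluated default.
plain = {b"k": collections.deque()}
constructed = 0
for _ in range(1000):
    d = plain.get(b"k", counting_deque())  # default is built on every call
    d.append(b"x")
    plain[b"k"] = d
eager_count = constructed

# Pattern 2: defaultdict calls the factory only for missing keys.
lazy = collections.defaultdict(counting_deque)
constructed = 0
for _ in range(1000):
    lazy[b"k"].append(b"x")
lazy_count = constructed

# Caveat: merely reading a missing key inserts an empty deque.
_ = lazy[b"missing"]
missing_was_inserted = b"missing" in lazy
```

One design consequence: with a defaultdict, read-only commands on a missing key would have to use `.get(key)` or an `in` check, otherwise they create empty entries as a side effect.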
Even though the supported subset of commands is very limited, there are many cases where pydis can't get the results right, which makes it more like a toy that can't do much other than benchmarks.
For example,
127.0.0.1:7878> SET a 1
OK
127.0.0.1:7878> INCR a
Error: Server closed the connection
because it only converts str, while the value is stored as bytes:
  File "pydis.py", line 133, in incr
    value += 1
TypeError: can't concat int to bytes
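A possible shape for the fix is to parse the stored bytes into an int, increment, and store bytes again. This is a sketch under stated assumptions, not the actual patch: the `incr` signature and the plain `dict` store are hypothetical, and Redis's 64-bit overflow check is left out.

```python
from typing import Dict


def incr(store: Dict[bytes, bytes], key: bytes) -> bytes:
    """Increment the integer stored at key and return a RESP reply."""
    raw = store.get(key, b"0")  # a missing key counts as 0, as in Redis
    try:
        value = int(raw)  # int() accepts ASCII digits given as bytes
    except ValueError:
        return b"-ERR value is not an integer or out of range\r\n"
    value += 1
    store[key] = str(value).encode()  # keep the stored type consistent: bytes
    return b":%d\r\n" % value
```

Returning an error reply for non-numeric values also keeps the connection open instead of letting the exception close it.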
After fixing that one, the expiration doesn't really work...
127.0.0.1:7878> SET a 1 EX 1
OK
[wait a few seconds here]
127.0.0.1:7878> INCR a
(integer) 2
I don't know if you'd like to fix those, since the goal is to "disprove some falsehoods about performance." If not, at least put up a warning like "don't take it seriously". Also, I'm not interested in the performance toll of making extra checks for expiration, etc., but that should be taken into account when doing benchmarks too.
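For reference, the kind of extra check in question can be sketched as lazy expiration: keep a deadline per key and test it on every access. The class below is a hypothetical illustration (names and structure are assumptions, not pydis code), with an injectable clock so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Optional


class ExpiringStore:
    """Dict-backed store with lazy (check-on-access) expiration."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[bytes, bytes] = {}
        self._expires_at: Dict[bytes, float] = {}
        self._clock = clock  # injectable so tests can control time

    def set(self, key: bytes, value: bytes, ex: Optional[float] = None) -> None:
        self._data[key] = value
        if ex is None:
            self._expires_at.pop(key, None)  # a plain SET clears any old TTL
        else:
            self._expires_at[key] = self._clock() + ex

    def get(self, key: bytes) -> Optional[bytes]:
        deadline = self._expires_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            # Expired: drop the key before any command can observe it.
            del self._data[key]
            del self._expires_at[key]
            return None
        return self._data.get(key)
```

Every command that reads a key (INCR included) would have to go through `get`, which adds one dictionary lookup and a clock call per access. That is the cost a fair benchmark would need to include.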
This is a good data point. A lot of our beliefs about software are based on "educated guesses" (translation: we just have no clue and make everything up). Thank you for making this, it's really very interesting! I hope to learn more from it too.
redis is 100,000 lines of .c code plus 50,000 lines of deps (jemalloc mostly, and lua, and then linenoise which is tiny). It runs (roughly) 2x the speed of pydis.
pydis is 250 lines of .py, which is very impressive, but it runs on top of Python, which is 400,000 lines of .c code and 777,460 lines of .py.
I would like to see the kind of performance a golang implementation in roughly 250 lines would get (because it's high level like python, but it's also compiled so it might be very fast). How close to 1.0x performance might it achieve?
We should benchmark the memory usage as well, since it's an undoubtedly very important metric for an in-memory database.
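On the Python side, a first approximation could use the standard library's `tracemalloc`. The sketch below measures only allocations made through Python's allocator while filling a plain dict, so it is an assumption-laden stand-in for pydis's real data structures and not directly comparable to a process-level figure such as resident set size. The function name and key layout are hypothetical.

```python
import tracemalloc


def measure_dict_memory(n_keys: int, value_size: int = 16) -> int:
    """Return bytes allocated by Python while filling a dict with n_keys entries."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    data = {b"key:%d" % i: b"x" * value_size for i in range(n_keys)}
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(data) == n_keys  # keeps `data` alive until after the measurement
    return after - before


bytes_used = measure_dict_memory(10_000)
per_key = bytes_used / 10_000  # rough per-entry overhead in bytes
```

For a comparison against redis, the same keys would need to be loaded into both servers and the memory of each process measured from outside.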
Please add this change to support Windows:
# Comment out the initial import:
# import uvloop

def main() -> int:
    print("Hello, World!")

    # Detect the platform
    if sys.platform == 'win32':
        loop = asyncio.ProactorEventLoop()
        # loop = asyncio.DefaultEventLoopPolicy() does not work
        asyncio.set_event_loop(loop)
    else:
        import uvloop
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        loop = asyncio.get_event_loop()
Hi,
Can you check my redis clone (pure py3 too)?
https://github.com/manatlan/redys
I'd like to see benchmarks: redys vs yours.
Hello. I found this project very interesting. The README says:
The aim of this exercise is to prove that interpreted languages can be just as fast as C.
I'm surprised to read that, because I had learned that interpreted code would always be slower than compiled code because of the interpreter overhead. I would be very excited to see this project reach close to 1.0x performance, but I'm curious why you believe the interpreter overhead would not hold it back.