msiemens / tinydb
TinyDB is a lightweight document oriented database optimized for your happiness :)
Home Page: https://tinydb.readthedocs.org
License: MIT License
Sorry, I am totally new to this. My issue is probably related to the documentation. The simple example at readthedocs (http://tinydb.readthedocs.org/en/latest/getting-started.html) does not work for me. Importing Query fails because it does not exist:
from tinydb import TinyDB, Query
Traceback (most recent call last):
File "", line 1, in
ImportError: cannot import name 'Query'
Maybe there is a discrepancy between the getting started documentation which might be for version 3.0.0 and the version I use (2.4) but this is not clear to me from the changelog.
If I want it to work I have to do this:
from tinydb import TinyDB, where
db = TinyDB('db.json')
db.insert({'type': 'apple', 'count': 7})
db.insert({'type': 'peach', 'count': 3})
fruit=where('type')
print(db.search(fruit=='peach'))
I am not sure whether that is the correct or efficient usage. I am working under Python 3.4 with tinydb 2.4, installed from richardotis's Anaconda channel.
@msiemens I think you would like this extension module for TinyDB: tinyrecord. It adds atomic transaction support to TinyDB.
Hi, I am a newbie tinydb user. I have a problem/feature:
I'm running a thread that inserts rows after checking their uniqueness with "count(where(..."
The problem occurs when I remove the row or purge the table from another process (e.g. IPython): the count/search still thinks that the row exists.
Any suggestions? Thanks!
It would be very useful if multiple entries could be inserted into a tinydb in one method call, by passing an array of dicts.
My preference would be to overload insert()
so you could write:
db.insert([{'x':1}, {'x': 2}])
One common use case would be serialising multiple JSON records consumed via RESTful APIs
Tried to clone the repository and got the following error:
error: unable to create symlink CONTRIBUTING.rst (File name too long)
Managed to work around it by cloning without checking out, and disabling symlinks:
git clone --no-checkout git@github.com:msiemens/tinydb.git
git config core.symlinks false
git checkout
Running OSX Yosemite.
In the docs found at https://tinydb.readthedocs.org/en/latest/usage.html, in the section concerning tables, a copy & paste seems to have gone wrong. Furthermore, purge_all() and purge_tables() are both listed as purging tables, which is confusing.
Not sure how hard this would be to implement, but it would be neat to have a safe way of updating an entry. Right now I am deleting and then inserting the new entry, which seems to be the only way to do this, but it is quite unsafe: for example, if the program is terminated right between the delete and insert steps, the entry is lost and the database may become corrupted.
If there is interest in this, I would be willing to put in some work, if someone can give me guidance. Does it fundamentally go against the philosophy, or would it make sense?
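One common way to make a whole-file write safe against termination is to write a temporary file and then swap it into place. The sketch below uses only the standard library; `atomic_write_json` is a hypothetical helper, not part of TinyDB, and it assumes the database is a single JSON file.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Replace the file at path so readers see the old or the new content."""
    # Serialize first: if this raises, the existing file is never touched.
    payload = json.dumps(data)
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file sits in the same directory so os.replace stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, 'db.json')
atomic_write_json(db_path, {'_default': {'1': {'type': 'apple', 'count': 7}}})
atomic_write_json(db_path, {'_default': {'1': {'type': 'apple', 'count': 8}}})

with open(db_path) as handle:
    stored = json.load(handle)
```

With this approach an update is a single read, modify, and replace, so there is no window in which the entry has been deleted but not yet re-inserted.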
TinyDB version 2.0.1. My interpreter session:
from tinydb import TinyDB
db = TinyDB('my_existing_db.json')
len(db) # => 314
print db._last_id # => 99 (which is obviously wrong, as _get_next_id will return an existing id, messing with existing data)
Ids are actually sorted alphabetically in Table.__init__, so '99' is the last one. A workaround (albeit perhaps not the best one) is to replace line 191:
all_ids = sorted(self._read().keys())
by the following:
all_ids = sorted(self._read().keys(), key=int)
I find it strange I'm the first to notice this issue.
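The difference between the two sort orders is easy to reproduce without TinyDB. The id list below is made up for illustration; the ids are strings because JSON object keys are always strings.

```python
# Hypothetical ids as they would come back from a JSON file.
ids = ['1', '2', '10', '99', '100', '314']

last_alphabetical = sorted(ids)[-1]
last_numeric = sorted(ids, key=int)[-1]

print(last_alphabetical)  # 99
print(last_numeric)       # 314

# max() with key=int finds the same id without sorting the whole list.
next_id = int(max(ids, key=int)) + 1
```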
Hi all, please remove the print('Okay!') call.
tinydb/tinydb/middlewares.py: line 171
code:
for field in item:
    try:
        if item[field].startswith(tag):
            print('Okay!')
Like the title says. insert() returns the ID of the item inserted.
Maybe let update() return some information about the items that were updated too?
My colleague had this problem today, trying to install through pip. My package, which he's trying to install and which depends on tinydb, has the setup.py requirement tinydb>=2.2. Are we missing something silly here, or is there a problem with the 2.3 version uploaded to PyPI? Thanks!
Searching for tinydb>=2.2
Reading https://pypi.python.org/simple/tinydb/
Best match: tinydb 2.3.0-b
Downloading https://pypi.python.org/packages/source/t/tinydb/tinydb-2.3.0-b.zip#md5=15f0720f29a7f673a4bec54cd5ec6350
Processing tinydb-2.3.0-b.zip
Writing /tmp/easy_install-udrz_6/tinydb-2.3.0/setup.cfg
Running tinydb-2.3.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-udrz_6/tinydb-2.3.0/egg-dist-tmp-HZLQA2
No eggs found in /tmp/easy_install-udrz_6/tinydb-2.3.0/egg-dist-tmp-HZLQA2 (setup script problem?)
error: Could not find required distribution tinydb>=2.2
I think it'd be useful to have an updating operation to append to an array element. I've tried this successfully:
def append(field, value):
    """
    Append a given value to a given array field.
    """
    def transform(element):
        element[field].append(value)
    return transform
Can be used as:
db.update(append("some_array_field","value_to_append"), eids=[1,2])
Perhaps it needs a try/except in case the field is not an array and the value can't be appended.
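The guard mentioned above could be an explicit type check rather than a try/except. This is only a sketch: it treats elements as plain dicts and raises a TypeError with a descriptive message, which is one possible design choice among several.

```python
def append(field, value):
    """Return a transform that appends value to the list stored under field."""
    def transform(element):
        target = element.get(field)
        if not isinstance(target, list):
            raise TypeError(
                'field %r holds %s, not a list' % (field, type(target).__name__)
            )
        target.append(value)
    return transform


doc = {'tags': ['a']}
append('tags', 'b')(doc)
```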
I've been using tinydb in a small prototype recently. I've noticed that most of my queries are performed against the _id attribute of the rows (I cannot use a key value store here because I still need some querying functionality), and this does a full table scan if I'm not mistaken. What I suggest is that we store the dataset in memory as a hash table.
I.e. the following:
>>> table.data
{1: {'_id': 1, 'some': 'text'}, 2: {'_id': 2, 'some': 'text'}}
Which means we can have a very fast O(1) function to retrieve a row by _id, which is better than the standard table scan implementation, which is O(n) and, worse, affected by _id (generally, the greater the id, the more iterations required).
>>> table.get_by_id(1)
{'_id': 1, 'some': 'text'}
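A minimal sketch of the idea, kept separate from TinyDB: `IdIndexedTable` is a hypothetical class, and the rows are assumed to carry a unique `_id` key.

```python
class IdIndexedTable(object):
    """Hypothetical in-memory table keyed by _id (not TinyDB's Table class)."""

    def __init__(self, rows):
        self.data = {row['_id']: row for row in rows}

    def get_by_id(self, row_id):
        # Dictionary lookup, O(1) on average; returns None if the id is absent.
        return self.data.get(row_id)

    def search(self, predicate):
        # Any other query still has to scan every row, O(n).
        return [row for row in self.data.values() if predicate(row)]


table = IdIndexedTable([
    {'_id': 1, 'some': 'text'},
    {'_id': 2, 'some': 'text'},
])
row = table.get_by_id(1)
```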
Suppose I have a database of books that stores titles, authors, etc. If I have a book titled "Complex Analysis", I would like a query for "complex" to return all the matches in the database, in this case "Complex Analysis". I know this is perhaps not the place to ask, but please help me.
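In plain Python, a case-insensitive substring match can be built with the standard `re` module. The sketch below works on a list of dicts; the `title_contains` helper and the sample books are made up for illustration, and how such a predicate plugs into a TinyDB query depends on the installed version.

```python
import re

# Hypothetical sample data.
books = [
    {'title': 'Complex Analysis'},
    {'title': 'COMPLEX Variables'},
    {'title': 'Real Analysis'},
]


def title_contains(fragment):
    """Build a predicate matching titles that contain fragment, ignoring case."""
    pattern = re.compile(re.escape(fragment), re.IGNORECASE)
    return lambda book: pattern.search(book.get('title', '')) is not None


is_match = title_contains('complex')
matches = [book['title'] for book in books if is_match(book)]
```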
This is a proposal for a much simpler Query model that can be used, which essentially enables us to do the following, even without any advanced metaclass or inheritance magic:
User = Query()
db.insert({'metadata': {'id': 1}})
db.find(User.metadata.id == 1)
Which is very SQLAlchemy-like and what one would expect, and I would say even more intuitive than the current style/model. The following is a simple implementation that only handles the very basic:
import operator

class Query(object):
    def __init__(self, function=None):
        self.function = function or (lambda k: k)

    def __getattr__(self, attr):
        return Query(operator.itemgetter(attr))

    def __eq__(self, value):
        return Query(lambda k: self.validate(k) == value)

    def validate(self, value):
        return self.function(value)
The advantage over the current style is perhaps that this is much more intuitive and simple. We don't need a new class for every special case. It is also simpler to generate the intermediate query expressions so that they can be cached, instead of the rather primitive, hacky string representations that we have at the moment, which are error prone and require us to worry about Unicode etc.
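For a chained path such as `User.metadata.id` to resolve, each attribute access has to compose with the accessor built so far. The following is one possible sketch of that composition, not TinyDB's actual Query class; the names `_getter` and `test` are illustrative, and returning False for missing keys is an assumption about the desired behaviour.

```python
class Query(object):
    """Sketch of a chainable query object."""

    def __init__(self, getter=None):
        # getter maps a document to the value this query path points at.
        self._getter = getter or (lambda doc: doc)

    def __getattr__(self, name):
        # Called only for names not found normally, i.e. document keys.
        if name.startswith('_'):
            raise AttributeError(name)
        parent = self._getter
        return Query(lambda doc: parent(doc)[name])

    def __eq__(self, expected):
        getter = self._getter

        def test(doc):
            try:
                return getter(doc) == expected
            except (KeyError, TypeError):
                # Documents lacking the path simply do not match.
                return False
        return test


User = Query()
docs = [{'metadata': {'id': 1}}, {'metadata': {'id': 2}}, {'other': True}]
condition = User.metadata.id == 1
found = [doc for doc in docs if condition(doc)]
```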
It would be nice to have a logo for TinyDB. @eugene-eeo I've seen that some of your projects have logos. Would you like to create one for TinyDB?
I'm just wondering if there is any way to search() below the top level of data structures.
Say I have some data like:
{
    'Employees': [
        {'name': 'Dave', 'age': 42},
        {'name': 'John', 'age': 39}
    ]
}
What is the best way to return an array of dicts where the condition employee['name'] == 'John' is satisfied?
I guess I would like to be able to do something like:
table.search(where('Employees.name') == 'John')
Is this possible / desirable with TinyDB, or should this be handled outside the TinyDB API?
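Handled outside the TinyDB API, the lookup comes down to two small helpers: one that tests whether a document has any matching list item, and one that extracts the matching items. Both helpers are hypothetical and operate on plain dicts.

```python
def any_item(list_field, predicate):
    """Build a document test: does any item of doc[list_field] satisfy predicate?"""
    def test(doc):
        items = doc.get(list_field, [])
        return isinstance(items, list) and any(predicate(i) for i in items)
    return test


def matching_items(doc, list_field, predicate):
    """Return the items of doc[list_field] that satisfy predicate."""
    return [item for item in doc.get(list_field, []) if predicate(item)]


doc = {
    'Employees': [
        {'name': 'Dave', 'age': 42},
        {'name': 'John', 'age': 39},
    ]
}


def is_john(employee):
    return employee.get('name') == 'John'


has_john = any_item('Employees', is_john)(doc)
johns = matching_items(doc, 'Employees', is_john)
```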
Cannot use: db.search(where('thing') == 'bike')
as stated in the manual at: http://tinydb.readthedocs.org/en/latest/getting-started.html
Error: NameError: name 'where' is not defined
version: PIP version of 20-10-14
When you try to insert a string, the operation completes successfully. After that, ValueError: dictionary update sequence element #0 has length 1; 2 is required is thrown at any operation.
It's not a real bug, since the docs explicitly say that TinyDB expects the inserted element to be a dict, but it's still not good that such a destructive operation is allowed. Maybe raising an exception when inserting anything except a dict would be more appropriate?
The current JSONStorage class doesn't allow a way to store the file in pretty print.
I think you need to use iterators. It's faster than lists.
Attempting to insert a non-serializable python object results in an unusable tinydb. For example, doing:
from tinydb import TinyDB
db = TinyDB('tinydb.json')
db.insert({'bark'})
results in an exception, which is good, but it also corrupts the db, which now looks like:
{"_default": {"1":
This happens if the db also has other records.
Since JSON is the default serialization format for TinyDB, there's the datetime problem:
>>> from tinydb import TinyDB
>>> from datetime import datetime
>>> db = TinyDB("db.json")
>>> db.insert({"date": datetime.now()})
...
TypeError: datetime.datetime(2015, 2, 21, 17, 24, 17, 828569) is not JSON serializable
Many other databases handle datetime conversion for the user, and I would very much like TinyDB to do the same (It's usually fixed by specifying a custom encoding and a corresponding decoder when reading from the database).
Do you think this is a good idea?
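The encoder/decoder pairing described above can be sketched with the standard `json` module alone. The `__datetime__` marker key is a made-up convention, `datetime.fromisoformat` needs Python 3.7 or later, and how this would hook into a TinyDB storage is left open.

```python
import json
from datetime import datetime

MARKER = '__datetime__'  # hypothetical tag identifying encoded datetimes


def encode_extra(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        return {MARKER: obj.isoformat()}
    raise TypeError('%r is not JSON serializable' % (obj,))


def decode_extra(mapping):
    """Called by json.loads for every decoded JSON object."""
    if set(mapping) == {MARKER}:
        return datetime.fromisoformat(mapping[MARKER])
    return mapping


original = {'date': datetime(2015, 2, 21, 17, 24, 17, 828569)}
text = json.dumps(original, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
```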
Hi, I'm trying to create a DB on a Debian VM.
This is the error that I get:
db = TinyDB('db.json')
Traceback (most recent call last):
File "", line 1, in
File "/home/lorenzo/rubygram/local/lib/python2.7/site-packages/tinydb/database.py", line 50, in __init__
self._table = self.table('_default')
File "/home/lorenzo/rubygram/local/lib/python2.7/site-packages/tinydb/database.py", line 79, in table
table = table_class(name, self, **options)
File "/home/lorenzo/rubygram/local/lib/python2.7/site-packages/tinydb/database.py", line 212, in __init__
old_ids = self._read().keys()
File "/home/lorenzo/rubygram/local/lib/python2.7/site-packages/tinydb/database.py", line 276, in _read
data[eid] = Element(raw_data[key], eid)
File "/home/lorenzo/rubygram/local/lib/python2.7/site-packages/tinydb/database.py", line 21, in __init__
self.update(value)
ValueError: dictionary update sequence element #0 has length 15; 2 is required
In the following code I open 2 DBs, db and db2, and insert separate items in each. However, all 4 items get inserted in both DBs.
from tinydb import TinyDB
import os
# Make sure we're starting with clean slate db's
names = ['testdb.json', 'testdb2.json']
for name in names:
    try:
        os.remove(name)
    except OSError:
        # os.remove raises OSError (not IOError on Python 2) if the file is missing
        pass
db = TinyDB('testdb.json')
db.insert({'int': 1, 'char': 'a'})
db.insert({'int': 1, 'char': 'b'})
db.insert({'int': 1, 'value': 5.0})
db2 = TinyDB('testdb2.json')
db2.insert({'color': 'blue', 'animal': 'turtle'})
print('db:\n%r' % db.all())
print('db2:\n%r' % db2.all())
Output:
db:
[{'char': 'a', '_id': 0, 'int': 1}, {'char': 'b', '_id': 1, 'int': 1}, {'_id': 2, 'value': 5.0, 'int': 1}, {'color': 'blue', '_id': 3, 'animal': 'turtle'}]
db2:
[{'char': 'a', '_id': 0, 'int': 1}, {'char': 'b', '_id': 1, 'int': 1}, {'_id': 2, 'value': 5.0, 'int': 1}, {'color': 'blue', '_id': 3, 'animal': 'turtle'}]
I'm looking to add new functionality to update. I would like to add an increment function first of all. I thought of something like this:
db.update(inc('int'), cond, eids)
And then change update's present functionality to set, so:
db.update(set({'char': 'a'}), cond, eids)
Alternatively an interface such as this may be preferable:
db.update(cond, eids).inc('int')
db.update(cond, eids).set({'char': 'a'})
This change would allow the addition of new functions such as max, min, etc. going forward.
I'm looking for thoughts/opinions and how best to structure this?
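One way to structure the first variant is to make each operation a factory that returns a transform function. In the sketch below, `set_fields` is named that way to avoid shadowing the built-in `set`, and `apply_update` is a hypothetical stand-in for update() that works on a plain list of dicts.

```python
def inc(field, amount=1):
    """Operation: increase a numeric field."""
    def transform(element):
        element[field] += amount
    return transform


def set_fields(fields):
    """Operation: overwrite the given fields."""
    def transform(element):
        element.update(fields)
    return transform


def apply_update(elements, operation, cond=None):
    """Run operation on every element matching cond; return how many changed."""
    updated = 0
    for element in elements:
        if cond is None or cond(element):
            operation(element)
            updated += 1
    return updated


rows = [{'int': 1, 'char': 'b'}, {'int': 5}]
incremented = apply_update(rows, inc('int'))
changed = apply_update(rows, set_fields({'char': 'a'}), cond=lambda e: e['int'] > 5)
```

New operations such as max or min would then be further factories of the same shape, with no change to the update function itself.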
Hello,
I did not know how to get in touch, so I thought I might as well say it here. This is a very interesting project. I read the docs and will start playing with TinyDB. I would also love to start contributing (since the project is still "tiny" :P ), and I wanted to ask what the plans/features for future versions are and how I can help :)
I was really looking for a project where I can contribute to a database implemented in Python (hopefully I will get some experience from this project) and write my own later :)
Sorry again for raising an issue for this.
The id of the element changes to a unicode string after JSON serialization/deserialization. As a result, there is no way to get the element by its integer eid.
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tinydb import TinyDB
>>> db=TinyDB('/tmp/test.json')
>>> db.insert({'foo':'bar'})
1
>>> db.all()
[{u'foo': u'bar'}]
>>> element = db.all()[0]
>>> element.eid
u'1'
>>> assert db.get(eid=1) is not None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
>>> assert db.get(eid='1') is not None
>>> db.update({'foo':'blah'}, eids=[1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/wolfg/.virtualenvs/opensource/lib/python2.7/site-packages/tinydb/database.py", line 335, in update
cond, eids)
File "/Users/wolfg/.virtualenvs/opensource/lib/python2.7/site-packages/tinydb/database.py", line 222, in process_elements
func(data, eid)
File "/Users/wolfg/.virtualenvs/opensource/lib/python2.7/site-packages/tinydb/database.py", line 334, in <lambda>
self.process_elements(lambda data, eid: data[eid].update(fields),
KeyError: 1
>>> db.update({'foo':'blah'}, eids=['1'])
>>> db.all()
[{u'foo': u'blah'}]
>>> db.contains(eids=[1])
False
>>> db.contains(eids=['1'])
True
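The underlying behaviour can be shown with the `json` module alone: JSON object keys are always strings, so integer keys do not survive a round trip. The table layout below is simplified for illustration, and converting keys back with `int` on load is one possible fix, not necessarily the one TinyDB adopted.

```python
import json

table = {1: {'foo': 'bar'}}

# Integer keys are written as strings and come back as strings.
restored = json.loads(json.dumps(table))
print(list(restored))  # ['1']

# Converting the keys back on load restores lookups by integer id.
by_int_id = {int(key): value for key, value in restored.items()}
```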
When testing a db field against a unicode string, a UnicodeEncodeError exception is raised.
Line which causes the exception:
db.get(where('name') == u'žir')
Inserting unicode data went without problems:
db.insert({'name': 'žir'})
I have made a quick hack which fixes the problem for my little hobby project, but I will examine this problem more when I find time.
In queries.py, I've changed the Query._update_repr function body to:
self._repr = u'\'{0}\' {1} {2}'.format(self._key, operator, value)
and Query.__hash__ to:
return hash(repr(unicode(self)))
Basically adding the string prefix "u" in _update_repr, and a "unicode" call in __hash__...
Using tinydb from git on python 2.7.6, ubuntu 14.04
This line: https://github.com/msiemens/tinydb/blob/master/tinydb/storages.py#L13 made me wonder, is https://github.com/micropython/micropython supported anyhow? (ujson is micropython's builtin json module, which offers subset of functionality).
I've run into an issue where searching gets slower over time. I know I may be pushing TinyDB to its performance limits. I have a database of around 10k dicts. Each dict has five items.
So, here is some pseudo code:
from tinydb import TinyDB, where

class tiny_db():
    def __init__(self):
        self.db = TinyDB('db.json')
        self.table = self.db.table('_default', smart_cache=True)

    def file(self, item):
        self.table.insert(item)

    def locate(self, name):
        result = self.table.search(where('name') == name)
        return result
Every time I call tiny_db.locate(name) it takes 1 second longer than it did the last time I called it.
I am probably doing something horribly wrong. I'm not entirely sure how the caching middleware works, and my grasp of Python is pretty weak.
My duct tape fix has been to copy the db into a variable when my class initializes. I'm using that to search. But, I have to insert new items into both. I feel like there may be a better solution.
Sorry for the amateurish question, but thanks in advance.
from tinydb import TinyDB, where
from tinydb.storages import JSONStorage, Storage
from tinydb.middlewares import CachingMiddleware, Middleware
db = TinyDB('db1.json', storage=CachingMiddleware(JSONStorage))
print db.all()
db.insert({'test': 1})
db.close()
print db.all()
Apologies if this is a stupid question, but I have not seen the answer in your docs.
A colleague has passed me a .json file written out using TinyDB. I would like to read its contents. However, if I use
db = TinyDB('my_file.json')
then db is empty, and the file on disk is overwritten (contains one line {"_default": {}}). What's the correct way to read this file into a Python interpreter? I see no "load" method in the docs and tinydb.Storage.read doesn't do this.
I'm looking for a way to update a value in a nest. At the moment, the value is either added in the root or the entire nest is replaced.
from tinydb import TinyDB
db = TinyDB('test.json')
db.purge()
# Initial data
thing_id = db.insert({'things': {'car': 'ford', 'bike': 'giant'}})
print db._read()
# Appends 'volvo' to root
db.update({'car': 'volvo'}, eids=str(thing_id))
print db._read()
# Replaces complete 'things' nest
db.update({'things': {'car': 'volvo'}}, eids=str(thing_id))
print db._read()
# Goal: {'things': {'car': 'volvo', 'bike': 'giant'}}
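The goal above amounts to a recursive merge instead of a shallow dict update. A minimal sketch on plain dicts follows; `deep_update` is a hypothetical helper and not a TinyDB function.

```python
def deep_update(target, changes):
    """Merge changes into target, descending into nested dicts."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


element = {'things': {'car': 'ford', 'bike': 'giant'}}
deep_update(element, {'things': {'car': 'volvo'}})
```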
It would be great if tinydb were regularly released on www.anaconda.org . This would make things nicer for your downstream package authors so that we don't have to host tinydb in our own unofficial channels like https://anaconda.org/richardotis/tinydb/files since it won't always have the latest version. You can download one of those files and see an example of a simple recipe file.
I have an example workflow for uploading to anaconda.org on https://github.com/richardotis/pycalphad/blob/develop/RELEASING.rst#uploading-to-anacondaorg
It would be cool to add support for projections like in mongoDB!
https://docs.mongodb.org/v3.0/reference/glossary/#term-projection
Are you planning to do this?
Note to self: Rewrite the documentation to make it more beginner friendly.
Hi,
this code ( https://gist.github.com/dirkk0/407bcf506568e1cdf20b ) implements two sequencers, seq1 and seq2. seq1 looks every second at a database stack.json, and once an entry appears, it inserts the entry with a timestamp into a database process.json and removes the entry from the stack. seq2 looks at process.json and processes the entries once their time has come.
seqbad does exactly the same thing as seq1 but opens both databases once in the beginning (which should be the right way to do it). Now, if you add entries with add.py multiple times, the old entries reappear in process.json the more you add.
Best,
Dirk
Query.matches(), which uses re.match, searches from the beginning of the field value. I propose a new feature that mirrors Python's re.search and allows matching inside the field value, not just at the beginning.
Perhaps this could be done with an additional parameter to QueryRegex (something like re_method) to distinguish between Query.matches() and Query.search.
I can submit pull request if ready?
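The difference between the two standard-library functions, and one way a method switch could look, is sketched below. `make_regex_test` and its `method` parameter are hypothetical names, loosely following the re_method idea above.

```python
import re

value = 'lightweight database'

# re.match anchors at the start of the string, re.search scans all of it.
anchored = re.match(r'data', value)
anywhere = re.search(r'data', value)


def make_regex_test(pattern, method='match'):
    """Build a field test using either re.match or re.search semantics."""
    if method not in ('match', 'search'):
        raise ValueError('method must be "match" or "search"')
    run = getattr(re.compile(pattern), method)
    return lambda field_value: run(field_value) is not None


starts_with_data = make_regex_test(r'data', method='match')
contains_data = make_regex_test(r'data', method='search')
```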
Using tinydb 2.3.2 in python 2.7.10 on mac OSX 10.10.4 I input a simple model db and then type:
db.all()
output is:
[{u'count': 7, u'fruit': u'apple'},
{u'count': 12, u'fruit': u'peach'},
{u'count': 5, u'fruit': u'banana'},
{u'count': 4, u'fruit': u'kiwi'},
{u'count': 3, u'fruit': u'mango'}]
Any idea why the letter 'u' is appearing?
I'm finding myself using custom test functions a lot, many of which are nearly identical, and writing the same lambda function over and over is getting tedious.
It'd be really handy if one could pass additional arguments to these functions, so that one could do this:
def case_insensitive_match(a, b):
    return a.casefold() == b.casefold()
db.where('value').test(case_insensitive_match, my_val)
db.where('value').test(case_insensitive_match, my_other_val)
If you're keen on having this ability too, I'd be happy to code it up and submit a pull request.
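Until extra arguments are supported directly, a small wrapper can bind them ahead of time. `make_test` is a hypothetical helper; it passes the stored value first and the bound arguments after it.

```python
def case_insensitive_match(a, b):
    return a.casefold() == b.casefold()


def make_test(func, *args):
    """Bind extra arguments so the result takes only the stored field value."""
    return lambda value: func(value, *args)


matches_apple = make_test(case_insensitive_match, 'APPLE')
values = ['apple', 'Apple', 'peach']
hits = [value for value in values if matches_apple(value)]
```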
P.S. Just discovered this library, and love it so far!
I seem to be getting multiple repeated _ids when the db is opened and closed repeatedly, and each session only inserts one entry. Is this expected?
>>> TinyDB('db.json').insert({'x': 1})
>>> TinyDB('db.json').insert({'x': 1})
>>> TinyDB('db.json').all()
[{u'_id': 0, u'x': 1}, {u'_id': 0, u'x': 1}]
This is with 1.2.0 from pypi
$ pip freeze | grep tinydb
tinydb==1.2.0
This is an observation from prototyping a storage system similar in semantics to TinyDB's ConcurrencyMiddleware. Note that with TinyDB's current approach, although it is thread-safe, writes have a chance of being discarded. Consider a hypothetical situation where I have a table with the ConcurrencyMiddleware being used, i.e.:
s = ConcurrencyMiddleware(JSONStorage('file'))
db = TinyDB('data.json', storage=s).table('table')
If we spawn say, three threads, though it is guaranteed by the lock that their writes will be atomic, and their reads atomic as well, some writes may be discarded due to the way TinyDB works, i.e. in the insert function:
def insert(self, element):
    """
    Insert a new element into the table.
    """
    current_id = self._last_id + 1
    self._last_id = current_id
    data = self._read()
    data[current_id] = element
    self._write(data)
Setting the key on a new dictionary (reconstructed every time the _read() function is called) is atomic, but multiple threads may call this function concurrently. Unless they happen to execute their database calls serially, there is no guarantee that a previous write is still present when the data is written to disk; that depends on which write request writes to the file last, i.e. a race condition. Solving this problem requires the lock to be held for the entirety of the function call, i.e.
def insert(self, element):
    with self.lock:
        ...
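A self-contained sketch of holding the lock across the whole read-modify-write cycle follows. `LockedTable` is a hypothetical class; an in-memory dict stands in for the JSON file, and `_read` returns a fresh copy each time to mimic re-reading from disk.

```python
import threading


class LockedTable(object):
    """Hypothetical table whose insert holds one lock for the full cycle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0
        self._storage = {}  # stands in for the file on disk

    def _read(self):
        return dict(self._storage)  # new dict on every call

    def _write(self, data):
        self._storage = data

    def insert(self, element):
        # Id allocation, read, modification and write form one critical section.
        with self._lock:
            self._last_id += 1
            data = self._read()
            data[self._last_id] = element
            self._write(data)
            return self._last_id


table = LockedTable()


def worker(count):
    for n in range(count):
        table.insert({'n': n})


threads = [threading.Thread(target=worker, args=(50,)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

stored = table._read()
```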
Unable to use TinyDB due to the following bug
> python
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tinydb import TinyDB
>>> TinyDB("test")
Segmentation fault: 11
Same error in all versions of TinyDB.
Running OSX Yosemite.
If I remove an item from a tinydb database with a JSON file back-end, the resulting JSON file looks fine at its head. The end of the file, however, is still occupied by the original contents. Further inserts into the table fail from then on, i.e. calling .all() on the table returns an empty list.
A database with TinyDB(storage=MemoryStorage) isn't affected by this behavior.
The issue was reproduced with Python versions 2.7.3, 3.3.0, and 3.4.0.
from tinydb import TinyDB, where
db = TinyDB('test.json')
item = {'name':'A very long entry'}
item2 = {'name':'A short one'}
db.insert(item)
assert(db.get(where('name')=='A very long entry') == item)
# test.json contains:
# {"_default": [{"name": "A very long entry", "_id": 0}]}
db.remove(where('name') == 'A very long entry')
assert(db.get(where('name')=='A very long entry') == None)
# test.json contains:
# {"_default": []}name": "A very long entry", "_id": 0}]}
db.insert(item2)
assert(db.get(where('name')=='A short one') == item2)
# contents of test.json after AssertionError:
# {"_default": [{"name": "A short one", "_id": 1}]}: 0}]}
db.remove(where('name') == 'A short one')
assert(db.get(where('name')=='A short one') == None)
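The leftover bytes shown in the comments above are what one would expect if a shorter document is written from offset 0 into an open file without truncating it afterwards. That cause is an assumption based on the symptom, not a confirmed reading of TinyDB's storage code. A sketch of the pattern with an explicit truncate:

```python
import json
import os
import tempfile


def write_json_in_place(handle, data):
    """Overwrite an open file from the start and drop any leftover tail."""
    handle.seek(0)
    handle.write(json.dumps(data))
    # Without truncate(), bytes from a longer previous write would remain.
    handle.truncate()
    handle.flush()


path = os.path.join(tempfile.mkdtemp(), 'test.json')
with open(path, 'w+') as handle:
    write_json_in_place(
        handle, {'_default': [{'name': 'A very long entry', '_id': 0}]}
    )
    write_json_in_place(handle, {'_default': []})
    handle.seek(0)
    contents = handle.read()
```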
In #14 we changed the way tinydb stores data. As a result one can't open tables from v1.4 and prior with the recent version of tinydb. How can we make it possible to upgrade to the most recent scheme? I'd suggest something like
>>> db = TinyDB('some-file')
UserWarning: Database format is outdated. Please use db.upgrade() to migrate to the recent version.
>>> db.upgrade()
I have a CRUD application where the data needs to be deleted after a certain time, based on the day it was entered; for that I'm using a cron job.
Is it OK to have the main process and the external task accessing the file database without issues?
I ask because the documentation section "Why Not Use TinyDB?" says "You need advanced features like access from multiple processes, indexes, a HTTP server, relationships or similar."
Does this scenario count as multiple processes?
I can't do something like this:
db.search(where('field') in ['value1', 'value2', ...])
????????
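The expression above cannot build a query because Python's `in` operator always reduces its result to True or False, so whatever the comparison would have produced is lost. Membership therefore has to be expressed as an explicit test. The sketch below works on plain dicts; `one_of` is a hypothetical helper.

```python
def one_of(field, allowed):
    """Build a document test: is doc[field] one of the allowed values?"""
    allowed = tuple(allowed)
    return lambda doc: doc.get(field) in allowed


docs = [
    {'field': 'value1'},
    {'field': 'value2'},
    {'field': 'value3'},
]
condition = one_of('field', ['value1', 'value2'])
selected = [doc for doc in docs if condition(doc)]
```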
Not sure if this is a feature request or a documentation issue, but it would be great to have a facility where you can ask the database for its structure, sort of like asking for the columns of a table in SQL with DESCRIBE name_of_table; or calling keys() on a Python dictionary.
In general, I'm looking for guidance on how to examine a TinyDB database when you're not quite sure what it contains.
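Since documents are schemaless dicts, one rough substitute for DESCRIBE is to count which top-level keys occur across all documents. The sketch below takes any iterable of dicts; `describe` is a hypothetical helper and the sample documents are made up.

```python
from collections import Counter


def describe(documents):
    """Count how often each top-level key occurs across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return counts


# Hypothetical sample documents.
documents = [
    {'type': 'apple', 'count': 7},
    {'type': 'peach', 'count': 3},
    {'type': 'kiwi'},
]
summary = describe(documents)
```

Keys that appear in only some documents stand out by their lower counts, which gives a quick picture of an unfamiliar database.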
Not sure if this is even worth opening an issue, but I'm doing it anyway:
There are several lines that say
raise NotImplementedError('To be overriden!')
which should be spelled with double d