arangodb-community / pyarango



Python Driver for ArangoDB with built-in validation

Home Page: https://pyarango.readthedocs.io/en/latest/

License: Apache License 2.0

Python 92.79% Makefile 3.13% Batchfile 3.10% Shell 0.98%

python database nosql nosql-databases nosql-database nosql-data-storage arangodb arango graph-database

pyarango's Introduction

pyArango

NoSQL is really cool, but in this harsh world it is impossible to live without field validation.

WARNING: The latest versions of pyArango are only compatible with ArangoDB 3.X. For the old version, check out the ArangoDBV2 branch.

Key Features

pyArango is geared toward the developer. It's here to help you develop really cool apps using ArangoDB, really fast.

Light and simple interface

Built-in validation of fields on setting or on saving

Support for all index types

Supports graphs, traversals and all types of queries

Caching of documents with insertions and lookups in O(1)

Collections are treated as types that apply to the documents within. That means you can define a Collection and then create instances of this Collection in several databases. The same goes for graphs.

In other words, you can have two databases, cache_db and real_db, each of them with an instance of a Users Collection. You can then be assured that documents of both collections will be subjected to the same validation rules. Ain't that cool?

You can be 100% permissive or enforce schemas and validate fields on set, on save or both.
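To make the difference between validating on set and validating on save concrete, here is a minimal pure-Python model of the idea. It is a sketch under stated assumptions, not pyArango's code: `MiniDocument`, `not_null` and `is_string` are hypothetical names, and validators are modelled as plain callables that raise on bad input.

```python
class ValidationError(Exception):
    """Raised when a field value fails a validator."""


def not_null(value):
    # Hypothetical validator: reject missing values.
    if value is None:
        raise ValidationError("field must not be null")


def is_string(value):
    # Hypothetical validator: reject non-string values.
    if not isinstance(value, str):
        raise ValidationError("field must be a string")


class MiniDocument(object):
    """Toy document illustrating on_set / on_save validation (not pyArango's API)."""

    def __init__(self, fields, on_set=False, on_save=False, allow_foreign_fields=True):
        self.fields = fields  # field name -> list of validator callables
        self.on_set = on_set
        self.on_save = on_save
        self.allow_foreign_fields = allow_foreign_fields
        self.store = {}

    def _check(self, name, value):
        if name not in self.fields:
            if not self.allow_foreign_fields:
                raise ValidationError("unknown field: %s" % name)
            return
        for validator in self.fields[name]:
            validator(value)

    def __setitem__(self, name, value):
        # Validate immediately only when on_set is enabled.
        if self.on_set:
            self._check(name, value)
        self.store[name] = value

    def __getitem__(self, name):
        return self.store[name]

    def save(self):
        # On save, check every schema field (missing ones as None) and every stored field.
        if self.on_save:
            for name in set(self.fields) | set(self.store):
                self._check(name, self.store.get(name))
        return dict(self.store)


strict = MiniDocument({"name": [not_null, is_string]}, on_set=True)
strict["name"] = "Tesla"  # accepted
lazy = MiniDocument({"name": [not_null, is_string]}, on_save=True)
lazy["name"] = 42  # accepted for now; lazy.save() would raise ValidationError
```

A fully permissive document is simply `MiniDocument(fields)` with both flags left at `False`.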

Installation

Supports Python 2.7 and 3.5.

From PyPI:

pip install pyArango

For the latest version:

git clone https://github.com/tariqdaouda/pyArango.git
cd pyArango
python setup.py develop

Full documentation

This is the quickstart guide; you can find the full documentation at https://pyarango.readthedocs.io/en/latest/.

Initialization and document saving

from pyArango.connection import *

# adjust the URL and credentials to match your server
conn = Connection(arangoURL="http://127.0.0.1:8529", username="root", password="")

conn.createDatabase(name="test_db")
db = conn["test_db"] # all databases are loaded automatically into the connection and are accessible in this fashion
collection = db.createCollection(name="users") # all collections are also loaded automatically

# collection.delete() # self explanatory

for i in range(100):
    doc = collection.createDocument()
    doc["name"] = "Tesla-%d" % i
    doc["number"] = i
    doc["species"] = "human"
    doc.save()

doc = collection.createDocument()
doc["name"] = "Tesla-101"
doc["number"] = 101
doc["species"] = "human"
doc.save() # the document must exist in the database before it can be patched

doc["name"] = "Simba"
# doc.save() # overwrites the document
doc.patch() # updates only the modified field
doc.delete()

Queries : AQL

aql = "FOR c IN users FILTER c.name == @name LIMIT 10 RETURN c"
bindVars = {'name': 'Tesla-3'}
# set rawResults to True to get dictionaries instead of Document objects, useful for example if you only want to return a subset of fields
queryResult = db.AQLQuery(aql, rawResults=False, batchSize=1, bindVars=bindVars)
document = queryResult[0]

Queries : Simple queries by example

PyArango supports all types of simple queries (see collection.py for the full list). Here's an example query:

example = {'species': "human"}
query = collection.fetchByExample(example, batchSize=20, count=True)
print(query.count) # print the total number of documents

Queries : Batches

for e in query :
    print(e['name'])
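Iterating over `query` hides the cursor mechanics: results arrive in batches of `batchSize`, and the next batch is only requested once the current one is used up. The generator below is a hypothetical, server-free sketch of that pattern; `fetch_batch` stands in for the HTTP round trip and is assumed to return `(results, has_more, cursor_id)`, loosely mirroring the `result`, `hasMore` and `id` fields of ArangoDB's cursor responses.

```python
def iterate_batches(fetch_batch):
    """Yield results one by one, fetching a new batch only when the current one is exhausted.

    fetch_batch(cursor_id) is a hypothetical callable returning
    (results, has_more, next_cursor_id); cursor_id is None for the first call.
    """
    cursor_id = None
    while True:
        results, has_more, cursor_id = fetch_batch(cursor_id)
        for item in results:
            yield item
        if not has_more:
            break


# A fake three-page "server" used to exercise the generator.
PAGES = [["Tesla-0", "Tesla-1"], ["Tesla-2", "Tesla-3"], ["Tesla-4"]]
calls = []


def fake_fetch(cursor_id):
    page = 0 if cursor_id is None else cursor_id
    calls.append(page)  # record which batch was requested
    return PAGES[page], page + 1 < len(PAGES), page + 1


for name in iterate_batches(fake_fetch):
    print(name)
```

Because it is a generator, a loop that stops early never triggers the remaining fetches.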

Defining a Collection and field/schema Validation

PyArango allows you to implement your own field validation. Validators are simple objects deriving from classes that inherit from Validator and implement a validate() method:

import pyArango.collection as COL
import pyArango.validation as VAL
from pyArango.theExceptions import ValidationError

class String_val(VAL.Validator):
    def validate(self, value):
        if not isinstance(value, str): # on Python 2, use basestring to also accept unicode
            raise ValidationError("Field value must be a string")
        return True

class Humans(COL.Collection):

    _validation = {
        'on_save': False,
        'on_set': False,
        'allow_foreign_fields': True  # allow fields that are not part of the schema
    }

    _fields = {
        'name': COL.Field(validators=[VAL.NotNull(), String_val()]),
        'anything': COL.Field(),
        'species': COL.Field(validators=[VAL.NotNull(), VAL.Length(5, 15), String_val()])
    }

collection = db.createCollection('Humans')

In addition, you can also define collection properties (creation arguments for ArangoDB) right inside the definition:

class Humans(COL.Collection):

    _properties = {
        "keyOptions" : {
            "allowUserKeys": False,
            "type": "autoincrement",
            "increment": 1,
            "offset": 0,
        }
    }

    _validation = {
        'on_save': False,
        'on_set': False,
        'allow_foreign_fields': True  # allow fields that are not part of the schema
    }

    _fields = {
        'name': COL.Field(validators=[VAL.NotNull(), String_val()]),
        'anything': COL.Field(),
        'species': COL.Field(validators=[VAL.NotNull(), VAL.Length(5, 15), String_val()])
    }

A note on inheritance

There is no inheritance of the "_validation" and "_fields" dictionaries. If a class does not fully define its own, the defaults will be automatically assigned to any missing value.
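The rule can be pictured with a small pure-Python sketch: only the class's own `_validation` dict is read (never a parent's), and any key it leaves out is filled from the defaults. The default values shown here are an assumption for illustration (they match the permissive settings used in the examples above), and `effective_validation` is a hypothetical helper, not part of pyArango.

```python
# Assumed defaults for illustration; check pyArango's source for the real ones.
DEFAULT_VALIDATION = {"on_save": False, "on_set": False, "allow_foreign_fields": True}


def effective_validation(cls):
    """Settings a class ends up with: its own _validation only, defaults for missing keys."""
    own = cls.__dict__.get("_validation", {})  # cls.__dict__ ignores parent classes
    merged = dict(DEFAULT_VALIDATION)
    merged.update(own)
    return merged


class Base(object):
    _validation = {"on_save": True, "on_set": True}


class Child(Base):
    _validation = {"on_set": True}  # on_save is NOT taken from Base


class GrandChild(Child):
    pass  # defines nothing, so it gets the defaults only


print(effective_validation(Child))
```

If a subclass should share its parent's rules, copying them explicitly (e.g. `_validation = dict(Base._validation)`) keeps the intent visible.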

Creating Edges

from pyArango.collection import Edges, Field
import pyArango.validation as VAL

class Connections(Edges):

    _validation = {
        'on_save': False,
        'on_set': False,
        'allow_foreign_fields': True # allow fields that are not part of the schema
    }

    _fields = {
        'length': Field(validators=[VAL.NotNull()]),
    }

Linking Documents with Edges

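from pyArango.connection import Connection
from pyArango.collection import Collection, Edges, Field

class Things(Collection):
    _fields = {
        "name": Field()
    }

class Connections(Edges):
    _fields = {
        "someField": Field()
    }

conn = Connection(username="root", password="") # adjust to your setup
db = conn["test_db"]
myThings = db.createCollection("Things")
myConnections = db.createCollection("Connections")

a = myThings.createDocument()
a.save() # vertices need an _id before they can be linked
b = myThings.createDocument()
b.save()

edge = myConnections.createEdge()

edge.links(a, b)
edge["someField"] = 35
edge.save() # once an edge links documents, save() and patch() can be used as with any other Document object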

Getting Edges linked to a vertex

You can do it either from a Document or an Edges collection:

# in edges
myDocument.getInEdges(myConnections)
myConnections.getInEdges(myDocument)

# out edges
myDocument.getOutEdges(myConnections)
myConnections.getOutEdges(myDocument)

# both
myDocument.getEdges(myConnections)
myConnections.getEdges(myDocument)

# you can also ask for the raw JSON with
myDocument.getInEdges(myConnections, rawResults=True)
# otherwise Document objects are returned in a list
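Under the hood, an edge is just a document whose `_from` and `_to` attributes hold vertex `_id`s, so "in", "out" and "both" reduce to simple filters. The functions below are a hypothetical in-memory illustration of those semantics on plain dicts; pyArango itself asks the server rather than filtering locally.

```python
def in_edges(edges, vertex_id):
    # Edges pointing to the vertex.
    return [e for e in edges if e["_to"] == vertex_id]


def out_edges(edges, vertex_id):
    # Edges leaving the vertex.
    return [e for e in edges if e["_from"] == vertex_id]


def any_edges(edges, vertex_id):
    # Edges touching the vertex in either direction.
    return [e for e in edges if vertex_id in (e["_from"], e["_to"])]


edges = [
    {"_id": "Connections/1", "_from": "Things/a", "_to": "Things/b"},
    {"_id": "Connections/2", "_from": "Things/b", "_to": "Things/c"},
]
print([e["_id"] for e in any_edges(edges, "Things/b")])
```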

Creating a Graph

By using the graph interface you ensure for example that, whenever you delete a document, all the edges linking to that document are also deleted:

from pyArango.collection import Collection, Edges, Field
from pyArango.graph import Graph, EdgeDefinition

class Humans(Collection):
    _fields = {
        "name": Field()
    }

class Friend(Edges):
    _fields = {
        "lifetime": Field()
    }

# Here's how you define a graph
class MyGraph(Graph) :
    _edgeDefinitions = [EdgeDefinition("Friend", fromCollections=["Humans"], toCollections=["Humans"])]
    _orphanedCollections = []

# `db` is a Database object, e.g. db = conn["test_db"] as in the first example
# create the collections (do this only if they don't already exist in the database)
db.createCollection("Humans")
db.createCollection("Friend")
# same for the graph
theGraph = db.createGraph("MyGraph")

# creating some documents
h1 = theGraph.createVertex('Humans', {"name": "simba"})
h2 = theGraph.createVertex('Humans', {"name": "simba2"})

# linking them
theGraph.link('Friend', h1, h2, {"lifetime": "eternal"})

# deleting one of them along with the edge
theGraph.deleteVertex(h2)
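The guarantee `deleteVertex()` gives can be sketched in memory: removing a vertex also removes every edge whose `_from` or `_to` points at it, so no dangling edges remain. `MiniGraph` and its methods are hypothetical names for illustration only; with pyArango the server performs this cleanup when you go through the graph interface.

```python
class MiniGraph(object):
    """In-memory illustration of graph-level vertex deletion (not pyArango's code)."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}
        self._counter = 0

    def _new_id(self, collection):
        self._counter += 1
        return "%s/%d" % (collection, self._counter)

    def create_vertex(self, collection, data):
        vid = self._new_id(collection)
        self.vertices[vid] = dict(data, _id=vid)
        return vid

    def link(self, edge_collection, from_id, to_id, data):
        eid = self._new_id(edge_collection)
        self.edges[eid] = dict(data, _id=eid, _from=from_id, _to=to_id)
        return eid

    def delete_vertex(self, vid):
        del self.vertices[vid]
        # Cascade: drop every edge that references the deleted vertex.
        dangling = [eid for eid, e in self.edges.items() if vid in (e["_from"], e["_to"])]
        for eid in dangling:
            del self.edges[eid]


g = MiniGraph()
h1 = g.create_vertex("Humans", {"name": "simba"})
h2 = g.create_vertex("Humans", {"name": "simba2"})
g.link("Friend", h1, h2, {"lifetime": "eternal"})
g.delete_vertex(h2)  # the Friend edge disappears with h2
```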

Creating a Satellite Graph

If you want to benefit from the advantages of satellite graphs, you can create them as well. Please read the official ArangoDB documentation for further technical information.

from pyArango.connection import *
from pyArango.collection import Collection, Edges, Field
from pyArango.graph import Graph, EdgeDefinition

databaseName = "satellite_graph_db"

conn = Connection()

# Create the database (ignore the error if it already exists)
try:
    conn.createDatabase(name=databaseName)
except Exception:
    pass

# Select our "satellite_graph_db" database
db = conn[databaseName] # all databases are loaded automatically into the connection and are accessible in this fashion

# Define our vertex to use
class Humans(Collection):
    _fields = {
        "name": Field()
    }

# Define our edge to use
class Friend(Edges):
    _fields = {
        "lifetime": Field()
    }

# Here's how you define a Satellite Graph
class MySatelliteGraph(Graph) :
    _edgeDefinitions = [EdgeDefinition("Friend", fromCollections=["Humans"], toCollections=["Humans"])]
    _orphanedCollections = []

theSatelliteGraph = db.createSatelliteGraph("MySatelliteGraph")

Document Cache

pyArango collections have a caching system for documents that performs insertions and retrievals in O(1):

# create a cache of 1500 documents for the humans collection
humans = db["Humans"] # any collection object works here
humans.activateCache(1500)

# disable the cache
humans.deactivateCache()
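O(1) insertions and lookups with a fixed capacity are typically obtained by combining a hash map with a recency order. The class below is a sketch of that general technique (a least-recently-used cache built on `collections.OrderedDict`); it is an assumption that pyArango evicts in LRU order, and `DocumentCacheSketch` is a hypothetical name.

```python
from collections import OrderedDict


class DocumentCacheSketch(object):
    """Bounded LRU cache with O(1) average insert and lookup (illustrative, not pyArango's code)."""

    def __init__(self, size):
        self.size = size
        self.store = OrderedDict()  # key -> document, least recently used first

    def __setitem__(self, key, doc):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = doc
        if len(self.store) > self.size:
            self.store.popitem(last=False)  # evict the least recently used entry

    def __getitem__(self, key):
        doc = self.store[key]  # raises KeyError on a miss
        self.store.move_to_end(key)  # mark as most recently used
        return doc

    def __contains__(self, key):
        return key in self.store


cache = DocumentCacheSketch(2)
cache["Tesla-1"] = {"_key": "Tesla-1"}
cache["Tesla-2"] = {"_key": "Tesla-2"}
cache["Tesla-1"]  # touch Tesla-1 so Tesla-2 becomes the oldest entry
cache["Tesla-3"] = {"_key": "Tesla-3"}  # evicts Tesla-2
```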

Statsd Reporting

pyArango can optionally report query times to a statsd server for statistical evaluation:

import os
import statsd
from pyArango.connection import Connection

statsdclient = statsd.StatsClient(os.environ.get('STATSD_HOST'), int(os.environ.get('STATSD_PORT')))
conn = Connection('http://127.0.0.1:8529', 'root', 'opensesame', statsdClient = statsdclient, reportFileName = '/tmp/queries.log')

It's intended to be used in two phases (we assume you're using bind values, right?):

First run, which should trigger all your use cases: create the connection with both a statsd client (statsdClient) and reportFileName. The report file will be filled with your queries paired with their hash identifiers, and each query is reported to statsd as 'pyArango<hash>'. Later on you can use this digest to map the gauges back to your queries.
On subsequent runs you only pass the statsd client; only the request times are reported to statsd.
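The point of bind values in this scheme is that the query text, and therefore its identifier, stays the same whatever values are bound. The sketch below illustrates that with an MD5 digest; the hashing function and the `digest<TAB>query` report line format are assumptions, not pyArango's documented behaviour, and `query_digest`, `metric_name` and `report_lines` are hypothetical helpers.

```python
import hashlib


def query_digest(aql):
    # Assumed scheme: a stable hex digest of the raw query text.
    return hashlib.md5(aql.encode("utf-8")).hexdigest()


def metric_name(aql):
    # Gauge name in the 'pyArango<hash>' form described above.
    return "pyArango" + query_digest(aql)


def report_lines(queries):
    # What a first-run report file could contain: one "digest<TAB>query" pair per distinct query.
    return ["%s\t%s" % (query_digest(q), q) for q in sorted(set(queries))]


with_bind = "FOR c IN users FILTER c.name == @name RETURN c"
inlined_a = "FOR c IN users FILTER c.name == 'Tesla-1' RETURN c"
inlined_b = "FOR c IN users FILTER c.name == 'Tesla-2' RETURN c"

# The bind-value query keeps one identifier; each inlined value creates a new one.
print(metric_name(with_bind))
print(metric_name(inlined_a) != metric_name(inlined_b))  # True
```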

Examples

More examples can be found in the examples directory. To try them out change the connection strings according to your local setup.

Debian Dependency Graph

If you are on Debian / Ubuntu, you can install packages with automatic dependency resolution. In the end, this is a graph. This example parses Debian package files using deb_pkg_tools and then creates vertices and edges from packages and their relations.

Use examples/debiangraph.py to install it, or examples/fetchDebianDependencyGraph.py to browse it as an ascii tree.

You can create the ArangoDB SocialGraph using examples/createSocialGraph.py. It resembles the original ArangoDB JavaScript implementation, in Python.

pyarango's People

Contributors

Stargazers

Watchers

Forkers

morsdatum mark-eastwood markbuckley gemius ppiccolo kidaa30 dothebart cabalamat kashifpk agatatalita srozb robert-wright vijayvishy mvidalv brennv iferminm pombredanne dmariassy jonathanseguin keyflow chrmorais bosskeyproductions splendido uchuugaka efazati epalese tfindlay-au nrgwsth frcmail natmash suviano logan169 rlshuhart hkarpf nawfaltachfine smokrow hinike cisco-ie sunghyeok93 ian-gallmeister youfeng243 jedmitten komal-skynet dilipvamsi thekie beyondliangcai duriantang backendtea jangocheng jarvisav lipper trueblue-zone mecolosimo snmpbuddy neildutoit13 orion6194 the-alchemist gavingx trevordimartino aernesto perryyo markuspf boxx1483 rkucsora alexyongsu rsolano supytalp donqueso89 hkernbach gitcarbs jsteemann schintalapudi cunzhang521 tiagoooliveira yutiansut nsa32752 nisarga-ashok hmprt bitnom aak1983 cpurules cristiansteib radifar piotrpiatyszek redreamality pedramardakani joeltgray larsborn

pyarango's Issues

GH docs

The GH landing page doesn't explain what parameters to pass to conn

conn = Connection()

should be:
conn = Connection(arangoURL="local_base_url", username="user", password="password")

this would make it easier to get started comparing this library to others! Thanks for the great work.

Unable to create user-defined "_key"

Hello, I have a JSON sample like

sample = [{
        "_key": "xxxxxxxxx", 
        "body_len": 4, 
        "date": "Mon, 14 May 2001 16", 
        "mid": "18782981.1075855378110", 
        "subject": "", 
        "type": "receiver"
    },
 ...
...
]

But when I save the documents into Arango, the _key is not passed while all the other attributes are, and the _key shown in Arango is a randomly generated number.

My Python codes are listed below:

from pyArango.connection import *
from pyArango.collection import *
import json

conn = Connection(username = "root", password = "")
db = conn["test_db"]
coll = db.createCollection(name = "test")
for each in s:
    coll.createDocument(each).save()

I still don't know what happened. Any help would be appreciated!

pip install pyArango installed 1.0.4 not 1.1.0

We need the pyArango client to support both database versions v2.8.9 and v3.

We are still using ArangoDB v2.8.9 and a conversion to v3 cannot be scheduled soon.

Note that the arangojs client (https://github.com/arangodb/arangojs) currently allows the selection of the DB version via an arangoVersion property. It would be most helpful if pyArango could do the same.

var db = new arangojs.Database({url: arangoUrl, arangoVersion: 20800});

We would appreciate any help on making this a priority.

Thanks,

Rick

Example error

I tried to run the code from the example and got this error:

collection = db.createCollection(name = "users") #all collections are also loaded automatically
  File "/root/venv/lib/python3.5/site-packages/pyArango/database.py", line 97, in createCollection
    if colArgs['name'] in self.collections :
  File "/root/venv/lib/python3.5/site-packages/pyArango/database.py", line 239, in __getattr__
    Database.__init__(self, connection, name)
  File "/root/venv/lib/python3.5/site-packages/pyArango/database.py", line 34, in __init__
    self.reload()
  File "/root/venv/lib/python3.5/site-packages/pyArango/database.py", line 81, in reload
    self.reloadGraphs()
  File "/root/venv/lib/python3.5/site-packages/pyArango/database.py", line 76, in reloadGraphs
    raise UpdateError(data["errorMessage"], data)
pyArango.theExceptions.UpdateError: collection not found (_graphs). Errors: {'errorMessage': 'collection not found (_graphs)', 'errorNum': 1203, 'code': 404, 'error': True}

arangodb version 3.1.9

simplify tests

Simplify tests.py so we can run all tests in tests/ like with:
python -m unittest discover tests/

Currently it errors out on globals. This also replaces the raw_input calls with os.getenv so the tests play well with automated test environments.

force collection.delete() to be synchronous

I need a script to reinitialize a collection.
But when I call delete() on the collection and then create it again, I get a CreationError saying that the collection already exists.
Here an example:

def test_foo():
    foo.delete()
    foo = db.createCollection("Ips")

test_foo() raises the following error in the last line (after delete):

CreationError: Database already has a collection named Ips. Errors: {}

It looks like delete() is asynchronous and createCollection() is triggered before Ips is deleted.
How can I handle this?

KeyError: 'collections'

I received the following error with ArangoDB 3.0.0 [linux] 64bit:

from pyArango.connection import *
conn = Connection(arangoURL='http://root:password@localhost:8529')
conn.createDatabase(name = "test")

/lib/python2.7/site-packages/pyArango/database.pyc in reloadCollections(self)
37 self.collections = {}
38
---> 39 for colData in data["collections"] :
40 colName = colData['name']
41 if colData['isSystem'] :

KeyError: 'collections'

Maybe 'collections' was removed in v3?

Can't create connection to Arango server

Arango 3 and Python 2.7 on Ubuntu (localhost). Maybe I'm missing something, but I can't even execute your code examples without throwing this error.

>>> from pyArango.connection import *
>>> conn = Connection()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyArango/connection.py", line 63, in __init__
    self.reload()
  File "pyArango/connection.py", line 77, in reload
    data = r.json()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 812, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

pyArango very slow on windows

I tested pyArango 1.2.5 with Python 3.5 on Windows and Linux, with ArangoDB 3 running locally on each system.
On Linux, connecting took 0.04 seconds and a simple query took 0.0089 seconds.

On Windows, connecting took 3 seconds and the simple query took 1 second.

My Windows machine is newer and faster than my Linux machine.

Modifying validation schema for an already existing Collection

Hello!

I have a Collection in my database which was defined using a certain combination of validators.
I realized I made a mistake and changed my validation schema, but pyArango does not seem to pick up the changes automatically (and I don't want to delete and recreate the db either).

Is there a function to force it to acknowledge the new structure?

Add TTL in AQLQuery constructor

It seems one cannot specify the time-to-live for cursors.

I think this should do the fix.

query.py

class AQLQuery(Query) :
    "AQL queries are attached to and instanciated by a database"
    def __init__(self, database, query, batchSize, bindVars, options, count, fullCount, ttl, rawResults = True) :
        payload = {'query' : query, 'batchSize' : batchSize, 'bindVars' : bindVars, 'options' : options, 'count' : count, 'fullCount' : fullCount, 'ttl': ttl}

database.py

def AQLQuery(self, query, batchSize = 100, rawResults = False, bindVars = {}, options = {}, count = False, fullCount = False, ttl=60) :
    "Set rawResults = True if you want the query to return dictionnaries instead of Document objects"
    return AQLQuery(self, query, rawResults = rawResults, batchSize = batchSize, bindVars  = bindVars, options = options, count = count, fullCount = fullCount, ttl=ttl)

Full documentation in the readme is a link to pyGeno documentation

The link redirects to http://pyarango.tariqdaouda.com/, but that page contains the full documentation of the pyGeno library, not the pyArango library.

Graph does forget its definition

Got the following issue report:

can someone help me on this error: raise KeyError("'%s' is not among the edge definitions" % collectionName)
KeyError: "'relation' is not among the edge definitions"

This does not happen if I create a new database, only once I run the same script on the existing database.

The test script is:

from pyArango.connection import *
from pyArango.database import *
from pyArango.collection import *
from pyArango.document import *
from pyArango.query import *
from pyArango.graph import *
from pyArango.theExceptions import *
# Configure your ArangoDB server connection here
conn = Connection(arangoURL="http://localhost:8529", username="root", password="")
db = None
edgeCols = {}
packagesCol = {}
if not conn.hasDatabase("testdb7"):
    db = conn.createDatabase("testdb7")
else:
    db = conn["testdb7"]
if not db.hasCollection('packages'):
    packagesCol = db.createCollection('Collection', name='packages')
else:
    packagesCol = db.collections['packages']
if not db.hasCollection('relation'):
    relation1 = db.createCollection('Edges',name='relation')
else:
    relation1= db.collections['relation']
class Vertex(Collection) :
    _fields = {
        "name" : Field()
    }
    
# class relation(Edges) :
#     _fields = {
#         "number" : Field()
#     }
    
class social(Graph) :
    _edgeDefinitions = (EdgeDefinition ('relation',fromCollections = ["packages"],toCollections = ["packages"]),)
    _orphanedCollections = []
if not db.hasCollection('node'):
    Vertex = db.createCollection('Collection', name='node')
else:
    Vertex = db.collections['node']
if not db.hasGraph('social'):
    g = db.createGraph('social')
else:
    g = db.graphs['social']    
print relation1
#incoming sign up node
# i=0
# user = []
#user.append(g.createVertex('packages', {"name": 'uidx'}))
user1 = g.createVertex('packages', {"name": 'uidx'})
user2 = g.createVertex('packages', {"name": 'uidx'})
user3 = g.createVertex('packages', {"name": 'uidx',"device_id":"-1"})
user4 = g.createVertex('packages', {"name": 'uidx',"device_id":"-1"})
user1.save()
user2.save()
user3.save()
user4.save()
#g.link('relation',user1, user2, {"type": 'married', "_key": 'aliceAndBob'})
g.link('relation', user1, user4, {"type": 'device_id', "linked_id": "1"})
g.link('relation', user1, user3, {"type": 'mobile', "linked_id": '1'})
g.link('relation', user2, user3, {"type": 'device_id', "linked_id": '1'})
g.link('relation', user2, user4, {"type": 'mobile', "linked_id": '1'})

Documentation is not clear ....for me

Greetings, sorry for asking a newbie question but for

myDocument.getInEdges(myConnections)
myConnections.getInEdges(myDocument)

how exactly do I get an object "myDocument"?

If I import Document with from pyArango.collection import Document,
I don't see any functions for getting a document by id, for example, only for creating one.

My question is:

I have x documents. I want to get the edges of one of them. I know its id/key.
I want it as easy as possible.

one option would be

FOR v, e, p IN 1 OUTBOUND 'collection_name/id' GRAPH 'graphname'

RETURN xyz

is there a better/simpler way to get edges for a document

Mistake in collection.py getIndexes()

There is wrong name of database attribute in function getIndexes().
There should be uppercase "self.database.URL" instead of "self.database.url" in:
url = "%s/index" % self.database.url

add travis-ci

Add travis-ci so we can watch our PRs pass or fail across multiple Python environments.

Link to full documentation in readme.md is broken.

Link to full documentation to pyarango.tariqdaouda.com/ is not working.

No Documented way to actually get a graph

Docs show the ability to create a graph, but trying to access it does not currently seem possible outside of the first session when it's created.

Collections are accessible via the bracket syntax, but graphs are not. There's no obvious method for doing this either.

exists() not implemented

It appears no exists method ... exists ... on the Collection class. Can we add that?

When I try to get a non-existing document using fetchDocument, the code fails (as expected) with a KeyError. So I tried using 'doc_key' in collection, but since Collection doesn't implement __contains__ either, this results in the same error.
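Until something like `exists()` is available, a workaround can be built on exactly the behaviour described above, namely that `fetchDocument` raises `KeyError` for a missing key. `document_exists` is a hypothetical helper, and the `FakeCollection` stand-in only exists so the sketch runs without a server.

```python
def document_exists(collection, key):
    """Return True if collection.fetchDocument(key) succeeds, False if it raises KeyError.

    Other exceptions (e.g. connection problems) are deliberately not swallowed.
    """
    try:
        collection.fetchDocument(key)
    except KeyError:
        return False
    return True


class FakeCollection(object):
    # Stand-in for a pyArango collection: fetchDocument raises KeyError on a miss.
    def __init__(self, docs):
        self.docs = docs

    def fetchDocument(self, key):
        return self.docs[key]


users = FakeCollection({"tesla": {"_key": "tesla"}})
print(document_exists(users, "tesla"), document_exists(users, "simba"))
```

Note that this costs a full document fetch per check, which is fine for occasional lookups but not for bulk existence tests.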

Inconsistent use of tabs and spaces in indentation

I needed to remove multiple instances of spaces mixed in with tabs in the indentation to get the code to run. Also, setup.py did not complete properly because of this error. If you're not writing this code in notepad, you will likely find a feature in your IDE along the lines of "replace tabs with spaces". Please just use it, just makes life easier for everyone.

I am using pyArango-1.0.1

DocumentCache getitem BUG ?

In pyArango/collection.py#L85

def __getitem__(self, key) :
        if key in self.cacheStore :
            try :
....

I think the if statement is unnecessary, and it causes an error in Collection.__getitem__.

strange exception handling

The problem:

When I try to catch the ConnectionError from pyArango (which I import), the exception is not caught. Trace:

In [1]: from prisma.db.db import PrismaDB

In [2]: x = PrismaDB()
===
Unable to establish connection, perhaps arango is not running.
===
---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-2-e7a0f3d49b3b> in <module>()
----> 1 x = PrismaDB()

/home/jb/py-prisma/prisma/db/db.py in __init__(self)
      9 
     10         self.logger = logging.getLogger('PrismaDB')
---> 11         self.db_conn = self.connect_arangodb()
     12         self.db = self.create_database()
     13         self.peer_collection = self.create_peer_collection()

/home/jb/py-prisma/prisma/db/db.py in connect_arangodb(self)
     17         try:
     18             # Not sustainable, need to fix how we connect to the DB. Maybe create a new user with our password.
---> 19             db_conn = Connection(username="root", password="prisma")
     20             print("Connected to prisma database: {0}".format(db_conn.arangoURL))
     21             return db_conn

/usr/local/lib/python3.5/dist-packages/pyArango/connection.py in __init__(self, arangoURL, username, password, verbose)
    103 
    104         self.users = Users(self)
--> 105         self.reload()
    106 
    107     def disconnectSession(self) :

/usr/local/lib/python3.5/dist-packages/pyArango/connection.py in reload(self)
    120         """
    121 
--> 122         r = self.session.get(self.databasesURL)
    123 
    124         data = r.json()

/usr/local/lib/python3.5/dist-packages/pyArango/connection.py in __call__(self, *args, **kwargs)
     34 
     35             try :
---> 36                 ret = self.fct(*args, **kwargs)
     37             except :
     38                 print ("===\nUnable to establish connection, perhaps arango is not running.\n===")

/usr/lib/python3/dist-packages/requests/sessions.py in get(self, url, **kwargs)
    478 
    479         kwargs.setdefault('allow_redirects', True)
--> 480         return self.request('GET', url, **kwargs)
    481 
    482     def options(self, url, **kwargs):

/usr/lib/python3/dist-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    466         }
    467         send_kwargs.update(settings)
--> 468         resp = self.send(prep, **send_kwargs)
    469 
    470         return resp

/usr/lib/python3/dist-packages/requests/sessions.py in send(self, request, **kwargs)
    574 
    575         # Send the request
--> 576         r = adapter.send(request, **kwargs)
    577 
    578         # Total elapsed time of the request (approximately)

/usr/lib/python3/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    435                 raise RetryError(e, request=request)
    436 
--> 437             raise ConnectionError(e, request=request)
    438 
    439         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8529): Max retries exceeded with url: /_api/user/root/database (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f19c9869b70>: Failed to establish a new connection: [Errno 111] Connection refused',))

Why? Well, as you can see in the trace, it's the requests library that throws it, not pyArango.

from requests import ConnectionError

Importing that instead does catch the exception, since it's not pyArango that throws it. Using Python v3.5.2 and pyArango 1.3.1.

AttributeError: There's no attribute result in fetchAll()

Hi, I noticed that I get an error if I iterate through the collection and do something that takes more time.

For example, if I run this (size of the collection about 4000 documents):

for doc in cuisineCategories.fetchAll():
   print(doc)
   time.sleep(30)

I get this error:

Traceback (most recent call last):
  File "get_subject_of2.py", line 127, in <module>
    for doc in cuisineCategories.fetchAll():
  File "/usr/local/lib/python2.7/dist-packages/pyArango/query.py", line 101, in __next__
    v = self[self.currI]
  File "/usr/local/lib/python2.7/dist-packages/pyArango/query.py", line 114, in __getitem__
    if not self.rawResults and (self.result[i].__class__ is not Edge and self.result[i].__class__ is not Document) : 
  File "/usr/local/lib/python2.7/dist-packages/pyArango/query.py", line 127, in __getattr__
    raise  AttributeError("There's no attribute %s" %(k))
AttributeError: There's no attribute result

This works fine:

for doc in cuisineCategories.fetchAll():
   print(doc)

I have a workaround, so it's not a big issue, just wanted to share.

Field argument

Thank you for supporting 3.x!

When I tried the graph function, I got the following error.

from pyArango.collection import Edges, Field

class Connections(Edges) :
    _validation = {
    'on_save' : False,
    'on_set' : False,
    'allow_foreign_fields' : True # allow fields that are not part of the schema
    }

    _fields = {
    'length' : Field(NotNull = True),
    }

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-9cd8aed1f5b2> in <module>()
      1 from pyArango.collection import Edges, Field
      2 
----> 3 class Connections(Edges) :
      4     _validation = {
      5     'on_save' : False,

<ipython-input-56-9cd8aed1f5b2> in Connections()
      9 
     10     _fields = {
---> 11     'length' : Field(NotNull = True),
     12     }

TypeError: __init__() got an unexpected keyword argument 'NotNull'

error at: conn = Connection()

I want to use pyArango, but I cannot even get connected. Here is the error I get:

File "pyarango-test1.py", line 8, in
conn = Connection(arangoURL='http://localhost:8529') #or with just conn = Connection()
File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 19, in init
self.reload()
File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 27, in reload
data = r.json()
TypeError: 'dict' object is not callable

I can't find any related threads on it, can somebody help me with this?

CreationError: VPackError error:

I'm trying to extract data from a dataset and I keep getting this error
ds is the variable for the dataset that can be found here as datasets.csv
I have created a collection named 'dataset' and am trying to create one document per row of the dataset, but I keep getting this error when I run the function I've created.

CreationError: VPackError error: Expecting digit. Errors: {u'code': 400, u'errorNum': 600, u'errorMessage': u'VPackError error: Expecting digit', u'error': True}

def create_datasets_docs():
    tup = [tuple(x) for x in ds.values]
    ds_do = {}

    for i in range(len(tup)):
        name, about, link, category, cloud, vintage = tup[i]
        ds_doc = datasets.createDocument()
        ds_doc["name"] = name
        ds_doc["about"] = about
        ds_doc["link"] = link
        ds_doc["category"] = category
        ds_doc["cloud"] = cloud
        ds_doc["vintage"] = vintage
        ds_doc.save()

UNSET action for Document

A great tool for Arango, but I am missing the ability to unset an attribute of a document object. It would be great to have this implemented.

Document save func says to use .path() instead of .patch()

save(waitForSync=False, **docArgs)[source]
Saves the document to the database by either performing a POST (for a new document) or a PUT (complete document overwrite). If you want to only update the modified fields use the .path() function. Use docArgs to put things such as ‘waitForSync = True’ (for a full list cf ArangoDB’s doc). It will only trigger a saving of the document if it has been modified since the last save. If you want to force the saving you can use forceSave()

client api shouldn't be changed this way

You have interesting feature in master branch (using statsd to handle statistics).
However, when I installed a distribution from the master branch, I ran into an error with the Collection::createDocument() method.
You changed the high-level API in 1.3.0 without any deprecation warnings.
Since I have cases where I create documents from POST messages with prepopulated data, I had to change all my client code from createDocument() to createDocument_(), which is (honestly) a bit weird.

1.3.0...master#diff-b710a95686ae3f74d5df8a2cf52930a8R295

Status 401 when creating a database

Using
ArangoDB 3.3.2 [linux] 64bit, using jemalloc, VPack 0.1.30, RocksDB 5.6.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.1.0f
pyArango 1.3.0
The tutorial page says you can create a database with:

from pyArango.connection import *

conn = Connection(username='my_username',password='my_password')

conn.createDatabase(name="test_db")

Even with the correct username and password, I get:

conn.createDatabase(name=db)
 File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 138, in createDatabase
   r = self.session.post(url, data = payload)
 File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 42, in __call__
   raise ConnectionError("Empty server response", ret.url, ret.status_code, ret.content)
pyArango.theExceptions.ConnectionError: Empty server response. URL: http://127.0.0.1:8529/_api/database, status: 401. Errors:

Note, however, that it works with the root user. If that is the intended behavior, I think the documentation should be updated, because I assumed any user with all privileges assigned could create a database.
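Databases are created through the _system database, and in ArangoDB 3.x a non-root user generally needs administrate (rw) access on _system to create them; check the user-management documentation for your ArangoDB version. As a sketch, an administrator could grant that level through the user-permissions endpoint PUT /_api/user/{user}/database/_system. build_grant_request is a hypothetical helper that only constructs the request; it must be sent by a user allowed to manage permissions, such as root.

```python
import json
from urllib.parse import quote


def build_grant_request(base_url, username, database="_system", level="rw"):
    """Hypothetical helper: return (url, body) granting `level` on `database`."""
    if level not in ("rw", "ro", "none"):
        raise ValueError("level must be 'rw', 'ro' or 'none'")
    url = "%s/_api/user/%s/database/%s" % (
        base_url.rstrip("/"),
        quote(username, safe=""),
        quote(database, safe=""),
    )
    return url, json.dumps({"grant": level})


url, body = build_grant_request("http://127.0.0.1:8529", "my_username")
print("PUT", url, body)
```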

Question on indexes

Hi,

How are ArangoDB indexes handled in this package, i.e. how would I create an index?
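pyArango's Collection class provides index helpers named ensure...Index (for example ensureHashIndex); check the API reference for the exact signatures in your version. Underneath, ArangoDB creates indexes via POST /_api/index?collection=<name>. The sketch below builds that request for a persistent index using a hypothetical helper; the collection and field names are illustrative.

```python
import json
from urllib.parse import urlencode


def build_index_request(base_url, collection, fields, unique=False, sparse=False):
    """Hypothetical helper: return (url, body) to create a persistent index."""
    url = "%s/_api/index?%s" % (base_url.rstrip("/"), urlencode({"collection": collection}))
    body = json.dumps({
        "type": "persistent",
        "fields": list(fields),
        "unique": unique,
        "sparse": sparse,
    })
    return url, body


url, body = build_index_request("http://127.0.0.1:8529", "users", ["email"], unique=True)
print("POST", url, body)
```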

NameError: name 'updateError' is not defined

>>> db = Database(conn, 'system')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ajung/src/lib/python3.5/site-packages/pyArango/database.py", line 34, in __init__
    self.reload()
  File "/Users/ajung/src/lib/python3.5/site-packages/pyArango/database.py", line 80, in reload
    self.reloadCollections()
  File "/Users/ajung/src/lib/python3.5/site-packages/pyArango/database.py", line 62, in reloadCollections
    raise updateError(data["errorMessage"], data)
NameError: name 'updateError' is not defined

Missing import of CreationError in query.py

I think there's a missing import in query.py. Here's an example:

from pyArango.connection import *

conn = Connection()
db = conn['_system']
docs = db.AQLQuery("for x in myCollection return { name : x.name }", batchSize = 1);

for doc in docs:
  print doc

which gives me

Traceback (most recent call last):
  File "c:\Users\z0017wrp\Desktop\kks-decoder\minimalExample.py", line 7, in <module>
    for doc in docs:
  File "build\bdist.win32\egg\pyArango\query.py", line 89, in next
  File "build\bdist.win32\egg\pyArango\query.py", line 106, in __getitem__
  File "build\bdist.win32\egg\pyArango\query.py", line 63, in _developDoc
NameError: global name 'CreationError' is not defined

I know that my code does not follow the intended usage of AQLQuery and that I should be using rawResults=True, but it is still better to have the error raised correctly than to crash with a NameError.

Versions: pyArango-1.0.1, Python 2.7.9, ArangoDB 2.6.1, Win7 64 bit.

Document creation - nested fields - Validation error

Hello!

When we try to insert a document into an ArangoDB collection and that document has nested fields (fields populated with a dictionary), we get this error:

TypeError: argument of type 'Field' is not iterable

which is thrown by

if (not self.collection._validation['allow_foreign_fields']) and (field not in self.validators) and (field not in self.collection.arangoPrivates):

and that seems to happen because self.validators in DocumentStore.set is equal to

<Field, validators: ''>

Are we doing the insertion the wrong way or is there a bug in the library?

P.S. We have

_validation = { 'on_save': True, 'on_set': True, 'on_load': False, 'allow_foreign_fields': False }

for this collection.

Upsert implementation

Hello @tariqdaouda

Is UPSERT supported? If not, is it planned for the near future?
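AQL itself has an UPSERT statement, so even without a dedicated driver method it can be run as an ordinary AQL query with bind parameters (in pyArango, presumably through db.AQLQuery and its bindVars argument; verify against your version). The hypothetical helper below only builds the query text and bind variables. In the RETURN clause, OLD is null when the document was inserted rather than updated.

```python
def build_upsert(collection, key, document):
    """Hypothetical helper: build an AQL UPSERT keyed on _key.

    Inserts `document` (with _key set) if no document with that key exists,
    otherwise merges `document` into the existing one.
    """
    query = (
        "UPSERT { _key: @key } "
        "INSERT MERGE(@doc, { _key: @key }) "
        "UPDATE @doc "
        "IN @@collection "
        "RETURN { doc: NEW, inserted: OLD == null }"
    )
    # "@collection" binds the collection name used as @@collection in the query.
    bind_vars = {"key": key, "doc": document, "@collection": collection}
    return query, bind_vars


query, bind_vars = build_upsert("users", "alice", {"name": "Alice", "visits": 1})
print(query)
print(bind_vars)
```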

Document store broke in 1.3.0?

I am fetching a document from ArangoDB 3 (v3.0.12) using pyArango 1.3.0. The returned type is <class 'pyArango.document.DocumentStore'>, which appears right. I then make an assignment, and my JSON appears to have all these <store: ...> wrappers embedded, which breaks the existing code. 1.2.8 worked fine. Is this the intended behavior?

threadJSON is now of type <class 'pyArango.document.DocumentStore'> where it used to be type <class 'dict'>

Here is the output:

doc = threadCollection.fetchDocument(threadname)
threadJSON = doc["config"]

threadJSON now reads:
<store: {'thread': <store: {'_comment': 'p9s2 system for a min 2 host configuration for xxxx development', 'networks': [{'networktype': 'management', 'networkNamespace': 'global'}, {'networktype': 'data1', 'networkNamespace': 'global'}, {'networktype': 'data2', 'networkNamespace': 'global'}, {'networktype': 'ipmi', 'networkNamespace': 'global'}], 'name': 'p9s2', 'hostlist': ['p9s2_vmhost1', 'p9s2_xxxx', 'p9s2_xxxx'], 'application_config': <store: {'version': 'LATEST', 'docker': <store: {'registry': <store: {'ip': 'xx.xxx.xx.xx', 'port': 'xxxx'}>}>, 'applicationtype': 'xxxx', 'kub_network': <store: {'pod_network_cidr': 'xxx.xx.x.x/xx'}>}>, 'guestlist': ['p9s2_xxxx', 'p9s2_xxx', 'p9s2_xxx', 'p9s2_xxx', 'p9s2xxxx'], 'puppetcontrol': <store: {'host': 'master', 'guest': 'master'}>, 'masteragent': <store: {'ip': 'xx.xxx.xxx.x', 'username': 'cobbler', 'password': 'cobbler'}>, 'orchestrationnode': <store: {'enabled': False}>, 'cobblercontrol': <store: {'host': 'master', 'guest': 'master'}>, 'slaveagent': <store: {'ip': 'xxx.xxx.x.x', 'username': 'cobbler', 'password': 'cobbler'}>}>}>

v1.2.8 threadJSON was (using same code)
{'thread': {'orchestrationnode': {'enabled': False}, '_comment': 'p9s2 system for a min 2 host configuration for xxx development', 'name': 'p9s2', 'guestlist': ['p9s2_xxxx', 'p9s2_xxxx', 'p9s2_xxxx', 'p9s2_xxxx', 'p9s2_xxxx'], 'hostlist': ['p9s2_xxxx', 'p9s2_xxxx', 'p9s2_xxxx'], 'puppetcontrol': {'guest': 'master', 'host': 'master'}, 'masteragent': {'password': 'cobbler', 'ip': 'xx.xxx.xxx.x', 'username': 'cobbler'}, 'application_config': {'applicationtype': 'xxxx', 'version': 'LATEST', 'docker': {'registry': {'ip': 'xx.xxx.xxx.xx', 'port': 'xxxx'}}, 'kub_network': {'pod_network_cidr': 'xxx.xx.x.x/xx'}}, 'networks': [{'networkNamespace': 'global', 'networktype': 'management'}, {'networkNamespace': 'global', 'networktype': 'data1'}, {'networkNamespace': 'global', 'networktype': 'data2'}, {'networkNamespace': 'global', 'networktype': 'ipmi'}], 'cobblercontrol': {'guest': 'master', 'host': 'master'}, 'slaveagent': {'password': 'cobbler', 'ip': 'xxx.xxx.x.x', 'username': 'cobbler'}}}

Setting pool size in pyArango

Can we set the pool size using the Connection class?

When using gevent/greenlets, highly concurrent requests can return incorrect responses

When sending multiple concurrent requests through gevent/greenlets, requests from one greenlet can sometimes leak their state into other greenlets. So a request will get the response from a different query.

This may be related to a similar issue under the same circumstances, where a "Bad Status Line" exception is thrown when reading a response. A tcpdump capture shows that the data coming across the wire is correct, but pyArango seems to drop portions of the incoming stream.

Is there no bulk import method?

pyArango has no bulk import method like the one in arangojs.
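ArangoDB exposes bulk loading over HTTP as POST /_api/import?collection=<name>&type=documents, where the body holds one JSON document per line. Whether a given pyArango version wraps this endpoint is not confirmed here; the hypothetical helper below only builds the URL and the line-delimited body.

```python
import json
from urllib.parse import urlencode


def build_import_request(base_url, collection, documents):
    """Hypothetical helper: return (url, body) for ArangoDB's bulk import API.

    type=documents means the body holds one JSON object per line.
    """
    query = urlencode({"collection": collection, "type": "documents"})
    url = "%s/_api/import?%s" % (base_url.rstrip("/"), query)
    body = "\n".join(json.dumps(doc) for doc in documents)
    return url, body


docs = [{"_key": "1", "name": "Tesla-101"}, {"_key": "2", "name": "Simba"}]
url, body = build_import_request("http://127.0.0.1:8529", "users", docs)
print("POST", url)
print(body)
```

Sending many documents in one request avoids a round trip per document, which is the main benefit over calling save() in a loop.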

Delete db functionality

At the moment it is not possible, using pyArango, to actually drop an ArangoDB database.
Is there a specific reason for this?

I have written a script that successfully cleans a database (drops all graphs and collections) using pyArango functionality, but it doesn't really solve the issue.
By the way, I could draft a pull request to integrate this last functionality into the Database class (a drop_all_collections method?).

Support for the Export API

ArangoDB has a special, very fast API to export a full collection. It would be great to have it in the driver; the benchmarks are impressive:

https://jsteemann.github.io/blog/2015/04/04/more-efficient-data-exports/

Roughly, it is equivalent to:

curl -X POST \
  "http://localhost:8529/_api/export?collection=users" \
  --data '{"batchSize":5000}'

The results are returned through a cursor.

More info:
https://docs.arangodb.com/cookbook/UseCases/ExportingData.html

Feature Request: python3 Support

Python 3.4.3
>>> import pyArango
>>> from pyArango.connection import Connection
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.virtualenvs/rmusic.spider/lib/python3.4/site-packages/pyArango/connection.py", line 4, in <module>
    from database import Database, DBHandle
ImportError: No module named 'database'
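The traceback comes from a Python 2 style implicit relative import: inside a package, `from database import ...` finds the sibling module only on Python 2. Python 3 treats it as an absolute import of a top-level module named database, hence the ImportError; the explicit relative form `from .database import ...` works on Python 2.5+ and Python 3. The self-contained sketch below reproduces this with a throwaway package; demo_pkg_imports and its modules are made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package mimicking pyArango's layout (names are illustrative).
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg_imports")
os.mkdir(pkg)
files = {
    "__init__.py": "",
    "database.py": "class Database(object):\n    pass\n",
    # Python 2 style implicit relative import: fails on Python 3.
    "connection_old.py": "from database import Database\n",
    # Explicit relative import: works on Python 2.5+ and Python 3.
    "connection_new.py": "from .database import Database\n",
}
for name, source in files.items():
    with open(os.path.join(pkg, name), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("demo_pkg_imports.connection_old")
    old_ok = True
except ImportError as exc:
    old_ok = False
    print("old style:", exc)

new_module = importlib.import_module("demo_pkg_imports.connection_new")
print("new style:", new_module.Database.__name__)
```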

Query result dictionary order lost

aql = """FOR x IN Link
              FILTER CONTAINS(x._from, "Core")
              SORT x._from, x._to, x.Port1
              RETURN {[x._from]:x.Port1, [x._to]:x.Port2}"""
result = self.aql_query(aql)
print(result.result[0])

def aql_query(self, aql, rawResults=True, batchSize=100):
  query_result = self.db.AQLQuery(aql, rawResults=rawResults, batchSize=batchSize)
  return query_result

Link is a collection of edges in a topology.
Example of a document in Link: {"Port1":"Port2","Port2":"NM_Port1","_from":"Router/Core1","_to":"Router/Edge2","name":"Core1Edge2"}

Expected response while running pyArango code:
[{'Router/Core1': 'Port1', 'Router/Edge1': 'NM_Port1'}, {'Router/Core1': 'Port2', 'Router/Edge2': 'NM_Port1'}]

Received response while running pyArango code:
[{'Router/Edge1': 'NM_Port1', 'Router/Core1': 'Port1'}, {'Router/Edge2': 'NM_Port1', 'Router/Core1': 'Port2'}]

Expected and received response while running the query in the arango interface:
[{'Router/Core1': 'Port1', 'Router/Edge1': 'NM_Port1'}, {'Router/Core1': 'Port2', 'Router/Edge2': 'NM_Port1'}]

My understanding is that when Python receives the query result, it does not preserve the order of the entries in the result dictionaries. So even though SORT is present in the AQL query, its ordering is being overwritten.

I wonder if something can be done about this issue!
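Note that SORT in AQL orders the returned documents, not the attributes inside each object, and in the outputs above the two results do arrive in the same order; only the key order within each object differs. JSON objects are unordered by specification, and before Python 3.7 a plain dict did not guarantee insertion order, so the server's key order can be lost during decoding. If key order matters, one option is to decode with collections.OrderedDict as object_pairs_hook, sketched below on raw response text; whether pyArango lets you pass such a hook through AQLQuery is not confirmed here. A sturdier alternative is to return an array of [key, value] pairs from AQL instead of relying on key order.

```python
import json
from collections import OrderedDict

# Raw JSON text as it might arrive from the server (taken from the expected output).
raw = ('[{"Router/Core1": "Port1", "Router/Edge1": "NM_Port1"}, '
       '{"Router/Core1": "Port2", "Router/Edge2": "NM_Port1"}]')

# object_pairs_hook receives each object's (key, value) pairs in document order.
ordered = json.loads(raw, object_pairs_hook=OrderedDict)

for obj in ordered:
    print(list(obj.keys()))
```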

How to include collections in a graph that is already created?

Hey there,

For example, I want to add "Humans4" to the graph, so I update the EdgeDefinition:

class social3(Graph) :

    _edgeDefinitions = (EdgeDefinition ('siblin4',
                                    fromCollections = ["female", "male", "Humans4"],
                                    toCollections = ["female", "male", "Humans4"]),)
    _orphanedCollections = []

Then I load the graph, and it does seem to have been updated:

g = db.graphs['social3']
g.definitions

{'siblin4': <ArangoED>{'collection': 'siblin4', 'from': ['female', 'male', 'Humans4'], 'to': ['female', 'male', 'Humans4']}}

But still:

g.createVertex('Humans4', {'name':'pepe'})

raises:

---------------------------------------------------------------------------
CreationError                             Traceback (most recent call last)
<ipython-input-29-f3363e33b4a4> in <module>()
----> 1 g.createVertex('Humans4', {'name':'pepe'})

/Users/k/3virtualenvs/ENV31/lib/python3.6/site-packages/pyArango/graph.py in     createVertex(self, collectionName, docAttributes, waitForSync)
    127             return self.database[collectionName][data["vertex"]["_key"]]
    128 
--> 129         raise CreationError("Unable to create vertice, %s" % data["errorMessage"], data)
    130 
    131     def deleteVertex(self, document, waitForSync = False) :

CreationError: Unable to create vertice, collection not found. Errors: {'error': True, 'errorNum': 1203, 'errorMessage': 'collection not found', 'code': 404}

Similarly, I'd like to know how to modify Document definitions (validators, etc.).

Thanks!

Connect to SSL server

I have ArangoDB running on an SSL-protected server. The admin interface works fine.

When I try to connect using:
conn = Connection(arangoURL="https://www.test.com:12345", username="root", password="password")

(python 3.5 on windows 7)

I got an SSL error:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

complete traceback:

Traceback (most recent call last):
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 352, in _make_request
    self._validate_conn(conn)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 831, in _validate_conn
===
Unable to establish connection, perhaps arango is not running.
===
    conn.connect()
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\connection.py", line 289, in connect
    ssl_version=resolved_ssl_version)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\util\ssl_.py", line 308, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Users\crecheverria\Apps\WinPython\64bit-3.5.1.1\python-3.5.1.amd64\lib\ssl.py", line 376, in wrap_socket
    _context=self)
  File "C:\Users\crecheverria\Apps\WinPython\64bit-3.5.1.1\python-3.5.1.amd64\lib\ssl.py", line 747, in __init__
    self.do_handshake()
  File "C:\Users\crecheverria\Apps\WinPython\64bit-3.5.1.1\python-3.5.1.amd64\lib\ssl.py", line 983, in do_handshake
    self._sslobj.do_handshake()
  File "C:\Users\crecheverria\Apps\WinPython\64bit-3.5.1.1\python-3.5.1.amd64\lib\ssl.py", line 628, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 621, in urlopen
    raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\Users\crecheverria\Devel\SGOT\fallas\v_python\arangodb3\test_02.py", line 13, in <module>
    conn = Connection(arangoURL="https://arango.xer.cl:22754", username="root", password="calipso01")
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\pyArango\connection.py", line 103, in __init__
    self.reload()
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\pyArango\connection.py", line 120, in reload
    r = self.session.get(self.databasesURL)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\pyArango\connection.py", line 38, in __call__
    ret = self.fct(*args, **kwargs)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\sessions.py", line 488, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\crecheverria\Devel\z_venvs\py64-3.5.1-gen\lib\site-packages\requests\adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

Document .save() and .patch() do not actually save

Part of the example code:
doc = collection.createDocument()
doc["name"] = "Tesla-101"
doc["number"] = 101
doc["species"] = "human"

doc["name"] = "Simba"
#doc.save() # overwrites the document
doc.patch() # updates the modified field

This does not do anything: after doc.patch(), I cannot see a new document in the database.

Have better defaults in the Database.AQLQuery method

When I attempt to run the code:

queryResult = db.AQLQuery(aql, rawResults=True)

I get the error message: "pyArango.theExceptions.QueryError: expecting non-zero value for ". To fix this I have to instead say something like:

queryResult = db.AQLQuery(aql, rawResults=True, batchSize=100)

I think the API would look nicer if batchSize had a usable default.

fetchByExample error

I run this query:
posts = postsCollection.fetchByExample({'PostTypeId' : 2}, batchSize = 100)
and then iterate over the result with a for loop. Inside the loop there is another fetchByExample query.
When the first query returns many rows, after some batches I get this error:
AttributeError: There's no attribute result

What is the problem?

mustn't decode empty download

In connection.py, r.json() raises an exception if there is no reply content (e.g. when the HTTP return code is 401).

A change like this would probably make it run more smoothly:

    def reload(self) :
...
        data = {}
        # Only decode when the server actually sent a body.
        if len(r.content) > 0:
            data = r.json()
        elif r.status_code == 401:
            data["errorMessage"] = "Unauthorized"

Otherwise the user just gets an exception like this:

  File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 102, in __init__
    self.reload()
  File "/usr/local/lib/python2.7/dist-packages/pyArango/connection.py", line 121, in reload
    data = r.json()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 894, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")

which gives little information about the original error.
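The proposal can be tried in isolation. Below is a standalone version of the same idea; decode_response is a hypothetical helper that takes the status code and raw body instead of a requests response object. It calls the JSON decoder only when there is content and otherwise reports the HTTP status, so a 401 surfaces as "Unauthorized" rather than a bare ValueError.

```python
import json


def decode_response(status_code, content):
    """Hypothetical helper: decode a JSON body without failing on empty content."""
    if content:
        return json.loads(content)
    data = {"error": True, "code": status_code}
    if status_code == 401:
        data["errorMessage"] = "Unauthorized"
    else:
        data["errorMessage"] = "Empty server response (HTTP %d)" % status_code
    return data


print(decode_response(401, b""))
print(decode_response(200, b'{"result": ["_system"]}'))
```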