irods / python-irodsclient
A Python API for iRODS
License: Other
Proposed patch:
(icenv)onottra558335x:python-irodsclient lewisct$ git diff
diff --git a/irods/resource_manager/data_object_manager.py b/irods/resource_manager/data_object_manager.py
index b5c5231..1d85148 100644
--- a/irods/resource_manager/data_object_manager.py
+++ b/irods/resource_manager/data_object_manager.py
@@ -22,7 +22,7 @@ class DataObjectManager(ResourceManager):
             .filter(DataObject.name == basename(path))\
             .filter(DataObject.collection_id == parent.id)
         results = query.all()
-        if len(results) < 0:
+        if len(results) <= 0:
             raise DataObjectDoesNotExist()
         return iRODSDataObject(self, parent, results)
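The comparison in the old line could never fire, because `len()` never returns a negative number; an empty result set therefore slipped past the check. A minimal sketch of the corrected behaviour follows. The exception class and the `first_or_raise` helper are local placeholders for illustration, not the library's own code.

```python
class DataObjectDoesNotExist(Exception):
    """Local placeholder for the library's exception of the same name."""


def first_or_raise(results):
    # len() is never negative, so "len(results) < 0" is always False.
    # Checking for an empty sequence is what the "<= 0" patch achieves.
    if not results:
        raise DataObjectDoesNotExist()
    return results[0]


try:
    first_or_raise([])
    raised = False
except DataObjectDoesNotExist:
    raised = True
```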
Mar 13 11:52:10 pid:17278 NOTICE: rsRmColl: Recursively removing /tempZone/home/rods/test_dir.
Mar 13 11:52:11 pid:17278 ERROR: svrSendCollOprStat: client reply 157 != 99999997.
Mar 13 11:52:11 pid:17278 ERROR: _rsPhyRmColl: svrSendCollOprStat failed for /tempZone/home/rods/test_dir. status = -313000 status = -313000 UNMATCHED_KEY_OR_INDEX
Mar 13 11:52:11 pid:17278 ERROR: [-] iRODS/server/core/src/rsApiHandler.cpp:297:sendApiReply : status [SYS_HEADER_WRITE_LEN_ERR] errno [Broken pipe] -- message []
[-] iRODS/lib/core/src/sockComm.cpp:1340:sendRodsMsg : status [SYS_HEADER_WRITE_LEN_ERR] errno [Broken pipe] -- message [failed to call 'write body']
[-] libtcp.cpp:420:tcp_send_rods_msg : status [SYS_HEADER_WRITE_LEN_ERR] errno [Broken pipe] -- message [writeMsgHeader failed]
[-] iRODS/lib/core/src/sockComm.cpp:461:writeMsgHeader : status [SYS_HEADER_WRITE_LEN_ERR] errno [Broken pipe] -- message []
[-] libtcp.cpp:358:tcp_write_msg_header : status [SYS_HEADER_WRITE_LEN_ERR] errno [Broken pipe] -- message [wrote 0 expected 145]
Mar 13 11:52:11 pid:17278 ERROR: [-] iRODS/server/core/src/rsApiHandler.cpp:475:readAndProcClientMsg : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message []
[-] iRODS/lib/core/src/sockComm.cpp:197:readMsgHeader : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [failed to call 'read header']
[-] libtcp.cpp:256:tcp_read_msg_header : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [header length is out of range: 1011708775 expected >= 0 and < 1088]
Mar 13 11:52:11 pid:17278 NOTICE: Agent exiting with status = -4000
Mar 13 11:52:11 pid:21364 NOTICE: Agent process 17278 exited with status 24576
Mar 13 16:07:53 pid:21364 NOTICE: Agent process 19687 started for puser=rods and cuser=rods from 127.0.0.1
Mar 13 16:07:53 pid:19687 ERROR: [-] iRODS/server/core/src/rsApiHandler.cpp:475:readAndProcClientMsg : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message []
[-] iRODS/lib/core/src/sockComm.cpp:197:readMsgHeader : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message [failed to call 'read header']
[-] libtcp.cpp:240:tcp_read_msg_header : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message [read 0 expected 4]
Mar 13 16:07:53 pid:19687 NOTICE: Agent exiting with status = -4011
When attempting the examples in README.md, an "AttributeError: 'module' object has no attribute 'MSG_WAITALL'" is generated.
$ python test.py
Traceback (most recent call last):
File "test.py", line 4, in <module>
coll = sess.collections.get('/TestZone01/projects')
File "C:\Python27\lib\site-packages\irods\manager\collection_manager.py", line 15, in get
result = query.one()
File "C:\Python27\lib\site-packages\irods\query.py", line 217, in one
results = self.execute()
File "C:\Python27\lib\site-packages\irods\query.py", line 169, in execute
with self.sess.pool.get_connection() as conn:
File "C:\Python27\lib\site-packages\irods\pool.py", line 22, in get_connection
conn = Connection(self, self.account)
File "C:\Python27\lib\site-packages\irods\connection.py", line 22, in __init__
self._server_version = self._connect()
File "C:\Python27\lib\site-packages\irods\connection.py", line 82, in _connect
version_msg = self.recv()
File "C:\Python27\lib\site-packages\irods\connection.py", line 41, in recv
msg = iRODSMessage.recv(self.socket)
File "C:\Python27\lib\site-packages\irods\message\__init__.py", line 41, in recv
rsp_header_size = _recv_message_in_len(sock, 4)
File "C:\Python27\lib\site-packages\irods\message\__init__.py", line 19, in _recv_message_in_len
buf = sock.recv(size_left, socket.MSG_WAITALL)
AttributeError: 'module' object has no attribute 'MSG_WAITALL'
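`socket.MSG_WAITALL` is not defined on every platform, which is what the traceback above runs into on Windows with Python 2.7. One portable alternative is to loop until the requested number of bytes has arrived. The sketch below is an illustration only; `recv_exact` is a hypothetical helper name, not a function in the library, and the demonstration uses a local socket pair rather than an iRODS server.

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes, looping instead of relying on MSG_WAITALL."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # The peer closed the connection before sending everything.
            raise ConnectionError(
                "socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demonstration on a local socket pair; no iRODS server is involved.
left, right = socket.socketpair()
left.sendall(b"\x00\x00\x00\x91" + b"payload")
header = recv_exact(right, 4)
left.close()
right.close()
```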
Not sure if this is the same as #21 but can we have sample code snippets and documentation for all the functions please? Both in README or similar and within code so the help(function) returns something useful.
Is there a way (using this python binding) to either
On the command line, "iput" has an option so that a checksum is computed at the origin, another one in iRODS, and the two are compared. This enhances the reliability of transfers into iRODS from an external source.
It is possible the other way around, when getting an object. If a checksum is already computed/stored by irods, it is provided as ".checksum" on the python object, and we do already use this for checking the file integrity once received. What we are aiming here is to get the same level of reliability in the other direction.
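Until the client offers this directly, the comparison can be done by hand: compute the checksum of the local file and compare it with the value the server stores. The sketch below is a rough illustration. It assumes the server reports either a plain MD5 hex digest or a SHA-256 digest written as `sha2:` followed by base64; which of the two applies depends on the server's configuration, so treat the format as an assumption to verify. `local_checksums` and `matches` are hypothetical helper names.

```python
import base64
import hashlib
import os
import tempfile


def local_checksums(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha2_string) for a local file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    # Assumption: the server writes SHA-256 as "sha2:" + base64(digest).
    sha2 = "sha2:" + base64.b64encode(sha256.digest()).decode("ascii")
    return md5.hexdigest(), sha2


def matches(path, server_checksum):
    """Compare a checksum string from the server with the local file."""
    return server_checksum in local_checksums(path)


# Self-contained demonstration with a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_md5, demo_sha2 = local_checksums(tmp.name)
demo_ok = matches(tmp.name, demo_md5)
os.remove(tmp.name)
```

In real use, `server_checksum` would be the `.checksum` value of the data object after the upload has finished.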
Missing feature set on the initial checkmark list in the README that we would use if it were implemented.
Missing tick box in the features list in the README that we would use if implemented.
typos
comments
cleanups
etc...
This library does not seem to work with Python 3:
$ pip3 install --upgrade git+git://github.com/irods/python-irodsclient.git
$ python3
>>> from irods.session import iRODSSession
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/irods/session.py", line 1, in <module>
from irods.query import Query
File "/usr/local/lib/python3.4/dist-packages/irods/query.py", line 5, in <module>
from irods.message import (
File "/usr/local/lib/python3.4/dist-packages/irods/message/__init__.py", line 7, in <module>
from irods.message.message import Message
File "/usr/local/lib/python3.4/dist-packages/irods/message/message.py", line 5, in <module>
from irods.message.ordered import OrderedProperty, OrderedMetaclass, OrderedClass
File "/usr/local/lib/python3.4/dist-packages/irods/message/ordered.py", line 23
key = lambda (name, property): property._creation_counter,
^
SyntaxError: invalid syntax
Are there any plans to add it? I could give a hand if needed.
Thanks,
Paolo
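The syntax error in the traceback comes from tuple parameter unpacking in a lambda, which Python 3 removed (PEP 3113). A form that works on both Python 2 and Python 3 takes the pair as a single argument and indexes it. The `Prop` class and the `members` dictionary below are stand-ins invented for this illustration.

```python
class Prop:
    """Stand-in for a property object carrying a creation counter."""

    def __init__(self, counter):
        self._creation_counter = counter


members = {"b": Prop(2), "a": Prop(1), "c": Prop(3)}

# Python 2 only:  key=lambda (name, property): property._creation_counter
# Portable form: accept the (name, property) pair as one argument.
ordered = sorted(members.items(), key=lambda item: item[1]._creation_counter)
names = [name for name, _ in ordered]
```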
Hi there,
When using your python API, we hit a segfault in irodsAgent when trying to add metadata from a JSON snippet to a data_object, where the key is a valid string and the value is empty:
print("Adding %d bits of metadata to %s ..." % (len(misoobjjson), dobj.path))
for key, value in misoobjjson.iteritems():
    dobj.metadata.add(str(key).strip(), str(value).strip())  # where key is "foo" and value is ""
Attempting to add it causes:
Sep 8 18:10:41 v0594 kernel: irodsAgent[14416]: segfault at 0 ip 0000000000784a36 sp 00007ffe29788f98 error 4 in irodsAgent[400000+5e6000]
Sep 8 18:10:41 v0594 abrt[9307]: Saved core dump of pid 14416 (/var/lib/irods/iRODS/server/bin/irodsAgent) to /var/spool/abrt/ccpp-2016-09-08-18:10:41-14416 (13205504 bytes)
Sep 8 18:10:41 v0594 abrtd: Directory 'ccpp-2016-09-08-18:10:41-14416' creation detected
Sep 8 18:10:51 v0594 abrtd: Generating core_backtrace
Sep 8 18:10:51 v0594 abrtd: New problem directory /var/spool/abrt/ccpp-2016-09-08-18:10:41-14416, processing
Sep 8 18:10:51 v0594 abrtd: Sending an email...
Sep 8 18:10:51 v0594 abrtd: Email was sent to: root@localhost
Ducking into the backtrace:
(gdb) bt full
#0 0x0000000000784a36 in overflow(char const*, int) ()
No symbol table info available.
#1 0x000000000078575b in computeExpressionWithParams(char const*, char const**, int, RuleExecInfo*, int, MsParamArray*, rError_t*, region*) ()
No symbol table info available.
#2 0x00000000007d0cce in applyRuleArgPA(char const*, char const**, int, MsParamArray*, RuleExecInfo*, int) ()
No symbol table info available.
#3 0x000000000068495b in _rsModAVUMetadata(rsComm_t*, modAVUMetadataInp_t*) ()
No symbol table info available.
#4 0x0000000000684e93 in rsModAVUMetadata(rsComm_t*, modAVUMetadataInp_t*) ()
No symbol table info available.
#5 0x000000000058d5af in rsApiHandler(rsComm_t*, int, BytesBuf*, BytesBuf*) ()
No symbol table info available.
#6 0x000000000058e2f5 in readAndProcClientMsg(rsComm_t*, int) ()
No symbol table info available.
#7 0x000000000049edc8 in agentMain(rsComm_t*) ()
No symbol table info available.
#8 0x00000000004a04db in main ()
No symbol table info available.
I think an exception being thrown in the API instead of the irodsAgent falling over would be preferable? :)
Cheers
Rob
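Until the server or the client rejects empty values itself, a caller can guard against them before calling `metadata.add`. The sketch below is a workaround under assumptions, not the library's behaviour: `checked_avu`, `add_all` and the `Recorder` test double are hypothetical names, and `metadata` is anything with an `add(key, value)` method such as `dobj.metadata` in the report above.

```python
def checked_avu(key, value):
    """Return a stripped (key, value) pair, rejecting empty strings."""
    key = str(key).strip()
    value = str(value).strip()
    if not key:
        raise ValueError("metadata attribute name must not be empty")
    if not value:
        raise ValueError("metadata value for %r must not be empty" % key)
    return key, value


def add_all(metadata, items):
    """Add every valid pair to `metadata`; return the keys that were skipped."""
    skipped = []
    for key, value in items.items():
        try:
            metadata.add(*checked_avu(key, value))
        except ValueError:
            skipped.append(key)
    return skipped


class Recorder:
    """Test double that records add() calls instead of contacting a server."""

    def __init__(self):
        self.added = []

    def add(self, key, value):
        self.added.append((key, value))


recorder = Recorder()
skipped = add_all(recorder, {"foo": "", "bar": " baz "})
```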
We are using Robotframework for testing applications and have a library that uses the python-irodsclient.
One of our keywords is for removing collections. This works just fine as long as the collection does not contain much data or many subcollections. Once there is some data (our test has about 400 MB and 30+ subcollections), it fails with a broken connection.
The full library is available at:
https://github.com/cyverse/Robotframework-iRODS-Library.git
Here is the code for the "Delete a Collection" keyword:
def delete_a_collection(self, path=None, recursive=True, force=False, alias="default_connection"):
    """ Delete an existing iRODS collection at the given path
    'path' - the collection you want to delete (full path)
    'recursive' - boolean defaults to true
    'force' - boolean defaults to false (all items sent to trash)
    'alias' - Robotframework alias to identify the connection
    Example usage:
    | Delete A Collection | /tempZone/home/jdoe/NewCollectionName | connectionAlias
    | Log | ${output}
    | Should Not Contain | ${output} | error
    """
    logger.info('Delete a Collection : alias=%s, path=%s, recursive=%s, force=%s' % (alias, path, recursive, force))
    alias = str(alias)
    path = str(path)
    session = self._cache.switch(alias)
    coll = session.collections.remove(path, recursive, force)
Hey!
First congrats for this great work, I love this API!
I want to send a local file to iRODS using your API, but I'm not sure how I should proceed. Using the iCommands, I'd use the "iput" command for this purpose; however, I can't find an equivalent of "iput" in your API.
Am I missing something?
Thanks and best regards.
Rafa
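In the absence of a dedicated "iput" call, one option is to stream the local file into a writable file object in chunks. The sketch below shows only the copying step and is demonstrated against an in-memory buffer. Whether a data object can be opened for writing (for example, creating it with `sess.data_objects.create(path)` and then opening it in a write mode) is an assumption to check against the library version in use. `copy_local_file` is a hypothetical helper name.

```python
import io
import os
import shutil
import tempfile


def copy_local_file(local_path, destination, chunk_size=4 * 1024 * 1024):
    """Stream a local file into `destination`, any writable binary file object."""
    with open(local_path, "rb") as source:
        shutil.copyfileobj(source, destination, chunk_size)


# Demonstration with a temporary file and an in-memory sink.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
sink = io.BytesIO()
copy_local_file(tmp.name, sink)
os.remove(tmp.name)
```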
As reported by Juan Luis Font:
I have been testing the latest version of python-irodsclient available on GitHub together with our iRODS server.
The client code I have been using is this one: https://github.com/irods/python-irodsclient
Regarding our iRODS server, this is part of the information provided by imiscsvrinfo
RCAT_ENABLED
relVersion=rods3.3.1
apiVersion=d
Due to some external requirements, we have to stay with iRODS 3.X (I am not sure whether using the new Python client together with the previous major release can be the source of the following problem).
I am running the proposed examples from my workstation (a Debian machine with Python 2.7).
I have successfully executed the python-irodsclient examples for collections available on the GitHub wiki, so I assume that at least the session is properly working.
When I try to run the data object examples, I always get the following error:
from irods.session import iRODSSession
sess = iRODSSession(host='ourirodshost', port=1247,
user='XXXX', password='YYYY', zone='ourVZ')
obj = sess.data_objects.get("/ourVZ/ourFolder/pythontest/xkcd.gif")
---------------------------------------------------------------------------
CAT_UNKNOWN_TABLE Traceback (most recent call last)
<ipython-input-7-0bed36b25e13> in <module>()
----> 1 obj = sess.data_objects.get("/ourVZ/ourFolder/pythontest/sysadmin-xkcd.gif")
/home/jlfont/repo/python-irodsclient/irods/manager/data_object_manager.pyc in get(self, path)
23 .filter(DataObject.name == basename(path))\
24 .filter(DataObject.collection_id == parent.id)
---> 25 results = query.all()
26 if len(results) <= 0:
27 raise DataObjectDoesNotExist()
/home/jlfont/repo/python-irodsclient/irods/query.pyc in all(self)
140
141 def all(self):
--> 142 result_set = self.execute()
143 if result_set.continue_index > 0:
144 self.continue_index(result_set.continue_index).close()
/home/jlfont/repo/python-irodsclient/irods/query.pyc in execute(self)
126 conn.send(message)
127 try:
--> 128 result_message = conn.recv()
129 results = result_message.get_main_message(GenQueryResponse)
130 result_set = ResultSet(results)
/home/jlfont/repo/python-irodsclient/irods/connection.pyc in recv(self)
37 msg = iRODSMessage.recv(self.socket)
38 if msg.int_info < 0:
---> 39 raise get_exception_by_code(msg.int_info)
40 return msg
41
CAT_UNKNOWN_TABLE:
The file I'm trying to use to run the above example is a valid one and I can manipulate it without problem using the icommands.
I have been surfing the documentation and the web, but I have not managed to find anything that can be helpful for my case.
Any comments and suggestions are more than welcome.
Hi,
I tried the python-irods-client.
Reading a simple 12 MB file like this may take up to 200 seconds!
obj = sess.data_objects.get(mseedFile)
with obj.open('r+') as f:
    for line in f:
        print line
time ./rsc.py > output
I patched the irods/data_object.py file to specify a big buffer:
1c61
< return BufferedRandom(iRODSDataObjectFileRaw(conn, desc))
---
> return BufferedRandom(iRODSDataObjectFileRaw(conn, desc),buffer_size=1000000)
Then, it took "only" 7 seconds to read the file. But it's still 10 times more than a simple iget.
Furthermore, if I use a library like "obspy" (obspy.core.read(f)), we are back to 200 seconds, even with this buffering trick.
What read/write performance are we supposed to expect when using the iRODS API?
Best regards
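The effect of the buffer size reported above can be reproduced without a server: every time the buffer runs empty, the buffered wrapper asks the raw stream for more data, and over a network each such request is a round trip. The sketch below counts those requests for two buffer sizes. It uses `io.BufferedReader` over an in-memory raw stream rather than the library's `BufferedRandom` over `iRODSDataObjectFileRaw`; `CountingRaw` and `count_round_trips` are names invented for the illustration.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts readinto() calls (one per round trip)."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def count_round_trips(data, buffer_size):
    """Iterate over all lines and report (line_count, raw_read_calls)."""
    raw = CountingRaw(data)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    lines = sum(1 for _ in reader)
    return lines, raw.calls


payload = b"0123456789abcdef\n" * 10000
small = count_round_trips(payload, 512)
large = count_round_trips(payload, 1000000)
```

Both runs see the same lines; the larger buffer only reduces how often the raw stream is asked for data, which is consistent with the speed-up described in the report.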
Logging in via PAM would be very useful. I think this will require overriding the _login method of connection, and using a different API call and response.
Being able to register data objects, particularly of slink or http type, would be very useful for dynamically aggregating collections.
First use will be to pass resource parameter
At present, some of the code doesn't meet the PEP8 style guide, which means that Python code written against it often doesn't either. IMHO this makes for less readable code and may also cause CI lint checkers to complain, if not reject the code outright.
python code:
sess = iRODSSession(iEnv_rodsHost, iEnv_rodsPort, iEnv_rodsUserName, iEnv_rodsZone, iEnv_rodsPassword)
irods_group_obj = sess.user_groups.get(groupname)
rodsLog:
Oct 21 10:00:03 pid:20240 NOTICE: Agent process 20866 started for puser=rods and cuser=rods from 10.205.115.130
....
Oct 21 10:00:03 pid:20866 NOTICE: readAndProcClientMsg: received disconnect msg from client
Oct 21 10:00:03 pid:20866 NOTICE: Agent exiting with status = 0
Oct 21 10:00:03 pid:20748 ERROR: [-] iRODS/server/core/src/rsApiHandler.cpp:470:readAndProcClientMsg : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message []
[-] iRODS/lib/core/src/sockComm.cpp:199:readMsgHeader : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message [failed to call 'read header']
[-] libtcp.cpp:240:tcp_read_msg_header : status [SYS_HEADER_READ_LEN_ERR] errno [Resource temporarily unavailable] -- message [read 0 expected 4]
Oct 21 10:00:03 pid:20748 NOTICE: Agent exiting with status = -4011
Oct 21 10:00:03 pid:20240 NOTICE: Agent process 20748 exited with status 21760
Oct 21 10:00:03 pid:20240 NOTICE: Agent process 20866 exited with status 0
This seems to be a general issue with the network protocol? Jargon can cause similar log messages, DICE-UNC/jargon#198
Possibly with a dictionary in irods/__init__.py
Get release version out of StartupPack, etc...
Enhancement request: it would be great if the python-irodsclient worked with python 3 as well.
irods/__init__.py adds a specialized handler to the root logger, which is then picked up by any application that loads irods.
This results in duplicate log entries for all applications that have already configured their loggers.
If loggers need to be set up, that should be left to the application or test environment and not be done in the library.
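The usual pattern for a library is to attach only a `logging.NullHandler` to its own logger and leave the root logger alone, so the application decides where messages go. A minimal sketch follows; the logger name `irods_example` and the `do_work` function are placeholders for illustration.

```python
import logging

# In the library's package __init__: attach only a NullHandler to the
# library's own logger and leave the root logger untouched.
library_logger = logging.getLogger("irods_example")  # hypothetical logger name
library_logger.addHandler(logging.NullHandler())


def do_work():
    """Library code logs through its own logger; output is up to the app."""
    library_logger.info("connecting")
    return True


root_handlers_before = list(logging.getLogger().handlers)
do_work()
root_handlers_after = list(logging.getLogger().handlers)
```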
We have a few hundred files per collection, and when I try to access the data objects within the subcollections of a collection, at most 250 objects are returned:
sess = iRODSSession(...)
coll = sess.collections.get('/a/path')
for collection, subcollections, data_objects in coll.walk():
    print len(data_objects)
prints
0
250
250
250
although there are many more files in each subcollection.
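A fixed cap of 250 suggests that only the first page of a paged query result is being consumed. The general remedy is to keep requesting pages until the server signals that there are none left. The sketch below shows that loop against a fake data source. `fetch_page` is a hypothetical callable, and in this fake the continuation value doubles as an offset; a real server's continuation index should be treated as an opaque token.

```python
def fetch_all(fetch_page):
    """Collect rows from every page.

    `fetch_page(continue_index)` is a hypothetical callable returning
    (rows, next_index); next_index <= 0 means there are no more pages.
    """
    rows = []
    continue_index = 0
    while True:
        page, continue_index = fetch_page(continue_index)
        rows.extend(page)
        if continue_index <= 0:
            return rows


# Fake server holding 600 rows and returning at most 250 per request.
DATA = list(range(600))
PAGE = 250


def fake_fetch(continue_index):
    chunk = DATA[continue_index:continue_index + PAGE]
    following = continue_index + PAGE
    return chunk, (following if following < len(DATA) else 0)


all_rows = fetch_all(fake_fetch)
```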
Would it be possible to implement the use of API keys with the python-irodsclient? Would it be very difficult to implement? The only reason I ask is because it makes it a lot easier to authenticate in irods when using it with third party applications, than using plain text passwords.
It would be very helpful to have some functions to get or set a dictionary of context strings for a resource.
Following on from the iRODS chat discussion, can we keep the pip package up to date with the git repo please? I understand that it may be desirable to keep a release schedule, but the pip version is far enough behind the curve that if the standard advice is to clone from the repo, it's probably too far out of date!
please add a "like" mode to the comparison types in column.py (or, if it works out of the box somehow, please add a test case example on how to use it)
We are wondering whether it is possible to execute a rule with the iRODS Python client. Has this already been implemented? Is there some documentation on this?
Where would we have to look if we would like to add this to the package?
TNX!
Does this library support Kerberos Authentication at all? If not, are there any plans to support it in the future?
When reading the contents of iRODS data objects, the second read seems to always fail. The following code will always fail on the second file in the list. If I reorder the files it will still fail on the second one.
The error is BAD_INPUT_DESC_INDEX.
Notes:
files = ['/tempZone/home/rods/file1.csv',\
         '/tempZone/home/rods/file2.csv']
for file in files:
    obj = sess.data_objects.get(file)
    with obj.open('r') as f:
        try:
            print f.read()
        except:
            print 'exception read file ' + file + '\n\n'
I'm running @beppodb 's docker images here, and I spin up clean instances each time. Currently, data object creation fails with this type of construction:
import irods
from irods.session import iRODSSession
sess = iRODSSession(host='localhost', port=8547, user='rods', password='rods',
                    zone='tempZone')
coll = sess.collections.get("/tempZone/home/rods")
print coll.id
for col in coll.subcollections:
    print col
for obj in coll.data_objects:
    print obj
coll = sess.collections.create("/tempZone/home/rods/testdir3")
print "Created collection", coll.id
obj = sess.data_objects.create("/tempZone/home/rods/test1")
print "Created object"
The error message is suspiciously similar to the previous issue with data collection creation:
10008
Created collection 10033
Traceback (most recent call last):
File "check_client.py", line 16, in <module>
obj = sess.data_objects.create("/tempZone/home/rods/test1")
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/resource_manager/data_object_manager.py", line 50, in create
conn.close_file(desc)
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/connection.py", line 144, in close_file
response = self.recv()
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/connection.py", line 38, in recv
msg = iRODSMessage.recv(self.socket)
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/message/__init__.py", line 24, in recv
rsp_header_size = struct.unpack(">i", rsp_header_size)[0]
struct.error: unpack requires a string argument of length 4
I've been trying to see if there was a similar solution to PR #1, but after scouring the source of irods-php, irods, and the rest of the python client, I'm not able to figure out what it is. irods lists the API numbers for DATA_OBJ_CLOSE_AN as being the original ones in iRODS/lib/api/include/apiNumber.hpp, but python-irodsclient lists them as the 201 variants. Regardless, I have been unable to fix it, although I have succeeded in being told I've mal-packed the struct.
A possibly related issue is that if I create the object and pass on any error returned, the object does show up, but if I try to get it:
Traceback (most recent call last):
File "check_client.py", line 22, in <module>
obj = sess.data_objects.get("/tempZone/home/rods/test1")
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/resource_manager/data_object_manager.py", line 20, in get
parent = self.sess.collections.get(dirname(path))
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/resource_manager/collection_manager.py", line 12, in get
result = query.one()
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/query.py", line 132, in one
results = self.execute()
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/query.py", line 121, in execute
result_message = conn.recv()
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/connection.py", line 38, in recv
msg = iRODSMessage.recv(self.socket)
File "/home/mturk/yt-x86_64/lib/python2.7/site-packages/irods/message/__init__.py", line 24, in recv
rsp_header_size = struct.unpack(">i", rsp_header_size)[0]
struct.error: unpack requires a string argument of length 4