drdarshan / ssgetpy
A searchable Python interface to the SuiteSparse Matrix Collection
The package should not attempt to connect to the remote database at import time. I am trying to package ssgetpy in Guix, which disables network access when building packages, so the import fails.
Currently ssgetpy assigns dtype='real' to both real- and integer-valued matrices from the SuiteSparse Matrix Collection. It would be convenient if ssgetpy instead assigned dtype='integer' to integer-valued matrices.
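One place where the real/integer distinction is recorded is the banner line of a Matrix Market file, whose fourth token is the field (real, complex, integer or pattern). The sketch below assumes the matrix is available in Matrix Market format; `matrix_market_field` is a hypothetical helper, not part of ssgetpy, and whether the index that ssgetpy downloads carries the same information is not something this example relies on.

```python
def matrix_market_field(banner_line):
    """Return the field token ('real', 'complex', 'integer' or 'pattern')
    from a Matrix Market banner such as
    '%%MatrixMarket matrix coordinate integer general'.
    """
    parts = banner_line.strip().lower().split()
    if len(parts) != 5 or parts[0] != "%%matrixmarket":
        raise ValueError(f"not a Matrix Market banner: {banner_line!r}")
    return parts[3]
```

For example, `matrix_market_field("%%MatrixMarket matrix coordinate integer general")` yields the string `integer`, which could then be mapped to dtype='integer'.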
Thank you for this project! It's much easier to get matrices now. The only issue I found so far is that it takes days to download big matrices. I've changed the chunk size along with the sleep duration so now it seems to be as fast as downloading with the browser. Here is my change:
    with open(localdest, "wb") as outfile, tqdm(
        total=content_length, desc=self.name, unit="B"
    ) as pbar:
        for chunk in response.iter_content(chunk_size=131072):
            outfile.write(chunk)
            # Advance by the bytes actually received; the last chunk may be short.
            pbar.update(len(chunk))
            time.sleep(0.01)
I'm not sure about the exact numbers, but the chunk size should most probably be higher than it is in the trunk. Is there any reason to keep the chunk size that small?
It would be convenient if the ssgetpy API supported returning all matching results of a search query. I've worked around the lack of this feature by setting the limit argument of ssgetpy.search to an arbitrarily large number (e.g. 10,000). After checking the implementation, it seems to me it would be straightforward to extend the API to return all results when limit=None.
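To illustrate the limit=None idea, here is a minimal sketch that assumes the search ends up as a SQL query. The `matrices` table, its columns and this `search` function are hypothetical and only mirror the shape of the proposal; they are not ssgetpy's implementation.

```python
import sqlite3


def search(conn, kind, limit=10):
    """Hypothetical search: append a LIMIT clause only when a limit is given."""
    sql = "SELECT name FROM matrices WHERE kind = ? ORDER BY name"
    params = [kind]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


# Small in-memory table with made-up rows, used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matrices (name TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO matrices VALUES (?, ?)",
    [(f"m{i:02d}", "graph") for i in range(25)],
)
```

Calling `search(conn, "graph")` returns at most the default number of names, while `search(conn, "graph", limit=None)` returns every matching row, so callers no longer need to guess an arbitrarily large number.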
Thanks for making this and sharing it. PyUFGet was easy to install and easy to use.
One feature I would like is to enable a dry run where the program lists which matrices it will download, but does not actually download them. I would use this feature to make sure my query returns the desired matrices before incurring the cost of an accidental download.
Are you open to --dry-run as a command flag for the CLI?
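A dry-run flag of this kind could look roughly like the sketch below, written with argparse. Everything here is an assumption for illustration: the program name, the `--name` option, and the `run` function that takes the matched names and a download callable are hypothetical and do not describe the existing CLI.

```python
import argparse


def build_parser():
    """Build a hypothetical parser with a --dry-run switch."""
    parser = argparse.ArgumentParser(prog="ssgetpy-sketch")
    parser.add_argument("--name", help="name pattern to match (hypothetical option)")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="list the matrices that would be downloaded, then exit",
    )
    return parser


def run(argv, matches, download):
    """List matches when --dry-run is set; otherwise call download for each."""
    args = build_parser().parse_args(argv)
    for name in matches:
        if args.dry_run:
            print(f"would download: {name}")
        else:
            download(name)
    return args.dry_run
```

Passing the download step in as a callable keeps the listing and the side effect separate, which makes it easy to check that a dry run never triggers a download.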
I ran import ssgetpy in a Conda virtual environment and got this error:
---------------------------------------------------------------------------
SSLCertVerificationError Traceback (most recent call last)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
698 # Make the request on the httplib connection object.
--> 699 httplib_response = self._make_request(
700 conn,
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
381 try:
--> 382 self._validate_conn(conn)
383 except (SocketTimeout, BaseSSLError) as e:
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
1009 if not getattr(conn, "sock", None): # AppEngine might not have `.sock`
-> 1010 conn.connect()
1011
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connection.py in connect(self)
415
--> 416 self.sock = ssl_wrap_socket(
417 sock=conn,
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
448 if send_sni:
--> 449 ssl_sock = _ssl_wrap_socket_impl(
450 sock, context, tls_in_tls, server_hostname=server_hostname
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
492 if server_hostname:
--> 493 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
494 else:
~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
499 # ctx._wrap_socket()
--> 500 return self.sslsocket_class._create(
501 sock=sock,
~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
1039 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040 self.do_handshake()
1041 except (OSError, ValueError):
~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in do_handshake(self, block)
1308 self.settimeout(None)
-> 1309 self._sslobj.do_handshake()
1310 finally:
SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
439 if not chunked:
--> 440 resp = conn.urlopen(
441 method=request.method,
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
754
--> 755 retries = retries.increment(
756 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
573 if new_retry.is_exhausted():
--> 574 raise MaxRetryError(_pool, url, error or ResponseError(cause))
575
MaxRetryError: HTTPSConnectionPool(host='sparse.tamu.edu', port=443): Max retries exceeded with url: /files/ssstats.csv (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
<ipython-input-4-86ec5188969f> in <module>
----> 1 import ssgetpy
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/__init__.py in <module>
101 matrix/matrices.
102 """
--> 103 from .query import fetch, search
104
105 __all__ = ["fetch", "search"]
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/query.py in <module>
4
5 from .config import SS_DIR
----> 6 from .dbinstance import instance
7
8 logger = logging.getLogger(__name__)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/dbinstance.py in <module>
18 ) > datetime.timedelta(days=90):
19 logger.info("{Re}creating index from CSV file...")
---> 20 instance.refresh(csvindex.generate())
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/csvindex.py in generate()
48
49 def generate():
---> 50 response = requests.get(SS_INDEX_URL)
51 lines = response.iter_lines()
52
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/api.py in get(url, params, **kwargs)
73 """
74
---> 75 return request('get', url, params=params, **kwargs)
76
77
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/api.py in request(method, url, **kwargs)
59 # cases, and look like a memory leak in others.
60 with sessions.Session() as session:
---> 61 return session.request(method=method, url=url, **kwargs)
62
63
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
527 }
528 send_kwargs.update(settings)
--> 529 resp = self.send(prep, **send_kwargs)
530
531 return resp
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
643
644 # Send the request
--> 645 r = adapter.send(request, **kwargs)
646
647 # Total elapsed time of the request (approximately)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
515 if isinstance(e.reason, _SSLError):
516 # This branch is for urllib3 v1.22 and later.
--> 517 raise SSLError(e, request=request)
518
519 raise ConnectionError(e, request=request)
SSLError: HTTPSConnectionPool(host='sparse.tamu.edu', port=443): Max retries exceeded with url: /files/ssstats.csv (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
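The traceback shows the import failing because the index refresh in dbinstance.py raises when the server certificate cannot be verified. One possible mitigation, sketched here under assumptions, is to fall back to the previously cached index when the refresh fails. `refresh_index`, `fetch` and `cached_rows` are hypothetical names; the sketch catches `OSError`, which is the base class of the exceptions raised by requests, including its `SSLError`.

```python
import logging

logger = logging.getLogger(__name__)


def refresh_index(fetch, cached_rows):
    """Try to fetch fresh index rows; fall back to cached rows on failure.

    Returns a (rows, is_fresh) pair so callers can tell which source was used.
    """
    try:
        return list(fetch()), True
    except OSError as exc:
        # Network and TLS failures land here; keep working with stale data.
        logger.warning("Index refresh failed, using cached index: %s", exc)
        return cached_rows, False
```

This does not fix an expired certificate on the server side; it only keeps a stale but usable index available instead of making the import itself fail.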