ssgetpy's People

Contributors

drdarshan

ssgetpy's Issues

No connection on import

The package should not attempt to connect to the remote database at import time.

I am trying to package ssgetpy in Guix, but it disables network access when building packages, so the import fails.
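One way to avoid network access at import time is to defer building the index until the first query. The following is a minimal sketch of that idea, not ssgetpy's actual code: download_index, get_index, search and the calls counter are hypothetical names, and the download is simulated so the example runs offline.

```python
import functools

# Counts how often the simulated remote fetch runs (illustration only).
calls = {"download": 0}


def download_index():
    """Hypothetical stand-in for the step that downloads the remote index."""
    calls["download"] += 1
    return [{"id": 1, "name": "example"}]


@functools.lru_cache(maxsize=None)
def get_index():
    """Build the index on first use instead of at import time."""
    return download_index()


def search(name):
    """Hypothetical query function: touches the index only when called."""
    return [row for row in get_index() if row["name"] == name]
```

With this layout, importing the module performs no download; the first call to search triggers it once, and later calls reuse the cached result.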

Differentiate between real and integer matrices

Currently ssgetpy assigns dtype='real' to both real- and integer-valued matrices from the SuiteSparse Matrix Collection. It would be convenient if ssgetpy instead assigned dtype='integer' to integer-valued matrices.
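One place where the distinction is recorded is the banner line of a Matrix Market file, whose fourth token declares the value type (real, complex, integer or pattern). The sketch below reads that token; matrix_market_field is a hypothetical helper, and whether ssgetpy's index has access to this information is an assumption, not something stated in this issue.

```python
def matrix_market_field(banner):
    """Return the value-type field of a Matrix Market banner line."""
    parts = banner.strip().lower().split()
    # A banner has five tokens: %%MatrixMarket object format field symmetry.
    if len(parts) != 5 or parts[0] != "%%matrixmarket":
        raise ValueError(f"not a Matrix Market banner: {banner!r}")
    return parts[3]
```

For example, matrix_market_field("%%MatrixMarket matrix coordinate integer general") returns "integer".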

Slow download speed

Thank you for this project! It's much easier to get matrices now. The only issue I found so far is that it takes days to download big matrices. I've changed the chunk size along with the sleep duration so now it seems to be as fast as downloading with the browser. Here is my change:

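with open(localdest, "wb") as outfile, tqdm(
    total=content_length, desc=self.name, unit="B"
) as pbar:
    # Read the response in 128 KiB chunks.
    for chunk in response.iter_content(chunk_size=131072):
        outfile.write(chunk)
        # Advance by the bytes actually received; the last chunk may be short.
        pbar.update(len(chunk))
        time.sleep(0.01)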

I'm not sure about the exact numbers, but the chunk size should most probably be larger than it is in trunk. Is there any reason to keep the chunk size that small?

Allow returning all search results

It would be convenient if the ssgetpy API supported returning all matching results of a search query. I've worked around the lack of this feature by setting the limit argument of ssgetpy.search to an arbitrarily large number (e.g. 10,000). After checking the implementation, it seems to me it would be straightforward to extend the API to return all results when limit=None.
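The limit=None convention can be illustrated independently of ssgetpy. This is only a sketch of the idea: take is a hypothetical helper, and how ssgetpy applies the limit internally is not shown here. It relies on itertools.islice treating a stop value of None as "no upper bound".

```python
from itertools import islice


def take(results, limit=None):
    """Return up to `limit` items, or every item when limit is None."""
    # islice treats a stop value of None as "no upper bound".
    return list(islice(results, limit))
```

Here take(range(5), 2) yields the first two items, while take(range(5)) yields all five, so callers no longer need an arbitrarily large sentinel value.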

Dry-run option

Thanks for making this and sharing it. PyUFGet was easy to install and easy to use.
One feature I would like is to enable a dry run where the program lists which matrices it will download, but does not actually download them. I would use this feature to make sure my query returns the desired matrices before incurring the cost of an accidental download.

Are you open to --dry-run as a command flag for the CLI?
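A rough sketch of how such a flag might behave, using argparse from the standard library. The program name, build_parser, run, and the matches and download parameters are all hypothetical and do not describe the existing CLI.

```python
import argparse


def build_parser():
    """Create a parser with a hypothetical --dry-run flag."""
    parser = argparse.ArgumentParser(prog="ssgetpy-sketch")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="list matching matrices without downloading them",
    )
    return parser


def run(argv, matches, download):
    """List matches; call download(name) only when --dry-run is absent."""
    args = build_parser().parse_args(argv)
    for name in matches:
        print(name)
        if not args.dry_run:
            download(name)
    return args.dry_run
```

Calling run(["--dry-run"], names, download) prints every name and never invokes download, which is the behaviour requested above.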

SSL Error on Import

I executed
import ssgetpy
in a Conda virtual environment and got this error:

---------------------------------------------------------------------------
SSLCertVerificationError                  Traceback (most recent call last)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    698             # Make the request on the httplib connection object.
--> 699             httplib_response = self._make_request(
    700                 conn,

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    381         try:
--> 382             self._validate_conn(conn)
    383         except (SocketTimeout, BaseSSLError) as e:

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
   1009         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1010             conn.connect()
   1011 

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connection.py in connect(self)
    415 
--> 416         self.sock = ssl_wrap_socket(
    417             sock=conn,

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
    448     if send_sni:
--> 449         ssl_sock = _ssl_wrap_socket_impl(
    450             sock, context, tls_in_tls, server_hostname=server_hostname

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
    492     if server_hostname:
--> 493         return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
    494     else:

~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    499         # ctx._wrap_socket()
--> 500         return self.sslsocket_class._create(
    501             sock=sock,

~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1039                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040                     self.do_handshake()
   1041             except (OSError, ValueError):

~/anaconda3/envs/ktb_torch/lib/python3.8/ssl.py in do_handshake(self, block)
   1308                 self.settimeout(None)
-> 1309             self._sslobj.do_handshake()
   1310         finally:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    439             if not chunked:
--> 440                 resp = conn.urlopen(
    441                     method=request.method,

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    754 
--> 755             retries = retries.increment(
    756                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    573         if new_retry.is_exhausted():
--> 574             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    575 

MaxRetryError: HTTPSConnectionPool(host='sparse.tamu.edu', port=443): Max retries exceeded with url: /files/ssstats.csv (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
<ipython-input-4-86ec5188969f> in <module>
----> 1 import ssgetpy

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/__init__.py in <module>
    101                           matrix/matrices.
    102 """
--> 103 from .query import fetch, search
    104 
    105 __all__ = ["fetch", "search"]

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/query.py in <module>
      4 
      5 from .config import SS_DIR
----> 6 from .dbinstance import instance
      7 
      8 logger = logging.getLogger(__name__)

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/dbinstance.py in <module>
     18 ) > datetime.timedelta(days=90):
     19     logger.info("{Re}creating index from CSV file...")
---> 20     instance.refresh(csvindex.generate())

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/ssgetpy/csvindex.py in generate()
     48 
     49 def generate():
---> 50     response = requests.get(SS_INDEX_URL)
     51     lines = response.iter_lines()
     52 

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/api.py in get(url, params, **kwargs)
     73     """
     74 
---> 75     return request('get', url, params=params, **kwargs)
     76 
     77 

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62 
     63 

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    527         }
    528         send_kwargs.update(settings)
--> 529         resp = self.send(prep, **send_kwargs)
    530 
    531         return resp

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
    643 
    644         # Send the request
--> 645         r = adapter.send(request, **kwargs)
    646 
    647         # Total elapsed time of the request (approximately)

~/anaconda3/envs/ktb_torch/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    515             if isinstance(e.reason, _SSLError):
    516                 # This branch is for urllib3 v1.22 and later.
--> 517                 raise SSLError(e, request=request)
    518 
    519             raise ConnectionError(e, request=request)

SSLError: HTTPSConnectionPool(host='sparse.tamu.edu', port=443): Max retries exceeded with url: /files/ssstats.csv (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
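The traceback shows the index refresh running during import and the download error propagating all the way up. One possible mitigation, sketched here under assumptions, is to fall back to a previously cached index when the refresh fails. refresh_index, download and cached_rows are hypothetical names, not part of ssgetpy. The sketch catches OSError because the exception classes in requests, including SSLError, derive from it.

```python
import logging

logger = logging.getLogger(__name__)


def refresh_index(download, cached_rows):
    """Try to refresh the index; keep the cached rows if the download fails."""
    try:
        return list(download())
    except OSError as exc:
        # requests' exceptions (including SSLError) derive from OSError.
        logger.warning("index refresh failed, using cached index: %s", exc)
        return cached_rows
```

This only helps when a cached index already exists; on a fresh install there is nothing to fall back to, and the underlying certificate problem still has to be resolved.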
