gdrivefs's People

Contributors

efiop, isidentical, k4rth33k, martindurant, rabernat, rhunwicks, sehnem, tjcrone

gdrivefs's Issues

Daily limit for unauthenticated use exceeded

I'm probably doing something wrong here, but I'm trying to open a Google Drive file for read-only anonymous access and I appear to be getting an error related to a daily limit. I'm currently on fsspec==0.8.4 and gdrivefs at HEAD. Am I doing it wrong? I would be happy to help with this if more needs to be done:

of = fsspec.open('gdrive://1LpzE8MEUYitOJeMvJAJOUMJDMVbh_GSr', mode='rb', token='anon')

---------------------------------------------------------------------------
HttpError                                 Traceback (most recent call last)
<ipython-input-7-fec500f75b47> in <module>
----> 1 of = fsspec.open('gdrive://1LpzE8MEUYitOJeMvJAJOUMJDMVbh_GSr', mode='rb', token='anon')

/srv/conda/lib/python3.7/site-packages/fsspec/core.py in open(urlpath, mode, compression, encoding, errors, protocol, newline, **kwargs)
    436         newline=newline,
    437         expand=False,
--> 438         **kwargs
    439     )[0]
    440 

/srv/conda/lib/python3.7/site-packages/fsspec/core.py in open_files(urlpath, mode, compression, encoding, errors, name_function, num, protocol, newline, auto_mkdir, expand, **kwargs)
    285         storage_options=kwargs,
    286         protocol=protocol,
--> 287         expand=expand,
    288     )
    289     if "r" not in mode and auto_mkdir:

/srv/conda/lib/python3.7/site-packages/fsspec/core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options, protocol, expand)
    608             )
    609         update_storage_options(options, storage_options)
--> 610         fs = cls(**options)
    611         paths = expand_paths_if_needed(paths, mode, num, fs, name_function)
    612 

/srv/conda/lib/python3.7/site-packages/fsspec/spec.py in __call__(cls, *args, **kwargs)
     56             return cls._cache[token]
     57         else:
---> 58             obj = super().__call__(*args, **kwargs)
     59             # Setting _fs_token here causes some static linters to complain.
     60             obj._fs_token_ = token

/srv/conda/lib/python3.7/site-packages/gdrivefs/core.py in __init__(self, root_file_id, token, access, spaces, **kwargs)
     70         self.root_file_id = root_file_id or 'root'
     71         self.connect(method=token)
---> 72         self.ls("")
     73 
     74     def connect(self, method=None):

/srv/conda/lib/python3.7/site-packages/gdrivefs/core.py in ls(self, path, detail, trashed)
    167                 file_id = self.path_to_file_id(path, trashed=trashed)
    168             files = self._list_directory_by_id(file_id, trashed=trashed,
--> 169                                                path_prefix=path)
    170             self.dircache[path] = files
    171         else:

/srv/conda/lib/python3.7/site-packages/gdrivefs/core.py in _list_directory_by_id(self, file_id, trashed, path_prefix)
    186             response = self.service.list(q=query,
    187                                          spaces=self.spaces, fields=afields,
--> 188                                          pageToken=page_token).execute()
    189             for f in response.get('files', []):
    190                 all_files.append(_finfo_from_response(f, path_prefix))

/srv/conda/lib/python3.7/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs)
    132                 elif positional_parameters_enforcement == POSITIONAL_WARNING:
    133                     logger.warning(message)
--> 134             return wrapped(*args, **kwargs)
    135 
    136         return positional_wrapper

/srv/conda/lib/python3.7/site-packages/googleapiclient/http.py in execute(self, http, num_retries)
    913             callback(resp)
    914         if resp.status >= 300:
--> 915             raise HttpError(resp, content, uri=self.uri)
    916         return self.postproc(resp, content)
    917 

HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files?q=%27root%27+in+parents++and+trashed+%3D+false+&spaces=drive&fields=nextPageToken%2C+files%28name%2Cid%2Csize%2Cdescription%2Ctrashed%2CmimeType%2Cversion%2CcreatedTime%2CmodifiedTime%29&alt=json returned "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.". Details: "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.">
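
For reference, a possible workaround that sidesteps the anonymous quota is to authenticate instead of using token='anon'. This is only a sketch, using the token='browser' / token='cache' flow that appears in the project's example notebook; it assumes the extra keyword is forwarded to GoogleDriveFileSystem:

    import fsspec

    # token='browser' triggers an interactive OAuth consent page on first use;
    # token='cache' reuses credentials stored by a previous 'browser' run.
    of = fsspec.open('gdrive://1LpzE8MEUYitOJeMvJAJOUMJDMVbh_GSr', mode='rb', token='browser')
    with of as f:
        data = f.read()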

Module 'gdrivefs' has no attribute 'GoogleDriveFileSystem'

I installed it with:
pip install gdrivefs

My filesystem:
fs = fsspec.filesystem('gdrive')

Then I get the error `Module 'gdrivefs' has no attribute 'GoogleDriveFileSystem'`.

My environment:

  • python==3.11.8
  • fsspec==2024.3.1

I checked the gdrivefs installed locally and it is a different package from yours, but I don't know how to install the gdrivefs that fsspec expects.
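
A likely explanation (hedged, since the report doesn't show which distribution was installed) is that the package published on PyPI under the name gdrivefs is an unrelated FUSE project, so the fsspec implementation has to be installed from its source repository instead. Assuming the repository lives at github.com/fsspec/gdrivefs, that would be:

    pip install git+https://github.com/fsspec/gdrivefs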

Ls only lists files but not directories

gdfs.ls('/') only lists the files in the top-level directory, but does not list subdirectories. I cannot find any function that lists subdirectories. Am I doing it wrong? If so, what is the proper way to list files as well as subdirectories? If this is not the intended behavior, I'm happy to help fix it. Thanks. In this example root_file_id points to a folder with two files and one subfolder:

gdfs = gdrivefs.GoogleDriveFileSystem(root_file_id='1PCBDhk5f3v5PoPCY3Rdcqgy4S_Yj2kCC', token='cache')
files = gdfs.ls('/')
files
['CRND0103-2017-NY_Millbrook_3_W.csv', 'CRND0103-2017-NY_Millbrook_3_W.nc']
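
One way to see what the listing actually returns (a diagnostic sketch, not a confirmed fix) is to ask for detailed entries. fsspec's ls accepts detail=True, and the fields requested from the Drive API include mimeType, so folder entries should be identifiable by the Drive folder MIME type. The exact keys depend on what _finfo_from_response produces:

    entries = gdfs.ls('/', detail=True)
    for e in entries:
        # Drive folders carry the 'application/vnd.google-apps.folder' MIME type.
        print(e.get('name'), e.get('mimeType'))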

Pagination for listing files not working as intended

The ls and find functions only return 100 files for any folder that has >100 files.
To replicate the issue:

import fsspec
fs = fsspec.filesystem("gdrive", root_file_id="<folder_id_with_more_than_100_files>")
print(len(fs.find("/")))  # or print(len(fs.ls("/")))

The likely cause is that, during pagination in the _list_directory_by_id function in core.py, the loop exits if the incompleteSearch field in the response is False or the nextPageToken field is missing. That logic fails when more files remain to be listed: nextPageToken is present, but incompleteSearch is False simply because the search was complete (the listing did not span multiple drives). A corrected loop is sketched after the reference link.
Reference: https://developers.google.com/drive/api/v3/reference/files/list#response
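
A minimal sketch of the corrected pagination, pulled out into a standalone helper for clarity (the names are illustrative; in gdrivefs this logic lives inside _list_directory_by_id): keep requesting pages as long as nextPageToken is present, and ignore incompleteSearch for loop control.

    def list_all_pages(service, query, spaces, fields):
        """Collect every page of a Drive files.list query."""
        all_files = []
        page_token = None
        while True:
            response = service.list(q=query, spaces=spaces, fields=fields,
                                    pageToken=page_token).execute()
            all_files.extend(response.get('files', []))
            # Stop only when the server stops handing out page tokens;
            # incompleteSearch says nothing about whether more pages exist.
            page_token = response.get('nextPageToken')
            if page_token is None:
                break
        return all_files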

General question about project interaction with fsspec?

Hi,
I have just discovered this project and would like to know its status.
It is presented as an fsspec implementation.
Is it connected to fsspec, and can files on Google Drive be managed through fsspec?
For instance, can we pip install fsspec[gdrive]?

I worked with the Google Drive API v3 a while ago, relying on a service account to provide a Google Drive connector for the cryptostore project (you can see its documentation page here).
Is gdrivefs able to manage connections through a service account? (I am wondering whether the connector I wrote could be replaced with fsspec / gdrivefs; see the sketch after this question.)

Thanks in advance for any feedback!
Best
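
For context, the shared-drive report further down this page passes service-account credentials through fsspec storage options, which suggests the answer is yes. A hedged sketch of that pattern (parameter names taken from that report, not from official documentation; the folder id is a placeholder):

    import json
    import os

    import fsspec

    fs = fsspec.filesystem(
        "gdrive",
        token="service_account",   # authenticate with a service account
        creds=json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]),
        root_file_id="<shared_folder_id>",
    )
    print(fs.ls(""))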

The removal of versioneer breaks fsspec discovery of gdrivefs

In #37 the _version.py file was deleted, but the __init__.py file still contains from ._version import get_versions.

Consequently, fsspec fails to register gdrivefs.

ModuleNotFoundError: No module named 'gdrivefs._version'

Stack Trace:
  File "lib/python3.10/site-packages/fsspec/registry.py", line 236, in get_filesystem_class
    register_implementation(protocol, _import_class(bit["class"]))
  File "lib/python3.10/site-packages/fsspec/registry.py", line 271, in _import_class
    mod = importlib.import_module(mod)
  File "lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "lib/python3.10/site-packages/gdrivefs/__init__.py", line 2, in <module>
    from ._version import get_versions
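
Until the packaging is fixed, a hedged local workaround is to make the version lookup in gdrivefs/__init__.py independent of the deleted _version.py, for example via importlib.metadata. This is only a sketch, not the maintainers' chosen fix, and it assumes GoogleDriveFileSystem lives in core.py:

    # gdrivefs/__init__.py (sketch)
    try:
        from importlib.metadata import PackageNotFoundError, version
        try:
            __version__ = version("gdrivefs")
        except PackageNotFoundError:
            __version__ = "unknown"
    except ImportError:  # Python < 3.8
        __version__ = "unknown"

    from .core import GoogleDriveFileSystem  # noqa: F401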

Example script gives error

I installed gdrivefs using pip. When I run your example script (in example_w_xarray_zarr.ipynb):

import gdrivefs
# use this the first time you run
token = 'browser'
# use this on subsequent attempts
#token = 'cache'

# shareable link to folder generated with
# https://drive.google.com/open?id=1FQzXM2E28WF6fV7vy1K7HdxNV-w6z_Wx
root_file_id = '1FQzXM2E28WF6fV7vy1K7HdxNV-w6z_Wx'

gdfs = gdrivefs.GoogleDriveFileSystem(token=token, root_file_id=root_file_id)
gdfs

I get the following error:
AttributeError: module 'gdrivefs' has no attribute 'GoogleDriveFileSystem'

"This app is blocked" on read_only access

Hi!
I tried the following code snippet:

import gdrivefs
from urllib.parse import urlparse

repo_url = 'https://drive.google.com/drive/folders/some-folder-id?usp=sharing'
parsed = urlparse(repo_url)
folder_id = parsed.path.split('/')[-1]
gdfs = gdrivefs.GoogleDriveFileSystem(token='browser', root_file_id=folder_id, access='read_only')
print(gdfs.ls(""))

On running the above, I get an OAuth2 URL which I use to authorize the application (PyData Authentication). Upon visiting the link, I get the following error:
[screenshot: Google's "This app is blocked" error page]

Note the access='read_only' argument; I don't get the error with access='full_control'.
Any and all help will be appreciated! Thanks!

Support for Shared Drives

Currently, gdrivefs doesn't support shared drives.

I have a setup like:

    root_folder: str = "gdrive://Discovery Folder/Worksheets"
    storage_options: dict = {
        "token": "service_account",
        "access": "read_only",
        "creds": json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]),
        "root_file_id": "0123456789ABCDEFGH",
    }

If I attempt to access that file (using commit 2b48baa), I get the error:

FileNotFoundError: Directory 0123456789ABCDEFGH has no child named Discovery Folder

  File "./pipelines/assets/base.py", line 210, in original_files
    with p.fs.open(p.path, mode="rb") as f:
  File "./lib/python3.10/site-packages/fsspec/spec.py", line 1295, in open
    f = self._open(
  File "./lib/python3.10/site-packages/gdrivefs/core.py", line 249, in _open
    return GoogleDriveFile(self, path, mode=mode, **kwargs)
  File "./lib/python3.10/site-packages/gdrivefs/core.py", line 270, in __init__
    super().__init__(fs, path, mode, block_size, autocommit=autocommit,
  File "./lib/python3.10/site-packages/fsspec/spec.py", line 1651, in __init__
    self.size = self.details["size"]
  File "./lib/python3.10/site-packages/fsspec/spec.py", line 1664, in details
    self._details = self.fs.info(self.path)
  File "./lib/python3.10/site-packages/fsspec/spec.py", line 662, in info
    out = self.ls(path, detail=True, **kwargs)
  File "./lib/python3.10/site-packages/gdrivefs/core.py", line 174, in ls
    files = self._ls_from_cache(path)
  File "./lib/python3.10/site-packages/fsspec/spec.py", line 372, in _ls_from_cache
    raise FileNotFoundError(path)

The root_file_id is set to the folder id of a GDrive Shared Drive (i.e. https://support.google.com/a/users/answer/7212025?hl=en).

As per https://developers.google.com/drive/api/guides/enable-shareddrives#:~:text=The%20supportsAllDrives%3Dtrue%20parameter%20informs,require%20additional%20shared%20drive%20functionality. we need to set supportsAllDrives=True and includeItemsFromAllDrives=True when calling files.list in order for the API client to find the files.

In my case, if I change the existing:

    def _list_directory_by_id(self, file_id, trashed=False, path_prefix=None):
        all_files = []
        page_token = None
        afields = 'nextPageToken, files(%s)' % fields
        query = f"'{file_id}' in parents  "
        if not trashed:
            query += "and trashed = false "
        while True:
            response = self.service.list(q=query,
                                         spaces=self.spaces, fields=afields,
                                         pageToken=page_token,
                                         ).execute()
            for f in response.get('files', []):
                all_files.append(_finfo_from_response(f, path_prefix))
            more = response.get('incompleteSearch', False)
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break
        return all_files

to

    def _list_directory_by_id(self, file_id, trashed=False, path_prefix=None):
        all_files = []
        page_token = None
        afields = 'nextPageToken, files(%s)' % fields
        query = f"'{file_id}' in parents  "
        if not trashed:
            query += "and trashed = false "
        while True:
            response = self.service.list(
                q=query,
                spaces=self.spaces, fields=afields,
                pageToken=page_token,
                includeItemsFromAllDrives=True,  # Required for shared drive support
                supportsAllDrives=True,    # Required for shared drive support
            ).execute()
            for f in response.get('files', []):
                all_files.append(_finfo_from_response(f, path_prefix))
            more = response.get('incompleteSearch', False)
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break
        return all_files

(note the change in the call to self.service.list)

then my code works, and the filesystem can find the file and open it successfully.

I am happy to prepare an MR, but you would need to decide whether you are happy for me to enable shared drive support in all cases, or whether you want to control it via storage_options. If it goes through storage_options, you would also need to decide whether it should default to off (completely backwards compatible) or on (which may show existing shared-drive users files that gdrivefs does not currently return). A sketch of the storage_options approach follows.
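
To illustrate the storage_options route (a hypothetical API sketch; the flag name enable_shared_drives does not exist in gdrivefs today), the caller would opt in per filesystem and the value would be forwarded to the supportsAllDrives / includeItemsFromAllDrives parameters shown above:

    import json
    import os

    import fsspec

    fs = fsspec.filesystem(
        "gdrive",
        token="service_account",
        creds=json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]),
        root_file_id="0123456789ABCDEFGH",
        enable_shared_drives=True,  # hypothetical flag, defaulting to False
    )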

IOError for any file larger than 8 MB when using `put_file`

gdfs.put_file("./pbmc3k.h5ad", "test.zarr/pbmc3k.h5ad")

File ~\Apps\Miniconda3\envs\nbproject\lib\site-packages\fsspec\spec.py:818, in AbstractFileSystem.put_file(self, lpath, rpath, callback, **kwargs)
    816 while f1.tell() < size:
    817     data = f1.read(self.blocksize)
--> 818     segment_len = f2.write(data)
    819     callback.relative_update(segment_len)

File ~\Apps\Miniconda3\envs\nbproject\lib\site-packages\fsspec\spec.py:1491, in AbstractBufferedFile.write(self, data)
   1489 self.loc += out
   1490 if self.buffer.tell() >= self.blocksize:
-> 1491     self.flush()
   1492 return out

File ~\Apps\Miniconda3\envs\nbproject\lib\site-packages\fsspec\spec.py:1532, in AbstractBufferedFile.flush(self, force)
   1529         self.closed = True
   1530         raise
-> 1532 if self._upload_chunk(final=force) is not False:
   1533     self.offset += self.buffer.seek(0, 2)
   1534     self.buffer = io.BytesIO()

File c:\users\sergei.rybakov\projects\gdrivefs\gdrivefs\core.py:323, in GoogleDriveFile._upload_chunk(self, final)
    321 else:
    322     print(head)
--> 323     raise IOError
    324 return True

OSError: 

Only the first 8 MB of any larger file seems to be written.

The response headers are:
{'content-type': 'text/plain; charset=utf-8', 'x-guploader-uploadid': 'ADPycduaxaTB7yWT7UfaP0PupuzS4l1YcH0zTlU0tuGvn4fVm-htDaw2faGi923TuPtEDW64fYmXoXjIOLFui3QOWOVj', 'range': 'bytes=0-8388607', 'x-range-md5': '88e4c1dfd5e74cc994d6b8a66f8cd72c', 'content-length': '0', 'date': 'Sun, 15 Jan 2023 22:10:29 GMT', 'server': 'UploadServer', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'status': '308'}
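
For context on that header: in the Drive resumable-upload protocol, a 308 status with a Range header means the chunk was stored and the server is waiting for the next chunk, so it should not be treated as an error. A generic, hedged sketch of the continuation logic (written against requests for illustration, not against gdrivefs internals):

    import requests

    def upload_next_chunk(session_uri, data, offset, total_size):
        """Send one chunk of a resumable upload and return the next offset.

        Illustrative sketch only: 308 means "resume incomplete", and the Range
        header reports how many bytes the server has accepted so far.
        """
        end = offset + len(data) - 1
        headers = {"Content-Range": f"bytes {offset}-{end}/{total_size}"}
        resp = requests.put(session_uri, data=data, headers=headers)
        if resp.status_code == 308:
            rng = resp.headers.get("Range")  # e.g. 'bytes=0-8388607'
            return int(rng.split("-")[-1]) + 1 if rng else 0
        if resp.status_code in (200, 201):
            return total_size  # upload finished
        raise IOError(f"unexpected status {resp.status_code}")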

How do I write with a mapper?

Although writing was implemented in #2, I can't figure out how to use it with zarr.

If, at the end of the example notebook, I run

mapper = gdfs.get_mapper('/woa_t_an_COPY.zarr/')
dsl.to_zarr(mapper)

I get the error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-7-850afca92d3b> in <module>
      1 mapper = gdfs.get_mapper('/woa_t_an_COPY.zarr/')
----> 2 dsl.to_zarr(mapper)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1614             compute=compute,
   1615             consolidated=consolidated,
-> 1616             append_dim=append_dim,
   1617         )
   1618 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1317         synchronizer=synchronizer,
   1318         group=group,
-> 1319         consolidate_on_close=consolidated,
   1320     )
   1321     zstore.append_dim = append_dim

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close)
    258             zarr_group = zarr.open_consolidated(store, **open_kwargs)
    259         else:
--> 260             zarr_group = zarr.open_group(store, **open_kwargs)
    261         return cls(zarr_group, consolidate_on_close)
    262 

/srv/conda/envs/notebook/lib/python3.7/site-packages/zarr/hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store)
   1131             err_contains_group(path)
   1132         else:
-> 1133             init_group(store, path=path, chunk_store=chunk_store)
   1134 
   1135     # determine read only status

/srv/conda/envs/notebook/lib/python3.7/site-packages/zarr/storage.py in init_group(store, overwrite, path, chunk_store)
    430     # initialise metadata
    431     _init_group_metadata(store=store, overwrite=overwrite, path=path,
--> 432                          chunk_store=chunk_store)
    433 
    434 

/srv/conda/envs/notebook/lib/python3.7/site-packages/zarr/storage.py in _init_group_metadata(store, overwrite, path, chunk_store)
    451     meta = dict()
    452     key = _path_to_prefix(path) + group_meta_key
--> 453     store[key] = encode_group_metadata(meta)
    454 
    455 

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/mapping.py in __setitem__(self, key, value)
     94         self.fs.mkdirs(self.fs._parent(key), exist_ok=True)
     95         with self.fs.open(key, "wb") as f:
---> 96             f.write(value)
     97 
     98     def keys(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in __exit__(self, *args)
   1159 
   1160     def __exit__(self, *args):
-> 1161         self.close()

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in close(self)
   1127         else:
   1128             if not self.forced:
-> 1129                 self.flush(force=True)
   1130 
   1131             if self.fs is not None:

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in flush(self, force)
   1002             # Initialize a multipart upload
   1003             self.offset = 0
-> 1004             self._initiate_upload()
   1005 
   1006         if self._upload_chunk(final=force) is not False:

~/gdrivefs/core.py in _initiate_upload(self)
    311     def _initiate_upload(self):
    312         """ Create multi-upload """
--> 313         parent_id = self.fs.path_to_file_id(self.fs._parent(self.path))
    314         head = {"Content-Type": "application/json; charset=UTF-8"}
    315         # also allows description, MIME type, version, thumbnail...

~/gdrivefs/core.py in path_to_file_id(self, path, parent, trashed)
    181             parent = self.root_file_id
    182         top_file_id = self._get_directory_child_by_name(items[0], parent,
--> 183                                                         trashed=trashed)
    184         if len(items) == 1:
    185             return top_file_id

~/gdrivefs/core.py in _get_directory_child_by_name(self, child_name, directory_file_id, trashed)
    199         if len(possible_children) == 0:
    200             raise FileNotFoundError(
--> 201                 f'Directory {directory_file_id} has no child '
    202                 f'named {child_name}')
    203         if len(possible_children) == 1:

FileNotFoundError: Directory 1FQzXM2E28WF6fV7vy1K7HdxNV-w6z_Wx has no child named woa_t_an_COPY.zarr

It seems we need to create the new directory somehow; a possible workaround is sketched below.
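
A hedged workaround, assuming the filesystem implements fsspec's mkdir (it may be missing or a no-op in the version used in the notebook), is to create the target directory before handing the mapper to zarr:

    # Create the store directory up front so path_to_file_id can resolve it.
    gdfs.mkdir('/woa_t_an_COPY.zarr')  # assumes mkdir is implemented for gdrivefs
    mapper = gdfs.get_mapper('/woa_t_an_COPY.zarr/')
    dsl.to_zarr(mapper)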

Writes to Zarr using gdrivefs extremely slow

I am writing a Zarr file to Google Drive using gdrivefs and Xarray, and the writes are extremely slow. The file is local to my notebook and is about 1.5 GB in size. It has a lot of variables (89), and it is chunked by day into 365 days. The write has been running for about an hour and is approximately 10% finished. Here are the steps I am using:

ds = xr.open_zarr('ICE.2000.01-12.c41.zarr', consolidated=True, decode_times=False, chunks=False,
                  decode_cf=False, mask_and_scale=True)
gdfs = gdrivefs.GoogleDriveFileSystem(root_file_id='1PCBDhk5f3v5PoPCY3Rdcqgy4S_Yj2kCC', token='cache')
mapper = gdfs.get_mapper('ICE.2000.01-12.c41.zarr')
ds.to_zarr(mapper, compute=True, consolidated=True, encoding={})

Anything obvious that I am doing wrong here? Thanks.

cc: @raf-antwerpen

License for this repo

Hi everyone,

Thank you for implementing this! Awesome. Can I assume this is under BSD as well, like other intake projects? If so, could you push a license file?

All the best,

Huu

"This app isn't verified" warning from google oauth

When I sign in with token='browser', Google shows me this:

[screenshot: Google's "This app isn't verified" warning page]

I can eventually make it in, but it isn't very confidence-inspiring. I can't figure out how to verify my app. Should I be using a different API key, maybe?

Doesn't seem to work with distributed.

I tried some basic stuff with a dask_kubernetes cluster on ocean.pangeo.io. No luck.

I created a cluster and connected to it, created a gdrivefs instance, and then tried to read / write via xarray. I immediately get a KilledWorker.

Sorry for not providing a reproducible example. The only example I know how to make is probably too complicated. I figured you would know how to do a proper test of distributed instead of whatever hack I come up with.

Retry logic for 403 actions

While integrating gdrivefs into a test suite, I noticed that it doesn't handle rate-limit errors with exponential backoff (https://developers.google.com/drive/api/v3/handle-errors#exponential-backoff). Would it make sense for me to contribute a decorator similar to the example below (perhaps without using funcy.retry, so no new dependency is introduced)? That way retries would be handled for all use cases, which is very important when making simultaneous requests (e.g. during CI with 10 workers): without retry logic it just fails, but with it one of the tries succeeds.

https://github.com/iterative/dvc/blob/63f32936b20c23abc32e9dba1aba19ab5db804e9/tests/remotes/gdrive.py#L19-L47
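
A dependency-free sketch of such a decorator (the retry parameters are illustrative and not what dvc uses; it only assumes googleapiclient's HttpError, whose resp attribute carries the HTTP status):

    import random
    import time
    from functools import wraps

    from googleapiclient.errors import HttpError

    def retry_on_rate_limit(tries=5, base_delay=1.0):
        """Retry a Drive API call with exponential backoff on 403/429 errors."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(tries):
                    try:
                        return func(*args, **kwargs)
                    except HttpError as exc:
                        if exc.resp.status not in (403, 429) or attempt == tries - 1:
                            raise
                        # Exponential backoff with jitter, per the Drive API guide.
                        time.sleep(base_delay * 2 ** attempt + random.random())
            return wrapper
        return decorator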

I/O operation on closed file

I am trying to open an Xarray dataset on Google Drive. There is no problem listing the contents of a directory or reading a Pandas dataframe with read_csv(). However, xr.open_dataset() causes a strange "I/O operation on closed file" error, not when the file is opened, but when I try to print out the details. It appears to print some of the details and then error out with "closed file". Any idea what I am doing wrong? Thanks. I'm at HEAD on gdrivefs and 0.8.4 on fsspec. I believe this would be a working example for anyone:

gdfs = gdrivefs.GoogleDriveFileSystem(root_file_id='1PCBDhk5f3v5PoPCY3Rdcqgy4S_Yj2kCC', token='cache')
of = gdfs.open('CRND0103-2017-NY_Millbrook_3_W.nc')
with of as f:
    ds = xr.open_dataset(f)
ds
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/dataset.py in _repr_html_(self)
   1665         if OPTIONS["display_style"] == "text":
   1666             return f"<pre>{escape(repr(self))}</pre>"
-> 1667         return formatting_html.dataset_repr(self)
   1668 
   1669     def info(self, buf=None) -> None:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in dataset_repr(ds)
    277         dim_section(ds),
    278         coord_section(ds.coords),
--> 279         datavar_section(ds.data_vars),
    280         attr_section(ds.attrs),
    281     ]

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in _mapping_section(mapping, name, details_func, max_items_collapse, enabled)
    167     return collapsible_section(
    168         name,
--> 169         details=details_func(mapping),
    170         n_items=n_items,
    171         enabled=enabled,

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in summarize_vars(variables)
    133     vars_li = "".join(
    134         f"<li class='xr-var-item'>{summarize_variable(k, v)}</li>"
--> 135         for k, v in variables.items()
    136     )
    137 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in <genexpr>(.0)
    133     vars_li = "".join(
    134         f"<li class='xr-var-item'>{summarize_variable(k, v)}</li>"
--> 135         for k, v in variables.items()
    136     )
    137 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in summarize_variable(name, var, is_index, dtype, preview)
    108     preview = preview or escape(inline_variable_array_repr(variable, 35))
    109     attrs_ul = summarize_attrs(var.attrs)
--> 110     data_repr = short_data_repr_html(variable)
    111 
    112     attrs_icon = _icon("icon-file-text2")

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting_html.py in short_data_repr_html(array)
     22         return internal_data._repr_html_()
     23     else:
---> 24         text = escape(short_data_repr(array))
     25         return f"<pre>{text}</pre>"
     26 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting.py in short_data_repr(array)
    461         return limit_lines(repr(array.data), limit=40)
    462     elif array._in_memory or array.size < 1e5:
--> 463         return short_numpy_repr(array)
    464     else:
    465         # internal xarray array type

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting.py in short_numpy_repr(array)
    435 
    436 def short_numpy_repr(array):
--> 437     array = np.asarray(array)
    438 
    439     # default to lower precision so a full (abbreviated) line can fit on

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/common.py in __array__(self, dtype)
    130 
    131     def __array__(self: Any, dtype: DTypeLike = None) -> np.ndarray:
--> 132         return np.asarray(self.values, dtype=dtype)
    133 
    134     def __repr__(self) -> str:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/variable.py in values(self)
    455     def values(self):
    456         """The variable's data as a numpy.ndarray"""
--> 457         return _as_array_or_item(self._data)
    458 
    459     @values.setter

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    258     TODO: remove this (replace with np.asarray) once these issues are fixed
    259     """
--> 260     data = np.asarray(data)
    261     if data.ndim == 0:
    262         if data.dtype.kind == "M":

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    675 
    676     def __array__(self, dtype=None):
--> 677         self._ensure_cached()
    678         return np.asarray(self.array, dtype=dtype)
    679 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    672     def _ensure_cached(self):
    673         if not isinstance(self.array, NumpyIndexingAdapter):
--> 674             self.array = NumpyIndexingAdapter(np.asarray(self.array))
    675 
    676     def __array__(self, dtype=None):

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    651 
    652     def __array__(self, dtype=None):
--> 653         return np.asarray(self.array, dtype=dtype)
    654 
    655     def __getitem__(self, key):

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    555     def __array__(self, dtype=None):
    556         array = as_indexable(self.array)
--> 557         return np.asarray(array[self.key], dtype=None)
    558 
    559     def transpose(self, order):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in __getitem__(self, key)
     27     def __getitem__(self, key):
     28         return indexing.explicit_indexing_adapter(
---> 29             key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem
     30         )
     31 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    835     """
    836     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 837     result = raw_indexing_method(raw_key.tuple)
    838     if numpy_indices.tuple:
    839         # index the loaded np.ndarray

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in _getitem(self, key)
     36         with self.datastore.lock:
     37             array = self.get_array(needs_lock=False)
---> 38             return array[key]
     39 
     40 

/srv/conda/envs/notebook/lib/python3.7/site-packages/h5netcdf/core.py in __getitem__(self, key)
    144 
    145     def __getitem__(self, key):
--> 146         return self._h5ds[key]
    147 
    148     def __setitem__(self, key, value):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/srv/conda/envs/notebook/lib/python3.7/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    571         mspace = h5s.create_simple(mshape)
    572         fspace = selection.id
--> 573         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    574 
    575         # Patch up the output for NumPy

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

h5py/defs.pyx in h5py.defs.H5Dread()

h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in readinto(self, b)
   1407         """
   1408         out = memoryview(b).cast("B")
-> 1409         data = self.read(out.nbytes)
   1410         out[: len(data)] = data
   1411         return len(data)

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in read(self, length)
   1392             length = self.size - self.loc
   1393         if self.closed:
-> 1394             raise ValueError("I/O operation on closed file.")
   1395         logger.debug("%s read: %i - %i" % (self, self.loc, self.loc + length))
   1396         if length == 0:

ValueError: I/O operation on closed file.
<xarray.Dataset>
Dimensions:                  (index: 365)
Coordinates:
  * index                    (index) int64 0 1 2 3 4 5 ... 360 361 362 363 364
Data variables:
    WBANNO                   (index) int64 ...
    LST_DATE                 (index) int64 ...
    CRX_VN                   (index) float64 ...
    LONGITUDE                (index) float64 ...
    LATITUDE                 (index) float64 ...
    T_DAILY_MAX              (index) float64 ...
    T_DAILY_MIN              (index) float64 ...
    T_DAILY_MEAN             (index) float64 ...
    T_DAILY_AVG              (index) float64 ...
    P_DAILY_CALC             (index) float64 ...
    SOLARAD_DAILY            (index) float64 ...
    SUR_TEMP_DAILY_TYPE      (index) object ...
    SUR_TEMP_DAILY_MAX       (index) float64 ...
    SUR_TEMP_DAILY_MIN       (index) float64 ...
    SUR_TEMP_DAILY_AVG       (index) float64 ...
    RH_DAILY_MAX             (index) float64 ...
    RH_DAILY_MIN             (index) float64 ...
    RH_DAILY_AVG             (index) float64 ...
    SOIL_MOISTURE_5_DAILY    (index) float64 ...
    SOIL_MOISTURE_10_DAILY   (index) float64 ...
    SOIL_MOISTURE_20_DAILY   (index) float64 ...
    SOIL_MOISTURE_50_DAILY   (index) float64 ...
    SOIL_MOISTURE_100_DAILY  (index) float64 ...
    SOIL_TEMP_5_DAILY        (index) float64 ...
    SOIL_TEMP_10_DAILY       (index) float64 ...
    SOIL_TEMP_20_DAILY       (index) float64 ...
    SOIL_TEMP_50_DAILY       (index) float64 ...
    SOIL_TEMP_100_DAILY      (index) float64 ...
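
A hedged reading of the traceback: xarray reads lazily, so by the time the HTML repr pulls array values out of the netCDF backend, the `with` block has already closed the Google Drive file. A minimal sketch of two ways around that (not a gdrivefs-specific fix):

    # Option 1: keep the file handle open for the lifetime of the dataset.
    f = gdfs.open('CRND0103-2017-NY_Millbrook_3_W.nc')
    ds = xr.open_dataset(f)

    # Option 2: load everything into memory before the context manager closes the file.
    with gdfs.open('CRND0103-2017-NY_Millbrook_3_W.nc') as f:
        ds = xr.open_dataset(f).load()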
