sshfs's People

Contributors

aguschin, benrutter, efiop, ianthomas23, isidentical, kephale, notspecial, pmrowla, ryaminal, shcheklein, skshetry, uunal

sshfs's Issues

Question regarding call of stat() for parent dir

Hello,
while using sshfs in fsspec.open_files(), I discovered that stat() is called for the parent directory of the requested files, even when it is already clear that this must be a directory. While this is almost certainly not an issue in most cases, the sftp server I have to use behaves somewhat strangely here: I get a permission error when trying to call stat() on these directories.

When using the default sftp implementation from fsspec there is no issue at all, so it seems to me that this should be possible without a call to stat(). Is there any way to achieve this with this library as well? I would really like to use it because of its performance compared to sftp. Thank you!

Checksum command fails if remote doesn't support uname operation

When I call fs.checksum(path) on a server that does not permit the uname command, I get a generic Channel Open Error: Session request failed. Looking through the debug logs, the issue seems to be caused by the logic in _get_system:

https://github.com/fsspec/sshfs/blob/3c10c1bfff44f111926d763a54343726832e2d42/sshfs/spec.py#L295:L300

The server I am working with does support the md5sum and sha1sum commands so the _checksum method as written should work but the actual command never triggers because it errors before that.

Potential Solutions:

  • Provide the ability to pass checksum commands to the _checksum() method. This would allow me to specify a known command, so that the _get_system() check can be bypassed. Ideally, it would be possible to provide both the remote and the local checksum commands, for cases such as a Darwin system talking to a Linux server.
  • Include error handling logic for _get_system() to provide a more detailed error message.
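
Both proposals can be combined in one small helper. The following is only a sketch under stated assumptions: resolve_checksum_command, KNOWN_COMMANDS and the get_system callable are hypothetical names used for illustration, not part of sshfs, and the command table is an assumption about typical md5 tools rather than a copy of the library's logic.

```python
import asyncio

# Assumed mapping from `uname` output to (command, index of the digest in the
# whitespace-split output). Illustrative only, not taken from sshfs.
KNOWN_COMMANDS = {"Linux": ("md5sum", 0), "Darwin": ("md5", -1)}


async def resolve_checksum_command(get_system, command=None):
    """Return (command, digest_index); skip system detection if a command is given."""
    if command is not None:
        # Caller knows what the server supports, so `uname` is never run.
        return command
    try:
        system = await get_system()
    except Exception as exc:
        # Turn the generic failure into an actionable message.
        raise RuntimeError(
            "could not detect the remote system (is `uname` permitted?); "
            "pass an explicit checksum command instead"
        ) from exc
    try:
        return KNOWN_COMMANDS[system]
    except KeyError:
        raise ValueError(f"{system!r} has no known checksum command") from None
```

With an explicit command such as ("sha1sum", 0), the detection step is skipped entirely, which is the behaviour the first proposal asks for.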

If I can get some guidance on the preferred approach, and if contributions are welcome, I'm happy to submit a PR.

Thank you!
Pratheek

Changes to be consistent with GenericFileSystem

Hi All.

While looking at using the GenericFileSystem and its rsync function, I've noticed a few inconsistencies in how sshfs handles paths that include the protocol (sftp://).

One fix was done in #43, but there appear to be a few more. One proposed solution was to take this upstream to the implementation of GenericFileSystem (see this discussion). But it is becoming apparent that this should be handled in the filesystem implementation, in this case sshfs.

I was hoping it would be as easy as adding self._strip_protocol(lpath) to the references in _get_file, _put_file and related methods, but I think it's a bit more nuanced than it appears at first glance.

I've written a test case or two to illustrate the problem:

def test_path_with_protocol(fs: SSHFileSystem, remote_dir):
    # this would have failed before PR 43
    assert fs.isdir("/.") == fs.isdir("sftp:///.")

    # absolute path
    assert fs._strip_protocol("sftp:///.") == fs._strip_protocol("/.")
    # blah is detected as host and removed, even though a relative path may have been intended?
    assert fs._strip_protocol("sftp://blah") == fs._strip_protocol("")
    # another example of a potentially intended relative path but parsed as absolute
    assert fs._strip_protocol("sftp:///.") == fs._strip_protocol("/.")

The largest problem is with relative paths being passed in. The current _strip_protocol doesn't seem to be relative path friendly. I imagine this is mostly a non-issue for folks not using the GenericFileSystem or anything else that passes in protocol-aware paths. But, according to other FS implementations and the recommendation in this PR to fsspec, filesystem implementations should be able to handle a path with or without a protocol.
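
The ambiguity with relative paths can be reproduced with the standard library alone. This sketch does not claim to show how sshfs parses paths; it only illustrates how a generic URL parser (urllib.parse.urlsplit) assigns the first segment after sftp:// to the host. The helper name split_sftp_url is hypothetical.

```python
from urllib.parse import urlsplit


def split_sftp_url(url):
    """Return (host, path) the way a generic URL parser sees an sftp:// URL."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


# "blah" lands in the host slot and the path comes back empty, so a relative
# path written as sftp://blah cannot be told apart from a bare host name.
for url in ("sftp:///.", "sftp://blah", "sftp://blah/data"):
    print(url, "->", split_sftp_url(url))
```

Any scheme for protocol-aware relative paths would therefore need a convention that generic URL syntax does not provide on its own.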

Curious about everyone's thoughts on doing this, and whether there are ideas on the best way to implement it.

how to pass path_encoding parameter?

Hi, due to ronf/asyncssh#610 I'm trying to run something like:

fs = fsspec.filesystem(protocol="ssh", path_encoding=None, **storage_options)
print(fs.ls("xxx")[:10])

but path_encoding is not accepted by the connect method called from the asyncssh library.
Do you have any idea how to solve this?

`get_file` behaves differently using `SSHFileSystem` vs `LocalFileSystem`

I've used the AbstractFileSystem interface in my code in order to direct my application either to the local file system or to a file system over SSH. However, I noticed that get_file behaves differently in the two. I wrote this little script to test and demonstrate.

from fsspec.implementations.local import LocalFileSystem
from sshfs import SSHFileSystem

ssh_fs = SSHFileSystem(
    "localhost",
    username="foobar",
    password="foobar",
)
ssh_fs.get_file("/tmp/foobar", ".")

local_fs = LocalFileSystem()
local_fs.get_file("/tmp/foobar2", ".")

I would expect that calling get_file on either with similar parameters would result in the requested file being copied to the local folder. However, the LocalFileSystem implementation raises an error:

IsADirectoryError: [Errno 21] Is a directory: '/home/west/Research/sshfs/.'

The LocalFileSystem implementation requires a full file path:

local_fs.get_file("/tmp/foobar2", "./foobar2")

It seems to me that the implementation of get_file in SSHFileSystem does not follow the fsspec API. Or am I missing something?
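
Until the two implementations agree, calling code can normalise the destination itself so that both receive a full file path. A minimal sketch, assuming the hypothetical helper name resolve_local_target; it is not part of fsspec or sshfs.

```python
import os
import posixpath


def resolve_local_target(rpath, lpath):
    """If lpath is an existing directory, append the remote file name to it."""
    if os.path.isdir(lpath):
        # Remote paths use "/" regardless of the local platform.
        name = posixpath.basename(rpath.rstrip("/"))
        return os.path.join(lpath, name)
    return lpath
```

For example, fs.get_file("/tmp/foobar2", resolve_local_target("/tmp/foobar2", ".")) passes "./foobar2" to either filesystem.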

Registering with fsspec, and speed relative to SFTP

Hi - I'm interested in using sshfs as a faster alternative to the builtin sftp filesystem in fsspec (and I also need server-side copy) in Runhouse, a compute and data sharing layer for ML. It appears to me that sshfs is still not built into fsspec, and that I need to register it as suggested here to use it with APIs like fsspec.open(). A few questions I couldn't figure out:

  1. Why has sshfs not yet been made a builtin implementation of fsspec, and why does it not register itself upon installation like other non-builtins? Is it due to some stability or hardness bar it hasn't yet reached?
  2. Will it indeed be faster than the builtin SFTPFileSystem? I see that SSHFileSystem is faster than Paramiko, but can't tell if there's any reason it'd be faster than SFTP.

Corrupted files when using `get()`

I'm not able to debug this issue further, I can only share it.

Some files (.zip archives) are corrupted when using:

ssh.get(ssh_file["name"], "/tmp/")

What's important is that the same files are corrupted in the same way every time; it's not random at all when tested multiple times. Likewise, files which are not corrupted are never corrupted.

The fix was in this case to switch to https://github.com/althonos/fs.sshfs (completely solved the issue)
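
Since the corruption is deterministic, comparing digests of the downloaded copy against a digest computed on the server (for example with md5sum, if the server offers it) may help narrow it down. This is only a debugging sketch; file_digest is a hypothetical helper built on the standard hashlib module.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a local file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A mismatch between the local digest and the remote one confirms that the bytes changed in transit rather than when the archive was opened.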

synchronous `rmdir()` fails silently

Hello.

Trying to remove directories using SSHFileSystem via rmdir fails silently.

The synchronous wrapper for _rmdir appears to be missing, i.e. the equivalent of mkdir = sync_wrapper(_mkdir). The call therefore ends up all the way in AbstractFileSystem.rmdir, which is implemented as pass  # not necessary to implement, may not have directories.

A local test of adding the sync_wrapper works ok so far.
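
The failure mode can be reproduced without any SSH server. The classes below are toy stand-ins, not fsspec or sshfs code: sync_wrapper here is a simplified assumption that just runs the coroutine with asyncio.run, and BaseFS mimics a base class whose rmdir is a no-op.

```python
import asyncio


def sync_wrapper(func):
    """Toy stand-in for a sync wrapper: run a coroutine method to completion."""
    def wrapper(self, *args, **kwargs):
        return asyncio.run(func(self, *args, **kwargs))
    return wrapper


class BaseFS:
    def rmdir(self, path):
        pass  # base-class no-op, mirroring the behaviour described above


class BrokenFS(BaseFS):
    def __init__(self):
        self.dirs = {"/data"}

    async def _rmdir(self, path):
        self.dirs.remove(path)
    # No `rmdir = sync_wrapper(_rmdir)` here, so BaseFS.rmdir is inherited.


class FixedFS(BrokenFS):
    # The one-line fix: expose the coroutine through the sync wrapper.
    rmdir = sync_wrapper(BrokenFS._rmdir)
```

Calling BrokenFS().rmdir("/data") returns without error and leaves the directory in place, while FixedFS().rmdir("/data") actually removes it.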

No permissions on root level causes `SFTPPermissionDenied`

In my project I'm connected to an SFTP server I don't own. I only have rights to a few folders. Using put_file was throwing the following error:

  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/fsspec/asyn.py", line 85, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           │    │    │     │      │       └ {}
           │    │    │     │      └ ('/home/west/Projects/Abel/invoice-processor/files/odoo_downloads/742/8713783500248_F-2022-00082.xml', '/out/invoice/87137835...
           │    │    │     └ <bound method SSHFileSystem._put_file of <sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>>
           │    │    └ <property object at 0x7f45252ec770>
           │    └ <sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>
           └ <function sync at 0x7f45252ee5e0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/fsspec/asyn.py", line 65, in sync
    raise return_result
          └ SFTPPermissionDenied('')
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
    │                 └ <coroutine object SSHFileSystem._put_file at 0x7f452473cf40>
    └ [SFTPPermissionDenied('')]
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
                 │     │       └ {}
                 │     └ (<sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>, '/home/west/Projects/Abel/invoice-processor/files/odoo_downloads/742/87...
                 └ <function SSHFileSystem._put_file at 0x7f4525357e50>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/sshfs/spec.py", line 169, in _put_file
    await self._makedirs(self._parent(rpath), exist_ok=True)
          │    │         │    │       └ '/out/invoice/8713783500248_F-2022-00082.xml'
          │    │         │    └ <classmethod object at 0x7f4525544070>
          │    │         └ <sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>
          │    └ <function SSHFileSystem._makedirs at 0x7f452535a4c0>
          └ <sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
                 │     │       └ {'exist_ok': True}
                 │     └ (<sshfs.spec.SSHFileSystem object at 0x7f4523e48a30>, '/out/invoice')
                 └ <function SSHFileSystem._makedirs at 0x7f452535a430>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/sshfs/spec.py", line 231, in _makedirs
    await channel.makedirs(path, exist_ok=exist_ok, attrs=attrs)
          │       │        │              │               └ SFTPAttrs(type=5, size=None, alloc_size=None, uid=None, gid=None, owner=None, group=None, permissions=511, atime=None, atime_...
          │       │        │              └ True
          │       │        └ '/out/invoice'
          │       └ <function SFTPClient.makedirs at 0x7f452571dc10>
          └ <asyncssh.sftp.SFTPClient object at 0x7f4523df9ac0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/asyncssh/sftp.py", line 4045, in makedirs
    await self.mkdir(curpath, attrs)
          │    │     │        └ SFTPAttrs(type=5, size=None, alloc_size=None, uid=None, gid=None, owner=None, group=None, permissions=511, atime=None, atime_...
          │    │     └ b'/'
          │    └ <function SFTPClient.mkdir at 0x7f4525722ee0>
          └ <asyncssh.sftp.SFTPClient object at 0x7f4523df9ac0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/asyncssh/sftp.py", line 4989, in mkdir
    await self._handler.mkdir(path, attrs)
          │    │        │     │     └ SFTPAttrs(type=5, size=None, alloc_size=None, uid=None, gid=None, owner=None, group=None, permissions=511, atime=None, atime_...
          │    │        │     └ b'/'
          │    │        └ <function SFTPClientHandler.mkdir at 0x7f452571a040>
          │    └ <asyncssh.sftp.SFTPClientHandler object at 0x7f4523df9af0>
          └ <asyncssh.sftp.SFTPClient object at 0x7f4523df9ac0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/asyncssh/sftp.py", line 2769, in mkdir
    await self._make_request(FXP_MKDIR, String(path),
          │    │             │          │      └ b'/'
          │    │             │          └ <function String at 0x7f4526581310>
          │    │             └ 14
          │    └ <function SFTPClientHandler._make_request at 0x7f45257181f0>
          └ <asyncssh.sftp.SFTPClientHandler object at 0x7f4523df9af0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/asyncssh/sftp.py", line 2370, in _make_request
    result = self._packet_handlers[resptype](self, resp)
             │    │                │         │     └ <asyncssh.packet.SSHPacket object at 0x7f4523f13190>
             │    │                │         └ <asyncssh.sftp.SFTPClientHandler object at 0x7f4523df9af0>
             │    │                └ 101
             │    └ {101: <function SFTPClientHandler._process_status at 0x7f4525718280>, 102: <function SFTPClientHandler._process_handle at 0x7...
             └ <asyncssh.sftp.SFTPClientHandler object at 0x7f4523df9af0>
  File "/home/west/venvs/invoice-processor/lib/python3.8/site-packages/asyncssh/sftp.py", line 2386, in _process_status
    raise exc
          └ SFTPPermissionDenied('')

It turns out that it is trying to create the folder /. Since I don't have permissions at this level, it raises SFTPPermissionDenied. I've commented out the following line:

await self._makedirs(self._parent(rpath), exist_ok=True)

This fixes it for now. Any ideas on how to handle this properly?
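
One possible direction, sketched here under assumptions, is to walk upwards from the target and stop at the first directory that already exists, so that / and other inaccessible ancestors are never touched. missing_dirs and the exists callable are hypothetical names; on a real server the existence check is itself a remote call and may also be subject to permissions.

```python
import posixpath


def missing_dirs(path, exists):
    """Return the directories to create for `path`, shallowest first."""
    todo = []
    current = posixpath.normpath(path)
    # Stop at the root (or at the top of a relative path) without probing it.
    while current not in ("/", ".", "") and not exists(current):
        todo.append(current)
        current = posixpath.dirname(current)
    return list(reversed(todo))
```

If /out/invoice already exists, the result is an empty list and no mkdir request is sent at all.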

move to fsspec org

There was talk of this and other fsspec-compatible implementations being transferred to github.com/fsspec. No rush, merely recording what was previously suggested.

Initialisation seems to maintain cached filesystem

I'm not 100% sure that the title here is accurate, as it involves a bit more understanding of what's happening under the hood with asyncssh than I have so far.

I'm also not sure if this is intended behaviour vs actually a bug (sorry!)

The issue is something like this:

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    with fs.open(filepath) as file:
        _ = file.read()

If this runs for a long time, the connection might be shut down from the other side, raising an asyncssh.sftp.SFTPNoConnection error. So far this is all as expected.

The bit that seems unusual is that something like this:

from asyncssh.sftp import SFTPNoConnection
from sshfs import SSHFileSystem

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    try:
        with fs.open(filepath) as file:
            _ = file.read()
    except SFTPNoConnection:
        # retry once with what is intended to be a brand-new connection
        new_fs = SSHFileSystem(host, username=username, password=password)
        with new_fs.open(filepath) as file:
            _ = file.read()

The new_fs will always throw up the same SFTPNoConnection error, which seems to be because something behind the scenes is being cached?

Notably, the following works by clearing the cache before reconnecting:

from asyncssh.sftp import SFTPNoConnection
from sshfs import SSHFileSystem

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    try:
        with fs.open(filepath) as file:
            _ = file.read()
    except SFTPNoConnection:
        # drop the cached instance so the next constructor call reconnects
        fs.clear_instance_cache()
        new_fs = SSHFileSystem(host, username=username, password=password)
        with new_fs.open(filepath) as file:
            _ = file.read()

I'd assume the expected behaviour would be that initialising a new SSHFileSystem would create a fully new connection - is this intentional behaviour?
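
The observed behaviour is consistent with instance caching keyed on constructor arguments, which fsspec filesystems use. The toy below is not fsspec code; CachedMeta and ToyFS are hypothetical names that only imitate the pattern to show why a second constructor call with identical arguments can hand back the same, already disconnected, object.

```python
class CachedMeta(type):
    """Toy metaclass: reuse an instance when the constructor arguments match."""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls._cache = {}

    def __call__(cls, *args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cls._cache:
            cls._cache[key] = super().__call__(*args, **kwargs)
        return cls._cache[key]

    def clear_instance_cache(cls):
        cls._cache.clear()


class ToyFS(metaclass=CachedMeta):
    def __init__(self, host, username=None):
        self.host = host
        self.username = username


a = ToyFS("example-host", username="me")
b = ToyFS("example-host", username="me")
print(a is b)  # same arguments, same cached object
```

In fsspec itself, passing skip_instance_cache=True to the constructor is the documented way to force a fresh instance; whether that also yields a fresh SSH connection here is an assumption worth verifying.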

put_file does not create parent directories

Test to confirm this:

@pytest.mark.parametrize("cloud", [pytest.lazy_fixture("ssh")])
def test_put_file_ssh(tmp_dir, cloud):
    tmp_dir.gen("foo", "foo")
    cls, config, _ = get_cloud_fs(None, **cloud.config)
    fs = cls(**config)

    fs.fs.put_file("foo", "dir/foo")

Use sftp RENAME instead of copy and delete in case of moving a file

Hello,
I use fsspec to have a single API for working with multiple storage backends.
I ran into an issue with the sftp implementation: the move method fails with the error asyncssh.misc.ChannelOpenError: Session request failed while trying to run a cp command here: https://github.com/fsspec/sshfs/blob/main/sshfs/spec.py#L162

From what I understand, running a cp command is not always (rarely?) possible on SFTP servers, and posix_rename is not always supported either.
Would it be possible to replace the copy followed by a remove with an sftp rename?
In my case that fixed the issue, and it is the approach used, for example, in the fs.sshfs implementation (https://github.com/althonos/fs.sshfs/blob/master/fs/sshfs/sshfs.py#L279).
It also seems quite easy, since rename is already implemented in asyncssh.
I could open a PR if this is an acceptable change.
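
A fallback chain along these lines might look as follows. This is a sketch, not the sshfs implementation: move is a hypothetical helper, the client is any object exposing awaitable posix_rename and rename methods (asyncssh's SFTPClient has both), and catching a bare Exception is a simplification that real code should narrow to the library's SFTP error types.

```python
import asyncio


async def move(client, src, dst):
    """Try posix_rename first, then plain rename; report which one worked."""
    last_error = None
    for method_name in ("posix_rename", "rename"):
        try:
            await getattr(client, method_name)(src, dst)
            return method_name
        except Exception as exc:  # simplification: narrow this in real code
            last_error = exc
    raise OSError(f"could not move {src!r} to {dst!r}") from last_error


class FakeClient:
    """In-memory stand-in for a server without the posix-rename extension."""

    def __init__(self):
        self.files = {"/a": b"x"}

    async def posix_rename(self, src, dst):
        raise NotImplementedError("extension not supported")

    async def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)
```

Neither path needs a shell session, so servers that only allow the SFTP subsystem are not affected by the cp restriction.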

_cat_file implementation

Hi, I have a feature request. Could the sshfs.SSHFileSystem get an implementation for _cat_file?

I'm trying to use sshfs with zarr, but hit a NotImplementedError when I try to construct a group.

Roughly what I've run:

import sshfs, zarr

fs = sshfs.SSHFileSystem(host)
store = zarr.storage.FSStore("/path/to/data.zarr", fs=fs, mode="r")

g = zarr.open(store)
File /usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/tasks.py:442, in wait_for(fut, timeout, loop)
    437     warnings.warn("The loop argument is deprecated since Python 3.8, "
    438                   "and scheduled for removal in Python 3.10.",
    439                   DeprecationWarning, stacklevel=2)
    441 if timeout is None:
--> 442     return await fut
    444 if timeout <= 0:
    445     fut = ensure_future(fut, loop=loop)

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:395, in AsyncFileSystem._cat_file(self, path, start, end, **kwargs)
    394 async def _cat_file(self, path, start=None, end=None, **kwargs):
--> 395     raise NotImplementedError
Full Traceback
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[69], line 1
----> 1 g = zarr.open(store, mode="r")

File /usr/local/lib/python3.9/site-packages/zarr/convenience.py:120, in open(store, mode, zarr_version, path, **kwargs)
    118     return open_array(_store, mode=mode, **kwargs)
    119 elif contains_group(_store, path):
--> 120     return open_group(_store, mode=mode, **kwargs)
    121 else:
    122     raise PathNotFoundError(path)

File /usr/local/lib/python3.9/site-packages/zarr/hierarchy.py:1465, in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store, storage_options, zarr_version, meta_array)
   1462 # determine read only status
   1463 read_only = mode == 'r'
-> 1465 return Group(store, read_only=read_only, cache_attrs=cache_attrs,
   1466              synchronizer=synchronizer, path=path, chunk_store=chunk_store,
   1467              zarr_version=zarr_version, meta_array=meta_array)

File /usr/local/lib/python3.9/site-packages/zarr/hierarchy.py:164, in Group.__init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer, zarr_version, meta_array)
    162     mkey = _prefix_to_group_key(self._store, self._key_prefix)
    163     assert not mkey.endswith("root/.group")
--> 164     meta_bytes = store[mkey]
    165 except KeyError:
    166     if self._version == 2:

File /usr/local/lib/python3.9/site-packages/zarr/storage.py:1393, in FSStore.__getitem__(self, key)
   1391 key = self._normalize_key(key)
   1392 try:
-> 1393     return self.map[key]
   1394 except self.exceptions as e:
   1395     raise KeyError(key) from e

File /usr/local/lib/python3.9/site-packages/fsspec/mapping.py:143, in FSMap.__getitem__(self, key, default)
    141 k = self._key_to_str(key)
    142 try:
--> 143     result = self.fs.cat(k)
    144 except self.missing_exceptions:
    145     if default is not None:

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:114, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    111 @functools.wraps(func)
    112 def wrapper(*args, **kwargs):
    113     self = obj or args[0]
--> 114     return sync(self.loop, func, *args, **kwargs)

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:99, in sync(loop, func, timeout, *args, **kwargs)
     97     raise FSTimeoutError from return_result
     98 elif isinstance(return_result, BaseException):
---> 99     raise return_result
    100 else:
    101     return return_result

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:54, in _runner(event, coro, result, timeout)
     52     coro = asyncio.wait_for(coro, timeout=timeout)
     53 try:
---> 54     result[0] = await coro
     55 except Exception as ex:
     56     result[0] = ex

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:409, in AsyncFileSystem._cat(self, path, recursive, on_error, batch_size, **kwargs)
    407     ex = next(filter(is_exception, out), False)
    408     if ex:
--> 409         raise ex
    410 if (
    411     len(paths) > 1
    412     or isinstance(path, list)
    413     or paths[0] != self._strip_protocol(path)
    414 ):
    415     return {
    416         k: v
    417         for k, v in zip(paths, out)
    418         if on_error != "omit" or not is_exception(v)
    419     }

File /usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/tasks.py:442, in wait_for(fut, timeout, loop)
    437     warnings.warn("The loop argument is deprecated since Python 3.8, "
    438                   "and scheduled for removal in Python 3.10.",
    439                   DeprecationWarning, stacklevel=2)
    441 if timeout is None:
--> 442     return await fut
    444 if timeout <= 0:
    445     fut = ensure_future(fut, loop=loop)

File /usr/local/lib/python3.9/site-packages/fsspec/asyn.py:395, in AsyncFileSystem._cat_file(self, path, start, end, **kwargs)
    394 async def _cat_file(self, path, start=None, end=None, **kwargs):
--> 395     raise NotImplementedError

NotImplementedError: 
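
The core of a _cat_file implementation is reading a byte range from an open file. The sketch below shows that part only, on any seekable binary file object; read_range is a hypothetical helper, and treating start and end like Python slice bounds (None and negative values allowed) is an assumption about the desired semantics.

```python
import io


def read_range(f, start=None, end=None):
    """Read bytes [start:end) from a seekable binary file, slice-style."""
    size = f.seek(0, io.SEEK_END)
    # slice.indices clamps the bounds and resolves None and negative offsets.
    begin, stop, _ = slice(start, end).indices(size)
    f.seek(begin)
    return f.read(max(stop - begin, 0))
```

An async _cat_file(path, start, end) could open the remote file, apply the same bounds logic with awaited seek and read calls, and return the resulting bytes.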
