lordmauve / chopsticks


Chopsticks is an orchestration library: it lets you execute Python code on remote hosts over SSH.

Home Page: https://chopsticks.readthedocs.io/

License: Apache License 2.0

Languages: Python 99.89%, Shell 0.11%
Topics: orchestration, ssh, python

chopsticks's Issues

Handle infinite loops in agent process

If the controller host disconnects while an operation is in progress, it is conceivable that the agent process could be left running.

A particular worry is accidental hangs/infinite loops on the agent side, leaving processes that cannot be terminated.

Possibly the remote agent should signal itself with SIGINT, causing a KeyboardInterrupt that should allow the stack to unwind, running the appropriate exception and finally handlers.
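The unwinding behaviour this relies on can be demonstrated in isolation (a standalone sketch, not chopsticks code): SIGINT sent to the process's own PID surfaces as KeyboardInterrupt in the main thread, so except and finally handlers run.

```python
import os
import signal


def interruptible_work(events):
    """Simulate an agent that interrupts itself out of a hang."""
    try:
        # The agent sends SIGINT to its own process; Python's default
        # handler raises KeyboardInterrupt in the main thread.
        os.kill(os.getpid(), signal.SIGINT)
        while True:
            pass  # simulated infinite loop; the interrupt breaks us out
    except KeyboardInterrupt:
        events.append('interrupted')
    finally:
        events.append('cleaned up')  # finally handlers run as hoped


events = []
interruptible_work(events)
```

Note that SIGINT only unwinds the main thread; whether this is safe depends on what the agent is doing at the time.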

Buffer stderr until connected

In case of connection failures, the error message raised by Chopsticks is usually just "Unexpected EOF on stream".

Until a connection is established, Chopsticks could collect stderr lines, and report them as part of this error message. This would avoid having to piece together the causes of failure from both the stack trace and stderr information.

This would be very useful for debugging, especially with recursive tunnels.

Call logging.basicConfig on tunnel start

Chopsticks could call logging.basicConfig() on the remote process, so that logs are sent over stderr.

It could also pickle the current root formatter so that the format of logs printed from the remote side matches that from the local side. (However, this might simply add extra problems).
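In sketch form, the suggested startup call might look like this (an assumption about the desired configuration, not existing chopsticks behaviour; stderr is chosen so the controller's existing stderr relay picks the records up):

```python
import logging
import sys

# What the bubble could run on startup: route log records to stderr so
# they travel back over the tunnel's stderr stream.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format='%(levelname)s %(name)s: %(message)s',
)
```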

REPL

Consider creating some easy way to interactively script and explore remote systems in parallel.

Cannot call simple commands

The first example with calling time works perfectly. But when I try to do other simple things such as:

print(ssh.call(sys.version_info))

then the following exception is raised:

_pickle.PicklingError: Can't pickle <class 'sys.version_info'>: it's not the same object as sys.version_info

Also when getting a bit more complex:

print(ssh.call(subprocess.check_output, "apt-get update", shell=True).raise_failures())
Traceback (most recent call last):
  File "sticks.py", line 23, in <module>
    print(ssh.call(subprocess.check_output, "apt-get update", shell=True).raise_failures())
  File "/Users/konstruktor/.local/share/virtualenvs/shipit-WRCZ5US3/lib/python3.6/site-packages/chopsticks/tunnel.py", line 291, in call
    raise RemoteException(ret.msg)

I'm tunnelling into an Ubuntu 18.04 machine with SSHTunnel.
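The first error arises because call() must serialise its arguments, and the type of sys.version_info cannot be pickled by reference. One possible workaround (a sketch: get_version is a made-up helper, and ssh is the tunnel from the report) is to send a plain module-level function that returns a picklable value:

```python
import sys


def get_version():
    # Convert the structseq to a plain tuple, which pickles cleanly.
    return tuple(sys.version_info)


# Usage against the tunnel from the report (not runnable here):
# print(ssh.call(get_version))
```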

Error cleanly if operation called on closed tunnel

When an operation is performed on a closed tunnel, Chopsticks gets quite far into attempting to perform the operation. This should fail fast with a message like "Operation on closed tunnel". The current error message is this:

Traceback (most recent call last)
<ipython-input-4-43d4acc4cee4> in <module>()
----> 1 tun.call(ip)

/home/mauve/dev/chopsticks/chopsticks/tunnel.py in call(self, callable, *args, **kwargs)
    217         """
    218         self._call_async(loop.stop, callable, *args, **kwargs)
--> 219         ret = loop.run()
    220         if isinstance(ret, ErrorResult):
    221             raise RemoteException(ret.msg)

/home/mauve/dev/chopsticks/chopsticks/ioloop.py in run(self)
    275         self.running = True
    276         while self.running and (self.read or self.write):
--> 277             self.step()
    278         return self.result

/home/mauve/dev/chopsticks/chopsticks/ioloop.py in step(self)
    251         rfds = list(self.read) + [self.breakr]
    252         wfds = list(self.write)
--> 253         rs, ws, xs = select(rfds, wfds, rfds + wfds)
    254         if self.breakr in rs:
    255             rs.remove(self.breakr)

OSError: [Errno 9] Bad file descriptor
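A fail-fast guard could look like this minimal sketch (class and flag names are hypothetical, not chopsticks' actual Tunnel):

```python
class TunnelClosedError(RuntimeError):
    """Raised when an operation is attempted on a closed tunnel."""


class TunnelSketch:
    """Illustration only: track a connected flag and fail fast
    before any I/O is attempted."""

    def __init__(self):
        self.connected = True

    def close(self):
        self.connected = False

    def call(self, fn, *args, **kwargs):
        if not self.connected:
            raise TunnelClosedError('Operation on closed tunnel')
        return fn(*args, **kwargs)
```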

Failure during put() should abort gracefully

Currently when a .put() operation fails early, the controller continues sending chunks.

Additionally, the remote side will send a traceback for each chunk it receives after the first failure. This causes a KeyError on the controller.

To recover gracefully from this, the remote side should disregard uploaded chunks until it receives an acknowledgement from the controller that the upload has been aborted.

Groups/Tunnels should connect lazily

Groups and tunnels should connect lazily. This would allow groups and tunnels to be specified in a single "inventory" file.

Additionally, Groups should share tunnels to the same hosts, as hosts would typically be in a number of groups.

ErrorResult does not compare equal

It should be possible to compare ErrorResult instances to each other. They should also be hashable, based on their message value.

This would be useful in testing.

If they are hashable they could be used as dict keys and in sets. It might be useful to be able to count the different error messages using a collections.Counter, for example.
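A sketch of the proposed value semantics (an illustrative class, not the real ErrorResult):

```python
from collections import Counter


class ErrorResultSketch:
    """Equal when messages match; hashable on the message."""

    def __init__(self, msg):
        self.msg = msg

    def __eq__(self, other):
        return isinstance(other, ErrorResultSketch) and self.msg == other.msg

    def __ne__(self, other):  # needed for Python 2 compatibility
        return not self == other

    def __hash__(self):
        return hash(self.msg)


# Counting distinct error messages, as suggested above:
counts = Counter([
    ErrorResultSketch('timeout'),
    ErrorResultSketch('timeout'),
    ErrorResultSketch('refused'),
])
```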

Set operations in Groups

Groups are effectively sets of Tunnels. Now that Tunnels connect lazily, one could consider paradigms like

webservers = Group(...)
load_balancers = Group(...)
databases = Group(...)

with webservers + load_balancers as group:
    group.call(install_http_monitoring)
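A minimal sketch of the set semantics (names hypothetical; a real Group would hold Tunnels and disconnect them on exit):

```python
class GroupSketch:
    def __init__(self, hosts):
        self.hosts = frozenset(hosts)

    def __add__(self, other):
        # Union of the two groups' hosts.
        return GroupSketch(self.hosts | other.hosts)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        pass  # a real Group would close its tunnels here
```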

Error importing packages that use pbr

When importing a package on the remote side that uses pbr, this error is raised:

Exception: Versioning for this project requires either an sdist tarball, or access to an upstream
git repository. It's also possible that there is a mismatch between the package name in setup.cfg and the argument given to pbr.version.VersionInfo. Project name mock was given, but was not able to be found.

Adding some examples

Hi @lordmauve, do you have some examples to share about how to use chopsticks?

I'm currently using Paramiko to create some remote dirs and execute a remote command, but it's not clear if I can do the same with chopsticks.

Filter operation on Groups

As discussed in #26, we should be able to treat Groups like sets - in order to be able to union them etc.

We should also add an operation group.filter(callable), which executes the callable on all hosts in the group and returns a new group that contains only those hosts where the callable returns True.

This would enable building groups based on dynamic information sourced from the hosts themselves.
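In sketch form (hypothetical names; here call() runs the predicate locally against the host name just to keep the example self-contained):

```python
class FilterableGroupSketch:
    def __init__(self, hosts):
        self.hosts = set(hosts)

    def call(self, fn):
        # A real Group would execute fn remotely on every host; this
        # stand-in calls it locally, passing the host name.
        return {host: fn(host) for host in self.hosts}

    def filter(self, fn):
        # Keep only the hosts where the callable returned a true value.
        results = self.call(fn)
        return FilterableGroupSketch(h for h, ok in results.items() if ok)
```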

API to send different tasks to each group host

I've been working on https://github.com/SupercomputerInABriefcase/SuperComputerInABriefcase today, with the aim of running tasks on a cluster of Raspberry Pis as an educational exercise in distributed computing.

After a good while trying to understand/install OpenMPI (on my laptop and the Pis), I decided to try chopsticks and had it working in about 5 minutes.

But this sends the same task to each host in the group. Obviously we want to send different tasks to each host, and send another task when one finishes.

Could this be done by either: a) passing a list, or better yet an iterator, of functions to group.call and having the work spread across the hosts; or b) adding a call_async method to Tunnel that takes a callback?

Fatal Python error: could not acquire lock at interpreter shutdown...

A little bit excited about this project :)

I've been playing around, and have been getting some odd results with the following program:

# main.py
from chopsticks.tunnel import Tunnel
import tasks

def main(host):
    host.call(tasks.get_time)

if __name__ == '__main__':
    main(Tunnel('[email protected]'))

# tasks.py
import subprocess

def get_time():
    return subprocess.run('time', stdout=subprocess.PIPE, shell=True).stdout.decode()

  • Sometimes nothing prints, and the program exits with code 0.
  • Sometimes I get the following (expected) printout (and code 0):
    [[email protected]] Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
    [[email protected]]        [--portability] [--format=format] [--output=file] [--version]
    [[email protected]]        [--quiet] [--help] command [arg...]
    
  • ... and sometimes I get a total failure:
[[email protected]] Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'>
at interpreter shutdown, possibly due to daemon threads

Thread 0x00007f86dde4c700 (most recent call first):
  File ".../lib/python3.6/site-packages/chopsticks/ioloop.py", line 155 in println
  File ".../lib/python3.6/site-packages/chopsticks/ioloop.py", line 162 in _check
  File ".../lib/python3.6/site-packages/chopsticks/ioloop.py", line 145 in on_data
  File ".../lib/python3.6/site-packages/chopsticks/ioloop.py", line 223 in step
  File ".../lib/python3.6/site-packages/chopsticks/ioloop.py", line 242 in run
  File "/usr/lib64/python3.6/threading.py", line 864 in run
  File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007f86e259c4c0 (most recent call first):
[1]    6661 abort (core dumped)  ./main.py

The number of lines that successfully print before failure varies.


I'm running Python 3.6.1 locally, and 3.5.2 on the server. I don't know if it helps, but the result of running ssh -V is OpenSSH_7.5p1, OpenSSL 1.1.0f 25 May 2017.

Please let me know if there's any further information I could provide to help shed some light on this.

Decorated functions can't be called remotely

The following code actually doesn't work...

from chopsticks.tunnel import SSHTunnel

def deco(fn):
    return fn

@deco
def do_it():
    return 'done'

tunnel = SSHTunnel('[email protected]')
print(tunnel.call(do_it))

... because when building the source code to send, serialise_func() appends sub-dependency functions below the original callable; hence, when the code is unpickled and executed remotely, the decorator function is not yet declared.
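The ordering problem can be reproduced without chopsticks at all: executing generated source in which the decorator is defined below the decorated function fails, while the reverse order works.

```python
# Dependency emitted below the callable, as described above: fails.
broken_order = (
    "@deco\n"
    "def do_it():\n"
    "    return 'done'\n"
    "\n"
    "def deco(fn):\n"
    "    return fn\n"
)
try:
    exec(broken_order, {})
    raised = False
except NameError:
    raised = True  # 'deco' is not yet defined when @deco is evaluated

# Dependency emitted first: works.
fixed_order = (
    "def deco(fn):\n"
    "    return fn\n"
    "\n"
    "@deco\n"
    "def do_it():\n"
    "    return 'done'\n"
)
namespace = {}
exec(fixed_order, namespace)
```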

Callback for stderr messages

Currently, lines from stderr are simply echoed, prefixed by the hostname. This behaviour should be configurable.

One idea would be to register callbacks - or simply one global callback - to handle this output.

How to create a Tunnel with an SSH password

Hi, I have an SSH username and password - is there currently any way to authenticate the SSH tunnel with a password? Should I be overriding something, creating some objects myself, etc.? I really need to pass the password to connect. Any hints would be useful.

Thanks

Create a benchmark suite

In order to track Chopsticks' I/O performance, and catch regressions, we should create a suite of realistic tasks that can act as a stable benchmark.

This would allow us to tune for performance. There is plenty of scope for this - consider approaches like serialising messages only once across all hosts, or pipelining requests.

Traceback on slow connections when using python3

Hi @lordmauve ,

I've encountered a bug which took me quite a while to pinpoint: when using chopsticks with Python 3 on a slow network connection, the reading of the bootstrap code through stdin fails.

I originally encountered the bug when patching bubble.py: its file size grew, and the connection to a Google Cloud Engine server was slow, triggering the bug described below. Needless to say, I thought for days that some of my code in bubble.py was the culprit before finding out it was just the file size :)

Consider the following file:

# test.py
import sys
from chopsticks.tunnel import SSHTunnel

def test():
    return 'done'

tunnel = SSHTunnel(sys.argv[1])
print(tunnel.call(test))

I'm using a vm on my local computer for testing and when I want to throttle the connection I use this line in my ssh config:

Host foobar
    ProxyCommand pv -q -L 50k | nc %h 22

now consider the following test script executions:

$ python2 test.py server
done

$ python2 test.py server.slow  # with throttling
done

$ python3 test.py server
done

$ python3 test.py server.slow  # with throttling
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print(tunnel.call(test))
  File "/home/amigrave/.local/lib/python3.5/site-packages/chopsticks/tunnel.py", line 296, in call
    self.connect()
  File "/home/amigrave/.local/lib/python3.5/site-packages/chopsticks/tunnel.py", line 142, in connect
    raise RemoteException(res.msg)
chopsticks.tunnel.RemoteException: Unexpected EOF on stream

The reason for this behaviour lies in the bootstrap inline code executed with the Python interpreter command line:

# tunnel.py#SubprocessTunnel
    PYTHON_ARGS = [
        '-usS',
        '-c',
        'import sys, os; sys.stdin = os.fdopen(0, \'rb\', 0); ' +
        '__bubble = sys.stdin.read(%d); ' % len(bubble) +
        'exec(compile(__bubble, \'bubble.py\', \'exec\'))'
    ]

The key is the way we read stdin: 'import sys, os; sys.stdin = os.fdopen(0, \'rb\', 0); '.
In Python 3, os.fdopen is an alias for open, and for a reason I don't yet fully understand (maybe unbuffered file descriptors are set to os.O_NONBLOCK?) the sys.stdin.read(size) call does not return the requested number of bytes.
That said, the Python documentation states:

To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string or bytes object. size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size bytes are read and returned. If the end of the file has been reached, f.read() will return an empty string ('').

So it seems to be normal behaviour!?
Here's a line you can add to the bootstrap code in order to check (on my box it actually reads 16 KiB instead of 17030 bytes):

        'import sys, os; sys.stdin = os.fdopen(0, \'rb\', 0); ' +
        '__bubble = sys.stdin.read(%d); ' % len(bubble) +
+        'sys.stderr.write(\'Read %d - Got %%d\' %% len(__bubble)); ' % len(bubble) +
        'exec(compile(__bubble, \'bubble.py\', \'exec\'))'

And here's the result on my box with ssh throttling:

$ python3 test.py 192.168.56.103
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print(tunnel.call(test))
  File "/home/amigrave/.local/lib/python3.5/site-packages/chopsticks/tunnel.py", line 296, in call
    self.connect()
  File "/home/amigrave/.local/lib/python3.5/site-packages/chopsticks/tunnel.py", line 142, in connect
    raise RemoteException(res.msg)
chopsticks.tunnel.RemoteException: Unexpected EOF on stream
[[email protected]] Read 17030 - Got 16384

I've always worked with this behaviour for socket.recv, but I didn't know it could happen with file objects (although I hope this only happens in special cases, otherwise I won't sleep this week thinking about all the code I've written so far).

So I tried to fix it with a while loop, in order to ensure we get all the bytes before compiling the bootstrap code, but you cannot syntactically use a while statement in a one-liner containing other statements.
I tried other approaches:

  • evaling a multiline string with \n replaced to \\n
  • base64 encoding then decoding/execing the code
  • writing the bootstrap code to a file then execing

but all of them failed somewhere, mainly because I lack the time to do it properly, and also because my attempts always looked like a pile of hacks on top of hacks.

This is why I think this issue needs to be fixed by you. I've tried to compile as much information as possible in this issue so you can decide on the best option.
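For reference, the loop that cannot be inlined into the -c one-liner looks roughly like this (a sketch, not a proposed patch):

```python
import io


def read_exactly(stream, n):
    """Read exactly n bytes, looping over short reads until EOF."""
    buf = b''
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            break  # EOF before n bytes arrived
        buf += chunk
    return buf
```

Because while is a compound statement, embedding this in the existing single -c string would require restructuring the bootstrap.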

Allow using Chopsticks from Jupyter Notebook

It would be interesting to make Chopsticks available in Jupyter notebooks - i.e. the ability to create a function within a notebook cell and call it on a remote host.

This would require serialising the function code, which might require a new serialisation method.

IPython kernels using Docker + Chopsticks

It should be possible to create a kernel provider for IPython that runs code in a Docker container. This would make it extremely easy to launch a kernel using an alternative version of Python (on Linux).

Use binary encoding for all data transfer

In 21722ba a binary encoding was added, which is used for sending structured data from the host to the client. The motivation was to avoid costly base64-in-JSON encoding and decoding of binary data, which is amplified when tunnelling because it would otherwise be performed at each hop.

However, our own encoding gives us the opportunity to safely support all Python primitive types, rather than being limited to JSON. The current encoding can already distinguish between lists and tuples, and between bytes and strings. Lots of interesting data structures are precluded by the JSON model - frozenset-keyed dicts, for example! Meanwhile, we do not get the human-readable benefit of JSON, as it is already very hard to inspect the messages being passed.
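A toy illustration of what a tagged encoding buys (purely illustrative; pencode's real format differs): each value carries a type tag and a length, so bytes/str and list/tuple round-trip faithfully, which JSON cannot do.

```python
import struct


def encode(obj):
    """Tagged, length-prefixed encoding (illustrative only)."""
    if isinstance(obj, bytes):
        return b'B' + struct.pack('>I', len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode('utf-8')
        return b'S' + struct.pack('>I', len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        tag = b'L' if isinstance(obj, list) else b'T'
        items = b''.join(encode(x) for x in obj)
        return tag + struct.pack('>I', len(obj)) + items
    raise TypeError('unsupported type: %r' % type(obj))


def decode(data, pos=0):
    """Return (value, new_position); call with pos=0 for a whole message."""
    tag = data[pos:pos + 1]
    n, = struct.unpack('>I', data[pos + 1:pos + 5])
    pos += 5
    if tag in (b'B', b'S'):
        raw = data[pos:pos + n]
        return (raw if tag == b'B' else raw.decode('utf-8')), pos + n
    items = []
    for _ in range(n):
        item, pos = decode(data, pos)
        items.append(item)
    return (items if tag == b'L' else tuple(items)), pos
```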

One problem with this proposal is that the encoding code will need to be present in both the orchestration host and the bubble. We ought to achieve this without copying and pasting. If we put it in a separate file, we may be able to simply prepend it to the bubble.py code.

We should also profile the encoding in comparison to JSON to avoid a possible performance regression.

chopsticks.tunnel.RemoteException in basic usage on Mac OS X

Executing on a Mac OS X box with CPython 2.7:

#!/usr/bin/env python
from chopsticks.tunnel import Tunnel
tun = Tunnel('localhost')
import time
print('Time on %s:' % tun.host, tun.call(time.time))

Results in:

Traceback (most recent call last):
  File "./py0.py", line 11, in <module>
    print('Time on %s:' % tun.host, tun.call(time.time))
  File "/Users/username/example/.chopsticks/lib/python2.7/site-packages/chopsticks/tunnel.py", line 179, in call
    raise RemoteException(ret.msg)
chopsticks.tunnel.RemoteException: Unexpected EOF on stream

And the same when not executing from a virtualenv:

Traceback (most recent call last):
  File "./py0.py", line 11, in <module>
    print('Time on %s:' % tun.host, tun.call(time.time))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/chopsticks/tunnel.py", line 179, in call
    raise RemoteException(ret.msg)
chopsticks.tunnel.RemoteException: Unexpected EOF on stream

The same script works when executed against a Linux box:

('Time on localhost:', 1476624529.645448)

Set process title in bubble

When ps is run on a host with an active tunnel, the output isn't pretty:

mauve    12457  0.3  0.5 194128 12124 ?        Ssl  22:05   0:00 /usr/bin/python3 -usS -c import sys, os; sys.stdin = os.fdopen(0, 'rb', 0); __bubble = sys.stdin.read(11522); exec(compile(__bubble, 'bubble.py', 'exec'

While there appears to be no single API to set the process arguments, it appears to be possible in some way on most POSIX systems. Chopsticks should take advantage of this to keep ps output clean.
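On Linux, one such mechanism is prctl(PR_SET_NAME); a best-effort sketch via ctypes (illustrative only; a portable solution would need per-OS branches, or a helper such as the third-party setproctitle package):

```python
import ctypes
import ctypes.util


def set_process_name(name):
    """Best-effort: rename the current process on Linux via
    prctl(PR_SET_NAME). Returns False where prctl is unavailable
    (e.g. macOS). Sketch only, not a complete solution."""
    PR_SET_NAME = 15  # from <linux/prctl.h>
    try:
        libc = ctypes.CDLL(ctypes.util.find_library('c') or 'libc.so.6')
        # PR_SET_NAME takes at most 15 bytes plus a NUL terminator.
        libc.prctl(PR_SET_NAME, name.encode()[:15], 0, 0, 0)
        return True
    except (OSError, AttributeError):
        return False
```

Note this renames the thread/process as seen in ps -o comm and /proc/PID/comm, but does not rewrite the full argv shown by default ps output; doing that portably is exactly why no single API exists.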

Tunnel destructor prevents subsequent stderr interception

When Tunnel.close() is called, the ioloop writer and reader instances are closed, and oddly this means subsequent stderr writes are no longer intercepted by the orchestrator.

Eg:

# -*- coding: utf-8 -*-
import time

from chopsticks.tunnel import Local


def func():
    import __bubble__
    __bubble__.debug("Hi there!")
    time.sleep(1)  # Wait stderr to be flushed


for i in range(3):
    print("Call #%s" % i)
    m = Local()  # triggers __del__ in cpython starting 2nd iteration
    m.call(func)

outputs the following:

$ python3 test.py server
Call #0
[localhost] Hi there!
Call #1
Call #2

...no second output...

ImportError from deserialise_func

My Env:

  • Linux 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • chopsticks ==> 1.0

Sample program

from chopsticks.tunnel import Local

import git

def do_it():
    repo=git.Repo('/my_git_repo')
    return repo.working_dir

local_tun = Local('localhost')
print(local_tun.call(do_it))

On Python 2.7, the above code fails with:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    print(local_tun.call(do_it))
  File "/usr/local/lib/python2.7/dist-packages/chopsticks-1.0-py2.7.egg/chopsticks/tunnel.py", line 299, in call
    raise RemoteException(ret.msg)
chopsticks.tunnel.RemoteException: Host 'localhost' raised exception; traceback follows
    Traceback (most recent call last):
      File "bubble.py", line 172, in wrapper
      File "bubble.py", line 236, in do_call
      File "chopsticks://chopsticks/serialise_main.py", line 143, in execute_func
        f = deserialise_func(*func_data)
      File "chopsticks://chopsticks/serialise_main.py", line 132, in deserialise_func
        __import__(mod)
    ImportError: No module named glob

On Python 3.5, the code works fine.

When I tried to debug using rpdb, I found the following list of imports in deserialise_func:

(Pdb) p imports  
set(['git.index.glob', 'git.index.stat', 'git.contextlib', 'git.logging', 'git.objects.mimetypes', 'git.platform', 'git.repo.git', 'git.index.sys', 'git.sys', 'git.objects.git', 'git.objects.logging', 'git.subprocess', 'git.objects.time', 'git.collections', 'git.os', 'git.config', 'git.functools', 'git.objects.calendar', 'git.refs.remote', 'git.objects.submodule.io', 'git.refs', 'git.locale', 'git.refs.git', 'git.diff', 'git.repo.fun', 'git.odict', 'git.objects.io', 'git.index', 'git.objects.os', 'git.objects', 'git.refs.symbolic', 'git.refs.log', 'git.index.typ', 'git.objects.submodule.util', 'git.repo.collections', 'git.objects.collections', 'git.repo.logging', 'git.objects.submodule.unittest', 'git.refs.tag', 'git.objects.fun', 'git.objects.submodule.base', 'git.objects.string', 'git.objects.util', 'git.db', 'git.objects.base', 'git.index.functools', 'git.util', 'git.objects.commit', 'git.getpass', 'git.refs.os', 'git.stat', 'git.ConfigParser', 'git.codecs', 'git.objects.blob', 'git.objects.tree', 'git.inspect', 'git.objects.submodule', 'git.objects.submodule.logging', 'git.objects.datetime', 'git.repo.string', 'git', 'git.refs.time', 'git.objects.gitdb', 'git.re', 'git.abc', 'git.index.os', 'git.index.fun', 'git.shutil', 'git.objects.stat', 'git.cmd', 'git.refs.reference', 'git.threading', 'git.index.subprocess', 'git.signal', 'git.refs.re', 'git.unittest', 'git.compat', 'git.objects.tag', 'git.index.git', 'git.time', 'git.index.struct', 'git.objects.submodule.git', 'git.repo', 'git.refs.head', 'git.repo.gitdb', 'git.gitdb', 'git.objects.submodule.weakref', 'git.index.base', 'git.index.util', 'git.io', 'git.index.gitdb', 'git.exc', 'git.objects.submodule.root', 'git.repo.re', 'git.remote', 'git.objects.re', 'git.repo.os', 'git.repo.gc', 'git.index.io', 'git.git', 'git.repo.sys', 'git.objects.submodule.os', 'git.objects.submodule.stat', 'git.refs.gitdb', 'git.index.tempfile', 'git.objects.submodule.uuid', 'git.index.binascii', 'git.repo.base'])

Crashes during loop execution can leave a tunnel in a bad state

Using loop.stop() to pass a result back to the calling thread can result in losing sync if the loop has previously crashed somehow.

I was able to get the following code:

print('num_users', tun.call(num_users))
print('getpass', tun.call(getpass.getuser))

to produce this output:

shadow_users root
getpass 40

This can be fixed by clearing the Tunnel's callbacks if the loop crashes. However, it would be better to tie the request ID back to the response being waited for, and only terminate the loop when it matches.
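The request-ID matching could look like this sketch (hypothetical names, not chopsticks' actual callback table): a stale reply from a crashed exchange is dropped instead of being delivered to the wrong caller.

```python
class ResultRouter:
    """Match each response to its request id; drop stale replies."""

    def __init__(self):
        self.pending = {}

    def register(self, req_id, callback):
        self.pending[req_id] = callback

    def on_response(self, req_id, value):
        callback = self.pending.pop(req_id, None)
        if callback is None:
            return False  # stale or unknown response: discard it
        callback(value)
        return True
```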

Jupyter notebook import doesn't work recursively

I wrote this code in a Jupyter Notebook cell:

import os
from chopsticks.tunnel import SSHTunnel, Docker, Local

tun = Docker('docker')


class DockerLocal(Local):
    """A Python subprocess on a docker container"""
    python2 = python3 = 'python'

    
def num_procs():
    return sum(fname.isdigit() for fname in os.listdir('/proc'))


def local_tun():
    with DockerLocal() as tun:
        tun.call(num_procs)
    
tun.call(local_tun)

This crashes with this exception:

...snip...
/home/mauve/dev/chopsticks/chopsticks/tunnel.py in handle_imp(self, mod)
    162             # Special-case main to find real main module
    163             main = sys.modules['__main__']
--> 164             path = main.__file__
    165             self.write_msg(
    166                 OP_IMP,

AttributeError: module '__main__' has no attribute '__file__'

Cache paths used to serve each import

While the importer maintains a cache on each client, the deployment host traverses sys.path every time an import request is received. This is wasteful when we expect that in most cases imports will be needed by multiple connected tunnels.
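A memoised lookup is one way to avoid the repeated traversal (a hypothetical helper; chopsticks' importer has its own search logic):

```python
import os
import sys
from functools import lru_cache


@lru_cache(maxsize=None)
def find_module_file(modname):
    """Locate a top-level module's source file, caching the result so
    repeated import requests from many tunnels don't re-scan sys.path."""
    fname = modname + '.py'
    for entry in sys.path:
        candidate = os.path.join(entry, fname)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A real cache would also need invalidation if sys.path can change between requests.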

Search for appropriate Python interpreter to use

The variation in Python interpreter paths means that it is always going to be unreliable to assume it's at a fixed location, such as /usr/bin/python{2,3}, as in the current implementation. Indeed we already work around this for the Docker Python images.

Instead, the bootstrap script could identify and exec an appropriate Python interpreter. By default we could just try python - which is likely to exist on the majority of systems - and switch if this is not correct and we can identify a more likely candidate.

Making this more difficult, there are several desirable properties of the current implementation to preserve.

  • It should be possible to bootstrap the remote agent without disk write access.
  • The bootstrap script should, if possible, be able to switch interpreter without re-sending the full bootstrap.
  • We don't want to slow down the agent by having to proxy the I/O back to the agent, etc.
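As a local illustration of the candidate search (shutil.which does a PATH lookup here; a real bootstrap running in a bare remote shell would use something like command -v instead, and would also need to verify the version it found):

```python
import shutil


def find_python(candidates=('python3', 'python', 'python2')):
    """Return the first interpreter found on PATH, or None.
    Hypothetical helper, not the proposed bootstrap itself."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```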

Tunnel doesn't detect connection lost

The SSHTunnel doesn't appear to detect connection interruption in some cases.

Steps to reproduce:

  1. Using a laptop, connect an SSHTunnel
  2. Put laptop into sleep mode
  3. Resume laptop
  4. Chopsticks operations on the tunnel now hang

ZipImporter support

The import handler on the host looks through sys.path for Python code to send to the client. However, it only looks in physical directories. Many Python modules may be installed as zipped eggs; code from these should be importable too.

Python supports arbitrary import hooks (we even make use of these), so there should be an API to allow this lookup to be extended by users. Alternatively, perhaps we can make use of Python's import hooks themselves to load code - though this may involve importing it on the host, which we perhaps do not want to do.
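The stdlib zipimport machinery can already extract source text from an archive, which a host-side lookup could reuse; a sketch (helper name hypothetical):

```python
import zipimport


def source_from_zip(zip_path, modname):
    """Fetch a module's source from a zip archive (e.g. a zipped egg)."""
    return zipimport.zipimporter(zip_path).get_source(modname)
```

This reads the source without importing the module on the host, which sidesteps the concern above about unwanted host-side imports.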

RFC: pencode performance improvements

Are you still using chopsticks, and if so would you like a PR to improve pencode performance?

I've been experimenting with improving the speed of dumping and loading. I've arrived at pencode_read5, a variant that uses dictionary dispatch in the decoder, removes the opcodes for singletons, and replaces obuf with BytesIO.read(). The results are:

  • roughly 2x speedup of pdecode()
  • slight increase in uncompressed wire size due to None, True, and False becoming references
Benchmark                       CPython 2.7         CPython 3.6
cpickle,proto=2,dumps           21.1 ms +- 0.5 ms   9.09 ms +- 0.14 ms
cpickle,proto=2,loads           19.0 ms +- 0.9 ms   9.71 ms +- 0.22 ms
pencode,proto=None,dumps        49.7 ms +- 0.5 ms   64.0 ms +- 0.7 ms
pencode,proto=None,loads        141 ms +- 7 ms      194 ms +- 13 ms
pencode_read5,proto=None,dumps  49.0 ms +- 0.8 ms   64.1 ms +- 0.8 ms
pencode_read5,proto=None,loads  59.9 ms +- 0.9 ms   78.0 ms +- 8.1 ms

Sudo support

If Chopsticks is to be usable for configuration management, it needs to be capable of acquiring root permissions.

  • Add a sudo option to the SSH tunnel.
  • Add sudo as a connection type, to allow running as root on the current host. This is more tricky, as it may require a pty for some sudo configurations.

Error when trying to print env

I'm having a problem running this simple function on a remote host:

import os
def print_env():
    print(os.environ)

The following is a copy of my terminal session:

Python 3.5.1 (default, Apr 18 2016, 11:46:32)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from chopsticks.tunnel import Tunnel
   ...: tun = Tunnel(host='my.remote.host', user='my_user_name')
   ...:

In [2]: import time
   ...: print('Time on %s:' % tun.host, tun.call(time.time))
   ...:

Time on my.remote.host: 1499942192.4012492

In [3]:

In [3]: import os
   ...: def print_env():
   ...:     print(os.environ)
   ...:

In [4]: print_env()
   ...:
environ({'LC_CTYPE': 'UTF-8', 'COMMAND_MODE': 'unix2003', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.P6DZWedA1U/Render', 'LANG': 'en_US.utf-8', ... 'TERM_PROGRAM': 'iTerm.app'})

In [5]: tun.call(print_env)
   ...:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-5ef9d5c918df> in <module>()
----> 1 tun.call(print_env)

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/tunnel.py in call(self, callable, *args, **kwargs)
    286         """
    287         self.connect()
--> 288         self._call_async(loop.stop, callable, *args, **kwargs)
    289         ret = self._run_loop()
    290         if isinstance(ret, ErrorResult):

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/tunnel.py in _call_async(self, on_result, callable, *args, **kwargs)
    295         id = self._next_id()
    296         self.callbacks[id] = on_result
--> 297         params = prepare_callable(callable, args, kwargs)
    298         self.reader.start()
    299         self.write_msg(

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/serialise_main.py in prepare_callable(func, args, kwargs)
    124     """Prepare a callable to be called even if it is defined in __main__."""
    125     if isinstance(func, types.FunctionType) and func.__module__ == '__main__':
--> 126         func_data = serialise_func(func)
    127         return execute_func, (func_data,) + args, kwargs
    128     return func, args, kwargs

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/serialise_main.py in serialise_func(f, seen)
     60     # expressions
     61     code = compile(source, '<main>', 'exec')
---> 62     names = trace_globals(code)
     63
     64     imported_names = {}

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/serialise_main.py in trace_globals(code)
     14     global_ops = (LOAD_GLOBAL, LOAD_NAME)
     15     loads = set()
---> 16     for op, arg in iter_opcodes(code.co_code):
     17         if op in global_ops:
     18             loads.add(code.co_names[arg])

/Users/my_user_name/.virtualenvs/default/lib/python3.5/site-packages/chopsticks/serialise_main.py in iter_opcodes(code)
     31     if sys.version_info >= (3, 4):
     32         # Py3 has a function for this
---> 33         for _, op, arg in dis._unpack_opargs(code):
     34             yield (op, arg)
     35         return

AttributeError: module 'dis' has no attribute '_unpack_opargs'

Cannot unpickle in __main__ in Python 2

Trying to pass a function in __main__ to the remote agent in Python 2 causes

ErrorResult(u'Host \'worker-1\' raised exception; traceback follows\n\n    Traceback (most recent call last):\n      File "bubble.py", line 144, in wrapper\n      File "bubble.py", line 160, in handle_call_thread\n    ImportError: Cannot re-init internal module __main__')

Recursive tunnelling

Remote processes should be able to import Chopsticks and construct their own tunnels. This enables several things - such as tunneling to a remote host and then using the Sudo() tunnel for escalated privileges.

Two issues currently block this: remote processes cannot find the bubble code, and the imp handler does not know how to consult the bubble's importer to resolve imports.

This must not be able to result in infinite recursion. A global depth limit, analogous to sys.setrecursionlimit(), could guard against it.

File streaming API

An API is needed for sending and receiving arbitrarily large files.

For sending, we need the ability to pass arbitrarily large files as parameters to remote calls; for receiving, a single API to retrieve files by path may suffice.
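Whatever shape the eventual API takes, the transport presumably has to move the data in bounded chunks rather than as one pickled blob, so memory use stays constant regardless of file size. A hedged sketch of the sending side (the chunk size is an assumption):

```python
CHUNK_SIZE = 64 * 1024  # assumed; must fit comfortably in one message


def iter_file_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a file's contents as fixed-size byte chunks, so an
    arbitrarily large file is never held in memory at once."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

Each chunk could then be framed as an ordinary tunnel message, with the receiving side appending chunks to a file as they arrive.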

Failing to instantiate an object causes traceback

If we pass the wrong arguments to a tunnel (in this case, a Docker tunnel was constructed without a name), a traceback is printed, although the program does not crash.

Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/home/mauve/dev/chopsticks/chopsticks/tunnel.py", line 533, in __del__
    self.close()
  File "/home/mauve/dev/chopsticks/chopsticks/tunnel.py", line 511, in close
    if not self.connected:
AttributeError: 'Docker' object has no attribute 'connected'

We can get around this by setting a class variable connected = False in the relevant base class.
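The pattern is to give the base class a class-level default so that __del__ is safe even when __init__ raised before assigning the instance attribute. A sketch of the idea (class names are illustrative, not the actual chopsticks classes):

```python
class BaseTunnel(object):
    # Class-level default: looked up when __init__ failed before
    # setting the instance attribute, so __del__ cannot raise
    # AttributeError.
    connected = False

    def __init__(self, name):
        self.name = name       # a TypeError here (missing name)...
        self.connected = True  # ...means this line is never reached

    def close(self):
        if not self.connected:
            return
        self.connected = False

    def __del__(self):
        self.close()
```

With the class attribute in place, `self.connected` resolves to `False` on a half-constructed instance and `close()` becomes a harmless no-op.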

stderr not flushed from client at tunnel close

There is currently no synchronisation between the stdout (i.e. call/return) and stderr streams. In particular, once we get a response on the stdout stream, the tunnel may be closed before the stderr data has been received.

To avoid this we could try to sync up the stderr and ensure it is flushed before connection close. This could be quite tricky to achieve. The remote process could have spawned a long-running subprocess that writes to stderr, for example. We could perhaps insert a marker into stderr and read until we see it before shutting the tunnel down.

Alternatively, perhaps there is a way of allowing some grace time to finish reading stderr. It would be important that this doesn't involve blocking the main thread. For example, closing tunnels could be handed off to the stderr thread to shut down fully once stdin and stdout are closed.
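The marker idea could look roughly like this (a sketch only, not something chopsticks implements): the agent writes a unique sentinel line to stderr just before exiting, and the controller's stderr reader drains the stream until it sees it.

```python
def drain_stderr(stderr_lines, marker):
    """Consume remote stderr lines until the shutdown sentinel is
    seen, so diagnostics written before exit are not lost when the
    stdout call/return stream finishes first."""
    drained = []
    for line in stderr_lines:
        if line.rstrip('\n') == marker:
            break
        drained.append(line)
    return drained
```

As noted above, this cannot be fully reliable if the remote process spawned a long-lived subprocess sharing the stderr pipe, so a timeout would still be needed as a backstop.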

Support pkgutil.get_data()

The importer hook in the remote bubble does not support the get_data() method, meaning that packages that include package data cannot be imported correctly.

As Chopsticks' own bubble.py is loaded as package data, this precludes importing chopsticks itself on the agent.
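pkgutil.get_data(package, resource) works by resolving the package and then calling package.__loader__.get_data(path), so supporting it means adding a get_data() method to the bubble's loader that returns the file's bytes. A rough sketch, assuming the loader has access to file contents fetched from the controller (the class and its mapping are hypothetical):

```python
class BubbleLoader(object):
    """PEP 302-style loader sketch with the missing get_data() hook."""

    def __init__(self, files):
        # path -> bytes; in the real bubble this would be fetched
        # from the controller over the tunnel on demand.
        self._files = files

    def get_data(self, path):
        # pkgutil.get_data() expects IOError/OSError for a missing file.
        try:
            return self._files[path]
        except KeyError:
            raise IOError('no such resource: %r' % path)
```

With this hook in place, package data files travel through the same import machinery as module source, which is what importing chopsticks itself on the agent would require.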

Write documentation for passwordless SSH

Chopsticks cannot currently deal with interactive password authentication.

We should ensure that this is well-documented; we should also consider documenting how to configure SSH for key-based authentication where it is not already set up.

We might also want to look at how Chopsticks works with commandline ssh-askpass prompts when keys are passphrase-encrypted.
