dotmesh-io / dotmesh
dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes
Home Page: https://dotmesh.com
License: Apache License 2.0
When invoked with wrong (or not enough) arguments, dm push has a tendency to panic. Fix this by making it give sensible error messages instead.
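A minimal sketch of validating arguments up front, assuming the CLI is built with cobra (as cmd/dm is); the usage string and runPush are placeholders, not the real implementation:

```go
// Hypothetical sketch: reject bad arity before any push logic runs, so
// `dm push` prints usage instead of panicking.
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func newPushCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "push <remote> [<dot> [<branch>]]",
		Short: "Push commits to a remote",
		// cobra checks arity before RunE ever executes.
		Args: cobra.RangeArgs(1, 3),
		RunE: func(cmd *cobra.Command, args []string) error {
			remote := args[0]
			if remote == "" {
				return fmt.Errorf("remote name must not be empty; run 'dm remote -v' to list remotes")
			}
			return runPush(remote, args[1:]) // placeholder for the real push
		},
	}
}

func runPush(remote string, rest []string) error { return nil } // stub

func main() {
	if err := newPushCommand().Execute(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With Args set, cobra prints the usage string on bad arity rather than letting RunE index past the end of args.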
API key:
Remote added
We want the full frontend build container running in all circumstances (even in the dev trim) - the developer will always see the site via the frontend proxy.
A volume could be renamed while a transfer is in progress. This should be fine if, after setting up the transfer, everything is done in terms of volume UUIDs, but we're not sure if this is the case.
In general: do all volume name resolution at the start of any operation, and use UUIDs thereafter, so that renames can't break in-progress operations (a sketch of the pattern follows below).
In particular: audit the code to ensure this is already the case for the operations we've written.
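A sketch of the resolve-once pattern; Registry, LookupVolumeID and StartTransfer are illustrative names, not the real API:

```go
// Hypothetical sketch of "resolve names once, use UUIDs thereafter", so a
// concurrent rename can't change which volume an in-flight operation targets.
package ops

import (
	"context"
	"fmt"
)

// Registry resolves human-readable volume names to immutable UUIDs.
type Registry interface {
	LookupVolumeID(ctx context.Context, name string) (string, error)
}

// StartTransfer resolves the name exactly once, then keys everything off the
// UUID; a rename mid-transfer is then harmless.
func StartTransfer(ctx context.Context, reg Registry, volumeName string) error {
	id, err := reg.LookupVolumeID(ctx, volumeName)
	if err != nil {
		return fmt.Errorf("resolving %q: %w", volumeName, err)
	}
	return doTransferByID(ctx, id)
}

func doTransferByID(ctx context.Context, id string) error { return nil } // stub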
problem: annoyingly, if you try to create an etcd cluster in the same breath as installing the etcd operator, it fails because the operator hasn't started yet
possible solution: use init containers to wait for the etcd operator to start up before trying to create an etcd cluster
downside: this means we need to bundle kubectl, or some code which uses the kube API - but we need to do that anyway, because we're writing a controller... (a sketch of the kube API option follows below)
alternative: just document creating the etcd cluster manually, making it less magic and more explicit?
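A sketch of the "code which uses the kube API" option, using client-go to wait for the operator's Deployment to become ready before creating the EtcdCluster resource; the namespace and deployment name are assumptions:

```go
// Hypothetical sketch: block until the etcd operator's Deployment reports a
// ready replica, then it's safe to create the EtcdCluster.
package preflight

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func WaitForEtcdOperator(ctx context.Context) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	for {
		// "kube-system"/"etcd-operator" are assumed names, not fixed ones.
		dep, err := cs.AppsV1().Deployments("kube-system").
			Get(ctx, "etcd-operator", metav1.GetOptions{})
		if err == nil && dep.Status.ReadyReplicas > 0 {
			return nil // operator is up; proceed to create the EtcdCluster
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second):
		}
	}
}
```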
In other words, moby/moby#19625 could affect datamesh users. Check this and add a pre-flight check to dm cluster.
luke@mashin-1:~$ docker run -ti -v stress-test:/foo --volume-driver dm ubuntu sh -c 'echo HELLO > /foo/WORLD'
luke@mashin-1:~$ logout
Connection to mashin-1 closed.
luke@cube:~$ while true; do for X in 1 2 3 4; do echo $X; ssh mashin-$X docker run -v stress-test:/foo --volume-driver dm ubuntu cat /foo/WORLD; done; done
1
HELLO
2
cat: /foo/WORLD: No such file or directory
3
cat: /foo/WORLD: No such file or directory
4
cat: /foo/WORLD: No such file or directory
1
HELLO
2
cat: /foo/WORLD: No such file or directory
3
cat: /foo/WORLD: No such file or directory
4
cat: /foo/WORLD: No such file or directory
1
it happens that stress-test pre-existed as a volume on the node, and so was somehow getting mounted as an ext4 volume (i.e. a regular bind-mount from the host). Not sure how this happened - it shouldn't - but it was after manually blowing away /var/lib/docker/containers in an attempt to work around #23.
Leaving this open.
i think this happens when you have more than one branch.
When you give dm push a branch name as the third argument, it disregards it and uses whatever the currently checked-out branch is.
See design document at https://docs.google.com/document/d/1ocHxqVM50k4aciA7ejiJIEN_bQ3zgC02zjjue29XKG4/edit#
On NixOS, dm cluster reset doesn't clear out ~/.datamesh for some reason.
We should make a library that can be used from everywhere that needs it; it should be a proper Go SDK with the following properties:
- It should be extracted from the existing client code (in cmd/dm/pkg); set up a Go channel to push in a cancellation, and one which is used to push progress updates back (see the sketch below).
- Types (like DotmeshVolume) should be defined in the client library and not duplicated elsewhere in any of our repos (in particular, the RPC server in cmd/datamesh-server/pkg/main/rpc.go should reference the definition from the Go library as well).
- Existing duplicated code needs to be refactored out into the library, so that we use the Go library ourselves. Any code that contains the string DotmeshRPC is quite likely an API user (apart from cmd/datamesh-server/pkg/main/rpc.go...).
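A sketch of the shape such an SDK could take; only DotmeshVolume and DotmeshRPC are existing names, the rest is illustrative:

```go
// Hypothetical sketch of the client library's surface: shared types plus
// cancellation/progress channels for long-running calls.
package dotmesh

type DotmeshVolume struct {
	Id   string
	Name string
	// ... shared fields, defined once here and referenced by rpc.go too
}

type Progress struct {
	Sent, Total int64
	Message     string
}

type Client struct {
	URL, User, ApiKey string
}

// Push streams progress updates out via the progress channel and stops early
// if cancel is closed.
func (c *Client) Push(volume string, cancel <-chan struct{}, progress chan<- Progress) error {
	// ... issue DotmeshRPC calls, sending Progress values as the transfer
	// advances and checking cancel between chunks.
	return nil
}
```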
1599 snapshots in one filesystem yields a ~2 minute delay between /ux showing an empty list of volumes and it showing the complete list. dm list starts responding sooner, oddly - but perhaps that was on a different node.
because the glossary says so https://docs.google.com/document/d/1OATfqls_EJx8DVmm8ZU00L9G5be4DWYSF6KvlOfulGk/edit#
https://github.com/docker/docker/blob/1.13.x/docs/extend/config.md - because then apparently plugins get started before application containers, stopping the untoward outcome where restarting Docker while you have any datamesh containers breaks Docker completely.
Uninstalling dotmesh with dm cluster reset when there are Docker volumes which reference dotmesh volumes leaves Docker in a state where it constantly hangs for long periods of time, looking for the dm plugin with an exponential backoff.
Make dm cluster reset enumerate Docker volumes and warn about removing the references before uninstalling dotmesh (a sketch of the enumeration follows below). It should refuse to proceed unless run with -f.
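A sketch of the enumeration using the Docker Go client (exact client signatures have drifted between versions, so treat this as an outline, not the implementation):

```go
// Hypothetical sketch: list Docker volumes using the dm driver before
// tearing down the cluster, and refuse to proceed if any exist.
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func dmVolumes(ctx context.Context) ([]string, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return nil, err
	}
	f := filters.NewArgs()
	f.Add("driver", "dm") // only volumes backed by the dm plugin
	resp, err := cli.VolumeList(ctx, f)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, v := range resp.Volumes {
		names = append(names, v.Name)
	}
	return names, nil
}

func main() {
	if vols, _ := dmVolumes(context.Background()); len(vols) > 0 {
		fmt.Printf("refusing to reset: docker volumes still reference dotmesh: %v (use -f to force)\n", vols)
	}
}
```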
And expected the same results as dm list - perhaps we should alias dm dot X -> dm X.
when following the demo on the website, the commit metadata for the first commit is missing from datamesh cloud. Figure out why and fix it please.
we should have one; example: https://github.com/kubernetes/charts/tree/master/stable/gitlab-ce/templates
dm clone might get confusing, because it means "pull to a new volume" and not "create a new branch/clone". For folks familiar with the concept of a filesystem clone, it will do the wrong thing.
Maybe we could overload dm pull so that it works for both the pull (new commits in an existing volume) and clone (new volume) cases?
I typed dm remote remove (which does not exist) and it gave me dm remote list - it should instead error "no such command" and suggest running help.
tested pushing from macOS to cloud.datamesh.io:
2017/09/11 07:32:54 [updatePollResult] => /datamesh.io/filesystems/transfers/549cc00c-57bf-4ffc-4238-1c289e851eee, serialized: {"TransferRequestId":"549cc00c-57bf-4ffc-4238-1c289e851eee","Peer":"cloud.datamesh.io","User":"lukemarsden","ApiKey":"[redacted]","Direction":"push","LocalFilesystemName":"mydata","LocalCloneName":"","RemoteFilesystemName":"mydata","RemoteCloneName":"","FilesystemId":"b21d5469-0b91-4416-53fe-b0310263b94e","InitiatorNodeId":"3f1d7183af011a60","PeerNodeId":"","StartingSnapshot":"START","TargetSnapshot":"324aa0ad-048a-4470-41ab-ac6e97ffccdf","Index":1,"Total":1,"Status":"finished","NanosecondsElapsed":38859296692,"Size":221573656,"Sent":222119627,"Message":"Attempting to push b21d5469-0b91-4416-53fe-b0310263b94e got \u003cEvent error-pushing-posting: responseBody: Host is master for this filesystem (b21d5469-0b91-4416-53fe-b0310263b94e), can't write to it. State is backoff.\n, statusCode: 404, responseHeaders: map[Date:[Mon, 11 Sep 2017 07:32:00 GMT] Content-Length:[112] Content-Type:[text/plain; charset=utf-8]], requestURL: http://cloud.datamesh.io:6969/filesystems/b21d5469-0b91-4416-53fe-b0310263b94e/START/324aa0ad-048a-4470-41ab-ac6e97ffccdf\u003e"}
it looked like it worked, but there was maybe a 30 second delay after it finished pushing:
luke@starry:~$ dm push cloud
Calculating...
finished 211.31 MB / 211.31 MB [======================] 100.00% 5.45 MiB/s (1/1)
Done!
as google group
rollback doesn't stop and start running containers: it thereby corrupts running databases :(
I think rollback is more reliable now. HOWEVER: if a container tries to start on a filesystem while it's being rolled back, we should make it wait until the rollback is complete.
neither does moving a container between hosts (is this true?)
so that upgrades are seamless and you don't have to remember to pass e.g. --checkpoint-url and --pool-name etc, or any other flags; and/or the config file should just be updated to support everything currently doable with flags.
to catch bugs in the macOS implementation that would otherwise remain hidden
push to the dothub (and then clean up) instead!
also make the smoke tests run every hour.
Calculating...
Maximum retry attempts exceeded: &{error-from-send-nP %!s(*main.EventArgs=&map[err:0xc420533d80 output:internal error: Invalid argument
suspected to be caused by too-new ZFS kernel modules on macOS, mismatching the zfs binary bundled in the docker image.
to reproduce, on macOS at least:
dm cluster init
docker run -v name:...
sudo dm cluster reset
dm cluster init
docker run -v name:...
the second invocation seems to misbehave somehow, e.g. files written to the volume don't then show up as dirtying the filesystem.
depends: openzfs/openzfs@9896f2b
Design doc: here.
in interactive mode, at least. like git log.
it previously caused a "can't find snapshots" error...
not sure what's going on here
https://github.com/lukemarsden/datamesh-server/commit/9f021997698a5ec612742a791c742262ef30387b introduces a race condition if there are no datamesh-using app containers running. fix it by waiting for the dummy plugin to come up before proceeding to run docker stuff.
things like prime.sh still hanging around.
luke@hackintosh:~/gocode/src/github.com/lukemarsden/datamesh$ docker start db
Error response from daemon: get mydata: VolumeDriver.Get: 404 page not found
Error: failed to start containers: db
also need to implement VolumeDriver.Capabilities
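A sketch of the missing method using docker/go-plugins-helpers; whether dm volumes should report "global" or "local" scope is an open question, so "global" below is an assumption:

```go
// Hypothetical sketch of implementing VolumeDriver.Capabilities, so the
// Docker daemon stops getting a 404 when it probes the plugin.
package dm

import "github.com/docker/go-plugins-helpers/volume"

type dmDriver struct{} // stands in for the real driver state

// Capabilities reports the scope of volumes this driver manages.
func (d *dmDriver) Capabilities() *volume.CapabilitiesResponse {
	return &volume.CapabilitiesResponse{
		Capabilities: volume.Capability{Scope: "global"}, // assumption
	}
}
```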
it's not consistent with git behaviour, nor intuitive.
possible exception: if there are no remotes yet.
docker run ... -v foo:/data busybox touch /data/foo
dm cluster reset
docker run ... -v foo:/data busybox ls -alh /data/foo
on macOS, at least, this shows a file where there should be none! dm cluster reset should totally wipe out the datamesh state.
not sure - docker run worked but the dm volume never got mounted in the container, ext4 was in its place.
It would be nice to be able to organise volumes into namespaces, so that different users can have volumes with the same names. This is particularly pertinent for multi-tenant public hosting setups!
See https://docs.google.com/document/d/1qtE096-8xLH5Ml6NLAkHRJhO7TGjLx7PfsblYdIRa24/edit# for a design document.
use datamesh to capture logs in elasticsearch, zipkin traces in mysql, and prometheus metrics from acceptance test/soak test/stress test runs; and maybe snapshot/backup etcd - also the etcd behind discovery.
Currently /ui/foo/bar is a 404, but the pushState URLs in the app resolve to these; this breaks reloading & linking. Fix it by making anything underneath /ui return the homepage.
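A sketch of the fallback as a plain net/http handler; the asset directory is an assumption about the frontend layout:

```go
// Hypothetical sketch: serve real static assets when they exist, and fall
// back to index.html for any other path under /ui, so pushState URLs
// survive reloads and deep links.
package main

import (
	"log"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

func uiHandler(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p := strings.TrimPrefix(r.URL.Path, "/ui")
		full := filepath.Join(root, filepath.Clean("/"+p))
		if info, err := os.Stat(full); err == nil && !info.IsDir() {
			http.ServeFile(w, r, full) // a real asset: serve it directly
			return
		}
		// Not a real asset: return the homepage and let the client-side
		// router interpret the URL.
		http.ServeFile(w, r, filepath.Join(root, "index.html"))
	})
}

func main() {
	http.Handle("/ui/", uiHandler("./frontend/dist")) // assumed path
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```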
but docker run doesn't, so:
$ sudo docker run -ti --volume-driver=dm -v β:/bar ubuntu bash
[...]
$ dm list
VOLUME SERVER BRANCH CONTAINERS
β 828634e156efc02c newbranch
foo 828634e156efc02c newbranch
* monday 828634e156efc02c newbranch
monday@newbranch 828634e156efc02c newbranch
$ dm switch β
Error: β is not a valid name
We could relax the restrictions on names in switch and other places that use it.
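One possible relaxation, as a sketch: allow any Unicode letters and digits rather than ASCII only (the exact character set we'd want is up for debate):

```go
// Hypothetical sketch of a relaxed name validator: Unicode letters and
// numbers plus a few separators, so names like "β" pass while empty strings
// and path-ish characters are still rejected.
package names

import "regexp"

var validName = regexp.MustCompile(`^[\p{L}\p{N}][\p{L}\p{N}_.-]*$`)

func Valid(name string) bool {
	return validName.MatchString(name)
}
```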
When a transfer is initiated, the Transfer RPC method injects a transfer request into etcd and returns. PollTransfer then proceeds to call GetTransfer in order to report stuff to the user, but it often gets in before the transfer has initiated, causing this to happen:
Calculating...
Got error, trying again: Response '{"jsonrpc":"2.0","error":{"code":-32000,"message":"No such intercluster transfer 9586f07d-7a5a-4733-41ec-59f16725ca06","data":null},"id":8674665223082153551}
' yields error No such intercluster transfer 9586f07d-7a5a-4733-41ec-59f16725ca06
finished 9.50 KB / 9.50 KB 100.00% 3.15 MiB/s (1/1)
Done!
It would be nice if the error return from GetTransfer had an error type code, so we can distinguish "transfer does not exist" errors from others without needing to parse the string. Then we could make the PollTransfer loop work as sketched below.
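A sketch of that loop; ErrNoSuchTransfer, GetTransfer and TransferStatus are illustrative names, not the real API:

```go
// Hypothetical sketch of the desired PollTransfer loop: retry quietly while
// the error is specifically "no such transfer yet", fail fast on anything
// else.
package main

import (
	"errors"
	"time"
)

var ErrNoSuchTransfer = errors.New("no such intercluster transfer")

type TransferStatus struct{ Status string }

// GetTransfer is a stub standing in for the real RPC call.
func GetTransfer(id string) (TransferStatus, error) {
	return TransferStatus{Status: "finished"}, nil
}

func pollTransfer(id string) error {
	for {
		status, err := GetTransfer(id)
		switch {
		case errors.Is(err, ErrNoSuchTransfer):
			// The Transfer RPC has returned but the transfer hasn't been
			// picked up yet: poll again instead of printing
			// "Got error, trying again".
			time.Sleep(time.Second)
			continue
		case err != nil:
			return err // a real failure: surface it immediately
		}
		if status.Status == "finished" {
			return nil
		}
		time.Sleep(time.Second)
	}
}

func main() { _ = pollTransfer("example") }
```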
luke@glow:~$ docker run -d -v mydata:/var/lib/mysql \
--volume-driver=dm --name=db -e MYSQL_ROOT_PASSWORD=secret mysql
gives:
docker: Error response from daemon: symlink /var/lib/docker/datamesh/mnt/dmfs/d446e530-c475-499a-74fe-149b02c730af /var/datamesh/mydata: file exists. See 'docker run --help'.
(old title: mint new credentials)
In dm cluster join, currently the certificates that were first generated on the server where dm cluster init was run are used.
Instead, we should mint new certificates using the extant CA for two reasons:
Ideally, these certs should be minted in the discovery service to avoid handing out keys to the kingdom, but a good first step here would be to move to minting them join-side to fix the proxy-SPOF issue.
node001$ docker run --name x -v foo:/...
node002$ docker run --name x -v foo:/...
should fail (because container is running on node001, and has acquired a lock)
also:
currently, running the container in two places at once and then later trying to move it again causes cannot receive incremental stream: destination pool/dmfs/26c9a9ee-6486-4316-420f-2a51c6f8ae9e has been modified in the logs.
must be something off with the handoff logic - it should bail if it can't unmount the filesystem...
There is logic to detect (somewhat asynchronously) what Docker containers are using what volumes, but that's not the only way volumes are used - Procure and MountCommit are available in the API, but there's no API call to say you're finished with them. Ideally, Procure and MountCommit (even when called via the Docker volume plugin) should increment a reference count or other similar tracking mechanism, and an explicit ReleaseMount API call should be invoked to release it.
The "lock" held by these calls should also inhibit incoming transfers; letting them happen is a race to see whether the transfer drops the new snapshots or the running workload touches a file to make it dirty first. (A sketch of the tracking follows below.)
As a user of a distributed database using dotmesh for backend storage, I'd like to be able to take consistent snapshots of all the dots used by my database nodes.
"Atomicity" is undefined in a distributed environment; we probably just need to be "as quick as possible" at doing all the snapshots at once, but maybe some atomicity could be arranged by synchronously talking to the database cluster itself, requesting that it prepare a stable global state, waiting for it to confirm it's done so, snapshotting, then telling the DB it no longer needs to maintain that stable global state.
meaning that data sent over them is vulnerable to being sniffed
to facilitate scripting
this is in dm list now.