skycoin / skywire

Skywire Node implementation

Dockerfile 0.04% Go 46.83% Shell 0.78% HTML 7.86% Makefile 0.56% JavaScript 1.10% TypeScript 22.77% CSS 0.07% SCSS 2.99% PowerShell 0.17% Batchfile 0.08% Java 16.75%

vpn meshnet software-defined-network

skywire's People

Contributors

ivcosla bigookie nkryuchkov pikomonde erichkaestner u5surf mdllife devzone777 asxtree asgaror fray circle-22 arc1999 musinit evanlinjin ness-project jeff-bouchard taras-skycoin abansinsi i-hate-nicknames darkren ness-network skyfleet ravenii coffeew-1337 alexadhy iwanmartinsetiawan skywirex libronlopes koad specter25 ppcamp dharmendrakariya ersonp senyoret1 xeroxparc mrpalide coder1966 clustercrypto windowsagent jdknives galzzly 0pcom 4rchim3d3s

skywire's Issues

Make /exec endpoint consistent with others

While all endpoints for node management look like /nodes/{pk}/{action}, the endpoint for command execution looks like /exec/{pk}, which is inconsistent with the others and violates REST conventions.

I think /nodes/{pk}/exec would look better.
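To illustrate the proposed layout, here is a minimal sketch of a path parser that accepts only the /nodes/{pk}/{action} shape, so /nodes/{pk}/exec is handled like any other action. The helper name parseNodePath and the shortened public key are hypothetical and not taken from the hypervisor code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodePath splits a path of the form /nodes/{pk}/{action} into its
// public key and action parts. Hypothetical helper for illustration only.
func parseNodePath(path string) (pk, action string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 3 || parts[0] != "nodes" || parts[1] == "" || parts[2] == "" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	// The proposed endpoint fits the common pattern.
	fmt.Println(parseNodePath("/nodes/024ec4/exec"))
	// The current endpoint does not.
	fmt.Println(parseNodePath("/exec/024ec4"))
}
```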

Document all the possible Hypervisor auth states

Feature description

Describe the feature
Document all the possible Hypervisor auth states, to make it easier to consume the API correctly.

Is your feature request related to a problem? Please describe.

Describe the solution you'd like
When creating documentation for the Hypervisor API, please add a section covering all possible auth-related results: what happens when the session is not valid, how long a session cookie normally remains valid, whether the user must change the password after a certain amount of time, and any other relevant information.

Describe alternatives you've considered

Additional context
This information is needed to use the API in an effective way.

Possible implementation

Implement backoff for transport reconnection

Feature description

Currently, when one edge of a transport goes down, the other edge will try to re-establish the transport every three seconds. We should implement a backoff for re-establishing the transport. @evanlinjin suggested using the retry logic in /internal/netutil . Also, we need to improve logging, so that these messages do not clutter stdout:

[2019-11-13T10:59:10+08:00] WARN [tp:03c775]: failed to redial underlying connection: dial tcp 192.168.0.107:7777: connect: connection refused

improve logging
implement retrial backoff
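A minimal sketch of what such a backoff could look like, assuming a plain exponential scheme with a cap. It does not use the retry logic in /internal/netutil mentioned above; the function names, delays and attempt limit are all illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// nextDelay doubles the previous delay and caps it at maxDelay.
func nextDelay(prev, maxDelay time.Duration) time.Duration {
	next := prev * 2
	if next > maxDelay {
		return maxDelay
	}
	return next
}

// redialWithBackoff calls dial until it succeeds or the attempts run out,
// sleeping between tries with exponentially growing delays.
func redialWithBackoff(dial func() error, initial, maxDelay time.Duration, attempts int) error {
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = dial(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay = nextDelay(delay, maxDelay)
		}
	}
	return fmt.Errorf("redial failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for the real redial: fails twice, then succeeds.
	dial := func() error {
		calls++
		if calls < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	err := redialWithBackoff(dial, 10*time.Millisecond, 80*time.Millisecond, 5)
	fmt.Println("attempts:", calls, "error:", err)
}
```

With this shape, the warning only needs to be logged once per failed series (or at a lower level per attempt), which would also reduce the stdout clutter.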

Invalid rule panic reappears.

panic: invalid rule

goroutine 66 [running]:
github.com/skycoin/skywire/pkg/routing.Rule.TransportID(0x7ff4f309d139, 0x36, 0x36, 0x0, 0x0)
        /home/evanlinjin/dev/skycoin/skywire/pkg/routing/rule.go:70 +0x13f
github.com/skycoin/skywire/pkg/router.(*Router).forwardPacket(0xc000176af0, 0xe1c480, 0xc0000c8010, 0xc000358266, 0xa, 0xa, 0x7ff4f309d139, 0x36, 0x36, 0x93c947, ...)
        /home/evanlinjin/dev/skycoin/skywire/pkg/router/router.go:219 +0x28f
github.com/skycoin/skywire/pkg/router.(*Router).handlePacket(0xc000176af0, 0xe1c480, 0xc0000c8010, 0xc000358260, 0x10, 0x10, 0xe0ef20, 0xc000162290)
        /home/evanlinjin/dev/skycoin/skywire/pkg/router/router.go:143 +0x4d4
github.com/skycoin/skywire/pkg/router.(*Router).Serve.func1(0xc000176af0, 0xe1c480, 0xc0000c8010)
        /home/evanlinjin/dev/skycoin/skywire/pkg/router/router.go:116 +0x107
created by github.com/skycoin/skywire/pkg/router.(*Router).Serve
        /home/evanlinjin/dev/skycoin/skywire/pkg/router/router.go:110 +0x109

[M2] Keep-alive packet not being propagated through all the route

Describe the bug
When a node receives a keep-alive packet, it updates the activity of the rule associated with the route. That part works fine. But the packet is not propagated further along the route. With, say, 3 nodes, one of which is an intermediary, there is no problem. But if we add more intermediary nodes, they never receive the keep-alive packet. As a result, the route breaks as soon as data packets stop going through the network.

Actual behavior
With two or more intermediary nodes and no data packets being transmitted, the route breaks after the rule timeout.

Expected behavior
Keep-alive packets are handled by all nodes along the route, which updates rule activity and prevents the rules from being removed.

Possible implementation
There is a function that handles keep-alive packets. It only needs to forward the packet down the line, which should be easy to implement.
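The idea can be sketched as follows. The rule and router types here are simplified, hypothetical stand-ins and do not mirror the real pkg/router or pkg/routing types; the point is only that the handler refreshes the rule and then forwards the packet when there is a next hop.

```go
package main

import (
	"fmt"
	"time"
)

// rule is a hypothetical, simplified forwarding rule.
type rule struct {
	nextHop      string // transport to forward on; empty at the last node of the route
	lastActivity time.Time
}

// router is a hypothetical stand-in for the real router.
type router struct {
	rules map[uint32]*rule
	send  func(transport string, routeID uint32) error // writes a keep-alive to a transport
}

// handleKeepAlive refreshes the rule's activity and, on intermediary nodes,
// passes the keep-alive on to the next hop.
func (r *router) handleKeepAlive(routeID uint32) error {
	rl, ok := r.rules[routeID]
	if !ok {
		return fmt.Errorf("no rule for route %d", routeID)
	}
	rl.lastActivity = time.Now()
	if rl.nextHop == "" {
		return nil // end of the route, nothing to forward
	}
	return r.send(rl.nextHop, routeID)
}

func main() {
	r := &router{
		rules: map[uint32]*rule{7: {nextHop: "tp-to-next-node"}},
		send: func(tp string, id uint32) error {
			fmt.Printf("forwarding keep-alive for route %d via %s\n", id, tp)
			return nil
		},
	}
	if err := r.handleKeepAlive(7); err != nil {
		fmt.Println(err)
	}
}
```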

Improvements for the Manager UI

This is a list of some of the improvements that should be added to the manager:

In addition to this, dedicated controls should be added for the default apps, in the apps page, just as the testnet manager has dedicated controls for “Connect to Node”, “SSH Server“ and “SSH Client”. However, for doing this more info about the default apps is needed, and some modifications could be needed in the visor, hypervisor and the skywire-services repository.

There are some minor improvements that were not listed, and some additional major changes may become evident in the future.

Routes are removed after several hours

Describe the bug
When following the instructions in https://github.com/SkycoinProject/skywire-mainnet/tree/mainnet-milestone2/cmd/hypervisor to use the hypervisor API with the skywire-services repo, routes created with curl --data {'"recipient":"'$PK_A'", "message":"Hello Joe!"}' -X POST $CHAT_C are initially returned by the hypervisor API, but are erased after several hours (around 6 hours), or at least the hypervisor API stops returning them.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
It is hard to observe the problem and to give specific instructions for reproducing it, because it takes a long time to happen. However, I think it takes about 6 hours without using the hypervisor API for the problem to appear.

Actual behavior
The routes are deleted after several hours. In addition to that, running curl --data {'"recipient":"'$PK_A'", "message":"Hello Joe!"}' -X POST $CHAT_C does not recreate them.

Expected behavior
Routes should stay.

Additional context

Possible implementation

Implement Hypervisor <--> Visor connection over dmsg

Feature description

This is the old PR pointing to the old repo: https://github.com/skycoin/skywire/pull/563/files

[M2] Socket files are not removed on visor shutdown

Describe the bug
Socket files (app server and dmsgpty) are not removed on visor shutdown. As a result, if we restart the visor, it won't run because the address is already in use.

Environment information:
Independent

Steps to Reproduce
Steps to reproduce the behavior:

Run skywire visor
Shutdown visor
Run it again

Actual behavior
Visor fails to run because of address already in use

Expected behavior
Visor runs

Possible implementation
Remove socket files on visor shutdown
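A minimal sketch of this, assuming Unix domain sockets created with net.Listen. It removes the file on shutdown and, as an extra safeguard that is not part of the original proposal, also clears a stale file before listening. The helper names and the socket file name are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenUnix removes a stale socket file left by a previous run, then listens.
func listenUnix(path string) (net.Listener, error) {
	if err := os.Remove(path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}
	return net.Listen("unix", path)
}

// closeAndRemove closes the listener and makes sure the socket file is gone.
func closeAndRemove(l net.Listener, path string) error {
	closeErr := l.Close()
	if err := os.Remove(path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return closeErr
}

// restartCycle simulates "run visor, shut down, run again" on one socket path.
func restartCycle() error {
	dir, err := os.MkdirTemp("", "visor")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "app.sock")
	// Simulate a file left behind by an unclean shutdown.
	if err := os.WriteFile(path, nil, 0o600); err != nil {
		return err
	}
	for run := 1; run <= 2; run++ {
		l, err := listenUnix(path)
		if err != nil {
			return fmt.Errorf("run %d: %w", run, err)
		}
		if err := closeAndRemove(l, path); err != nil {
			return fmt.Errorf("run %d: %w", run, err)
		}
	}
	return nil
}

func main() {
	fmt.Println("restart cycle error:", restartCycle())
}
```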

[M1] Integrate dmsgexec and remove skyssh/therealssh.

Tasks:

Remove skyssh/therealssh.
Make dmsgexec use ssh for proper session management.
Integrate dmsgexec directly into visor.

Endpoint for restarting visor from hypervisor

Feature description
The hypervisor needs to be able to restart the visor, because restarting a large number of visors manually is complicated.

Describe the solution you'd like
We can create a new visor process with the same args and kill the current one.

At the visor start:

Save the current working directory.
Save the command line args: the relative binary path, binary args

At the visor restart:

Since args may contain relative paths, either set the working directory to the saved one, or change relative paths in args to absolute ones using the saved working directory.
Run a new visor process with the required args.
Kill the current process.

Describe alternatives you've considered
Going back to the beginning of main() instead of starting a new process. With that approach, it might be very complicated to close all open resources.

Additional context
Running a new visor and killing the current one needs further study to find out whether there are any pitfalls. For example, we need to ensure that all resources needed by the new visor are freed by the old one before the new one starts.
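The steps above can be sketched roughly as follows, using the option of setting the working directory to the saved one. The type and function names are hypothetical. The sketch only builds the command and does not start it or kill the current process, since freeing resources first is exactly the open question.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// startupState captures what is needed to re-launch the same process later.
type startupState struct {
	workDir string
	args    []string // args[0] is the binary path as originally invoked
}

// captureStartup should be called once, at the very start of the process.
func captureStartup() (startupState, error) {
	wd, err := os.Getwd()
	if err != nil {
		return startupState{}, err
	}
	return startupState{workDir: wd, args: append([]string(nil), os.Args...)}, nil
}

// restartCommand builds (but does not start) the command for the new process.
func (s startupState) restartCommand() *exec.Cmd {
	cmd := exec.Command(s.args[0], s.args[1:]...)
	// Relative paths in the args resolve the same way they did at startup.
	cmd.Dir = s.workDir
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd
}

func main() {
	state, err := captureStartup()
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	cmd := state.restartCommand()
	fmt.Println("would run", cmd.Args, "in", cmd.Dir)
}
```

A full restart would call cmd.Start() and then exit the current process once the shared resources (ports, socket files) have been released.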

It is not possible to delete transports and routes using the hypervisor API

Describe the bug
In the master branch, when calling the DELETE /api/nodes/{pk}/transports/{tid} and DELETE /api/nodes/{pk}/routes/{routeId} API endpoints for deleting transports and routes, the operation fails.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Start some nodes using the make integration-run-generic command of the skywire-services repository.
Create transports and routes.
Try to delete the transports and routes using the DELETE /api/nodes/{pk}/transports/{tid} and DELETE /api/nodes/{pk}/routes/{routeId} API endpoints.

Actual behavior
When trying to delete the transports and routes, the operation always fails and the response is something similar to {"error":"can not find app of name from node 024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7"}

Expected behavior
The transports and routes should be deleted.

Additional context

Possible implementation

Visors cannot connect to hypervisor

Describe the bug
When running the integration environment, connections between visors and the hypervisor get stuck in dmsg code.

Environment information:
Independent.

Steps to Reproduce
Run the generic integration environment.

Actual behavior
Visors cannot connect to hypervisor.

Expected behavior
Visors connect to hypervisor.

Overall testing of app2 and router2

We need to fix the tests for the app2 and router2 modules and perform thorough testing of these modules.

Proxy client does not work if yamux connection fails

If an intermediary node is restarted or if a network error happens, the proxy client does not reconnect and stops working. We need to implement reconnection logic.

nettest.TestConn fails with stcp.Conn

The test implemented in #42 for stcp.Conn panics with nil pointer dereference.

Sometimes it's TestConn/PingPong:

=== RUN   TestConn/PingPong
SIGN! len(b.Bytes) 396 dc56b62cf8625836528ccd75e696d5c65346df9433f49c8c0bc84d670e31975e
VERIFY! len(b.Bytes) 396 dc56b62cf8625836528ccd75e696d5c65346df9433f49c8c0bc84d670e31975e recovered: [2 5 203 190 126 217 33 250 75 59 226 126 36 57 212 254 13 226 200 213 128 212 136 120 236 13 119 155 170 154 29 162 152] <nil> expected: 0205cbbe7ed921fa4b3be27e2439d4fe0de2c8d580d48878ec0d779baa9a1da298
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13e13db]

goroutine 40 [running]:
github.com/SkycoinProject/skywire-mainnet/pkg/snet/stcp.(*Conn).Read(0x0, 0xc000264e10, 0x8, 0x8, 0xc0000396c0, 0x100be95, 0x1454020)
	<autogenerated>:1 +0x2b
io.ReadAtLeast(0x5809090, 0x0, 0xc000264e10, 0x8, 0x8, 0x8, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.13.1/libexec/src/io/io.go:310 +0x87
io.ReadFull(...)
	/usr/local/Cellar/go/1.13.1/libexec/src/io/io.go:329
golang.org/x/net/nettest.testPingPong.func1(0x15709a0, 0x0)
	/Users/nkryuchkov/skywire-mainnet/vendor/golang.org/x/net/nettest/conntest.go:108 +0x11b
created by golang.org/x/net/nettest.testPingPong
	/Users/nkryuchkov/skywire-mainnet/vendor/golang.org/x/net/nettest/conntest.go:137 +0x130

Sometimes it's TestConn/RacyRead:

=== RUN   TestConn/RacyRead
SIGN! len(b.Bytes) 400 921359adcec09b1e27215ca1d8a3143ebaefdd19188961e8e4839aa93a8cd2ef
VERIFY! len(b.Bytes) 400 921359adcec09b1e27215ca1d8a3143ebaefdd19188961e8e4839aa93a8cd2ef recovered: [3 180 67 123 20 223 155 216 138 227 38 155 141 154 215 180 147 72 69 43 108 148 180 210 149 137 11 66 24 104 184 76 202] <nil> expected: 03b4437b14df9bd88ae3269b8d9ad7b49348452b6c94b4d295890b421868b84cca
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13e162b]

goroutine 22 [running]:
github.com/SkycoinProject/skywire-mainnet/pkg/snet/stcp.(*Conn).Write(0x0, 0xc0001e6c00, 0x400, 0x400, 0x400, 0x0, 0x0)
	<autogenerated>:1 +0x2b
io.copyBuffer(0x1565240, 0xc000184170, 0x1565220, 0xc000184180, 0xc0001e6c00, 0x400, 0x400, 0xc000184180, 0xc000184180, 0x1466ea0)
	/usr/local/Cellar/go/1.13.1/libexec/src/io/io.go:404 +0x1fb
io.CopyBuffer(0x1565240, 0xc000184170, 0x1565220, 0xc000184180, 0xc0001e6c00, 0x400, 0x400, 0x13e0e03, 0x1570b80, 0xc0001ca280)
	/usr/local/Cellar/go/1.13.1/libexec/src/io/io.go:375 +0x82
golang.org/x/net/nettest.chunkedCopy(0x4960288, 0x0, 0x1564320, 0xc0000967b0, 0x10000c0000f9df0, 0x191c9a0)
	/Users/nkryuchkov/skywire-mainnet/vendor/golang.org/x/net/nettest/conntest.go:462 +0x11b
created by golang.org/x/net/nettest.testRacyRead
	/Users/nkryuchkov/skywire-mainnet/vendor/golang.org/x/net/nettest/conntest.go:148 +0xd0

The autostart value of the apps is not being restored after a restart

Describe the bug
If the autostart value of an application is changed by calling the PUT /api/visors/{pk}/apps/{app} API endpoint, the new value is not restored after stopping the visor and starting it again.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Start some nodes using the make integration-run-generic command of the skywire-services repository
Call GET /api/nodes/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7. The autostart property of the skychat will be true
Call PUT /api/nodes/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7/apps/skychat with {autostart: false} as content, to stop the skychat app.
Call GET /api/nodes/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7 again. The autostart property of the skychat will be false
Call make integration-teardown; tmux kill-server in the command window that is running the skywire-services test environment, to stop it. Then call make integration-run-generic to start it again.
Call GET /api/nodes/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7 again. The autostart property of the skychat will be true.

Actual behavior
After restarting the visor, the value set by calling PUT /api/nodes/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7/apps/skychat is ignored.

Expected behavior
The value should survive the restart of the visor.

Additional context

Possible implementation

Add ability to delete entries in transport discovery.

Feature description

Currently, one can only update statuses of entries in transport discovery and cannot actually delete entries.

This causes issues when restarting transports. When we remove transports from visor, we only remove them locally. On visor startup, it polls transport discovery to determine saved transports. Hence, locally deleted transports are revived.

Implementation Details:

A transport between nodes A and B can only be deleted by A or B.

Tasks:

Add endpoint to transport discovery to delete transport.
Update transport discovery client.
Update (transport.Manager).DeleteTransport to call transport discovery's delete transport endpoint.
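The rule that only A or B may delete a transport between A and B can be sketched as below. The entry and store types are hypothetical simplifications of a discovery record, and the sketch takes the requester's public key as given. Verifying that the request really comes from that key is assumed to happen elsewhere.

```go
package main

import "fmt"

// entry is a hypothetical, simplified transport discovery record.
type entry struct {
	id    string
	edges [2]string // public keys of the two visors
}

// canDelete reports whether requester is one of the transport's two edges.
func canDelete(e entry, requester string) bool {
	return requester == e.edges[0] || requester == e.edges[1]
}

// store is an in-memory stand-in for the discovery's storage.
type store struct {
	entries map[string]entry
}

// deleteTransport removes the entry only if the requester is one of its edges.
func (s *store) deleteTransport(id, requester string) error {
	e, ok := s.entries[id]
	if !ok {
		return fmt.Errorf("transport %s not found", id)
	}
	if !canDelete(e, requester) {
		return fmt.Errorf("%s is not an edge of transport %s", requester, id)
	}
	delete(s.entries, id)
	return nil
}

func main() {
	s := &store{entries: map[string]entry{
		"tp1": {id: "tp1", edges: [2]string{"pkA", "pkB"}},
	}}
	fmt.Println(s.deleteTransport("tp1", "pkC")) // rejected: not an edge
	fmt.Println(s.deleteTransport("tp1", "pkA")) // allowed
}
```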

Use maps instead of slices where lookups often happen

https://github.com/SkycoinProject/skywire-mainnet/blob/master/pkg/visor/visor.go#L106

Not a part of your work, but what do you think of changing the type of appsConf from a slice to a hashmap? The operation we perform most often is a lookup, which would be faster in that case.

Originally posted by @nkryuchkov in #62
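For comparison, a small sketch of both lookup styles. The appConf struct and its fields are hypothetical and do not reproduce the real type in pkg/visor. For a handful of apps the speed difference is likely negligible, so the main gain may be simpler lookup code.

```go
package main

import "fmt"

// appConf is a hypothetical, simplified app configuration.
type appConf struct {
	Name      string
	AutoStart bool
}

// findInSlice scans the slice linearly: O(n) per lookup.
func findInSlice(apps []appConf, name string) (appConf, bool) {
	for _, a := range apps {
		if a.Name == name {
			return a, true
		}
	}
	return appConf{}, false
}

// indexByName builds a map once so that later lookups are O(1) on average.
func indexByName(apps []appConf) map[string]appConf {
	m := make(map[string]appConf, len(apps))
	for _, a := range apps {
		m[a.Name] = a
	}
	return m
}

func main() {
	apps := []appConf{{Name: "skychat", AutoStart: true}, {Name: "skyproxy"}}
	byName := indexByName(apps)
	conf, ok := byName["skychat"]
	fmt.Println(conf, ok)
}
```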

Add deployment flag to gen-config command

Feature description

There are currently two deployments in use:

skycoin.com deployment (production)
skywire.cc deployment (testing)

The default skywire-cli node gen-config command should create a config file pointing to skycoin.com. There should be a flag for generating a config that points to skywire.cc instead.

Describe the solution you'd like

Skywire.cc Deployment:

http://routefinder.skywire.cc
http://transport.discovery.skywire.cc
http://dmsg.discovery.skywire.cc
Setup Node PK: 026c5a07de617c5c488195b76e8671bf9e7ee654d0633933e202af9e111ffa358d

skycoin.com Deployment:

http://transport.discovery.skywire.skycoin.com
http://routefinder.skywire.skycoin.com
http://messaging.discovery.skywire.skycoin.com
Setup Node PK: 026c5a07de617c5c488195b76e8671bf9e7ee654d0633933e202af9e111ffa358d
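A minimal sketch of how such a flag could select between the two deployments, using the addresses listed above. The flag name -testing, the struct and the function names are assumptions for illustration; the real gen-config command may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// deployment groups the service addresses of one deployment.
type deployment struct {
	TransportDiscovery string
	RouteFinder        string
	DmsgDiscovery      string
	SetupNodePK        string
}

var (
	productionDeployment = deployment{
		TransportDiscovery: "http://transport.discovery.skywire.skycoin.com",
		RouteFinder:        "http://routefinder.skywire.skycoin.com",
		DmsgDiscovery:      "http://messaging.discovery.skywire.skycoin.com",
		SetupNodePK:        "026c5a07de617c5c488195b76e8671bf9e7ee654d0633933e202af9e111ffa358d",
	}
	testingDeployment = deployment{
		TransportDiscovery: "http://transport.discovery.skywire.cc",
		RouteFinder:        "http://routefinder.skywire.cc",
		DmsgDiscovery:      "http://dmsg.discovery.skywire.cc",
		SetupNodePK:        "026c5a07de617c5c488195b76e8671bf9e7ee654d0633933e202af9e111ffa358d",
	}
)

// parseDeployment returns the production deployment unless -testing is set.
func parseDeployment(args []string) (deployment, error) {
	fs := flag.NewFlagSet("gen-config", flag.ContinueOnError)
	useTesting := fs.Bool("testing", false, "point the config to the skywire.cc deployment")
	if err := fs.Parse(args); err != nil {
		return deployment{}, err
	}
	if *useTesting {
		return testingDeployment, nil
	}
	return productionDeployment, nil
}

func main() {
	d, err := parseDeployment(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", d)
}
```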

Go build Skychat dysfunctional on RPi

Describe the bug
After cloning the repo and running

make build; make install

it returns

GO111MODULE=on go build -race -o ./apps/skychat.v1.0 ./cmd/apps/skychat	
go build: -race is only supported on linux/amd64, linux/ppc64le, linux/arm64, freebsd/amd64, netbsd/amd64, darwin/amd64 and windows/amd64
Makefile:90: recipe for target 'host-apps' failed
make: *** [host-apps] Error 2

[M2] Check behavior of route groups

We need to check the behavior of route groups. Specifically, consider two route groups communicating. If the visor hosting one of them goes down and the second one still tries to communicate, what errors will it experience? Will there be errors at all? What should be considered correct behavior in this case?

Improve app configurability from Hypervisor

Feature description

Currently, we can only set the autostart parameter for applications on a visor from the hypervisor and start/stop applications. skyproxy as well as dmsgpty require more control however.

Users should be able to

enable/disable authentication and change auth passcode for skyproxy
set and manage the whitelist for dmsgpty

Add missing options to the hypervisor

Feature description
The manager UI has some options in the testnet that can not be implemented in the mainnet with the current hypervisor API. The options are:

Those options are for setting the address of the discovery service, checking updates for the node/visor and restarting the node/visor. Also, the mainnet UI currently has an option for selecting if the visor should act as an exit node, but there is no API endpoint for making it work.

Is your feature request related to a problem? Please describe.
There are some options in the manager that are not working and cannot be implemented with the current API.

Describe the solution you'd like
API endpoints must be added to make those options work, or the options should be removed from the UI.

Describe alternatives you've considered

Additional context

Possible implementation

[M2] Transport manager tests fail

Describe the bug
Two transport manager tests are failing randomly. Sometimes they pass, sometimes they don't. The failing tests are:

TestNewManager/check_read_write with error:

Which happens here:
TestNewManager/check_tp_logs with error:

Which happens here:

This test also sometimes fails with a different error:

Which happens on these lines:

These two failures are probably connected and caused by a single bug.

The test also sometimes hangs.

Make visor listen for stcp transports automatically

Feature description

Get local IP address and make visor listen for stcp transports on that address and port 7777 by default.

Is your feature request related to a problem? Please describe.
Currently, we have to edit the configuration file manually to set up stcp transports and set the local IP and port to listen on. All of that should be automated.
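One possible way to pick the default listen address is sketched below. It assumes that "local IP" means the first non-loopback IPv4 address reported by the operating system, which may not be the right choice on hosts with several interfaces. The function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// firstLocalIPv4 picks the first non-loopback IPv4 address from addrs.
func firstLocalIPv4(addrs []net.Addr) (net.IP, error) {
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.IsLoopback() {
			continue
		}
		if ip4 := ipNet.IP.To4(); ip4 != nil {
			return ip4, nil
		}
	}
	return nil, errors.New("no non-loopback IPv4 address found")
}

// defaultSTCPAddr returns "<local IP>:7777" for the stcp listener.
func defaultSTCPAddr() (string, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return "", err
	}
	ip, err := firstLocalIPv4(addrs)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(ip.String(), "7777"), nil
}

func main() {
	addr, err := defaultSTCPAddr()
	if err != nil {
		fmt.Println("cannot determine listen address:", err)
		return
	}
	fmt.Println("stcp would listen on", addr)
}
```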

Update Skywire Cli

We should rename skywire-cli node to visor for consistency.

rename skywire-cli node to skywire-cli visor

Milestone 1 Production Testing

We need to test milestone 1 in production. These are the functionalities we need to test for:

run proxy over TCP transport
run SSH over TCP transport
run chat over TCP transport
run hypervisor over production

Error trying to start an app that is already running

Describe the bug
When calling PUT /api/visors/{pk}/apps/{app} with "status": 1 to start the skychat app when it is already running, the console shows Failed to start app skychat: app skychat is already started, but the API does not return anything, so the client waits indefinitely for a response. If the same endpoint is called with "status": 0 while waiting for the response, the visor sometimes stops being accessible through the hypervisor API.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Start some nodes using the make integration-run-generic command of the skywire-services repository
Call http://{localIp}:8080/api/visors/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7/apps/skychat using PUT with { "autostart": true, "status": 1 } as content. You should not get a response from this call.
Still waiting for the response for the request made in step 2, make a similar request again, but changing the content to { "autostart": true, "status": 0 }.
Try repeating steps 3 and 4, so that you will continue trying to start an already started app and then stopping it, until you start getting "error": "connection is shut down" as response. You may not need to repeat the previous steps for this to happen, you may start getting that response in the first try.

Actual behavior
The call made in step 2 never returns a valid response, even though the console shows Failed to start app skychat: app skychat is already started, so a client app is unable to give feedback to the user.

Also, the whole procedure eventually makes the console show Injected [CLOSE]: Closing stream... received=<type:CLOSE><id:… or Rejected [ACK]: Failed to grow remote window. error="local record of remote window has become invalid:…, after which it is not possible to access the visor using the hypervisor API, as it starts returning "error": "connection is shut down" all the time.

In fact, there are other ways to make the visor inaccessible via the hypervisor API after step 2, but the process is a bit erratic, so I do not have a specific series of steps to make it happen.

Expected behavior
The call made in step 2 should return an error, and the procedure should not make the visor inaccessible.
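As an illustration of the expected behavior, the sketch below shows a handler that always writes a response, including when the app is already started. The handler, the error value and the choice of HTTP 409 are assumptions, not the hypervisor's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errAlreadyStarted is a hypothetical sentinel error for this sketch.
var errAlreadyStarted = errors.New("app skychat is already started")

// statusFor maps a start error to an HTTP status code.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errAlreadyStarted):
		return http.StatusConflict
	default:
		return http.StatusInternalServerError
	}
}

// startAppHandler always answers the request, on success and on failure,
// so the client is never left waiting.
func startAppHandler(start func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		err := start()
		if err != nil {
			http.Error(w, err.Error(), statusFor(err))
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// demoStatus runs the handler against an in-memory request and returns the
// status code seen by the client.
func demoStatus(start func() error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPut, "/api/visors/pk/apps/skychat", nil)
	startAppHandler(start)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoStatus(func() error { return errAlreadyStarted }))
}
```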

Additional context

Possible implementation

It is not possible to make operations related to the routes with the API

Describe the bug
The hypervisor API has various endpoints for working with routes, like GET /nodes/{pk}/routes/{rid}. The problem is that there is no way to obtain the value that must be sent as the {rid} param, as the hypervisor API does not have an endpoint for getting the ID of the routes.

Calling the GET /nodes/{pk}/routes endpoint only returns a key property for each route, which worked before when sent as the {rid} param, but not anymore, as the API responds error: "invalid UUID length: 0".

Possible implementation
2 possible solutions are:

Add the ID to the GET /nodes/{pk}/routes endpoint.
Make the API endpoints related to the routes work if the key property returned by the GET /nodes/{pk}/routes endpoint is sent as the {rid} param.

Finishing messaging ==> dmsg renaming

There are still instances where we are referring to messaging rather than dmsg. We should remove/rename those instances.

Multiple errors running the hypervisor and visor on mainnet-milestone2

Describe the bug
Using the mainnet-milestone2 branch, the hypervisor does not run. Also, the visor has problems connecting to the dmsg server.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Run make install using the mainnet-milestone2 branch.
Run hypervisor gen-config -o hypervisor-config.json, to create a default configuration file for the hypervisor.
Run hypervisor. It will fail with Failed to parse rpc port from rpc address: parse :7080: missing protocol scheme.
Open the configuration file created in the second step and change the value of rpc_addr to localhost:7080.
Run hypervisor again. It will fail with Failed to parse rpc port from rpc address: strconv.ParseUint: parsing "": invalid syntax.
Open the configuration file created in the second step again and change the value of rpc_addr to http://skycoin.net:7080.
Run hypervisor again. The console will start displaying no dms_servers found: trying again in 1s... error="Get /dmsg-discovery/available_servers: unsupported protocol scheme """. The problem with the rpc address appears to be solved.
Open the configuration file created in the second step again and change the value of dmsg_discovery to https://messaging.discovery.skywire.skycoin.net.
Run hypervisor again. The console will start displaying no dms_servers found: trying again in 1s... error="Get https://messaging.discovery.skywire.skycoin.net/dmsg-discovery/available_servers: dial tcp: lookup messaging.discovery.skywire.skycoin.net: no such host".
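The rpc_addr errors in the steps above look like what one gets when a bare host:port value is parsed as a URL. That is an assumption about the cause, not something confirmed by the code. A minimal sketch of parsing such an address with net.SplitHostPort, which accepts both :7080 and localhost:7080 (the function name rpcPort is hypothetical):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// rpcPort extracts the port from a host:port address such as ":7080".
func rpcPort(addr string) (uint16, error) {
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return 0, err
	}
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	return uint16(port), nil
}

func main() {
	for _, addr := range []string{":7080", "localhost:7080", "7080"} {
		port, err := rpcPort(addr)
		fmt.Printf("%q -> port=%d err=%v\n", addr, port, err)
	}
}
```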

In a similar way, trying to run a visor with the default config results in no dms_servers found: trying again in 1s... error="json: cannot unmarshal number into Go value of type disc.HTTPMessage" being displayed every second in the console.

Actual behavior
The errors previously described appear. Also, the hypervisor is not connecting with the visors, as it does in the master branch.

Expected behavior
The hypervisor and the visor should work like in the master branch, with the default config. If it is currently necessary to run a local dmsg server for making the visor and hypervisor work, it would be good to add instructions in the readme, and it could be also useful in other cases.

Additional context

Possible implementation

Service Discovery Client

Feature description

https://github.com/SkycoinPro/skywire-service-proxy-discovery

We need a client for this discovery service. This discovery enables Visors to advertise themselves with their geolocation to other visors wanting to connect.

Is your feature request related to a problem? Please describe.
Users should be able to connect to visors with certain properties (geolocation, connection characteristics etc). Most important for now is the location.

Visor apps fail to reconnect Streams

Under milestone2 branches, after a visor is restarted, communication between apps on different visors fails.

Steps to Reproduce

Run generic integration environment
Run make integration-startup
Run curl --data {'"recipient":"'$PK_A'", "message":"Hello Joe!"}' -X POST $CHAT_C
Check that on visor A the message appears and skychat has trashed the "Hello Joe!" message
Stop and restart visor-c
Run step 3) again. Now you should see No stream of given ID messages on node A, which is therefore unable to deliver the message to the skychat app
Additionally, if in step 5) we restart visor-b instead of visor-c we will see that visors a and c continuously try to reconnect with it without success.

Attached are logs collected from the described scenario.
visor-a.txt
visor-b.txt
visor-c.txt

Test stcp with nettest.

Feature description

Write a test for stcp using nettest.TestConn to ensure that stcp.Conn properly satisfies net.Conn.

Document Routes Endpoints

The routes endpoints in the visor are not documented in the Postman collection.

Merge router2 and app2 changes.

router2: #562
app2: nkryuchkov#1

Multiple errors in production environment

Describe the bug
Multiple errors can be observed in production logs.

Environment information:
Production environment

Steps to Reproduce
Check production logs.

Actual behavior
Production services run with errors.

Expected behavior
Production services run with no errors.

Additional context

Error initiating server connections by initiator: findOrConnectToServers: all servers failed

read: connection reset by peer)
[2019-12-26T19:20:23Z] WARN [dms_server]: failed to write frame: write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection error="write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection" srcClient=0242880d65dda463b8a0ca630ab7ebbc98e2bd9fb5172536218ddde2f705827901
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53939: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dms_server]: failed to write frame: write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection error="write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection" srcClient=0242880d65dda463b8a0ca630ab7ebbc98e2bd9fb5172536218ddde2f705827901
[2019-12-26T19:20:23Z] INFO [dms_server]: ClosingConn connCount=3 error="read failed: EOF" srcClient=0242880d65dda463b8a0ca630ab7ebbc98e2bd9fb5172536218ddde2f705827901
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53940: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dms_server]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53939: use of closed network connection" srcClient=0242880d65dda463b8a0ca630ab7ebbc98e2bd9fb5172536218ddde2f705827901
[2019-12-26T19:20:23Z] INFO [dms_server]: connection with client 0242880d65dda463b8a0ca630ab7ebbc98e2bd9fb5172536218ddde2f705827901 closed: error(read failed: EOF)
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53939: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dms_server]: failed to write frame: write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection error="write error: write tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection" srcClient=030903ae94ad689968367b2a6618587688d584886bf660151e7a6d3eb477796604
[2019-12-26T19:20:23Z] INFO [dms_server]: ClosingConn connCount=2 error="read failed: EOF" srcClient=030903ae94ad689968367b2a6618587688d584886bf660151e7a6d3eb477796604
[2019-12-26T19:20:23Z] WARN [dms_server]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53940: use of closed network connection" srcClient=030903ae94ad689968367b2a6618587688d584886bf660151e7a6d3eb477796604
[2019-12-26T19:20:23Z] INFO [dms_server]: connection with client 030903ae94ad689968367b2a6618587688d584886bf660151e7a6d3eb477796604 closed: error(read failed: EOF)
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53940: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53939: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T19:20:23Z] WARN [dmsg]: Failed to close connection error="close tcp 10.244.2.149:8080->192.168.143.73:53912: use of closed network connection"
[2019-12-26T20:05:28Z] INFO [dms_server]: ClosingConn connCount=1 error="read failed: read tcp 10.244.2.149:8080->192.168.143.73:53891: read: connection reset by peer" srcClient=029ecd5bd8ff5c591931bb2d0ffc14cb09eb5c4707712df7e4776fe79f565d7a75
[2019-12-26T20:05:28Z] INFO [dms_server]: connection with client 029ecd5bd8ff5c591931bb2d0ffc14cb09eb5c4707712df7e4776fe79f565d7a75 closed: error(read failed: read tcp 10.244.2.149:8080->192.168.143.73:53891: read: connection reset by peer)

[dmsgC_httpS]: no dmsg_servers found: trying again in 1s... error="something unexpected happened"

Fix hypervisor endpoints

The following endpoints seem to be broken:

PUT /api/nodes/{pk}/apps/{apps}
POST /api/nodes/{pk}/transports
DELETE /api/nodes/{pk}/transports/{tid}

Visors are intermittently disconnecting from the hypervisor

Describe the bug
When using the hypervisor to get info about the visors, it is very common to find errors due to the visors being intermittently disconnected from the hypervisor.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
If you call GET http://{localIp}:8080/api/visors frequently, sometimes the response will have one or more visors with most fields empty.

Also, calling any of the API endpoints for getting info about a specific visor sometimes results in an unexpected EOF or connection is shut down error.

Actual behavior
The API is returning invalid responses when the connection to the visors is lost for a brief period of time.

The problem is frequent enough to be quite annoying in the manager UI.

Expected behavior
The API should return the expected responses.

Additional context

Possible implementation
If implementing a solution in the hypervisor turns out to be seriously complicated, the client could implement something like a “noise cancellation” procedure to detect the cases in which the hypervisor is just having a temporary disconnection. This would sometimes make the UI slower but should work.

If this is going to be done, it would be good to document the need to do so in some location related to the API, including ways to detect the disconnection and the amount of time in which a reconnection could be expected, so anyone using the API is aware of the need to do something similar.
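
A client-side “noise cancellation” procedure like the one described could be sketched as follows. This is only an illustration under stated assumptions: the flapFilter type, its method names, and the threshold of three polls are hypothetical and not part of the hypervisor or manager code.

```go
package main

import "fmt"

// flapFilter reports a visor as disconnected only after it has failed
// `threshold` consecutive polls, hiding brief disconnections from the UI.
type flapFilter struct {
	threshold int
	misses    map[string]int // visor public key -> consecutive failed polls
}

func newFlapFilter(threshold int) *flapFilter {
	return &flapFilter{threshold: threshold, misses: make(map[string]int)}
}

// observe records one poll result for the visor pk and returns whether the
// visor should still be shown as connected.
func (f *flapFilter) observe(pk string, ok bool) bool {
	if ok {
		f.misses[pk] = 0
		return true
	}
	f.misses[pk]++
	return f.misses[pk] < f.threshold
}

func main() {
	f := newFlapFilter(3)
	// Simulated poll results for one visor: a short glitch, then a real outage.
	polls := []bool{true, false, true, false, false, false}
	for i, ok := range polls {
		fmt.Printf("poll %d ok=%v shown as connected=%v\n", i, ok, f.observe("visor-a", ok))
	}
}
```

The cost is the delay the issue anticipates: a real disconnection is only shown after the threshold number of polls has passed.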

Backport master fixes to Milestone2

Describe the bug
Some fixes have been applied to master but not to milestone2. Notably, these include

fixes to the hypervisor made by Evan
changing app config values from slice to map
listening on stcp by default
transport deregistration logic changes
remove therealssh

Invalidate the Hypervisor session after changing the password

Feature description

Describe the feature
If the Hypervisor is ever going to be used remotely by various users, changing the password should act as a way of instantly blocking access by unauthorized users at any time. Invalidating all sessions of the user should be part of that.

Is your feature request related to a problem? Please describe.
After changing the password of an account, unauthorized users would still have access as long as they keep an earlier session open.

Describe the solution you'd like
All sessions of a particular account should be invalidated just after changing its password.

Describe alternatives you've considered

Additional context
This is not critical, as system administrators can delete all sessions by restarting the Hypervisor, but it would be more convenient.

Possible implementation
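
One way to approach the requested behavior is sketched below, assuming an in-memory session store keyed by session ID. The sessionStore type and its methods are hypothetical; they do not describe how the Hypervisor actually stores sessions.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionStore is a hypothetical in-memory store mapping session IDs to users.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[string]string // session ID -> username
}

func newSessionStore() *sessionStore {
	return &sessionStore{sessions: make(map[string]string)}
}

// add registers a new session for user.
func (s *sessionStore) add(id, user string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[id] = user
}

// valid reports whether the session ID is still known.
func (s *sessionStore) valid(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.sessions[id]
	return ok
}

// invalidateUser removes every session of user and returns how many were
// removed. It would be called right after a successful password change.
func (s *sessionStore) invalidateUser(user string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for id, u := range s.sessions {
		if u == user {
			delete(s.sessions, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	store := newSessionStore()
	store.add("s1", "alice")
	store.add("s2", "alice")
	store.add("s3", "bob")
	fmt.Println("removed:", store.invalidateUser("alice"))
	fmt.Println("s1 valid:", store.valid("s1"), "s3 valid:", store.valid("s3"))
}
```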

It is very difficult to delete the transports

Describe the bug
Using the hypervisor API it is very difficult to delete a transport from a visor, because after removing it from one visor, the other visor will create it again shortly after.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Create a transport with the POST /api/visors/{pk}/transports API endpoint.
Delete the transport with the DELETE /api/visors/{pk}/transports/{tid} API endpoint.
Use the GET /api/visors/{pk}/transports API endpoint to check the current transports list.

Actual behavior
If you call the GET /api/visors/{pk}/transports API endpoint just after deleting the transport, most likely the transport will not be listed, but after some time it will be there again, as the other visor will create it again.

Expected behavior
Transports should stay deleted.

Additional context

Possible implementation
Both visors should be informed of the delete operation, to avoid having the transport created again after a short amount of time. Another option is to make the first visor wait for the second one to try to create the transport again and then inform it that the transport should be deleted there too.

As a temporary workaround, when deleting a transport the Manager could try to delete it in both visors if both are connected to the hypervisor, and show a warning to the user if one of the visors is not connected.

Remove therealssh from codebase

Describe the bug
Dmsgpty support was recently added and there does not seem to be a point in keeping therealssh around. It seems difficult enough to maintain to warrant being removed.

Make install does not work on Windows

Describe the bug
Running make install on Windows fails.

Environment information:

OS: Windows 10 (Using MinGW32)
Platform: X64

Steps to Reproduce
Steps to reproduce the behavior:

Run make install on the root of the repo.

Actual behavior
The process exits with the following error:

build github.com/SkycoinProject/skywire-mainnet/cmd/skywire-visor: cannot load github.com/sirupsen/logrus/hooks/syslog: no Go source files

Expected behavior
The process should finish correctly.

Additional context
The only go file inside vendor/github.com/sirupsen/logrus/hooks/syslog starts with // +build !windows,!nacl,!plan9, so it is ignored on Windows.

Possible implementation
Apparently, to be fully multiplatform, the code in github.com/sirupsen/logrus/hooks/syslog must not be used. The main Skycoin repo also uses logrus, but vendor/github.com/sirupsen/logrus/hooks/syslog was not vendored by dep, so I think the code in the main repo avoided using that part, and that is why it works well on Windows.

Setup CLA

We should require signing of a CLA (Contributor License Agreement) for contribution.

Improve naming consistency and documentation

From Discord:

We are currently

using messaging in many instances instead of dmsg.
using node instead of visor (notably in the skywire-cli command and many more instances in the codebase)
using therealproxy where we should probably switch to a more straightforward name of skyproxy
talking in documentation about docker setups that do not exist anymore

Document stcp usage

Currently the stcp transport is working, but it is not documented how to get it running, since it requires manual creation of a pktable.

Document, with an example, how to set up the stcp transport.

The hypervisor returns inaccessible visors

Describe the bug
When calling GET http://{localIp}:8080/api/visors the hypervisor appears to return the list of all the nodes that have been connected to it, even if there is no connection to them anymore. Also, the response does not include any info indicating if the visor is connected to the hypervisor.

Environment information:

OS: Linux (Ubuntu 18.04.1)
Platform: Linux 4.15.0-65-generic x86_64

Steps to Reproduce
Steps to reproduce the behavior:

Start some nodes using the make integration-run-generic command of the skywire-services repository
Call GET http://{localIp}:8080/api/visors. You should get a list with 3 visors.
Call GET http://{localIp}:8080/api/visors/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7. You should get the basic info of the visor.
Go to the console window used in step 1, press Ctrl+b and then 6 to open the tab of the first visor. Then press Ctrl+c to stop the visor.
Call GET http://{localIp}:8080/api/visors again. You should still get a list with 3 visors, but most of the fields of the 024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7 visor will now be empty.
Call GET http://{localIp}:8080/api/visors/024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7 again. You should get "error": "connection is shut down".

Actual behavior
GET http://{localIp}:8080/api/visors returns nodes that are not connected. One way to tell which nodes are not connected is by checking if the node_version and app_protocol_version fields are empty, but the problem is that the API endpoint sometimes returns those fields empty for valid visors. Maybe that is related to #28.

Expected behavior
Only connected visors should be returned. If returning visors that are not connected is a measure to minimize the effects of #28, then a procedure to deal with the problem should be created, because this issue affects the ability to show a list of connected nodes to the user.

Additional context

Possible implementation
Similar to #28:

If implementing a solution in the hypervisor turns out to be seriously complicated, the client could implement something like a “noise cancellation” procedure to detect the cases in which the hypervisor is just having a temporary disconnection. This would sometimes make the UI slower but should work.

If this is going to be done, it would be good to document the need to do so in some location related to the API, including ways to detect the disconnection and the amount of time in which a reconnection could be expected, so anyone implementing the API is aware of the need to do something similar.
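
The empty-field check mentioned under "Actual behavior" could be sketched as follows. The field names node_version and app_protocol_version come from the issue; treating the response as a JSON array of objects with string values is an assumption, and the helper names are hypothetical. As the issue notes, the heuristic can misclassify valid visors, so it would need to be combined with some tolerance for temporary gaps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// visorSummary holds only the two fields the issue mentions. The real
// response contains more fields; they are ignored here.
type visorSummary struct {
	NodeVersion        string `json:"node_version"`
	AppProtocolVersion string `json:"app_protocol_version"`
}

// looksConnected applies the heuristic from the issue: a visor with both
// version fields populated is assumed to be connected.
func looksConnected(v visorSummary) bool {
	return v.NodeVersion != "" && v.AppProtocolVersion != ""
}

// countConnected parses a response body (assumed to be a JSON array) and
// counts how many entries pass the heuristic.
func countConnected(body []byte) (connected, total int, err error) {
	var visors []visorSummary
	if err := json.Unmarshal(body, &visors); err != nil {
		return 0, 0, err
	}
	for _, v := range visors {
		if looksConnected(v) {
			connected++
		}
	}
	return connected, len(visors), nil
}

func main() {
	// Made-up sample data shaped after the assumption above.
	body := []byte(`[
		{"node_version": "0.1", "app_protocol_version": "0.0.1"},
		{"node_version": "", "app_protocol_version": ""}
	]`)
	connected, total, err := countConnected(body)
	fmt.Println(connected, "of", total, "visors look connected; err:", err)
}
```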

[M2] Default route group I/O timeout

Describe the bug
The default I/O timeout for route groups is currently hard-coded as a constant. If we want to fully satisfy the net.Conn interface, we should remove the default timeout, since most client applications expect operations not to time out unless a deadline has been set.
