Comments (37)

antonversal avatar antonversal commented on July 17, 2024

I can confirm, I have the same issue.

from grpc-node.

cuisonghui avatar cuisonghui commented on July 17, 2024

I have the same problem on version 1.8.4.

rhelenagh avatar rhelenagh commented on July 17, 2024

Me too!

zyf0330 avatar zyf0330 commented on July 17, 2024

Please, where is the author?

zyf0330 avatar zyf0330 commented on July 17, 2024

I handle this specific error (code 14) and do this:

oldClient.close()
const newClient = new Service(addr, grpc.credentials.createInsecure())

But the new client cannot connect to the server in some situations, so how can I fully renew a client?
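
A fuller version of this recreate-on-failure pattern might look like the sketch below. It assumes the `grpc` package's client API (`close()`, `waitForReady()`); `Service` and `addr` are the names from the snippet above, and the library is passed in as a parameter only to keep the helper self-contained. This is a sketch of the pattern being discussed, not a confirmed fix for the underlying bug:

```javascript
// Sketch: recreate a client after an UNAVAILABLE (code 14) failure and wait
// for the new channel to connect before using it. `grpcLib` is the required
// grpc module, `Service` a generated stub constructor.
function renewClient(grpcLib, Service, addr, onReady) {
  const client = new Service(addr, grpcLib.credentials.createInsecure());
  // Give the fresh channel up to 5s to become READY; otherwise the first
  // requests on it may fail with code 14 again.
  const deadline = new Date(Date.now() + 5000);
  client.waitForReady(deadline, (err) => onReady(err, client));
  return client;
}

// Pure helper: decide whether an error warrants recreating the client.
// 14 is UNAVAILABLE in the gRPC status code table.
function shouldRenew(err) {
  return Boolean(err) && err.code === 14;
}
```

As the reports in this thread show, recreating the stub did not always help on the affected versions; `waitForReady` at least surfaces the failure instead of letting the first request on the new client fail silently.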

Crevil avatar Crevil commented on July 17, 2024

It seems we are hitting something like this as well. If a server is shut down, clients are not able to reconnect to a new instance. Instead they keep trying to complete requests against the old host and fail with error 14.

Currently our only workaround is to restart the service to get it running against the new server hosts.
Is there anything we can do to assist with this?

zyf0330 avatar zyf0330 commented on July 17, 2024

Thanks for your reply.
What do you mean by "on the new server hosts"?
I just restart the server, and in some situations the client request fails with error 14. In that situation, creating a new Service is not effective: I can confirm I have switched to the new client for requests, but I keep getting error 14 after that.

Crevil avatar Crevil commented on July 17, 2024

We have services running in a cluster where we deploy new versions regularly. When a new version of the server is started and the old one is shut down, clients of the old service get into the above-mentioned state.

zyf0330 avatar zyf0330 commented on July 17, 2024

Yes, your situation is the same as mine. So you don't have a solution either?

Crevil avatar Crevil commented on July 17, 2024

Currently we've reverted to version 1.7.3 and we will investigate some more.

zyf0330 avatar zyf0330 commented on July 17, 2024

Thanks for sharing.

fenos avatar fenos commented on July 17, 2024

This is affecting me as well, especially while I'm developing.

I have nodemon restarting the server on every change.
Once the server comes back up and accepts connections, the client fails with "Connection Failed 14 Failed to read endpoint".

This is very annoying, as now I need to restart the client too.

Is there any workaround for the time being?

zyf0330 avatar zyf0330 commented on July 17, 2024

Does v1.9.0 fix this?

rhelenagh avatar rhelenagh commented on July 17, 2024

Yes, with version 1.9.0 it is fixed! Thanks a lot!

zyf0330 avatar zyf0330 commented on July 17, 2024

@Crevil Can you confirm that v1.9.0 fixes this problem like @rhelenagh said?

Crevil avatar Crevil commented on July 17, 2024

Crevil avatar Crevil commented on July 17, 2024

zyf0330 avatar zyf0330 commented on July 17, 2024

Ok, I believe you. Thanks.

zyf0330 avatar zyf0330 commented on July 17, 2024

@Crevil Hello, after a lot of testing, I found another problem in v1.9.1.
Sometimes when I restart the grpc server, client requests no longer reach the server, and no TCP connection shows up in ss -atpn either. At the same time, I don't see any error that would let me restart the client as before.

Crevil avatar Crevil commented on July 17, 2024

@zyf0330 Actually we experienced something similar just yesterday. A client silently stopped receiving responses from the server, but a restart of the client service fixed it.
I'm currently looking into what happened and how we can detect it.
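
One way to detect a silently dead channel is to poll its connectivity state. This sketch assumes the `grpc` package's `client.getChannel().getConnectivityState()` API; it is a detection aid, not a confirmed remedy for the bug in this thread:

```javascript
// Sketch: poll a client's channel state and report when it looks broken.
// States follow grpc.connectivityState: 0 IDLE, 1 CONNECTING, 2 READY,
// 3 TRANSIENT_FAILURE, 4 SHUTDOWN.
function isBrokenState(state) {
  return state === 3 || state === 4;
}

function watchChannel(client, onBroken, intervalMs) {
  const timer = setInterval(() => {
    // `true` asks the channel to try connecting if it is idle.
    const state = client.getChannel().getConnectivityState(true);
    if (isBrokenState(state)) {
      clearInterval(timer);
      onBroken(state);
    }
  }, intervalMs || 5000);
  return timer;
}
```

The `onBroken` callback is where a restart or client-recreation strategy would hook in.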

zyf0330 avatar zyf0330 commented on July 17, 2024

Thanks a lot. I can also fix it by restarting client.

Crevil avatar Crevil commented on July 17, 2024

The error appeared somewhat differently from the issues mentioned above. Last time we got error code 14 on requests when the server had restarted. This time we got nothing. I suspect an exception was thrown, but we didn't expect this and therefore had no try/catch around the client calls.

I was not able to reproduce the issue against the grpc package. In the initial post you mention you could replicate this. Do you have this setup available anywhere, e.g. on GitHub?

zyf0330 avatar zyf0330 commented on July 17, 2024

I use a client that sends a request to the server every second, and I restart the server to make it happen. But the probability is low, and I have no way to reproduce this problem reliably. Sorry.

Crevil avatar Crevil commented on July 17, 2024

We just made a couple of tests that do not indicate an issue with the grpc layer, although it is still my number one suspect. The test was as follows.

Run an HTTP Node.js server (service A) containing a gRPC client.
On each inbound HTTP request, a gRPC request is sent to a Go gRPC server (service B), and service A then responds to the HTTP request.
We set up siege with 25 concurrent users and no delay between requests. This resulted in around 100 requests/sec.

siege \
  -c 25 \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  https://endpoint.to.service.a.com/handler

Once siege was running, we shut down service B and let it start up on another IP in the cluster.
Service A would hang during the transition when only a single replica of service B was running, but once it was up, service A resumed its requests.

The above was repeated several times with 1-3 replicas of service B.

Every time service A was able to recover from the dropout and continue to handle inbound requests.

murgatroid99 avatar murgatroid99 commented on July 17, 2024

When you shut the servers down, did you do a graceful shutdown or just kill the process? Did you observe a significant time delay between starting up the service B replica and having your requests start to complete again?

Crevil avatar Crevil commented on July 17, 2024

We killed the process. There was no noticeable delay in the startup. We are running a Kubernetes setup, so there is around a 10 second delay from service startup to actually receiving traffic. When running a single replica, service A was able to complete requests again around that time.
With multiple replicas I noticed no delay at all. Traffic shifted nicely to the other replicas, only failing those requests actually in flight on the service B instance when it was killed.

murgatroid99 avatar murgatroid99 commented on July 17, 2024

From your description, gRPC seems to be working as expected. When the existing server goes away, new requests start trying to connect to a new server, and when that server becomes available, those new requests start completing again.

If possible, the best way to handle that kind of situation would be to start up the new replica and wait for it to be ready to accept requests, then do a graceful shutdown of the existing server. If you do that, existing requests will complete, and new requests will go to the new server.

Crevil avatar Crevil commented on July 17, 2024

Yeah, that was my conclusion as well. I tested a non-graceful shutdown on purpose to see how the layers would react. Normally we roll out new deployments as you describe.
The reason we decided to test this again was an issue yesterday where a service stopped responding to traffic, and the only explanation we can come up with is that the grpc request hangs.
Do you know of any edge cases where the Node client could get stuck?

(To mitigate this in the future, we will make sure that requests have deadlines.)
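
Per-call deadlines can be set via the call options accepted by the `grpc` package; a dead server then surfaces as DEADLINE_EXCEEDED (code 4) instead of a hang. A minimal sketch, where `doSomething` stands in for a hypothetical generated stub method:

```javascript
// Sketch: attach a deadline to a unary call so it fails fast instead of
// hanging when the server is gone. `client.doSomething` is hypothetical.
function deadlineIn(ms) {
  return new Date(Date.now() + ms);
}

function callWithDeadline(client, request, timeoutMs, callback) {
  client.doSomething(request, { deadline: deadlineIn(timeoutMs) }, callback);
}
```

On DEADLINE_EXCEEDED the caller can then retry, fall back, or recreate the client as discussed earlier in the thread.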

murgatroid99 avatar murgatroid99 commented on July 17, 2024

If a server dies abruptly, then pending calls against that server will probably not end until either that call's deadline or the TCP timeout, or maybe an HTTP/2 ping timeout. And the client probably also waits for the TCP timeout or the HTTP/2 ping timeout before attempting to reconnect to the new server. That may look like a hang, depending on how long those timeouts are.
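
Those detection timeouts can be tightened with keepalive channel arguments when constructing the client. The option names below are standard gRPC channel args, though defaults and support vary across versions, so treat this as a sketch:

```javascript
// Sketch: channel options that make the client send HTTP/2 keepalive pings,
// so a dead TCP connection is noticed after roughly 15s instead of waiting
// for the OS-level TCP timeout.
function keepaliveChannelOptions() {
  return {
    'grpc.keepalive_time_ms': 10000,          // ping after 10s of inactivity
    'grpc.keepalive_timeout_ms': 5000,        // fail the ping if no ack in 5s
    'grpc.keepalive_permit_without_calls': 1, // ping even with no active calls
  };
}
// Hypothetical usage: new Service(addr, creds, keepaliveChannelOptions())
```

Aggressive keepalive settings can be rejected by servers that enforce ping limits, so the intervals should be tuned against the server's configuration.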

zyf0330 avatar zyf0330 commented on July 17, 2024

@Crevil In my situation, I restart the server by stopping it normally; the signal should be SIGINT.
And about request timeouts: a timeout is of little use if I have only one server, even if the client gets a timeout error.

Crevil avatar Crevil commented on July 17, 2024

Thanks for clarifying, @murgatroid99. I suspect that is the case then. We've been using this package with Protobuf.js for the proto implementations, but the TypeScript typings generated by pbts effectively hide the additional parameters for service methods, e.g. those passed to Client.makeUnaryRequest(). Because of this I had no deadlines set up. (Sure thing, I should have thought about it.)

@zyf0330 Deadlines let you retry the request or produce an alternative result for the clients. That should be useful. But it seems the problem we experienced is not consistent with this one after all. As mentioned, under high load we could not make the error appear.
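
One way around the hidden typings is to inject the deadline in the Protobuf.js rpcImpl rather than per call. This sketch assumes the `grpc` package's Client.makeUnaryRequest(path, serialize, deserialize, argument, metadata, options, callback) signature; deriving the path from the reflected method is an assumption that should be checked against your proto layout:

```javascript
// Sketch: a Protobuf.js rpcImpl that forces a deadline on every unary call,
// so pbts-generated service methods get deadlines without exposing CallOptions.
function rpcImplWithDeadline(client, timeoutMs) {
  return function rpcImpl(method, requestData, callback) {
    // Assumed path shape: "/pkg.Service/Method" from the reflected method.
    const path =
      '/' + method.parent.fullName.replace(/^\./, '') + '/' + method.name;
    client.makeUnaryRequest(
      path,
      (bytes) => bytes, // Protobuf.js already hands us serialized bytes
      (bytes) => bytes, // return raw bytes for Protobuf.js to decode
      requestData,
      null,             // metadata
      { deadline: new Date(Date.now() + timeoutMs) },
      callback
    );
  };
}
```

The rpcImpl is then passed to the pbts-generated service constructor, so every call through the typed interface carries the deadline.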

zyf0330 avatar zyf0330 commented on July 17, 2024

When I can reproduce it, I will help.

nicolasnoble avatar nicolasnoble commented on July 17, 2024

Was there any luck, @zyf0330?

zyf0330 avatar zyf0330 commented on July 17, 2024

Sorry, I haven't worked on this problem recently.

nicolasnoble avatar nicolasnoble commented on July 17, 2024

Alright, I'll close this one for now, as this might well be two separate issues anyway. If you come around and manage to have a reproduction case for us, please open a new issue with the details of the reproduction.

anancds avatar anancds commented on July 17, 2024

I use TensorFlow 1.4 and had the same issue.

nicolasnoble avatar nicolasnoble commented on July 17, 2024

@anancds please don't bump older issues like these. This one is confusing anyway because several different people added unrelated problems on top of it, so this isn't a useful comment. Please open a new issue detailing what's going on using our issue template.
