Coder Social home page Coder Social logo

Comments (18)

jchambers avatar jchambers commented on August 19, 2024

Hm. I'm not really comfortable with the unsafe approach, especially since the Netty docs are pretty clear that those methods are:

Unsafe operations that should never be called from user-code.

I wonder if we should just not try to close the connection when an IO problem happens and couple that with more aggressive connection status checking in the READY state. Could there be IO problems that will leave the connection open, but in a bad state until reconnected? If the APNs gateway doesn't like what it's seeing, it will close the connection remotely. It seems like anything else would be temporary (full pipe) or would result in a closed connection anyhow.

Thoughts?

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

I wonder if we should just not try to close the connection when an IO problem happens and couple that with more aggressive connection status checking in the READY state

Unfortunately, this will still result in a race condition, albeit, a smaller one than what currently exists.

Could there be IO problems that will leave the connection open, but in a bad state until reconnected?

Absolutely

If the APNs gateway doesn't like what it's seeing, it will close the connection remotely. It seems like anything else would be temporary (full pipe) or would result in a closed connection anyhow.

Yes, but an attempted read must occur before the connection is closed. Unfortunately netty is flawed in this respect. Unless there is some other way to force netty to read from the socket on a write exception, I don't see a better solution that doesn't have race conditions.

from pushy.

jchambers avatar jchambers commented on August 19, 2024

But why male models?

I think I'm missing something important either in my understanding of the problem or the "force a read" strategy for solving it. Here's my understanding of the problem:

  1. We send a bad notification to the APNs gateway.
  2. We attempt to write a subsequent notification (good or bad) to the channel, but the write fails for some reason.
  3. We notice the write exception and immediately close the connection/try to reconnect.
  4. The first, bad notification arrives the the APNs gateway.
  5. The APNs gateway writes a six-byte error response and closes the connection on its end.
  6. Our FIN packet arrives at the APNs gateway.
  7. The APNs error response arrives on our end, but the connection is already closed, so we never actually read the error response and notify our RejectedNotificationListeners.

Is that fair?

So, next up, what does it mean to "force a read?" Is that a blocking operation? If not, it seems like we're just making sure we process the bytes we already have in the inbound buffer, but in my version of events above, that wouldn't actually help because the bytes wouldn't have arrived by the time we close the connection locally. If it is a blocking operation, it seems like we'd be in trouble in cases where we have a write failure without a notification rejection (since we'd sit there waiting for an error response that would never come).

Am I missing something or misunderstanding? I'm not trying to be contrary here; I just want to make sure we're on the same page.

Thanks!

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

Couple of corrections

1.) We send a bad notification to the APNs gateway.
2.) We continue writing subsequent notifications (good or bad) to the channel, no error yet

While we are writing notifications the following occurs:
3.) The first, bad notification arrives at the APNS gateway.
4.) The APNs gateway writes a six-byte error response and closes the connection on its end.
5.) For every packet received by the APN after this point, it will send a RST packet back. (it will send lots of these depending on the number of notifications after the bad notification)

6.) We continue to write a subsequent notification (good or bad) to the channel, and the 6 byte packet arrives along with the FIN packet. (netty hasn't read yet).
7.) The RST packet arrives, and netty tries to write a notification out. The write fails with a EPIPE exception (on windows this is slightly different and more buggy. but I'll keep this topic to linux).
8.) the operationComplete listener is fired with a "cause" of IOException.

So at this point in time there are 6 bytes to be read from the socket. Netty has not read them in yet, but they are available if the socket isn't closed.

I know in your listener you called "reconnect", which is another race condition. But for now, let's say you don't call reconnect. After the operationComplete method is finished, Netty will push a task on it's queue to close the socket. This task will run before netty will read from the socket. At that point the 6 bytes are lost.

So, next up, what does it mean to "force a read?" Is that a blocking operation?

Calling the proper method "channel.read()", doesn't actually read anything. It just schedules a read to occur in the next event loop (not to be confused with netty task loop). So when I say "force", I mean read the socket, don't schedule it for some future date. It is not a blocking operation. It reads any bytes that are immediately available to be read (which in this case is 6 bytes). If there are 0 bytes to read, it won't read anything and it won't block.

Am I missing something or misunderstanding? I'm not trying to be contrary here; I just want to make sure we're on the same page.

Thanks!

No worries, it's best to be on the same page. I should point out one last thing that this is entirely a race condition. it's possible to run this scenario and not see the problem every time. I was able to reproduce it every time by breaking after each write and watching the packet's go out and back in (and waiting for that RST packet). Once the RST packet arrived and netty wrote more bytes to the wire, the exception occurred and the 6 bytes are lost.

from pushy.

jchambers avatar jchambers commented on August 19, 2024

Okay; that's a more specific problem than I was imagining. Will dig in and see what I can do. It sounds like you're familiar with some of the upstream discussion around this issue; if so (and if you remember off the top of your head), can you point me to it? If not, please don't worry about it and I'll find it on my own.

Thanks for the detailed explanation!

from pushy.

impactmobile-dev avatar impactmobile-dev commented on August 19, 2024

I'm not aware of any upstream discussion... I'm sure it's out there but I was unable to find any info on this specific problem.

Cheers,

from pushy.

jchambers avatar jchambers commented on August 19, 2024

Cool. Thanks for looking into it.

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

Hm. I'm not really comfortable with the unsafe approach, especially since the Netty docs are pretty clear that those methods are:

Unsafe operations that should never be called from user-code.

One last thought on that comment. It also mentions:

Unsafe operations that should never be called from user-code. These methods are only provided to implement the actual transport, and must be invoked from an I/O thread

The method that ((NioUnsafe)future.channel().unsafe()).read(); is called from is run inside Netty's IO thread. So it would at least satisfy that requirement.

from pushy.

jchambers avatar jchambers commented on August 19, 2024

FYI, am finally digging into the internals of Netty to understand this a little better (not that I doubt your explanation; just trying to figure out if I can offer an upstream fix). Noticed this in the process: netty/netty#1952

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

Good find. What he's describing sounds just like our issue.

from pushy.

jchambers avatar jchambers commented on August 19, 2024

Looks like the upstream issue has been fixed; will close this as soon as Netty 4.0.12 is released and can be added as a dependency (and, of course, we make the necessary changes here).

@andrewscode Thank you for seeing both this and the Netty issue through.

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

No Problem. It's a good fix in general for netty. I'm not sure how it will work out with push notifications, simply because there's no way to wait for a read to occur. For instance, when a write exception occurs, the socket should eventually be closed but after the read. How one will synchronize that remains to be seen. If I call channel.read() then call channel.close(), is that good enough? Is that order guaranteed?

An even easier solution would be for apple to send back acks like a normal protocol. =)

from pushy.

jchambers avatar jchambers commented on August 19, 2024

An even easier solution would be for apple to send back acks like a normal protocol. =)

Seriously. That'd probably cut out about half of the code in Pushy ;)

from pushy.

flozano avatar flozano commented on August 19, 2024

I Hear You

from pushy.

jchambers avatar jchambers commented on August 19, 2024

@andrewscode I think this is fixed in #17. Could you please verify?

from pushy.

jchambers avatar jchambers commented on August 19, 2024

So it sounds like #17 isn't going to solve this on its own. We'd need an additional upstream fix (netty/netty#1985), but that's been closed as WONTFIX for now. If the upstream issue isn't resolved (or at least reopened) by Wednesday, I'm just going to go ahead and use the unsafe approach.

from pushy.

andrewscode avatar andrewscode commented on August 19, 2024

It's unfortunate they won't add a simple way to read from the socket. He's not understanding the problem correctly and is trying to solve it in a way without fully understanding the apn protocol.

Perhaps there's a different way we can ask for a fix that will solve our problems. I may look over their open issues and see if someone is having a similar problem (with a different protocol)

from pushy.

jchambers avatar jchambers commented on August 19, 2024

Agreed.

In the meantime, I've made #17 use the NioUnsafe approach (see d2b15c0). That should solve the issue for now; as soon as a better solution comes along, we'll switch to that.

from pushy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.