
Comments (5)

abyrd commented on July 30, 2024

On the backend, the only log entry I see is 09:43:34.218 [qtp1413184129-17] ERROR c.c.t.c.SinglePointAnalysisController - analysis error, which is also not very helpful in understanding what has gone wrong.

I can describe what I did immediately before the problem arose: I was doing analysis on all of Switzerland with walk+transit modes. I then switched to car mode. The worker re-linked the whole pointset for car mode. Now searches with any mode including walk+transit fail in this way.

The worker does not show any errors or any sign of failure. After a while the whole system begins working again, and in the worker log we see:

Jul 17, 2017 9:49:37 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {}->http://10.0.0.120:9001: Connection timed out (Write failed)

[pool-3-thread-1] WARN com.conveyal.r5.analyst.cluster.AnalystWorker - Failed to mark task 296426 as completed.
org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:107)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
... 7 more
Caused by: java.net.SocketException: Connection timed out (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
at org.apache.http.impl.io.S

[pool-1-thread-4] ERROR com.conveyal.r5.analyst.cluster.AnalystWorker - Error writing travel time surface to broker
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at com.conveyal.r5.analyst.TravelTimeSurfaceComputer.write(TravelTimeSurfaceComputer.java:161)
at com.conveyal.r5.analyst.cluster.AnalystWorker.handleTravelTimeSurfaceRequest(AnalystWorker.java:534)
at com.conveyal.r5.analyst.cluster.AnalystWorker.handleOneRequest(AnalystWorker.java:414)
at com.conveyal.r5.analyst.cluster.AnalystWorker.lambda$null$6(AnalystWorker.java:671)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

I wonder if the single-point side channel was somehow blocked by a previous timeout. Could be related to R5 issue #303.


mattwigway commented on July 30, 2024

I think I know what is going on here. First, the frontend expects all error responses to be JSON TaskErrors, so we should change the error returned by the backend. The root cause of the failure, though, is probably that Java PipedStreams don't work the way pipes do in any other language; I suspect the writing thread is dying before the response has been written to the broker, so the broker never sends it on to the frontend.
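
To illustrate the quirk (a minimal standalone sketch, not Conveyal code): once the thread reading from a PipedInputStream closes its end and exits, the next write on the connected PipedOutputStream fails with java.io.IOException: Pipe closed, which is exactly what the GZIP travel-time-surface writer hit in the log above.

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Minimal sketch, not Conveyal code: the reader consumes one byte, closes its
// end of the pipe and exits, mimicking an upload to the broker that is aborted
// mid-stream. The next write on the connected PipedOutputStream then fails.
public class PipeClosedDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        Thread reader = new Thread(() -> {
            try {
                in.read();   // consume one byte
                in.close();  // give up and close the reading end
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        reader.start();

        out.write(1);   // succeeds: the reader is still alive and consuming
        reader.join();  // the reader has now closed the pipe and died

        try {
            out.write(2); // throws java.io.IOException: Pipe closed
        } catch (IOException e) {
            System.out.println("Writer sees: " + e);
        }
    }
}

Unlike OS pipes, the JDK piped streams also track the identity of the reader and writer threads, so a thread on either end dying (even without an explicit close()) makes the other end's operations fail, which seems to be the quirk referred to above.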


abyrd commented on July 30, 2024

This is still happening, specifically when working with the Netherlands network. The client locks up on a white screen and shows the message "Fetch Error Unexpected token C in JSON at position 0". This happens because the client expects a JSON error response, but the backend is returning the plain text "Could not contact broker" (with content-type JSON).

The dearth of information is due to SinglePointAnalysisController line 84. It's swallowing the exception and returning a meaningless string via Spark's halt() method. The log message contains no detail: 08:02:41.590 [qtp802836797-319] ERROR c.c.t.c.SinglePointAnalysisController - analysis error.

Then, after a long time, the errors ceased spontaneously. This seems to happen on large graphs that take so long to start up that the first single-point request fails with a timeout; perhaps some thread is left in a bad state by that initial failure (as speculated above).

The first thing to do is to add more logging on the back end and return meaningful errors, potentially including a stack trace, up to the client so we can at least see what's failing. We should do this in every place an exception is caught.
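
As a rough sketch of that direction (hypothetical code, assuming Spark's exception() hook and Jackson for serialization, not the actual analysis-backend implementation): log the full stack trace and return a structured JSON body the frontend can parse, instead of calling halt() with a bare string.

import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import static spark.Spark.exception;

// Hypothetical sketch: a global Spark exception handler that logs the stack
// trace and returns a structured JSON error instead of a meaningless string.
public class JsonErrorHandler {
    private static final Logger LOG = LoggerFactory.getLogger(JsonErrorHandler.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void register() {
        exception(Exception.class, (e, request, response) -> {
            LOG.error("analysis error", e); // keep the full stack trace in the server log

            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace));

            Map<String, Object> body = new HashMap<>();
            body.put("message", e.getMessage());
            body.put("stackTrace", trace.toString());

            response.status(500);
            response.type("application/json");
            try {
                response.body(MAPPER.writeValueAsString(body));
            } catch (Exception jsonError) {
                // last resort: still send valid JSON so the client parser does not choke
                response.body("{\"message\":\"internal error\"}");
            }
        });
    }
}

The field names here are only illustrative; whatever shape the frontend's TaskError expects is what every error path should produce, together with a server-side log of the underlying exception.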


abyrd commented on July 30, 2024

The errors are network timeouts. Strangely, the server is also logging this message: 10:08:11.245 [qtp802836797-16] INFO spark.webserver.MatcherFilter - The requested route [/api/analysis/enqueue/single] has not been mapped in Spark. The system never seems to work again until you restart the broker (after the worker is up and running), which does remedy the situation.


trevorgerhardt commented on July 30, 2024

Lots of changes to error handling since this issue was last updated; it should be much clearer now!

