This is probably the cause of the error in `testGettingManyObjects` in `stress_tests.py` in this log: https://travis-ci.org/ray-project/ray/jobs/200210188.
testGettingManyObjects (__main__.TaskTests) ... Waiting for redis server at 127.0.0.1:38081 to respond...
10098:M 10 Feb 07:14:23.632 # Server started, Redis version 3.9.102
Failed to connect to the redis server, retrying.
Waiting for redis server at 127.0.0.1:38081 to respond...
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:870) Allowing the Plasma store to use up to 3.44GB of memory.
[INFO] (/Users/travis/build/ray-project/ray/src/photon/photon_scheduler.c:799) Start worker command is python /Users/travis/.local/lib/python3.5/site-packages/ray-0.0.1-py3.5.egg/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --object-store-name=/tmp/plasma_store79902205 --object-store-manager-name=/tmp/plasma_manager74482700 --local-scheduler-name=/tmp/scheduler28728453 --redis-address=127.0.0.1:38081
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0
Unable to register worker with local scheduler
0 libphoton.so 0x0000000102e0f1ff photon_connect + 287
1 libphoton.so 0x0000000102e0c24e PyPhotonClient_init + 78
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0
Unable to register worker with local scheduler
0 libphoton.so 0x0000000102e101ff photon_connect + 287
1 libphoton.so 0x0000000102e0d24e PyPhotonClient_init + 78
2 libpython3.5m.dylib 0x0000000100062329 type_call + 281
3 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
4 libpython3.5m.dylib 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
7 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
10 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
11 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
13 python 0x0000000100000dc7 main + 215
14 python 0x0000000100000ce4 start + 52
15 ??? 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 16
2 libpython3.5m.dylib 0x0000000100062329 type_call + 281
3 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
4 libpython3.5m.dylib 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
7 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
10 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
11 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
13 python 0x0000000100000dc7 main + 215
14 python 0x0000000100000ce4 start + 52
15 ??? 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 9
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 10
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 16
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 20
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 21
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 19
10098:signal-handler (1486710883) Received SIGTERM scheduling shutdown...
10098:M 10 Feb 07:14:43.662 # User requested shutdown...
10098:M 10 Feb 07:14:43.662 # Redis is now ready to exit, bye bye...
ok
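The failure has a clear textual signature in the log above: a refused connection to the local scheduler socket followed by `Unable to register worker with local scheduler`. A test could fail on that signature. The sketch below uses a hypothetical helper, `scan_worker_failures`, which is not part of Ray. It assumes the worker output is available as a string and that these message texts stay exactly as they appear in this log.

```python
import re

# Message texts copied from the log above; the helper itself is hypothetical.
REGISTER_FAILURE = "Unable to register worker with local scheduler"
SOCKET_FAILURE = re.compile(r"Connection to socket failed for pathname (\S+?)\.?\s*$")


def scan_worker_failures(log_text):
    """Return (number of failed worker registrations, refused socket paths)."""
    failures = 0
    paths = []
    for line in log_text.splitlines():
        if REGISTER_FAILURE in line:
            failures += 1
        match = SOCKET_FAILURE.search(line)
        if match:
            paths.append(match.group(1))
    return failures, paths


# Four lines taken verbatim from the log above.
sample = "\n".join([
    "[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) "
    "Connection to socket failed for pathname /tmp/scheduler28728453.",
    "Unable to register worker with local scheduler",
    "[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) "
    "Connection to socket failed for pathname /tmp/scheduler28728453.",
    "Unable to register worker with local scheduler",
])

print(scan_worker_failures(sample))
# (2, ['/tmp/scheduler28728453', '/tmp/scheduler28728453'])
```

Matching on log text is brittle because any change to the message wording breaks it. It is shown here only because the log is the one place where this failure is currently visible.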
The test still passes even though one of the workers failed to connect. Also, when we run `self.assertTrue(ray.services.all_processes_alive())`, it doesn't catch the dead worker, because workers are now started by the local scheduler and not by `services.py`.
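The blind spot can be reproduced outside Ray with a minimal standard-library sketch. Its assumptions are as follows. `all_tracked_processes_alive` is a hypothetical stand-in for any check that, like a registry kept by the launcher, can only poll processes the test itself started. It is not Ray's implementation of `ray.services.all_processes_alive()`. The "scheduler" script is also a hypothetical stand-in for the local scheduler: it spawns a worker that exits with an error immediately.

```python
import subprocess
import sys

# Hypothetical stand-in for the local scheduler: it spawns a "worker" that
# exits with status 1 right away (like a worker that cannot register),
# reports the worker's exit code on stdout, then keeps running itself.
SCHEDULER_SCRIPT = """
import subprocess, sys, time
worker = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
print(worker.wait(), flush=True)
time.sleep(60)
"""


def all_tracked_processes_alive(procs):
    # Registry-style check: it can only see processes it launched directly.
    return all(p.poll() is None for p in procs)


scheduler = subprocess.Popen(
    [sys.executable, "-c", SCHEDULER_SCRIPT], stdout=subprocess.PIPE, text=True
)
tracked = [scheduler]  # the worker was started by the scheduler, so it is not here

worker_exit_code = int(scheduler.stdout.readline())  # blocks until the worker has exited
alive = all_tracked_processes_alive(tracked)
print("worker exit code:", worker_exit_code)  # 1: the worker is dead
print("all tracked alive:", alive)  # True: the check cannot see it

# Clean up the long-running stand-in scheduler.
scheduler.kill()
scheduler.wait()
scheduler.stdout.close()
```

Catching this in the test would require some channel through which the process that actually spawns the workers reports their status. Polling from the test process cannot see grandchildren.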