
Comments (10)

YYDS-cc commented on June 21, 2024

Additionally, when I run
python main.py object_nav_ithor_ppo_one_object -b projects/tutorials -s 12345
the monitor goes black momentarily. I know this happens so that the search window can open, but once the monitor comes back, the terminal output stops updating.
I have also run
sudo python scripts/startx.py &
but it doesn't do anything.

from allenact.

jordis-ai2 commented on June 21, 2024

Hi @YYDS-cc,

Given your setup, I think it would be worth it to try using THOR in headless mode. For that, you need to pass a gpu_device instead of an x_display (using the CloudRendering platform). You can see an example here:

Let us know if this unblocks you!
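
To illustrate the gpu_device versus x_display choice mentioned above, here is a minimal sketch (it is not the linked example). The helper name thor_render_kwargs is hypothetical; the only things assumed from ai2thor are the Controller keyword arguments platform, gpu_device and x_display and the CloudRendering platform class.

```python
from typing import Any, Dict, Optional


def thor_render_kwargs(
    headless: bool,
    gpu_device: Optional[int] = 0,
    x_display: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: choose the rendering-related Controller kwargs."""
    if headless:
        # Imported lazily so the non-headless path works without ai2thor installed.
        from ai2thor.platform import CloudRendering

        # Headless: render on a GPU via CloudRendering, no X server involved.
        return {"platform": CloudRendering, "gpu_device": gpu_device}
    # Non-headless: render through an existing X display.
    return {"x_display": x_display}
```

The resulting dictionary would be passed along as Controller(**thor_render_kwargs(headless=True)). Where exactly an AllenAct experiment config sets these values depends on the config in use.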


YYDS-cc commented on June 21, 2024

Hi @jordis-ai2,
I tried changing headless to True, but it doesn't work.

I also tried commenting out this code, and it still doesn't work.

Did I change the code in the wrong place?


jordis-ai2 commented on June 21, 2024

I think I need to see the output you get when using headless mode. Can you copy it here?


YYDS-cc commented on June 21, 2024

[09/01 17:24:13 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, ... ,[main.py: 452]
[09/01 17:24:18 INFO:] Git diff saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 890]
[09/01 17:24:18 INFO:] Config files saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 935]
[09/01 17:24:18 INFO:] Using 1 train workers on devices (device(type='cuda', index=0),) [runner.py: 317]
[09/01 17:24:19 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:19 INFO:] Using local worker ids [0] (total 1 workers in machine 0) [runner.py: 326]
[09/01 17:24:19 INFO:] Started 1 train processes [runner.py: 595]
[09/01 17:24:19 INFO:] Using 1 valid workers on devices (device(type='cuda', index=1),) [runner.py: 317]
[09/01 17:24:19 INFO:] Started 1 valid processes [runner.py: 622]
[09/01 17:24:21 INFO:] valid 0 args [...][runner.py: 433]
[09/01 17:24:21 INFO:] train 0 args [...] [runner.py: 416]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:29 INFO:] Starting 0-th VectorSampledTask worker with args [...]
[09/01 17:24:31 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:24:31 INFO:] Starting 1-th VectorSampledTask worker with args [...]
[09/01 17:24:33 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:29:33 ERROR:] [train worker 0 ] Encountered TimeoutError , exiting. [engine.py: 1858]
File "/allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 272, in read_with_timeout
raise TimeoutError(
TimeoutError: Did not receive output from 'VectorSampledTask' worker for 300 seconds.
[engine.py: 1861]
[09/01 17:29:34 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
[09/01 17:29:34 ERROR:] Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[runner.py: 1468]
Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[09/01 17:29:34 INFO:] Terminating train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Terminating valid 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Train-0. Worker Train-0 is already closed, exiting. [runner.py: 348]
[09/01 17:29:34 INFO:] Joining train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Valid-0. Forcing worker Valid-0 to close and exiting. [runner.py: 353]
[09/01 17:29:35 INFO:] Closed train 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Joining valid 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Closed valid 0 [runner.py: 1543]
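
The TimeoutError in the log above means the parent process waited on a worker connection and nothing arrived within the limit, which is consistent with the THOR process never finishing startup. The pattern can be sketched as below; this is an illustration built on multiprocessing pipes under that assumption, not AllenAct's actual read_with_timeout.

```python
import multiprocessing as mp
from multiprocessing.connection import Connection
from typing import Any, Optional


def read_with_timeout(
    conn: Connection, timeout: Optional[float], worker_name: str = "worker"
) -> Any:
    """Illustrative only: receive one message or raise if none arrives in time."""
    # poll() blocks for at most `timeout` seconds and returns False if no data came.
    if timeout is not None and not conn.poll(timeout):
        raise TimeoutError(
            f"Did not receive output from {worker_name!r} for {timeout} seconds."
        )
    return conn.recv()


if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    child_conn.send("ready")  # stands in for a worker that started successfully
    print(read_with_timeout(parent_conn, timeout=1.0))
```

If the worker never sends anything, raising the timeout only delays the same error.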


jordis-ai2 commented on June 21, 2024

If you do export ALLENACT_DEBUG_VST_TIMEOUT=1000 before calling the command you are currently using to start your experiment, does it also fail (just after a longer period of waiting)?
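
When trying this, it may be worth confirming that the exported variable is actually visible to the Python process (it may not be if the command runs in a different shell or through sudo). The sketch below is a generic way to read a timeout from the environment; the function name timeout_from_env and the 300-second default are illustrative assumptions, not AllenAct code.

```python
import os
from typing import Optional


def timeout_from_env(
    name: str = "ALLENACT_DEBUG_VST_TIMEOUT",
    default: Optional[float] = 300.0,
) -> Optional[float]:
    """Hypothetical helper: return the timeout set in the environment, else the default."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        # Variable not exported (or empty) in this process's environment.
        return default
    return float(raw)
```

Printing timeout_from_env() from the same shell that launches the experiment shows whether the export took effect there.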


YYDS-cc commented on June 21, 2024

Changing the waiting time doesn't help.
In fact, export ALLENACT_DEBUG_VST_TIMEOUT=1000 doesn't change the waiting time; it is still 300 seconds.
So I made the change directly in

for space in read_fn(timeout_to_use=5 * self.read_timeout if self.read_timeout is not None else None) # type: ignore

and I still get the same error; only the waiting time has changed.


jordis-ai2 commented on June 21, 2024

I assume at this point you must have already tried starting a standalone THOR controller to ensure everything is correctly installed, but just in case you haven't, can you try to run a script like:

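from ai2thor.platform import CloudRendering
from ai2thor.controller import Controller
import cv2

# Start a headless THOR controller that renders on GPU 0 (no X display needed).
c = Controller(platform=CloudRendering, gpu_device=0)
# The frame is RGB; reverse the channel axis to get the BGR order cv2.imwrite expects.
cv2.imwrite("/path/to/debug_output_image.png", c.last_event.frame[:, :, ::-1])
c.stop()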

?


YYDS-cc commented on June 21, 2024

The new code installs the thor-CloudRendering platform, and then a new issue comes up. I ran into the same issue before when running the PointNav task with the command PYTHONPATH=. python allenact/main.py training_a_pointnav_model -o storage/robothor-pointnav-rgb-resnet-resnet -b projects/tutorials .

issue: RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).

But when I run sudo apt install vulkan-tools,
the server can't locate the package vulkan-tools.
Running sudo apt-get update first doesn't help either.

I installed the same environment on my PC following the tutorial (Ubuntu 18.04), and there both the PointNav and ObjectNav tasks run without problems.
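
Since the RuntimeError above is about vulkaninfo failing to run, a quick pre-check on each machine can show whether the tool is on the PATH and exits cleanly before any THOR code is involved. This is a minimal sketch using only the Python standard library; the function name vulkaninfo_works is hypothetical, and treating a zero exit code as "working" is an assumption.

```python
import shutil
import subprocess


def vulkaninfo_works(timeout_s: float = 30.0) -> bool:
    """Hypothetical check: True if vulkaninfo is on PATH and exits with code 0."""
    exe = shutil.which("vulkaninfo")
    if exe is None:
        # Not installed, or not on PATH for this user.
        return False
    try:
        result = subprocess.run(
            [exe],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Comparing the result on the server with the result on the working PC would help confirm that the missing package is the only difference.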


jordis-ai2 commented on June 21, 2024

https://packages.ubuntu.com/search?keywords=vulkan-tools has a list of packages for different Ubuntu versions. It's possible that third parties provide vulkan-tools for other/older versions.

It sounds like this is out-of-scope for AllenAct, so I'm closing the issue.

