
Only one valid platform is required to run AI2-THOR (allenact issue, closed)

YYDS-cc commented on June 21, 2024

Only one valid platform is required to run AI2-THOR


Comments (10)

YYDS-cc commented on June 21, 2024

Additionally, when I run
python main.py object_nav_ithor_ppo_one_object -b projects/tutorials -s 12345
the monitor goes black for a moment. I understand this happens because a window is being opened, but after the monitor comes back, the terminal output stops updating.
I have also run
sudo python scripts/startx.py &
but it doesn't seem to do anything.


jordis-ai2 commented on June 21, 2024

Hi @YYDS-cc,

Given your setup, I think it would be worth it to try using THOR in headless mode. For that, you need to pass a gpu_device instead of an x_display (using the CloudRendering platform). You can see an example here:

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py, line 255 (commit 9772eee):

    device_dict = dict(

Let us know if this unblocked you!
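The difference between the two launch modes can be sketched as a small helper that builds the device-related keyword arguments for ai2thor's Controller. This is a hypothetical illustration, not the code at line 255: it assumes one X screen per GPU named "0.<gpu_id>" when an X server is used, and the CloudRendering platform plus gpu_device when running headless.

```python
from typing import Any, Dict

try:
    from ai2thor.platform import CloudRendering
except ImportError:
    # Lets the sketch run without ai2thor installed; real headless launches need it.
    CloudRendering = None


def thor_device_kwargs(gpu_id: int, headless: bool) -> Dict[str, Any]:
    """Hypothetical helper: device kwargs to pass to ai2thor's Controller."""
    if headless:
        # No X server: CloudRendering renders offscreen (via Vulkan) on this GPU.
        return {"platform": CloudRendering, "gpu_device": gpu_id}
    # X server available: select the X screen attached to this GPU, e.g. "0.1".
    return {"x_display": f"0.{gpu_id}"}
```

With ai2thor installed, this would be used as Controller(**thor_device_kwargs(0, headless=True)). The key point is that the two modes are mutually exclusive: a headless launch passes gpu_device and no x_display.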


YYDS-cc commented on June 21, 2024

Hi @jordis-ai2,
I tried changing headless to True, but it doesn't work:

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py, line 75 (commit 9772eee):

    headless: bool = False,

I also tried commenting out this code, but it still doesn't work:

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py, line 236 (commit 9772eee):

    if not self.headless:

Did I change the code in the wrong place?


jordis-ai2 commented on June 21, 2024

I think I need to see the output you get when using headless mode. Can you paste it here?


YYDS-cc commented on June 21, 2024

[09/01 17:24:13 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, ... ,[main.py: 452]
[09/01 17:24:18 INFO:] Git diff saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 890]
[09/01 17:24:18 INFO:] Config files saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 935]
[09/01 17:24:18 INFO:] Using 1 train workers on devices (device(type='cuda', index=0),) [runner.py: 317]
[09/01 17:24:19 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:19 INFO:] Using local worker ids [0] (total 1 workers in machine 0) [runner.py: 326]
[09/01 17:24:19 INFO:] Started 1 train processes [runner.py: 595]
[09/01 17:24:19 INFO:] Using 1 valid workers on devices (device(type='cuda', index=1),) [runner.py: 317]
[09/01 17:24:19 INFO:] Started 1 valid processes [runner.py: 622]
[09/01 17:24:21 INFO:] valid 0 args [...][runner.py: 433]
[09/01 17:24:21 INFO:] train 0 args [...] [runner.py: 416]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:29 INFO:] Starting 0-th VectorSampledTask worker with args [...]
[09/01 17:24:31 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:24:31 INFO:] Starting 1-th VectorSampledTask worker with args [...]
[09/01 17:24:33 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:29:33 ERROR:] [train worker 0 ] Encountered TimeoutError , exiting. [engine.py: 1858]
File "/allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 272, in read_with_timeout
raise TimeoutError(
TimeoutError: Did not receive output from 'VectorSampledTask' worker for 300 seconds.
[engine.py: 1861]
[09/01 17:29:34 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
[09/01 17:29:34 ERROR:] Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[runner.py: 1468]
Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[09/01 17:29:34 INFO:] Terminating train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Terminating valid 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Train-0. Worker Train-0 is already closed, exiting. [runner.py: 348]
[09/01 17:29:34 INFO:] Joining train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Valid-0. Forcing worker Valid-0 to close and exiting. [runner.py: 353]
[09/01 17:29:35 INFO:] Closed train 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Joining valid 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Closed valid 0 [runner.py: 1543]
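The TimeoutError above is raised while the main process waits for output from a VectorSampledTask worker that never arrives; one plausible cause is that the THOR process inside the worker never finishes starting. A minimal sketch of that waiting pattern, assuming a multiprocessing Pipe and Connection.poll; the function below is a hypothetical illustration, not allenact's read_with_timeout.

```python
import multiprocessing as mp
from multiprocessing.connection import Connection
from typing import Any


def read_with_timeout(conn: Connection, timeout: float) -> Any:
    """Hypothetical helper: receive one message or fail after `timeout` seconds."""
    # poll() blocks until data is available or the timeout expires.
    if not conn.poll(timeout):
        raise TimeoutError(
            f"Did not receive output from worker for {timeout} seconds."
        )
    return conn.recv()


if __name__ == "__main__":
    parent, child = mp.Pipe()
    child.send("ready")
    print(read_with_timeout(parent, timeout=1.0))  # a reply arrived in time
    try:
        read_with_timeout(parent, timeout=0.5)  # nothing else was sent
    except TimeoutError as err:
        print("timed out:", err)
```

Raising the timeout only helps when the worker is slow; if it is stuck (for example, waiting on a renderer that never comes up), a longer wait just delays the same error.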


jordis-ai2 commented on June 21, 2024

If you do export ALLENACT_DEBUG_VST_TIMEOUT=1000 before calling the command you are currently using to start your experiment, does it also fail (just after a longer period of waiting)?
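When an environment variable seems to be ignored, it is worth ruling out that it never reaches the process that reads it: export only affects the current shell and the processes it starts afterwards. The sketch below, using only the standard library, starts a child Python interpreter with ALLENACT_DEBUG_VST_TIMEOUT set and returns what the child sees. The helper child_sees is hypothetical, and the sketch does not show how allenact itself reads the variable.

```python
import os
import subprocess
import sys


def child_sees(var: str, value: str) -> str:
    """Start a child interpreter with `var` set and return the value it reads."""
    env = dict(os.environ, **{var: value})  # copy of our environment plus the override
    out = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    print(child_sees("ALLENACT_DEBUG_VST_TIMEOUT", "1000"))
```

If the child sees the value but the timeout is unchanged, the variable is being read differently (or not at all) by the installed allenact version rather than being lost by the shell.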


YYDS-cc commented on June 21, 2024

Changing the waiting time doesn't help.
In fact, export ALLENACT_DEBUG_VST_TIMEOUT=1000 doesn't change the waiting time at all; it is still 300 seconds.
So I made the change directly in

allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py, line 237 (commit 9772eee):

    for space in read_fn(timeout_to_use=5 * self.read_timeout if self.read_timeout is not None else None)  # type: ignore

and I still get the same error; only the waiting time has changed.


jordis-ai2 commented on June 21, 2024

I assume at this point you must have already tried starting a standalone THOR controller to ensure everything is correctly installed, but just in case you haven't, can you try to run a script like:

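from ai2thor.controller import Controller
from ai2thor.platform import CloudRendering
import cv2

# Launch THOR headlessly on GPU 0 via CloudRendering (no X server needed).
c = Controller(platform=CloudRendering, gpu_device=0)
try:
    # last_event.frame is an RGB array; cv2.imwrite expects BGR, so flip the channels.
    cv2.imwrite("/path/to/debug_output_image.png", c.last_event.frame[:, :, ::-1])
finally:
    c.stop()  # always shut down the Unity process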


YYDS-cc commented on June 21, 2024

Running the new script installs the THOR CloudRendering build, and then a new issue appears. I had already hit this issue before, when running the PointNav task with the command PYTHONPATH=. python allenact/main.py training_a_pointnav_model -o storage/robothor-pointnav-rgb-resnet-resnet -b projects/tutorials:

RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).

But when I run sudo apt install vulkan-tools,
the server can't locate the package vulkan-tools.
Running sudo apt-get update first doesn't help either.

I set up the same environment on my PC following the tutorial (Ubuntu 18.04), and both the PointNav and ObjectNav tasks run without problems there.
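Before launching with CloudRendering, it can help to check whether vulkaninfo is installed and runs at all, since that is what the error above complains about. A minimal sketch using only the standard library; the helper name vulkaninfo_ok is hypothetical, and it runs vulkaninfo without arguments because option names vary between vulkan-tools versions.

```python
import shutil
import subprocess


def vulkaninfo_ok(timeout: float = 30.0) -> bool:
    """Hypothetical check: is vulkaninfo on PATH and does it exit cleanly?"""
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return False  # not installed (e.g. the vulkan-tools package is missing)
    try:
        # Capture (and discard) the verbose output; only the exit code matters here.
        result = subprocess.run([exe], capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("vulkaninfo usable:", vulkaninfo_ok())
```

A False result on the server but True on the PC would match the difference described above.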


jordis-ai2 commented on June 21, 2024

https://packages.ubuntu.com/search?keywords=vulkan-tools has a list of packages for different Ubuntu versions. It's possible that third parties provide vulkan-tools for other/older versions.
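To pick the right entry on that page you need the server's Ubuntu release. A small sketch that reads it from /etc/os-release, which uses KEY=value lines (Ubuntu sets VERSION_ID and VERSION_CODENAME there); the function name parse_os_release is hypothetical.

```python
import os
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    if os.path.exists("/etc/os-release"):
        with open("/etc/os-release") as f:
            info = parse_os_release(f.read())
        print(info.get("VERSION_ID"), info.get("VERSION_CODENAME"))
```

On Python 3.10 and later, platform.freedesktop_os_release() returns the same kind of mapping without a hand-written parser.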

It sounds like this is out-of-scope for AllenAct, so I'm closing the issue.
