Comments (10)
Additionally, when I run
python main.py object_nav_ithor_ppo_one_object -b projects/tutorials -s 12345
the monitor goes black momentarily. I know this is because the search window is opening, but once the monitor is back, the terminal output stops updating.
I have also run
sudo python scripts/startx.py &
but it doesn't do anything.
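One way to tell whether an X server actually came up after running startx.py is to check for its local socket. The sketch below is a hypothetical diagnostic, not part of AllenAct. It assumes the usual Xorg convention that display :N listens on /tmp/.X11-unix/XN. It accepts both the X11 DISPLAY form (":0.0") and the bare form ("0.0") used by ai2thor's x_display parameter. It cannot detect displays on remote hosts reached over TCP.

```python
import os
from pathlib import Path


def display_number(display: str) -> int:
    """Extract N from an X display string such as ':0.0', '0.0', or 'host:1'."""
    # Drop an optional "host:" prefix, then an optional ".screen" suffix.
    tail = display.rsplit(":", 1)[-1]
    number = tail.split(".", 1)[0]
    if not number.isdigit():
        raise ValueError(f"cannot parse X display {display!r}")
    return int(number)


def x_socket_exists(display: str) -> bool:
    """Hypothetical check: True if the local Unix socket for this display exists."""
    return Path(f"/tmp/.X11-unix/X{display_number(display)}").exists()


if __name__ == "__main__":
    display = os.environ.get("DISPLAY", ":0")
    print(f"DISPLAY={display!r}, socket present: {x_socket_exists(display)}")
```

If the socket is missing right after startx.py returns, the X server most likely never started, and THOR in X11 mode will have nothing to render to.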
from allenact.
Hi @YDDS-cc,
Given your setup, I think it would be worth trying THOR in headless mode. For that, you need to pass a gpu_device instead of an x_display (using the CloudRendering platform). You can see an example here:
Let us know if this unblocked you!
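The arguments THOR receives determine whether it needs an X server at all. As a rough sketch of the two modes described above, the hypothetical helper below builds keyword arguments for ai2thor's Controller. In headless mode it returns the CloudRendering platform plus a gpu_device index; otherwise it returns an x_display such as "0.0". This is not AllenAct's config code. In an AllenAct experiment these values come from the experiment config, whose exact field names are not reproduced here.

```python
from typing import Any, Dict, Optional


def thor_controller_kwargs(
    headless: bool,
    gpu_device: Optional[int] = None,
    x_display: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for ai2thor.controller.Controller."""
    if headless:
        if gpu_device is None:
            raise ValueError("headless (CloudRendering) mode needs a gpu_device index")
        # Imported only when needed, so the X11 branch does not touch ai2thor.platform.
        from ai2thor.platform import CloudRendering

        return {"platform": CloudRendering, "gpu_device": gpu_device}
    if x_display is None:
        raise ValueError("X11 mode needs an x_display such as '0.0'")
    return {"x_display": x_display}


if __name__ == "__main__":
    print(thor_controller_kwargs(headless=False, x_display="0.0"))
```

For example, Controller(**thor_controller_kwargs(headless=True, gpu_device=0)) would start THOR on GPU 0 without any display.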
Hi @jordis-ai2,
I tried changing headless to True, but it doesn't work.
I also tried commenting out this code, and it still doesn't work.
Did I change the code in the wrong place?
I think I need to see the output you get when using headless mode. Can you paste it here?
[09/01 17:24:13 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, ... ,[main.py: 452]
[09/01 17:24:18 INFO:] Git diff saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 890]
[09/01 17:24:18 INFO:] Config files saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 935]
[09/01 17:24:18 INFO:] Using 1 train workers on devices (device(type='cuda', index=0),) [runner.py: 317]
[09/01 17:24:19 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:19 INFO:] Using local worker ids [0] (total 1 workers in machine 0) [runner.py: 326]
[09/01 17:24:19 INFO:] Started 1 train processes [runner.py: 595]
[09/01 17:24:19 INFO:] Using 1 valid workers on devices (device(type='cuda', index=1),) [runner.py: 317]
[09/01 17:24:19 INFO:] Started 1 valid processes [runner.py: 622]
[09/01 17:24:21 INFO:] valid 0 args [...][runner.py: 433]
[09/01 17:24:21 INFO:] train 0 args [...] [runner.py: 416]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:29 INFO:] Starting 0-th VectorSampledTask worker with args [...]
[09/01 17:24:31 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:24:31 INFO:] Starting 1-th VectorSampledTask worker with args [...]
[09/01 17:24:33 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:29:33 ERROR:] [train worker 0 ] Encountered TimeoutError , exiting. [engine.py: 1858]
File "/allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 272, in read_with_timeout
raise TimeoutError(
TimeoutError: Did not receive output from 'VectorSampledTask' worker for 300 seconds.
[engine.py: 1861]
[09/01 17:29:34 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
[09/01 17:29:34 ERROR:] Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[runner.py: 1468]
Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[09/01 17:29:34 INFO:] Terminating train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Terminating valid 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Train-0. Worker Train-0 is already closed, exiting. [runner.py: 348]
[09/01 17:29:34 INFO:] Joining train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Valid-0. Forcing worker Valid-0 to close and exiting. [runner.py: 353]
[09/01 17:29:35 INFO:] Closed train 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Joining valid 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Closed valid 0 [runner.py: 1543]
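The TimeoutError in this log means that train worker 0 waited 300 seconds for output from its VectorSampledTask worker process and received none. The log alone does not say why that worker stalled. The sketch below illustrates this kind of pipe read with a timeout, using the standard library's multiprocessing Connection.poll. It only illustrates the pattern. It is not AllenAct's actual read_with_timeout, and the name recv_with_timeout is hypothetical.

```python
from multiprocessing import Pipe
from multiprocessing.connection import Connection
from typing import Any


def recv_with_timeout(conn: Connection, timeout_seconds: float) -> Any:
    """Receive one message from a worker's pipe or raise TimeoutError."""
    # poll() waits up to timeout_seconds and returns True once data can be read.
    if not conn.poll(timeout_seconds):
        raise TimeoutError(
            f"Did not receive output from worker for {timeout_seconds} seconds."
        )
    return conn.recv()


if __name__ == "__main__":
    parent_conn, worker_conn = Pipe()
    worker_conn.send("ready")  # stands in for a worker that does respond
    print(recv_with_timeout(parent_conn, 1.0))
    try:
        recv_with_timeout(parent_conn, 0.1)  # nothing else was sent
    except TimeoutError as err:
        print(err)
```

Under this pattern, if the worker never sends anything, a longer timeout only postpones the same error.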
If you run export ALLENACT_DEBUG_VST_TIMEOUT=1000
before the command you currently use to start your experiment, does it still fail (just after a longer wait)?
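A process reads environment variables from the environment it was started with. The export therefore has to happen in the same shell before python main.py ... is launched, or the variable can be given inline, as in ALLENACT_DEBUG_VST_TIMEOUT=1000 python main.py .... The sketch below shows how a timeout override of this kind is commonly parsed. Several details are assumptions for illustration: the function name, the float parsing, and the 300-second fallback (taken from the log). Whether and where your installed AllenAct version reads this variable is not confirmed here.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 300.0  # assumed default, matching the 300 s in the log


def vst_timeout_seconds(var: str = "ALLENACT_DEBUG_VST_TIMEOUT") -> float:
    """Hypothetical helper: timeout override from the environment, else the default."""
    raw = os.environ.get(var)
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_SECONDS
    return float(raw)


if __name__ == "__main__":
    print(vst_timeout_seconds())
```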
Changing the waiting time doesn't help.
Actually, export ALLENACT_DEBUG_VST_TIMEOUT=1000
does not change the waiting time; it is still 300 seconds.
So I made the change in
and I still get the same error; only the waiting time is different.
I assume that by now you have already tried starting a standalone THOR controller to make sure everything is installed correctly, but in case you haven't, can you try running a script like this?
from ai2thor.platform import CloudRendering
from ai2thor.controller import Controller
import cv2
# Start THOR headlessly on GPU 0 (no X display needed)
c = Controller(platform=CloudRendering, gpu_device=0)
# last_event.frame is RGB; reverse the channels because cv2.imwrite expects BGR
cv2.imwrite("/path/to/debug_output_image.png", c.last_event.frame[:, :, ::-1])
c.stop()
The new code installed the THOR CloudRendering platform, and then a new issue came up. I had already run into this issue when running the PointNav task with the command PYTHONPATH=. python allenact/main.py training_a_pointnav_model -o storage/robothor-pointnav-rgb-resnet-resnet -b projects/tutorials
The error is: RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).
But when I run sudo apt install vulkan-tools,
the server can't locate the package vulkan-tools,
and running sudo apt-get update first doesn't help.
I installed the same environment on my PC (Ubuntu 18.04) following the tutorial, and both the PointNav and ObjectNav tasks run without problems there.
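Before digging into package availability, it can help to confirm from Python whether vulkaninfo is on the PATH and exits cleanly, since that is what the error message complains about. The helper below is a hypothetical stand-in for that check, using only shutil and subprocess. ai2thor's own check may run vulkaninfo with different arguments or inspect its output differently.

```python
import shutil
import subprocess


def vulkaninfo_works(binary: str = "vulkaninfo", timeout_seconds: float = 30.0) -> bool:
    """Hypothetical check: True if `binary` is on PATH and exits with status 0."""
    path = shutil.which(binary)
    if path is None:
        return False  # not installed, or not on PATH
    try:
        result = subprocess.run([path], capture_output=True, timeout=timeout_seconds)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("vulkaninfo usable:", vulkaninfo_works())
```

If the binary exists but this returns False, the problem is more likely the Vulkan driver setup than the missing package.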
https://packages.ubuntu.com/search?keywords=vulkan-tools has a list of packages for different Ubuntu versions. It's possible that third parties provide vulkan-tools for other/older versions.
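The package list at packages.ubuntu.com is organized by Ubuntu release, so the server's release name is the first thing to look up. The sketch below reads it from /etc/os-release, a standard file on modern Linux distributions. The parser is a simplified, hypothetical helper. On Python 3.10+, the standard library's platform.freedesktop_os_release() returns the same kind of mapping.

```python
import shlex
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from os-release content into a dict (simplified)."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted; shlex.split removes shell-style quoting.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        # VERSION_CODENAME (e.g. "bionic") matches the release names on packages.ubuntu.com
        print(info.get("PRETTY_NAME"), info.get("VERSION_CODENAME"))
    else:
        print("no /etc/os-release on this system")
```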
It sounds like this is out-of-scope for AllenAct, so I'm closing the issue.