act-plus-plus's People

Contributors

cheng-chi, markfzp, tonyzhaozh


act-plus-plus's Issues

Advice for running on another robot?

Hi @MarkFzp @tonyzhaozh,

I managed to run your code and train a policy on sim_transfer_cube_scripted (it is not really working yet, but it probably just needs more training steps).

I would like to run Mobile ALOHA on a different mobile bimanual robot (Reachy, https://www.pollen-robotics.com/).

Could you give me an overview of the required steps?

Do I need to describe the new robot in the MuJoCo format? I am not sure I properly understand the role of the simulated environment in your training pipeline. Reading the paper, I did not really understand how you would specify a task like "wipe wine" in the simulated environment.

Anyways, I guess I need to record episodes of a teleoperated task and format them as in https://github.com/MarkFzp/mobile-aloha/blob/main/aloha_scripts/record_episodes.py?

Thank you very much for your help!

Antoine
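For reference, a minimal sketch of the episode layout that record_episodes.py writes (the HDF5 keys below follow the ALOHA convention documented in these repos; your robot's camera names and dimensions will differ, and Mobile ALOHA additionally stores base actions):

import h5py

# Hypothetical path; one episode as saved by record_episodes.py.
with h5py.File('episode_0.hdf5', 'r') as f:
    qpos = f['/observations/qpos'][()]       # (T, 14) observed joint positions
    actions = f['/action'][()]               # (T, 14) commanded joint positions
    for cam in f['/observations/images']:    # e.g. cam_high, cam_left_wrist, cam_right_wrist
        frames = f[f'/observations/images/{cam}'][()]  # (T, H, W, 3) uint8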

About the function of the 'vq' parameter

Hi author, thanks for sharing this excellent work. I don't understand what the 'vq' argument does; can you explain its function?

Simulation issues

The data generated by record_sim_episodes.py mostly fails. The project includes sim_env.py, which can launch the simulation environment, but even after configuring Interbotix the following errors persist. How can I build data through the simulation environment and test the program without hardware?

python sim_env.py
Timeout exceeded while waiting for service /master_left/set_operating_modes
The robot 'master_left' is not discoverable. Did you enter the correct robot_name parameter? Is the xs_sdk node running? Quitting...
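For context, the timeout above comes from the __main__ block of sim_env.py, which launches a teleoperation test that expects real Interbotix master arms; the simulated environment itself needs no hardware. A minimal sketch, assuming make_sim_env from sim_env.py as used by record_sim_episodes.py:

from sim_env import make_sim_env

env = make_sim_env('sim_transfer_cube')  # pure MuJoCo, no ROS services required
ts = env.reset()
print(ts.observation['qpos'])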

Question about data augmentation and action coordinates

Hi, after reading your paper and testing the code on both the real and sim datasets, I'm eager to ask the following questions:

1. In the paper, you mentioned that you used image augmentation. I'm wondering whether there is any correlation between pixels and actions, i.e. did the actions get augmented too? Is there any code for reference? (See the sketch after this list.)

2. For co-training, was the static ALOHA data zero-padded on the base actions, or is the padding handled in imitate_episodes.py?

3. In ALOHA, the action dimension is 14 (6+1+6+1) and the actions are joint positions, so they are not tied to external coordinates. But Mobile ALOHA has 2 more base actions, so what coordinate frame is used for the base during movement?

I'd appreciate your answer, huge thanks!
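Regarding question 1, here is a minimal sketch of image-only augmentation, in which photometric jitter changes the pixels while qpos and actions pass through untouched, so no action re-labeling is needed (names are illustrative; check utils.py for what the repo actually applies):

import torchvision.transforms as transforms

image_jitter = transforms.ColorJitter(brightness=0.3, contrast=0.4, saturation=0.5)

def augment(image_data, qpos_data, action_data):
    image_data = image_jitter(image_data)      # pixels only
    return image_data, qpos_data, action_data  # actions are not modified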

Issue about training the model

Hello, folks,

I met an issue while training the model.

It seems to be caused by permissions on the W&B project.

(aloha) jack@jack:~/repos/act-plus-plus$ python3 imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir ./ --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200  --lr 1e-5 --seed 0 --num_steps 10
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 1
wandb: You chose 'Create a W&B account'
wandb: Create an account here: https://wandb.ai/authorize?signup=true
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
wandb: Appending key for api.wandb.ai to your netrc file: /home/jack/.netrc
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Problem at: imitate_episodes.py 148 main
wandb: ERROR It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)
Traceback (most recent call last):
  File "imitate_episodes.py", line 666, in <module>
    main(vars(parser.parse_args()))
  File "imitate_episodes.py", line 148, in main
    wandb.init(project="mobile-aloha2", reinit=True, entity="mobile-aloha2", name=expr_name)
  File "/home/jack/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1195, in init
    raise e
  File "/home/jack/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1176, in init
    run = wi.init()
  File "/home/jack/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 785, in init
    raise error
wandb.errors.CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)
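A possible workaround, not an official fix: the entity and project hard-coded in imitate_episodes.py belong to the authors, so either log offline or point wandb.init at an account you own:

import os
os.environ['WANDB_MODE'] = 'offline'  # skip cloud sync entirely

# or edit the call in imitate_episodes.py (replace with your own account):
# wandb.init(project='my-aloha', entity='my-username', reinit=True, name=expr_name)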

Performance in the simulation environment is not very good

I followed the environment configuration steps provided by your code repository and ran 50 evaluation rollouts for each of the two tasks. The success rate on the first task is 10/50, and on the second task 1/50. There is also a mirrored task in the code; its success rate is 15/50.

But for the cube-transfer task, does it count as a success if the block merely touches both robotic arms? I visualized and recorded a video of every rollout, and none of the transfer attempts were truly successful.

I want to know whether this is the case for everyone. My environment does not use GPU computing.


Missing packages to run?

I followed the installation procedure to install the listed pip packages in the conda environment, but I still get missing-package errors when I run python3 imitate_episodes.py:

(aloha) Rams-MBP:act-plus-plus ramkumarkoppu$ python3 imitate_episodes.py
ROBOMIMIC WARNING(
    No private macro file found!
    It is recommended to use a private macro file
    To setup, run: python /Users/ramkumarkoppu/miniforge3/envs/aloha/lib/python3.8/site-packages/robomimic/scripts/setup_macros.py
)
Traceback (most recent call last):
  File "imitate_episodes.py", line 20, in <module>
    from policy import ACTPolicy, CNNMLPPolicy, DiffusionPolicy
  File "/Users/ramkumarkoppu/GIT/GitHub/Robotic_projs/act-plus-plus/policy.py", line 12, in <module>
    from robomimic.algo.diffusion_policy import replace_bn_with_gn, ConditionalUnet1D
ModuleNotFoundError: No module named 'robomimic.algo.diffusion_policy'
(aloha) Rams-MBP:act-plus-plus ramkumarkoppu$ 
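For what it's worth, the robomimic release on PyPI does not appear to ship robomimic/algo/diffusion_policy.py; it comes from a development branch of robomimic, so a from-source install per this repo's setup notes is needed. A quick check of what your environment actually contains:

import importlib.util

spec = importlib.util.find_spec('robomimic.algo.diffusion_policy')
print(spec)  # None means your robomimic install lacks the diffusion policy module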

Diffusion Policy parameters

Hi authors,

I used your 50 demo episodes to train ACT and it worked very well, achieving a success rate of up to 90% on the cube-transfer task. However, after I switched the algorithm to Diffusion Policy, the success rate turned out to be very low. I tried multiple hyperparameter settings from your commands.txt, but none of them worked. The results are below:

(screenshot: training curves; the green curve is ACT, the others are Diffusion Policy with different hyperparameter sets)

So I wonder why that happens and could you share the best-working diffusion policy parameters? Thank you very much!

Error loading episode_XX.hdf5 in __getitem__

Hi!

After making some changes described here: #2 (comment)

When running the imitate_episodes.py script, I get the following error:

(screenshot of the error omitted)

I displayed the exception text in the code and got:
"Unable to synchronously open object (object 'left_wrist' doesn't exist)"

Any idea what is going on?

Thanks!
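One way to debug this is to list the keys actually stored in an episode and compare them against the camera_names configured in constants.py; the error suggests a naming mismatch such as 'left_wrist' vs 'cam_left_wrist' (the path below is illustrative):

import h5py

with h5py.File('data/episode_0.hdf5', 'r') as f:
    f.visit(print)  # prints every group/dataset path, e.g. observations/images/cam_high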

Error in visualize_episodes

Hello.

  • Following the readme instructions, I tried to visualize an episode:
    python visualize_episodes.py --dataset_dir dataset/aloha_mobile_wipe_wine --episode_idx 10

  • data save dir: downloaded "aloha_mobile_wipe_wine" from Google Drive (episode_0.hdf5 to episode_49.hdf5)

  • result: numpy.exceptions.AxisError:
    all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension
    numpy.exceptions.AxisError: axis 2 is out of bounds for array of dimension 2

Please support. Thanks a lot!
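A plausible cause, offered as an assumption: the released real-robot datasets store each camera frame as padded JPEG bytes, so a camera entry is a 2-D uint8 array rather than (T, H, W, 3), and the axis=2 concatenation fails until the frames are decoded. A minimal decoding sketch:

import cv2
import numpy as np

def decode_camera(compressed):
    # compressed: (T, padded_len) uint8, each row the JPEG bytes of one frame
    return np.stack([cv2.imdecode(row, cv2.IMREAD_COLOR) for row in compressed])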

About action space

Thanks for your great work!
I have a question: most datasets for robot manipulation tasks use the end effector's pose as the action space, but both Mobile ALOHA and its predecessor use joint positions as the action space. Have there been any attempts to use the end effector's pose as the action space? If so, what were the results?

TypeError: forward() got an unexpected keyword argument 'src_key_padding_mask'

Parameter errors when using imitate_episodes.py to train the model:

TypeError: forward() got an unexpected keyword argument 'src_key_padding_mask'
TypeError: forward() got an unexpected keyword argument 'pos'

at detr_vae.py line 116:          encoder_output = self.encoder(encoder_input, pos=pos_embed, src_key_padding_mask=is_pad)

whereas the forward function signature in transformer.py is
def forward(self, src, mask, query_embed, pos_embed, latent_input=None, proprio_input=None, additional_pos_embed=None)

Is there any updated code that hasn't been pushed?
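One fix reported by other users (an assumption; verify against the latest code): in the build() function of detr_vae.py, construct the CVAE encoder with build_encoder(args), whose layers accept pos= and src_key_padding_mask=, instead of build_transformer(args), which returns the full encoder-decoder with the forward signature quoted above:

# in detr_vae.py, build():
# encoder = build_transformer(args)  # full Transformer -> the TypeErrors above
encoder = build_encoder(args)        # TransformerEncoder matching the call at detr_vae.py line 116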

RuntimeError: torch.cat(): expected a non-empty list of Tensors

I met an issue while training the model.

I have already generated the episodes in simulation (screenshot omitted), but:

Found 0 hdf5 files

Data from: ['~/aloha/act-plus-plus-main/data/sim_transfer_cube_scripted']

  • Train on [0] episodes
  • Test on [0] episodes

Traceback (most recent call last):
  File "imitate_episodes.py", line 667, in <module>
    main(vars(parser.parse_args()))
  File "imitate_episodes.py", line 166, in main
    train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val, args['chunk_size'], args['skip_mirrored_data'], config['load_pretrain'], policy_class, stats_dir_l=stats_dir, sample_weights=sample_weights, train_ratio=train_ratio)
  File "/home/lqx/aloha/act-plus-plus-main/utils.py", line 245, in load_data
    _, all_episode_len = get_norm_stats(dataset_path_list)
  File "/home/lqx/aloha/act-plus-plus-main/utils.py", line 174, in get_norm_stats
    all_qpos_data = torch.cat(all_qpos_data, dim=0)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
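A hedged diagnosis: the printed dataset path still contains a literal '~', which os.listdir and glob do not expand, so zero HDF5 files are found. Expanding the path, or passing an absolute one, should fix the empty torch.cat input:

import os

dataset_dir = os.path.expanduser('~/aloha/act-plus-plus-main/data/sim_transfer_cube_scripted')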

Not very good performance

Hi, I followed the instructions and trained the ACT policy for a relatively long time (around 6 hours and 50k steps), and the success rate does not seem great. I have checked the tuning tips and plan to train for longer, but I would like to know what else can improve performance. One issue I discovered is that the scripted demos are mostly unsuccessful (only about 30% success), which is one potential problem, but are there other factors?
I have posted the wandb plot below.

(screenshot: wandb training plot omitted)

Question regarding training setup

Hello,

Congrats on the nice work. I had a few questions:

  • As far as I understand, ACT for ALOHA was trained for each task separately, but I am unsure whether the same holds for the ACT model in Mobile ALOHA.

  • If a single model is trained for all tasks in Mobile ALOHA, how does the model know which task to do? Is it just from the starting condition?

  • Also, within the same task, are all the subtasks trained together? If so, I am a bit confused about how each subtask's accuracy could be calculated separately as done in the paper, since I believe the model takes in a continuous stream of input and it would be hard to stop in the middle and restart. Maybe I am missing something.

Again, congrats on the paper; looking forward to your response.

Best,
Ankit

version for Windows

For anyone who is interested... here is a version of the project that can be installed and run on Windows. It was tested on Windows 11, but I strongly assume it will also work on other versions of Windows as well as other OSes. Besides the Windows-related issues, this version also includes fixes in certain files (e.g. in detr_vae.py) that are not OS-related. Enjoy!

https://bitbucket.org/yitzhaksp/act-plus-plus_ys/src/main/

ERROR Error while calling W&B API: project not found (<Response [404]>)

wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Problem at: imitate_episodes.py 148 main
wandb: ERROR It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)
Traceback (most recent call last):
  File "imitate_episodes.py", line 666, in <module>
    main(vars(parser.parse_args()))
  File "imitate_episodes.py", line 148, in main
    wandb.init(project="mobile-aloha2", reinit=True, entity="mobile-aloha2", name=expr_name)
  File "/home/hf/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1189, in init
    raise e
  File "/home/hf/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1170, in init
    run = wi.init()
  File "/home/hf/anaconda3/envs/aloha/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 781, in init
    raise error
wandb.errors.CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)


Hi! The above is the error message. How can I obtain permission for the mobile-aloha2 project and solve this problem? (The offline-logging workaround sketched under the earlier training issue may also apply here.)

How to run inference?

Hi, thank you for great work.

I have a question: how do I run real-time inference?

Or, beyond inference on a trained scene, can the policy infer on a new scene?
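For reference, evaluation in this repo reuses the training entry point: passing --eval to imitate_episodes.py with the same arguments used for training loads the checkpoint from --ckpt_dir and rolls out the policy, as in the command quoted in the 'eval success rate 0.5' issue below:

python3 imitate_episodes.py --task_name sim_transfer_cube_scripted \
    --ckpt_dir ./models --policy_class ACT --kl_weight 10 --chunk_size 100 \
    --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_steps 2000 \
    --lr 1e-5 --seed 0 --eval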

Simulated scripted data generation

Hi, thanks for your brilliant work and open-source code!

I tried your code to generate simulated scripted data and trained ACT on the generated data. However, the performance turns out to be low. I visualized and looked through the generated data and found that most episodes were unsuccessful, which can explain why the trained ACT's success rate is low, since the teacher is wrong (and which I hope explains #8 and #11).

Moreover, after I filtered out the failed trajectories in record_sim_episodes.py, it seems that many of the 'successful' ones aren't truly successful (the red block falls off).

Therefore, I'd like to ask why the simulated scripted policy cannot generate successful trajectories reliably. Is there any method to solve this?

Bad eval results

I got the following warning during training, and the final eval through imitate_episodes.py is also bad. What could be the reason?
Warning: step duration: 0.091 s at step 211 longer than DT: 0.02 s, culmulated delay: 17.275 s

Co-training data

I have a question about co-training.
Looking at constants.py, you can see the dataset_dir entry aloha_compressed_dataset below.
Is aloha_compressed_dataset the same as the aloha_static_cotraining_dataset used for co-training?

It is too slow to train on the whole dataset (aloha_mobile_elevator_truncated + aloha_static_cotraining_dataset).
What is a good way to organize the dataset?

There is no data for "aloha_mobile_elevator_2" and "aloha_mobile_elevator_button"; can you share the missing data?

'aloha_mobile_elevator_truncated_cotrain':{
    'dataset_dir': [
        DATA_DIR + '/aloha_mobile_elevator_truncated',
        DATA_DIR + '/aloha_mobile_elevator_2',
        DATA_DIR + '/aloha_mobile_elevator_button',
        DATA_DIR + '/aloha_compressed_dataset',
    ], # only the first dataset_dir is used for val
    'stats_dir': [
        DATA_DIR + '/aloha_mobile_elevator_truncated',
        DATA_DIR + '/aloha_mobile_elevator_2',
    ],
    'sample_weights': [3, 3, 2, 1],
    'train_ratio': 0.99, # ratio of train data from the first dataset_dir
    'episode_len': 2250,
    'camera_names': ['cam_high', 'cam_left_wrist', 'cam_right_wrist']
},

About the meaning of the action data

Thanks for your great work!

I don't understand the specific meaning of the data stored in action. Are the first six values the pose of the manipulator's end effector?

Using the real-environment dataset to train the model: program running error

Hi, thanks for your brilliant work and open-source code!
I tried your dataset files to generate an mp4 file and to train the model.
[https://drive.google.com/drive/folders/1vhKa4tNPfMkeK7SW86vKorb39epTYW39] —— aloha_mobile_chair_truncated
The program fails with the following error:
Traceback (most recent call last):
  File "visualize_episodes.py", line 160, in <module>
    main(vars(parser.parse_args()))
  File "visualize_episodes.py", line 44, in main
    save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_video.mp4'))
  File "visualize_episodes.py", line 78, in save_videos
    all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension
  File "<__array_function__ internals>", line 200, in concatenate
numpy.AxisError: axis 2 is out of bounds for array of dimension 2
The real-environment data format is not the same as the simulation-generated files, especially the camera data. How can I solve this problem? (This appears to be the same compressed-image issue as the visualize_episodes error above.)

(screenshots of the sim_env and real_env data layouts omitted)

About a DETR code problem (transformer part)

Hello, in your DETR code, you use the transformer to get an output of shape [bs, hidden_dim, feature_dim]; the code is

self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]

the transformer code is

hs = self.decoder(tgt, memory, memory_key_padding_mask=mask, pos=pos_embed, query_pos=query_embed)
hs = hs.transpose(1, 2)
return hs

Based on my understanding, in your code you only choose the first decoder layer's output as the feature to predict the action. However, in the original DETR code the transformer output is:

hs = self.decoder(tgt, memory, memory_key_padding_mask=mask, pos=pos_embed, query_pos=query_embed)
return hs.transpose(1, 2), memory.permute(1, 2, 0).view(bs, c, h, w)

The original DETR code uses the same feature-processing code:

hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
outputs_class = self.class_embed(hs)

I would like to ask why only the first layer's output is chosen as the feature. Would selecting the seventh layer be a better choice? Thank you!

eval success rate 0.5

(screenshot of the eval output omitted)

The task is sim_transfer_cube_scripted with 50 episodes; training parameters:
python3 imitate_episodes.py --task_name sim_transfer_cube_scripted \
    --ckpt_dir ./models --policy_class ACT --kl_weight 10 --chunk_size 100 \
    --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_steps 2000 \
    --lr 1e-5 --seed 0 --eval
Is the eval result normal?
