I find the reward function to be the most important part of RLHF, because it is the pa

Unified reward function/model architecture for a wide range of tasks (palm-rlhf-pytorch)

lucidrains commented on July 29, 2024

Unified reward function/model architecture for a wide range of tasks

from palm-rlhf-pytorch.

Comments (2)

James4Ever0 commented on July 29, 2024

RLHF requires creating multiple models: an SFT model, a reward model (RM), and a PPO-tuned model. Would it be possible to improve storage and memory efficiency and reduce computation by freezing the large layers of the pretrained model and fine-tuning only certain layers to create the SFT, RM, and PPO models, using OpenDelta or other libraries/methods? I read that your repo uses LoRA, but I'm not sure whether it fulfills all of the goals described above. Common implementations like minRLHF require four separate models: three derived from the pretrained model (actor, critic, and reference), plus an external sentiment rating model.
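To make the storage question concrete, here is a rough back-of-the-envelope sketch. It assumes a plain decoder-only transformer with about 12·d² weights per layer (embeddings ignored) and LoRA applied to four d×d projections per layer. The layer count, width, and rank below are hypothetical numbers picked for illustration, not values taken from this repo or from minRLHF.

```python
def full_params(n_layers: int, d_model: int) -> int:
    """Rough transformer size: 4*d^2 (attention) + 8*d^2 (MLP) per layer.

    Embeddings, biases and norms are ignored (assumption for simplicity).
    """
    return 12 * n_layers * d_model ** 2


def lora_params(n_layers: int, d_model: int, rank: int,
                adapted_per_layer: int = 4) -> int:
    """LoRA adds two low-rank factors (d x r and r x d) per adapted d x d matrix."""
    return n_layers * adapted_per_layer * 2 * d_model * rank


def compare(n_layers: int, d_model: int, rank: int,
            n_full_copies: int = 4, n_adapters: int = 3) -> dict:
    """Compare N full model copies against one frozen backbone plus adapters."""
    base = full_params(n_layers, d_model)
    adapter = lora_params(n_layers, d_model, rank)
    separate = n_full_copies * base        # e.g. actor, critic, reference, RM
    shared = base + n_adapters * adapter   # frozen backbone doubles as reference
    return {
        "separate": separate,
        "shared": shared,
        "ratio": shared / separate,
    }


if __name__ == "__main__":
    # Hypothetical model: 24 layers, width 2048, LoRA rank 8.
    result = compare(n_layers=24, d_model=2048, rank=8)
    print(f"separate copies : {result['separate']:,} weights")
    print(f"shared backbone : {result['shared']:,} weights")
    print(f"ratio           : {result['ratio']:.3f}")
```

Under these assumptions the adapters are a small fraction of the backbone, so the shared setup costs roughly one model's worth of weights instead of four. This only counts stored weights; activations, optimizer state, and the extra value/reward heads are not included.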


James4Ever0 commented on July 29, 2024

To take this proposal further, I think a good reward function can self-evolve and adapt to new environments (when the data source is no longer a fixed, static "archive" but a stream), making the model communicative, multipurpose, real-time, and perhaps even a step toward AGI. A good reward function can let the agent learn from almost anything, including human feedback, computer systems (sensor data, terminal/GUI input/output, the internet, program threads, and more), and self-invented signals. WebGPT is a clear example of turning GPT-3 into an active agent. There will be more to come.

