carperai / instructgpt

For experiments involving InstructGPT. Currently used for documenting open research questions.

License: MIT License

instructgpt's Introduction

BigModelName

This repository is for open questions relating to RLHF and InstructGPT as they pertain to BigModelName.

Open Questions

What is the preference rate of PPO vs PPO-Ptx? Why was 27.8 chosen as the mixing factor between the pre-training gradients and the PPO gradients?
What do the gradient norms and gradient noise scales look like for PPO grads vs pre-training grads?
How important is SFT pretraining on human-written completions?
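
For reference, the mixing factor in the first question appears to be the pretraining loss coefficient γ in the PPO-ptx objective of the InstructGPT paper (Ouyang et al., 2022). The objective is restated here from that paper, not from this repository, with β the KL penalty coefficient:

```latex
\text{objective}(\phi) =
  \mathbb{E}_{(x,y)\sim D_{\pi_\phi^{\mathrm{RL}}}}
  \left[ r_\theta(x,y) - \beta \log\frac{\pi_\phi^{\mathrm{RL}}(y\mid x)}{\pi^{\mathrm{SFT}}(y\mid x)} \right]
  + \gamma\, \mathbb{E}_{x\sim D_{\text{pretrain}}}
  \left[ \log \pi_\phi^{\mathrm{RL}}(x) \right]
```

Setting γ = 0 gives plain PPO, and γ = 27.8 is the PPO-ptx value the question asks about.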

instructgpt's Issues

[task] Programming Socratic Questions

🚀 The Task

Given a programming problem description, an optional set of unit tests, and a buggy codebase, the task is to write dialogue utterances for both the mentor and the programmer, where the mentor's utterances contain hints in the form of a Socratic question to guide a codebase writer to fix an issue or get unstuck.

Example

<instruction_text> You are a mentor. You aim to guide the programmer so that they learn. Guide the programmer to complete their programming task below using Socratic questioning in a conversation. Avoid giving away the answer or solution directly.
<problem_desc> Python program to count the number of words, where it is assumed that all words are separated by spaces.
<code_state>

def count_words(sentence):
    words = 0
    counter = 0
    while counter < len(sentence):
        if sentence[counter] == ' ':
            words += 1
        counter += 1
    return words

Mentor: Hello! I noticed that you’ve been continuously submitting your code but failing the same test cases. Do you need help?
Programmer: Hello! Yes, I am having trouble... My word count seems to be off.
Mentor: Let's think through this together! It seems like you are counting spaces. Let's consider the string 'hi there'. How many spaces are in it? [Socratic Question]
Programmer: There is only one space.
Mentor: Correct, so is there a space for every word in the sentence? [Socratic Question]
Programmer: No, there is not! Hmm...
Mentor: So, 'hi there' has 1 space but two words. 'I love geese' has 2 spaces but 3 words. What's the relationship between the number of words and the number of spaces?
Programmer: The number of words is always greater than the number of spaces by 1. I think I get it, I'll put a +1 to return words.
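
The fix the programmer arrives at could be sketched as follows. The guard for an empty sentence is an assumption that goes beyond the dialogue, and the sketch still relies on the stated premise that words are separated by single spaces.

```python
def count_words(sentence):
    """Count words in a sentence whose words are separated by single spaces."""
    # Assumption (not in the dialogue): an empty sentence has zero words.
    if not sentence:
        return 0
    spaces = 0
    counter = 0
    while counter < len(sentence):
        if sentence[counter] == ' ':
            spaces += 1
        counter += 1
    # There is one more word than there are separating spaces.
    return spaces + 1


print(count_words('hi there'))      # 2
print(count_words('I love geese'))  # 3
```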

Additional Notes

Relates to the pair programming project.

[task] Personalized Dialog

🚀 The Task

Given a "role" tag, the model should be able to adapt its "personality" to the user's liking in order to promote more productive pair programming interactions.

Concerns: This could lead to malicious use of the model.

Example

Role: You are a comical but highly proficient software engineer.

Can you help me with the following issues?

Additional Notes

Reference: personaGPT and character.ai
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Code Explanation

🚀 The Task

Given a snippet of code, can the model explain its functionality? If possible, the model could then follow up with its own code documentation: docstrings, comments, etc.

Example

Explain the following code snippet:

def magical_fn(n):
    if n == 0:
        return True
    else:
        if (n-1) == 0:
            return False
        else:
            return magical_fn((n-1)-1)
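
One response of the kind this task asks for might be a documented rewrite like the sketch below. The docstring and comments are illustrative, and the restriction to non-negative integers is an observation about the snippet, not something the issue states.

```python
def magical_fn(n):
    """Return True if the non-negative integer n is even, False if it is odd.

    Works by stepping n down by 2 until it reaches 0 (even) or 1 (odd).
    A negative n never reaches a base case and ends in a RecursionError.
    """
    if n == 0:
        return True   # 0 is even
    if n - 1 == 0:
        return False  # 1 is odd
    return magical_fn(n - 2)  # n has the same parity as n - 2


print(magical_fn(4))  # True
print(magical_fn(7))  # False
```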

Additional Notes

Reference: Replit + Codex: Beta Release
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Conversation Summarization & QA

🚀 The Task

Given a chat transcript, the model should be able to summarize it or answer questions about it.

Example

Summary:

Summarize the following chat log:

<bob>: How is your day going?
<alice>: Great! I purchased a goose just now.

Summary:

Question-Answering:

Given the following chat log, answer the provided question:

<bob>: How is your day going?
<alice>: Great! I purchased a goose just now.

Question: What did <alice> purchase?
Answer:
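
The two prompt formats above could be generated from a transcript with a small helper like this sketch. The function names and the `(speaker, utterance)` tuple representation are hypothetical choices made for illustration.

```python
def format_chat(turns):
    """Render (speaker, utterance) pairs in the '<speaker>: utterance' style."""
    return "\n".join(f"<{speaker}>: {text}" for speaker, text in turns)


def summary_prompt(turns):
    """Build the summarization prompt, ending where the model should continue."""
    return f"Summarize the following chat log:\n\n{format_chat(turns)}\n\nSummary:"


def qa_prompt(turns, question):
    """Build the question-answering prompt for a single question."""
    return (
        "Given the following chat log, answer the provided question:\n\n"
        f"{format_chat(turns)}\n\nQuestion: {question}\nAnswer:"
    )


chat = [("bob", "How is your day going?"),
        ("alice", "Great! I purchased a goose just now.")]
print(qa_prompt(chat, "What did <alice> purchase?"))
```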

Additional Notes

Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Code Tracing

🚀 The Task

Given a code snippet, the model should be able to trace through the computation to compute the final result or return value. This task ensures the model can semantically execute code.

Example

If I call the following snippet with the value, `x`, what will it return?
  <code snippet>
Return:
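
As a concrete illustration of what a trace involves, here is a hypothetical snippet (it does not stand in for the `<code snippet>` placeholder above) with the intermediate state written out step by step, much as a chain-of-thought answer might do.

```python
def mystery(x):
    total = 0
    for i in range(x):      # i takes the values 0, 1, ..., x - 1
        if i % 2 == 0:
            total += i      # an even i is added to the total
        else:
            total -= 1      # an odd i subtracts 1 from the total
    return total


# Trace for x = 5:
#   i = 0: total = 0 + 0  = 0
#   i = 1: total = 0 - 1  = -1
#   i = 2: total = -1 + 2 = 1
#   i = 3: total = 1 - 1  = 0
#   i = 4: total = 0 + 4  = 4
print(mystery(5))  # 4
```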

Additional Notes

NOTE: This may be very difficult to do without CoT.

Creator: Axel Marmet
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Self-Correction

🚀 The Task

When the model summarizes a passage of text or code, it may fail to account for important information that only the user knows. The user should be able to point out what the model is missing, and the model should correct its own summary.

Example

No response

Additional Notes

Creator: @ericyu3
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Code Coverage Resolution

🚀 The Task

Given a code coverage report (e.g. from codecov), generate tests that exercise the uncovered and partially covered regions of code.

Example

Write unit tests for regions of code that are partially covered according to the following coverage report:
    <codecov report>
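
A minimal sketch of the expected output, assuming a hypothetical function `safe_divide` whose zero-divisor branch a coverage report has flagged as never executed. Both the function and the report are invented for illustration.

```python
def safe_divide(a, b, default=None):
    """Divide a by b, returning `default` when b is zero."""
    if b == 0:
        return default  # suppose the report marks this line as uncovered
    return a / b


def test_safe_divide_nonzero():
    # The path the existing tests are assumed to exercise already.
    assert safe_divide(6, 3) == 2


def test_safe_divide_zero_uses_default():
    # New test targeting the uncovered branch.
    assert safe_divide(1, 0) is None
    assert safe_divide(1, 0, default=0.0) == 0.0
```

The tests follow the pytest naming convention but are plain functions, so they can also be called directly.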

Additional Notes

Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Creative Writing Distillation

🚀 The Task

Given a piece of complex creative writing, the model should be able to output a simplified form that a user can then probe with questions.

NOTE: This is not about generating creative work but distilling it.

Example

No response

Additional Notes

Google Sheets Reference: CarperAI: Instruct-GPT Tasks

Physical Reasoning

🚀 The Task

Given some description of objects in the world, have the model generate descriptions/solutions for some goal or problem. Many of these could be derived from PIQA examples.

Example

(PIQA) I'm camping and my pillow floated down the river. I have a tin can, a trash bag, and a rubber band. Is there any way I can make a pillow?
A: Blow into a trash bag and tie with rubber band

(Other example) I have a sheet of paper, how do I make a paper airplane?
A: Fold it in half lengthwise, on one end, fold the corners towards the folded line, .....

Additional Notes

PIQA paper: https://arxiv.org/abs/1911.11641
Example used from PIQA:
[Goal] Make an outdoor pillow
[Sol1] Blow into a tin can and tie with rubber band
[Sol2] Blow into a trash bag and tie with rubber band

Rewrite/natural language transformation

🚀 The Task

The model should be able to rewrite a sentence or a phrase in a different manner or style.

Example

Input: Make the sentence more descriptive.
There was someone who was walking down a road and he saw a man.
Output: There was a young guy who was walking down a long, flat road surrounded by big trees and he saw an old man.

Additional Notes

No response

[task] Ideation

🚀 The Task

Ask a chatbot to come up with a list of ideas.

Example

User: What should I cook for dinner today? Can you give me ten suggestions?
Chatbot: [lists ten suggestions on what to cook]

Additional Notes

This idea might be hampered by the lack of diversity found in RLHF tuning.

[task] Security Vulnerability Detection

🚀 The Task

Given source code with (potential) security vulnerabilities, the model should be able to detect and discuss the issue with the programmer.

Example

<user> Is there any issue with this code? 
  {code that can be attacked through buffer overflow}
<instruct> Yes, the array can be used for a buffer overflow attack
<user> Is it dangerous?
<instruct> Yes it allows for arbitrary code execution

Additional Notes

Reference: @amarmet in the CarperAI Discord server.
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Code Snippet Optimization

🚀 The Task

Given a code snippet and an instruction for performance optimization, the model should generate the appropriate source code replacements for optimality. (Can be at least partially grounded by checking that results are similar and runtime is faster.)

Example

Optimize the runtime of the following code snippet by using dynamic programming:
...

Vectorize the following loop:
...
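
To illustrate both the first kind of request and the grounding check mentioned in the task description, here is a sketch using a hypothetical input snippet, a naive recursive Fibonacci. The optimized version uses memoization, i.e. top-down dynamic programming, and the check compares results and wall-clock time.

```python
import time
from functools import lru_cache


def fib_naive(n):
    """Hypothetical input snippet: exponential-time recursion."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Candidate output: each subproblem is computed once and cached."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


def timed(fn, arg):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


slow_result, slow_time = timed(fib_naive, 22)
fast_result, fast_time = timed(fib_memo, 22)
assert slow_result == fast_result  # grounding: same answer
print(f"naive: {slow_time:.4f}s, memoized: {fast_time:.6f}s")
```

A single timing run is noisy, so a real check would repeat the measurement before concluding that the rewrite is faster.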

Additional Notes

Reference: @amarmet in the CarperAI Discord server.
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Interactive Debugging

🚀 The Task

Minimize stack traces, locate bugs, use GDB/PDB, and produce failing test cases from a crash.

Example

Locate the bug from the following stack trace and suggest a fix:

Stack Trace:

Traceback (most recent call last):
  File "example.py", line 1, in <module>
    import xyz
ModuleNotFoundError: No module named 'xyz'

Fix:
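
Locating the bug starts with pulling the failing frame and the exception out of the trace. The helper below is a rough sketch of that step for Python tracebacks. The function name and the returned dictionary keys are hypothetical.

```python
import re


def summarize_traceback(trace):
    """Extract the innermost frame and the exception from a Python traceback."""
    # Each frame line looks like: File "path", line N, in scope
    frames = re.findall(r'File "(.+?)", line (\d+), in (.+)', trace)
    lines = [ln for ln in trace.strip().splitlines() if ln.strip()]
    # The last line has the form "ExceptionType: message".
    exc_type, _, message = lines[-1].strip().partition(": ")
    path, lineno, scope = frames[-1]  # the innermost frame is listed last
    return {"file": path, "line": int(lineno), "scope": scope,
            "exception": exc_type, "message": message}


trace = '''Traceback (most recent call last):
  File "example.py", line 1, in <module>
    import xyz
ModuleNotFoundError: No module named 'xyz'
'''
print(summarize_traceback(trace))
```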

Additional Notes

Creator: @cat-state
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Documentation Aware Assistance

🚀 The Task

If a user is using an AI pair programmer to work with a custom library or framework, performance will be much better if the prompt includes some documentation of the library or some examples of usage. The model should be able to utilize this context effectively.

Example

Help me write a chatbot using the OpenAI Completions API. Here is their API documentation:
  
  [Create completion](https://beta.openai.com/docs/api-reference/completions/create)
  POST
   
  https://api.openai.com/v1/completions
  
  Creates a completion for the provided prompt and parameters
  
  ==== 
  
  Request body
  
    model: string (Required)
      ID of the model to use. You can use the [List models](https://beta.openai.com/docs/api-reference/models/list) API to see all of your available models, or see our [Model overview](https://beta.openai.com/docs/models/overview) for descriptions of them.
  ...
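
A response that makes use of the pasted documentation might resemble the sketch below, which uses only the Python standard library. The excerpt above documents only the endpoint and the `model` field. The `prompt` and `max_tokens` fields, the bearer-token header, and the `choices[0].text` response shape are assumptions drawn from the fuller API reference and should be checked against it. The function names and the "User:"/"Bot:" prompt layout are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"  # endpoint from the excerpt


def build_completion_request(prompt, model, api_key, max_tokens=64):
    """Build (but do not send) a POST request for a completion."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def chat_turn(history, user_message, model, api_key):
    """Send one chatbot turn and return the model's reply text."""
    prompt = "\n".join(history + [f"User: {user_message}", "Bot:"])
    request = build_completion_request(prompt, model, api_key)
    with urllib.request.urlopen(request) as response:  # performs the network call
        payload = json.load(response)
    return payload["choices"][0]["text"].strip()
```

Keeping request construction separate from the network call makes the first function easy to test without an API key.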

Additional Notes

Creator: @ericyu3
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Automated Issue Fixer

🚀 The Task

Given a GitHub Issue as context and the relevant line to be fixed, the model should be able to generate the corresponding fix.

Example

https://github.com/${org}/${repo}/blob/${sha}/${path}#L89-L101

Change this code to fix {issue}:

Additional Notes

Relevant Application: Robb Oat is a robot software engineer
- @robb-oat
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Code Transpiler

🚀 The Task

Given a code snippet in one source language, rewrite it in another language. This is useful for converting programs in "slow" languages to "faster" ones ("I have this really concise code written in Haskell; can you make it fast by converting it to C++?").

Example

Rewrite the following program:
     <source-lang-1>
into <source-lang-2>

Additional Notes

Reference: @amarmet in the CarperAI Discord server.
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

Simpler text

🚀 The Task

The model should be able to rewrite the input using easier-to-understand vocabulary or wording without changing its meaning.

Example

Input: Make this text simpler.
A wealthy man who's bald is no longer a policeman. However, he's still famous and the vehicle he drives is still flash.
Output: A rich man who has no hair is no longer a cop. But he's still well known and the car he uses is still cool to look at.

Additional Notes

No response

[task] External API License Analysis

🚀 The Task

Given a software project repository or source code, the model should help users understand terms-of-service / terms-of-use and other licenses (uncover implicit ToS).

(In other words, we want the model to aid us in understanding the terms of use for an external API and/or for code to be adopted.)

Example

Read the following terms of use for this API and tell me how we can use it for this application:
    <Terms of Service>

Additional Notes

Reference: @JohnNay
- Email john.j.nay at gmail dot com if you want to chat about ideas on this.
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Automated Builds

🚀 The Task

Automatically create and maintain build scripts, config files, and Docker images.

Example

No response

Additional Notes

Creator: @cat-state
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Design Doc Feedback

🚀 The Task

Given a system design document, the model should be able to provide feedback such as inline comments a la Google Docs along with overall software architecture suggestions (sort of as an API generator).

Example

No response

Additional Notes

Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Automated REPL Interaction

🚀 The Task

Given some code, the model should be able to run it through an interpreter, check the output, and correct itself, adapting to the interpreter's feedback.
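
The run, check, and correct loop described here could be sketched as follows. `propose_fix` is a hypothetical hook standing in for the model, and `toy_fix` is a hard-coded stand-in used only to make the sketch runnable.

```python
import subprocess
import sys


def run_python(code, timeout=5):
    """Run code in a fresh interpreter and return (succeeded, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode == 0, result.stdout, result.stderr


def repl_loop(code, propose_fix, max_rounds=3):
    """Re-run code, handing interpreter errors to propose_fix, until it succeeds."""
    for _ in range(max_rounds):
        ok, stdout, stderr = run_python(code)
        if ok:
            return code, stdout
        code = propose_fix(code, stderr)  # hypothetical: the model would go here
    raise RuntimeError("no working version found within max_rounds")


def toy_fix(code, stderr):
    """Stand-in for the model: repairs one known typo when a NameError is reported."""
    return code.replace("pritn", "print") if "NameError" in stderr else code


final_code, output = repl_loop("pritn('ok')", toy_fix)
print(final_code, "->", output.strip())
```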

Example

No response

Additional Notes

Similar to Codex: https://www.youtube.com/watch?v=_3MBQm7GFIM&t=265s
Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Instruction-Induction Benchmark Tasks

🚀 The Task

The model should be able to follow the instruction-induction benchmark tasks outlined in Table 1 of Honovich et al.'s "Instruction Induction: From Few Examples to Natural Language Task Descriptions".

Example

Additional Notes

Google Sheets Reference: CarperAI: Instruct-GPT Tasks

[task] Pull Request Summarization from Diffs

🚀 The Task

Given the commit diffs for a pull request, the model should be able to summarize the changes made by the source contributor.

Example

Summarize a pull request for the following diffs:

diff --git a/file.py b/file.py
index ...
--- a/file.py
+++ b/file.py
- print("hello world!")
+ print("HELLO WORLD!")
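
Before prompting, a diff like the one above could be reduced to its changed lines with a helper along these lines. This is a simplified sketch with a hypothetical function name: it treats any line starting with `---` or `+++` as a file header, so a removed line whose content itself begins with `--` would be skipped.

```python
def changed_lines(diff_text):
    """Split a unified diff into removed and added lines, skipping file headers."""
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            continue  # file header lines, not content changes
        if line.startswith("-"):
            removed.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    return removed, added


diff = '''diff --git a/file.py b/file.py
index ...
--- a/file.py
+++ b/file.py
- print("hello world!")
+ print("HELLO WORLD!")
'''
print(changed_lines(diff))
```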

Additional Notes

Reference: What The Diff

[task] Translation

🚀 The Task

Ask a chatbot to translate an utterance or set of utterances. This can be augmented with existing aligned datasets.

Example

User: Here are two sentences, please translate them to [TARGET LANGUAGE]
Bot: [Sentences in target language]

Additional Notes

Instruction data for this task can be derived trivially from the plethora of existing datasets, so it probably does not require prompt collection.