probcomp / hfppl

Probabilistic programming with HuggingFace language models

Home Page: https://probcomp.github.io/hfppl/


hfppl's Introduction

LLaMPPL + HuggingFace


LLaMPPL is a research prototype for language model probabilistic programming: specifying language generation tasks by writing probabilistic programs that combine calls to LLMs, symbolic program logic, and probabilistic conditioning. To solve these tasks, LLaMPPL uses a specialized sequential Monte Carlo inference algorithm. This technique, SMC steering, is described in our recent workshop abstract.

This repository implements LLaMPPL for use with HuggingFace Transformers.

Installation

If you just want to try out LLaMPPL, check out our demo notebook on Colab, which performs a simple constrained generation task using GPT-2. (Larger models may require more RAM or GPU resources than Colab's free version provides.)

Note

We use poetry to manage dependencies. If you don't have poetry installed, you can install it with pip install poetry.

To get started on your own machine, clone this repository and run poetry install to install hfppl and its dependencies.

git clone https://github.com/probcomp/hfppl
cd hfppl
poetry install

Then, try running an example. Note that this will cause the weights for Vicuna-7b-v1.5 to be downloaded.

poetry run python examples/hard_constraints.py

If everything is working, you should see the model generate political news using words that are at most five letters long (e.g., "Dr. Jill Biden may still be a year away from the White House but she is set to make her first trip to the U.N. today.").

Modeling with LLaMPPL

A LLaMPPL program is a subclass of the hfppl.Model class.

from hfppl import Model, LMContext, CachedCausalLM

# A LLaMPPL model subclasses the Model class
class MyModel(Model):

    # The __init__ method is used to process arguments
    # and initialize instance variables.
    def __init__(self, lm, prompt, forbidden_letter):
        super().__init__()

        # A stateful context object for the LLM, initialized with the prompt
        self.context = LMContext(lm, prompt)
        self.eos_token = lm.tokenizer.eos_token_id
        
        # The forbidden letter
        self.forbidden_tokens = set(i for (i, v) in enumerate(lm.vocab)
                                      if forbidden_letter in v)
    
    # The step method is used to perform a single 'step' of generation.
    # This might be a single token, a single phrase, or any other division.
    # Here, we generate one token at a time.
    async def step(self):
        # Condition on the next token *not* being a forbidden token.
        await self.observe(self.context.mask_dist(self.forbidden_tokens), False)
        
        # Sample the next token from the LLM -- automatically extends `self.context`.
        token = await self.sample(self.context.next_token())

        # Check for EOS or end of sentence
        if token.token_id == self.eos_token or str(token) in ['.', '!', '?']:
            # Finish generation
            self.finish()

    # To improve performance, a hint that `self.forbidden_tokens` is immutable
    def immutable_properties(self):
        return set(['forbidden_tokens'])

The Model class provides a number of useful methods for specifying a LLaMPPL program:

  • self.sample(dist[, proposal]) samples from the given distribution. Providing a proposal does not modify the task description, but can improve inference. In this example, one could provide a proposal that pre-emptively avoids the forbidden letter.
  • self.condition(cond) conditions on the given Boolean expression (see the sketch after this list).
  • self.finish() indicates that generation is complete.
  • self.observe(dist, obs) performs a form of 'soft conditioning' on the given distribution. It is equivalent to (but more efficient than) sampling a value v from dist and then immediately running condition(v == obs).
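
As a minimal illustration of self.condition (a sketch, not taken from the hfppl documentation), one could write a variant of MyModel whose step method enforces the word-length constraint directly; the class name and the five-character constraint here are hypothetical.

# Sketch only: a hypothetical variant of MyModel that uses `condition`
# to enforce a hard constraint (tokens of at most five characters).
class MyShortTokenModel(MyModel):
    async def step(self):
        # Sample the next token -- automatically extends `self.context`.
        token = await self.sample(self.context.next_token())

        # Hard constraint: particles that violate it receive zero weight.
        self.condition(len(str(token).strip()) <= 5)

        # Check for EOS or end of sentence
        if token.token_id == self.eos_token or str(token) in ['.', '!', '?']:
            self.finish()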

To run inference, we use the smc_steer or smc_standard methods:

import asyncio
from hfppl import smc_steer

# Initialize the HuggingFace model
lm = CachedCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", auth_token=<YOUR_HUGGINGFACE_API_TOKEN_HERE>)

# Create a model instance
model = MyModel(lm, "The weather today is expected to be", "e")

# Run inference
particles = asyncio.run(smc_steer(model, 5, 3)) # number of particles N, and beam factor K

Sample output:

sunny.
sunny and cool.
34° (81°F) in Chicago with winds at 5mph.
34° (81°F) in Chicago with winds at 2-9 mph.
hot and humid with a possibility of rain, which is not uncommon for this part of Mississippi.
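
The example above uses smc_steer, which implements the SMC steering algorithm described in the workshop abstract (N particles with beam factor K). smc_standard instead runs standard sequential Monte Carlo. A minimal sketch of calling it, assuming it takes the model instance and a particle count like smc_steer does:

from hfppl import smc_standard

# Standard SMC with 10 particles -- a sketch reusing `model` from above.
particles = asyncio.run(smc_standard(model, 10))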

Further documentation can be found at https://probcomp.github.io/hfppl.

hfppl's People

Contributors

alex-lew, gabegrand, postylem


hfppl's Issues

More examples?

Hi,

Great work.

Would it be possible to get more examples? I'd like to see how Infilling and Prompt Intersection are used.

I think it would really help my understanding.

Thank you!

Same generation for different N for K>1

I have a question about the role of N (the number of particles), and K (the factor) in the task of prompt intersection. I am trying to replicate Fig. 4 in the workshop paper, with the same 2 prompts.

I am observing that, for N>1 and K=1, the obtained continuations are different for different N. However, as soon as K>1, they are identical. I attach a couple of examples and my code for the model.

Does the fact that I chose batch_size=1 matter? I stop generations after 20 tokens.

Example N =2, K=1
20 and has never spoken to me. (Though we may be in the same lecture
compression of time and the expansion of space. Are you aware of the work of John Arch

Example N =2, K=2
19th century English physicist James Clerk Maxwell. His work on elect
19th century English physicist James Clerk Maxwell. His work on elect

import asyncio
import os
import string
import torch
from hfppl import CachedCausalLM
from hfppl import LMContext
from hfppl import Model
from hfppl import smc_standard, smc_steer
from hfppl.distributions import transformer



if "HF_AUTH_TOKEN" in os.environ:
    HF_AUTH_TOKEN = os.environ["HF_AUTH_TOKEN"]

# Load the language model.
# Mistral and Vicuna are open models; to use a model with restricted access, like LLaMA 2,
# pass your HuggingFace API key as the optional `auth_token` argument:
#LLM = CachedCausalLM.from_pretrained(
#    "meta-llama/Meta-Llama-3-8B", auth_token=HF_AUTH_TOKEN
#)
LLM = CachedCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
# LLM = CachedCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
LLM.batch_size = 1


class PromptIntersection(Model):
    # Initialize
    def __init__(self, prompts, max_tokens):
        super().__init__()
        self.s = ""
        self.prompts = prompts
        self.x = [LMContext(LLM, p)
                    for p in prompts]
        self.max_tokens = max_tokens

    # Generate
    async def step(self):
        w = await self.sample(self.x[0].next_token())

        # Reduce number of max tokens remaining
        self.max_tokens -= 1

        #(self.transformer(self.x[0]))
        for x in self.x[1:]:
            await self.observe(x.next_token(), w)

        if w == LLM.tokenizer.eos_token_id or self.max_tokens == 0:
            self.finish()
        else:
            self.s += w



prompts = ["My favorite physicist is probably ", "My favorite writer is probably "]


async def main():

    constraint_model = PromptIntersection(prompts, 20)
    particles = await smc_steer(constraint_model, 2, 3)
    for p in particles:
        print(f"{p.s}")


asyncio.run(main())

Removing `transformers==4.30` requirement

I haven't tested comprehensively as I don't know what error was coming up with transformers > 4.30, but I'm pretty sure that if line 238 of llms.py is changed from

logits = self.model(torch.tensor([[self.tokenizer.bos_token_id]]).to(self.model.device)).loss['logits'][0][0]

to

logits = self.model(torch.tensor([[self.tokenizer.bos_token_id]]).to(self.model.device)).logits[0][0]

that will allow the latest HF Transformers package to be installed. I've tested it, and it's working for me on transformers==4.35. Interestingly, this allows for use with Mistral's 7B model!

Submitting an issue instead of a PR for now as I haven't tested comprehensively when line 238 does and doesn't work with respect to HF transformers versions.

ValueError: probabilities do not sum to 1


I changed `.lm.s` to `lm`, which lets the example run, but then I hit a new issue:

/usr/local/lib/python3.10/dist-packages/hfppl/distributions/lmcontext.py in sample(self)
24 async def sample(self):
25 probs = np.exp(self.ctx.next_token_logprobs)
---> 26 token_id = np.random.choice(len(probs), p=(probs))
27 self.ctx.tokens.append(token_id)
28 logprob = self.ctx.next_token_logprobs[token_id]

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: probabilities do not sum to 1
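
For reference, np.random.choice raises this error when the probability vector does not sum to 1 within numpy's tolerance, which can happen from floating-point rounding after exponentiating log-probabilities. A generic workaround (a sketch, not an official fix from the maintainers) is to renormalize before sampling:

import numpy as np

# Generic sketch: renormalize so the probabilities sum to exactly 1 before
# calling np.random.choice, avoiding the "probabilities do not sum to 1" error.
def sample_token_id(next_token_logprobs):
    probs = np.exp(next_token_logprobs)
    probs = probs / probs.sum()  # correct tiny floating-point drift
    return int(np.random.choice(len(probs), p=probs))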

Request for further details on the Feynman-Kac formulae

Thank you for releasing this amazing repo - SMC steering looks very promising.

I think this will probably be the first time many DL engineers come across the Feynman-Kac formulae.

It'd be very helpful if you could share a supplementary note or a review accessible to someone with a CS background, to help us understand the Feynman-Kac formulae better - the reference in your paper (by Del Moral) is a dense, 600-page book!

I hope you think this is a reasonable request, and thank you again for sharing your work.

Rewriting the prompt_intersection.py example from the LLaMPPL repo with hfppl

I am attempting to rewrite the prompt_intersection.py example from the LLaMPPL repo.

My model has a list of LMContexts whose strings I am updating at the end of my step() function. But it looks like smc_steer() is generating all new tokens from the original prompts/contexts, not from the updated contexts as new tokens are appended.

Here is my code:

LLM = CachedCausalLM.from_pretrained("gpt2")
LLM.batch_size = 40

class PromptIntersection(Model):
    def __init__(self, prompts):
        super().__init__()
        self.contexts = [LMContext(LLM, "<|endoftext|>" + prompt) for prompt in prompts]

    async def step(self):

        # Sample from the first context; observe the same token under the other contexts
        token = await self.sample(Transformer(LLM, self.contexts[0].s),
                                  proposal=await self.locally_optimal_proposal())

        for context in self.contexts[1:]:
            await self.observe(Transformer(LLM, context.s), token)

        print(f'new token = {token}')

        # Check for eos
        if token == LLM.tokenizer.eos_token_id:
            self.finish()
            return

        # Update context
        for context in self.contexts:
            context.s += token
            print(f"New context = {context.s}")


    async def locally_optimal_proposal(self):

        logprobs = [context.next_token_logprobs for context in self.contexts]
        p_scores = sum(logprobs)
        q_logprobs = p_scores - hfppl.logsumexp(p_scores)
        return hfppl.TokenCategorical(LLM,q_logprobs)

The following code produces the output below (notice the print statements above) and does not seem to terminate (presumably because it never generates an EOS token, for the reason I suggested above).

Code:

prompts = ["My favorite writer is probably","My favorite physicist is probably"]
model = PromptIntersection(prompts)
await smc_steer(model, 1, 3)

Output:

<ipython-input-9-e3dde579facb>:13: RuntimeWarning: coroutine 'Model.observe' was never awaited
  self.observe(Transformer(LLM,context), token)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably the
New context = <|endoftext|>My favorite physicist is probably the
new token = Ġmy
New context = <|endoftext|>My favorite writer is probably my
New context = <|endoftext|>My favorite physicist is probably my
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably the
New context = <|endoftext|>My favorite physicist is probably the
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my the
New context = <|endoftext|>My favorite physicist is probably my the
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my the
New context = <|endoftext|>My favorite physicist is probably my the
new token = Ġa
New context = <|endoftext|>My favorite writer is probably my a
New context = <|endoftext|>My favorite physicist is probably my a
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my a the
New context = <|endoftext|>My favorite physicist is probably my a the
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my a the
New context = <|endoftext|>My favorite physicist is probably my a the
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my a the
New context = <|endoftext|>My favorite physicist is probably my a the
new token = ĠDavid
New context = <|endoftext|>My favorite writer is probably my a the David
New context = <|endoftext|>My favorite physicist is probably my a the David
new token = Ġthe
New context = <|endoftext|>My favorite writer is probably my a the the
New context = <|endoftext|>My favorite physicist is probably my a the the
new token = ĠJames
New context = <|endoftext|>My favorite writer is probably my a the James
New context = <|endoftext|>My favorite physicist is probably my a the James
...

Any other feedback on the code would be appreciated as well. In particular, I'm not sure if I should be appending the new token to each LMContext separately.

Thank you in advance for your help, I have high hopes about trying out prompt intersection for several use cases!

Example usage of `twist`

Could you provide an example of how to use Model.twist()? Currently, I'm calling it in step() with values between 0 and 1. I'd imagine that the effect size should be magnified more than this to get a bigger difference after the exp function is called (e.g., multiplying by 10 prior to calling twist())?

EDIT: I've got an example of steering the model with differentials of cosine similarity to different embeddings using twist() up and running here; this can probably be closed. Happy to open a PR to add it to the examples folder if you like.
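
For reference, a minimal sketch of the pattern described in this issue, under two assumptions not confirmed by the source: that self.twist(amt) adjusts the particle's log-weight by amt as a temporary heuristic, and that embedding_similarity is a hypothetical user-supplied function returning a score in [0, 1].

# Sketch only: `embedding_similarity` is hypothetical, and the factor of 10
# scales the heuristic so that exp(amount) meaningfully separates particles.
async def step(self):
    # Sample the next token as in the README example.
    token = await self.sample(self.context.next_token())

    # Temporarily reweight this particle by a scaled heuristic score.
    score = embedding_similarity(self.context)  # hypothetical helper
    self.twist(10.0 * score)

    if token.token_id == self.eos_token:
        self.finish()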
