
Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al. 2024)

Home Page: https://arxiv.org/abs/2401.12576


llmcheckup's Introduction

LLMCheckup


Dialogical Interpretability Tool for LLMs

💥Running with conda / virtualenv

Note: Please use Python 3.8+ and torch 2.0+

Create the environment and install dependencies

Conda

conda create -n llmcheckup python=3.9
conda activate llmcheckup

venv

python -m venv venv
source venv/bin/activate

⚙️Install the requirements

python -m pip install --upgrade pip
pip install -r requirements.txt
python -m nltk.downloader "averaged_perceptron_tagger" "wordnet" "omw-1.4"

🚀Launch system

python flask_app.py

💟Supported explainability methods

  • Feature Attribution
    • Attention, Integrated gradient, etc.
    • Implemented by the 🐛inseq package (see the minimal example after this list)
  • Semantic Similarity
  • Free-text rationalization
    • Zero-shot CoT
    • Plan-and-Solve
    • Optimization by PROmpting (OPRO)
    • Any custom additional prompt, according to the user's wishes
    • Note: the options above can be freely selected in the interface under "Prompt modification"
  • Data Augmentation
    • Implemented by NLPAug package or few-shot prompting
  • Counterfactual Generation
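
For reference, a minimal inseq attribution call looks roughly like the following sketch; the model name and input text are placeholders, not LLMCheckup's own configuration:

import inseq

# Load a generative model together with an attribution method,
# e.g. "attention" or "integrated_gradients"
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute the model's generation with respect to the input tokens
out = model.attribute("The claim is supported because")
out.show()  # render the attribution scores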

🤗Models:

In our study, we identified three LLMs for our purposes.

🐳Deployment:

We support different methods for deployment:

✏️Support:

Method          Unix-based   Windows
Original        ✅           ✅
GPTQ            ✅           ✅
bitsandbytes*   ✅           ✅
petals**        ✅           ❌

*: 🪟 For Windows: if you encounter errors while installing bitsandbytes, then try:

python -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

**: petals is currently not supported on Windows, since it relies on many Unix-specific features. See the issue here. petals is still usable if you run LLMCheckup in Docker or WSL 2.
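
For context, loading a model over the petals swarm on a Unix-based system (or in Docker/WSL 2) looks roughly like the following sketch, modeled on the petals quickstart; the model name is an example from the petals project, not necessarily one used by LLMCheckup:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Connect to the public swarm; only a small part of the model weights is downloaded locally
model_name = "petals-team/StableBeluga2"  # example checkpoint, not an LLMCheckup default
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))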

🔍Use case:

Fact checking

Dataset: COVID-Fact

Link: https://github.com/asaakyan/covidfact

Structure

{
    Claim: ...,
    Evidence: ...,
    Label: ...,
}

Commonsense Question Answering

Dataset: ECQA

Link: https://github.com/dair-iitd/ECQA-Dataset

Structure

{
    Question: ...,
    Multiple choices: ...,
    Correct answer: ...,
    Positive explanation: ...,
    Negative explanation: ...,
    Free-flow explanation: ...,
}

📝Input with multi modalities

  • Text
  • Image
    • Image upload
    • Optical Character Recognition
  • Audio
    • A lightweight fairseq S2T (speech-to-text) model from Meta (see the sketch after this list)
    • If you encounter the error soundfile.LibsndfileError: Error opening path_to_wav: Format not recognised. when reading recorded files, try installing ffmpeg.
      • 🐧On Linux: sudo apt install ffmpeg or pip3 install ffmpeg
      • 🪟On Windows: download ffmpeg from here and add its path to the system environment variables
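
A minimal sketch of transcribing a recorded WAV file with such a model via transformers is shown below; the exact checkpoint (facebook/s2t-small-librispeech-asr) and the use of librosa for loading audio are assumptions, not necessarily what LLMCheckup does internally:

import librosa
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration

checkpoint = "facebook/s2t-small-librispeech-asr"  # assumed checkpoint for illustration
processor = Speech2TextProcessor.from_pretrained(checkpoint)
model = Speech2TextForConditionalGeneration.from_pretrained(checkpoint)

# S2T models expect 16 kHz mono audio
speech, _ = librosa.load("recording.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])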

llmcheckup's People

Contributors

qiaw99, nfelnlp


llmcheckup's Issues

Refactor project structure

https://docs.python-guide.org/writing/structure/

This is a good starting point: you shouldn't have one single folder holding a dozen code files. Many files in a single folder make the code look unstructured and hard to oversee, and it hurts maintainability because the code isn't broken down into (mostly) independent modules.

I personally also like using the Java structure (even if abbreviated at times).
https://stackoverflow.com/questions/28160379/how-to-create-a-test-directory-in-intellij-13/28161314#28161314

Rename setters and getters

https://github.com/nfelnlp/LLMCheckup/blob/8cab97fde6ec7d54100daccf3628813fd3715a3d/logic/conversation.py#L31
update_name -> set_name; you could also use @property if you prefer that.

https://github.com/nfelnlp/LLMCheckup/blob/8cab97fde6ec7d54100daccf3628813fd3715a3d/logic/conversation.py#L35
update_contents -> set_contents

https://github.com/nfelnlp/LLMCheckup/blob/8cab97fde6ec7d54100daccf3628813fd3715a3d/logic/conversation.py#L39
update_type -> set_type

But setters and getters aren't really needed anyway: you're accessing a (I assume) public class attribute and you aren't doing any logic in the setters or getters. The class itself looks like a dataclass. If you don't know dataclasses, see https://docs.python.org/3/library/dataclasses.html. Dataclasses are really neat; think of them as the Python equivalent of structs.

Also https://github.com/nfelnlp/LLMCheckup/blob/8cab97fde6ec7d54100daccf3628813fd3715a3d/logic/conversation.py#L19C56-L19C56
The "kind" method argument is ambiguous; use type instead. But type is also ambiguous, so use conversation_type or something similar and rename the class property accordingly. A sketch of both suggestions follows below.

inseq assertion error

Prompt:

show me the 5 most important features for data point 730 by attention

Error trace:

  File "C:\Users\87290\DFKI\LLMCheckup\flask_app.py", line 267, in get_bot_response
    response = BOT.update_state(user_text, conversation)
  File "C:\Users\87290\DFKI\LLMCheckup\logic\core.py", line 536, in update_state
    returned_item = run_action(
  File "C:\Users\87290\DFKI\LLMCheckup\logic\action.py", line 49, in run_action
    action_return, action_status = actions[p_text](
  File "C:\Users\87290\DFKI\LLMCheckup\actions\explanation\feature_importance.py", line 130, in feature_importance_operation
    out_agg = out.aggregate(inseq.data.aggregator.SubwordAggregator)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\attribution.py", line 617, in aggregate
    aggregated.sequence_attributions[idx] = seq.aggregate(aggregator, **kwargs)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 249, in aggregate
    return aggregator.aggregate(
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 720, in aggregate
    return super().aggregate(attr, source_spans=source_spans, target_spans=target_spans, **kwargs)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 546, in aggregate
    return super().aggregate(attr, source_spans=source_spans, target_spans=target_spans, **kwargs)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 102, in aggregate
    cls.post_aggregate_hook(aggregated, **kwargs)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 337, in post_aggregate_hook
    cls.is_compatible(attr)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\data\aggregator.py", line 432, in is_compatible
    assert attr.target_attributions.shape[1] == attr.attr_pos_end - attr.attr_pos_start
AssertionError

Prompt:

primary features of data point 3444 by input gradient

Error trace:

  File "C:\Users\87290\DFKI\LLMCheckup\flask_app.py", line 267, in get_bot_response
    response = BOT.update_state(user_text, conversation)
  File "C:\Users\87290\DFKI\LLMCheckup\logic\core.py", line 536, in update_state
    returned_item = run_action(
  File "C:\Users\87290\DFKI\LLMCheckup\logic\action.py", line 49, in run_action
    action_return, action_status = actions[p_text](
  File "C:\Users\87290\DFKI\LLMCheckup\actions\explanation\feature_importance.py", line 120, in feature_importance_operation
    out = inseq_model.attribute(
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\models\attribution_model.py", line 445, in attribute
    attribution_outputs = attribution_method.prepare_and_attribute(
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\attr\attribution_decorators.py", line 71, in batched_wrapper
    out = f(self, *args, **kwargs)
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\attr\feat\feature_attribution.py", line 237, in prepare_and_attribute
    attribution_output = self.attribute(
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\attr\feat\feature_attribution.py", line 372, in attribute
    attr_pos_start, attr_pos_end = check_attribute_positions(
  File "C:\Users\87290\anaconda3\envs\llm\lib\site-packages\inseq\attr\feat\attribution_utils.py", line 85, in check_attribute_positions
    raise ValueError("Start and end attribution positions cannot be the same.")
ValueError: Start and end attribution positions cannot be the same.

Remove unnecessary directories

The directories /cache, /data shouldn't be uploaded as source files.

The cache can be generated on the fly, and the data should be available for download somewhere else?

Guard clauses

Show CUDA OOM error on interface

I tried running the default config, which uses a Llama-2-7B-chat-hf model, and ran into the common CUDA out-of-memory error. I wonder if we should show some sort of hint, as well as a recommendation, on the interface about this. Specifically, we could replace the standard response

I'm sorry but could you rephrase the message, please?

with

While decoding the response, I ran into a CUDA out-of-memory error. I suggest choosing a smaller model for your hardware configuration. You can do that by opening the global_config.gin file and editing the value of GlobalArgs.config to an equivalent config with a model of smaller parameter size, e.g. "ecqa_llama_gptq.gin" or "ecqa_pythia.gin".

[2024-01-09 07:29:27,018] INFO in core: getting grammar
[2024-01-09 07:29:27,018] INFO in core: About to decode
[2024-01-09 07:29:28,031] INFO in flask_app: Traceback getting bot response: Traceback (most recent call last):
  File "/home/nfel/PycharmProjects/LLMCheckup/flask_app.py", line 267, in get_bot_response
    response = BOT.update_state(user_text, conversation)
  File "/home/nfel/PycharmProjects/LLMCheckup/logic/core.py", line 572, in update_state
    parse_tree, parsed_text = self.compute_parse_text(text)
  File "/home/nfel/PycharmProjects/LLMCheckup/logic/core.py", line 376, in compute_parse_text
    parsed_text = self.mprompt_parser.parse_user_input(text)
  File "/home/nfel/PycharmProjects/LLMCheckup/parsing/multi_prompt/prompting_parser.py", line 173, in parse_user_input
    parsed_operation = self.generate_with_prompt(operation_type_prompt, user_input).replace("[E]", "").strip()
  File "/home/nfel/PycharmProjects/LLMCheckup/parsing/multi_prompt/prompting_parser.py", line 96, in generate_with_prompt
    outputs = self.decoder_model.generate(**inputs, generation_config=self.generation_config)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 672, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/nfel/PycharmProjects/LLMCheckup/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 383, in forward
    value_states = torch.cat([past_key_value[1], value_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 9.78 GiB total capacity; 5.66 GiB already allocated; 75.00 MiB free; 6.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

[2024-01-09 07:29:28,031] INFO in flask_app: Exception getting bot response: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 9.78 GiB total capacity; 5.66 GiB already allocated; 75.00 MiB free; 6.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
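
A minimal sketch of how such a hint could be surfaced; the function names (get_bot_response, decode_response) and the exact wording are assumptions, not the actual LLMCheckup code:

import torch

def get_bot_response(user_text, conversation):
    try:
        # decode_response is a hypothetical helper standing in for the parsing/decoding call
        return decode_response(user_text, conversation)
    except torch.cuda.OutOfMemoryError:
        # Surface an actionable hint instead of the generic fallback response
        return (
            "While decoding the response, I ran out of CUDA memory. "
            "Please choose a smaller model for your hardware configuration by opening "
            "global_config.gin and setting GlobalArgs.config to e.g. "
            '"ecqa_llama_gptq.gin" or "ecqa_pythia.gin".'
        )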

requirements.txt has version conflicts

ERROR: Cannot install -r .\requirements.txt (line 10), -r .\requirements.txt (line 73), -r .\requirements.txt (line 90) and huggingface-hub==0.4.0 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested huggingface-hub==0.4.0
datasets 2.10.1 depends on huggingface-hub<1.0.0 and >=0.2.0
sentence-transformers 2.2.0 depends on huggingface-hub
transformers 4.34.1 depends on huggingface-hub<1.0 and >=0.16.4

To replicate:
Download the repository into a fresh venv and execute
pip install -r requirements.txt

Why use gin?

What is wrong with using json / ini files? They are much more popular and easier to read at a glance.

I have discussed this with @nfelnlp and he seems to understand my point but I would genuinely like to know why you've decided against writing a small loader.
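
For comparison, a small JSON-based loader could be as simple as the following sketch; the GlobalArgs fields are assumptions, not the actual keys used in global_config.gin:

import json
from dataclasses import dataclass

@dataclass
class GlobalArgs:
    config: str   # e.g. "ecqa_llama_gptq.gin" in the gin-based setup (assumed field)
    seed: int = 0  # assumed field for illustration

def load_config(path: str) -> GlobalArgs:
    with open(path) as f:
        return GlobalArgs(**json.load(f))

# Usage: args = load_config("global_config.json")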
