x-plug / mplug-owl
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
Home Page: https://www.modelscope.cn/studios/damo/mPLUG-Owl
License: MIT License
Thank you for your contribution!
I tried to make a custom image pair dataset as:
{"image": ["image1.jpg","image2.jpg"], "text": "The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: \nHuman: \nHuman: Can you compare the different between these two images?\nAI: xxxx", "task_type": "xxx"}
However, the training loss is always NaN.
How can I train a custom image pair dataset, or how did you train your video data?
Thank you so much!
According to the comment in the config file (# [50997(alpaca), 155562(llava), 53456(quora), 101466(sharegpt)] 361481), the size of each dataset differs from that of the original dataset.
Is there any code or script to filter the data?
Could you add the minimum resource requirements to the README? I think many people are curious about them.
What is the network architecture of the visual abstractor?
Hi there,
I've been using mPLUG-Owl and noticed a significant difference in inference speed compared to other models such as Otter and multimodal-GPT; it is even faster than Vicuna and LLaMA. I'm curious about the reason behind this gap.
Could you shed some light on the factors or optimizations that give mPLUG-Owl this speed advantage over these models? :)
In Fig. 2 of the paper, the Abstractor is drawn as frozen in stage 2, but both the paper's description and the code seem to train it?
I'm not sure how to pass a video into the model using the interface example you provided. Looking forward to your help, thanks!
Dear authors,
I want to run the first-stage training on my own caption data. Have you provided the training script for it in this repo?
I can only find instructions related to the second training stage in README.md.
Thanks for your help!
Have you guys tested fine-tuning the whole LLaMA decoder in the fine-tuning stage instead of using LoRA? Curious what findings or insights y'all might have there, since I didn't see it included in the paper.
1. One error that needs to be fixed
When installing apex, there are 4 errors about converting unsigned long to long. You need to edit:
(1) line 65 in apex_22.01_pp/csrc/mlp.cpp
auto reserved_space = at::empty({reserved_size}, inputs[0].type());
change to:
auto reserved_space = at::empty({static_cast<long>(reserved_size)}, inputs[0].type());
(2) line 138 in apex_22.01_pp/csrc/mlp.cpp
auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());
change to:
auto work_space = at::empty({static_cast<long>(work_size / sizeof(scalar_t))}, inputs[0].type());
Alternatively, change the compile options.
2. One improvement for reducing CUDA memory usage
When launching owl_demo.py on a 16 GB GPU, I ran into a CUDA out-of-memory error, so I edited lines 33 and 34 in interface.py:
model = model.to(device)
model = model.to(dtype)
change to:
model = model.to(dtype)
model = model.to(device)
After the demo starts, memory usage is about 14 GB, so it runs well on a 16 GB GPU.
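A rough weights-only estimate suggests why the order matters: calling .to(device) first copies the full-precision weights onto the GPU before they are cast. The sketch below assumes the model is built in float32 before the cast and has about 7 billion parameters (a round number suggested by the checkpoint name mplug-owl-llama-7b, not an exact count). It ignores activations, the CUDA context, and quantization overhead. The helper name is hypothetical.

```python
# Weights-only GPU memory estimate; activations and CUDA context are ignored.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GiB for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 7e9  # assumed parameter count (round number, not exact)
    for dtype in ("fp32", "bf16", "int8"):
        print(f"{dtype}: {weight_memory_gib(n_params, dtype):.1f} GiB")
```

Under these assumptions the script prints about 26.1 GiB for fp32, 13.0 GiB for bf16 and 6.5 GiB for int8. Moving fp32 weights to a 16 GB card first would therefore overflow it, while the bf16 figure is consistent with the ~14 GB observed once runtime overhead is added.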
I ran train_it.sh on my own data and got the following files:
optimizer.pt rng_state_0.pth rng_state_2.pth rng_state_4.pth rng_state_6.pth scheduler.pt training_args.bin
pytorch_model.bin rng_state_1.pth rng_state_3.pth rng_state_5.pth rng_state_7.pth trainer_state.json
How do I use this model?
I cannot start stage-2 instruction tuning on a single A100 with 40 GB of VRAM.
Thanks!
I appreciate the great work! But when I tried to deploy the model on my local machine, nothing happened after I clicked the download button for the weights. Could the weights be hosted on a different platform as well?
According to the paper, the first training stage uses 104 billion tokens. Since captions are short, I assume each caption has 20 tokens: 104B / 20 = 5,200M captions, which seems remarkably large. My calculation may be wrong; would you mind explaining how many captions you used during the first training stage? Thanks in advance.
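The estimate scales inversely with the assumed caption length, and it treats every token as belonging to a distinct caption seen once. If the 104B figure counts tokens processed over several passes through the data (an assumption, not something stated here), the number of distinct captions would be proportionally smaller. A small sketch of the arithmetic, with a hypothetical helper name and hypothetical caption lengths:

```python
def estimated_captions(total_tokens: float, tokens_per_caption: float) -> float:
    """Captions implied by a token budget, assuming each caption is counted once."""
    if tokens_per_caption <= 0:
        raise ValueError("tokens_per_caption must be positive")
    return total_tokens / tokens_per_caption


if __name__ == "__main__":
    total_tokens = 104e9  # figure quoted from the paper in the question above
    for length in (20, 40, 80):  # assumed average caption lengths (hypothetical)
        millions = estimated_captions(total_tokens, length) / 1e6
        print(f"{length} tokens/caption -> {millions:,.0f}M captions")
```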
Hi, it seems like the checkpoint links for both the pre-trained and the instruction fine-tuned models are the same in the readme. They both point to http://mm-chatgpt.oss-cn-zhangjiakou.aliyuncs.com/mplug_owl_demo/released_checkpoint/pretrained.pth.
Is this the pre-trained or the instruction fine-tuned model? I'm assuming since this is the checkpoint used in the demo, it is the instruction fine-tuned model?
conda create -n mplug_owl python=3.10
conda activate mplug_owl
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
I strictly followed the installation instructions above.
python -m serve.web_server --base-model 'pretrained_weights/mplug-owl-llama-7b' --port 8501 --bf16
But when I tried to run a simple example, I got the following error:
Traceback (most recent call last):
File "/data/pcl/proj/mPLUG-Owl/serve/model_utils.py", line 64, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/data/pcl/proj/mPLUG-Owl/serve/model_worker.py", line 110, in generate_with_callback
self.model.generate(**kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 1524, in generate
query_outputs = self.abstractor(
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 1092, in forward
encoder_outputs = self.encoder(
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 936, in forward
layer_outputs = layer_module(
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 874, in forward
cross_attention_outputs = self.crossattention(
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 846, in forward
attention_output = self.output(self_outputs[0], hidden_states)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 791, in forward
input_tensor = input_tensor + self.mlp(self.norm2(input_tensor))
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 679, in forward
hidden_states = self.ffn_ln(hidden_states)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/pcl/proj/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 158, in forward
output = torch.nn.functional.layer_norm(
File "/data/pcl/miniconda3/envs/ovd2/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected weight to be of same shape as normalized_shape, but got weight of shape [2816] and normalized_shape = [2048]
Any help would be greatly appreciated!
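The error says a LayerNorm built with normalized_shape [2048] received a weight of shape [2816]. A likely but unconfirmed cause is that the model code or config and the checkpoint come from different versions, so a parameter ended up in a module of the wrong size. A hypothetical, framework-free helper for locating such mismatches compares parameter shapes by name:

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(expected: Dict[str, Shape], loaded: Dict[str, Shape]) -> List[str]:
    """List parameters whose shapes differ between a model and a checkpoint.

    Both dicts map parameter names to shapes. With PyTorch they could be built as
    {name: tuple(t.shape) for name, t in state_dict.items()}, once from
    model.state_dict() and once from the loaded checkpoint dict.
    """
    problems = []
    for name, shape in expected.items():
        if name not in loaded:
            problems.append(f"missing in checkpoint: {name}")
        elif tuple(loaded[name]) != tuple(shape):
            problems.append(f"{name}: model expects {shape}, checkpoint has {loaded[name]}")
    for name in sorted(loaded.keys() - expected.keys()):
        problems.append(f"unexpected in checkpoint: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical parameter name; the shapes mirror the error message above.
    model_shapes = {"abstractor.ffn_ln.weight": (2048,)}
    ckpt_shapes = {"abstractor.ffn_ln.weight": (2816,)}
    for line in find_shape_mismatches(model_shapes, ckpt_shapes):
        print(line)
```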
Hi, thank you for sharing your awesome work. I have been trying to run the model locally, but the apex dependency and its reliance on specific CUDA versions make setting up the environment on older GPUs tricky. I'd really appreciate it if you could remove the apex dependency soon. Thanks!
First of all, thanks for your great work.
From the paper, I see learnable queries in the visual abstractor, which I think may be similar to the Perceiver Resampler in Flamingo or the Q-Former in BLIP-2. But I can't find the implementation of the learnable queries in your code (mPLUG_OwlVisualAbstractorEncoder and mPLUG_OwlVisualAbstractorModel in modeling_mplug_owl.py).
I am curious about the details of the visual abstractor. In other words, is it similar to Q-Former or Perceiver? These details are not in your paper, and I can't find them in the code.
Thanks again.
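For readers wondering what "learnable queries" usually means in this family of designs, here is a generic single-head sketch in NumPy: a fixed set of trainable query vectors cross-attends to the visual features, so the output length equals the number of queries regardless of how many patches come in. It illustrates the Perceiver Resampler / Q-Former idea in general, not mPLUG-Owl's implementation; the class name and sizes are hypothetical, and real implementations add multiple heads, normalization, residual connections and feed-forward layers.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LearnableQueryCrossAttention:
    """Single-head cross-attention from trainable queries to visual features (illustrative)."""

    def __init__(self, num_queries: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        std = dim ** -0.5
        self.queries = rng.normal(0.0, 0.02, (num_queries, dim))  # learned during training
        self.w_q = rng.normal(0.0, std, (dim, dim))
        self.w_k = rng.normal(0.0, std, (dim, dim))
        self.w_v = rng.normal(0.0, std, (dim, dim))

    def __call__(self, visual_features: np.ndarray) -> np.ndarray:
        """Map (num_patches, dim) visual features to (num_queries, dim)."""
        q = self.queries @ self.w_q                      # (num_queries, dim)
        k = visual_features @ self.w_k                   # (num_patches, dim)
        v = visual_features @ self.w_v                   # (num_patches, dim)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (num_queries, num_patches)
        return attn @ v                                  # (num_queries, dim)


if __name__ == "__main__":
    layer = LearnableQueryCrossAttention(num_queries=8, dim=32)
    patches = np.random.default_rng(1).normal(size=(50, 32))  # e.g. 50 patch embeddings
    print(layer(patches).shape)  # (8, 32)
```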
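Thank you for your work and for the effort of releasing it on Hugging Face; it's really great.
BLIP-2 documents this in its README (https://github.com/salesforce/LAVIS/tree/main/projects/blip2); you could refer to the bottom of that page.
Alternatively, could you add instructions for using mPLUG-Owl through Hugging Face? I searched Hugging Face for a long time without finding any. I only found https://huggingface.co/MAGAer13/mplug-owl-llama-7b/discussions and
from transformers import AutoModel
model = AutoModel.from_pretrained("MAGAer13/mplug-owl-llama-7b")
but I couldn't find how to run inference.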
Hi, is it possible to get the tokenwise log-likelihood scores of different outputs from the model?
The use-case would be something like:
Given an interleaved image/text input and a list of output text candidates, we should be able to get a score for each output candidate and then return their ranked list, rather than generating the outputs directly. This would be close to how LLMs are evaluated on MCQ tasks. An example from the T0 paper Page 6 (https://arxiv.org/pdf/2110.08207.pdf):
For tasks that involve choosing the correct completion from several options (e.g. multiple choice
question answering), we follow Brown et al. (2020) and use rank classification to evaluate our
model: we compute the log-likelihood of each of the target options under the fine-tuned model and
select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply
length normalization to the log-likelihoods of the target options.
Is it straightforward to do this with mPLUG-Owl? I assume since the LM is built with transformers there should be a possibility to use output score functions already implemented (haven't dug into this yet)?
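Rank classification as described in the T0 quote only needs the per-token log-probabilities of each candidate given the prompt. Below is a framework-free sketch of the scoring and ranking steps; obtaining the logits from mPLUG-Owl is assumed rather than shown. With a transformers-style causal LM one would typically run a forward pass on prompt plus candidate and take the logits at the position just before each candidate token, but whether mPLUG-Owl's forward pass exposes logits this way (with images) has not been checked here. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def candidate_log_likelihood(step_logits: Sequence[Sequence[float]], target_ids: Sequence[int],
                             length_normalize: bool = False) -> float:
    """Sum of log p(token_t | prompt, earlier candidate tokens) over a candidate.

    step_logits[t] must be the logits that predict target_ids[t]; for a causal LM
    that is the output at the position before that token.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need one logits vector per target token")
    total = sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, target_ids))
    # T0 skips length normalization, so it is off by default here.
    return total / len(target_ids) if length_normalize else total


def rank_candidates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, highest log-likelihood first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy single-token candidates scored against one logits vector (vocabulary of size 3).
    logits = [[2.0, 0.5, 0.1]]
    scores = {"yes": candidate_log_likelihood(logits, [0]),
              "no": candidate_log_likelihood(logits, [1])}
    print(rank_candidates(scores))
```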
Thank you for the amazing work and releasing such concise code. I had a question about how to do model inference with multi-image inputs. Expanding on your provided inference code, something like this comes to mind:
# We use a human/AI template to organize the context as a multi-turn conversation.
# <image> denotes an image placeholder.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: <image>
Human: Is the colour of the animal in the first image the same as the second image?
AI: ''']
# The image paths should be placed in the image_list and kept in the same order as in the prompts.
# We support URLs, local file paths and base64 strings. You can customize the image preprocessing by modifying mplug_owl.modeling_mplug_owl.ImageProcessor
image_list = ['https://xxx.com/image_1.jpg', 'https://xxx.com/image_2.jpg',]
Could you please confirm if this is the right way to do multi-image inference with the model?
Thanks!
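Because images are matched to <image> placeholders by order, a cheap sanity check before calling the processor is to confirm that the counts agree. The helper below is hypothetical, not part of mPLUG-Owl, and it can only check counts, not ordering. The same check may be worth running over custom training records like the image-pair sample near the top of this page, whose text, as shown, contains no <image> placeholders for its two images.

```python
from typing import Sequence

IMAGE_TOKEN = "<image>"  # placeholder string used in the prompt template above


def check_image_placeholders(prompt: str, images: Sequence[str]) -> None:
    """Raise ValueError if the number of <image> placeholders differs from len(images)."""
    n_placeholders = prompt.count(IMAGE_TOKEN)
    if n_placeholders != len(images):
        raise ValueError(
            f"prompt has {n_placeholders} {IMAGE_TOKEN} placeholder(s) "
            f"but {len(images)} image(s) were given"
        )


if __name__ == "__main__":
    prompt = "Human: <image>\nHuman: <image>\nHuman: Is the colour the same?\nAI: "
    check_image_placeholders(prompt, ["image_1.jpg", "image_2.jpg"])  # passes silently
```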
Just a reminder that MiniGPT-4 is missing from your related projects, even though you used some of the same test images as MiniGPT-4 and treat MiniGPT-4 as a comparison baseline.
Dear authors,
Thank you for the great work and open sourcing.
I noticed that you hold out about 1k validation samples for alpaca, llava, quora, and sharegpt (#39). Since I want to keep the same setting in my experiments, would you mind sharing the splits with me?
I've used the v0 checkpoint for some experiments.
See mPLUG-Owl/server_mplug/utils.py, lines 139 and 123 (commit cc770c2).
As title.
Unfortunately, I ran into another issue with dependency conflicts.
There's an open PR that bumps torch to 1.13.1. However, torchvision==0.13.1 is not compatible with torch==1.13.1.
What version of torchvision would you recommend?
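Each torchvision release is built against one specific torch release, so the two have to be pinned together. The lookup below is a hypothetical helper that records only the two pairings relevant here: torch 1.13.1 with torchvision 0.14.1 (the pairing used in the conda command in the installation steps on this page) and torch 1.12.1 with torchvision 0.13.1. For other versions, PyTorch's published compatibility table is the reference.

```python
# Only the two pairings discussed on this page; extend from the official table as needed.
TORCHVISION_FOR_TORCH = {
    "1.12.1": "0.13.1",
    "1.13.1": "0.14.1",
}


def compatible_torchvision(torch_version: str) -> str:
    """Return the torchvision version paired with a torch release."""
    base = torch_version.split("+", 1)[0]  # drop a local build tag such as "+cu117"
    try:
        return TORCHVISION_FOR_TORCH[base]
    except KeyError:
        raise ValueError(f"no pairing recorded for torch {torch_version}") from None


if __name__ == "__main__":
    print(compatible_torchvision("1.13.1+cu117"))  # 0.14.1
```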
Did you use open-source data for the video dataset, or did you collect it yourselves?
interface.py cannot load a model fine-tuned with LoRA.
How many GPUs (V100 or A100) are required?
Is it going to be released in the next version?
There are conflicts between the dependencies specified in the README and env.yaml.
In the README, torchvision is not specified and torch is pinned at 1.13.1. Further, peft is not listed as a dependency in env.yaml.
PyTorch=1.13.1 (1.13.1 is required by the peft)
However, when I tried to run the demo with:
python -m server_mplug.owl_demo --debug --port 6363 --checkpoint_path 'your checkpoint path' --tokenizer_path 'your tokenizer path'
I got an import error for torchvision.
I then referred to env.yaml for dependencies, but there torch is pinned at 1.12.1:
- pytorch=1.12.1=py3.10_cuda11.3_cudnn8.3.2_0
This conflicts with the version specified in the README.
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option / --install-option. Consider using --config-settings for more flexibility.
DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at pypa/pip#11453
Processing c:\users\haose\github\mplug-owl-main\apex
Running command python setup.py egg_info
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\haose\Github\mPLUG-Owl-main\apex\setup.py", line 130, in
_, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
File "C:\Users\haose\Github\mPLUG-Owl-main\apex\setup.py", line 17, in get_cuda_bare_metal_version
raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
torch.__version__ = 2.0.0+cu117
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: 'C:\Users\haose\anaconda3\envs\owl\python.exe' -c '<pip setuptools shim>' egg_info --egg-base 'C:\Users\haose\AppData\Local\Temp\pip-pip-egg-info-h58h_b5f'
cwd: C:\Users\haose\Github\mPLUG-Owl-main\apex
Preparing metadata (setup.py) ... error
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
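In the traceback, apex's setup.py builds the nvcc path as cuda_dir + "/bin/nvcc" and cuda_dir is None, which suggests that no CUDA toolkit location was found (the value appears to come from PyTorch's CUDA_HOME lookup). A quick pre-install check can show whether the toolkit is visible before running pip. The helper below is hypothetical and only roughly mirrors the usual lookup order (CUDA_HOME, then CUDA_PATH, then nvcc on PATH); it is not apex's or PyTorch's code.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Best-effort guess at the CUDA toolkit root; returns None if nothing is found."""
    env = os.environ if env is None else env
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value:
            return value
    nvcc = shutil.which("nvcc")  # searches the real PATH, not the env argument
    if nvcc:
        # nvcc lives in <toolkit>/bin/, so strip two path components.
        return os.path.dirname(os.path.dirname(nvcc))
    return None


if __name__ == "__main__":
    print(find_cuda_home() or "CUDA toolkit not found: set CUDA_HOME or install the toolkit")
```

If this prints the "not found" message, installing a CUDA toolkit that matches the CUDA version PyTorch was built with (cu117 above, i.e. CUDA 11.7) and setting CUDA_HOME before running pip is the usual first step.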
Hi.
Thanks for this great work.
I've used the Hugging Face demo to generate descriptions for some images with the following prompt:
Describe this image as detailed as possible.
I also ran the 8-bit model in Colab. This is the code I used to generate the descriptions:
import torch
from PIL import Image
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor
pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
# 8-bit weights with float16 for the rest, placed automatically across devices
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    load_in_8bit=True,
    torch_dtype=torch.half,
    device_map='auto',
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Describe this image as detailed as possible.
AI: ''']
image_list = ["/path_to_image"]
# do_sample=True means the output can differ from run to run.
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512,
}
# Convert to RGB so images with an alpha channel or a palette have 3 channels.
images = [Image.open(path).convert('RGB') for path in image_list]
inputs = processor(text=prompts, images=images, return_tensors='pt')
# Cast float inputs to float16 to match torch_dtype=torch.half above.
inputs = {k: v.half() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
However, the results from the Hugging Face demo differ from those of the model run locally.
For example, Huggingface will describe an image as:
The painting depicts a woman with her arms outstretched and wearing a red dress, standing in front of a brightly colored background with a vibrant rainbow-like design. The woman's pose appears confident and dynamic, as if she is ready to embrace the colorful surroundings.
There are several other objects in the scene, including a potted plant located on the left side of the painting, a handbag situated near the bottom right corner, and a cup placed towards the right side. Additionally, there is a bowl on a stand near her right foot and another bow on her left arm, adding to the artwork' s vivid appearance.
But when I run the model on colab, for the same image, I obtain the following description:
The image is a painting featuring a colorful dog with a purple and green background. The dog's body is in the middle of the painting, while its head appears at the left side of the picture, slightly turned to the right. Its fur is a mix of purple, green, and brown, giving it a vibrant appearance. There are a few more dogs present in the background, but their focus is not as prominent as the main subject's. The background consists of various colors, including red, blue, yellow, orange, white, and purple, creating a visually engaging and lively composition. The overall painting has a cheerful and playful mood.
The second description is wrong: there are no dogs in the image. Many of the descriptions generated in Colab are completely off. Am I doing something wrong? Could it be because the model is loaded differently?
I also noticed that even the Hugging Face demo hallucinates, describing elements that are not present in the image. For example, the first description mentions a handbag, a cup and a bowl, none of which are in the image. Similarly, given an image of a statue, it describes the statue as surrounded by admiring people when there are no people or crowds in the image at all.
Is there a way to control the hallucinations?
And why are the results so different between environments (Hugging Face demo vs. Colab)?
I apologize for the long post.
Any help is greatly appreciated.
Thank you!
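One plausible contributor, not a confirmed diagnosis, is that the script above samples (do_sample=True, top_k=5), so repeated runs can produce different descriptions even in the same environment. The 8-bit quantization used in Colab may also shift the model's probabilities relative to the demo, whose settings are unknown here. The sketch below contrasts greedy decoding with top-k sampling on one step of toy logits; the function is hypothetical and is not the transformers implementation.

```python
import math
import random
from typing import Optional, Sequence


def pick_next_token(logits: Sequence[float], do_sample: bool, top_k: int = 5,
                    rng: Optional[random.Random] = None) -> int:
    """Choose the next token id greedily or by top-k sampling."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy: always the argmax
    if rng is None:
        rng = random.Random()
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]  # softmax restricted to the top-k
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [3.0, 2.5, 0.1, -1.0]  # toy next-token scores
    print("greedy: ", [pick_next_token(logits, do_sample=False) for _ in range(5)])
    print("sampled:", [pick_next_token(logits, do_sample=True, top_k=2) for _ in range(5)])
```

Greedy decoding always returns the same token, while sampling varies unless the random generator is seeded. In transformers, passing do_sample=False to generate selects greedy decoding, and transformers.set_seed can make sampled runs repeatable in a fixed environment.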
Hi. Thanks for providing the code for Hugging Face.
I am trying to use the following code in Colab, but the session crashes because it runs out of RAM.
I am using Colab Pro with the high-RAM setup (25 GB of RAM) and a T4 GPU, but the session still crashes.
# Load via Huggingface Style
import torch
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor
pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
pretrained_ckpt,
torch_dtype=torch.bfloat16,
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)
The README mentions that the offline demo can run inference on a single 16 GB T4 GPU with 8-bit support.
How can I do this in Colab?
Thank you!
I tried to create the environment with conda env create -f env.yaml, but it failed with the following error:
Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 2.10.1 Requires-Python <3; 2.11.0 Requires-Python <3; 2.11.1 Requires-Python <3; 2.4.0 Requires-Python <3; 2.4.1 Requires-Python <3; 2.4.2 Requires-Python <3; 2.4.3 Requires-Python <3; 2.4.4 Requires-Python <3; 2.5.0 Requires-Python <3; 2.5.1 Requires-Python <3; 2.5.2 Requires-Python <3; 2.6.0 Requires-Python <3; 2.6.1 Requires-Python <3; 2.6.2 Requires-Python <3; 2.7.0 Requires-Python <3; 2.7.2 Requires-Python <3; 2.8.0 Requires-Python <3; 2.8.1 Requires-Python <3; 2.8.2 Requires-Python <3; 2.8.3 Requires-Python <3; 2.8.4 Requires-Python <3; 2.8.5 Requires-Python <3; 2.8.6 Requires-Python <3; 2.8.7 Requires-Python <3; 2.9.2 Requires-Python <3
ERROR: Could not find a version that satisfies the requirement apex==0.1 (from versions: 0.9.8dev.linux-i686, 0.9.8.dev0, 0.9.8a0.dev0, 0.9.9.dev0, 0.9.10.dev0)
ERROR: No matching distribution found for apex==0.1
failed
CondaEnvException: Pip failed
I can input video in the Hugging Face demo, but I can't find any video processing in the code. Are you only sampling 4 frames of the video in the front end and feeding them into the model as images? This is very important to me; please let me know, thanks!
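How the demo actually turns a video into model input is exactly what is being asked and is not confirmed here. For illustration, one common approach is to pick a fixed number of evenly spaced frames; the hypothetical helper below returns the indices of 4 such frames, taking the middle of each of 4 equal segments. Feeding the resulting frames in as separate images is an assumption, not documented behavior.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_samples: int = 4) -> List[int]:
    """Indices of num_samples frames spread evenly over a video of num_frames frames."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Middle of each equal-length segment, clamped to the last valid frame.
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(num_samples)]


if __name__ == "__main__":
    print(uniform_frame_indices(100))  # [12, 37, 62, 87]
```

Very short clips simply repeat frames (a 2-frame video yields [0, 0, 1, 1]), which keeps the number of images fixed.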
Hi team, thanks for your hard work; mPLUG-Owl demonstrates great performance!
While reading the paper, I found that Section 4.1 (Experimental Setup, Data and Training Details) says: "we gather pure text instruction data from three distinct sources: 102k data from the Alpaca [Taori et al., 2023], 90k from the Vicuna [Vicuna, 2023], and 50k from the Baize [Xu et al., 2023a]."
However, to my knowledge
Would you mind sharing more information about the datasets? Thanks a lot!
How much GPU memory does mPLUG-Owl need, and how does it compare with the 13B MiniGPT-4 model?
env.yaml depends on apex, and apex has to be built against the torch version pinned in env.yaml, so the dependencies are not easy to install.
Hello, how can I run instruction tuning on multiple GPUs on a single machine? I have four 3090 GPUs; can I instruction-tune mPLUG-Owl on my machine?
After setting up the environment following the instructions, I tried:
from interface import get_model
model, tokenizer, img_processor = get_model(
    checkpoint_path='checkpoint path', tokenizer_path='tokenizer path')
and got:
Traceback (most recent call last):
File "/mPLUG-Owl/try.py", line 3, in <module>
model, tokenizer, img_processor = get_model(
File "/mPLUG-Owl/interface.py", line 15, in get_model
model = mPLUG_OwlForConditionalGeneration(config=config)
File "/mPLUG-Owl/mplug_owl/modeling_mplug_owl.py", line 973, in __init__
self.vision_model = CLIPVisionTransformer(config.vision_config)
File "/mPLUG-Owl/clip/modeling_clip.py", line 972, in __init__
self.pre_layernorm = MixedFusedLayerNorm(embed_dim, eps=config.layer_norm_eps)
File "/anaconda3/envs/mplug_owl/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 212, in __init__
super().__init__(normalized_shape=normalized_shape, eps=eps, elementwise_affine=True)
File "/anaconda3/envs/mplug_owl/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 166, in __init__
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File "/anaconda3/envs/mplug_owl/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
File "<frozen importlib._bootstrap>", line 571, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1176, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /anaconda3/envs/mplug_owl/lib/python3.10/site-packages/fused_layer_norm_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
How to fix it? Thanks for your help.