Comments (23)

dennydream commented on August 17, 2024

I got the same messages.

TechVentureBuilder commented on August 17, 2024

(venv) my@laptop:~/privateGPT$ python privateGPT.py
llama.cpp: loading model from ./models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 512.00 MB
AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
Illegal instruction (core dumped)
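
Worth noting: this log prints AVX2 = 0 just before the crash. "Illegal instruction" usually means the binary uses CPU instructions the host doesn't support, so a llama.cpp build compiled with AVX2 enabled would fail exactly like this on this machine. A quick Linux-only check of what the CPU actually advertises (a sketch, reading /proc/cpuinfo):

```python
# Sketch: list the SIMD extensions this CPU advertises (Linux-only).
# Compare with the feature line llama.cpp prints at startup.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for ext in ("sse3", "avx", "avx2", "fma", "f16c", "avx512f"):
    print(f"{ext:8s} {'yes' if ext in flags else 'NO'}")
```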

manojkr19 commented on August 17, 2024

I get the same error, and the embedding process takes a while for the state of the union doc. My machine heats up and the fan blows at full speed. I have an M2 Max with 38 cores and 64 GB of memory. Inference also takes a long time, and it crashes after 2-3 questions.

toninog commented on August 17, 2024

Some of the errors here relate to memory, as in the host system not having enough of it. When you see "Killed", the kernel has killed the process (python) due to lack of memory.

Run dmesg or check /var/log/messages for more information.
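
If you suspect the OOM killer, the kernel log records it. A small helper along those lines (a sketch; dmesg may require root, and /var/log/messages doesn't exist on every distro):

```python
import subprocess

def oom_evidence():
    """Pull kernel-log lines that look like OOM-killer activity."""
    try:
        out = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        with open("/var/log/messages") as f:  # fallback; path varies by distro
            out = f.read()
    return [l for l in out.splitlines()
            if "oom-killer" in l.lower() or "Killed process" in l]

for line in oom_evidence():
    print(line)
```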

imartinez commented on August 17, 2024

We are not using llama.cpp as the embeddings model anymore.
Plus, ingest got a LOT faster with the use of the new embeddings model #224

Note: this is a breaking change; any existing database will stop working with the new changes, so you'll need to re-ingest your docs. That is recommended anyway, as the process is faster and the results are better.
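
For context, the new pipeline embeds chunks with a local sentence-transformers model instead of llama.cpp. A minimal sketch of that shape (the model name all-MiniLM-L6-v2 is an assumption for illustration; see #224 for the actual change):

```python
# Sketch of embedding document chunks with a local sentence-transformers
# model (pip install sentence-transformers). Model choice is illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["first chunk of a document", "second chunk of a document"]
vectors = model.encode(chunks)   # one dense vector per chunk
print(vectors.shape)             # e.g. (2, 384) for this model
```

Because the embedding model (and thus the vectors) changed, embeddings in an existing database no longer match, which is why re-ingestion is required.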

toninog commented on August 17, 2024

Is this the full output? If not, please post the full output.

rexzhang2023 commented on August 17, 2024

(privategpt) root@alienware17B:/home/rex/privateGPT# python privateGPT.py
llama.cpp: loading model from ./models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 2052.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 512.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
gptj_model_load: loading model from './models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
Enter a query:

timonweb commented on August 17, 2024

I got the same errors.

alxspiker commented on August 17, 2024

(quoting rexzhang2023's full output above, ending with "Enter a query:")

That seems to be working; look at "Enter a query:". I always get the mmap message because I use old llama 7B and 13B models. I just ignore it.
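
For what it's worth, the file's leading magic bytes tell you which ggml container a model uses; a minimal sketch (magic constants as used by llama.cpp builds of this era; verify them against your checkout):

```python
import struct

# Known llama.cpp file magics (little-endian uint32 at offset 0).
MAGICS = {
    0x67676D6C: "ggml (unversioned; no mmap, old tokenizer)",
    0x67676D66: "ggmf (versioned; still no mmap)",
    0x67676A74: "ggjt (mmap-capable; version must also be current)",
}

def model_format(path):
    """Return a human-readable description of a ggml model file's container."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown magic 0x{magic:08x}")

print(model_format("./models/ggml-model-q4_0.bin"))
```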

timonweb commented on August 17, 2024

@alxspiker I'm getting the error at the ingestion step:

llama.cpp: loading model from ./models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 2052.00 MB per state)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
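
std::bad_alloc is a C++ allocation failure, i.e. the process ran out of memory while loading the model, which matches toninog's diagnosis above. A cheap pre-flight check against the "mem required" line (a sketch; psutil is a third-party package):

```python
# Sketch: compare free RAM against the "mem required" figure the loader
# printed above. psutil is a third-party package (pip install psutil).
import psutil

REQUIRED_MB = 5809 + 2052   # "mem required" + "per state" from the log

available_mb = psutil.virtual_memory().available / (1024 ** 2)
if available_mb < REQUIRED_MB:
    print(f"Only {available_mb:.0f} MB available; the model wants "
          f"~{REQUIRED_MB} MB -- expect std::bad_alloc or an OOM kill.")
```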

d2rgaming-9000 commented on August 17, 2024

I get this error too.

Puttappaiahm commented on August 17, 2024

I get the same error. Any quick help to sort out this error is greatly appreciated.

Bioleme commented on August 17, 2024

llama.cpp: loading model from models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
I get the same error. Any quick help to sort out this error is greatly appreciated.

wouterstultiens commented on August 17, 2024

Same error...

citron commented on August 17, 2024

same here

jasonogrady commented on August 17, 2024

This is the error I'm getting:

main: build = 553 (63d2046)
main: seed = 1684182263
llama.cpp: loading model from ggml-alpaca-7b-native-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = ggmf v1 (old version with no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
error loading model: this format is no longer supported (see ggerganov/llama.cpp#1305)
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'ggml-alpaca-7b-native-q4.bin'
main: error: unable to load model

ByteEvangelist commented on August 17, 2024

main: build = 553 (63d2046)
main: seed = 1684197811
llama.cpp: loading model from ./models/ggml-model-q4_1.bin
llama_model_load_internal: format = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
error loading model: this format is no longer supported (see ggerganov/llama.cpp#1305)
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/ggml-model-q4_1.bin'
main: error: unable to load model

CraftCanna commented on August 17, 2024

Same issue
Loading documents from source_documents
Loaded 28 documents from source_documents
Split into 2405 chunks of text (max. 500 tokens each)
llama.cpp: loading model from models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)

zzzgit commented on August 17, 2024

What's the cause?

zzzgit commented on August 17, 2024
ecs-user@mimiako:~/privateGPT$ python3 ./privateGPT.py 
llama.cpp: loading model from /home/ecs-user/privateGPT/downloadedFiles/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1000
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 2052.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  = 1000.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Using embedded DuckDB with persistence: data will be stored in: /home/ecs-user/privateGPT/vector/
gptj_model_load: loading model from '/home/ecs-user/privateGPT/downloadedFiles/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size =   896.00 MB, n_mem = 57344
gptj_model_load: ...........................Killed

dennis-gonzales commented on August 17, 2024

same error

kkski commented on August 17, 2024

I redownloaded the model and embeddings file, and it goes through after that.
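
A corrupted or truncated download is a plausible cause of these load failures; verifying the file's checksum before loading is cheap (a sketch; the expected hash would come from wherever you downloaded the model):

```python
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    """Stream the file so multi-GB models don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the checksum published alongside the download.
print(sha256sum("./models/ggml-gpt4all-j-v1.3-groovy.bin"))
```

If the hash doesn't match what the download page publishes, the file is corrupt or truncated, which can surface as the load errors above.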

JA-Bonilla commented on August 17, 2024

@kkski What do you mean by "goes through" after you redownloaded the models?
