BlinkDL / ChatRWKV
ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and it is open source.
License: Apache License 2.0
Has RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth been tested to work with 32 GB of RAM?
Also, converting the 26 GB model with convert_model.py produced a file of about 51 GB. How is this effective?
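A rough back-of-the-envelope check of the size change. It assumes about 14 billion parameters stored in bf16 (2 bytes each) in the original checkpoint and that convert_model.py wrote them out as fp32 (4 bytes each). Both are assumptions for illustration, not confirmed behavior of the script.

```python
def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of raw weights in GiB (ignores file headers and metadata)."""
    return n_params * bytes_per_param / 2**30

N_PARAMS = 14e9  # nominal size of the 14B model (assumption; the exact count differs slightly)

bf16_gib = weights_gib(N_PARAMS, 2)  # 2 bytes per weight
fp32_gib = weights_gib(N_PARAMS, 4)  # 4 bytes per weight
print(f"bf16: {bf16_gib:.1f} GiB, fp32: {fp32_gib:.1f} GiB")
```

If that assumption holds, a doubling from roughly 26 GiB to roughly 52 GiB is simply the cost of storing every weight at twice the width, not a sign of a broken conversion.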
Hi. This isn't an issue, but I didn't know where else to put this, haha.
I've been watching the progress of ChatRWKV (which is awesome; thank you so much for developing this), and I'm a user of oobabooga's Web UI (so I'm aware of the thread on RWKV support). I like to tinker with text generation models that can be used for chatbot tasks and on CPUs, with little system memory.
I heard about int8 quantization and, not having a good enough GPU (but plenty of RAM) on my main PC, I gave it a try via the cpu fp32i8 strategy. To my surprise, it works! I still needed swap space to load the model, but after that I was able to run 7B in 8.8 GiB of RAM (with spikes to around 10.5 GiB while generating), and, to the best of my recollection, it loaded and generated faster than plain bf16. It was a few days ago, but these are the results I remember writing down:
MODEL MEMORY USAGE (? = not measured)
Model | fp32    | bf16      | fp32i8    | bf16i8
169M  | 1.0 GiB | 743.0 MiB | 856.7 MiB | 877.9 MiB
430M  | 2.1 GiB | 1.3 GiB   | 1.2 GiB   | 2.4 GiB
1.5B  | 6.3 GiB | 3.4 GiB   | 5.7 GiB   | 4.5 GiB
3B    | ?       | 6.1 GiB   | 4.3 GiB   | ~9.1 GiB
7B    | ?       | ?         | 8.8 GiB   | ?
I do notice that it fluctuates a bit (I tried 1.5B just now, and after it was done loading it ended up idling at around 2.5 GiB the first time, 5.8 GiB the second time, and back to 2.4 GiB the third time); I'm not sure why.
But yeah, I don't see any mentions of cpu fp32i8 anywhere, not even in any Discord servers, only mentions of cuda fp16i8, so I was wondering whether this is intended or just a nice side effect?
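For readers wondering what an "i8" strategy does conceptually, here is a minimal NumPy sketch of per-row 8-bit weight quantization, with the weights dequantized back to float32 before the matmul. It illustrates the general idea only: the function names are hypothetical, and this is not the rwkv package's actual kernel or quantization scheme.

```python
import numpy as np

def quantize_rows(w: np.ndarray):
    """Store a float32 matrix as uint8 plus a per-row offset and scale."""
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0
    scale[scale == 0] = 1.0  # constant rows: any nonzero scale works
    q = np.clip(np.round((w - w_min) / scale), 0, 255).astype(np.uint8)
    return q, w_min, scale

def dequantize(q, w_min, scale):
    """Recover approximate float32 weights from the uint8 codes."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
q, w_min, scale = quantize_rows(w)
x = rng.standard_normal((4, 256)).astype(np.float32)

y_exact = x @ w
y_int8 = x @ dequantize(q, w_min, scale)  # matmul still runs in float32
print(w.nbytes, q.nbytes)                  # 524288 131072: stored weights shrink 4x
print(np.abs(y_int8 - y_exact).max())      # small but nonzero quantization error
```

The memory saving comes from storing one byte per weight; the price is a rounding error of at most half a quantization step per weight.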
How do I train with my own Chinese data? What format should the data be in? What kind of machine configuration is required?
Thanks for this great code and models.
I've been testing long-form text generation from prompts with:
RWKV-4-Pile-14B-20230228-ctx4096-test663.pth
RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth
I've made the necessary simple edits to chat.py, API_DEMO.py, and your gradio app.py to generate 4000+ tokens with both models using the same settings, and found that the 4096 model generates relatively coherent output for the entire 4000+ tokens. No small task.
Unfortunately, the 8192 model starts well, then breaks down severely at around 2048 tokens, until by the end of 4000-8000 tokens it's simply repeating words, e.g. "... The Prince and the Pauper , The Prince and the Pauper , _The Prince and"
I understand most users are interested in chat-style generation and will never need long-form replies, but I'm still wondering whether there are special settings to improve the 8192 model's output relative to the 4096 one?
Also, I saw a post saying a 50B model is in the works? Fantastic if that happens.
That would be a great time to have a version of API_DEMO.py that runs on multiple GPUs.
Cheers,
There are 2 GPUs in my PC, but the model can't be split across both of them, so I can't run the 14B model. Here is some information:
Traceback (most recent call last):
File "chat.py", line 199, in <module>
model = RWKV_RNN(args)
File "/home/*/ChatRWKV/lib/python3.8/site-packages/torch/jit/_script.py", line 293, in init_then_script
original_init(self, *args, **kwargs)
File "/home/*/ChatRWKV/src/model_run.py", line 76, in __init__
w[x] = w[x].cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.69 GiB total capacity; 19.19 GiB already allocated; 39.12 MiB free; 19.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
So, is it possible to use both of my gpus? Thanks!
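The rwkv pip package's strategy strings (such as 'cuda:0 fp16 -> cuda:1 fp16 -> cuda:2 fp16' and 'cuda fp16i8 *21 -> cuda fp16 *20', both quoted elsewhere on this page) place consecutive layers on different devices. The helper below is a hypothetical sketch, not part of rwkv. It splits a layer count across GPUs in proportion to their memory and emits a string in that format, assuming, as in those examples, that "*N" pins N layers to a stage and the final stage takes the remainder.

```python
from typing import List, Tuple

def multi_gpu_strategy(n_layer: int, gpu_mem_gib: List[float],
                       dtype: str = "fp16") -> Tuple[str, List[int]]:
    """Split n_layer layers across GPUs proportionally to memory; return (strategy, counts)."""
    total = sum(gpu_mem_gib)
    counts = [int(n_layer * mem / total) for mem in gpu_mem_gib]
    counts[-1] = n_layer - sum(counts[:-1])  # last GPU absorbs rounding leftovers
    stages = []
    for i, count in enumerate(counts):
        # every stage except the last gets an explicit '*N' layer count
        suffix = f" *{count}" if i < len(counts) - 1 else ""
        stages.append(f"cuda:{i} {dtype}{suffix}")
    return " -> ".join(stages), counts

strategy, counts = multi_gpu_strategy(40, [24, 24])
print(strategy)  # cuda:0 fp16 *20 -> cuda:1 fp16
```

The resulting string would then be passed as the strategy argument when constructing the model. The split ignores the extra memory the embedding and head may need on whichever GPU hosts them.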
[...]/rwkv/cuda/operators.cu(123): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (__half *, __half)
atomicAdd(&y[k], __float2half(y_local));
^
This is likely because my GPU (a 1060) only supports compute capability 6.1, while atomicAdd support for __half requires compute capability 7.0, per https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd
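It seems like a guard along these lines would be needed to support lower compute capabilities. Note that #ifdef only tests whether a macro is defined, so the comparison needs #if, and a 6.1 card reports __CUDA_ARCH__ == 610, so the cutoff has to be below 700 rather than <= 600:
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700
/* fallback for GPUs without a native __half atomicAdd goes here */
#else
atomicAdd(&y[k], __float2half(y_local));
#endif
I don't know enough about this to contribute anything more helpful, unfortunately.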
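To check up front whether a GPU is below that threshold, here is a small sketch. The 7.0 cutoff comes from the CUDA guide linked above, and the __CUDA_ARCH__ mapping (major * 100 + minor * 10, so a 6.1 card such as the GTX 1060 compiles with 610) is standard nvcc behavior. The helper names are illustrative, not part of any library.

```python
from typing import Tuple

def cuda_arch_value(capability: Tuple[int, int]) -> int:
    """Value of __CUDA_ARCH__ that nvcc defines for a compute capability, e.g. (6, 1) -> 610."""
    major, minor = capability
    return major * 100 + minor * 10

def has_native_half_atomic_add(capability: Tuple[int, int]) -> bool:
    """atomicAdd(__half*, __half) needs compute capability 7.0 or newer."""
    return cuda_arch_value(capability) >= 700

try:
    import torch  # optional: query the real device if PyTorch with CUDA is available
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, has_native_half_atomic_add(cap))
except ImportError:
    pass
```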
I'm encountering an issue when setting RWKV_CUDA_ON to '1' with a multi-GPU strategy.
All the GPUs are the same 3060 Ti 8 GB, with CUDA 11.7 installed.
(base) [rig-nenkoru@localhost Raven-RWKV-7B]$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
torch
ninja
tokenizers
rwkv==0.6.2
pynvml
huggingface_hub
gradio>=3.17.1
(llama) [rig-nenkoru@localhost rwkv]$ pip show torch
Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
'cuda:0 fp16 -> cuda:1 fp16 -> cuda:2 fp16'
Traceback (most recent call last):
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/gradio/blocks.py", line 1108, in process_api
result = await self.call_function(
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/gradio/blocks.py", line 929, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/gradio/utils.py", line 490, in async_iteration
return next(iterator)
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/gradio/interface.py", line 621, in fn
for output in self.fn(*args):
File "/home/rig-nenkoru/Raven-RWKV-7B/./app.py", line 66, in evaluate
out, state = model.forward(pipeline.encode(ctx)[-ctx_limit:] if i == 0 else [token], state)
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/rwkv/model.py", line 573, in forward
x, state[i*5+0], state[i*5+1], state[i*5+2], state[i*5+3] = ATT(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/home/rig-nenkoru/miniconda3/envs/llama/lib/python3.10/site-packages/rwkv/model.py", line 485, in cuda_att_seq
y, aa, bb, pp = cuda_wkv(T, C, t_decay, t_first, k, v, aa, bb, pp)
out = (r * y) @ ow
~~~~~ <--- HERE
return x + out, xx[-1,:], aa, bb, pp
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
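One way to narrow this down, offered as a debugging sketch rather than a confirmed fix: rwkv/model.py reads RWKV_CUDA_ON when the module is imported (see the `if os.environ.get('RWKV_CUDA_ON') == '1':` line in the Windows traceback further down this page), so any change must happen before the import. CUDA_LAUNCH_BLOCKING=1 is a standard CUDA setting that makes kernel launches synchronous, so the error is reported at the call that actually failed. If the crash disappears with RWKV_CUDA_ON=0, the custom kernel on the non-default GPUs is the likely suspect.

```python
import os

# Both variables must be set before rwkv.model is imported,
# because the module reads RWKV_CUDA_ON at import time.
os.environ["RWKV_CUDA_ON"] = "0"          # use the plain PyTorch path instead of the custom kernel
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous launches: errors point at the real call (slow)

try:
    from rwkv.model import RWKV  # imported only after the environment is configured
except ImportError:
    RWKV = None  # rwkv is not installed in this environment

# Hypothetical usage; replace the path with your own checkpoint:
# model = RWKV(model="/path/to/model", strategy="cuda:0 fp16 -> cuda:1 fp16 -> cuda:2 fp16")
```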
Install VS2022 build tools (https://aka.ms/vs/17/release/vs_BuildTools.exe select Desktop C++).
Reinstall CUDA 11.7 (install VC++ extensions).
What is the purpose of reinstalling CUDA 11.7 if it is already installed?
I have CUDA 11.7 and VS2022 installed.
When I try to run
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS
I get
Using C:\Users\Jason\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu116 as PyTorch extensions root...
C:\Users\Jason\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
Detected CUDA files, patching ldflags
Emitting ninja build file C:\Users\Jason\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu116\wkv_cuda\build.ninja...
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
Cell In [3], line 1
----> 1 from rwkv.model import RWKV
2 from rwkv.utils import PIPELINE, PIPELINE_ARGS
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\rwkv\model.py:29
27 if os.environ.get('RWKV_CUDA_ON') == '1':
28 from torch.utils.cpp_extension import load
---> 29 load(
30 name=f"wkv_cuda",
31 sources=[f"{current_path}/cuda/wrapper.cpp", f"{current_path}/cuda/operators.cu"],
32 verbose=True,
33 extra_cuda_cflags=["-t 4", "-std=c++17", "--use_fast_math", "-O3", "--extra-device-vectorization"],
34 is_python_module=False)
36 @MyStatic
37 def cuda_wkv(T: int, C: int, w, u, k, v, aa, bb, pp):
38 assert 1 * C % min(C, 32) == 0
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:1284, in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1192 def load(name,
1193 sources: Union[str, List[str]],
1194 extra_cflags=None,
(...)
1202 is_standalone=False,
1203 keep_intermediates=True):
1204 r'''
1205 Loads a PyTorch C++ extension just-in-time (JIT).
1206
(...)
1282 ... verbose=True)
1283 '''
-> 1284 return _jit_compile(
1285 name,
1286 [sources] if isinstance(sources, str) else sources,
1287 extra_cflags,
1288 extra_cuda_cflags,
1289 extra_ldflags,
1290 extra_include_paths,
1291 build_directory or _get_build_directory(name, verbose),
1292 verbose,
1293 with_cuda,
1294 is_python_module,
1295 is_standalone,
1296 keep_intermediates=keep_intermediates)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:1508, in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1504 hipified_sources.add(hipify_result[s_abs]["hipified_path"] if s_abs in hipify_result else s_abs)
1506 sources = list(hipified_sources)
-> 1508 _write_ninja_file_and_build_library(
1509 name=name,
1510 sources=sources,
1511 extra_cflags=extra_cflags or [],
1512 extra_cuda_cflags=extra_cuda_cflags or [],
1513 extra_ldflags=extra_ldflags or [],
1514 extra_include_paths=extra_include_paths or [],
1515 build_directory=build_directory,
1516 verbose=verbose,
1517 with_cuda=with_cuda,
1518 is_standalone=is_standalone)
1519 finally:
1520 baton.release()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:1610, in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_standalone)
1607 print(f'Emitting ninja build file {build_file_path}...', file=sys.stderr)
1608 # NOTE: Emitting a new ninja build file does not cause re-compilation if
1609 # the sources did not change, so it's ok to re-emit (and it's fast).
-> 1610 _write_ninja_file_to_build_library(
1611 path=build_file_path,
1612 name=name,
1613 sources=sources,
1614 extra_cflags=extra_cflags or [],
1615 extra_cuda_cflags=extra_cuda_cflags or [],
1616 extra_ldflags=extra_ldflags or [],
1617 extra_include_paths=extra_include_paths or [],
1618 with_cuda=with_cuda,
1619 is_standalone=is_standalone)
1621 if verbose:
1622 print(f'Building extension module {name}...', file=sys.stderr)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:2055, in _write_ninja_file_to_build_library(path, name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, with_cuda, is_standalone)
2052 ext = EXEC_EXT if is_standalone else LIB_EXT
2053 library_target = f'{name}{ext}'
-> 2055 _write_ninja_file(
2056 path=path,
2057 cflags=cflags,
2058 post_cflags=None,
2059 cuda_cflags=cuda_flags,
2060 cuda_post_cflags=None,
2061 cuda_dlink_post_cflags=None,
2062 sources=sources,
2063 objects=objects,
2064 ldflags=ldflags,
2065 library_target=library_target,
2066 with_cuda=with_cuda)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py:2195, in _write_ninja_file(path, cflags, post_cflags, cuda_cflags, cuda_post_cflags, cuda_dlink_post_cflags, sources, objects, ldflags, library_target, with_cuda)
2193 link_rule = ['rule link']
2194 if IS_WINDOWS:
-> 2195 cl_paths = subprocess.check_output(['where',
2196 'cl']).decode(*SUBPROCESS_DECODE_ARGS).split('\r\n')
2197 if len(cl_paths) >= 1:
2198 cl_path = os.path.dirname(cl_paths[0]).replace(':', '$:')
File ~\AppData\Local\Programs\Python\Python310\lib\subprocess.py:421, in check_output(timeout, *popenargs, **kwargs)
418 empty = b''
419 kwargs['input'] = empty
--> 421 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
422 **kwargs).stdout
File ~\AppData\Local\Programs\Python\Python310\lib\subprocess.py:526, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
524 retcode = process.poll()
525 if check and retcode:
--> 526 raise CalledProcessError(retcode, process.args,
527 output=stdout, stderr=stderr)
528 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.
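The failing call is `where cl`, which torch.utils.cpp_extension runs on Windows to locate the MSVC compiler, so this error means cl.exe is not on the PATH of the shell that started Python. Below is a sketch of the same check from Python using the standard library's shutil.which; the helper name is illustrative. A common remedy, hedged because setups vary, is to start Python from the "x64 Native Tools Command Prompt for VS 2022", which puts cl.exe on PATH.

```python
import shutil
import sys

def msvc_compiler_path():
    """Return the full path of cl.exe if it is on PATH, else None (mirrors `where cl`)."""
    return shutil.which("cl")

if sys.platform == "win32":
    path = msvc_compiler_path()
    if path is None:
        print("cl.exe not found on PATH; open a VS 2022 native tools prompt first")
    else:
        print("found", path)
```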
Hi! Thank you for your wonderful work!
I want to train a QA bot on my own data. I guess I should train an RWKV-LM v4 model and load it in chat.py (by changing the model path).
Am I doing this right? Do you have any suggestions?
Thank you very much
Stuck at the "Run prompt..." message after model init. What could be the problem?
Using cpu fp32 settings.
Thanks!
After you updated the rwkv package to 0.7.3, I started getting the following error when running v2's chat.py, and I haven't been able to fix it:
^C
CondaError: KeyboardInterrupt
(pytorch_p39) ubuntu@ip-172-31-26-101:~/chat/v2$ conda env torch
usage: conda-env [-h] {create,export,list,remove,update,config} ...
conda-env: error: argument {create,export,list,remove,update,config}: invalid choice: 'torch' (choose from 'create', 'export', 'list', 'remove', 'update', 'config')
(pytorch_p39) ubuntu@ip-172-31-26-101:~/chat/v2$ conda env lsit
usage: conda-env [-h] {create,export,list,remove,update,config} ...
conda-env: error: argument {create,export,list,remove,update,config}: invalid choice: 'lsit' (choose from 'create', 'export', 'list', 'remove', 'update', 'config')
(pytorch_p39) ubuntu@ip-172-31-26-101:~/chat/v2$ conda activate aws_neuron_pytorch_p37
(aws_neuron_pytorch_p37) ubuntu@ip-172-31-26-101:~/chat/v2$ python chat.py
ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV
English - cuda fp16i8 -> cpu fp32 *10 - /home/ubuntu/chat/v2/prompt/default/English-2.py
Loading model - trained-500-141-1024-RWKV-6-512-2023-04-08-13-57-32
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6
Loading trained-500-141-1024-RWKV-6-512-2023-04-08-13-57-32.pth ...
Strategy: (total 6+1=7 layers)
My GPU is a GTX 1080 Ti (11 GB), and the small model runs fine on it. With the medium model, "Run prompt" completes successfully, but as soon as I enter a question I get this error:
Traceback (most recent call last):
File "F:\Projects\ChatRWKV\chat.py", line 397, in <module>
on_message(msg)
File "F:\Projects\ChatRWKV\chat.py", line 357, in on_message
token = tokenizer.sample_logits(
File "F:\Projects\ChatRWKV\src\utils.py", line 85, in sample_logits
out = torch.multinomial(probs, num_samples=1)[0]
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Small model: RWKV-4-Pile-1B5-Instruct-test1-20230124.pth
Medium model: RWKV-4-Pile-3B-Instruct-test1-20230124.pth
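One plausible cause, not confirmed for this setup: if the larger model runs in fp16, any activation above the fp16 maximum (65504) becomes inf, and a softmax over logits containing inf produces nan, which is exactly what torch.multinomial rejects in the traceback above. A minimal NumPy illustration:

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # inf - inf yields nan here
    e = np.exp(z)
    return e / e.sum()

with np.errstate(all="ignore"):  # silence the overflow/invalid warnings for this demo
    acts = np.array([1.0, 60000.0, 70000.0], dtype=np.float32)
    acts_fp16 = acts.astype(np.float16)           # 70000 does not fit in fp16, so it becomes inf
    probs = softmax(acts_fp16.astype(np.float32))  # nan spreads to every probability

print(acts_fp16)  # the last entry is inf
print(probs)      # every entry is nan
```

If this is the cause, running the model in fp32 (the old chat.py's args.FLOAT_MODE = 'fp32', as in the section III workaround further down) should avoid it, at the cost of speed and memory.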
section I:
args.RUN_DEVICE = 'cpu' # line 7
args.MODEL_NAME = '$PATH/RWKV-4b-Pile-171M-20230202-7922' # line 17
==>
python chat.py
...
Run prompt...
Traceback (most recent call last):
File "chat.py", line 216, in <module>
out = run_rnn(tokenizer.tokenizer.encode(init_prompt))
File "chat.py", line 184, in run_rnn
current_state = model.forward(model_tokens, current_state, preprocess_only = True)
File "$PATH/ChatRWKV/src/model_run.py", line 191, in forward
state = torch.zeros(args.n_layer * 5, args.n_embd, device=self.RUN_DEVICE)
AttributeError: 'types.SimpleNamespace' object has no attribute 'n_layer'
section II:
if '-1B5-' in args.MODEL_NAME or '/1.5-' in args.MODEL_NAME:
args.n_layer = 24
args.n_embd = 2048
elif '-3B-' in args.MODEL_NAME or '/3-' in args.MODEL_NAME:
args.n_layer = 32
args.n_embd = 2560
elif '-7B-' in args.MODEL_NAME or '/7-' in args.MODEL_NAME:
args.n_layer = 32
args.n_embd = 4096
elif '-14B-' in args.MODEL_NAME or '/14-' in args.MODEL_NAME:
args.n_layer = 40
args.n_embd = 5120
else: # line 41
args.n_layer = 24
args.n_embd = 768
==>
Run prompt...
Traceback (most recent call last):
File "chat.py", line 219, in <module>
out = run_rnn(tokenizer.tokenizer.encode(init_prompt))
File "chat.py", line 187, in run_rnn
current_state = model.forward(model_tokens, current_state, preprocess_only = True)
File "$PATH/ChatRWKV/src/model_run.py", line 197, in forward
x = self.LN(x, w.blocks[i].ln0)
File "$PATH/ChatRWKV/src/model_run.py", line 103, in LN
return F.layer_norm(x, (self.args.n_embd,), weight=w.weight, bias=w.bias)
File "/usr/local/Caskroom/miniconda/base/envs/chatbot/lib/python3.8/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
section III:
args.FLOAT_MODE = 'fp32' # line 8
Finally, it works.
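The section II edit guesses n_layer and n_embd from the file name, which fails for names the if-chain does not know (the 171M model here falls into the else branch). Below is a sketch of reading both values from the checkpoint itself instead. It assumes RWKV-4 weight names such as emb.weight (shape [vocab, n_embd]) and blocks.<i>.*, inferred from the w.emb.weight and w.blocks[i] accesses in the tracebacks above rather than from a documented format; infer_dims is a hypothetical helper.

```python
import re
from types import SimpleNamespace

def infer_dims(state_dict):
    """Return (n_layer, n_embd) from an RWKV-4 style state dict."""
    n_embd = state_dict["emb.weight"].shape[1]
    layer_ids = set()
    for key in state_dict:
        m = re.match(r"blocks\.(\d+)\.", key)
        if m:
            layer_ids.add(int(m.group(1)))
    return max(layer_ids) + 1, n_embd

# Usage with a real checkpoint (requires torch; path handling is an assumption):
#   w = torch.load(args.MODEL_NAME + ".pth", map_location="cpu")
#   args.n_layer, args.n_embd = infer_dims(w)

# Tiny stand-in so the sketch runs without a checkpoint: only .shape is needed.
fake = {"emb.weight": SimpleNamespace(shape=(50277, 768))}
fake.update({f"blocks.{i}.ln1.weight": SimpleNamespace(shape=(768,)) for i in range(12)})
print(infer_dims(fake))  # (12, 768)
```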
Compiling wkv_cuda failed when running "python chat.py":
nvcc fatal : Value 'c++17' is not defined for option 'std'
My torch/CUDA version is 1.13.1+cuda117.
Any idea why this happens?
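A likely explanation, stated as an assumption: the pip build of torch 1.13.1 bundles its own CUDA runtime, but JIT-compiling wkv_cuda uses the system nvcc (under CUDA_HOME, or on PATH), and to my knowledge nvcc releases before CUDA 11.0 do not accept -std=c++17. The sketch below checks which nvcc release is actually picked up; the parsing follows the "release 11.7, V11.7.64" line in the nvcc --version output shown earlier on this page, and the path logic is Linux-oriented.

```python
import os
import re
import shutil
import subprocess
from typing import Optional, Tuple

def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None

def local_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run the nvcc under CUDA_HOME if set, else the one on PATH."""
    cuda_home = os.environ.get("CUDA_HOME")
    nvcc = os.path.join(cuda_home, "bin", "nvcc") if cuda_home else shutil.which("nvcc")
    if not nvcc or not os.path.exists(nvcc):
        return None
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out)

release = local_nvcc_release()
if release is not None and release < (11, 0):
    print(f"nvcc {release} is likely too old for -std=c++17")
```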
Can you simplify chat.py so it is easier to follow and can be used for inference where the chat history is sent along? Right now it seems pretty obfuscated, using some hacks to limit output.
I'm not exactly sure of the best way to implement this, but it would be good to use the Python timeit module in the todo loop at the end of benchmark.py to show how long each batch of 100 model.forward calls takes.
Clarify that so it is easier to follow.
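A sketch of the requested timing loop, using timeit.default_timer (the high-resolution clock the timeit module uses). It assumes a model.forward(tokens, state) -> (out, state) signature, as in the app.py and chat.py tracebacks on this page; dummy_forward is a hypothetical stand-in so the snippet runs without a model.

```python
from timeit import default_timer as timer

def time_forward(forward, tokens, n_iter=100):
    """Run forward(tokens, state) n_iter times, threading the state; return (total_s, per_call_s)."""
    state = None
    start = timer()
    for _ in range(n_iter):
        _, state = forward(tokens, state)
    total = timer() - start
    return total, total / n_iter

def dummy_forward(tokens, state):
    """Hypothetical stand-in for model.forward: returns fake logits and an updated state."""
    logits = [float(sum(tokens))] * 8
    return logits, (state or 0) + 1

total, per_call = time_forward(dummy_forward, [187, 510, 1563])
print(f"100 calls: {total:.4f} s ({per_call * 1e3:.3f} ms per call)")
```

With a real model on a GPU, CUDA work runs asynchronously, so calling torch.cuda.synchronize() before each timer reading would be needed for the numbers to mean anything.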
I was getting this error:
FileNotFoundError: [Errno 2] No such file or directory: '/fsx/BlinkDL/HF-MODEL/rwkv-4-pile-14b/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth'
from huggingface_hub import hf_hub_download
from rwkv.model import RWKV

title = "RWKV-4-Pile-14B-20230313-ctx8192-test1050"
# hf_hub_download saves the file to the local Hugging Face cache and returns that local path
model_path = hf_hub_download(repo_id="BlinkDL/rwkv-4-pile-14b", filename=f"{title}.pth")
model = RWKV(model=model_path, strategy='cpu')
Hi,
Why is time_shift not applied to x in ChatRWKV before computing x * self.time_mix_k + xx * (1 - self.time_mix_k), while in RWKV V4 it is? Any idea?
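My understanding, offered as an explanation rather than a quote from the code: the shift has not been dropped, it just looks different in RNN mode. In training, time_shift shifts the whole sequence down by one step so that xx[t] = x[t-1] (with zeros at t = 0). In recurrent inference, the previous token's x is kept in the state and read back as xx, which is the same quantity. The sketch below, with hypothetical function names, checks that the two forms agree.

```python
import numpy as np

def token_shift_parallel(x: np.ndarray, mix: np.ndarray) -> np.ndarray:
    """Training form: shift the whole sequence down one step (zero row first), then mix."""
    xx = np.vstack([np.zeros((1, x.shape[1])), x[:-1]])  # xx[t] = x[t-1], xx[0] = 0
    return x * mix + xx * (1 - mix)

def token_shift_recurrent(x: np.ndarray, mix: np.ndarray) -> np.ndarray:
    """RNN form: xx is just the previous token's x, carried in the state."""
    prev = np.zeros(x.shape[1])  # initial state
    outputs = []
    for t in range(x.shape[0]):
        outputs.append(x[t] * mix + prev * (1 - mix))
        prev = x[t]  # state update for the next token
    return np.stack(outputs)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
time_mix_k = rng.uniform(size=4)
print(np.allclose(token_shift_parallel(x, time_mix_k), token_shift_recurrent(x, time_mix_k)))  # True
```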
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py in _lazy_init()
245 if 'CUDA_MODULE_LOADING' not in os.environ:
246 os.environ['CUDA_MODULE_LOADING'] = 'LAZY'
--> 247 torch._C._cuda_init()
248 # Some of the queued calls may reentrantly call _lazy_init();
249 # we need to just return without initializing in that case.
RuntimeError: No CUDA GPUs are available
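This error means PyTorch can see no usable CUDA device, for example because no GPU is attached, the driver is not loaded, or the torch build is CPU-only. Below is a hedged sketch for falling back to a CPU strategy instead of crashing. The strategy strings 'cuda fp16' and 'cpu fp32' both appear elsewhere on this page, and pick_strategy is a hypothetical helper.

```python
def pick_strategy(cuda_available: bool) -> str:
    """Choose an rwkv strategy string based on whether a CUDA device is usable."""
    return "cuda fp16" if cuda_available else "cpu fp32"

try:
    import torch
    has_cuda = torch.cuda.is_available()  # False for CPU-only builds or when no GPU is visible
except ImportError:
    has_cuda = False

strategy = pick_strategy(has_cuda)
print("using strategy:", strategy)
```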
Just wondering: what is the token window (max token length)? And what are your thoughts on increasing it?
PS: This repo is amazing. I wish I had known about it sooner. You guys are awesome, and I'll try to contribute at some point if I can.
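One relevant detail on the RNN side: the v2 chat.py traceback further down this page feeds the prompt in slices (model.forward(tokens[:CHUNK_LEN], model_state)) and carries the state between slices, so the prompt length is not bounded by a single forward call. The practical limit is presumably how much context the model was trained to use, as suggested by the ctx4096 and ctx8192 suffixes in the checkpoint names. Below is a sketch of that chunked feeding; dummy_forward is a hypothetical stand-in, and the default chunk size is arbitrary.

```python
from typing import Any, Callable, List, Optional, Tuple

def feed_in_chunks(forward: Callable, tokens: List[int], state: Optional[Any] = None,
                   chunk_len: int = 256) -> Tuple[Any, Any]:
    """Feed a long token list through an RNN-style forward(tokens, state) in slices."""
    out = None
    while tokens:
        out, state = forward(tokens[:chunk_len], state)
        tokens = tokens[chunk_len:]
    return out, state

def dummy_forward(tokens, state):
    """Hypothetical stand-in: the 'state' just counts how many tokens were consumed."""
    return tokens[-1], (state or 0) + len(tokens)

out, state = feed_in_chunks(dummy_forward, list(range(1000)))
print(out, state)  # 999 1000
```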
I succeeded in running the 7B model. But when I tried to run the 14B model on my 4080 GPU by setting args.strategy = 'cuda fp16i8 *21 -> cuda fp16 *20' and os.environ["RWKV_CUDA_ON"] = '0', it reports an error.
During the process, the program consumed all 32 GB of my CPU memory. The log is as follows:
ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV
Chinese - cuda fp16i8 *21 -> cuda fp16 *20 - J:\ChatRWKV\v2/prompt/default/Chinese-2.py
Loading model - J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6
Loading J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth ...
Strategy: (total 40+1=41 layers)
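Here is a rough estimate of why this strategy may not fit, under stated assumptions. Each RWKV-4 block is taken to hold about 13 * n_embd^2 weights (4 square matrices in time-mixing, plus 9 * n_embd^2 in channel-mixing with its 4x hidden size). The 14B model is assumed to have n_embd = 5120 and 40 layers (the values in the section II if-chain further down), and fp16i8 is assumed to cost about 1 byte per weight. Embeddings, the head, and activations are ignored.

```python
from typing import List, Tuple

# Approximate bytes per weight for each strategy dtype (fp16i8 assumed ~1 byte).
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp16i8": 1, "fp32i8": 1}

def block_weights(n_embd: int) -> int:
    """Approximate weight count of one RWKV-4 block (assumption, see lead-in)."""
    time_mix = 4 * n_embd * n_embd     # key, value, receptance, output
    channel_mix = 9 * n_embd * n_embd  # key (d -> 4d), value (4d -> d), receptance (d -> d)
    return time_mix + channel_mix

def blocks_gib(stages: List[Tuple[str, int]], n_embd: int) -> float:
    """GiB of block weights for (dtype, n_blocks) stages; ignores emb, head, activations."""
    per_block = block_weights(n_embd)
    return sum(per_block * BYTES_PER_WEIGHT[dtype] * n for dtype, n in stages) / 2**30

# 'cuda fp16i8 *21 -> cuda fp16 *20' covers 41 "layers"; the last is presumably the head,
# so this counts 21 int8 blocks and 19 fp16 blocks.
requested = blocks_gib([("fp16i8", 21), ("fp16", 19)], n_embd=5120)
all_fp16 = blocks_gib([("fp16", 40)], n_embd=5120)
print(f"requested strategy: ~{requested:.1f} GiB, all fp16: ~{all_fp16:.1f} GiB")
```

Under these assumptions, the requested split needs roughly 19 GiB of VRAM for block weights alone, more than the 16 GB on an RTX 4080, so more layers would have to move to fp16i8 or to the CPU. The CPU RAM exhaustion may be a separate effect of first reading the full checkpoint into memory.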
Loading model :
RWKV-4-Pile-7B-Instruct-test1-20230124
Error:
Traceback (most recent call last):
File "chat.py", line 175, in <module>
model = RWKV_RNN(args)
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_script.py", line 272, in init_then_script
original_init(self, *args, **kwargs)
File "/root/work/ChatRWKV/src/model_run.py", line 103, in __init__
x = self.LN(self.w.emb.weight, self.w.blocks[0].ln0)
File "/root/work/ChatRWKV/src/model_run.py", line 111, in LN
return F.layer_norm(x, (self.args.n_embd,), weight=w.weight, bias=w.bias)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2346, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'BFloat16'
I have expressed my interest in having RWKV officially implemented in Hugging Face in huggingface/transformers#17230.
Meanwhile, I have a distilled set of suggestions for how this library could be made more familiar to people who are already used to transformers and AutoModelForCausalLM.
Maybe some of these are already possible in the current version of rwkv. If so, I would be grateful if you could let me know how.
with something like
tokenizer = RWKVTokenizer.from_pretrained("/path/to/20B_tokenizer.json")
and then use it with
prompt = "Hello, my name is "
input_ids = tokenizer.encode(prompt)
Having the ability to count the number of tokens in a given prompt is very useful.
Something like
output_ids = model.generate(input_ids, temperature=0.8, top_p=0.95)
output_text = tokenizer.decode(output_ids)
Many parameters are available for model.generate()
in HF, but it seems to me that the absolutely essential ones that everyone uses are:
I am aware that alpha_frequency and alpha_presence are implemented, but these parameters are not usually found in the presets people have already come up with while working with other models. For this reason, having repetition_penalty would be valuable.
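To make the difference concrete, here is a sketch of both schemes applied to a logit vector. The first is the additive presence/frequency penalty, using the usual definition of alpha_presence and alpha_frequency (I am assuming rwkv's matches it). The second is the multiplicative repetition_penalty as implemented in Hugging Face transformers, which divides positive logits and multiplies negative ones. The function names are hypothetical.

```python
from collections import Counter

import numpy as np

def apply_presence_frequency(logits, generated, alpha_presence, alpha_frequency):
    """Subtract alpha_presence once per seen token, plus alpha_frequency per occurrence."""
    out = np.asarray(logits, dtype=np.float64).copy()
    for tok, count in Counter(generated).items():
        out[tok] -= alpha_presence + count * alpha_frequency
    return out

def apply_repetition_penalty(logits, generated, penalty):
    """HF-style: shrink seen tokens' logits toward lower probability by a constant factor."""
    out = np.asarray(logits, dtype=np.float64).copy()
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = np.array([2.0, -1.0, 0.5])
history = [0, 0, 1]  # token 0 seen twice, token 1 once, token 2 never
print(apply_presence_frequency(logits, history, 0.2, 0.1))  # penalizes by count
print(apply_repetition_penalty(logits, history, 1.2))       # penalizes by presence only
```

The practical difference is that repetition_penalty ignores how often a token appeared, while the frequency term grows with every repeat, so preset values for one cannot be copied directly to the other.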
Hello @BlinkDL,
Following your recommendation, I was able to run this on MPS at half precision. At full precision it gets stuck on MPS.
On a 64 GB M1 Max CPU, the 14B model gives pretty good results, but it is quite slow. After a few changes to get it working on MPS, it's very fast, but the results are the worst. For example, on MPS fp16 it generates:
User: +gen Here is a short story in which Jeff Bezos, Elon Musk, and Bill Gates fight in a tournament:
ChDefPSt
QTheThisSCheckReviewQBackgroundInformationThisPaulQSp1QThe1------[How{The#QTheHSamAfterQMfileCharacterGQDemEQAfterThe//QWeQ"ThisIntroductionCorListQQQ(EAtBackgroundEnAn[BrWithDirectGWomenNQTheOh1Last#OnQGlQQWilliamWhy9 QTheQAfterMelCheckQTQ/*BlYouFieldAQThe/*PrThe
On the other hand, with same code and on CPU F32, it generates,
User: +gen Here is a short story in which Jeff Bezos, Elon Musk, and Bill Gates fight in a tournament:
There are four kings in the chess world, and every four years a World Championship takes place. In 2018, Bezos defeated Musk in the semi-finals, while Gates took down both of them. So now it’s Bezos and Musk who are playing each other.
What am I missing?
The most significant change was in the sample_logits function, where I used the same probability algorithm for both MPS and CPU. The rest of the changes only switched the device from CUDA to MPS.
I tried the v2 chat.py with the 14B model but still got OOM with 24 GB of graphics memory.
The torch version is 13.0. Why?
Hoi,
First of all, thank you for making this all open source (including the models). I've had a lot of fun with it. I created a virtual 9-year-old adventurer girl who sets out to save a town from evil wizards together with her uncle. Sure, it is not perfect and needs a lot of nudges to keep it flowing, but the responses are sometimes hilarious. One time I introduced a dragon into the story. Initially the girl was afraid, but after she shared part of her chocolate bar, they became best friends and inseparable ;-)
She also wanted her uncle to marry the local town witch so she would have a real family (I'm somewhat curious what you feed this model as training data ;-)
What I do notice, though, is that after about 20 entries and responses the model starts to produce repeated sentences. They are still correct from a context point of view, but they get more and more frequent. That often spells the end of the story, because once the engine is in that "state" it starts to produce nonsense or only single words repeated forever. That is a shame for an RNN that, theoretically at least, has infinite context length. I'm not sure whether it is numerical stability or the latent vector going into some weird state, but after that a reset is the only way to get it back. Then it functions correctly again for about 20 moves (but of course the progress so far is lost), which hints that the problem lies in the stored state of the RNN. I tried the model with 4096 context length, but it doesn't seem to help much. I'm not sure if it needs a softmax layer somewhere or if this is an inherent limitation of this RNN. But it could be so much more fun if this didn't happen.
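Until the underlying cause is understood, one workaround that avoids losing the whole story is to snapshot the RNN state after each accepted turn and roll back a few turns when the output starts to degenerate, instead of resetting to the initial prompt. Below is a minimal sketch with hypothetical names. The state is copied with copy.deepcopy on the assumption that it is a mutable list of tensors, as the state[i*5+0] accesses in the traceback above suggest.

```python
import copy
from collections import deque

class StateHistory:
    """Keep the last few (state, tokens) snapshots so a chat can roll back instead of resetting."""

    def __init__(self, max_snapshots=10):
        self.snapshots = deque(maxlen=max_snapshots)  # oldest snapshots drop off automatically

    def save(self, state, tokens):
        # deepcopy so later in-place updates to the live state don't alter the snapshot
        self.snapshots.append((copy.deepcopy(state), list(tokens)))

    def rollback(self, n_turns=1):
        """Discard the newest n_turns snapshots and return a copy of the one before them."""
        for _ in range(min(n_turns, len(self.snapshots) - 1)):
            self.snapshots.pop()
        state, tokens = self.snapshots[-1]
        return copy.deepcopy(state), list(tokens)

history = StateHistory()
state = [0.0, 0.0]              # stand-in for the model state
history.save(state, [1, 2])     # after turn 1
state[0] = 5.0                  # in-place update during turn 2
history.save(state, [1, 2, 3])  # after turn 2
restored, tokens = history.rollback(1)
print(restored, tokens)  # [0.0, 0.0] [1, 2]
```

This only sidesteps the symptom. Whatever drives the state into the repetitive regime will likely recur, but the story can continue from a recent good point.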
I also noticed the model knows more than just English. After typing +gen Het laatste niews van vandaag (Dutch for "today's latest news"), it generates complete Dutch paragraphs. I didn't expect that from an English model. Although the quality of the generated text is quite low, what I find curious is that it is entirely in Dutch rather than an English/Dutch mix.
I'm not sure whether that uses up weight capacity the model needs or helps with generalizing across world knowledge. I don't have much experience with these language models yet, but I found it interesting.
Context is everything though. In my created story mode it doesn't happen:
Pjere: Lees het laaste nieuws van vandaag (English: "Read today's latest news")
Alice: I read the newspaper today. It said about the witch who is hurting people, is going to kill a boy in the next few days and wants to save him, it says the punishment is a light prison sentence, but it has to go before a council, what does that mean?
Pjere: +gen Lees het laatste nieuws van vandaag
We respecteren het privacybeleid van het aanklachten site
Vorige maand besloot de Belgische Staat bij vervolging weer van de klager.
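Onlangs, op 23 februari 2019, legde het Belgische Instituut voor Onderzoeksjournalistiek (iJOU) een proces-verbaal voor omdat de site voor het onderzoek weigerde met alle relevante informatie te komen over de aanklachten door de staat tegen Roos Van der Hoek. De uitzondering was de nationale rechtsbescherming, de scheidende schadeaanklager die het aanklachten site beschikt over de gegevens. Die had het volste recht te vragen om alle data weg te halen van het iJOU-site en het be
[Rough English translation of the generated text: "We respect the privacy policy of the complaints site. Last month the Belgian State again decided to prosecute the complainant. Recently, on 23 February 2019, the Belgian Institute for Investigative Journalism (iJOU) filed an official report because the site refused to provide all relevant information for the investigation about the state's charges against Roos Van der Hoek. The exception was the national legal protection, the outgoing damages prosecutor, which has the complaints site's data. He had every right to ask for all data to be removed from the iJOU site and the be"]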
Hey, this is great work. I have been trying to set this up from the repo on Windows, and the biggest problem so far is this error in the CUDA load step (ChatRWKV/rwkv_pip_package/src/rwkv/model.py:29):
"CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1."
I'm not too familiar with low-level CUDA programs, so I'm having a tough time debugging this. Maybe you have some idea? Thanks!
When initiating the chatbot following error occurs:
Traceback (most recent call last):
File "C:\Users\yongy\chatrwkv\v2\chat.py", line 125, in <module>
pipeline = PIPELINE(model, f"{current_path}/20B_tokenizer.json")
File "c:\users\yongy\appdata\local\programs\python\python37\lib\site-packages\rwkv\utils.py", line 28, in __init__
from tokenizers import Tokenizer
File "c:\users\yongy\appdata\local\programs\python\python37\lib\site-packages\tokenizers\__init__.py", line 80, in <module>
from .tokenizers import (
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
The RWKV package includes a 'torch ~= 1.13.1' dependency. In cases where PyTorch is installed with conda (a very common case) or by some other means, this causes a conflict.
Most machine learning libraries do not include pytorch as a requirement.
Is it possible to remove this dependency?
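For context on why this conflicts: '~=1.13.1' is a PEP 440 compatible-release specifier, equivalent to '>=1.13.1, ==1.13.*', so torch 2.0.0 (as installed in the multi-GPU report above) does not satisfy it. Below is a simplified sketch of the rule for plain numeric versions, with hypothetical helper names; for real checks, packaging.specifiers.SpecifierSet is the standard tool.

```python
from typing import List

def _parts(version: str) -> List[int]:
    """'1.13.1+cu117' -> [1, 13, 1] (local tag after '+' ignored)."""
    return [int(p) for p in version.split("+")[0].split(".")]

def satisfies_compatible_release(installed: str, base: str) -> bool:
    """Simplified PEP 440 check for 'installed ~= base' with plain numeric versions.

    '~=1.13.1' means '>=1.13.1' and '==1.13.*'. Pre-/post-release tags are not handled;
    for real use prefer packaging.specifiers.SpecifierSet('~=1.13.1').contains(installed).
    """
    have, want = _parts(installed), _parts(base)
    prefix = want[:-1]
    return have[:len(prefix)] == prefix and have >= want

for v in ["1.13.1", "1.13.1+cu117", "1.13.0", "2.0.0"]:
    print(v, satisfies_compatible_release(v, "1.13.1"))
```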
What exactly do the four letters "RWKV" mean?
How much will quality suffer if the model is quantized? Will it be able to run on a plain CPU and RAM, without a GPU and VRAM?
In v2, chat.py, when trying to lower the temperature under its 1.0 default, for example GEN_TEMP = 0.9
I'm faced with
File "chat.py", line 393, in <module>
on_message(msg)
File "chat.py", line 300, in on_message
token = pipeline.sample_logits(
File "(...)\utils.py", line 51, in sample_logits
probs = probs.pow(1.0 / temperature)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'
Just so you know.
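The traceback suggests probs is a NumPy array at that point, and NumPy arrays have no .pow() method (that is the torch.Tensor spelling). Below is a sketch of top-p sampling that stays in NumPy and applies temperature with the ** operator. It follows the general shape of the sample_logits call in the traceback, but the body is an illustration, not the actual rwkv utils.py code.

```python
import numpy as np

def sample_logits(logits, temperature=1.0, top_p=0.85, rng=None):
    """Top-p (nucleus) sampling with temperature on a 1-D array of logits."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # keep the smallest set of top tokens whose cumulative probability reaches top_p
    sorted_probs = np.sort(probs)[::-1]
    cumulative = np.cumsum(sorted_probs)
    idx = min(int(np.searchsorted(cumulative, top_p)), len(sorted_probs) - 1)
    cutoff = sorted_probs[idx]
    probs[probs < cutoff] = 0.0
    if temperature != 1.0:
        probs = probs ** (1.0 / temperature)  # NumPy spelling of torch's probs.pow(...)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
print(sample_logits([10.0, 0.0, 0.0, 0.0], temperature=0.9, top_p=0.5, rng=rng))  # 0
```

Clamping the search index keeps top_p = 1.0 safe even when floating-point rounding leaves the cumulative sum just below 1.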
As the title says, the license on GitHub currently applies mainly to the code.
So I'd like to ask: are the model weights also covered by the Apache-2.0 license?
Great work!
I want to build an API server around this. Is there any way to change the output behavior from printing characters one by one to returning all of the text once generation is done? Thanks.
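Below is a sketch of a non-streaming generate function for an API server. It collects the sampled token ids and decodes once at the end instead of printing each piece as it arrives. forward, sample, and decode are passed in; with rwkv they would presumably be model.forward, pipeline.sample_logits, and pipeline.decode (the first two appear in tracebacks on this page, but treat the exact signatures as assumptions). The stop token id 0 is a hypothetical default for an end-of-text token.

```python
from typing import Callable, List, Optional

def generate_full(forward: Callable, sample: Callable, decode: Callable,
                  prompt_tokens: List[int], max_new_tokens: int = 200,
                  stop_token: Optional[int] = 0) -> str:
    """Generate up to max_new_tokens tokens and return the decoded text in one piece."""
    out, state = forward(prompt_tokens, None)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = sample(out)
        if token == stop_token:
            break
        generated.append(token)
        out, state = forward([token], state)
    return decode(generated)  # decode once, after generation has finished

# Toy stand-ins so the sketch runs: "logits" are just the next integer.
toy_forward = lambda tokens, state: (tokens[-1] + 1, state)
text = generate_full(toy_forward, lambda out: out, lambda ids: " ".join(map(str, ids)),
                     [5], max_new_tokens=3)
print(text)  # 6 7 8
```

A side benefit of decoding once is that a character split across several tokens (common for non-Latin scripts) is never emitted half-finished.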
I switched over to Linux, installed ninja, and now seem to have a compile issue.
Suggestions?
python chat.py
ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV
English - cuda fp16 - /media/main/C/Users/Jason/Documents/machine_learning/language_ML/ChatRWKV/v2/prompt/default/English-2.py
Using /home/main/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/main/.cache/torch_extensions/py39_cu117/wkv_cuda/build.ninja...
Building extension module wkv_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ wrapper.o operators.cuda.o -shared -L/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/main/miniconda3/envs/gptj/lib64 -lcudart -o wkv_cuda.so
FAILED: wkv_cuda.so
c++ wrapper.o operators.cuda.o -shared -L/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/main/miniconda3/envs/gptj/lib64 -lcudart -o wkv_cuda.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/home/main/miniconda3/envs/gptj/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/main/C/Users/Jason/Documents/machine_learning/language_ML/ChatRWKV/v2/chat.py", line 105, in <module>
from rwkv.model import RWKV
File "/media/main/C/Users/Jason/Documents/machine_learning/language_ML/ChatRWKV/v2/../rwkv_pip_package/src/rwkv/model.py", line 29, in <module>
load(
File "/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/main/miniconda3/envs/gptj/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'wkv_cuda'
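The link line searches -L/home/main/miniconda3/envs/gptj/lib64 for libcudart. That suggests torch.utils.cpp_extension resolved CUDA_HOME to the conda environment rather than to a full CUDA toolkit. Below is a sketch for checking where libcudart actually lives; the candidate directories are examples. Pointing CUDA_HOME at a directory that contains it, before importing rwkv.model, is a suggested workaround, not a confirmed fix.

```python
import os
from pathlib import Path
from typing import List

def find_libcudart(cuda_home: str) -> List[str]:
    """List libcudart shared libraries in <cuda_home>/lib64, where the link step looks."""
    return sorted(str(p) for p in (Path(cuda_home) / "lib64").glob("libcudart.so*"))

candidates = [os.environ.get("CUDA_HOME", ""), "/usr/local/cuda"]
for home in filter(None, candidates):
    print(home, "->", find_libcudart(home) or "no libcudart found")
```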
Traceback (most recent call last):
File "\ChatRWKV-main\ChatRWKV-main\chat.py", line 218, in <module>
model = RWKV_RNN(args)
^^^^^^^^^^^^^^
File "\ChatRWKV-main\ChatRWKV-main\venv\Lib\site-packages\torch\jit\_script.py", line 292, in init_then_script
original_init(self, *args, **kwargs)
File "\ChatRWKV-main\ChatRWKV-main\src\model_run.py", line 75, in __init__
w[x] = w[x].to(self.RUN_DEVICE)
^^^^^^^^^^^^^^^^^^^^^^^^
File "\ChatRWKV-main\ChatRWKV-main\venv\Lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Process finished with exit code 1
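`Torch not compiled with CUDA enabled` means the installed PyTorch is a CPU-only build. A call like `w[x].to(self.RUN_DEVICE)` with `'cuda'` will fail no matter which GPU is present. The durable fix is installing a CUDA-enabled PyTorch wheel. Until then, here is a minimal sketch for picking the device. It assumes the v1 `chat.py` setting `args.RUN_DEVICE` (read as `self.RUN_DEVICE` in the traceback) accepts `'cpu'` or `'cuda'`. `pick_run_device` is a hypothetical helper.

```python
import importlib.util


def pick_run_device() -> str:
    """Return 'cuda' if PyTorch can use a GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch not installed at all
    import torch
    # is_available() returns False (rather than raising) on a CPU-only build,
    # which is the kind of build that fails with "Torch not compiled with CUDA enabled".
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_run_device()
    if device == "cpu" and importlib.util.find_spec("torch") is not None:
        import torch
        # torch.version.cuda is None when the wheel was not built against CUDA.
        print("torch", torch.__version__, "CUDA build:", torch.version.cuda)
    print("RUN_DEVICE =", device)
```

Used as `args.RUN_DEVICE = pick_run_device()` before `RWKV_RNN(args)` is constructed.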
I'm trying to generate very long chat responses, but they seem to be limited to around 1k tokens. Is this the limit, or can it go above that?
If it can, where in ChatRWKV would I have to change the token length limit?
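RWKV runs as an RNN, so every new token comes from a fixed-size state. Reply length is therefore bounded by how many times the chat script's sampling loop runs (plus any stop condition), not by an attention window. How well the model holds up beyond its training context length is a separate question. Below is a toy sketch of such a loop. `generate`, `step_fn`, `max_new_tokens` and `stop_token` are hypothetical names, and the lambda stands in for the real model call. The sketch does not show where ChatRWKV itself sets its limit.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")


def generate(step_fn: Callable[[int, State], Tuple[int, State]],
             first_token: int, state: State,
             max_new_tokens: int, stop_token: int) -> List[int]:
    """Feed tokens one at a time through an RNN-style step function."""
    out: List[int] = []
    token = first_token
    for _ in range(max_new_tokens):    # the only hard cap on reply length
        token, state = step_fn(token, state)
        if token == stop_token:        # e.g. an end-of-reply marker
            break
        out.append(token)
    return out


if __name__ == "__main__":
    # Toy stand-in for the model: emits token + 1 and counts steps in the state.
    toy = lambda tok, n: (tok + 1, n + 1)
    print(len(generate(toy, 0, 0, max_new_tokens=5000, stop_token=-1)))  # 5000
```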
Hey guys, great stuff! Could we have a very easy step-by-step process for installing ChatRWKV on, for example, an Ubuntu server?
Your ChatRWKV is damn good! Can you provide a requirements file? If possible, please provide requirements files for all your deep learning open source projects. That would make your models easier to use and lower the barrier for beginners. Thank you for your work!
I downloaded RWKV-4-Pile-3B-20221110-ctx4096.bin and ran python chat.py to test it, but I get these errors:
Run prompt...
Traceback (most recent call last):
File "chat.py", line 167, in <module>
out = run_rnn(pipeline.encode(init_prompt))
File "chat.py", line 136, in run_rnn
out, model_state = model.forward(tokens[:CHUNK_LEN], model_state)
File "/data/home/clarkjiang/ChatRWKV-main/v2/../rwkv_pip_package/src/rwkv/model.py", line 616, in forward
omx, orx, omy, ory,
RuntimeError: default_program(57): error: identifier "aten_add_flat__1" is undefined
default_program(58): error: no operator "=" matches these operands
operand types are: half = float
2 errors detected in the compilation of "default_program".
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}
#define __HALF_TO_US(var) *(reinterpret_cast<unsigned short *>(&(var)))
#define __HALF_TO_CUS(var) *(reinterpret_cast<const unsigned short *>(&(var)))
#if defined(__cplusplus)
  struct __align__(2) __half {
    __host__ __device__ __half() { }
  protected:
    unsigned short __x;
  };
  /* All intrinsic functions are only available to nvcc compilers */
  #if defined(__CUDACC__)
    /* Definitions of intrinsics */
    __device__ __half __float2half(const float f) {
      __half val;
      asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(__HALF_TO_US(val)) : "f"(f));
      return val;
    }
    __device__ float __half2float(const __half h) {
      float val;
      asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__HALF_TO_CUS(h)));
      return val;
    }
  #endif /* defined(__CUDACC__) */
#endif /* defined(__cplusplus) */
#undef __HALF_TO_US
#undef __HALF_TO_CUS
typedef __half half;
extern "C" __global__
void func_1(half* t0, half* t1, half* t2, half* t3, half* t4, half* t5, half* aten_add_flat, half* aten_add_flat_1, half* aten_add_flat_2, half* aten_cat_flat) {
{
  aten_cat_flat[512 * blockIdx.x + threadIdx.x] = __float2half((((512 * blockIdx.x + threadIdx.x) / 2560<1 ? 1 : 0) ? __half2float(t5[(512 * blockIdx.x + threadIdx.x) % 2560]) : __half2float(t4[(512 * blockIdx.x + threadIdx.x) - 2560])));
  float t1_ = __half2float(t1[512 * blockIdx.x + threadIdx.x]);
  float aten_add_flat_ = __half2float(aten_add_flat[512 * blockIdx.x + threadIdx.x]);
  float t0_ = __half2float(t0[(512 * blockIdx.x + threadIdx.x) % 2560]);
  aten_add_flat__1 = __float2half(t1_ * t0_ + ((((512 * blockIdx.x + threadIdx.x) / 2560<1 ? 1 : 0) ? __half2float(t5[(512 * blockIdx.x + threadIdx.x) % 2560]) : __half2float(t4[(512 * blockIdx.x + threadIdx.x) - 2560]))) * ((0.f - t0_) + 1.f));
  aten_add_flat[512 * blockIdx.x + threadIdx.x] = aten_add_flat_;
  float t2_ = __half2float(t2[(512 * blockIdx.x + threadIdx.x) % 2560]);
  aten_add_flat_1[512 * blockIdx.x + threadIdx.x] = __float2half(t1_ * t2_ + ((((512 * blockIdx.x + threadIdx.x) / 2560<1 ? 1 : 0) ? __half2float(t5[(512 * blockIdx.x + threadIdx.x) % 2560]) : __half2float(t4[(512 * blockIdx.x + threadIdx.x) - 2560]))) * ((0.f - t2_) + 1.f));
  float t3_ = __half2float(t3[(512 * blockIdx.x + threadIdx.x) % 2560]);
  aten_add_flat_2[512 * blockIdx.x + threadIdx.x] = __float2half(t1_ * t3_ + ((((512 * blockIdx.x + threadIdx.x) / 2560<1 ? 1 : 0) ? __half2float(t5[(512 * blockIdx.x + threadIdx.x) % 2560]) : __half2float(t4[(512 * blockIdx.x + threadIdx.x) - 2560]))) * ((0.f - t3_) + 1.f));
}
}
What is wrong?
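NVRTC reports these errors while compiling `func_1`. That kernel appears to have been generated on the fly by PyTorch's TorchScript fuser (note the `aten_add_flat` / `aten_cat_flat` buffer names), so it is not ChatRWKV's own CUDA code. One thing to try is running with TorchScript off. This sketch assumes the `rwkv` package reads the `RWKV_JIT_ON` environment variable once at import time, so the variable must be set before `from rwkv.model import RWKV`. The guarded import only keeps the snippet runnable where `rwkv` is not installed.

```python
import os

# Assumption: rwkv reads RWKV_JIT_ON once, at import time, so set it first.
# "0" turns TorchScript off, so no fused fp16 kernels are compiled via NVRTC.
os.environ["RWKV_JIT_ON"] = "0"

try:
    from rwkv.model import RWKV  # noqa: F401  (pip package `rwkv`)
except ImportError:
    RWKV = None  # rwkv not installed here; the env var setting still stands

print("RWKV_JIT_ON =", os.environ["RWKV_JIT_ON"])
```

Your copy of `v2/chat.py` may already set `os.environ["RWKV_JIT_ON"] = '1'` near the top. If so, changing that value to `'0'` is the equivalent edit. Inference may be slower with the JIT off.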
Hello, since I'm running into VRAM constraints, I would like to start using v2.
I have seen a conversion script in the v2/ folder. Do I need to convert an RWKV model before using it in v2?
so that we could install all dependencies by running pip install -r requirements.txt. Otherwise, running chat.py fails:
$ python v2/chat.py
Traceback (most recent call last):
File "/mnt/tera/git-repos/ChatRWKV/v2/chat.py", line 10, in <module>
from prompt_toolkit import prompt
ModuleNotFoundError: No module named 'prompt_toolkit'
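Until a requirements file exists, a small stdlib check can list which third-party imports are missing before `v2/chat.py` is started. The module list is an assumption: `prompt_toolkit` comes from the traceback above, while `numpy`, `torch`, `tokenizers` and `rwkv` are guesses at what the script needs. `missing_modules` is a hypothetical helper. Note that an import name can differ from its pip package name.

```python
import importlib.util
import sys

# Assumed import names for v2/chat.py; prompt_toolkit is the one the
# traceback reports as missing, the rest are guesses.
REQUIRED = ["prompt_toolkit", "numpy", "torch", "tokenizers", "rwkv"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("missing:", " ".join(missing))
        print(f"try: {sys.executable} -m pip install " + " ".join(missing))
    else:
        print("all listed modules found")
```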
Since torch already supports the MPS backend, it would be nice if RWKV supported MPS so we could run inference on a MacBook M1/M2.
I tried changing the strategy, but it doesn't work:
args.strategy = 'mps fp16'
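Before changing the strategy string, it helps to confirm that the installed PyTorch can actually use the Apple GPU. `torch.backends.mps.is_built()` and `is_available()` are real PyTorch calls (PyTorch 1.12+). This sketch only checks the PyTorch side. Whether `rwkv` accepts an `mps fp16` strategy is not addressed here. `mps_status` is a hypothetical helper.

```python
import importlib.util


def mps_status() -> dict:
    """Report whether PyTorch was built with MPS support and can use it now."""
    status = {"torch": False, "built": False, "available": False}
    if importlib.util.find_spec("torch") is None:
        return status
    status["torch"] = True
    try:
        import torch.backends.mps as mps  # module added in PyTorch 1.12
    except ImportError:
        return status
    status["built"] = bool(mps.is_built())          # compiled with MPS support?
    status["available"] = bool(mps.is_available())  # usable on this machine/OS?
    return status


if __name__ == "__main__":
    print(mps_status())
```

If `available` is False on an M1/M2 Mac, common causes are an x86 (Rosetta) Python build or a macOS version older than 12.3.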
Please feel free to just close this if it's a dumb idea. I'm just a normal developer and basically don't know anything about ML, PyTorch, etc.
I had this idea that it would be possible to load models much faster and with much less memory usage by mmap-ing the data. .pth files are actually ZIP files, but since they don't have compression turned on, the actual data is contiguous in the file.
I actually got a working proof of concept going:
import io, mmap, pickle, zipfile, struct
import torch

# ZIP local file header: signature, version, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct('<4s5H3L2H')  # 30 bytes, little-endian

class MmapEntriesUnpickler(pickle.Unpickler):
    def __init__(self, file, rawentries):
        self.rawentries = rawentries  # entry name -> memoryview into the mmap
        self.storage = {}             # cache so each storage is only built once
        super().__init__(file)

    def persistent_load(self, pid):
        # torch.save records each storage as ('storage', StorageType, key, location, numel)
        entryname = pid[2]
        result = self.storage.get(entryname)
        if result is not None:
            return result
        dtname = pid[1].__name__
        if pid[0] != 'storage' or (not dtname.endswith('Storage')):
            raise ValueError(f'Unexpected persistent storage PID {pid}')
        # e.g. 'BFloat16Storage' -> torch.bfloat16
        dtype = getattr(torch, dtname[:-7].lower(), None)
        if dtype is None:
            raise ValueError(f'Unable to handle persistent storage type in PID: {pid}')
        # Zero-copy: the storage points straight into the mapped file.
        result = torch.frombuffer(self.rawentries[entryname], dtype = dtype).storage()
        self.storage[entryname] = result
        return result

def load_mmapped(filename):
    entries = {}
    with open(filename, 'rb') as fp, zipfile.ZipFile(filename, 'r') as zfp:
        # MAP_PRIVATE: any writes go to private copy-on-write pages, never to the file.
        mv = memoryview(mmap.mmap(fp.fileno(), 0, flags=mmap.MAP_PRIVATE | mmap.MAP_DENYWRITE))
        for zi in zfp.infolist():
            if zi.compress_type != zipfile.ZIP_STORED:
                raise ValueError(f'Cannot support non-STORE file [{zi.filename}] in archive {filename}')
            # Read the *local* header: its extra field (e.g. alignment padding) can
            # differ from the central-directory one that zi.extra / zi.FileHeader() use.
            fields = LOCAL_HEADER.unpack_from(mv, zi.header_offset)
            name_len, extra_len = fields[-2], fields[-1]
            offs = zi.header_offset + LOCAL_HEADER.size + name_len + extra_len
            data = mv[offs:offs + zi.file_size]
            entryname = zi.filename.rsplit('/', 1)[1]  # 'archive/data/0' -> '0'
            entries[entryname] = data
    return MmapEntriesUnpickler(io.BytesIO(entries['data.pkl']), entries).load()
The load_mmapped function can be used as a drop-in replacement for torch.load, and it actually loads a bit faster with large models. However, the rest of the loading is still pretty slow and memory-intensive. This seems to be because the models are saved as bfloat16, but RWKV always converts from that format, so the process always ends up needing to allocate memory.
Maybe this approach is still worth it just because it speeds up the torch.load step (basically instant when mmap-ed, but it takes around 10-15 seconds to load the 7B model from an SSD the normal way).
I think the only way it could really make a big difference is if the model could be stored in a format that can be used more directly, without the conversion steps. (There could still be other issues like data alignment, but at least it might be possible to load or stream data to the GPU without it ever having to be loaded to the CPU first.)
I guess the question is: is this even worth continuing to look at? Getting the data into the correct format to be used directly, even just for loading to the GPU, is beyond my ability right now.
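The memory argument in this post can be put in numbers. A memory-mapped bf16 file is backed by the page cache. Converting it to another dtype, however, creates a new tensor whose size is parameters × bytes per element. Even bf16 → fp16, which keeps the same width, needs a full second copy. Below is a back-of-the-envelope sketch. The nominal 7 × 10^9 parameter count is an assumption, not the exact size of the RWKV 7B checkpoint.

```python
# Bytes per element for the formats discussed in this thread.
BYTES_PER_ELEMENT = {"bf16": 2, "fp16": 2, "fp32": 4}


def footprint_gib(n_params: int, dtype: str) -> float:
    """Size in GiB of n_params weights stored as `dtype`."""
    return n_params * BYTES_PER_ELEMENT[dtype] / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # nominal "7B"; the real parameter count differs somewhat
    # A mapped bf16 file is backed by the page cache and needs no private copy...
    print(f"bf16, mmap-ed: {footprint_gib(n, 'bf16'):5.1f} GiB backed by the file")
    # ...but any dtype conversion materializes a brand-new tensor of the target size.
    for dt in ("fp16", "fp32"):
        print(f"{dt} copy:     {footprint_gib(n, dt):5.1f} GiB newly allocated")
```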
I converted RWKV-4-Pile-7B-20230313-ctx8192-test380.pth with the strategy "cuda fp16i8". My graphics card has 12 GB of VRAM.
Then I ran chat.py with this converted model and the strategy "cuda fp16i8", and got the following error:
Run prompt...
Traceback (most recent call last):
File "/home/fc/2TB/GITS/ChatRWKV/v2/./chat.py", line 185, in <module>
out = run_rnn(pipeline.encode(init_prompt))
File "/home/fc/2TB/GITS/ChatRWKV/v2/./chat.py", line 156, in run_rnn
out, model_state = model.forward(tokens[:CHUNK_LEN], model_state)
File "/home/fc/anaconda3/envs/rwkv/lib/python3.10/site-packages/rwkv/model.py", line 607, in forward
x, state[i5+0], state[i5+1], state[i5+2], state[i5+3] = ATT(
File "/home/fc/anaconda3/envs/rwkv/lib/python3.10/site-packages/rwkv/model.py", line 531, in cuda_att_seq_i8
r = torch.sigmoid(self.mm8_seq(rx, rw, rmx, rrx, rmy, rry))
File "/home/fc/anaconda3/envs/rwkv/lib/python3.10/site-packages/rwkv/model.py", line 324, in mm8_seq
return cuda_mm8_seq(B, N, M, x, w, mx, rx, my, ry)
File "/home/fc/anaconda3/envs/rwkv/lib/python3.10/site-packages/rwkv/model.py", line 51, in cuda_mm8_seq
assert x.shape == [B, N]
AssertionError
Any idea what I did wrong?
ChatGLM-6B and Open-Assistant are free and open-source chatbots.
We can work together to develop a free alternative faster.
Your mileage may vary:
/bin/software-properties-gtk ; echo 'turn on via checkmark all repos in the first tab in the GTK GUI for software properties'
sudo apt-get update
echo 'Download AMD linux drivers for 6900XT from their support website'
ll ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo chown _apt ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-get install ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo chown ubuntu ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-get install ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-cache showpkg amdgpu-install
which -a amdgpu-install
sudo amdgpu-install --usecase=hiplibsdk,rocm,hip,dkms,hip-dev
sudo apt-get install perl liburi-encode-perl libfile-copy-recursive-perl libtinfo5 libncurses5
sudo apt-get install python3-pip
/bin/update-manager
echo 'update software in ubuntu GUI as well'
rocm-smi
echo 'the above should display GPU information'
export HSA_OVERRIDE_GFX_VERSION=10.3.0 ; echo 'this is important later for pytorch'
sudo snap refresh firefox --stable; echo 'only run if your firefox somehow breaks from the above process'
sudo shutdown -r now
echo 'restart often during this process'
echo 'the below 3 commands may be skipped, untested without skipping - the linux username is ubuntu but should be your username'
sudo usermod -a -G render ubuntu
sudo usermod -a -G video ubuntu
sudo shutdown -r now;echo 'restart often during this process'
mkdir /media/ubuntu/2TB_fast_nvme_Drive1/pip_cache
mkdir /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages
echo 'only need the -t and --cache-dir flags with pip3 in the next command if your boot drive is not your Machine Learning drive'
pip3 install -t /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages --cache-dir=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2
echo "the only one that matters for Natural Language Processing in this history is torch, the others may error and that's ok for this terminal history"
echo 'only if your boot drive is not your Machine Learning drive do the below'
export PYTHONUSERBASE=/media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages
export TMPDIR=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache
export PYTHONPATH=/media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages ; echo 'only if your boot drive is not your Machine Learning drive'
echo 'see you on the flip side, restart'
sudo shutdown -r now
echo 'make sure non-boot drives are mounted: if /etc/fstab does not automount your NVMe drive at every startup, open a file explorer and navigate to it manually after every reboot'
echo 'clean up a little, just in case'
sudo apt-get install --fix-broken
sudo apt-get upgrade
lspci | grep AMD
echo 'mine shows an entry like this "03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)" use the beginning of the lspci AMD string output (prefixed with the 0000: domain) to find the device folder'
sudo ls /sys/bus/pci/devices/0000:03:00.0
echo 'the output of the below should be 0, not -1'
sudo cat /sys/bus/pci/devices/0000:03:00.0/numa_node
echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/numa_node
pip3 install -t /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages --cache-dir=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache --upgrade transformers accelerate bitsandbytes-rocm --extra-index-url https://download.pytorch.org/whl/rocm5.2
echo 'see you on the flip side, restart'
sudo shutdown -r now
export LD_LIBRARY_PATH=/opt/rocm-5.4.1/lib
export LD_LIBRARY_PATH=/opt/rocm-5.4.3/lib:/opt/rocm-5.4.3/lib64
export PATH=$PATH:/opt/rocm-5.4.3/bin:/opt/rocm-5.4.3/opencl/bin
export LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/lib64
export PATH=$PATH:/opt/rocm/bin:/opt/rocm/opencl/bin
echo 'begin python3 torch and inference tests'
echo "alias python3='rocm-smi --setfan 99%;python3' #AMD fan curve was not aggressive enough for my cooling" >> ~/.bashrc
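For the "begin python3 torch and inference tests" step, here is a minimal sanity check that the installed wheel is the ROCm build and can see the card. On ROCm builds, PyTorch exposes the GPU through the `torch.cuda` API and sets `torch.version.hip`. `rocm_report` is a hypothetical helper. Run it from a shell where `HSA_OVERRIDE_GFX_VERSION=10.3.0` is exported, because the `export` in the history above lasts only for that shell session.

```python
import importlib.util


def rocm_report() -> dict:
    """Summarize what PyTorch reports about a ROCm/HIP GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch
    info = {
        "torch": torch.__version__,
        "hip": getattr(torch.version, "hip", None),  # None on non-ROCm builds
        "gpu_available": torch.cuda.is_available(),   # ROCm GPUs appear via torch.cuda
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # Tiny matmul on the GPU to confirm that kernels actually run.
        x = torch.randn(256, 256, device="cuda")
        info["matmul_ok"] = bool(torch.isfinite(x @ x).all())
    return info


if __name__ == "__main__":
    print(rocm_report())
```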