Hi, I downloaded gpt-2-tensorflow2.0 to get GPT-2 working, but when I tried to install the requirements, I got this:
Requirement already satisfied: setuptools==41.0.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 1)) (41.0.1)
Collecting ftfy==5.6
Using cached ftfy-5.6.tar.gz (58 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 14, in <module>
File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 20, in <module>
from setuptools.dist import Distribution, Feature
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 34, in <module>
from setuptools.depends import Require
File "/usr/local/lib/python3.10/dist-packages/setuptools/depends.py", line 7, in <module>
from .py33compat import Bytecode
File "/usr/local/lib/python3.10/dist-packages/setuptools/py33compat.py", line 55, in <module>
unescape = getattr(html, 'unescape', html_parser.HTMLParser().unescape)
AttributeError: 'HTMLParser' object has no attribute 'unescape'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
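The last frame of that traceback is a single lookup in setuptools' py33compat.py. The sketch below reproduces only that expression, outside pip. The helper names are hypothetical and are not part of setuptools. Because getattr() evaluates its default argument before it does the lookup, HTMLParser().unescape is accessed even though html.unescape exists, and that method was removed in Python 3.9. This suggests, though it is an assumption, that the pinned setuptools==41.0.1 is too old for the Python 3.10 interpreter shown in the paths.

```python
import html
import html.parser
import sys


def old_setuptools_unescape_lookup():
    # Same expression as setuptools/py33compat.py in the traceback above.
    # getattr() evaluates its default argument first, so
    # HTMLParser().unescape is accessed even though html.unescape exists.
    return getattr(html, "unescape", html.parser.HTMLParser().unescape)


def lookup_fails():
    """Return True if the old setuptools lookup raises AttributeError here."""
    try:
        old_setuptools_unescape_lookup()
    except AttributeError:
        return True
    return False


print("Python", sys.version.split()[0], "- old lookup fails:", lookup_fails())

# html.unescape (module-level, Python 3.4+) is the supported function.
print(html.unescape("&lt;b&gt;"))  # prints: <b>
```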
So I installed the requirements manually. Some of them still gave errors, so I installed the newest versions of those instead. But when I tried to train the model, I got this:
2023-09-16 21:01:25.812726: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-09-16 21:01:25.812751: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
No. of tf records:- 2
2023-09-16 21:01:27.331253: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 21:01:27.331492: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331561: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331623: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331684: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331761: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331820: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331910: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331978: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2023-09-16 21:01:27.331992: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Latest checkpoint restored...............
Running in eager mode.............
Traceback (most recent call last):
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/train_gpt2.py", line 77, in <module>
train()
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/train_gpt2.py", line 72, in train
model.fit([train_dataset, test_dataset], graph_mode)
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/gpt2_model.py", line 282, in fit
step, loss, perplexity = train_func(inputs, targets)
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/gpt2_model.py", line 173, in _train_step
predictions, _ = self(inputs, training=True)
File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/gpt2_model.py", line 82, in call
embedded_x = self.embedding(x)
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/layers/embedding_layer.py", line 30, in call
return self.embedding(inputs, scale=scale)
File "/home/umikali/gpt-2/gpt-2-tensorflow2.0/layers/embedding_layer.py", line 41, in embedding
embeddings = tf.nn.embedding_lookup(self.embedding_weights, inputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "embedding_layer" (type EmbeddingLayer).
indices[14,4] = 31890 is not in [0, 24512) [Op:ResourceGather]
Call arguments received by layer "embedding_layer" (type EmbeddingLayer):
• inputs=tf.Tensor(shape=(32, 241), dtype=int32)
• mode=embedding
• scale=False
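Reading the error: indices[14,4] = 31890 means that row 14, position 4 of the (32, 241) input batch holds token id 31890, while the embedding table only accepts ids in [0, 24512). So the TF records contain ids from a larger vocabulary than the one the model was built with. Below is a minimal pure-Python sketch of the same bounds check. It assumes the batch is available as nested lists of ints, and the function names and toy batch are hypothetical.

```python
from typing import List, Sequence, Tuple

# Value taken from the error message above: valid ids are 0..24511.
VOCAB_SIZE = 24512


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, col, token_id) for each id outside [0, vocab_size)."""
    bad = []
    for r, row in enumerate(batch):
        for c, tok in enumerate(row):
            if not 0 <= tok < vocab_size:
                bad.append((r, c, tok))
    return bad


def min_vocab_size_needed(batch: Sequence[Sequence[int]]) -> int:
    """Smallest vocab size whose id range [0, size) covers every token."""
    return max(max(row) for row in batch) + 1


# Hypothetical toy batch: 31890 is flagged, just like indices[14,4] above.
toy_batch = [[5, 17, 200], [42, 31890, 7]]
print(find_out_of_range_ids(toy_batch, VOCAB_SIZE))  # [(1, 1, 31890)]
print(min_vocab_size_needed(toy_batch))  # 31891
```

Since the log says the model runs in eager mode, a real batch tensor could be converted with inputs.numpy().tolist() before being passed in. This assumes the records and the model are meant to share one tokenizer. Under that assumption, comparing min_vocab_size_needed with the vocab size used when the TF records were created would show whether the data and the model disagree.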
I tried again in a virtual environment (also using sudo), but nothing worked. I haven't tried a Docker container yet, though.
EDIT: I tried in a Docker container, and it still didn't work.
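Because the requirements were installed by hand, and partly with sudo (which may use a different interpreter than the active virtual environment), the sketch below may help. Run it inside each environment (host, venv, container) to see which Python and which package versions are actually in use. The package list is an assumption based on the names visible in the logs. Extend it with the rest of the project's requirements.txt and compare the output against the pinned versions.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed list: names seen in the logs above; add the rest of requirements.txt.
PACKAGES = ["setuptools", "ftfy", "tensorflow", "keras", "click"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    import sys

    # sys.executable shows which interpreter (system, venv, container) is running.
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```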