
recai's People

Contributors

dependabot[bot], eltociear, leavingseason, micheallei, microsoft-github-operations[bot], microsoftopensource, xuhwang, ycjcl868


recai's Issues

Issue when running RecLM-emb

When I run the training command, bash shell/run_single_node.sh, it creates a training.log file showing that training failed with errors.

bash shell/run_single_node.sh
Running it, I hit the following problems:

  1. torch.dtype accepts only one of ["auto", "bfloat16", "float16", "float32"], but the .sh file sets it to none for non-LLaMA models, so I changed it to "float32".
  2. PEFT_MODEL_NAME=None is passed, but it is read as the string "None", so the condition that checks for None does not work. I fixed this manually by adding "peft_model_name=None" at line 54 of src/model.py.
  3. I have now run this command on both a GPU machine and Google Colab, but I still get errors.
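Point 2 describes a common pitfall: a shell variable set to None reaches Python as the four-character string "None", which is truthy and is not the None object. A minimal sketch of one way to normalise such values before they are used follows. The helper name none_if_null_string and the set of accepted spellings are assumptions for illustration, not code from the RecLM-emb repository.

```python
from typing import Optional

# Spellings treated as "no value" (an assumption for this sketch).
_NULL_STRINGS = {"", "none", "null"}


def none_if_null_string(value: Optional[str]) -> Optional[str]:
    """Return None when value is a textual stand-in for None, else value unchanged."""
    if value is None:
        return None
    if value.strip().lower() in _NULL_STRINGS:
        return None
    return value


# Values as they might arrive from a shell script (hypothetical inputs).
peft_model_name = none_if_null_string("None")
torch_dtype = none_if_null_string("float32")

# The check now behaves the same for a real None and for the string "None".
if peft_model_name is None:
    print("no PEFT adapter requested")
print("dtype:", torch_dtype)
```

With a helper like this, the script could keep passing PEFT_MODEL_NAME=None without any manual edit to the model file, provided the value is normalised right after argument parsing.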

Error 1:   

Traceback (most recent call last):
  File "/home/ubuntu/RecLM-emb/train.py", line 131, in <module>
    main()
  File "/home/ubuntu/RecLM-emb/train.py", line 27, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 115, in __init__
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/transformers/training_args.py", line 1372, in __post_init__
    and (self.device.type != "cuda")
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/transformers/training_args.py", line 1795, in device
    return self._setup_devices
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/transformers/utils/generic.py", line 54, in __get__
    cached = self.fget(obj)
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/transformers/training_args.py", line 1739, in _setup_devices
    self.distributed_state = PartialState(
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/accelerate/state.py", line 230, in __init__
    torch.cuda.set_device(self.device)
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
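"CUDA error: invalid device ordinal" generally means a process asked for a GPU index that does not exist on the machine, which can happen when more worker processes are launched than there are visible GPUs. The sketch below shows a pre-launch check under that assumption. The function name invalid_ranks is hypothetical, and the counts are passed in by hand here; in a real script the device count would come from torch.cuda.device_count() and the process count from the torchrun --nproc_per_node setting.

```python
from typing import List


def invalid_ranks(nproc_per_node: int, device_count: int) -> List[int]:
    """Return the local ranks that would have no matching GPU index.

    Each worker with local rank r selects GPU r, so any rank that is
    greater than or equal to device_count points at a missing device.
    """
    return [rank for rank in range(nproc_per_node) if rank >= device_count]


# Hypothetical scenario: two processes requested on a single-GPU machine.
bad = invalid_ranks(nproc_per_node=2, device_count=1)
if bad:
    print(f"local ranks without a GPU: {bad}")
```

If the check reports any ranks, lowering the number of processes per node to the number of available GPUs is the usual first thing to try.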

Error 2: 

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 27118) of binary: /home/ubuntu/my_env/bin/python3.9
Traceback (most recent call last):
  File "/home/ubuntu/my_env/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/my_env/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

To work around this for testing, I tried passing only one of the 10 training files, and then only 100 records from that file, to reduce the training data, but that did not work either.

Is the "query_api.py" file missing?

Line 56 of RecLM-emb/shell/test_data_pipeline.sh runs python preprocess/gpt_api/query_api.py, but there is no query_api.py in the RecLM-emb/preprocess/gpt_api/ directory, only api.py. Is the file missing?

File not found error in base model tuning in Knowledge Plugin

  File "/home/deepak/recommendation system/RecAI/Knowledge_Plugin/preprocess/step2-Base_models/RecModel/data_loaders/DataLoader.py", line 54, in __init__
    with open(self.path + f"/{dataset}.test_candidate.txt", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../data/ml100k01-1-5/ml100k01-1-5.test_candidate.txt'
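The traceback shows the loader building the path as the data directory plus "{dataset}.test_candidate.txt", relative to the current working directory. A small sketch of a check that could be run before training to see which expected files are absent is given below. The function missing_dataset_files is hypothetical, and only the test_candidate.txt suffix is taken from the traceback; any other suffixes would need to be read from the loader itself.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_dataset_files(data_dir: str, dataset: str, suffixes: List[str]) -> List[Path]:
    """Return the expected '<dataset>.<suffix>' paths that do not exist in data_dir."""
    base = Path(data_dir)
    expected = [base / f"{dataset}.{suffix}" for suffix in suffixes]
    return [path for path in expected if not path.is_file()]


# Demonstration in an empty temporary directory, so the file is reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    for path in missing_dataset_files(tmp, "ml100k01-1-5", ["test_candidate.txt"]):
        print("missing:", path.name)
```

Because the path in the error is relative ('../data/...'), the result also depends on the directory the script is started from, so it may be worth resolving the path with Path.resolve() when debugging.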

RecLM-emb question

bash shell/data_pipeline.sh
bash shell/test_data_pipeline.sh

Running the scripts above costs a lot in OpenAI API calls. Could you provide the files generated by these scripts? Looking forward to your reply.

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository being automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the optin or optout commands below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified
