runpod / runpodctl

216 stars, 31 forks, 1.54 MB

🧰 | RunPod CLI for pod management

Home Page: https://www.runpod.io/

License: GNU General Public License v3.0

Go 94.64% Makefile 0.54% Python 2.17% Shell 2.65%
command-line docker file-transfer runpod

runpodctl's People

Contributors

0xdevalias, allysonrosenthal, chitalian, direlines, ef0xa, flash-singh, furkangozukara, github-actions[bot], justinmerrell, pantafive, pw-git, rachfop, strangehelix, testwill, yamanahlawat, yhlong0, zhl146

runpodctl's Issues

Is there any way to pass in an argument to the pod with this

When I'm doing runpodctl start pod {podId}, is there any way to pass a command argument to the pod? For example: sending the docker command, appending something to the docker command, or setting a bash environment variable. Any way to pass an argument string to the pod from the remote command line where I'm invoking runpodctl would work. My goal is to start a pod remotely and point it at a target URL that it should process. I know I can set a startup docker command from within the web interface, but I'm hoping to do something like that from the command line.
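A hedged workaround sketch, assuming create pod accepts an --env flag (verify with runpodctl create pod --help; the flag, image name, and variable below are illustrative): bake the target URL into the pod as an environment variable at creation time, then read it from the container's entrypoint.

```shell
# Assumption: create pod supports --env (check `runpodctl create pod --help`).
# TARGET_URL and the image name are placeholders for illustration.
runpodctl create pod \
  --name url-worker \
  --imageName my-registry/worker:latest \
  --env "TARGET_URL=https://example.com/job-1"
# Inside the container, the entrypoint can then read the variable:
#   process "$TARGET_URL"
```

This doesn't help with an already-created pod at start time, but it covers the "start remotely and point at a URL" use case when creating pods per job.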

Error: Worker concurrency cannot go beyond the maximum limit

Failed to deploy project: Your worker concurrency cannot go beyond the maximum limit of (20). Please contact support if you wish to scale past this number.

Perhaps this could be checked before having to wait a few minutes when deploying a new endpoint.
I could see how that experience would frustrate a user.

ValueError: Attempting to unscale FP16 gradients.

I ran this command.

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path openlm-research/open_llama_7b \
    --do_train \
    --dataset train \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 2000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
[INFO|training_args.py:1345] 2023-12-07 06:09:02,164 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2023-12-07 06:09:02,164 >> PyTorch: setting up devices
[INFO|trainer.py:1760] 2023-12-07 06:09:03,760 >> ***** Running training *****
[INFO|trainer.py:1761] 2023-12-07 06:09:03,761 >>   Num examples = 78,303
[INFO|trainer.py:1762] 2023-12-07 06:09:03,761 >>   Num Epochs = 3
[INFO|trainer.py:1763] 2023-12-07 06:09:03,761 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1766] 2023-12-07 06:09:03,761 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1767] 2023-12-07 06:09:03,761 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1768] 2023-12-07 06:09:03,761 >>   Total optimization steps = 14,682
[INFO|trainer.py:1769] 2023-12-07 06:09:03,762 >>   Number of trainable parameters = 4,194,304
  0%|                                                                                                                                                                                               | 0/14682 [00:00<?, ?it/s][WARNING|logging.py:290] 2023-12-07 06:09:03,766 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/workspace/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/workspace/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/workspace/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 68, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1950, in _inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2040, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2003, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.

It worked fine when I used it yesterday, but after I changed the dataset size today this problem appeared. What happened?

It won't finish sending and/or receiving

Using runpodctl v1.8.0.

I have been trying to send a 172 MB file for the last hour without any success. I keep retrying, to no avail.

Sometimes when I send it will just stop in the middle of the job, and it stays like that, like frozen.

$ runpodctl send samples.zip                         
Sending 'samples.zip' (172.4 MB) 
Code is: 1100-yahoo-boat-friend-0
On the other computer run

runpodctl receive 1100-yahoo-boat-friend-0

Sending (->XX.XX.XXX:40806)
samples.zip  90% |██████████████████  | (156/172 MB, 478.021 kB/s) [3m42s:34s]

... so it never finishes downloading on my end.

But then there's another problem: when the sending actually completes at 100%, my receiving end (say, my PC) will stop receiving in the middle of it. Again, like frozen. It's like a communication breakdown; neither side knows what to do next, so it stays in a frozen state.
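One hedged workaround while the stalls are being investigated: split the archive into smaller chunks with standard coreutils, send each chunk separately, and verify the reassembled file with a checksum. The 50 MB chunk size below is arbitrary.

```shell
# Split the archive into 50 MB chunks (size is arbitrary) and record a checksum.
split -b 50M samples.zip samples.zip.part-
sha256sum samples.zip > samples.zip.sha256

# Send each chunk (and the checksum file) as a separate transfer,
# so a stall only costs one small retry instead of the whole file.
for f in samples.zip.part-* samples.zip.sha256; do
  runpodctl send "$f"
done

# On the receiving end, after all chunks arrive:
#   cat samples.zip.part-* > samples.zip && sha256sum -c samples.zip.sha256
```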

Fix Windows install URL in Readme

The Windows install URL in the Readme is outdated and no longer works:

wget https://github.com/runpod/runpodctl/releases/download/v1.9.0/runpodctl-windows-amd64.exe -O runpodctl.exe
Needs to be updated to
wget https://github.com/runpod/runpodctl/releases/download/v1.14.2/runpodctl-windows-amd64.exe -O runpodctl.exe
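Alternatively, GitHub's releases/latest/download redirect avoids pinning a version at all, assuming the asset keeps the same file name across releases:

```shell
# GitHub redirects /releases/latest/download/<asset> to the newest release,
# so this URL never needs a version bump (assuming the asset keeps the
# name runpodctl-windows-amd64.exe across releases).
wget https://github.com/runpod/runpodctl/releases/latest/download/runpodctl-windows-amd64.exe -O runpodctl.exe
```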

runpodctl project start - bug

Not really a ๐Ÿ› bug but not the expected behavior:
When you run runpodctl project start you get a No 'runpod.toml' found in the current directory. error, when it should be something like unknown command, or start should be an alias for create.

rp % runpodctl version
runpodctl v1.14.1
rp % runpodctl project start
No 'runpod.toml' found in the current directory.
Please navigate to your project directory and try again.
rp % runpodctl project h
Develop and deploy projects entirely on RunPod's infrastructure.

Usage:
  runpodctl project [command]

Available Commands:
  build       builds Dockerfile for current project
  create      Creates a new project
  deploy      deploys your project as an endpoint
  dev         Start a development session for the current project

Flags:
  -h, --help   help for project

Use "runpodctl project [command] --help" for more information about a command.
rp % runpodctl project create
Welcome to the RunPod Project Creator!
--------------------------------------

Provide a name for your project:
   > 

Add homebrew formula

Hi!

I think it would be much better to add a Homebrew installation option.

What do you think?

Thanks

Fix help command strings

The RunPod CLI tool to manage resources on runpod.io and develop serverless applications.

Usage:
  runpodctl [command]

Aliases:
  runpodctl, runpod

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  config      Manage CLI configuration
  create      create a resource
  exec        Execute commands in a pod
  get         get resource
  help        Help about any command
  project     Manage RunPod projects
  receive     receive file(s), or folder
  remove      remove a resource
  send        send file(s), or folder
  ssh         SSH keys and commands
  start       start a resource
  stop        stop a resource
  update      update runpodctl

Flags:
  -h, --help      help for runpodctl
  -v, --version   Print the version of runpodctl

Some start with capital letters, others do not; they should be consistent.
Also, don't use "(s)":

Don't put optional plurals in parentheses. Instead, use either plural or singular constructions and keep things consistent throughout your documentation. Choose what is most appropriate for your documentation and your audience. If it's important in a specific context to indicate both, use one or more.

https://developers.google.com/style/plurals-parentheses

panic: runtime error: index out of range [4] with length 4

Can't receive data from runpod (docker image with no scp support)

$ runpodctl receive 1208-goat-boat-screen

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
cli/cmd/croc.glob..func1(0xc4c9e0, {0xc0000f1620, 0x1, 0x1})
	/home/runner/work/runpodctl/runpodctl/cmd/croc/receive.go:47 +0x3d3
github.com/spf13/cobra.(*Command).execute(0xc4c9e0, {0xc0000f1600, 0x1, 0x1})
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0xc4bae0)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
cli/cmd.Execute({0x9571a4, 0xc0000001a0})
	/home/runner/work/runpodctl/runpodctl/cmd/root.go:26 +0x4a
main.main()
	/home/runner/work/runpodctl/runpodctl/main.go:8 +0x27
root@bed533d5304a:/workspace/stable-diffusion-webui# ls -l
total 680

Feature request? runpodctl receive 5261-goat-module-brasil-8 custom-name.zip

Is it possible to receive a file and change its name upon receiving it ?

For instance, say I'm sending samples.zip but on my receiving end I'd like it to unzip it in a folder named samples-2.

runpodctl receive 5261-goat-module-brasil-8 samples-2

More specifically, say I need to review from my PC a remote folder with changing data in it, such as logs or images being created every n minutes, and I'd like to keep track of the changes in different folders.
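Until something like this is supported natively, a hedged workaround is to run receive inside a dedicated directory, since the files are written to the current working directory:

```shell
# Receive into a directory of your choosing by changing into it first.
mkdir -p samples-2
cd samples-2
runpodctl receive 5261-goat-module-brasil-8
# An archive can then be unpacked in place:
#   unzip samples.zip
```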

does this thing even work?

Every single call, both to the API and via runpodctl, ends with errors like:
context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Sending folders does not work on Windows

Instead of recreating the folder structure in the pod, runpodctl creates a bunch of individual files with names like "folder\<filename>" (notice that \ is not a folder separator on Linux).

The same functionality works fine when sending from Linux; it does not work on Windows.

Please add binaries for linux and android on arm

This should be as simple as setting GOARCH=arm/arm64 and GOOS=android/linux, but my fork failed to build for some reason. I'm not familiar with this release process, so that might be part of it?

update docker image for existing pod using runpodctl

Hi, is there any way to update the container image for my running pod, just like the edit pod option?

It seems that it's only possible to create a new pod with a new GPU using the create command, not to reuse the GPU I already have.

I hope there's a way to update the container image only, without changing the pod id & GPU.

The file transfer never ends... always stuck at 90%...

I tried to use runpodctl to upload a dataset of around 100 GB to RunPod. To receive the files, I had to start the pod... however it has taken the whole day, which means I pay for the GPUs for the whole day but get no chance to use them, because runpodctl always fails.

Start command gives error response

The pod starts running but shows the error below.
runpodctl start pod 4v0nxxxxx
Error: Something went wrong. Please try again later or contact support.

Create CPU Pod using runpodctl

Hello,

I would like to create a non-GPU pod for quick experimenting, before running a GPU pod. I cannot create a CPU pod, because runpodctl requires gpuType.

runpodctl send exits without any info

This was working for me just fine, and then randomly, out of the blue, running runpodctl send <file> just exits without saying anything.

This happens both locally and on the pod itself. Is there any way to get some verbose output / logging info so I can help you troubleshoot?

I'm running it on a macbook m2, just installed it today v1.9.0. Same behavior on the pod itself so I don't know if it matters.

runpodctl receive error

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
cli/cmd/croc.glob..func1(0xc4c9e0, {0xc000121610, 0x1, 0x1})
/home/runner/work/runpodctl/runpodctl/cmd/croc/receive.go:47 +0x3d3
github.com/spf13/cobra.(*Command).execute(0xc4c9e0, {0xc0001215f0, 0x1, 0x1})
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0xc4bae0)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
cli/cmd.Execute({0x9571a4, 0xc0000001a0})
/home/runner/work/runpodctl/runpodctl/cmd/root.go:26 +0x4a
main.main()
/home/runner/work/runpodctl/runpodctl/main.go:8 +0x27

Get the balance information via runpodctl

Hi, is it possible to get balance information via runpodctl, via a different call (GraphQL), or via the SDK?
I need this information for automatic monitoring purposes, to send an alert when the balance is low.
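runpodctl itself doesn't appear to expose this, but the GraphQL API behind it can be queried directly. A hedged sketch with curl; the field names (clientBalance under myself) are an assumption to verify against the current schema:

```shell
# Assumption: the schema exposes the balance as myself.clientBalance --
# verify against the current RunPod GraphQL schema before relying on this.
curl -s \
  -H 'Content-Type: application/json' \
  -d '{"query":"query { myself { clientBalance } }"}' \
  "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY"
```

The JSON response can then be fed into whatever monitoring/alerting tool you use.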

API key is required for `version` cmd

When you install runpodctl, you can't confirm a successful installation by checking the version. You get an error telling you to run runpodctl config. After adding an API key, runpodctl version works as expected.

Unable to use podRentInterruptable

Using the web interface I'm able to Deploy a spot Instance instead of an On Demand instance. It would be nice to be able to do this using the command line tool too.

I tried naively replacing podFindAndDeployOnDemand with podRentInterruptable, but this failed. I have no idea if this was a permission problem, a server problem, or a client problem. (If I could get it to work, I'd provide a pull request.) I can see the current spot price using runpodctl get cloud. Once I create a pod through the web, I am able to see it and stop it using the command line interface.
I found this documentation.

Transfer randomly pauses

I was sending a stable diffusion model which is 2 gigabytes, but around 90%, the transfer just stopped. This happened the other day too, but at 80%.

How to get connect information from runpodctl ?

After creating a pod with runpodctl, how can I get the same connection information that I get on the console to access the pod? I am talking about the SSH connection info (e.g. ssh [email protected] -i ~/.ssh/id_ed1111111).

Right now I have to log in to the console to get this information. What is the preferred way to get this from the CLI?

Get ssh parameters using runpodctl

Is there a way to obtain the hostname and port of a Pod's SSH using runpodctl? I would like to automate benchmarking my models, but I need to automate the SSH connection.

Support modification of serverless templates

In order to be able to orchestrate Serverless Runpod.io deployments as part of a continuous deployment workflow it would be desirable to be able to update the Serverless template using runpodctl. Specifically to change the Container image setting on the template to point to a new version of the image.

Pointing the template to the :latest label runs the risk of docker pull caches being out of sync and running an old version of the image. And it makes rollback difficult too.

Ideally, it would be possible to execute a runpodctl command that points an existing Serverless template at a new image URL.
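Until runpodctl supports this, the GraphQL API is the likely route. A heavily hedged sketch: the mutation name (saveTemplate) and its input shape are assumptions that must be confirmed against the current RunPod GraphQL schema before use.

```shell
# Assumption: a saveTemplate-style mutation exists that accepts a template id
# and a new container image name -- confirm the real mutation name and input
# shape against the current RunPod GraphQL schema. TEMPLATE_ID and the image
# reference are placeholders.
curl -s \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { saveTemplate(input: {id: \"TEMPLATE_ID\", imageName: \"registry/app:v1.2.3\"}) { id imageName } }"}' \
  "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY"
```

Pinning an explicit image tag this way (rather than :latest) also makes rollback a matter of re-running the same call with the previous tag.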
