russellcanfield / wingman-ai
An open source AI coding assistant VSCode extension. Works with Ollama, HuggingFace, OpenAI and Anthropic

License: MIT License


wingman-ai's Introduction

Wingman - AI Coding Assistant

The Wingman-AI extension brings high-quality, AI-assisted coding right to your computer. It's 100% free and private, which means data never leaves your machine!

Like the extension? Check out Squadron AI, our AI-assisted code reviewer.

🚀 Getting Started

Choosing an AI Provider

We recommend starting with Ollama using the Deepseek model(s); see why here or here.

  • Install this extension from the VS Code Marketplace: Wingman-AI
  • Install Ollama
  • Install the supported local models by running the following command(s), for example:
    • ollama pull deepseek-coder:6.7b-base-q8_0
    • ollama pull deepseek-coder:6.7b-instruct-q8_0

That's it! This extension will validate that the models are configured correctly in its VSCode settings upon launch. If you wish to customize which models run, see the FAQ section.

Features

Code Completion

The AI will look for natural pauses in typing to decide when to offer code suggestions (keep in mind the speed is limited by your machine). The code completion feature will also analyze comments you type and generate suggestions based on that context.

Wingman AI code completion example

Code Completion Disable / Hotkey

We understand that sometimes the code completion feature can be too aggressive, which may strain your system's resources during local development. To address this, we have introduced an option to disable automatic code completion. However, we also recognize the usefulness of on-demand completion. Therefore, we've implemented a hotkey that allows you to manually trigger code completion at your convenience.

When you need assistance, simply press Shift + Ctrl + Space. This will bring up a code completion preview right in the editor and a quick action will appear. If you're satisfied with the suggested code, you can accept it by pressing Enter. This provides you with the flexibility to use code completion only when you want it, without the overhead of automatic triggers.
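If Shift + Ctrl + Space conflicts with another binding on your machine, you can remap it like any other VS Code command through keybindings.json. The sketch below uses a placeholder command ID for illustration only; look up the actual command contributed by Wingman under Preferences > Keyboard Shortcuts before copying it.

// keybindings.json (Preferences > Keyboard Shortcuts > "Open Keyboard Shortcuts (JSON)")
// "wingman.triggerCodeComplete" is a hypothetical command ID used for illustration;
// substitute the real command listed for the Wingman extension.
[
  {
    "key": "ctrl+alt+space",
    "command": "wingman.triggerCodeComplete",
    "when": "editorTextFocus"
  }
]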

Interactive Chat

Talk to the AI naturally! It will use open files as context to answer your question, or simply select a section of code to use as context. Chat will also analyze comments you type and generate suggestions based on that context.

Wingman AI chat example


AI Providers

Ollama

Ollama is a free and open-source AI model provider, allowing users to run their own local models.

Why Ollama?

Ollama was chosen for its simplicity, allowing users to pull a number of models in different configurations and update them at will. Ollama will pull optimized models based on your system architecture; however, if you do not have a GPU-accelerated machine, models will run more slowly.

Setting up Ollama

Follow the directions on the Ollama website. Ollama has a number of open source models available that are capable of writing high quality code. See getting started for how to pull and customize models.

Supported Models

The extension uses a separate model for chat and code completion. This is because different types of models have different strengths; mixing and matching offers the best results.

NOTE - You can use any quantization of a supported model; you are not limited to a specific one.

Example: deepseek-coder:6.7b-instruct-q4_0

Supported Models for Code Completion:

Supported Models for Chat:

OpenAI

OpenAI is supported! You can use the following models:

  • GPT-4o
  • GPT-4 Turbo
  • GPT-4

NOTE - Unlike using Ollama, your data is not private and will not be sanitized prior to being sent.

Anthropic

Anthropic is supported! You can use the following models:

  • Claude 3.5 Sonnet
  • Claude 3 Opus

NOTE - Unlike using Ollama, your data is not private and will not be sanitized prior to being sent.

Hugging Face

Hugging Face supports hosting and training models, but it also lets you run many models (under 10GB) for free! All you have to do is create a free account.

Setting up Hugging Face

Once you have a Hugging Face account and an API key, all you need to do is open the VSCode settings pane for this extension "Wingman" (see FAQ).

Once it's open, select "HuggingFace" as the AI Provider and add your API key under the HuggingFace section:
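As a rough sketch of what the resulting entries look like in settings.json (the key names below are assumptions based on the extension's "Wingman" settings namespace; confirm the exact names in the Settings UI for your version):

// settings.json - illustrative only; verify the exact keys in the Wingman settings UI
{
  "Wingman.Provider": "HuggingFace",
  "Wingman.HuggingFace.apiKey": "hf_xxxxxxxxxxxxxxxx",
  "Wingman.HuggingFace.chatModel": "<a supported chat model>",
  "Wingman.HuggingFace.codeModel": "<a supported code completion model>"
}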

Supported Models

The extension uses a separate model for chat and code completion. This is because different types of models have different strengths; mixing and matching offers the best results.

Supported Models for Code Completion:

Supported Models for Chat:

NOTE - Unlike using Ollama, your data is not private and will not be sanitized prior to being sent.


FAQ

  • How can I change which models are being used? This extension uses settings like any other VSCode extension; see the examples below.

  • The AI models feel slow, why? As of pre-release 0.0.6 we've added an indicator in the bottom status bar to show you when an AI model is actively processing. If you aren't using GPU-accelerated hardware, you may need to look into Quantization.
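For example, changing which Ollama models are used looks roughly like the snippet below in settings.json. The chatModel/codeModel keys under "Wingman.Ollama" are referenced elsewhere in this document, but treat this as a sketch and confirm the exact key names in the extension's Settings UI:

// settings.json - a sketch of overriding the default models (verify key names first)
{
  "Wingman.Provider": "Ollama",
  "Wingman.Ollama.chatModel": "deepseek-coder:6.7b-instruct-q8_0",
  "Wingman.Ollama.codeModel": "deepseek-coder:6.7b-base-q8_0"
}

Remember that any model you reference must already be pulled locally (ollama pull <model>).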

Troubleshooting

This extension leverages Ollama due to its simplicity and its ability to deliver the right container optimized for your running environment. However, good AI performance relies on your machine specs, so if you do not have the ability to GPU accelerate, responses may be slow. During startup the extension will verify the models you have configured in the VSCode settings pane for this extension; the extension does have some defaults:

Code Model - deepseek-coder:6.7b-base-q8_0

Chat Model - deepseek-coder:6.7b-instruct-q8_0

The models above require enough RAM to run correctly; you should have at least 12GB of RAM on your machine if you are running these models. If you don't have enough RAM, choose a smaller model, but be aware that it won't perform as well. Also see the information on model Quantization.
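As a sketch of a lower-memory setup, you could swap the default q8_0 models for q4_0 variants (roughly half the weight size for a 6.7B model), again assuming the same "Wingman.Ollama" keys shown in the FAQ example above:

// settings.json - hypothetical lower-RAM configuration using smaller quantizations
{
  "Wingman.Ollama.chatModel": "deepseek-coder:6.7b-instruct-q4_0",
  "Wingman.Ollama.codeModel": "deepseek-coder:6.7b-base-q4_0"
}

Pull the matching models first, e.g. ollama pull deepseek-coder:6.7b-base-q4_0.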

Release Notes

To see the latest release notes, check out our releases page.


If you like the extension, please leave a review! If you don't, open an issue and we'd be happy to assist!

Enjoy!

wingman-ai's People

Contributors

russellcanfield, harlenalvarez


wingman-ai's Issues

VueJS2 and Linux Bash files not running

We are having problems getting Wingman with Ollama integration working with .vue and .sh files. We are able to get it to work with other filetypes, like .js, .json, and .php to name a few. With the latter filetypes we get the greyed-out code suggestions and a loading circle in the bottom right-hand corner of the screen, but with the former we get no response at all. We are running it as a VSCode extension.

Enhance code completion on/off

Code completion currently supports being disabled and enabled. Instead of a disabled option, let’s convert it to always on or hotkey.

Feature - Code Completion Toggle

Add a toggle in the settings to allow users to turn code completion on/off, to control traffic and potential data leakage in situations where you don't want to send part of, or the entire, file over to OpenAI as context.

Support other Ollama models

Remove the restriction and allow other values for chatModel and codeModel inside "Wingman.Ollama".
In version v0.3.1 only specific models are allowed. This is very limiting.

Allow use of llama3 models.

Not able to run the extension by installing it manually.

I just want to make some UI changes to fit my preferences, so I tried to install the extension from the repo and ran it. Everything appeared to work, but none of the functions, like chat or code completion, were working. I have Ollama installed too. I am trying this on my MacBook M1. It would be of great help if you could list the steps for installing through this repo instead of the VS Code Marketplace.

Auto completion not working and needs some improvements

I'm using a computer with AMD Ryzen 5 5600G with integrated Radeon Graphics × 6, and 32GB RAM, on Linux Mint 21.
This is the code I'm using to try Wingman-AI:

def isPrime(n):
    """ Test if n is a prime number """
    if 
        
def hello(i):
    for j in range(i):
        print(f"{j} is a value") 

At the isPrime function, I've been trying to type either if or for to see if Wingman-AI generates a code completion.

From what I understand, while typing code, if I pause, Wingman-AI is supposed to suggest a code completion. In my case, it isn't working. Here's the output:

27/02/2024, 22:01:01 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<|fim▁begin|>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i <|fim▁hole|>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<|fim▁end|>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<|end▁of▁sentence|>","<|EOT|>","\\n","</s>"]}}
27/02/2024, 22:01:01 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:01 - [info] Ollama - Code Completion execution time: 0.298 seconds
27/02/2024, 22:01:02 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<|fim▁begin|>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i in<|fim▁hole|>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<|fim▁end|>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<|end▁of▁sentence|>","<|EOT|>","\\n","</s>"]}}
27/02/2024, 22:01:02 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:02 - [info] Ollama - Code Completion execution time: 0.143 seconds
27/02/2024, 22:01:02 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<|fim▁begin|>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i in <|fim▁hole|>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<|fim▁end|>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<|end▁of▁sentence|>","<|EOT|>","\\n","</s>"]}}
27/02/2024, 22:01:21 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:21 - [info] Ollama - Code Completion execution time: 18.563 seconds

I noticed "num_predict":-1 and checked Wingman config. Sure enough, Code max tokens is -1 by default. Made the value 100, closed vscode, opened it again and tried. This time:

27/02/2024, 22:16:51 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<|fim▁begin|>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    <|fim▁hole|>\n        \n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<|fim▁end|>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":100,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<|end▁of▁sentence|>","<|EOT|>","\\n","</s>"]}}
27/02/2024, 22:16:51 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:16:51 - [info] Ollama - Code Completion execution time: 0.112 seconds
27/02/2024, 22:16:52 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<|fim▁begin|>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    if <|fim▁hole|>\n        \n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<|fim▁end|>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":100,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<|end▁of▁sentence|>","<|EOT|>","\\n","</s>"]}}
27/02/2024, 22:17:30 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:17:30 - [info] Ollama - Code Completion execution time: 37.988 seconds

So what would I need to do to get code completions?

A few other humble suggestions:

Conservative use of CPU required:

To save on power, users could prefer to have Wingman-AI query Ollama for a code completion only when using a key combination, like perhaps Ctrl+I. If Wingman-AI is going to try generating completions every time the user pauses, it consumes a huge amount of CPU power even when the user doesn't want it to. This not only adds up in terms of the electricity bill, it also puts the user's CPU fan under constant duress/wear. I understand that some people would prefer not having to press a key combo, so this could be a setting users choose: either a pause or a key combo.

Readme update required:

It'd help to update the readme to show users how they can view the output logs, and to include a screenshot of the Wingman config tab to show how easy it is to change settings. Also, most users won't know the consequences of changing settings like the code context window, code max tokens, chat context window, etc. So it'd be nice to explain those, and also explain why there's a separate code model and a separate chat model.

Light color theme issue

Wingman AI generated code is barely visible in the sidebar when a light color theme is used in VS Code.
I have tried a few different light themes and they all look the same. Bleached.

Screenshot from 2024-04-25 15-29-25

Remove throw on invalid config

Currently Wingman is set up to throw when invalid configuration scenarios are detected. The original intent was to prevent the user going to Chat and finding it unresponsive.

Since we have evolved Wingman to have a config pane, we can no longer fail the extension; consider removing the throw and maybe enhancing the existing error state so the config screen still loads.

Also consider the state of the extension when it does fail to start up. We probably want to hide the quick fix/code action options - or, on use, have them emit an error dialog. The same goes for chat, or we could load a chat shell that indicates it's not loaded properly.

No response after prompting Wingman AI

I'm using VS Code version 1.85.1 on Linux Mint 21 Cinnamon. I've installed Ollama using curl -fsSL https://ollama.com/install.sh | sh, downloaded this model with ollama pull deepseek-coder:6.7b-base-q8_0, and verified it's working on http://localhost:11434/. I have 32GB RAM. http://localhost:11434/api/show and http://localhost:11434/api/generate show a 404 page not found error.
In the Wingman AI extension pane where I can type a prompt, when I type one the loading icon just keeps circling forever. No code is generated. On the system monitor, I can see that the CPU isn't being used. I don't have a separate graphics card, and since Ollama detected that there's no NVIDIA graphics card, it printed a message that it'd run in CPU mode.
So is this merely an issue with the fact that there's no NVIDIA graphics card, or could the Ollama API paths have changed? I don't want to use the HuggingFace API or OpenAI API, so I won't be trying those.

Update: I also pulled deepseek-coder:6.7b-instruct-q8_0. Still the same problem.

[FEATURE REQUEST] Code Completion Tokens Limit

Hi,

I am using Wingman AI for VSCode on my 2018 MacBook Pro. Since it runs deepseek-coder on the CPU, it takes a while to generate the code. For autocompletion, after roughly a 1-second pause, I want to see the generated text.

According to the codebase, the current token limit is set to 1024, which is quite high. I want to request that it be user-adjustable; for me, 30-50 tokens would be the best limit, and they would get generated in about 2 seconds as well. Currently, it takes over 40 seconds to see the autocompletion. That is not practical at all.

Can you please implement it? Thank you!

Secondly, can we make code contributions?

Add ability to exclude context

Wingman AI by default uses the current file as context for the request to the LLM.

It would be nice to be able to send a general request/question without any context because:

  1. Context slows down the response from the LLM for general questions.
  2. If you do not want any context, you are forced to select blank line(s) in the text editor.

Refactor - enhance context

Refactor is currently in development but offers sub-par performance on complex refactors.

Two issues contribute: one is a lack of context understanding; the other is potentially brittle markdown parsing - consider a prompt tweak here.
