openbmb / repoagent

An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.

License: Apache License 2.0

Python 97.40% Makefile 1.64% Shell 0.96%
agent chatglm gpt gpt-4 langchain llama llms qwen rag chatgpt

repoagent's People

Contributors

dependabot[bot], innovation64, logic-10, pooruss, sailaoda, umpire2018, yeyn19


repoagent's Issues

`generate_overall_structure` processes an overly broad range of files

Description

The generate_overall_structure method in our codebase is currently processing a wider range of files than necessary. This behavior is leading to the inclusion of files from directories like .venv and others that are not relevant to our intended use case.

Code Snippet:

def generate_overall_structure(self):
    repo_structure = {}
    for root, dirs, files in os.walk(self.repo_path):
        for file in files:
            if file.endswith('.py'):
                relative_file_path = os.path.relpath(os.path.join(root, file), self.repo_path)
                repo_structure[relative_file_path] = self.generate_file_structure(relative_file_path)
    return repo_structure

Observed Behavior

The method traverses all directories within self.repo_path, including those like .venv. It adds all Python files to the repo_structure dictionary, regardless of whether they are part of the virtual environment or other non-essential directories.

Expected Behavior

Ideally, the method should ignore directories that are not relevant to the repository's core functionality, such as .venv, __pycache__, and others typically found in a Python project's .gitignore file.

Suggested Fix

We might need to integrate a filtering mechanism that aligns with the patterns specified in .gitignore, or explicitly define a list of directories to ignore during the traversal process.
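
As a sketch of the second option, an explicit ignore list (the directory names here are illustrative assumptions, not the project's actual configuration), pruning `dirs` in place stops `os.walk` from descending into ignored directories:

```python
import os

# Illustrative ignore list; a real fix might instead honor .gitignore
# patterns (e.g. via the third-party `pathspec` package).
IGNORED_DIRS = {".venv", ".git", "__pycache__", "node_modules"}

def iter_python_files(repo_path):
    """Yield repo-relative paths of .py files, skipping ignored directories."""
    for root, dirs, files in os.walk(repo_path):
        # Mutating dirs in place prevents os.walk from descending into them.
        dirs[:] = [d for d in dirs if d not in IGNORED_DIRS]
        for file in files:
            if file.endswith(".py"):
                yield os.path.relpath(os.path.join(root, file), repo_path)
```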

Additional Context

This issue can lead to unnecessary bloating of the repo_structure and may also cause performance issues if the method processes a large number of irrelevant files.


KeyError: `default_completion_kwargs` raised in ai_doc\chat_engine.py

Description:

Encountered a KeyError when accessing default_completion_kwargs in chat_engine.py.

Code Snippet:

model = self.config["default_completion_kwargs"]["model"]

Error Message:

  File "ai_doc\chat_engine.py", line 103, in generate_doc
    model = self.config["default_completion_kwargs"]["model"]
            ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'default_completion_kwargs'

Expected Behavior:

The default_completion_kwargs key should be present in the config dictionary.

Additional Improvement:

I propose adding a new function, find_engine_or_model, to efficiently search for 'engine' or 'model' keys in nested dictionaries. The function returns the first occurrence of either key.

def find_engine_or_model(data):
    for first_level_key, first_level_value in data['api_keys'].items():
        for item in first_level_value:
            if 'engine' in item:
                return item['engine']
            elif 'model' in item:
                return item['model']
    return None
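
A more defensive variant of the original lookup could also avoid the KeyError entirely by using dict.get with a fallback (the fallback model name below is an assumption for illustration, not the project's actual default):

```python
def get_model_name(config, fallback="gpt-3.5-turbo"):
    """Return the configured model name, or `fallback` if the keys are absent."""
    # .get with a default avoids raising KeyError when either key is missing.
    return config.get("default_completion_kwargs", {}).get("model", fallback)
```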

Questions: tree sitter, git, ollama

Hello, interesting project and architecture.
I see that support for other programming languages is left for future work. Have you considered using tree-sitter for code parsing?

Also, why did you decide to use pre-commit hooks instead of pulling the git repository on a schedule? The LlamaIndex GitHub reader could be leveraged in that case.

Do you plan to support Ollama, and if so, which of the open-source models do you reckon would be the best fit?

Thanks

AttributeError in runner.py When Handling Exceeded Context Length

Description:
Encountered an AttributeError in runner.py after multiple attempts to process a long code snippet using the gpt-3.5-turbo-16k model.

Error Messages:
Repeated errors indicating the model's maximum context length was exceeded:

Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 1 of 5
...
Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 5 of 5

Followed by an AttributeError:

   File "ai_doc\runner.py", line 341, in <module>
    runner.run()
  File "ai_doc\runner.py", line 165, in run
    self.process_file_changes(repo_path, file_path, is_new_file)
  File "ai_doc\runner.py", line 217, in process_file_changes
    json_data[file_handler.file_path] = self.update_existing_item(json_data[file_handler.file_path], file_handler, changes_in_pyfile)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ai_doc\runner.py", line 298, in update_existing_item
    future.result()
  File "Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ai_doc\runner.py", line 308, in update_object
    obj["md_content"] = response_message.content
                        ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'content'

Suspected Issue:
The response_message object is None, likely due to the previous errors where the model's maximum context length was exceeded. The code attempts to access the content attribute of a NoneType object, leading to the AttributeError.

Suggested Solution:

  • Investigate why the model's maximum context length is being exceeded and attempt to reduce the input size accordingly.
  • Implement a check to ensure response_message is not None before attempting to access its content attribute. This could prevent the AttributeError and provide a clearer indication of the underlying issue.
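
The second bullet could look roughly like this (a minimal sketch; the helper name and log wording are assumptions, not the project's actual code):

```python
def safe_extract_content(response_message, file_path="<unknown>"):
    """Return the message content, or None with a diagnostic message instead
    of an AttributeError when the model returned nothing (e.g. after the
    context-length retries were exhausted)."""
    if response_message is None:
        print(f"No response for {file_path}; leaving md_content unchanged.")
        return None
    return response_message.content
```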

`repoagent configure` fails after installing with pip

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/bin/repoagent", line 5, in <module>
    from repo_agent.main import app
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/repo_agent/main.py", line 28, in <module>
    repo_path: Annotated[str, typer.Option(prompt="Enter the path to your local repository")] = settings.repo_path ,
                                                                                               ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 145, in __getattr__
    value = getattr(self._wrapped, name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 328, in __getattribute__
    return super().__getattribute__(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Settings' object has no attribute 'REPO_PATH'

`repoagent configure` fails on Windows 11

Error message:

(venv) PS C:\git\RepoAgent> repoagent configure
C:\git\RepoAgent\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\git\RepoAgent\venv\Scripts\repoagent.exe\__main__.py", line 4, in <module>
  File "C:\git\RepoAgent\repo_agent\main.py", line 12, in <module>
    from repo_agent.chat_with_repo import main as run_chat_with_repo
  File "C:\git\RepoAgent\repo_agent\chat_with_repo\__init__.py", line 3, in <module>
    from .main import main
  File "C:\git\RepoAgent\repo_agent\chat_with_repo\main.py", line 3, in <module>
    from repo_agent.settings import setting
  File "C:\git\RepoAgent\repo_agent\settings.py", line 87, in <module>
    setting = Setting.model_validate(_config_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\RepoAgent\venv\Lib\site-packages\pydantic\main.py", line 509, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Setting
chat_completion.openai_api_key
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing

I checked and this has already been fixed upstream, but there is no release with the fix yet:

microsoft/onnxruntime@7b46b31

How to use AzureOpenAI instead of OpenAI

Hello, I loved the project and its workflow.

I want to use AzureOpenAI instead of OpenAI. Could you please explain the process and provide template code for this?

Facing problem with project_hierarchy.json file

I found the project_hierarchy.json file referenced in file_handler.py and in other parts of the repo as well. It is also initialized in the settings file, but I am not clear on where it actually gets written. I am facing an issue with this.
Please help me clear this up.

Thanks

Is RepoAgent self-bootstrapped?

Is RepoAgent's own documentation generated by RepoAgent itself? If so, where can it be viewed?

PermissionError raised in `ai_doc/file_handler.py write_file` function

Fix permission errors in file path handling and directory creation

Problem Description

In the project's write_file function, we encountered a problem with file path handling. When combining two path arguments to create a file such as /workspaces/AI_doc/Markdown_Docs/ai_doc/runner.md, the function mishandled the paths and attempted to create a directory under the filesystem root, causing a permission error.

Concretely, this manifests as PermissionError: [Errno 13] Permission denied: '/Markdown_Docs'. The cause is that the second path is interpreted as an absolute path rather than the intended relative path.

Problem Code

def write_file(self, file_path, content):
    """
    写入文件内容

    Args:
        repo_path (str): 仓库路径
        file_path (str): 文件路径
        content (str): 文件内容
    """
    file_path = os.path.join(self.repo_path, file_path)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, 'w') as file:
        file.write(content)

Suggested Changes

To fix this, the following changes are suggested:

  1. Path format check: ensure that file_path is a relative path. If it starts with /, strip the leading character.

  2. Improved path joining: in the os.path.join(self.repo_path, file_path) call, handle the paths correctly so that all path components are combined as intended.

  3. Directory creation logic: before attempting to write the file, ensure all intermediate directories exist, avoiding permission errors.

Fixed Code

import os

def create_directory(base_path, file_path):
    # Ensure file_path is a relative path
    if file_path.startswith('/'):
        # Strip the leading '/'
        file_path = file_path[1:]

    # Join the paths with os.path.join
    full_path = os.path.join(base_path, file_path)

    # Extract the directory portion
    directory_path = os.path.dirname(full_path)

    # Create the directory
    os.makedirs(directory_path, exist_ok=True)

    return directory_path

# Example call
base_path = '/workspaces/AI_doc'
file_path = '/Markdown_Docs/ai_doc/runner.md'
created_directory = create_directory(base_path, file_path)
print(f"Created directory: {created_directory}")

Design for fixing abnormal pre-commit behavior

Problem

As mentioned in the install section, a quirk of pre-commit is that when the hook changes files, the commit is reported as failed, and the user has to commit again manually with --no-verify. This degrades the experience somewhat. Is there a better way?

API Design

It looks like what is needed is a pre-commit behavior similar to black's: automatically run the documentation generation command and then return a correct result.

So in practice the main task here is to control the command line's return value precisely. For example, black's main function itself precisely controls the value it returns to the shell.

Solution

  1. Inside the CLI, stage the results of each update (add them to the index);
  2. Make the CLI return a success status.
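
The two steps above can be sketched as a black-style hook entry point. The commands are passed in as parameters and are placeholders for the real CLI (the command names and docs directory in the usage note are assumptions):

```python
import subprocess

def hook_main(gen_cmd, stage_cmd):
    """Run the doc generator, stage its output, and return an exit code the
    commit can succeed on. Both commands are illustrative placeholders."""
    result = subprocess.run(gen_cmd)
    if result.returncode != 0:
        return 1  # genuine failure: let pre-commit block the commit
    # Stage the regenerated docs so the user does not need a second commit.
    subprocess.run(stage_cmd, check=False)
    return 0  # report success so the original commit proceeds
```

In real use, gen_cmd would be the documentation command and stage_cmd something like `["git", "add", "markdown_docs"]`.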

chat with repo workflow issue

Chat with Repo project requirements

Core Concept

  • Goal: build a chat system that can interact with a code repository.
  • Summary: match the user's question against the corresponding documentation, code, and reference relations; feed the matched results to a large language model for reasoning; and finally generate the answer.
  • Inspiration: LangChain's RAG over code | 🦜️🔗 Langchain

Specific Requirements

  1. Dynamically update the vectors for documentation chunks

    • Document change monitoring: since the Markdown file contents may change frequently, the system must monitor document changes with tooling and update the vector representations of the affected chunks accordingly.
    • Vector storage and version control: an efficient vector store is required to keep documents and their vector representations consistent.
  2. Organize documentation and code chunks

    • Retrieval method: perform an embedding search by converting the user query into a vector and comparing it against the contents of the vector database. Select the most similar chunks to return, ensuring the accuracy and relevance of the answer.
  3. Code integration

    • Include the original code: the code behind each document should also be vectorized so it can be integrated into the retrieval process.
    • Multi-path recall: in addition to vector search, include traditional methods such as keyword retrieval, and possibly semantic search and pattern matching, to improve the coverage and precision of retrieval.
  4. Handle reference relations

    • Code-block references: recalled code blocks should carry their exact location in the project as well as their reference relations to other code blocks or documents. This helps build a more complete and coherent context. (Already implemented.)
  5. LLM summarization and answering

    • Comprehensive answers: the LLM should be able to analyze the retrieved content comprehensively, understand complex code-documentation relations, and synthesize an answer to the user's query based on the recalled material.
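
The embedding-search step in requirement 2 reduces to nearest-neighbor ranking by cosine similarity. A minimal sketch with plain Python lists (a real system would use an embedding model and a vector store rather than hand-made vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```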

Handling `UnicodeDecodeError` During File Read Operation

Description:

Encountered a UnicodeDecodeError while attempting to read content from a file that contains a mix of English and Chinese characters. The content was initially saved with utf-8 encoding but resulted in encoding errors when read back from the file.

Error Message:

  File "AI_doc\ai_doc\runner.py", line 341, in <module>
    runner.run()
  File "AI_doc\ai_doc\runner.py", line 165, in run
    self.process_file_changes(repo_path, file_path, is_new_file)
  File "AI_doc\ai_doc\runner.py", line 225, in process_file_changes
    markdown = file_handler.convert_to_markdown_file(file_path=file_handler.file_path)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    json_data = json.load(f)
                ^^^^^^^^^^^^
  File "Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 207: invalid start byte

Issue Details:
The error occurred in the convert_to_markdown_file method, which suggests that the file content may have been incorrectly encoded or that the file contains a mix of encodings that are not properly handled by the standard utf-8 decoder.

Content Example:

        "synthesize_voice": {
            "type": "FunctionDef",
            "name": "synthesize_voice",
            "md_content": "**synthesize_voice����**���ú����Ĺ����ǽ�ָ���������ϳ�Ϊ�����������ϳɵ��������浽ָ�����ļ����С�\n\n�ú�������ϸ�������������£�\n\n- ���ȣ���voice_name_details��������ȡ�������ƺ��Ա���Ϣ���������ƺ��Ա���Ϣ֮��ʹ����������\"��\"��\"��\"���зָ���ͨ��rsplit�������������ƺ��Ա���Ϣ���룬��ʹ��rstrip����ȥ���Ա���Ϣĩβ���������š�Ȼ��ʹ��replace�������Ա���Ϣ�е�\"Ů��\"�滻Ϊ\"Ů\"����\"��ͯ\"�滻Ϊ\"ͯ\"���Լ��Ա�ı�ʾ��ʽ��\n\n- ���������������������õ�speech_config�����speech_synthesis_voice_name�����У��Ա��������ϳ�ʱʹ��ָ����������\n\n- Ȼ��ʹ��ѭ������������Դ����ij��ԡ�\n\n- ��ÿ�γ����У����ȳ�ʼ��SpeechSynthesizer���󣬲�����speech_config������\n\n- Ȼ��ʹ��os.path.join����������ļ��к��������ơ��Ա�ƴ�ӳ������Ƶ�ļ���·����\n\n- ���ţ�����AudioConfig���󣬽��ļ�·������filename������\n\n- Ȼ��ʹ��ָ����audio_config������ʼ��SpeechSynthesizer����\n\n- ����SpeechSynthesizer�����speak_text_async��������Ҫ�ϳɵ��ı���Ϊ�������룬��ʹ��get������ȡ�ϳɽ����\n\n- ���ϳɽ����reason���ԣ�����ϳɳɹ������ӡ�ϳɳɹ�����ʾ��Ϣ�������ء�\n\n- ����ϳɱ�ȡ�������ӡȡ����ԭ�򣬲�����ȡ����ԭ�������Ӧ�Ĵ�����\n\n- ��������쳣�����ӡ�쳣��Ϣ������ָ���������ӳ�ʱ���������ԡ�\n\n- ����ﵽ������Դ�����Ȼ�޷��ϳ����������ӡ�ϳ�ʧ�ܵ���ʾ��Ϣ��\n\n**ע��**��ʹ�øú���ʱ��Ҫע�����¼��㣺\n- ��Ҫ�ṩ�ϳ��������������ƺ��Ա���Ϣ��\n- ��Ҫ�ṩSpeechConfig������Ϊ���������ڸö��������ú��ʵ�������Ϣ��\n- ��Ҫ�ṩ����ļ��е�·����\n- ��Ҫָ��������Դ����������ӳ�ʱ�䡣\n\n**���ʾ��**������ɹ��ϳ��������������浽��ָ�����ļ����С�",
            "code_start_line": 42,
            "code_end_line": 85,
            "parent": null,
            "have_return": true,
            "code_content": "def synthesize_voice(voice_name_details, speech_config, output_folder, max_retries, retry_delay):\n    # Extract voice name and gender from the details\n    voice_name, gender = voice_name_details.rsplit('��', 1)\n    gender = gender.rstrip('��')\n    gender = gender.replace('Ů��', 'Ů').replace('��ͯ', 'ͯ')  # Simplify gender notation\n\n    # Set the voice name in the speech config.\n    speech_config.speech_synthesis_voice_name = f\"zh-CN-{voice_name}\"\n\n    for attempt in range(max_retries):\n        try:\n            # Initialize speech synthesizer.\n            synthesizer = SpeechSynthesizer(speech_config=speech_config)\n\n            # Get the path to the output audio file.\n            file_path = os.path.join(output_folder, f\"{voice_name}_{gender}.wav\")\n\n            audio_config = AudioConfig(filename=file_path)\n\n            # Use the synthesizer with the specified audio configuration\n            synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)\n\n            # Synthesize the voice name to a file.\n            result = synthesizer.speak_text_async(example_text).get()\n\n            # Check the result and break the loop if successful.\n            if result.reason == ResultReason.SynthesizingAudioCompleted:\n                print(f\"Speech synthesized for voice {voice_name} and saved to {file_path}\")\n                return\n            elif result.reason == ResultReason.Canceled:\n                cancellation_details = result.cancellation_details\n                print(f\"Speech synthesis canceled: {cancellation_details.reason}\")\n                if cancellation_details.reason == CancellationReason.Error:\n                    if cancellation_details.error_details:\n                        print(f\"Error details: {cancellation_details.error_details}\")\n                        raise Exception(cancellation_details.error_details)\n        except Exception as e:\n            
print(f\"An error occurred: {e}. Retrying in {retry_delay} seconds.\")\n            time.sleep(retry_delay)\n        \n\n    print(f\"Failed to synthesize voice {voice_name} after {max_retries} attempts.\")\n",
            "name_column": 4
        }

A snippet from the file content includes function definitions and comments in both English and Chinese. The original content has been corrupted with a series of ���� characters, which are indicative of encoding issues.

Solution Discussed:
To address this issue, charset_normalizer could be used to read the file in subsequent logic. This approach involves re-reading the file content with charset_normalizer, detecting the correct encoding, and decoding the content properly.

Proposed Changes to Workflow:

  • Integrate charset_normalizer into the file-reading step of the workflow to handle files with mixed or uncertain encodings.
  • Replace instances of direct file reading with charset_normalizer to ensure content is correctly decoded before processing.
  • Ensure all files are saved with a consistent encoding (utf-8 recommended) to prevent similar issues in the future.
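
A hedged sketch of the robust-read step (the gbk fallback is an assumption based on the corrupted sample above; charset_normalizer is only used when the package is installed):

```python
def read_text_robust(path):
    """Read a text file of uncertain encoding.

    Try utf-8 first; on failure, fall back to charset_normalizer's detection
    if the package is available, else decode as gbk with replacement (an
    assumption based on the corrupted sample above).
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        try:
            from charset_normalizer import from_path
            # best() returns the most likely decoding of the raw bytes.
            return str(from_path(path).best())
        except ImportError:
            with open(path, encoding="gbk", errors="replace") as f:
                return f.read()
```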

Additional Context:
This solution aims to normalize the file content during the read operation without changing the initial file-saving behavior. By processing the encoding on read, we can handle files from various sources and encoding states more robustly.
