openbmb / repoagent
An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.
License: Apache License 2.0
The generate_overall_structure method in our codebase is currently processing a wider range of files than necessary. This behavior leads to the inclusion of files from directories like .venv and others that are not relevant to our intended use case.
Code Snippet:
def generate_overall_structure(self):
    repo_structure = {}
    for root, dirs, files in os.walk(self.repo_path):
        for file in files:
            if file.endswith('.py'):
                relative_file_path = os.path.relpath(os.path.join(root, file), self.repo_path)
                repo_structure[relative_file_path] = self.generate_file_structure(relative_file_path)
    return repo_structure
The method traverses all directories within self.repo_path, including those like .venv. It adds all Python files to the repo_structure dictionary, regardless of whether they are part of the virtual environment or other non-essential directories.
Ideally, the method should ignore directories that are not relevant to the repository's core functionality, such as .venv, __pycache__, and others typically found in a Python project's .gitignore file.
We might need to integrate a filtering mechanism that aligns with the patterns specified in .gitignore, or explicitly define a list of directories to ignore during traversal.
This issue can lead to unnecessary bloat of repo_structure and may also cause performance problems if the method processes a large number of irrelevant files.
Description:
Encountered a KeyError when accessing default_completion_kwargs in chat_engine.py.
Code Snippet:
model = self.config["default_completion_kwargs"]["model"]
Error Message:
File "ai_doc\chat_engine.py", line 103, in generate_doc
model = self.config["default_completion_kwargs"]["model"]
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'default_completion_kwargs'
Expected Behavior:
The default_completion_kwargs key should be present in the config dictionary.
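Until the missing key is guaranteed, a defensive lookup would fail with a clearer message. This is a hedged sketch, not the project's actual code; the key names follow the issue, and the fallback model name is purely illustrative.

```python
def get_model(config):
    # Guarded access instead of config["default_completion_kwargs"]["model"],
    # which raises a bare KeyError when the section is missing.
    completion_kwargs = config.get("default_completion_kwargs")
    if completion_kwargs is None:
        raise KeyError(
            "Config is missing 'default_completion_kwargs'; "
            "add it to your configuration file."
        )
    # Illustrative fallback when the section exists but omits 'model'.
    return completion_kwargs.get("model", "gpt-3.5-turbo")
```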
Additional Improvement:
We propose including a new function, find_engine_or_model, to efficiently search for 'engine' or 'model' keys in nested dictionaries. The function returns the first occurrence of either key.
def find_engine_or_model(data):
    for first_level_key, first_level_value in data['api_keys'].items():
        for item in first_level_value:
            if 'engine' in item:
                return item['engine']
            elif 'model' in item:
                return item['model']
    return None
Hello, interesting project and architecture.
I see that support for other programming languages is left for the future. Have you considered using tree-sitter for code parsing?
Also, why did you decide to use pre-commit hooks instead of pulling the git repository with a scheduler? LlamaIndex's GitHub reader could be leveraged in that case.
Do you plan to support Ollama, and if so, which of the open source models do you reckon would be the best fit?
Thanks
This function actually adds all the untracked files to git.
I guess this is not the expected behavior.
Description:
Encountered an AttributeError in runner.py after multiple attempts to process a long code snippet using the gpt-3.5-turbo-16k model.
Error Messages:
Repeated errors indicating the model's maximum context length was exceeded:
Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 1 of 5
...
Error: The model's maximum context length is exceeded. Reducing the length of the messages. Attempt 5 of 5
Followed by an AttributeError:
File "ai_doc\runner.py", line 341, in <module>
runner.run()
File "ai_doc\runner.py", line 165, in run
self.process_file_changes(repo_path, file_path, is_new_file)
File "ai_doc\runner.py", line 217, in process_file_changes
json_data[file_handler.file_path] = self.update_existing_item(json_data[file_handler.file_path], file_handler, changes_in_pyfile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "ai_doc\runner.py", line 298, in update_existing_item
future.result()
File "Python\Python311\Lib\concurrent\futures\_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "ai_doc\runner.py", line 308, in update_object
obj["md_content"] = response_message.content
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'content'
Suspected Issue:
The response_message object is None, likely due to the previous errors where the model's maximum context length was exceeded. The code attempts to access the content attribute of a NoneType object, leading to the AttributeError.
Suggested Solution:
Check that response_message is not None before attempting to access its content attribute. This could prevent the AttributeError and provide a clearer indication of the underlying issue.
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.12/bin/repoagent", line 5, in <module>
from repo_agent.main import app
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/repo_agent/main.py", line 28, in <module>
repo_path: Annotated[str, typer.Option(prompt="Enter the path to your local repository")] = settings.repo_path,
^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 145, in __getattr__
value = getattr(self._wrapped, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dynaconf/base.py", line 328, in __getattribute__
return super().__getattribute__(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Settings' object has no attribute 'REPO_PATH'
Error message:
(venv) PS C:\git\RepoAgent> repoagent configure
C:\git\RepoAgent\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
warnings.warn(
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\git\RepoAgent\venv\Scripts\repoagent.exe\__main__.py", line 4, in <module>
File "C:\git\RepoAgent\repo_agent\main.py", line 12, in <module>
from repo_agent.chat_with_repo import main as run_chat_with_repo
File "C:\git\RepoAgent\repo_agent\chat_with_repo\__init__.py", line 3, in <module>
from .main import main
File "C:\git\RepoAgent\repo_agent\chat_with_repo\main.py", line 3, in <module>
from repo_agent.settings import setting
File "C:\git\RepoAgent\repo_agent\settings.py", line 87, in <module>
setting = Setting.model_validate(_config_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\git\RepoAgent\venv\Lib\site-packages\pydantic\main.py", line 509, in model_validate
return cls.__pydantic_validator__.validate_python(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Setting
chat_completion.openai_api_key
Field required [type=missing, input_value={}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
I checked and this has already been fixed, but there is no release yet.
Hello, I loved the project and its workflow.
I wanted to use AzureOpenAI instead of OpenAI. Can you please explain the process and provide template code for this?
Please share your thoughts and exploration process here. @OctoberFox11
I found the project_hierarchy.json file referenced in file_handler.py and elsewhere in the repo. It is also initialized in the settings file, but I am not clear on where it actually gets written. I am facing an issue with this.
Please help me clear up my doubt.
Thanks
Is RepoAgent's own documentation generated by RepoAgent itself? If so, where can it be viewed?
In the project's write_file function, we encountered a problem with file path handling. When trying to combine two path arguments to create a file, such as /workspaces/AI_doc and /Markdown_Docs/ai_doc/runner.md, the function handles these paths incorrectly, attempting to create a directory under the filesystem root and triggering a permission error.
Concretely, this manifests as PermissionError: [Errno 13] Permission denied: '/Markdown_Docs'. The second path is misinterpreted as an absolute path rather than the intended relative path.
def write_file(self, file_path, content):
    """
    Write content to a file.

    Args:
        file_path (str): File path, relative to self.repo_path
        content (str): File content
    """
    file_path = os.path.join(self.repo_path, file_path)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, 'w') as file:
        file.write(content)
To resolve this problem, the following changes are suggested:
Path format check: Ensure that file_path is a relative path. If it starts with /, strip the leading character.
Improved path joining: In the os.path.join(self.repo_path, file_path) call, handle the paths correctly so that all components are combined properly.
Directory creation logic: Before attempting to write the file, ensure all intermediate directories have been created, to avoid permission errors.
import os

def create_directory(base_path, file_path):
    # Ensure file_path is a relative path
    if file_path.startswith('/'):
        # Strip the leading '/'
        file_path = file_path[1:]
    # Join the paths with os.path.join
    full_path = os.path.join(base_path, file_path)
    # Extract the directory portion
    directory_path = os.path.dirname(full_path)
    # Create the directories
    os.makedirs(directory_path, exist_ok=True)
    return directory_path

# Example call
base_path = '/workspaces/AI_doc'
file_path = '/Markdown_Docs/ai_doc/runner.md'
created_directory = create_directory(base_path, file_path)
print(f"Created directory: {created_directory}")
In the install section I noted that, with the pre-commit feature, a commit made after files have changed shows "failed", and you have to manually commit again with --no-verify. This makes the experience a bit worse. Is there a better way?
It seems what is needed is a pre-commit integration that works like black's: automatically run the documentation-generation command and then return a correct result.
So in practice the main task here is to tune the command-line return value precisely. For example, black's main function itself precisely controls the value it returns to the command line.
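A sketch of that black-style exit-code contract (the function name is illustrative, not RepoAgent's actual API): the hook returns 0 when the docs were already up to date, and 1 when it regenerated files, so pre-commit fails only on the first run and the user simply commits again.

```python
from typing import List

def docs_hook_exit_code(changed_files: List[str]) -> int:
    # Black-style contract: exit 1 when files were (re)generated so that
    # pre-commit reports a failure; exit 0 when everything was current.
    if changed_files:
        print(f"Regenerated {len(changed_files)} documentation file(s); "
              "stage them and commit again.")
        return 1
    return 0
```

The key design point, as with black, is that the hook itself decides the process exit status instead of letting an unrelated error code leak through.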
Dynamically update the vectors corresponding to documentation chunks
Organize documentation and code chunks
Code integration
Handle reference relationships
LLM summarization and answering
Description:
Encountered a UnicodeDecodeError while attempting to read content from a file that contains a mix of English and Chinese characters. The content was initially saved with utf-8 encoding but resulted in encoding errors when read back from the file.
Error Message:
File "AI_doc\ai_doc\runner.py", line 341, in <module>
runner.run()
File "AI_doc\ai_doc\runner.py", line 165, in run
self.process_file_changes(repo_path, file_path, is_new_file)
File "AI_doc\ai_doc\runner.py", line 225, in process_file_changes
markdown = file_handler.convert_to_markdown_file(file_path=file_handler.file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
json_data = json.load(f)
^^^^^^^^^^^^
File "Python\Python311\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 207: invalid start byte
Issue Details:
The error occurred in the convert_to_markdown_file method, which suggests that the file content may have been incorrectly encoded, or that the file contains a mix of encodings that are not properly handled by the standard utf-8 decoder.
Content Example:
"synthesize_voice": {
"type": "FunctionDef",
"name": "synthesize_voice",
"md_content": "**synthesize_voice����**���ú����Ĺ����ǽ�ָ���������ϳ�Ϊ�����������ϳɵ��������浽ָ�����ļ����С�\n\n�ú�������ϸ�������������£�\n\n- ���ȣ���voice_name_details��������ȡ�������ƺ��Ա���Ϣ���������ƺ��Ա���Ϣ֮��ʹ����������\"��\"��\"��\"���зָ���ͨ��rsplit�������������ƺ��Ա���Ϣ���룬��ʹ��rstrip����ȥ���Ա���Ϣĩβ���������š�Ȼ��ʹ��replace�������Ա���Ϣ�е�\"Ů��\"�滻Ϊ\"Ů\"����\"��ͯ\"�滻Ϊ\"ͯ\"���Լ��Ա�ı�ʾ��ʽ��\n\n- ���������������������õ�speech_config�����speech_synthesis_voice_name�����У��Ա��������ϳ�ʱʹ��ָ����������\n\n- Ȼ��ʹ��ѭ������������Դ����ij��ԡ�\n\n- ��ÿ�γ����У����ȳ�ʼ��SpeechSynthesizer��������speech_config������\n\n- Ȼ��ʹ��os.path.join����������ļ��к��������ơ��Ա�ƴ�ӳ������Ƶ�ļ���·����\n\n- ���ţ�����AudioConfig�����ļ�·������filename������\n\n- Ȼ��ʹ��ָ����audio_config������ʼ��SpeechSynthesizer����\n\n- ����SpeechSynthesizer�����speak_text_async��������Ҫ�ϳɵ��ı���Ϊ�������룬��ʹ��get������ȡ�ϳɽ����\n\n- ���ϳɽ����reason���ԣ�����ϳɳɹ������ӡ�ϳɳɹ�����ʾ��Ϣ�������ء�\n\n- ����ϳɱ�ȡ�������ӡȡ����ԭ������ȡ����ԭ�������Ӧ�Ĵ�����\n\n- ��������쳣�����ӡ�쳣��Ϣ������ָ���������ӳ�ʱ���������ԡ�\n\n- ����ﵽ������Դ�����Ȼ���ϳ����������ӡ�ϳ�ʧ�ܵ���ʾ��Ϣ��\n\n**ע��**��ʹ�øú���ʱ��Ҫע�����¼��㣺\n- ��Ҫ�ṩ�ϳ��������������ƺ��Ա���Ϣ��\n- ��Ҫ�ṩSpeechConfig������Ϊ���������ڸö��������ú��ʵ�������Ϣ��\n- ��Ҫ�ṩ����ļ��е�·����\n- ��Ҫָ��������Դ����������ӳ�ʱ�䡣\n\n**���ʾ��**������ɹ��ϳ��������������浽��ָ�����ļ����С�",
"code_start_line": 42,
"code_end_line": 85,
"parent": null,
"have_return": true,
"code_content": "def synthesize_voice(voice_name_details, speech_config, output_folder, max_retries, retry_delay):\n # Extract voice name and gender from the details\n voice_name, gender = voice_name_details.rsplit('��', 1)\n gender = gender.rstrip('��')\n gender = gender.replace('Ů��', 'Ů').replace('��ͯ', 'ͯ') # Simplify gender notation\n\n # Set the voice name in the speech config.\n speech_config.speech_synthesis_voice_name = f\"zh-CN-{voice_name}\"\n\n for attempt in range(max_retries):\n try:\n # Initialize speech synthesizer.\n synthesizer = SpeechSynthesizer(speech_config=speech_config)\n\n # Get the path to the output audio file.\n file_path = os.path.join(output_folder, f\"{voice_name}_{gender}.wav\")\n\n audio_config = AudioConfig(filename=file_path)\n\n # Use the synthesizer with the specified audio configuration\n synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)\n\n # Synthesize the voice name to a file.\n result = synthesizer.speak_text_async(example_text).get()\n\n # Check the result and break the loop if successful.\n if result.reason == ResultReason.SynthesizingAudioCompleted:\n print(f\"Speech synthesized for voice {voice_name} and saved to {file_path}\")\n return\n elif result.reason == ResultReason.Canceled:\n cancellation_details = result.cancellation_details\n print(f\"Speech synthesis canceled: {cancellation_details.reason}\")\n if cancellation_details.reason == CancellationReason.Error:\n if cancellation_details.error_details:\n print(f\"Error details: {cancellation_details.error_details}\")\n raise Exception(cancellation_details.error_details)\n except Exception as e:\n print(f\"An error occurred: {e}. Retrying in {retry_delay} seconds.\")\n time.sleep(retry_delay)\n \n\n print(f\"Failed to synthesize voice {voice_name} after {max_retries} attempts.\")\n",
"name_column": 4
}
A snippet from the file content includes function definitions and comments in both English and Chinese. The original content has been corrupted with runs of ���� characters, which are indicative of encoding issues.
Solution Discussed:
To address this issue, we could use charset_normalizer when reading the file in subsequent logic. This approach involves using charset_normalizer to re-read the file content, detect the correct encoding, and decode the content properly.
Proposed Changes to Workflow:
Integrate charset_normalizer into the file-reading step of the workflow to handle files with mixed or uncertain encodings.
Use charset_normalizer to ensure content is correctly decoded before processing.
Standardize on a single encoding (utf-8 recommended) to prevent similar issues in the future.
Additional Context:
This solution aims to normalize the file content during the read operation without changing the initial file-saving behavior. By processing the encoding on read, we can handle files from various sources and encoding states more robustly.