raychanan / chatgpt-for-translation
Use Python and ChatGPT for translation.
License: MIT License
Is GPT-3.5 supported? Also, could a translation table be added to control how specific words are translated?
It would be even better if the formulas and pictures of the original document could be retained in the translated article.
This is the error message:
AttributeError: 'Namespace' object has no attribute 'not_to_translate_people_names'
I am not sure whether the TensorRT module does anything; the whole program seems to work fine without it.
Dear ChatGPT-for-Translation developer,
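I am Jianmi (尖米), a developer and volunteer in the InternLM community. Your open-source work has inspired me a great deal, and I would like to discuss the feasibility of, and an implementation path for, using InternLM with ChatGPT-for-Translation. My WeChat ID is mzm312; I hope we can get in touch for a more in-depth discussion.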
Best regards,
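Jianmi (尖米)
After the large wave of account bans, the official API has become hard to obtain. Please add an option for a custom URL and a limit on the number of requests per second to keep the interface stable.
As the title says. Today I ran the code on Colab: I put all the txt files into one folder and asked it to translate them. The run failed with a message saying "the extension needs to be txt" [note].
After rolling the code back to the previous version, it ran normally.
Could the author please check the updated code? Thanks!
[Note: that is the gist of the message. I modified the code without copying the error message, and the translation is running now.
If the exact error message is needed, I can change the code back after the run finishes and take a look.]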
Traceback (most recent call last):
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 334, in <module>
    main()
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 330, in main
    process_file(input_path, options)
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 297, in process_file
    translate_text_file(str(file_path), options)
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 143, in translate_text_file
    f.write(translated_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xe4' in position 469: illegal multibyte sequence
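The `'gbk' codec can't encode` error typically appears on Chinese-locale Windows, where `open()` without an `encoding` argument falls back to the locale's code page. Below is a minimal sketch of one possible fix, assuming the output file is opened with a plain `open(path, "w")`. The helper name `write_translation` and the output file name are hypothetical and not taken from the project.

```python
import os
import tempfile


def write_translation(path: str, translated_text: str) -> None:
    """Write text as UTF-8 regardless of the platform's default code page."""
    # Passing encoding explicitly avoids the locale default (for example gbk
    # on Chinese-locale Windows), which cannot represent every character.
    with open(path, "w", encoding="utf-8") as f:
        f.write(translated_text)


# Characters taken from the reported errors: '\xe4' and '\u2122'.
sample = "Sch\xe4fer\u2122 translated"
out_path = os.path.join(tempfile.gettempdir(), "translated_sample.txt")
write_translation(out_path, sample)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Reading the file later needs the same `encoding="utf-8"` argument, otherwise the mismatch simply moves to the reader.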
Could the translated output follow the format of the original document, include its images, and also be exported as docx?
0%| | 0/1 [00:00<?, ?it/s]Error: 500
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.28s/it]Failed to extract abcdys.pdf: object of type 'NoneType' has no len()
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.28s/it]
Here are PDFs without extracted txt files. You want to make sure 1. these files are OCRed 2. They are not corrupted:
abcdys
Traceback (most recent call last):
  File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 356, in <module>
    main()
  File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 352, in main
    process_file(input_path, options)
  File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 319, in process_file
    translate_text_file(str(file_path), options)
  File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 105, in translate_text_file
    paragraphs = read_and_preprocess_data(text_filepath_or_url, options)
  File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 194, in read_and_preprocess_data
    with open(text_filepath_or_url, "r", encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: './abcdys_extracted.txt'
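Whether I run it on my own computer or on that Google service, the generated file has no content at all; it is completely blank.
I am a complete beginner with Python. It took me a long time to get it running, and it is really frustrating that there is still no output. Is there a solution?
An error occurs when running the following command: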
python ChatGPT-translate.py --input_path=.\tests\sample.pdf --openai_key=xxxxxxxxx
Translating tests\sample.pdf...
Extracting text from PDF file...
Error: 503
Traceback (most recent call last):
  File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 308, in <module>
    main()
  File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 304, in main
    process_file(input_path, options)
  File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 271, in process_file
    translate_text_file(str(file_path), options)
  File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 93, in translate_text_file
    paragraphs = read_and_preprocess_data(text_filepath_or_url, options)
  File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 175, in read_and_preprocess_data
    with open(text_filepath_or_url, "r", encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'tests/sample_extracted.txt'
The same error occurs on an Ubuntu server. When translating a PDF file with https://colab.research.google.com/drive/1_715zHeS3VaZaB9ISyo29Zp-KOTsyP8D, the same error message appears.
Traceback (most recent call last):
  File "C:\Users*\ChatGPT-for-Translation\ChatGPT-translate.py", line 182, in <module>
    from utils.parse_pdfs.extract_pdfs import process_pdfs
  File "C:\Users*\ChatGPT-for-Translation\utils\parse_pdfs\extract_pdfs.py", line 4, in <module>
    import scipdf
ModuleNotFoundError: No module named 'scipdf'
Do I need to deploy a server for the scipdf project?
It still worked a month or two ago, but today it suddenly stopped working.
It prints a series of messages like this:
Translating paragraphs: 78% 143/183 [03:44<00:41, 1.04s/paragraph]An error occurred during translation: RetryError[<Future at 0x7b3a1c73b670 state=finished raised AttributeError>]
I could not find the cause online.
Could it be related to the update of the openai package?
After upgrading from 0.28.1 to 1.1.1, I get:
AttributeError: module 'openai' has no attribute 'ChatCompletion'
Python 3.10.12
Would this help?
https://platform.openai.com/examples/default-translation
# This code is for v1 of the openai package: pypi.org/project/openai
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[],
    temperature=0,
    max_tokens=256
)
This model's maximum context length is 4097 tokens. However, your messages resulted in 5737 tokens. Please reduce the length of the messages.
Can you do something like batches for larger files?
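One way to stay under a context limit like the one reported here is to split the input into batches before sending each one to the API. The sketch below assumes paragraphs are separated by blank lines and uses a character budget as a crude stand-in for tokens. The real limit is measured in tokens, so `max_chars` is a hypothetical, conservative knob, and `split_into_batches` is not a function from the project.

```python
from typing import List


def split_into_batches(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into batches of at most max_chars characters each.

    A single paragraph longer than max_chars is cut into fixed-size pieces.
    """
    batches: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split oversized paragraphs (e.g. subtitles without punctuation).
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The piece does not fit: close the batch and start a new one.
                batches.append(current)
                current = piece
    if current:
        batches.append(current)
    return batches


demo = "a" * 30 + "\n\n" + "b" * 30 + "\n\n" + "c" * 120
demo_batches = split_into_batches(demo, max_chars=50)
```

Each batch can then be translated on its own and the results joined in order. Cutting a long paragraph at a fixed offset may split a sentence, so a sentence-aware splitter would give better translations where punctuation is available.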
An error occurred during translation: RetryError[<Future at 0x1a88dfc88b0 state=finished raised AttributeError>]
What could be the cause?
The bilingual text contains only English, and the translated text is blank.
As the title says, it would be more convenient if the API proxy address could be customized in the same way as the API key. Thank you!
UnicodeEncodeError: 'gbk' codec can't encode character '\u2122' in position 233: illegal multibyte sequence
Traceback (most recent call last):
  File "ChatGPT-translate.py", line 15, in <module>
    from tenacity import (
ModuleNotFoundError: No module named 'tenacity'
Rate limit reached for default-gpt-3.5-turbo in organization org-hQElk3GZ2kGv8Z5GB3k3Qm on requests per min. Limit: 3 / min. Please try again in 20s. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.
Rate limit hit. Sleeping for 16 seconds.
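The "Sleeping for 16 seconds" line suggests the tool already waits and retries after a rate-limit response. For readers who want to see the general idea, here is a generic exponential-backoff sketch. It is not the project's implementation, and `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for whatever exception the API client raises on a rate limit."""


def call_with_backoff(func: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, doubling the wait after each rate-limit failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2
    # Only reached if max_attempts is smaller than 1.
    raise ValueError("max_attempts must be at least 1")


# Demo: a fake request that fails twice, then succeeds.
attempts = {"count": 0}
waits: List[float] = []


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("rate limit reached")
    return "ok"


# Record the waits instead of actually sleeping, so the demo runs instantly.
result = call_with_backoff(flaky_request, sleep=waits.append)
```

With a limit as low as 3 requests per minute, reducing the number of parallel threads is likely to help more than any retry policy.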
It would be even better if formulas and images could be preserved.
This tool adds an extra section to the result, like this:
请打开窗帘。
狗正在追逐猫。
她喜欢吃巧克力。
他们已经到达目的地。
我们明天见。
将以下文本翻译成简体中文。保留原始格式。只返回翻译部分,不返回其他内容:
To be translated:
Written by Craig Swift & Ryan Brandt
Welcome to the getting started Tutorial! This tutorial is designed to walk you through the process of setting up and running your own AutoGPT agent in the Forge environment. Whether you are a seasoned AI developer or just starting out, this guide will equip you with the necessary steps to jumpstart your journey in the world of AI development with AutoGPT.
The Forge serves as a comprehensive template for building your own AutoGPT agent. It not only provides the setting for setting up, creating, and running your agent, but also includes the benchmarking system and the frontend for testing it. We'll touch more on those later! For now just think of the forge as a way to easily generate your boilerplate in a standardized way.
To begin, you need to fork the repository by navigating to the main page of the repository and clicking Fork in the top-right corner.
Follow the on-screen instructions to complete the process.
Next, clone your newly forked repository to your local system. Ensure you have Git installed to proceed with this step. You can download Git from here. Then clone the repo using the following command and the url for your repo. You can find the correct url by clicking on the green Code button on your repos main page.
# replace the url with the one for your forked repo
git clone https://github.com/<YOUR REPO PATH HERE>
Result
由Craig Swift & Ryan Brandt撰写
欢迎来到入门教程!本教程旨在引导您在Forge环境中设置和运行自己的AutoGPT代理程序。无论您是一位经验丰富的AI开发者还是初学者,本指南都将为您提供必要的步骤,帮助您开始自己在AutoGPT的人工智能开发世界中的旅程。
第一部分:了解锻炉
The Forge(锻造台)用作创建自己的AutoGPT代理的综合模板。它不仅提供了设置、创建和运行代理的设置,还包括基准测试系统和用于测试的前端。稍后我们会更详细介绍这些内容!现在只需将锻造台视为以标准化方式轻松生成模板的方法。
首先,您需要通过导航到存储库的主页并在右上角单击“Fork”来进行复制存储库。
按照屏幕上的指示完成流程。
接下来,将您新分叉的存储库克隆到您的本地系统。确保您已经安装了Git才能进行此步骤。您可以从这里下载Git。然后使用以下命令和存储库的URL克隆存储库。您可以通过单击存储库主页上的绿色代码按钮找到正确的URL。
请打开窗帘。
狗正在追逐猫。
她喜欢吃巧克力。
他们已经到达目的地。
我们明天见。
将以下文本翻译成简体中文。保留原始格式。只返回翻译部分,不返回其他内容:
#用指向您分支仓库的URL替换URL
def foo(bar):
"""
This function takes in a parameter 'bar' and returns a string.
"""
return 'Hello ' + bar
url = 'http://www.example.com'
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<a href="{{ url }}">Click here</a>
$ git clone https://www.example.com/repo.git
var x = 5;
var y = 10;
var z = x + y;
console.log(z);
puts "Hello, world!"
<?php
echo "Hello, world!";
?>
# Title
This is a paragraph.
public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello, world!");
}
}
#!/bin/bash
echo "Hello, world!"
#include <stdio.h>
int main() {
printf("Hello, world!");
return 0;
}
#include <iostream>
int main() {
std::cout << "Hello, world!";
return 0;
}
def say_hello():
print("Hello, world!")
say_hello()
body {
background-color: #f3f3f3;
color: #333;
font-family: Arial, sans-serif;
}
h1 {
font-size: 24px;
font-weight: bold;
}
a {
color: blue;
text-decoration: none;
}
<root>
<element>This is an element</element>
<anotherelement>This is another element</anotherelement>
</root>
SELECT * FROM table_name;
<!DOCTYPE html>
<html>
<head>
<title>Hello, world!</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
# This is a comment
name = 'Alice' # This is another comment
print('Hello, ' + name)
git克隆 https://github.com/\<你的存储库路径在这里>
请注意:以下活动已经取消。
日期:2020年5月15日
时间:下午2点至4点
地点:大会议室
希望大家能及时收到通知,并做好调整。
谢谢!
克隆存储库
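Thank you for sharing this excellent project. I have a question.
When translating PDF papers, how does the tool handle formulas and layout? How well does it translate academic papers that contain mathematical formulas?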
I think your app is very good and has fast processing speed. However, I wanted to ask if you have any plans to develop the feature of translating epub files. Currently, there are many books and documents in epub format, so it would be great if you could add the ability to translate this format. Thank you very much!
I am trying with a Chinese subtitle text. This text contains only blank spaces, without any punctuation:
This model's maximum context length is 4097 tokens. However, your messages resulted in 20465 tokens. Please reduce the length of the messages.
Rate limit hit. Sleeping for 4 seconds.
However, your "really_long_paragraph.txt" works well.
I am attaching the text here:
I would like to use openai's official proxy, https://api.openai-proxy.com/, or other proxies.
Could you show how to set openai_base_url?
Translating paragraph pairs: 100%|███████████████████████████████████████| 722/722 [05:26<00:00, 2.21paragraph pair/s]
Traceback (most recent call last):
  File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 240, in <module>
    main()
  File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 236, in main
    process_file(input_path, options)
  File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 213, in process_file
    translate_text_file(str(file_path), options)
  File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 136, in translate_text_file
    f.write(translated_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xe5' in position 4315: illegal multibyte sequence
Thanks for this tool!
Currently JSON does not appear to be supported; passing a JSON file as input produces this:
Traceback (most recent call last):
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 359, in <module>
    main()
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 355, in main
    process_file(input_path, options)
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 319, in process_file
    if not check_file_path(file_path, options):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 288, in check_file_path
    raise Exception("Please use a txt file or URL")
Exception: Please use a txt file or URL
Markdown input does not raise the same error, but the Markdown formatting is lost after translation.
Is that something you'd like to support? Many thanks again!
Translating input.txt...
Translating paragraphs: 100%|█████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.40s/paragraph]
Traceback (most recent call last):
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 330, in <module>
    main()
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 326, in main
    process_file(input_path, options)
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 293, in process_file
    translate_text_file(str(file_path), options)
  File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 136, in translate_text_file
    translated_text += "\n" + "\n".join(ref_paragraphs)
UnboundLocalError: local variable 'ref_paragraphs' referenced before assignment
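An `UnboundLocalError` like this one arises when a variable is assigned only inside a conditional branch but used unconditionally afterwards. The project's actual code is not shown in the report, so the following is only a guess at the pattern, with hypothetical function and parameter names. Initializing the variable before the branch is one common remedy.

```python
from typing import List


def assemble_output(translated: List[str], references: List[str],
                    keep_references: bool) -> str:
    """Join translated paragraphs and, optionally, reference paragraphs."""
    # Bind the name up front so it exists on every code path.
    ref_paragraphs: List[str] = []
    if keep_references:
        ref_paragraphs = references

    translated_text = "\n".join(translated)
    if ref_paragraphs:
        translated_text += "\n" + "\n".join(ref_paragraphs)
    return translated_text


without_refs = assemble_output(["Para 1", "Para 2"], ["[1] Ref"],
                               keep_references=False)
with_refs = assemble_output(["Para 1"], ["[1] Ref"], keep_references=True)
```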
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory. I get this error even with the sample data. How can I fix it?
First of all, thank you very much for your work. When translating a long text on Google Colab, I got this message:
"This model's maximum context length is 4097 tokens. However, your messages resulted in 136752 tokens. Please reduce the length of the messages.Rate limit hit. Sleeping for 8 seconds."
Is this a limit on the OpenAI API side or a limit of this model? Thank you very much.
I tried to translate a PDF, and the program printed "Package punkt is already up-to-date! File extension is not allowed." How can this be resolved?
https://colab.research.google.com/drive/1_715zHeS3VaZaB9ISyo29Zp-KOTsyP8D#scrollTo=hU-8gsBXAyf0
The --target_language={target_language} argument is missing.
I have encountered this issue many times. When I try to translate a few documents, it gets stuck at 99%.
Translating paragraphs: 99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 374/376 [05:02<00:01, 1.24paragraph/s]
^CTraceback (most recent call last):
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 115, in translate_text_file
    for future in tqdm(as_completed([future for idx, future in futures]), total=len(paragraphs), desc="Translating paragraphs", unit="paragraph"):
  File "/opt/homebrew/lib/python3.11/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 243, in as_completed
    waiter.event.wait(wait_timeout)
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 622, in wait
    signaled = self._cond.wait(timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 309, in <module>
    main()
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 303, in main
    process_folder(input_path, options)
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 291, in process_folder
    process_file(file_path, options)
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 272, in process_file
    translate_text_file(str(file_path), options)
  File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 101, in translate_text_file
    with ThreadPoolExecutor(max_workers=options.num_threads) as executor:
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 235, in shutdown
    t.join()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1553, in _shutdown
    atexit_call()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
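The traceback shows the main thread blocked in `as_completed`, waiting on futures that never finish. One possible mitigation, sketched below with the standard library only, is to bound the wait and skip work that does not complete in time. The names `run_with_deadline` and `translate_stub` and the timeout values are hypothetical, `cancel_futures` requires Python 3.9 or later, and whether this suits the project depends on why the last requests hang.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def run_with_deadline(func: Callable[[str], str], items: List[str],
                      num_threads: int, deadline_s: float) -> Dict[int, str]:
    """Run func over items in threads; return results finished by the deadline."""
    executor = ThreadPoolExecutor(max_workers=num_threads)
    futures = {executor.submit(func, item): idx
               for idx, item in enumerate(items)}
    done, not_done = wait(futures, timeout=deadline_s)
    # Do not block on stragglers: cancel queued work and return immediately.
    # Threads that are already running cannot be interrupted, only abandoned.
    executor.shutdown(wait=False, cancel_futures=True)
    # Keep only the futures that finished without raising.
    return {futures[f]: f.result() for f in done if f.exception() is None}


def translate_stub(paragraph: str) -> str:
    # Simulates one slow request among fast ones.
    time.sleep(1.0 if paragraph == "slow" else 0.01)
    return paragraph.upper()


results = run_with_deadline(translate_stub, ["a", "slow", "b"],
                            num_threads=3, deadline_s=0.3)
```

The indices missing from the returned dictionary identify the paragraphs that would need a retry. A per-request timeout in the HTTP client, where available, addresses the same problem closer to its source.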
I have ChatGPT Plus.
It can translate an entire subtitle file, but you have to split it into roughly one-minute segments and paste them in again and again,
which takes a huge amount of time.
Can you modify the tool so that it works with a ChatGPT Plus subscription instead of the API?
The result docx file when translating from pdf has messed up format and overlapping lines, making it hard to read.
https://ibb.co/Pm944Hq
Thanks for creating this project! It's been really helpful so far.
I've been using the folder path as input and it's been working fine. However, when I switched to using a file path like "test.txt", it didn't work on my Windows 10 and Debian 11 machines. I didn't get any output and the file wasn't generated.
Additionally, I noticed that some processing sentences are appearing in the output file, which shouldn't be there. Could you please look into this issue as well?
"把以下文本翻译成简体中文,忠实于原始文本。不要翻译人名和作者姓名。仅返回翻译内容,不要有其他内容:"
Just wanted to bring this to your attention and see if you had any suggestions for fixing it. Thanks in advance!
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.31.0 which is incompatible.