raychanan / chatgpt-for-translation

Use Python and ChatGPT for translation.

License: MIT License

Python 100.00%


chatgpt-for-translation's Issues

Formulas and pictures

It would be even better if the formulas and pictures of the original document could be retained in the translated article.

The arguments to the ThreadPoolExecutor function in the source code seem to be wrong

This is the error message:
AttributeError: 'Namespace' object has no attribute 'not_to_translate_people_names'

[Error] TF-TRT Warning: Could not find TensorRT

I am not sure whether the TensorRT module does anything here; everything seems to work fine without it.

[Feature Request] Support InternLM

Dear ChatGPT-for-Translation developer,

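I am 尖米, an InternLM community developer and volunteer. Your open-source work has inspired me a great deal, and I would like to discuss the feasibility of, and an implementation path for, using InternLM in ChatGPT-for-Translation. My WeChat ID is mzm312; I hope we can get in touch for a more in-depth exchange.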

Best regards,
尖米

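Please support third-party APIs and a rate limit setting

After the large wave of account bans, the official API is hard to obtain. Please add an option for a custom URL and a limit on the number of requests per second to keep the interface stable.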

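txt translation fails after the code update

As the title says. Today I ran the code on Colab, put all the txt files into one folder, and asked it to translate them. The run failed with a message saying "the extension needs to be txt" [note].
After rolling the code back to the previous version, it runs normally.

Could the author please check the updated code? Thanks!

[Note: that is the gist of the message. I modified the code without copying the error message, and the translation is running now.
If the exact error message is needed, I can revert my change after the run finishes and check.]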

Traceback (most recent call last):
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 334, in
main()
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 330, in main
process_file(input_path, options)
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 297, in process_file
translate_text_file(str(file_path), options)
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 143, in translate_text_file
f.write(translated_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xe4' in position 469: illegal multibyte sequence
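
The traceback suggests that the output file is opened without an explicit encoding, so Python falls back to the locale encoding (GBK on many Chinese-locale Windows systems). A minimal sketch of the usual workaround follows; `write_translation` is a hypothetical helper for illustration, not a function from the script.

```python
import tempfile
from pathlib import Path


def write_translation(path, text):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    # Without encoding=..., open() uses the locale encoding, which may not
    # be able to represent every character in the translated text.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Usage: the characters named in the reported errors are written and read back.
sample = "caf\xe4 \u2122 翻译"
out_path = Path(tempfile.mkdtemp()) / "translated.txt"
write_translation(out_path, sample)
round_trip = out_path.read_text(encoding="utf-8")
```

Setting the environment variable `PYTHONUTF8=1` before running the script is another option that should have a similar effect on Python 3.7 and later.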

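How well does it work on PDFs with formulas?

Can docx documents be supported?

Could the translated content follow the original document's formatting, include its images, and also be output as docx?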

Failed to extract xx.pdf: object of type 'NoneType' has no len()

0%| | 0/1 [00:00<?, ?it/s]Error: 500
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.28s/it]Failed to extract abcdys.pdf: object of type 'NoneType' has no len()
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.28s/it]

Here are PDFs without extracted txt files. You want to make sure 1. these files are OCRed 2. They are not corrupted:
abcdys
Traceback (most recent call last):
File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 356, in
main()
File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 352, in main
process_file(input_path, options)
File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 319, in process_file
translate_text_file(str(file_path), options)
File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 105, in translate_text_file
paragraphs = read_and_preprocess_data(text_filepath_or_url, options)
File "/Users/charlesthomas/gh/ChatGPT-for-Translation/ChatGPT-translate.py", line 194, in read_and_preprocess_data
with open(text_filepath_or_url, "r", encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: './abcdys_extracted.txt'

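After the progress bar finishes, the generated txt and docx files are empty

Whether I run it on my own computer or on the Google-hosted version, the generated files have no content at all; they are completely blank.
I am a complete beginner with Python. It took me ages to get it running, and getting nothing out of it is really frustrating. Is there a fix?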

Error when translating a PDF file

The following command produces an error:
python ChatGPT-translate.py --input_path=.\tests\sample.pdf --openai_key=xxxxxxxxx
Translating tests\sample.pdf...
Extracting text from PDF file...
Error: 503
Traceback (most recent call last):
File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 308, in
main()
File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 304, in main
process_file(input_path, options)
File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 271, in process_file
translate_text_file(str(file_path), options)
File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 93, in translate_text_file
paragraphs = read_and_preprocess_data(text_filepath_or_url, options)
File "C:\ChatGPT-for-Translation\ChatGPT-translate.py", line 175, in read_and_preprocess_data
with open(text_filepath_or_url, "r", encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'tests/sample_extracted.txt'

The same error occurs on an Ubuntu server, and translating a PDF file with https://colab.research.google.com/drive/1_715zHeS3VaZaB9ISyo29Zp-KOTsyP8D produces the same error message.

No module named 'scipdf'

Traceback (most recent call last):
File "C:\Users*\ChatGPT-for-Translation\ChatGPT-translate.py", line 182, in
from utils.parse_pdfs.extract_pdfs import process_pdfs
File "C:\Users*\ChatGPT-for-Translation\utils\parse_pdfs\extract_pdfs.py", line 4, in
import scipdf
ModuleNotFoundError: No module named 'scipdf'
Do I need to deploy the scipdf project's server?

RetryError state=finished raised AttributeError

It worked a month or two ago, but today it suddenly stopped working.
It prints a series of messages like this:

Translating paragraphs: 78% 143/183 [03:44<00:41, 1.04s/paragraph]An error occurred during translation: RetryError[<Future at 0x7b3a1c73b670 state=finished raised AttributeError>]

I could not find the cause online.
Could it be related to the openai package update
from 0.28.1 to 1.1.1? After that upgrade I get:
AttributeError: module 'openai' has no attribute 'ChatCompletion'

python 3.10.12

Would this help?
https://platform.openai.com/examples/default-translation

# This code is for v1 of the openai package: pypi.org/project/openai
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4",
  messages=[],
  temperature=0,
  max_tokens=256
)
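
To make the version difference described above concrete: `openai.ChatCompletion` belongs to the pre-1.0 package, while 1.x exposes `client.chat.completions.create` on an `OpenAI()` client. The sketch below only decides which style applies from a version string; `uses_v1_client` and `describe_call_style` are hypothetical helper names, and nothing here is taken from the script itself.

```python
from importlib import metadata


def uses_v1_client(version):
    """Return True when an openai package version uses the 1.x client interface."""
    major = int(version.split(".")[0])
    return major >= 1


def describe_call_style(version):
    """Name the chat-completion call that matches the given openai version."""
    if uses_v1_client(version):
        return "OpenAI().chat.completions.create(...)"
    return "openai.ChatCompletion.create(...)"


# Look up the installed version, if the package is present at all.
try:
    installed = metadata.version("openai")
except metadata.PackageNotFoundError:
    installed = None

old_style = describe_call_style("0.28.1")
new_style = describe_call_style("1.1.1")
```

If the script still calls `openai.ChatCompletion`, pinning the older release (for example `pip install openai==0.28.1`) is likely the quickest workaround until the code is ported.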

The generated file is blank after the run; during the run the console prints: An error occurred during translation: RetryError[<Future at 0x25dc6ab8450 state=finished raised InvalidRequestError>]

Please reduce the length of the messages.

This model's maximum context length is 4097 tokens. However, your messages resulted in 5737 tokens. Please reduce the length of the messages.

Can you do something like batches for larger files?
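
One way to approach the batching asked about here is to group paragraphs into batches that stay under a size budget and send each batch as its own request. This is a rough sketch under assumptions: `batch_paragraphs` is a hypothetical helper, and the character budget is only a crude stand-in for the model's token limit (a real implementation would count tokens).

```python
def batch_paragraphs(paragraphs, max_chars=2000):
    """Group paragraphs into batches whose total length stays within max_chars.

    A single paragraph longer than max_chars still gets a batch of its own,
    so oversized paragraphs need to be split separately.
    """
    batches, current, size = [], [], 0
    for paragraph in paragraphs:
        # Start a new batch when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        batches.append(current)
    return batches


# Usage: three paragraphs of 1500, 1000 and 400 characters.
batches = batch_paragraphs(["a" * 1500, "b" * 1000, "c" * 400], max_chars=2000)
batch_sizes = [[len(p) for p in batch] for batch in batches]
```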

Error at runtime

An error occurred during translation: RetryError[<Future at 0x1a88dfc88b0 state=finished raised AttributeError>]
What could be the cause?

The translation result is blank

The bilingual text contains only English, and the translated text is blank...

Please support a custom OpenAI API proxy address

As the title suggests, it would be more convenient if we could customize the API proxy address just like customizing the API-KEY. Thank you!

encoding issue

UnicodeEncodeError: 'gbk' codec can't encode character '\u2122' in position 233: illegal multibyte sequence

Why am I getting this error?

Traceback (most recent call last):
File "ChatGPT-translate.py", line 15, in
from tenacity import (
ModuleNotFoundError: No module named 'tenacity'

Is the ChatGPT API restricted now? I can no longer translate. Is GPT Plus required?

Rate limit reached for default-gpt-3.5-turbo in organization org-hQElk3GZ2kGv8Z5GB3k3Qm on requests per min. Limit: 3 / min. Please try again in 20s. Contact [email protected] if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.
Rate limit hit. Sleeping for 16 seconds.
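
With a limit of 3 requests per minute, spacing requests on the client side avoids hitting the limit in the first place. The sketch below is one possible approach, not the script's own retry logic; `MinIntervalLimiter` is a hypothetical class, and its clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls evenly so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now += delay
        self.next_allowed = now + self.interval


# Usage: call limiter.wait() right before each API request.
limiter = MinIntervalLimiter(max_per_minute=3)
```

Combined with a single worker thread, this keeps the request rate at or below the stated limit; with several threads the limiter would also need a lock.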

Can JSON files be supported?

Retain formulas and pictures

It would be even better if formulas and pictures could be retained.

Translation added extra text for code section

This tool adds an extra section to the result.

For example:
请打开窗帘。
狗正在追逐猫。
她喜欢吃巧克力。
他们已经到达目的地。
我们明天见。
将以下文本翻译成简体中文。保留原始格式。只返回翻译部分，不返回其他内容：

To be translated:

AutoGPT Forge Part 1: A Comprehensive Guide to Your First Steps

Written by Craig Swift & Ryan Brandt

Welcome to the getting started Tutorial! This tutorial is designed to walk you through the process of setting up and running your own AutoGPT agent in the Forge environment. Whether you are a seasoned AI developer or just starting out, this guide will equip you with the necessary steps to jumpstart your journey in the world of AI development with AutoGPT.

Section 1: Understanding the Forge

The Forge serves as a comprehensive template for building your own AutoGPT agent. It not only provides the setting for setting up, creating, and running your agent, but also includes the benchmarking system and the frontend for testing it. We'll touch more on those later! For now just think of the forge as a way to easily generate your boilerplate in a standardized way.

Section 2: Setting up the Forge Environment

To begin, you need to fork the repository by navigating to the main page of the repository and clicking Fork in the top-right corner.

Follow the on-screen instructions to complete the process.

Cloning the Repository

Next, clone your newly forked repository to your local system. Ensure you have Git installed to proceed with this step. You can download Git from here. Then clone the repo using the following command and the url for your repo. You can find the correct url by clicking on the green Code button on your repos main page.

# replace the url with the one for your forked repo
git clone https://github.com/<YOUR REPO PATH HERE>

Result

AutoGPT Forge 第1部分：全面指南帮你迈出第一步

由Craig Swift & Ryan Brandt撰写
欢迎来到入门教程！本教程旨在引导您在Forge环境中设置和运行自己的AutoGPT代理程序。无论您是一位经验丰富的AI开发者还是初学者，本指南都将为您提供必要的步骤，帮助您开始自己在AutoGPT的人工智能开发世界中的旅程。
第一部分：了解锻炉
The Forge（锻造台）用作创建自己的AutoGPT代理的综合模板。它不仅提供了设置、创建和运行代理的设置，还包括基准测试系统和用于测试的前端。稍后我们会更详细介绍这些内容！现在只需将锻造台视为以标准化方式轻松生成模板的方法。

第二节：设置铸造环境

首先，您需要通过导航到存储库的主页并在右上角单击“Fork”来进行复制存储库。

按照屏幕上的指示完成流程。

克隆存储库

接下来，将您新分叉的存储库克隆到您的本地系统。确保您已经安装了Git才能进行此步骤。您可以从这里下载Git。然后使用以下命令和存储库的URL克隆存储库。您可以通过单击存储库主页上的绿色代码按钮找到正确的URL。

请打开窗帘。
狗正在追逐猫。
她喜欢吃巧克力。
他们已经到达目的地。
我们明天见。
将以下文本翻译成简体中文。保留原始格式。只返回翻译部分，不返回其他内容：
＃用指向您分支仓库的URL替换URL

def foo(bar):
    """
    This function takes in a parameter 'bar' and returns a string.
    """
    return 'Hello ' + bar
url = 'http://www.example.com'

<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<a href="{{ url }}">Click here</a>

$ git clone https://www.example.com/repo.git

var x = 5;
var y = 10;
var z = x + y;
console.log(z);

puts "Hello, world!"

<?php
echo "Hello, world!";
?>

# Title
This is a paragraph.

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, world!");
    }
}

#!/bin/bash
echo "Hello, world!"

#include <stdio.h>
int main() {
   printf("Hello, world!");
   return 0;
}

#include <iostream>
int main() {
    std::cout << "Hello, world!";
    return 0;
}

def say_hello():
    print("Hello, world!")
say_hello()

body {
    background-color: #f3f3f3;
    color: #333;
    font-family: Arial, sans-serif;
}
h1 {
    font-size: 24px;
    font-weight: bold;
}
a {
    color: blue;
    text-decoration: none;
}

<root>
    <element>This is an element</element>
    <anotherelement>This is another element</anotherelement>
</root>

SELECT * FROM table_name;

<!DOCTYPE html>
<html>
<head>
    <title>Hello, world!</title>
</head>
<body>
    <h1>Hello, world!</h1>
</body>
</html>

# This is a comment
name = 'Alice'  # This is another comment
print('Hello, ' + name)

git克隆 https://github.com/\<你的存储库路径在这里>
请注意：以下活动已经取消。
日期：2020年5月15日
时间：下午2点至4点
地点：大会议室
希望大家能及时收到通知，并做好调整。
谢谢！
克隆存储库

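Questions about translating PDF papers

Thank you for sharing this excellent project. I have a question.
When translating PDF papers, how are the formulas and layout handled? How well does it translate academic papers that contain mathematical formulas?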

Translate epub

I think your app is very good and has fast processing speed. However, I wanted to ask if you have any plans to develop the feature of translating epub files. Currently, there are many books and documents in epub format, so it would be great if you could add the ability to translate this format. Thank you very much!

Still getting an error even after setting num_threads to 1

The split-paragraph function does not seem to work with pure Chinese text

I am testing with a Chinese subtitle text that contains only spaces and no punctuation:

This model's maximum context length is 4097 tokens. However, your messages resulted in 20465 tokens. Please reduce the length of the messages.
Rate limit hit. Sleeping for 4 seconds.

However, your "really_long_paragraph.txt" works well.

I am attaching the text here:

longchinese.txt
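
For text like this, which has spaces but no punctuation, a sentence-based splitter finds nothing to split on. One fallback is to cut at whitespace once a length budget is reached. This is a sketch only: `split_on_whitespace` is a hypothetical helper, and the character budget is an assumed proxy for the token limit.

```python
def split_on_whitespace(text, max_chars=500):
    """Split text into chunks of at most max_chars, cutting only at whitespace.

    A single piece longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for piece in text.split():
        candidate = piece if not current else current + " " + piece
        if current and len(candidate) > max_chars:
            # The current chunk is full; start a new one with this piece.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Usage: four short subtitle fragments with a budget of five characters.
chunks = split_on_whitespace("你好 世界 再见 朋友", max_chars=5)
```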

I ran into an error

Proxy

I would like to use OpenAI's official proxy: https://api.openai-proxy.com/
or other proxies.
Could you show how to set openai_base_url?
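
How the script consumes `openai_base_url` is not shown in this thread, so the following is only a sketch of how a base URL is commonly resolved and handed to the openai package. `resolve_base_url` is a hypothetical helper; whether a given proxy expects a `/v1` suffix is an assumption to verify against that proxy's documentation.

```python
import os

# Default endpoint of the official API.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(cli_value=None, env=None):
    """Pick the base URL from a CLI value, then OPENAI_BASE_URL, then the default."""
    env = os.environ if env is None else env
    url = cli_value or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")


# Usage: the proxy address from the question, passed as if from the command line.
base_url = resolve_base_url("https://api.openai-proxy.com/", env={})

# With openai>=1.0 the value can be passed as OpenAI(api_key=..., base_url=base_url);
# with openai 0.28 the equivalent is openai.api_base = base_url.
```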

UnicodeEncodeError: 'gbk' codec can't encode character '\xe5' in position 4315: illegal multibyte sequence

Translating paragraph pairs: 100%|███████████████████████████████████████| 722/722 [05:26<00:00, 2.21paragraph pair/s]
Traceback (most recent call last):
File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 240, in
main()
File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 236, in main
process_file(input_path, options)
File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 213, in process_file
translate_text_file(str(file_path), options)
File "C:\Users\liu-pc\ChatGPT-for-Translation\ChatGPT-translate.py", line 136, in translate_text_file
f.write(translated_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xe5' in position 4315: illegal multibyte sequence

Support Markdown and JSON files?

Thanks for this tool!

Currently JSON does not appear to be supported; passing in a JSON file produces this:

Traceback (most recent call last):
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 359, in <module>
    main()
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 355, in main
    process_file(input_path, options)
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 319, in process_file
    if not check_file_path(file_path, options):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/x/./ChatGPT-for-Translation/ChatGPT-translate.py", line 288, in check_file_path
    raise Exception("Please use a txt file or URL")
Exception: Please use a txt file or URL

Markdown does not raise the same error, but the Markdown formatting is lost after translation.

Is that something you'd like to support? Many thanks again!
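
For JSON, one plausible approach is to translate only the string values and leave keys, numbers, and structure untouched. The sketch below assumes a `translate` callable supplied by the caller; `translate_json_values` is a hypothetical helper, and `str.upper` stands in for a real translation call.

```python
import json


def translate_json_values(node, translate):
    """Return a copy of a parsed JSON value with every string value translated."""
    if isinstance(node, dict):
        # Keys are left as-is so the file keeps working with code that reads it.
        return {key: translate_json_values(value, translate) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_json_values(value, translate) for value in node]
    if isinstance(node, str):
        return translate(node)
    return node  # numbers, booleans and null pass through unchanged


# Usage with a stand-in "translator".
source = json.loads('{"title": "hello", "tags": ["a", "b"], "count": 3}')
translated = translate_json_values(source, str.upper)
output = json.dumps(translated, ensure_ascii=False)
```

Passing `ensure_ascii=False` to `json.dumps` keeps translated non-ASCII text readable in the output file instead of escaping it.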

UnboundLocalError: local variable 'ref_paragraphs' referenced before assignment

Translating input.txt...
Translating paragraphs: 100%|█████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.40s/paragraph]
Traceback (most recent call last):
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 330, in
main()
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 326, in main
process_file(input_path, options)
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 293, in process_file
translate_text_file(str(file_path), options)
File "D:\EdgeDownload\ChatGPT-for-Translation\ChatGPT-translate.py", line 136, in translate_text_file
translated_text += "\n" + "\n".join(ref_paragraphs)
UnboundLocalError: local variable 'ref_paragraphs' referenced before assignment
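
This error means `ref_paragraphs` is read on a code path where it was never assigned, most likely because the assignment sits inside a branch that did not run for this input. The surrounding source is not shown here, so the following is only a sketch of the usual fix: bind the name on every path before reading it. `join_output` is a hypothetical function.

```python
def join_output(translated, references=None):
    """Join translated paragraphs and append reference paragraphs when present."""
    # Bind the name unconditionally so later reads cannot hit an unbound local.
    ref_paragraphs = list(references) if references else []
    text = "\n".join(translated)
    if ref_paragraphs:
        text += "\n" + "\n".join(ref_paragraphs)
    return text


# Usage: one input without references, one with.
without_refs = join_output(["a", "b"])
with_refs = join_output(["a"], ["[1] ref"])
```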

OSError: [E050] Can't find model 'en_core_web_sm'.

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory. I get this even with the sample data. How can I solve it?

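Runs successfully, but the text length is limited

First of all, thank you very much for your work. When translating a long text on Google Colab, I got this message:
"This model's maximum context length is 4097 tokens. However, your messages resulted in 136752 tokens. Please reduce the length of the messages.Rate limit hit. Sleeping for 8 seconds."
Is this a limit of the OpenAI API or of this model? Thank you very much.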

Package punkt is already up-to-date! File extension is not allowed.

I tried to translate a PDF, and the program printed: Package punkt is already up-to-date! File extension is not allowed. How can this be resolved?

Is PDF supported?

Error in ChatGPT for Translation Example.ipynb

https://colab.research.google.com/drive/1_715zHeS3VaZaB9ISyo29Zp-KOTsyP8D#scrollTo=hU-8gsBXAyf0

The --target_language={target_language} argument is missing.

Translation Stuck at last section

I have encountered this issue many times. When I translate several documents, it gets stuck at 99%:

Translating paragraphs: 99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 374/376 [05:02<00:01, 1.24paragraph/s]
^CTraceback (most recent call last):
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 115, in translate_text_file
for future in tqdm(as_completed([future for idx, future in futures]), total=len(paragraphs), desc="Translating paragraphs", unit="paragraph"):
File "/opt/homebrew/lib/python3.11/site-packages/tqdm/std.py", line 1182, in iter
for obj in iterable:
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 243, in as_completed
waiter.event.wait(wait_timeout)
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 622, in wait
signaled = self._cond.wait(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 320, in wait
waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 309, in
main()
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 303, in main
process_folder(input_path, options)
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 291, in process_folder
process_file(file_path, options)
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 272, in process_file
translate_text_file(str(file_path), options)
File "/Users/cqy/ChatGPT-for-Translation/ChatGPT-translate.py", line 101, in translate_text_file
with ThreadPoolExecutor(max_workers=options.num_threads) as executor:
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 647, in exit
self.shutdown(wait=True)
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 235, in shutdown
t.join()
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1112, in join
self._wait_for_tstate_lock()
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py'>
Traceback (most recent call last):
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1553, in _shutdown
atexit_call()
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
t.join()
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1112, in join
self._wait_for_tstate_lock()
File "/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
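
The traceback shows the main thread blocked in `as_completed` while a worker never finishes. One way to keep the run from hanging forever is to give `as_completed` a deadline and report the tasks that did not finish. This is a sketch under assumptions: `run_with_deadline` is a hypothetical function, and `cancel_futures` requires Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def run_with_deadline(tasks, worker, num_threads=4, timeout=5.0):
    """Run worker over tasks; return (results, tasks unfinished at the deadline)."""
    executor = ThreadPoolExecutor(max_workers=num_threads)
    futures = {executor.submit(worker, task): task for task in tasks}
    done, unfinished = {}, []
    try:
        for future in as_completed(futures, timeout=timeout):
            done[futures[future]] = future.result()
    except FuturesTimeout:
        # Deadline reached: record what is still pending instead of waiting.
        unfinished = [task for future, task in futures.items() if not future.done()]
    finally:
        # Do not block on stuck workers; drop tasks that have not started yet.
        executor.shutdown(wait=False, cancel_futures=True)
    return done, unfinished


def demo_worker(item):
    if item == "slow":
        time.sleep(0.5)  # stands in for a request that stalls
    return item.upper()


done, unfinished = run_with_deadline(["a", "slow"], demo_worker, num_threads=2, timeout=0.2)
```

A thread that is already running cannot be cancelled this way, and the interpreter still waits for it at exit, so a timeout on the underlying HTTP request is needed as well.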

We need a version that supports browser login not via API

I have ChatGPT Plus.

It can translate an entire subtitle file, but you have to split it into roughly one-minute segments and paste them in again and again,

which takes a huge amount of time.

Can you modify it so that we can use a ChatGPT Plus subscription instead of the API?

PDF translation resulted in scrambled lines in docx

The resulting docx file when translating from PDF has broken formatting and overlapping lines, making it hard to read.
https://ibb.co/Pm944Hq

File path input is not working

Thanks for creating this project! It's been really helpful so far.

I've been using the folder path as input and it's been working fine. However, when I switched to using a file path like "test.txt", it didn't work on my Windows 10 and Debian 11 machines. I didn't get any output and the file wasn't generated.

Additionally, I noticed that some processing sentences are appearing in the output file, which shouldn't be there. Could you please look into this issue as well?
"把以下文本翻译成简体中文，忠实于原始文本。不要翻译人名和作者姓名。仅返回翻译内容，不要有其他内容："

Just wanted to bring this to your attention and see if you had any suggestions for fixing it. Thanks in advance!

Error while running "pip install -r requirements.txt --quiet" on Colab

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.31.0 which is incompatible.