vivek3141 / ghostbuster
Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024)
Home Page: https://arxiv.org/abs/2305.15047
License: Other
Hello!
Thanks for sharing this project; I look forward to exploring its applications. The README has instructions for calling Ghostbuster on a text file. Could you provide instructions for importing Ghostbuster in a .py file, and say which function I could call to evaluate strings that I store in memory in Pandas?
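Until an importable entry point is documented, one workaround is to bridge in-memory strings to a file-based scorer. The sketch below is only an illustration: score_strings and dummy_score_file are hypothetical names, not part of the repository, and the scorer you pass in would have to wrap whatever file-based classification call you end up using.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def score_strings(texts: Iterable[str], score_file: Callable[[Path], float]) -> List[float]:
    """Score in-memory strings with a scorer that only accepts file paths.

    `score_file` is supplied by the caller (hypothetical here); each string is
    written to a temporary file and the file path is handed to the scorer.
    """
    scores = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, text in enumerate(texts):
            path = Path(tmp) / f"doc_{i}.txt"
            path.write_text(text, encoding="utf-8")
            scores.append(score_file(path))
    return scores


def dummy_score_file(path: Path) -> float:
    """Stand-in scorer for demonstration only: returns the word count."""
    return float(len(path.read_text(encoding="utf-8").split()))


scores = score_strings(["one two", "three four five"], dummy_score_file)
```

Because score_strings accepts any iterable of strings, a Pandas column can be passed directly, for example df["score"] = score_strings(df["text"], my_scorer), where my_scorer is your own wrapper.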
Since 2024-01-04, according to https://platform.openai.com/docs/deprecations, OpenAI has deprecated the use of ada and davinci. The replacements are now babbage-002 for ada and davinci-002 for davinci.
I've encountered an issue in the classify.py file where the length discrepancy between the trigram and ada tokens is causing errors. The specific error message is: ValueError: operands could not be broadcast together with shapes (116,) (115,). This misalignment is likely due to the recent changes in models.
Here is my input.txt for reference:
In conclusion, "12 Years a Slave" effectively analyzes the themes of collectivism and individualism to portray the institution of slavery as a worldwide issue. The film gracefully showcases the dehumanization of black people through systematic prejudices, generalizations, stereotyping, and discrimination. By highlighting the power of collectivism and individualism in resisting and transcending slavery, it serves as a call to action for a global recognition of the need to dismantle systemic oppression. The film's portrayal of this historical struggle encourages viewers to confront their own biases and advocate for a more inclusive and equitable world.
I've attempted to replace the tokenizer in utils/symbolic.py with cl100k_base, p50k_base, and r50k_base, but the issue persists.
I would appreciate your assistance in resolving this matter. Thank you for your time and consideration.
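The reported error is what NumPy raises when two one-dimensional arrays of different lengths are combined elementwise. The sketch below reproduces it with placeholder arrays of the reported shapes and adds a guard that names the mismatched vectors before any arithmetic happens. The variable names and the assert_same_length helper are assumptions for illustration; they are not taken from classify.py.

```python
import numpy as np


def assert_same_length(**vectors):
    """Raise a descriptive error if the named per-token vectors differ in length."""
    lengths = {name: len(values) for name, values in vectors.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"per-token vectors are misaligned: {lengths}")


# Placeholder data with the shapes from the reported error.
trigram_probs = np.full(116, 0.5)
ada_probs = np.full(115, 0.5)

# Elementwise arithmetic on mismatched shapes triggers the broadcast error.
try:
    trigram_probs + ada_probs
    broadcast_error = None
except ValueError as exc:
    broadcast_error = str(exc)

# The guard fails earlier and reports which vector has which length.
try:
    assert_same_length(trigram=trigram_probs, ada=ada_probs)
    guard_error = None
except ValueError as exc:
    guard_error = str(exc)
```

A check like this does not fix the misalignment, but it shows which source produced the extra token, which helps when comparing the local tokenizer against the token list returned for the replacement model.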
I am encountering a deprecated model error for text_ada and DaVinci 1.0. I attempted to upgrade to babbage and DaVinci 2.0, but it appears that this requires more extensive refactoring across different files. I need assistance in understanding all the changes that need to be made.
Steps to Reproduce
Use text_ada or DaVinci 1.0 models in the current codebase.
Attempt to upgrade to babbage and DaVinci 2.0.
Observe the deprecated model error and issues in refactoring.
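One way to limit the refactoring is to route every model name through a single lookup, so the replacement is made in one place. The sketch below uses the replacements named in the deprecation note quoted earlier (babbage-002 for ada, davinci-002 for davinci); MODEL_REPLACEMENTS and resolve_model are hypothetical names, not part of the codebase. Renaming alone may not be sufficient, since the token-length error reported above suggests the replacement models can return different tokenizations.

```python
# Replacements as stated in the deprecation note quoted in the issue above.
MODEL_REPLACEMENTS = {
    "ada": "babbage-002",
    "davinci": "davinci-002",
}


def resolve_model(name: str) -> str:
    """Return the replacement for a deprecated model name, or the name unchanged."""
    return MODEL_REPLACEMENTS.get(name, name)


resolved = [resolve_model(m) for m in ("ada", "davinci", "babbage-002")]
```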
Thanks for sharing this code with everyone! The documentation says that you can run it against a file using the --file option but that seems to be missing in run.py. There are also a couple other small errors that popped up trying to get this to run. If there is an openai.config file in the directory, it tries to load a hardcoded version of it from a specific home directory instead of the one in the local directory. It also tries to read a file called best_features.txt from the model directory which doesn't exist (I copied features.txt to this name and it seems to have fixed it).
Edit: I see it is actually just a documentation issue, the right program should be classify.py.
Hi,
First of all, thanks for this great research. I am really impressed by the results your team got. My main question is: do you have a license or usage guidelines for the models and data that you uploaded?
Please let us know the ids of human and ChatGPT texts that were in the test split, especially -- we'd love to use this data as a challenging baseline for other detectors!
Hello! Do you have any plans to open source the Ghostbuster web application? Thank you.
I want to perform inference using only the n-grams. Is there a flag for that, and if so, can anyone help with it?