GPTCloneBench

GPTCloneBench is a clone detection benchmark based on SemanticCloneBench [1] and GPT [2,3,4,5]. This work was accepted at the ICSME 2023 conference. We also published another study that follows a similar methodology, "Unveiling the potential of large language models in generating semantic and cross-language clones", at IWSC 2023.

The semantic clones (stand-alone and system-injected clones) are available here: https://doi.org/10.5281/zenodo.10198952

For cross-language clones, see these two files in this Git repository:

  • cross_language.zip
  • cross_language_part_2.zip

Cross-language clones are provided as stand-alone clones. They are not injected into a system.
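As a minimal sketch, the two archives could be unpacked with Python's standard zipfile module. The archive names come from the list above; the destination folder name and the helper function are hypothetical and only illustrate one way to keep the two parts separate.

```python
import zipfile
from pathlib import Path

# Archive names as listed above.
CROSS_LANGUAGE_ARCHIVES = ["cross_language.zip", "cross_language_part_2.zip"]


def extract_archives(archives, destination="cross_language_clones"):
    """Extract each archive into its own subfolder of `destination`.

    `destination` is an assumed folder name, not one defined by the repository.
    """
    destination = Path(destination)
    extracted = []
    for archive in archives:
        archive = Path(archive)
        # Use the archive name without ".zip" as the subfolder name.
        target = destination / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling extract_archives(CROSS_LANGUAGE_ARCHIVES) from the repository root would place each archive's contents in a separate subfolder.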

System requirements

To install necessary libraries, please run the following command:

pip install -r requirements.txt

To run NiCad on the generated clones, you need to install TXL and NiCad.

To generate clones, you need to have SemanticCloneBench [1]. Follow this link to download SemanticCloneBench: https://drive.google.com/open?id=1KicfslV02p6GDPPBjZHNlmiXk-9IoGWl

To manually validate the GPT clones, we used the ValidateClones tool by Jeffrey Svajlenko: https://github.com/jeffsvajlenko/ValidateClones

You need an OpenAI API key to run the system. This link provides details on how to obtain one: https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt

Please follow the link to generate your own secret API key.
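One common way to keep a secret key out of source files is to read it from an environment variable. The sketch below assumes the variable is named OPENAI_API_KEY; how the scripts in this repository actually expect to receive the key is not stated here, so treat the helper as hypothetical.

```python
import os


def load_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your secret API key."
        )
    return key
```

Failing early with a clear message avoids starting a long generation run that would only fail at the first API call.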

Generate and validate GPTCloneBench

To generate semantic clones, follow these steps:

  1. Clone this repository.
  2. Copy SemanticCloneBench into this folder.
  3. Run python create_clones_for_gptclonebench.py and follow the prompts to generate clones.
  4. Run python file_creation_for_validateClones.py to create the input file for manual validation.
  5. Run python crossL_file_creation_for_validateClones.py to create the input file for manual validation of cross-language clones.
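Steps 3 to 5 above could be chained in a small driver script, sketched below. The script names are taken from the steps; the helper functions are hypothetical, and the sketch assumes it is run from the repository root with SemanticCloneBench already copied in. The first script is interactive, so its prompts still appear in the terminal.

```python
import subprocess
import sys

# Script names from steps 3-5, in the order they should run.
PIPELINE_SCRIPTS = [
    "create_clones_for_gptclonebench.py",
    "file_creation_for_validateClones.py",
    "crossL_file_creation_for_validateClones.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline script."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline():
    """Run each script in turn, stopping at the first one that fails."""
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling run_pipeline() runs the three scripts in sequence with the same Python interpreter that runs the driver.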

Benchmark Validators (Undergraduate Interns):

  1. Chi Phuong Vu

    GitHub ID: 115325256, Email: [email protected]

  2. Olaoluwa Dayo-Olaide

    Email: [email protected]

  3. Souvik Ukil

    Email: [email protected]

  4. Aryan Mehta

    GitHub ID: 90737338, Email: [email protected] or [email protected]

  5. Dipika Ayshi

    Email: [email protected]

  6. Chi Cai

    GitHub ID: 68583124, Email: [email protected]

License

Benchmark: The benchmark is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives license. This license covers the benchmark database and its derivatives. For attribution, please cite this page and our publications below. This data is provided free of charge for non-commercial and academic benchmarking and experimentation use. If you would like to contribute to the benchmark, please contact us. If you believe your intended usage may be restricted by the license, please contact us, and we can discuss the possibilities.

BibTeX for GPTCloneBench (initial version) and "Unveiling the potential of large language models in generating semantic and cross-language clones":

@inproceedings{gptclonebench2023,
  title={GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench},
  author={Alam, Ajmain Inqiad and Roy, Palash Ranjan and Al-omari, Farouq and Roy, Chanchal Kumar and Roy, Banani and Schneider, Kevin},
  booktitle={Proceedings of the 39th International Conference on Software Maintenance and Evolution (ICSME 2023)},
  year={2023},
  organization={October 2023, Bogota, Colombia (to appear)}
}

@INPROCEEDINGS{10473618,
  author={Roy, Palash R. and Alam, Ajmain I. and Al-omari, Farouq and Roy, Banani and Roy, Chanchal K. and Schneider, Kevin A.},
  booktitle={2023 IEEE 17th International Workshop on Software Clones (IWSC)}, 
  title={Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones}, 
  year={2023},
  volume={},
  number={},
  pages={22-28},
  keywords={Computer languages;Codes;Statistical analysis;Conferences;Semantics;Cloning;Linguistics;Language Models;Software Clone;Semantic Clone;Cross-language Clone;GPT;Semantic-CloneBench;Software Engineering},
  doi={10.1109/IWSC60764.2023.00011}}

Contact

Ajmain Inqiad Alam: [email protected] / [email protected]

Palash Ranjan Roy: [email protected] / [email protected]

Farouq Al-omari: [email protected]

Chanchal K. Roy: [email protected]

Banani Roy: [email protected]

Kevin Schneider: [email protected]

BibTeX Citation

1. @inproceedings{al2020semanticclonebench,
    title={Semanticclonebench: A semantic code clone benchmark using crowd-source knowledge},
    author={Al-Omari, Farouq and Roy, Chanchal K and Chen, Tonghao},
    booktitle={2020 IEEE 14th International Workshop on Software Clones (IWSC)},
    pages={57--63},
    year={2020},
    organization={IEEE}
}

2. @article{brown2020language,
    title={Language models are few-shot learners},
    author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
    journal={Advances in neural information processing systems},
    volume={33},
    pages={1877--1901},
    year={2020}
}

3. @misc{morrison_2022, 
    title={GPT-3 developer OpenAI releases new Davinci Generative Text Model}, 
    url={https://techmonitor.ai/technology/ai-and-automation/gpt-3-openai-davinci-generative-text}, 
    journal={Tech Monitor}, 
    author={Morrison, Ryan}, 
    year={2022}, 
    month={Nov}
}

4. @misc{jain_2022,
    title={OpenAI turns to Davinci to make GPT-3 Better},
    url={https://analyticsindiamag.com/openai-turns-to-davinci-to-make-gpt-3-better/},
    journal={Analytics India Magazine},
    author={Jain, Ayush},
    year={2022},
    month={Nov}
} 

5. @misc{monge_2022,
    title={New GPT-3 model: Text-DAVINCI-003 is awesome},
    url={https://medium.com/technology-hits/new-gpt-3-model-text-davinci-003-is-awesome-ada11ef660a9},
    journal={Medium},
    publisher={Technology Hits},
    author={Monge, Jim Clyde},
    year={2022},
    month={Dec}
} 
