
Comments (6)

ravenscroftj avatar ravenscroftj commented on July 18, 2024

Thanks for the ticket. I think this could be a bit of a tricky one to debug: the GGML GPT-J tokenizer is implemented from scratch, whereas the Huggingface Codegen tokenizer also has a bunch of token-merging logic which I don't think GGML's tokenizer has (I will try to confirm).
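To illustrate the kind of merge logic in question, here is a toy sketch of BPE merging (purely illustrative, not GGML's or Huggingface's actual code): a tokenizer with learned merge rules collapses common adjacent pairs into single tokens, so a tokenizer missing those merges emits more, shorter tokens for the same text.

```python
def apply_bpe_merges(tokens, merge_ranks):
    """Repeatedly apply the best-ranked adjacent-pair merge (toy BPE)."""
    tokens = list(tokens)
    while True:
        # find the adjacent pair with the lowest (best) merge rank
        best = None
        for i in range(len(tokens) - 1):
            rank = merge_ranks.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            return tokens
        _, i = best
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Hypothetical merge table: lower rank merges first
ranks = {("d", "e"): 0, ("de", "f"): 1}
print(apply_bpe_merges(list("def foo"), ranks))  # ['def', ' ', 'f', 'o', 'o']
```

With the merges applied, "def foo" becomes 5 tokens instead of 7 characters; skipping merges is exactly the kind of divergence that inflates token counts.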

I can't comment on whether this is likely to significantly impact the performance of the model - that would need testing empirically.

Was there a specific use case you have in mind that this is blocking?

from turbopilot.

thakkarparth007 avatar thakkarparth007 commented on July 18, 2024

Hey, yeah I was planning to use this for benchmarking the 4-bit performance of Codegen models. Most of my prompts are 1500 tokens or more, and these overflow 2048 tokens when tokenized incorrectly. I guess one way to get around this is to accept pretokenized inputs.
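On the caller side, a stopgap is to detect the overflow and truncate from the left so the most recent context survives (a sketch; the 2048 context size and the left-truncation policy are assumptions here, not turbopilot behaviour):

```python
def fit_context(token_ids, max_new_tokens, context_size=2048):
    """Trim oldest tokens so prompt + generation fits the context window."""
    budget = context_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return token_ids[-budget:]  # keep only the most recent tokens

prompt = list(range(2100))                       # pretend these are token ids
trimmed = fit_context(prompt, max_new_tokens=100)
print(len(trimmed))                              # 1948
```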


ravenscroftj avatar ravenscroftj commented on July 18, 2024

Ah OK, that makes sense, thanks for clarifying. I will look into the tokenizer behaviour properly, probably over the weekend, but in the meantime I will see if I can add a REST endpoint to the codegen server that accepts an array of tokens as a JSON list. Then you can pretokenize your input using the Huggingface tokenizer. I'll keep you posted!
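A client for such an endpoint might look like the following sketch; the URL, path, and JSON field names are placeholders, not the actual API:

```python
import json
import urllib.request

def build_payload(token_ids, n_predict=64):
    """Serialize pretokenized input as a JSON body (field names hypothetical)."""
    return json.dumps({"tokens": token_ids, "n_predict": n_predict}).encode("utf-8")

def complete_pretokenized(token_ids, url="http://localhost:18080/v1/pretokenized"):
    """POST token ids to a hypothetical pretokenized-completion endpoint."""
    req = urllib.request.Request(
        url,
        data=build_payload(token_ids),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# token ids would come from the Huggingface Codegen tokenizer, e.g.:
# complete_pretokenized([318, 257, 1332])
```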


thakkarparth007 avatar thakkarparth007 commented on July 18, 2024

Thanks! I just created a PR here to allow pretokenized inputs: ravenscroftj/ggml#2

It seems to work fine for me.


ravenscroftj avatar ravenscroftj commented on July 18, 2024

That's really cool, thank you for your contribution - I have accepted the MR. I will leave this ticket open as a reminder to look into the tokenizer behaviour anyway.

Sidenote - I'd be really interested in your evaluation of the 4 bit model if you're willing to share it!


thakkarparth007 avatar thakkarparth007 commented on July 18, 2024

Thanks!

I have performed a preliminary evaluation of the 6B-4bit model on Python. I ran the model on ~2000 code completion scenarios in Python (I have a custom dataset) and found about a 15% degradation in the exact-match metric at the first line. Here's what the graph looks like:
[graph image]
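The first-line exact-match check described above can be as simple as the following sketch (the real dataset and metric pipeline are not shown):

```python
def first_line_exact_match(prediction, reference):
    """True iff the first non-empty line of each completion matches exactly."""
    def first_line(text):
        for line in text.splitlines():
            if line.strip():
                return line.rstrip()
        return ""
    return first_line(prediction) == first_line(reference)

print(first_line_exact_match("return x + 1\n", "return x + 1\npass"))  # True
print(first_line_exact_match("return x+1", "return x + 1"))            # False
```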

I manually looked at some of the mispredictions and they seemed okay to me, but they were getting penalized because they weren't exact matches. I think one interesting thing to do would be to check how different the probabilities of the 16-bit and 4-bit predictions are.
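That comparison could be done by measuring, per position, how far the 4-bit model's next-token distribution drifts from the fp16 one, e.g. via KL divergence over the softmaxed logits (a generic sketch, not tied to turbopilot's internals; the example logits are made up):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given raw logits."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fp16_logits = [2.0, 1.0, 0.1]   # hypothetical fp16 logits for three tokens
q4_logits = [1.8, 1.1, 0.3]     # hypothetical 4-bit logits for the same tokens
print(kl_divergence(fp16_logits, q4_logits))  # small positive drift
```

Averaging this over positions (or just over the top-1 token's probability gap) would show whether the 4-bit model is merely less confident or actually ranking different tokens first.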

