botisan-ai / gpt3-tokenizer

172.0 8.0 19.0 2.11 MB

Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.

License: MIT License

TypeScript 99.96% JavaScript 0.04%
typescript gpt3 tokenizer javascript chatgpt codex gpt-3 openai nodejs

gpt3-tokenizer's Introduction

GPT3 Tokenizer

This is an isomorphic TypeScript tokenizer for OpenAI's GPT-3 model, with support for both gpt3 and codex tokenization. It should work in both Node.js and browser environments.

Usage

First, install:

yarn add gpt3-tokenizer

In code:

import GPT3Tokenizer from 'gpt3-tokenizer';

const tokenizer = new GPT3Tokenizer({ type: 'gpt3' }); // or 'codex'
const str = "hello 👋 world 🌍";
const encoded: { bpe: number[]; text: string[] } = tokenizer.encode(str);
const decoded = tokenizer.decode(encoded.bpe);

Reference

This library is based on the following:

The main difference between this library and gpt-3-encoder is that this library supports both gpt3 and codex tokenization (the dictionary is taken directly from OpenAI, so the tokenization results should match the OpenAI Playground). It also uses the Map API instead of plain JavaScript objects, notably for the bpeRanks lookup table, which should give some performance improvement.
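To illustrate where a rank table like bpeRanks sits, here is a minimal sketch of the greedy merge loop used by GPT-2-style byte-pair encoders, with merge ranks stored in a Map keyed by symbol pair. The space-separated pair key, the function names (pairKey, buildBpeRanks, bpe) and the toy merge list are assumptions for illustration, not this library's actual internals:

```typescript
// Hypothetical key format: the two symbols joined by a space.
// (GPT-2-style byte-level symbols map the space byte to another character,
// so a literal space never appears inside a symbol.)
function pairKey(a: string, b: string): string {
  return `${a} ${b}`;
}

// Build the rank table: earlier merges get lower (higher-priority) ranks.
function buildBpeRanks(merges: Array<[string, string]>): Map<string, number> {
  const ranks = new Map<string, number>();
  merges.forEach(([a, b], index) => ranks.set(pairKey(a, b), index));
  return ranks;
}

// Repeatedly merge the lowest-ranked adjacent pair until no ranked pair remains.
function bpe(symbols: string[], ranks: Map<string, number>): string[] {
  let word = [...symbols];
  while (word.length > 1) {
    // Find the adjacent pair with the lowest rank (one Map lookup per pair).
    let bestRank = Infinity;
    let bestIndex = -1;
    for (let pos = 0; pos < word.length - 1; pos++) {
      const rank = ranks.get(pairKey(word[pos], word[pos + 1]));
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIndex = pos;
      }
    }
    if (bestIndex === -1) break; // no mergeable pair left

    // Merge every non-overlapping occurrence of that pair, left to right.
    const first = word[bestIndex];
    const second = word[bestIndex + 1];
    const merged: string[] = [];
    let i = 0;
    while (i < word.length) {
      if (i < word.length - 1 && word[i] === first && word[i + 1] === second) {
        merged.push(first + second);
        i += 2;
      } else {
        merged.push(word[i]);
        i += 1;
      }
    }
    word = merged;
  }
  return word;
}
```

With merges [["l", "o"], ["lo", "w"], ["e", "r"]], the symbols l, o, w, e, r end up as "low" and "er". Because the rank lookup runs for every adjacent pair on every merge step, it sits on the hot path of encoding, which is where a Map is expected to help; a Map also cannot have its keys collide with inherited object properties.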

License

MIT

gpt3-tokenizer's People

Contributors

adamnyberg, dependabot[bot], jonwardopenai, kopertop, lhr0909

gpt3-tokenizer's Issues

Please help me add GPT3-Tokenizer

Hello,

I've been attempting to add the GPT3-Tokenizer, but I've been unable to get it to function properly. At present, I am utilizing the template provided by Nutlope for my Twitter Bio, which can be found at .

Furthermore, I am in the process of learning React and Next.js.

Thank you!

Issues importing the package

No matching export in "browser-external:util" for import "TextEncoder" in Vite (SvelteKit).

 > node_modules/gpt3-tokenizer/dist/gpt3-tokenizer.esm.js:1:9: error: No matching export in "browser-external:util" for import "TextEncoder"
    1 │ import { TextEncoder, TextDecoder } from 'util';
      ╵          ~~~~~~~~~~~

 > node_modules/gpt3-tokenizer/dist/gpt3-tokenizer.esm.js:1:22: error: No matching export in "browser-external:util" for import "TextDecoder"
    1 │ import { TextEncoder, TextDecoder } from 'util';
      ╵                       ~~~~~~~~~~~

12:57:34 AM [vite] error while updating dependencies:
Error: Build failed with 2 errors:
node_modules/gpt3-tokenizer/dist/gpt3-tokenizer.esm.js:1:9: error: No matching export in "browser-external:util" for import "TextEncoder"
node_modules/gpt3-tokenizer/dist/gpt3-tokenizer.esm.js:1:22: error: No matching export in "browser-external:util" for import "TextDecoder"
    at failureErrorWithLog (/src/tymek-cz/node_modules/esbuild/lib/main.js:1493:15)
    at /src/tymek-cz/node_modules/esbuild/lib/main.js:1151:28
    at runOnEndCallbacks (/src/tymek-cz/node_modules/esbuild/lib/main.js:941:63)
    at buildResponseToResult (/src/tymek-cz/node_modules/esbuild/lib/main.js:1149:7)
    at /src/tymek-cz/node_modules/esbuild/lib/main.js:1258:14

Edit: this might be related: vitejs/vite#6493
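The errors come from the ESM build importing TextEncoder and TextDecoder from Node's 'util' module, which Vite swaps for an empty "browser-external" stub in browser builds. Both classes are also provided as globals by modern browsers and by Node.js (since v11), so one possible fix is to rely on the globals instead of importing them. A minimal sketch, assuming the tokenizer only needs UTF-8 conversion (byte-level BPE works on UTF-8 bytes); the helper names are hypothetical:

```typescript
// Use the WHATWG Encoding API globals (browsers, Node.js 11+)
// instead of `import { TextEncoder, TextDecoder } from 'util'`.
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8");

// Hypothetical helper: string -> UTF-8 bytes.
function toUtf8Bytes(text: string): Uint8Array {
  return utf8Encoder.encode(text);
}

// Hypothetical helper: UTF-8 bytes -> string.
function fromUtf8Bytes(bytes: Uint8Array): string {
  return utf8Decoder.decode(bytes);
}
```

Since no Node built-in is imported, the same module should bundle for the browser without a 'util' polyfill.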

Unsafe use of `this.cache.hasOwnProperty`

I started receiving "this.cache.hasOwnProperty is not a function" errors. Digging into the code, it looks like tokenizer.ts uses slightly unsafe code, given that this.cache is used as a map in which any passed-in value can serve as a token key:

    if (this.cache.hasOwnProperty(token)) {
      return this.cache[token];
    }

Instead, this should be:

    if (Object.prototype.hasOwnProperty.call(this.cache, token)) {
      return this.cache[token];
    }
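One way to hit this error: when the cache is a plain object keyed by token strings, caching a token that is literally "hasOwnProperty" creates an own property that shadows Object.prototype.hasOwnProperty, so the next lookup calls a string and throws. The sketch below reproduces the pattern with a hypothetical standalone cache (TokenCache, getUnsafe, getSafe and getFromMap are illustrative names, not the library's class) and compares the proposed fix with a Map-based alternative:

```typescript
// Hypothetical cache shaped like the one in the issue:
// a plain object whose keys are arbitrary token strings.
type TokenCache = { [token: string]: string };

// Pattern from the issue: throws a TypeError once "hasOwnProperty" itself
// has been cached, because that own property shadows the prototype method.
function getUnsafe(cache: TokenCache, token: string): string | undefined {
  if (cache.hasOwnProperty(token)) {
    return cache[token];
  }
  return undefined;
}

// Proposed fix: call the method from Object.prototype so cached keys
// cannot shadow it.
function getSafe(cache: TokenCache, token: string): string | undefined {
  if (Object.prototype.hasOwnProperty.call(cache, token)) {
    return cache[token];
  }
  return undefined;
}

// Alternative: a Map keeps stored entries separate from object methods.
function getFromMap(cache: Map<string, string>, token: string): string | undefined {
  return cache.get(token);
}
```

On ES2022+ targets, Object.hasOwn(cache, token) is an equivalent, shorter form of the proposed check.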

Issues module constructor

I am getting this error after implementing it:
TypeError: gpt3_tokenizer__WEBPACK_IMPORTED_MODULE_3__.GPT3Tokenizer is not a constructor

GPT3Tokenizer is not a constructor

Hi, I get an error when trying to instantiate GPT3Tokenizer.

const tokenizer = new GPT3Tokenizer({ type: 'gpt3' });
                  ^

TypeError: GPT3Tokenizer is not a constructor

gpt3-tokenizer v1.1.4
Node.js v18.12.1

Fix types

The types of the library are wrong: https://arethetypeswrong.github.io/?p=gpt3-tokenizer%401.1.3

Because of this, the example in the documentation does not work in some environments; for now, you need to use const tokenizer = new GPT3Tokenizer.default({ type: 'gpt3' }); in those instead.

This is a common issue with TypeScript projects, so common that someone made a website to detect it. If the types are hard to fix (the website above makes fixing them look simple), just add a note to the documentation. It's a bit frustrating to try out a new library, find that the first example in the documentation does not work, and have to debug why; then you find the solution and ask yourself why you are a JavaScript programmer and how these kinds of issues still exist even with TypeScript... 😑
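Until the types are fixed, one way to cover both kinds of environment is to unwrap .default only when the default import resolves to a module object instead of the class. A minimal sketch, assuming the only two shapes seen at runtime are the class itself and an object whose default property holds it; resolveDefaultExport is a hypothetical helper, not part of gpt3-tokenizer:

```typescript
// Hypothetical helper: return the default export whether the import resolved
// to the value itself or to a module object that wraps it in `.default`.
function resolveDefaultExport<T>(mod: unknown): T {
  if (typeof mod === "object" && mod !== null && "default" in mod) {
    // CommonJS interop shape: { default: T, ... }
    return (mod as { default: T }).default;
  }
  // Already the exported value (e.g. the class constructor itself).
  return mod as T;
}

// Usage (in a project with gpt3-tokenizer installed):
// import GPT3TokenizerImport from 'gpt3-tokenizer';
// const GPT3Tokenizer = resolveDefaultExport<typeof GPT3TokenizerImport>(GPT3TokenizerImport);
// const tokenizer = new GPT3Tokenizer({ type: 'gpt3' });
```

A class constructor is a function rather than an object, so it falls through to the second branch unchanged.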
