anthropic-tokenizer-typescript's Introduction

Anthropic TypeScript Tokenizer

⚠️ This package can be used to count tokens for Anthropic's older models. As of the Claude 3 models, this algorithm is no longer accurate, but it can be used as a very rough approximation. We suggest that you rely on usage in the response body wherever possible.
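
As a minimal sketch of relying on usage from the response body: the snippet below assumes the response carries a usage object with input_tokens and output_tokens fields, as the Messages API does. The totalTokens helper and the sample values are hypothetical and only illustrate reading those fields; no request is made here.

```typescript
// Assumed shape of the `usage` object in a response body.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical helper: total tokens reported for a single response.
function totalTokens(usage: Usage): number {
  return usage.input_tokens + usage.output_tokens;
}

// Hand-written response fragment for illustration (values are made up).
const response = { usage: { input_tokens: 12, output_tokens: 30 } };
console.log(`total: ${totalTokens(response.usage)} tokens`);
```

Because these numbers come from the API itself, they stay accurate for newer models where the local tokenizer is only an approximation.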

Installation

npm install --save @anthropic-ai/tokenizer
# or
yarn add @anthropic-ai/tokenizer

Usage

import { countTokens } from '@anthropic-ai/tokenizer';

function main() {
  const text = 'hello world!';
  const tokens = countTokens(text);
  console.log(`'${text}' is ${tokens} tokens`);
}
main();
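
When a prompt is assembled from several parts, it can help to count each part separately. The sketch below is one possible way to do that, not part of the package: countComponents and wordCounter are hypothetical names, and the counter is passed in as a parameter so the example runs on its own. In real use you would pass countTokens from '@anthropic-ai/tokenizer' in place of the stand-in.

```typescript
// A token counter: any function from text to a token count.
type Counter = (text: string) => number;

// Hypothetical helper: count each named prompt component and the overall total.
function countComponents(
  parts: Record<string, string>,
  count: Counter,
): { perPart: Record<string, number>; total: number } {
  const perPart: Record<string, number> = {};
  let total = 0;
  for (const name of Object.keys(parts)) {
    const n = count(parts[name]);
    perPart[name] = n;
    total += n;
  }
  return { perPart, total };
}

// Stand-in counter for illustration only: counts whitespace-separated words.
// It is NOT a tokenizer; swap in `countTokens` for real estimates.
const wordCounter: Counter = (text) =>
  text.split(/\s+/).filter((w) => w.length > 0).length;

const result = countComponents(
  { system: 'You are a helpful assistant.', user: 'hello world!' },
  wordCounter,
);
console.log(result.perPart, result.total);
```

Keeping the counter injectable also makes it easy to replace the estimate with another source of counts without touching the calling code.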

Status

This package is in beta. Its internals and interfaces are not stable and are subject to change without a major semver bump; please reach out if you rely on any undocumented behavior.

We are keen for your feedback; please email us at [email protected] or open an issue with questions, bugs, or suggestions.

Requirements

The following runtimes are supported:

  • Node.js version 12 or higher.
  • Deno v1.28.0 or higher (experimental). Use import { countTokens } from "npm:@anthropic-ai/tokenizer".

If you are interested in other runtime environments, please open or upvote an issue on GitHub.

anthropic-tokenizer-typescript's People

Contributors

jenan-anthropic, rattrayalex, robertcraigie

Forkers

whatif-dev

anthropic-tokenizer-typescript's Issues

Support for Vercel Serverless and Edge

Hello,
The underlying package (https://github.com/dqbd/tiktoken) seems to run into issues in a Vercel Serverless environment. Our application is currently built on Next.js 13, and we are seeing this error in our logs:
Error: Missing tiktoken_bg.wasm

We saw this issue before when we tried using the dqbd/tiktoken library directly. We had to switch to js-tiktoken to resolve it.

Per the README in the GitHub repo, this seems to be the difference between the two:
tiktoken (formerly hosted at @dqbd/tiktoken): WASM bindings for the original Python library, providing full 1-to-1 feature parity.
js-tiktoken: Pure JavaScript port of the original library with the core functionality, suitable for environments where WASM is not well supported or not desired (such as edge runtimes).

Would it be possible to build a version that uses the js-tiktoken library, for better portability and for folks in environments where WASM is not easy to work with? The error and the fix (i.e. the creation of js-tiktoken) can be seen here: transitive-bullshit/agentic#570

Thanks!

Tokenizer not working in the browser

Tried this in a CRA app with WASM enabled.

This could easily be fixed by adding an ESM build alongside the CJS one. For now, I had to copy the token-counting logic into my app and import the JSON from the package.

Feature request: Support Claude 3

We're working with a couple of long-context prompts where it is useful to know the token counts of the prompt components before we make the requests.

However, testing our workflows with Claude 3 has been tricky because we have no way to get these estimates until after we've made the request, which costs us a few API credits. It would be extremely useful if the tokenizer added support for the new series of models.
