Coder Social home page Coder Social logo

aiseei / reader Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jina-ai/reader

0.0 0.0 0.0 452 KB

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Home Page: https://jina.ai/reader

License: Apache License 2.0

JavaScript 0.87% TypeScript 99.13%

reader's Introduction

Reader

Your LLMs deserve better input.

Reader converts any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/. Get improved output for your agent and RAG systems at no cost.

Feel free to use https://r.jina.ai/* in production. It is free, stable and scalable. We are maintaining it actively as one of the core products of Jina AI.

banner-reader-api.png

Updates

  • 2024-04-15: Reader now supports image reading! It captions all images at the specified URL and adds Image [idx]: [caption] as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. See example here.

Usage

Standard mode

Simply prepend https://r.jina.ai/ to any URL. For example, to convert the URL https://en.wikipedia.org/wiki/Artificial_intelligence to an LLM-friendly input, use the following URL:

https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence

Streaming Mode

Streaming mode is useful when you find that the standard mode provides an incomplete result. This is because streaming mode will wait a bit longer until the page is fully rendered. Use the accept-header to toggle the streaming mode:

curl -H "Accept: text/event-stream" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page

The data comes in a stream; each subsequent chunk contains more complete information. The last chunk should provide the most complete and final result.

For example, compare these two curl commands below. You can see streaming one gives you complete information at last, whereas standard mode does not. This is because the content loading on this particular site is triggered by some js after the page is fully loaded, and standard mode returns the page "too soon".

curl -H 'x-no-cache: true' https://access.redhat.com/security/cve/CVE-2023-45853
curl -H "Accept: text/event-stream" -H 'x-no-cache: true' https://r.jina.ai/https://access.redhat.com/security/cve/CVE-2023-45853

Note: -H 'x-no-cache: true' is used only for demonstration purposes to bypass the cache.

Streaming mode is also useful if your downstream LLM/agent system requires immediate content delivery or needs to process data in chunks to interleave I/O and LLM processing times. This allows for quicker access and more efficient data handling:

Reader API:  streamContent1 ----> streamContent2 ----> streamContent3 ---> ... 
                          |                    |                     |
                          v                    |                     |
Your LLM:                 LLM(streamContent1)  |                     |
                                               v                     |
                                               LLM(streamContent2)   |
                                                                     v
                                                                     LLM(streamContent3)

Note that in terms of completeness: ... > streamContent3 > streamContent2 > streamContent1, each subsequent chunk contains more complete information.

JSON mode

This is still very early and the result is not really a "useful" JSON. It contains three fields url, title and content only. Nonetheless, you can use accept-header to control the output format:

curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page

Install

You will need the following tools to run the project:

  • Node v18 (The build fails for Node version >18)
  • Firebase CLI (npm install -g firebase-tools)

For backend, go to the backend/functions directory and install the npm dependencies.

git clone [email protected]:jina-ai/reader.git
cd backend/functions
npm install

What is thinapps-shared submodule?

You might notice a reference to thinapps-shared submodule, an internal package we use to share code across our products. While it’s not open-sourced and isn't integral to the Reader's functions, it mainly helps with decorators, logging, secrets management, etc. Feel free to ignore it for now.

That said, this is the single codebase behind https://r.jina.ai, so everytime we commit here, we will deploy the new version to the https://r.jina.ai.

Having trouble on some websites?

Please raise an issue with the URL you are having trouble with. We will look into it and try to fix it.

License

Reader is backed by Jina AI and licensed under Apache-2.0.

reader's People

Contributors

nomagick avatar hanxiao avatar chxru avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.