
local-pdf-ai

In this tutorial we'll build a fully local chat-with-PDF app using LlamaIndexTS, Ollama, and Next.js.

(Demo video: LocalPDFChat.mp4)

Stack used:

  • LlamaIndex TS as the RAG framework
  • Ollama to run the LLM and embedding models locally
  • nomic-embed-text with Ollama as the embedding model
  • phi-2 with Ollama as the LLM
  • Next.js with server actions
  • PDFObject to preview the PDF with auto-scroll to the relevant page
  • LangChain WebPDFLoader to parse the PDF

Install Ollama

We'll use Ollama to run the embedding model and LLM locally.

$ curl -fsSL https://ollama.com/install.sh | sh

Download nomic and phi model weights

For this guide, I've used phi-2 as the LLM and nomic-embed-text as the embedding model.

To use the models, we first need to download their weights.

$ ollama pull phi

$ ollama pull nomic-embed-text

But feel free to use any model you want.
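Before wiring Ollama into the app, you can verify it's serving by hitting its REST API directly. Here's a quick sanity-check sketch of my own (not part of the app), assuming Ollama's default port 11434 and the phi model pulled above:

// check-ollama.ts -- a sanity-check sketch, not part of the app
async function checkOllama() {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "phi",     // the LLM pulled above
      prompt: "Say hello.",
      stream: false,    // return a single JSON object instead of a stream
    }),
  })
  const data = await res.json()
  console.log(data.response) // the generated text
}

checkOllama()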

FilePicker.tsx - Drag-n-drop the PDF

This component is the entry-point to our app.

It's used for uploading the PDF file, either by clicking the upload button or by dragging and dropping the file.

  return (
    <div
      className='flex flex-col gap-7 justify-center items-center h-[80vh]'>
      <Label htmlFor="pdf" className="text-xl font-bold tracking-tight text-gray-600 cursor-pointer">
        Select PDF to chat
      </Label>
      {/* the drag handlers show a hint while a file hovers over the input */}
      <Input
        onDragOver={() => setStatus("Drop PDF file to chat")}
        onDragLeave={() => setStatus("")}
        onDrop={handleFileDrop}
        id="pdf"
        type="file"
        accept='.pdf'
        className="cursor-pointer"
        onChange={(e) => {
          // click-to-upload path
          if (e.target.files) {
            setSelectedFile(e.target.files[0]) // triggers ChatWindow + Preview
            setPage(1) // start the preview at page 1
          }
        }}
      />
      {/* drag-over hints and upload status */}
      <div className="text-lg font-medium">{status}</div>
    </div>
  )

After a successful upload, it sets the state variable selectedFile to the newly uploaded file.
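The drop handler itself isn't shown above. Here's a minimal sketch of what handleFileDrop could look like; the handler name comes from the component, but this body is an assumption:

const handleFileDrop = (e: React.DragEvent<HTMLInputElement>) => {
  e.preventDefault() // keep the browser from opening the dropped file
  const file = e.dataTransfer.files?.[0]
  if (file && file.type === "application/pdf") {
    setSelectedFile(file) // same state update as the click-to-upload path
    setPage(1)
    setStatus("")
  } else {
    setStatus("Please drop a PDF file")
  }
}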

Preview.tsx - Preview of the PDF

Once the state variable selectedFile is set, the ChatWindow and Preview components are rendered instead of FilePicker.

First we get a base64 data URL of the PDF from the File object using FileReader. Then we use this base64 string to preview the PDF.

The Preview component uses the PDFObject package to render the PDF.

It also takes page as a prop to scroll to the relevant page. It's set to 1 initially and then updated as we chat with the PDF.

  useEffect(() => {
    const options = {
      title: fileToPreview.name,
      pdfOpenParams: {
        view: "fitH",
        page: page || 1, // scroll to the page referenced by the latest answer
        zoom: "scale,left,top",
        pageMode: 'none'
      }
    }
    // read the File into a base64 data URL; setting b64String re-runs
    // this effect because it's in the dependency array below
    const reader = new FileReader()
    reader.onload = () => {
      setb64String(reader.result as string);
    }
    reader.readAsDataURL(fileToPreview)
    // embed the PDF into the #pdfobject container
    pdfobject.embed(b64String as string, "#pdfobject", options)
  }, [page, b64String])

  return (
    <div className="flex-grow rounded-xl" id="pdfobject">
    </div>
  )
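For reference, the parent page might mount it roughly like this; the prop names come from the snippets above, while the wrapper markup is an assumption:

{/* assumed parent render -- prop names from the snippets above */}
{selectedFile && (
  <Preview fileToPreview={selectedFile} page={page} />
)}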

ProcessPDF() Next.js server action

We also have to process the PDF for RAG.

We first use LangChain WebPDFLoader to parse the uploaded PDF. We use WebPDFLoader because it runs in the browser and doesn't require Node.js.

// import path may vary across LangChain versions
import { WebPDFLoader } from "langchain/document_loaders/web/pdf";

const loader = new WebPDFLoader(
  selectedFile, // the File object set by FilePicker
  { parsedItemSeparator: " " }
);
// map to plain objects so they can be passed to a server action
const lcDocs = (await loader.load()).map(lcDoc => ({
  pageContent: lcDoc.pageContent,
  metadata: lcDoc.metadata,
}))

RAG using LlamaIndex TS

Next, we pass the parsed documents to a Next.js server action that initiates the RAG pipeline using LlamaIndex TS.

import { Document } from "llamaindex"

if (lcDocs.length == 0) return;
const docs = lcDocs.map(lcDoc => new Document({
    text: lcDoc.pageContent,
    metadata: lcDoc.metadata
}))

We create LlamaIndex Documents from the parsed documents.

Vector Store Index

Next we create a VectorStoreIndex with those Documents, passing configuration info like which embed model and LLM to use.

  const index = await VectorStoreIndex.fromDocuments(docs, {
    serviceContext: serviceContextFromDefaults({
      chunkSize: 300,   // size of each text chunk
      chunkOverlap: 20, // overlap between consecutive chunks
      embedModel, llm   // defined below
    })
  })

We use the Ollama class for the LLM and OllamaEmbedding for the embed model:

import { Ollama, OllamaEmbedding } from "llamaindex"

const embedModel = new OllamaEmbedding({
  model: 'nomic-embed-text' // the embed model pulled earlier
})

const llm = new Ollama({
  model: "phi", // the LLM pulled earlier
  modelMetadata: {
    temperature: 0, // deterministic responses
    maxTokens: 25,  // cap the response length
  }
})

Vector Index Retriever

We then create a VectorIndexRetriever from the index, which will be used to create a chat engine.

  // fetch the 2 chunks most similar to each query
  const retriever = index.asRetriever({
    similarityTopK: 2,
  })
  // drop any previous conversation when a new PDF is processed
  if (chatEngine) {
    chatEngine.reset()
  }

ChatEngine

Finally, we create a LlamaIndex ContextChatEngine from the retriever.

  // the engine injects the retrieved chunks into the prompt as context
  chatEngine = new ContextChatEngine({
    retriever,
    chatModel: llm
  })

We pass in the LLM as well.
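Putting it all together, the ProcessPDF server action looks roughly like this. This is a condensed sketch assembled from the snippets above; the exact signature, the module-level chatEngine variable, and the single llamaindex import path are assumptions:

"use server"

import {
  Document,
  VectorStoreIndex,
  serviceContextFromDefaults,
  Ollama,
  OllamaEmbedding,
  ContextChatEngine,
} from "llamaindex"

// module scope, so the chat() action below can reuse the engine
let chatEngine: ContextChatEngine | null = null

export async function ProcessPDF(
  lcDocs: { pageContent: string; metadata: Record<string, any> }[]
) {
  if (lcDocs.length == 0) return

  const docs = lcDocs.map(lcDoc => new Document({
    text: lcDoc.pageContent,
    metadata: lcDoc.metadata,
  }))

  const embedModel = new OllamaEmbedding({ model: "nomic-embed-text" })
  const llm = new Ollama({
    model: "phi",
    modelMetadata: { temperature: 0, maxTokens: 25 },
  })

  const index = await VectorStoreIndex.fromDocuments(docs, {
    serviceContext: serviceContextFromDefaults({
      chunkSize: 300, chunkOverlap: 20, embedModel, llm,
    }),
  })

  const retriever = index.asRetriever({ similarityTopK: 2 })
  if (chatEngine) chatEngine.reset()
  chatEngine = new ContextChatEngine({ retriever, chatModel: llm })
}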

ChatWindow.tsx

This component handles the chat logic:

  <ChatWindow
    isLoading={isLoading}
    loadingMessage={loadingMessage}
    startChat={startChat}
    messages={messages}
    setSelectedFile={setSelectedFile}
    setMessages={setMessages}
    setPage={setPage}
  />

chat() server action

This server action uses the previously created ChatEngine to generate the chat response.

In addition to the text response, it also returns the source nodes used to generate the response, which we'll use later to update which page to show in the PDF preview.

const queryResult = await chatEngine.chat({
  message: query
})
const response = queryResult.response
// metadata of the source chunks, including their page numbers
const metadata = queryResult.sourceNodes?.map(node => node.metadata)
return { response, metadata };
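On the client, the action is awaited like any async function; the exact call site is an assumption:

// inside ChatWindow's submit handler (assumed call shape)
const { response, metadata } = await chat(input)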

Update the page to preview from metadata

We use the response and metadata from the above server action (chat()) to update the messages and the page shown in the PDF preview.

  setMessages(
    [
      ...messages,
      { role: 'human', statement: input },
      { role: 'ai', statement: response }
    ]
  )
  if (metadata.length > 0) {
    // the first source node's page number drives the preview scroll
    setPage(metadata[0].loc.pageNumber)
  }
  setLoadingMessage("Got response from AI.")

A few gotchas

There are a few things to consider for this project:

  • You'll need a powerful machine with a decent GPU to run Ollama for faster and better responses.
  • We need to disable fs in the browser bundle, otherwise pdf-parse will not work. Put this in the webpack section of next.config.js (a fuller sketch follows this list):
if (!isServer) {
  // stub out Node-only modules in the client bundle
  config.resolve.fallback = {
    fs: false,
    "node:fs/promises": false,
    assert: false,
    module: false,
    perf_hooks: false,
  };
}
  • Next.js server actions don't support sending intermediate results, so I couldn't make streaming work.
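For context, here's roughly where that fallback block sits in next.config.js; everything outside the if (!isServer) block is an assumption about a typical config:

/** @type {import('next').NextConfig} */
const nextConfig = {
  webpack: (config, { isServer }) => {
    if (!isServer) {
      // stub out Node-only modules so pdf-parse can be bundled for the browser
      config.resolve.fallback = {
        fs: false,
        "node:fs/promises": false,
        assert: false,
        module: false,
        perf_hooks: false,
      };
    }
    return config; // the webpack hook must return the modified config
  },
};

module.exports = nextConfig;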

Thanks for reading. Stay tuned for more.

I tweet about these topics and anything I'm exploring on a regular basis. Follow me on Twitter.
