Coder Social home page Coder Social logo

blurredai's Introduction

cover

BlurredAI

Hide your data from GPT-4 with local models on your laptop!

Inspiration

Have you ever had sensitive questions or data that you didn't feel comfortable sending to ChatGPT? Probably all the time for your corporate activity—work emails, legal texts, internal finances. This extends beyond corporate use cases though; think about your medical history, tax forms, relationship problems. This is a clear gap: remote models are already sufficiently powerful for most tasks, but they're untrustworthy to deal with confidential information and hosting similarly powerful models locally can be expensive.

What it does

Our project, BlurredAI, enables a collaboration between small & large models. A lightweight LLM, hosted locally on your laptop, privatizes your queries on private data before sending them to powerful remote models such as GPT-4. It then reconstructs the original context/data locally from the remote response, ensuring privacy without compromising the quality of insights. Check our demo video for concrete examples!

How we built it

There are a few core components to BlurredAI:

  1. Hosting large language models locally, essential for privacy guarantees.
  2. Adapting local models to different workflows, such as privatizing emails, parsing legal texts, understanding spreadsheets.
  3. A natural, intuitive UI where users can chat with and ask questions about their sensitive texts, documents, or PDFs.

For #1 we used TogetherAI for local development and (plan to use) Ollama for hosting on laptops. For #2, we crafted numerous prompts for different privacy workflows. For #3, we employed Streamlit for the frontend and Python, including PDF & CSV parsers, for the backend.

Challenges we ran into

  • Local models often aren't powerful enough to coherently privatize sensitive data, presenting an efficiency-utility trade-off.
  • Adapting the local models to various workflows; for example, the work done by the local model varies greatly between making spreadsheets private and making legal texts private.

Accomplishments that we're proud of

The novelty of our idea is what sets us apart. Unlike other companies, we are not taking the user's data. We don't even have a database. The user (business/individual) needs to simply clone our open-source repository and run the app locally.

What we learned

  • Privacy is multi-faceted and context-dependent. In some cases, it means redacting names, numbers, and emails, while in others it involves providing plausible deniability to the user.
  • Open-source LLMs for local hosting are still brittle, lacking strong reasoning capabilities without specific adaptations for different workflows.
  • Beyond standard privacy/cryptography tools, user-acceptable private inference can be achieved through a combination of appropriate anonymization, distributed computing, and careful human-computer interface design.

What's next for BlurredAI

  • Publishing as a PyPi package to simplify installation.
  • Extending to early adopters (e.g., mid-size companies like Esri) based on our user research.
  • Enhancing the local model for privatization (e.g., text redaction, rephrasing, shifting points of view) through better prompts, local fine-tuning, or integrating larger open-source models locally.

blurredai's People

Contributors

rchtgpt avatar xinyi-zhao avatar kenziyuliu avatar

Stargazers

Trung Le avatar Boqian Ma avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.