Real-time Semantic Search

Introduction

Prerequisites

Confluent

  • Confluent Cloud Account
  • Terraform (in order to build everything in Confluent Cloud)
  • Docker (in order to run the Kafka Producer and Apache Flink)
  • An OpenAI Account with credentials (in order to generate the text embeddings)

Rockset

  • Rockset Account

Confluent

You'll need several secrets during this walkthrough, so start by creating a file to hold them. This repo ignores the file env.sh, so that name is a safe choice. Clone the repo, then create the file with the following commands.

git clone https://github.com/rocksetlabs/realtime-semantic-search && cd realtime-semantic-search
printf '%s\n' '# Confluent Cloud' 'export CONFLUENT_CLOUD_API_KEY="key"' 'export CONFLUENT_CLOUD_API_SECRET="secret"' '# OpenAI API Key' 'export OPENAI_API_KEY="key"' > env.sh

With the secrets file created, go to Confluent Cloud and create Cloud API Keys (guide here), then paste the key and secret into the secrets file. Next, paste in your OpenAI API Key so the Flink processor can create embeddings. When that's done, source the file so the applications can read the values from your shell's environment.

source env.sh
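
Because the later steps depend on these variables, it can help to confirm they are set before running Terraform. The following is a minimal sketch and not part of the repo: check_env is a hypothetical helper name, and it checks only the three variable names written to env.sh above.

```shell
# Hypothetical helper (not in the repo): report any required variable
# that is unset or empty. Returns 0 when all are present, 1 otherwise.
check_env() {
  missing=0
  for name in CONFLUENT_CLOUD_API_KEY CONFLUENT_CLOUD_API_SECRET OPENAI_API_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing env.sh, calling check_env prints the name of each missing variable to standard error. It does not detect values still set to the placeholder text.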

With all the secrets available to the console, you can switch to the Terraform directory where you'll build the necessary Confluent Cloud resources. Follow these next few steps to create everything, and then wait until it's done before moving on.

cd terraform && terraform init
terraform plan

Then apply the plan. When prompted, approve it by entering "yes", or pass the "-auto-approve" flag to the apply command to skip the prompt.

terraform apply

Wait for Terraform to finish creating all the resources, then navigate back to the base directory.

cd ..

With everything created, you can start building and launching the services that will create and process the product data. Start by building images for the three services. This might take some time but you should only have to do it once.

docker compose build

With the images built, start the Flink processor so that it will be up and ready when you produce data in the next step.

docker compose up processor -d

Give the service a moment to come online (no more than 60 seconds), then launch the producer.

docker compose up producer -d
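
If you script these steps, the pause before launching the producer can be expressed as a bounded retry rather than a fixed sleep. This is a sketch under assumptions: wait_for is a hypothetical helper, not something the repo provides, and the check you pass to it is up to you.

```shell
# Hypothetical helper: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or the timeout elapses.
wait_for() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example (assumes the compose service is named "processor", as above):
#   wait_for 60 sh -c 'docker compose ps --status running --services | grep -qx processor' \
#     && docker compose up producer -d
```

A running container does not guarantee that the Flink job inside it is ready to process records, so treat the example check as a lower bound on readiness.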

Now go to the Confluent Cloud console and look at the messages. You should see the raw product data in a topic named products.metadata, and the product data with the embeddings added in a topic named products.embeddings. Once that's working, move on to setting up the Rockset components that consume the real-time data to power your semantic search.

Update Commands

Once you've set up the Rockset components and have done some querying of the data, you can use the following in order to update a single record to see how real-time the pipeline really is. Start by launching the updater service that can produce the updated records.

docker compose up updater -d

From there, use the following two commands to update the product data for "Battle Hunter". To reset "Battle Hunter" to its original description, use the following.

docker compose exec updater java -cp /usr/app/semantic-search-1.0.0.jar com.github.zacharydhamilton.producer.MetadataProducer topic=products.metadata clientId=metadata-producer metadataFile=/usr/app/data/battle_hunter_original.json.gz

To update the description to a better, more realistic description, use the following.

docker compose exec updater java -cp /usr/app/semantic-search-1.0.0.jar com.github.zacharydhamilton.producer.MetadataProducer topic=products.metadata clientId=metadata-producer metadataFile=/usr/app/data/battle_hunter_updated.json.gz

You can cycle between these as many times as you like. Each will produce a single event to Confluent, and Rockset will store only the latest value you produced.
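
Since the two commands differ only in the data file name, cycling between them can be wrapped in a small shell function. This is a convenience sketch, not part of the repo: update_battle_hunter is a hypothetical name, and the command line is copied from the two commands above.

```shell
# Hypothetical wrapper: update_battle_hunter VERSION
# VERSION must be "original" or "updated"; it selects the data file.
update_battle_hunter() {
  case "${1:-}" in
    original|updated) ;;
    *) echo "usage: update_battle_hunter original|updated" >&2; return 1 ;;
  esac
  docker compose exec updater java \
    -cp /usr/app/semantic-search-1.0.0.jar \
    com.github.zacharydhamilton.producer.MetadataProducer \
    topic=products.metadata \
    clientId=metadata-producer \
    "metadataFile=/usr/app/data/battle_hunter_$1.json.gz"
}
```

With the updater service running, update_battle_hunter updated followed by update_battle_hunter original produces one event each time.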

Rockset

Clean-up

Confluent

  • Make sure you stop all the Docker services: docker compose down
  • Make sure you navigate back to the terraform/ directory and destroy the components in Confluent Cloud: terraform destroy

Rockset
