neumtry / neumai
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Home Page: https://neum.ai
License: Apache License 2.0
python----3.10.12
neumai----0.0.33
error:
Traceback (most recent call last):
  File "/Users/xxxxxx/xxxxxx/neum_test.py", line 4, in <module>
    from neumai.Chunkers.RecursiveChunker import RecursiveChunker
  File "/Users/xxxxxx/miniforge3/lib/python3.10/site-packages/neumai/Chunkers/__init__.py", line 3, in <module>
    from .CustomChunker import CustomChunker
  File "/Users/xxxxxx/miniforge3/lib/python3.10/site-packages/neumai/Chunkers/CustomChunker.py", line 4, in <module>
    from neumai_tools.SemanticHelpers import semantic_chunking
ModuleNotFoundError: No module named 'neumai_tools'
TypeError: LanceDBSink.search() got an unexpected keyword argument 'filters'
I have opened this issue to propose adding Marqo tensor search as a sink for Neum AI. I will link a PR which adds it.
@ddematheu @kevinco26
Currently we are using a dictionary to provide filters on metadata; this approach lacks the following:
A simple solution to start with:
- Use a filter string such as "field1 <= value1, field2 != value2" instead of a dictionary ({"field1": "value1", "field2": "value2"}).
- Create FilterCondition objects by parsing this string:

    class FilterCondition:
        # We can define FilterOperators
        def __init__(self, column: str, op: FilterOperator, value: Any):
            self.column = column
            self.op = op
            self.value = value

Let me know your comments on this, I would like to contribute.
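The string-parsing idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `FilterOperator` members, the `parse_filters` helper, and the dataclass form of `FilterCondition` are hypothetical and not taken from the library. Values are kept as strings, and values that themselves contain commas are not handled.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class FilterOperator(Enum):
    # Hypothetical operator set; the real library may define different members.
    EQUAL = "="
    NOT_EQUAL = "!="
    GREATER_THAN = ">"
    GREATER_THAN_OR_EQUAL = ">="
    LESS_THAN = "<"
    LESS_THAN_OR_EQUAL = "<="


@dataclass
class FilterCondition:
    column: str
    op: FilterOperator
    value: Any


# Two-character operators are listed first so "<=" is not read as "<".
_CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")


def parse_filters(text: str) -> List[FilterCondition]:
    """Parse 'field1 <= value1, field2 != value2' into FilterCondition objects."""
    conditions = []
    for part in text.split(","):
        if not part.strip():
            continue  # ignore empty segments such as a trailing comma
        match = _CONDITION.match(part)
        if match is None:
            raise ValueError(f"Cannot parse filter condition: {part!r}")
        column, op, value = match.groups()
        conditions.append(FilterCondition(column, FilterOperator(op), value))
    return conditions
```

Calling `parse_filters("field1 <= value1, field2 != value2")` yields two conditions that a sink could then translate into its own filter syntax.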
Support using embedding services through a URL and API key or similar. This would allow Neum to be more open and less vendor-locked to the currently supported services.
Querying requirements across RAG fall not only on unstructured data that has been embedded and added to a vector database. They also fall on structured data sources where semantic search doesn't really make sense.
Goal: Provide a pipeline interface that connects to a structured data source and generates database queries in real time based on incoming queries.
Implementation:
- A Pipeline without an embed or sink connector, just a data source.
- Search generates a query using an LLM based on the fields available in the database.
- Included in a PipelineCollection and supported by smart_route so the model can decide when to use it.
Alternative implementation:
- On search, we would run a similarity search of the incoming query against the descriptions of the "cached" queries. We can then run the top query against the database.
As chat histories get longer, passing the entire history on every call is not good practice. Moreover, users expect information from several messages ago to be available as context.
Goal: Improve size of the chat history context window to allow users to reference messages that fall outside existing window.
Solution: Leverage semantic search to index the entire chat history of a conversation and pull messages that are related to the latest message from the user.
Implementation:
- A Pipeline object that uses a custom source connector that simply passes messages written to it through to a vector database.
- On search, we would run a normal search against the sink with filters to only pull messages from the given conversation.
Prototyped: https://github.com/NeumTry/Pensieve
Other ideas:
When a sink is queried using the search API, if the retrieved information is correct (based on feedback or by running results against a different model), we could re-ingest the retrieved query pair (query and resulting vector) back into the vector DB, but using the query as the embedded value. The goal is that future queries are more likely to retrieve the correct information.
Validation is only set to return true. Change to connect to client to validate params.
Currently we support unified (results re-ranked into a single list) and separate (results for each pipeline returned separately) searches for a collection.
Adding smart search which will do a smart routing to identify what collections are worth searching based on the query. Using the description of the pipeline, match to query.
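A rough sketch of description-based routing, assuming each pipeline exposes a short text description. Plain token overlap (Jaccard similarity) stands in here for whatever the real routing would use, such as an LLM or embedding similarity; the `route` function, its `min_score` threshold, and the example descriptions are hypothetical.

```python
import re
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; a crude stand-in for real text matching.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(query: str, descriptions: Dict[str, str], min_score: float = 0.1) -> List[str]:
    """Return names of pipelines whose description overlaps the query, best first."""
    query_tokens = _tokens(query)
    scored = []
    for name, description in descriptions.items():
        desc_tokens = _tokens(description)
        union = query_tokens | desc_tokens
        # Jaccard similarity between query words and description words.
        score = len(query_tokens & desc_tokens) / len(union) if union else 0.0
        if score >= min_score:
            scored.append((score, name))
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


descriptions = {
    "billing": "invoices payments and billing records",
    "support": "customer support tickets and chat transcripts",
}
selected = route("where are my billing invoices", descriptions)
```

Only the pipelines returned by `route` would then be searched, and their results could be combined with the existing unified or separate modes.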
- The filter argument in the SinkConnector.search method expects filters:List[FilterCondition]={}. It should rather be filters:List[dict]={}, and then we need to convert each dict to a FilterCondition using dict_to_filter_condition, because the user would provide a dictionary, not a FilterCondition object.
- In some places the argument is named filter and in other places it is filters.
Given a query to the search interface for a sink, generate the FilterConditions automatically using the metadata fields available for a sink.
file_id is a unique identifier for each file processed by a pipeline.
file_id = pipeline_id + cloudFile_id
Necessary to be able to leverage delete, update and augment capabilities.
Right now we support a list of FilterConditions which are automatically ANDed together. More complex nesting might be necessary.
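One possible shape for nested conditions, sketched as a small expression tree evaluated against a metadata dictionary in memory. The `Condition` and `Group` classes and the `matches` function are hypothetical names for illustration, not part of the library; a real sink would translate the tree into its own filter syntax rather than evaluate it in Python.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class Condition:
    column: str
    op: str
    value: Any


@dataclass
class Group:
    mode: str  # "and" or "or"
    children: List[Union["Condition", "Group"]]


def matches(node: Union[Condition, Group], metadata: Dict[str, Any]) -> bool:
    """Evaluate a condition tree against one record's metadata."""
    if isinstance(node, Condition):
        if node.column not in metadata:
            return False  # a missing field never satisfies a condition
        return OPS[node.op](metadata[node.column], node.value)
    results = (matches(child, metadata) for child in node.children)
    if node.mode == "and":
        return all(results)
    if node.mode == "or":
        return any(results)
    raise ValueError(f"Unknown group mode: {node.mode!r}")


# lang = "en" AND (year >= 2023 OR pinned = True)
expr = Group("and", [
    Condition("lang", "=", "en"),
    Group("or", [Condition("year", ">=", 2023), Condition("pinned", "=", True)]),
])
```

Under this sketch, today's flat list of conditions corresponds to a single `Group("and", [...])`, so existing callers would not need to change.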