jina-ai / jina-commons
A collection of shared functions for Jina Executors
License: Apache License 2.0
Currently, if we set needs_attr="text" on get_docs_batch_generator, the function will yield docs with doc.text == "".
It might be expected that these docs get filtered out, because the empty string is also the default value of the text attribute on the Document object.
This means that
docs = DocumentArray([Document(blob=...)])
for batch in get_docs_batch_generator(docs, needs_attr='text'):
    print(len(batch))
>>> 1
will yield the doc.
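The filtering that might be expected here can be sketched as follows. This is only an illustration, not the jina_commons implementation: FakeDoc is a stand-in for the real Document class, and has_attr_value is a hypothetical helper name. It assumes that both None and the empty string should count as "not set".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeDoc:
    """Stand-in for jina's Document (assumption, for illustration only)."""
    text: str = ''
    blob: Optional[Any] = None


def has_attr_value(doc: Any, attr: str) -> bool:
    """Hypothetical helper: True only if `attr` holds a non-default value."""
    value = getattr(doc, attr, None)
    if value is None:
        return False
    # The empty string is the default for `text`, so treat it as unset.
    if isinstance(value, str) and value == '':
        return False
    return True


docs = [FakeDoc(blob=b'\x00\x01'), FakeDoc(text='hello')]
with_text = [d for d in docs if has_attr_value(d, 'text')]
print(len(with_text))
```

With a check like this, the doc that only carries a blob would be dropped when needs_attr='text' is requested.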
Currently, it is difficult to tell whether a dump has hung or is simply taking a long time because the dumping task is large.
We can add it to https://github.com/jina-ai/jina-commons/blob/main/jina_commons/indexers/dump.py#L69
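One possible way to tell a slow dump from a hung one is periodic progress logging. The sketch below is an assumption about what that could look like, not code from dump.py: with_progress is a hypothetical wrapper, and the logger name and the `every` interval are chosen only for illustration.

```python
import logging
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')
logger = logging.getLogger('dump_progress')  # hypothetical logger name


def with_progress(items: Iterable[T], total: int, every: int = 1000) -> Iterator[T]:
    """Yield items unchanged, logging how many the consumer has finished."""
    for count, item in enumerate(items, start=1):
        yield item
        # Execution resumes here once the consumer asks for the next item,
        # so `count` reflects items that have already been handled.
        if count % every == 0 or count == total:
            logger.info('dumped %d/%d documents', count, total)


# Usage: wrap whatever iterable the dump loop already consumes.
written = [x for x in with_progress(range(5), total=5, every=2)]
```

Wrapping the iterable keeps the dump loop itself unchanged, which may make it easier to add or remove the logging later.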
Refactor the functionality for document batching. This is currently used in the following executors and can be implemented once and imported from jina_commons instead.
It roughly looks like this:
def _batch_generator(data: List[Any], batch_size: int):
    for i in range(0, len(data), batch_size):
        yield data[i: i + batch_size]

def _get_docs_batch_generator(self, docs: DocumentArray, parameters: Dict):
    traversal_path = parameters.get('traversal_path', self.default_traversal_path)
    batch_size = parameters.get('batch_size', self.default_batch_size)
    flat_docs = docs.traverse_flat(traversal_path)
    filtered_docs = [doc for doc in flat_docs if doc is not None and doc.blob is not None]
    return _batch_generator(filtered_docs, batch_size)
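A shared version of this helper could look roughly like the sketch below. It is an assumption about one possible shape, not the function that ended up in jina_commons: the names batch_generator and filtered_batches are hypothetical, and the filter is passed in as a predicate so that each executor can decide which attribute it needs. It works on any iterable, so it runs here on plain Python values.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def batch_generator(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Lazily split `items` into lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError('batch_size must be a positive integer')
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def filtered_batches(
    items: Iterable[Any], batch_size: int, keep: Callable[[Any], bool]
) -> Iterator[List[Any]]:
    """Drop items rejected by `keep`, then batch the rest."""
    return batch_generator((item for item in items if keep(item)), batch_size)


data = [1, None, 2, 3, None, 4, 5]
batches = list(filtered_batches(data, 2, lambda x: x is not None))
print(batches)  # [[1, 2], [3, 4], [5]]
```

Using islice avoids materialising the filtered list up front, which may matter for large document sets.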
Name | Repo URL | PR
When a document's embedding is missing, the code at the following line breaks the whole program:
https://github.com/jina-ai/jina-commons/blob/main/jina_commons/indexers/dump.py#L102
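One way to avoid the crash is to check for a missing embedding before serialising it. The sketch below is illustrative only and does not reproduce dump.py: FakeDoc is a stand-in for the real Document, iter_dumpable is a hypothetical helper, and array.array is used in place of a NumPy array simply because it also offers tobytes().

```python
import logging
from array import array
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

logger = logging.getLogger('dump_guard')  # hypothetical logger name


@dataclass
class FakeDoc:
    """Stand-in for jina's Document (assumption, for illustration only)."""
    id: str
    embedding: Optional[array] = None


def iter_dumpable(docs: Iterable[FakeDoc]) -> Iterator[Tuple[str, bytes]]:
    """Yield (id, embedding bytes), skipping docs that have no embedding."""
    for doc in docs:
        if doc.embedding is None:
            # Calling .tobytes() on None would raise AttributeError.
            logger.warning('skipping document %s: no embedding', doc.id)
            continue
        yield doc.id, doc.embedding.tobytes()


docs = [FakeDoc('a', array('f', [0.1, 0.2])), FakeDoc('b')]
dumped = list(iter_dumpable(docs))
print([doc_id for doc_id, _ in dumped])  # ['a']
```

Skipping with a warning is only one option; raising a descriptive error that names the offending document id would be an alternative if silent gaps in the dump are not acceptable.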