argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Home Page: https://docs.argilla.io

License: Apache License 2.0

Languages: Dockerfile 0.08%, JavaScript 2.97%, SCSS 0.81%, Vue 10.29%, Shell 0.03%, Python 75.36%, TypeScript 9.35%, HTML 0.02%, Procfile 0.01%, Mako 0.01%, CSS 0.47%, Jupyter Notebook 0.61%

Topics: human-in-the-loop, natural-language-processing, mlops, developer-tools, text-labeling, annotation-tool, nlp, machine-learning, active-learning, weak-supervision

argilla's Introduction

Argilla

Work on data together, make your model outputs better!


Argilla is a collaboration tool for AI engineers and domain experts who need to build high-quality datasets for their projects.

If you just want to get started, deploy Argilla on Hugging Face Spaces. Curious and want to know more? Read our documentation.

Or, play with the Argilla UI by signing in with your Hugging Face account:


Why use Argilla?

Argilla can be used for collecting human feedback for a wide variety of AI projects like traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text to image, etc.). Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

Improve your AI output quality through data quality

Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both problems at once. Argilla helps you achieve and maintain high quality standards for your data, which means you can improve the quality of your AI output.

Take control of your data and models

Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

Improve efficiency by quickly iterating on the right data and models

Gathering data is a time-consuming process. Argilla helps by providing a tool that lets you interact with your data in a more engaging way. You can quickly and easily label your data with filters, AI feedback suggestions, and semantic search, so you can focus on training your models and monitoring their performance.

🏘️ Community

We are an open-source community-driven project and we love to hear from you. Here are some ways to get involved:

  • Community Meetup: listen in or present during one of our bi-weekly events.

  • Discord: get direct support from the community in #argilla-distilabel-general and #argilla-distilabel-help.

  • Roadmap: plans change, but we love to discuss them with our community, so feel encouraged to participate.

What do people build with Argilla?

Open-source datasets and models

The community uses Argilla to create amazing open-source datasets and models.

  • Cleaned UltraFeedback dataset used to fine-tune the Notus and Notux models. The original UltraFeedback dataset was curated using Argilla UI filters to find and report a bug in the original data generation code. Based on this data curation process, Argilla built this new version of the UltraFeedback dataset and fine-tuned Notus, outperforming Zephyr on several benchmarks.
  • distilabeled Intel Orca DPO dataset used to fine-tune the improved OpenHermes model. This dataset was built by combining human curation in Argilla with AI feedback from distilabel, leading to an improved version of the Intel Orca dataset and outperforming models fine-tuned on the original dataset.

Example use cases

AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in our AI community meetup.

  • AI for good: the Red Cross presentation showcases how the Red Cross domain experts and AI team collaborated by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
  • Customer support: during the Loris meetup, they showed how their AI team uses unsupervised and few-shot contrastive learning to quickly validate and obtain labelled samples for a large number of multi-label classifiers.
  • Research studies: the showcase from Prolific announced their integration with our platform. They use it to actively distribute data collection projects among their annotating workforce. This allows Prolific to quickly and efficiently collect high-quality data for research studies.

👨‍💻 Getting started

Installation

First things first! You can install the SDK with pip as follows:

pip install argilla

After that, you will need to deploy Argilla Server. The easiest way to do this is through our free Hugging Face Spaces deployment integration.

To use the client, you need to import the Argilla class and instantiate it with the API URL and API key.

import argilla as rg

client = rg.Argilla(api_url="https://[your-owner-name]-[your_space_name].hf.space", api_key="owner.apikey")

Create your first dataset

We can now create a dataset with a simple text classification task. First, you need to define the dataset settings.

settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this article fit?",
            labels=["positive", "negative"],
        )
    ],
)
dataset = rg.Dataset(
    name=f"my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

Next, we can add records to the dataset. This example pulls data from the Hugging Face Hub, so install the datasets library first:

pip install datasets

Then load a sample of the IMDb dataset and log it, mapping its text column to the review field:

from datasets import load_dataset

data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
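
Records whose dict keys already match the field names can also be logged without a mapping; a minimal sketch using the dataset created above (the review text is made up):

records = [
    {"review": "The film was a delight from start to finish."},
    {"review": "Two hours I will never get back."},
]
dataset.records.log(records=records)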

🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records. Need more info? Check out our docs.

🥇 Contributors

To help our community create contributions, we have put together our community docs. Additionally, you can always schedule a meeting with our Developer Advocacy team so they can get you up to speed.

argilla's People

Contributors

alvarobartt, anakin87, ankush-chander, burtenshaw, chainyo, chschroeder, damianpumar, davidberenstein1957, dcfidalgo, dependabot[bot], dvsrepo, eshwarhs, frascuchon, gabrielmbmb, garimau, ignacioct, jfcalvo, keithcuniah, kursathalat, leireropl, leiyre, maxserras, nataliaelv, orasik, plaguss, pre-commit-ci[bot], ruanchaves, sdiazlor, tim-win, tomaarsen


argilla's Issues

Glitch: right sidebar on large screens

When scrolling the central area, the width of the right sidebar shrinks; it should not change.
I also propose commenting out the records counter widget, because it is hidden behind it.

Creation of null tags is allowed

The "create new label" action accepts empty labels, resulting in null values. This behaviour should be prevented.

Screenshot 2021-06-08 at 21 03 55
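
A minimal sketch of the kind of guard that could prevent this (hypothetical helper, not the actual Argilla validation code):

def validate_label(label: str) -> str:
    # Reject empty or whitespace-only labels before they are created
    cleaned = label.strip()
    if not cleaned:
        raise ValueError("Label must contain at least one non-whitespace character")
    return cleaned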

[0.1.1] Problem with KGLab Tutorial

I'm getting this error when running the third tutorial for tomorrow's release, wondering if you know what's going on @dvsrepo .

import torch
from torch_geometric.data import Data

# Each edge_list entry is (source, target, relation type); transpose to a (3, num_edges) tensor
tensor = torch.tensor(edge_list, dtype=torch.long).t().contiguous()
edge_index, edge_type = tensor[:2], tensor[2]
data = Data(edge_index=edge_index)
data.edge_type = edge_type

The error is:

OSError: dlopen(/Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so, 6): Symbol not found: __ZNK2at6Tensor6deviceEv
Referenced from: /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so
Expected in: /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch/lib/libtorch_cpu.dylib
in /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-14-844cb8b9dd75> in <module>
      1 import torch
----> 2 from torch_geometric.data import Data
      3 
      4 tensor = torch.tensor(edge_list, dtype=torch.long).t().contiguous()
      5 edge_index, edge_type = tensor[:2], tensor[2]

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_geometric/__init__.py in <module>
      3 
      4 from .debug import is_debug_enabled, debug, set_debug
----> 5 import torch_geometric.data
      6 import torch_geometric.transforms
      7 import torch_geometric.utils

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_geometric/data/__init__.py in <module>
----> 1 from .data import Data
      2 from .temporal import TemporalData
      3 from .batch import Batch
      4 from .dataset import Dataset
      5 from .in_memory_dataset import InMemoryDataset

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_geometric/data/data.py in <module>
      6 import torch
      7 import torch_geometric
----> 8 from torch_sparse import coalesce, SparseTensor
      9 from torch_geometric.utils import (contains_isolated_nodes,
     10                                    contains_self_loops, is_undirected)

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/__init__.py in <module>
     13 ]:
     14     torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
---> 15         f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
     16 
     17 if torch.cuda.is_available():  # pragma: no cover

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch/_ops.py in load_library(self, path)
    102             # static (global) initialization code in order to register custom
    103             # operators with the JIT.
--> 104             ctypes.CDLL(path)
    105         self.loaded_libraries.add(path)
    106 

~/anaconda3/envs/rubrix0.1.1/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    362 
    363         if handle is None:
--> 364             self._handle = _dlopen(self._name, mode)
    365         else:
    366             self._handle = handle

OSError: dlopen(/Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so, 6): Symbol not found: __ZNK2at6Tensor6deviceEv
  Referenced from: /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so
  Expected in: /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch/lib/libtorch_cpu.dylib
 in /Users/ignaciotalaveracepeda/anaconda3/envs/rubrix0.1.1/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so

Add label Feature

  • Integrate new user flow
    When you get started, let's talk to validate it

  • Display record groups like blogs

  • Align the creation of labels and entities

URL not found for images

When clicking on the images in the docs, you land on a not-found URL. There are several solutions:

  1. Remove the ability to click on the image.
  2. Maybe there's a sphinx-contrib extension for enlarging/full-screening the image.
  3. Fix the path.

I don't know which one is easier; option 2 is probably the most difficult, I'll let you decide. In general I would avoid sending users to other URLs and out of the docs, even if it's our own image.

New UI Features

Here are a few features for the UI that would be awesome to have :)

  • Apply annotations to all records of a given filter combination. Right now I can only select all records shown on screen and annotate them at once, but I cannot select all records of the given filter combination.
  • It would be nice to be able to load more than 5 records at once. Sometimes I need to go through a given filter combination rapidly to reach a certain point, and it is a bit cumbersome to press "Next 5 records" x times.
  • In general it would be a nice feature to be able to "mark" records (maybe with a color or something like that). In my use case I had to check all negatively annotated samples and reannotate them if necessary. Right now there is no way to "save" this progress, i.e. I could not mark a record as "I already checked you". Another use case was that we wanted to "mark" some phone calls we found via the UI that had a really bad transcription. We used the "discard" feature for this, but that is not what it was meant for.
  • Allow labels to be dynamically added/changed/removed. Right now we have to do this outside of the UI, by loading/logging the dataset again. Use case: while annotating, I see that the labels are not really suitable, and I want to add/change/remove some.

Streamlit app section in docs

Add the Streamlit app section to the documentation. The Streamlit app is already developed.

How shall I proceed @dvsrepo? Make a quick .rst file just showing how to run the app (as all the explanations are in the code), or explain the app/what it shows in the .rst?

Reference Section

Including a Reference section. For now it can have links to the Python API docs, and we can think of more elements to add. @dvsrepo told me that you were working on the API reference, @dcfidalgo, how is it going?

UI fine-tuning styles

  • H1: Futura, 36pt
  • H2: Futura, 28pt
  • Text: O.S., 16px
  • Menu: O.S., 14px
  • Chapter titles in the menu are not clickable (e.g. Getting started): Futura, soft blue #4C4EA3, 22pt
  • Code snippet: same grey border; text color (pink): #F2067A

SVG images with white font

While implementing the new readme and adding one of the amazing SVG images we use to explain Rubrix, I came across this as a dark-mode GitHub user:

Screenshot 2021-05-19 at 12 08 46

Is it possible to give it a png-like background, or do we stick to the light-mode approach?

[UI] Provide dataset properties for component simplification

UI components that already receive a dataset as an input param should not require other params that can easily be computed from the dataset.

For example:

<GlobalActions :annotationEnabled="annotationEnabled" :dataset="dataset" />

The GlobalActions component receives two params, dataset and annotationEnabled, but the second can easily be computed from the dataset:

dataset.viewSettings.annotationEnabled

Also, we could consider creating getters for common computed properties:

class Dataset {
  get annotationEnabled() {
    return this.viewSettings.annotationEnabled;
  }
}

Readme update

Hi! I think we've moved far enough in the documentation process to update our beautiful README. In my opinion, two main tasks have to be addressed:

  • Decide whether to change the current content of the readme (many of these things are now in separate documentation sections).
  • Complete what's left (some code blocks are placeholders, for example).

So we can discuss here how to approach this. My proposal:

  1. The first paragraph and quick links stay, but the quick links could maybe move to the bottom of the readme.
  2. Use cases look cool there, but I think the design principles are information overload. We could move them to an existing doc page (Introduction feels right to me).
  3. Main concepts now have their own place. We could keep a copy here too; I'm open to anything here.
  4. Get started could be reduced a bit and placed before main concepts, referencing the full instructions doc page. I see this get-started as "I want to test Rubrix in less than a minute", and the docs get-started as "that was cool enough that I even made it to the documentation".
  5. Supported and planned tasks feel right here: we give a glance at what we have done and what we are working on from the readme.

Tell me what you think 🤠

Rule to show green/red/no icon

  • Check that the implemented logic corresponds to this behaviour:

Gold same as annotation: green
Gold different from annotation: red
No annotation or no gold: no icon
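
A minimal sketch of that rule (hypothetical helper name, not the actual UI code):

def record_icon(gold, annotation):
    # No icon unless both a gold label and an annotation exist
    if gold is None or annotation is None:
        return None
    return "green" if gold == annotation else "red"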

Text for login box

Replace the "Annotation tool" short text with "Track and iterate on data for AI".

[UI] Refresh button drops the pagination status

When a refresh is applied in exploration/annotation mode, the pagination status is lost.

With the current pagination style, the solution is not simple:

  • Do we need to load everything before the current "page"?

A classic pagination style (with pages) could mitigate these problems. What do you think @Amelie-V and @leiyre?

Rubrix Cheatsheet/Cookbook

The idea is to cover the interaction between the main NLP libraries. So far, I've found these to include:

  • HuggingFace
  • spaCy
  • Flair
  • Stanza

If we find any others, they should be quick to add.

Regarding the name: I've found several examples of "cheatsheet" but none of "cookbook". The latter has more personality and makes more sense to me (explaining how to combine other services with our app is a bit like mixing ingredients), but "cheatsheet" is by far the standard.

One very cool example is the Streamlit docs. I wondered whether it was an ipynb, but it is a standalone Streamlit app that they run in a separate repo and host on their premium hosting service. I don't know if something similar can be done from our starting point (a notebook), but it would be cool. Other approaches, like Read the Docs', are PNGs. Maybe not difficult to replicate with our current style, but not very maintainable. So for now, a Jupyter notebook is a cool start, and maybe a destination (?).

Document elasticsearch fields for Elasticsearch DSL and Kibana

Add this somewhere in the docs (probably in the Rubrix UI guide):

// Commons search fields
annotated_as
annotated_by
event_timestamp
id
last_updated
metadata.*
multi_label
predicted
predicted_as
predicted_by
status
words
// Text classification
inputs.*
score
// Token classification
tokens
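
For reference, these fields can be queried with the Python Elasticsearch client; a minimal sketch, assuming a local Elasticsearch and a hypothetical index name:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
# Find records annotated by a given user (index name is hypothetical)
resp = es.search(
    index="rubrix.my-dataset",
    body={"query": {"term": {"annotated_by": "recognai"}}},
)
print(resp["hits"]["total"])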

Style font records

  • Swap the styles of the "type record" and the "record" back to how they were before. Sorry for the back-and-forth

[docs] Quick docs review

  • Simplify URL and folder structure by removing documentation from the path
  • Home page index.rst should be what's now Introduction to Rubrix. Ideally, this page will still be the first item of the Getting started block (I've seen this in other docs) but if it's not easy to do, no problem.
  • Rename Introduction to Rubrix -> Rubrix docs
  • Add right-side table of contents (toc) to each page, with a header like "On this page".
  • Merge Concepts and Methods into one section called Concepts (I think it's a heading issue).
  • Try to include the video on the home page with autoplay and using either html5 or a sphinx-contrib package.
  • Complete the home page (what's now Intro to Rubrix)
  • Complete missing things in Concepts (keep it as simple as possible).
  • Add Community section, with direct links to github and github discussion forum.

Misleading error in rb.init()

Let's avoid:

raise Exception("Unidentified error, it should not get here.")

Instead, show info about the error response (status, body, etc.).
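
A minimal sketch of the suggested behaviour (hypothetical helper, not the actual client code):

def check_response(response):
    # Surface the status code and body instead of an opaque "unidentified" exception
    if response.status_code >= 400:
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text}"
        )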

Allow copying UI-created labels

When I create new labels in the UI and do not annotate anything with these new labels, they will not be copied when doing a rubrix.load & rubrix.log.

It would be nice to be able to completely copy a dataset and its "state".

422 Error not showing correctly

{'detail': ['1 validation error for CreationDatasetRequest\nname\n  string does not match regex "^(?!-|_)[a-z0-9-_]+$" (type=value_error.str.regex; pattern=^(?!-|_)[a-z0-9-_]+$)']}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-a00c59b53ee8> in <module>
     26 
     27 # Logging into Rubrix
---> 28 rb.log(records=[record], name="zeroshot-NER")

~/Documents/RecognAI/rubrix/src/rubrix/__init__.py in log(records, name, tags, metadata, chunk_size)
    112         records = [records]
    113 
--> 114     return _client_instance().log(
    115         records=records, name=name, tags=tags, metadata=metadata, chunk_size=chunk_size
    116     )

~/Documents/RecognAI/rubrix/src/rubrix/client/__init__.py in log(self, records, name, tags, metadata, chunk_size)
    166             chunk = records[i : i + chunk_size]
    167 
--> 168             response = bulk_records_function(
    169                 client=self._client,
    170                 name=name,

~/Documents/RecognAI/rubrix/src/rubrix/sdk/api/token_classification/bulk_records.py in sync_detailed(client, name, json_body)
     85     )
     86 
---> 87     return _build_response(response=response)
     88 
     89 

~/Documents/RecognAI/rubrix/src/rubrix/sdk/api/token_classification/bulk_records.py in _build_response(response)
     65         content=response.content,
     66         headers=response.headers,
---> 67         parsed=_parse_response(response=response),
     68     )
     69 

~/Documents/RecognAI/rubrix/src/rubrix/sdk/api/token_classification/bulk_records.py in _parse_response(response)
     52     if response.status_code == 422:
     53         print(response.json())
---> 54         response_422 = HTTPValidationError.from_dict(response.json())
     55 
     56         return response_422

~/Documents/RecognAI/rubrix/src/rubrix/sdk/models/http_validation_error.py in from_dict(cls, src_dict)
     39         _detail = d.pop("detail", UNSET)
     40         for detail_item_data in _detail or []:
---> 41             detail_item = ValidationError.from_dict(detail_item_data)
     42 
     43             detail.append(detail_item)

~/Documents/RecognAI/rubrix/src/rubrix/sdk/models/validation_error.py in from_dict(cls, src_dict)
     35     @classmethod
     36     def from_dict(cls: Type[T], src_dict: Dict[str, Any]) -> T:
---> 37         d = src_dict.copy()
     38         loc = cast(List[str], d.pop("loc"))
     39 

AttributeError: 'str' object has no attribute 'copy'

The problem comes from the response for the regex validation error: its detail items are plain strings, so ValidationError.from_dict fails when it calls .copy() on them.
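
A minimal sketch of a defensive fix (hypothetical helper; the real fix may differ):

def parse_detail_item(item):
    # 422 "detail" entries can be plain strings instead of dicts,
    # so normalize before handing them to ValidationError.from_dict
    if isinstance(item, str):
        return {"loc": [], "msg": item, "type": "value_error"}
    return item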

Review readme

  • Make sure links to docs point to /stable/ and not latest.
  • Align setup and installation with the setup and installation guides in the docs.

docker-compose issues

When running the docker-compose file in the readme, the rubrix_1 service fails with the following exception stack trace:

[2021-05-04 16:23:21 +0000] [40] [ERROR] Traceback (most recent call last):
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 170, in _new_conn
rubrix_1         |     (self._dns_host, self.port), self.timeout, **extra_kw
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 96, in create_connection
rubrix_1         |     raise err
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 86, in create_connection
rubrix_1         |     sock.connect(sa)
rubrix_1         | ConnectionRefusedError: [Errno 111] Connection refused
rubrix_1         | 
rubrix_1         | During handling of the above exception, another exception occurred:
rubrix_1         | 
rubrix_1         | Traceback (most recent call last):
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 252, in perform_request
rubrix_1         |     method, url, body, retries=Retry(False), headers=request_headers, **kw
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
rubrix_1         |     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 507, in increment
rubrix_1         |     raise six.reraise(type(error), error, _stacktrace)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
rubrix_1         |     raise value
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
rubrix_1         |     chunked=chunked,
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
rubrix_1         |     conn.request(method, url, **httplib_request_kw)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 234, in request
rubrix_1         |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
rubrix_1         |   File "/usr/local/lib/python3.7/http/client.py", line 1252, in request
rubrix_1         |     self._send_request(method, url, body, headers, encode_chunked)
rubrix_1         |   File "/usr/local/lib/python3.7/http/client.py", line 1298, in _send_request
rubrix_1         |     self.endheaders(body, encode_chunked=encode_chunked)
rubrix_1         |   File "/usr/local/lib/python3.7/http/client.py", line 1247, in endheaders
rubrix_1         |     self._send_output(message_body, encode_chunked=encode_chunked)
rubrix_1         |   File "/usr/local/lib/python3.7/http/client.py", line 1026, in _send_output
rubrix_1         |     self.send(msg)
rubrix_1         |   File "/usr/local/lib/python3.7/http/client.py", line 966, in send
rubrix_1         |     self.connect()
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 200, in connect
rubrix_1         |     conn = self._new_conn()
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 182, in _new_conn
rubrix_1         |     self, "Failed to establish a new connection: %s" % e
rubrix_1         | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fd72c079b50>: Failed to establish a new connection: [Errno 111] Connection refused
rubrix_1         | 
rubrix_1         | During handling of the above exception, another exception occurred:
rubrix_1         | 
rubrix_1         | Traceback (most recent call last):
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 526, in lifespan
rubrix_1         |     async for item in self.lifespan_context(app):
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 467, in default_lifespan
rubrix_1         |     await self.startup()
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 502, in startup
rubrix_1         |     await handler()
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/rubrix/server/server.py", line 71, in configure_elasticsearch
rubrix_1         |     datasets: DatasetsDAO = create_datasets_dao(es=es_wrapper)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/rubrix/server/datasets/dao.py", line 271, in create_datasets_dao
rubrix_1         |     _instance = DatasetsDAO(es)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/rubrix/server/datasets/dao.py", line 66, in __init__
rubrix_1         |     self.init()
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/rubrix/server/datasets/dao.py", line 73, in init
rubrix_1         |     force_recreate=True,
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/rubrix/server/commons/es_wrapper.py", line 140, in create_index_template
rubrix_1         |     self.__client__.indices.put_template(name=name, body=template)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 153, in _wrapped
rubrix_1         |     return func(*args, params=params, headers=headers, **kwargs)
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 647, in put_template
rubrix_1         |     body=body,
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 413, in perform_request
rubrix_1         |     raise e
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 388, in perform_request
rubrix_1         |     timeout=timeout,
rubrix_1         |   File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 264, in perform_request
rubrix_1         |     raise ConnectionError("N/A", str(e), e)
rubrix_1         | elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fd72c079b50>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7fd72c079b50>: Failed to establish a new connection: [Errno 111] Connection refused)
rubrix_1         | 
rubrix_1         | [2021-05-04 16:23:21 +0000] [40] [ERROR] Application startup failed. Exiting.
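
One likely cause is the server starting before Elasticsearch accepts connections; a minimal wait-and-retry sketch (an assumption, not the project's actual startup code):

import time

from elasticsearch import Elasticsearch

def wait_for_elasticsearch(url: str, timeout: int = 60) -> Elasticsearch:
    # Poll the cluster until it responds or the timeout expires
    es = Elasticsearch(url)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if es.ping():
            return es
        time.sleep(2)
    raise TimeoutError(f"Elasticsearch at {url} did not come up within {timeout}s")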

Move Analytics sidebar button

On small browsers, move the button so it does not hide the "create label" feature. Show it below, aligned with the first record line.

Review concepts section

  • Fix any missing links (e.g. [link a las tasks] or issues with references to guides)
  • Include a brief description of metadata
  • Remove docstrings from methods and add a mention of the Python Client API reference
  • Include a short introduction paragraph after the main title: "In this section, we will ...". Then we can have two subsections, Rubrix data model (now called Main Concepts) or just Data model, and then Methods. In general we should avoid having an H1 with no text before the first H2; even short explanations of the content of a page will greatly help readers.
  • Do a quick proofreading pass (there's a typo with POS tagging)
  • I will provide you with updated figures to improve readability.

Thanks, very good job in general 👍
