
lajavaness / annotto


Annotto is the only go-to annotation tool to successfully annotate your documents at scale

License: Apache License 2.0

JavaScript 67.91% Shell 0.06% Dockerfile 0.09% TypeScript 31.29% HTML 0.05% CSS 0.59%
ai ia annotate annotation annotation-tool

annotto's People

Contributors

alexandredljn, manodupont, olivierljn, semantic-release-bot

annotto's Issues

Add an option of local file system access for image projects

Is your feature request related to a problem? Please describe.
It is currently impossible to use Annotto for images without access to S3.

Describe the solution you'd like
Add an option for image projects that allows files to be fetched from local storage.
E.g. create a Docker volume containing the images to use; item.data.url would then contain the path to an image within this volume.
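
As a rough illustration of the idea, the sketch below maps an item.data.url value to a file inside a mounted volume while refusing paths that escape it. The IMAGE_ROOT mount point, the file:// prefix and the function name are all assumptions made for this example; none of them exist in Annotto today.

```typescript
// Hypothetical mount point of the Docker volume that holds the images.
const IMAGE_ROOT = "/data/images";

// Hypothetical helper: turns item.data.url into a path under the volume root,
// or returns null when the path is empty or tries to leave the volume.
function resolveLocalImagePath(url: string, root: string = IMAGE_ROOT): string | null {
  // Accept either a bare relative path or a file:// style reference.
  const prefix = "file://";
  const relative = url.indexOf(prefix) === 0 ? url.slice(prefix.length) : url;

  const segments: string[] = [];
  for (const part of relative.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and current-dir segments
    if (part === "..") {
      // Refuse to climb above the volume root.
      if (segments.length === 0) return null;
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  if (segments.length === 0) return null; // nothing left to serve
  return root + "/" + segments.join("/");
}
```

With this sketch, "cats/1.png" resolves to a file under the volume, while "../etc/passwd" is rejected.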

Keycloak roles changes are not correctly reflected in Annotto

Describe the bug
When a user has already been registered in Annotto, role changes made in Keycloak do not take effect.

To Reproduce
Steps to reproduce the behavior:

  1. Login using your user with a role "admin".
  2. Refresh home page of Annotto and you should see all projects
  3. Go in Keycloak and change role mapping to "user"
  4. Refresh home page in Annotto and you should still see the same results.

Expected behavior
The role persisted in Annotto's MongoDB should reflect your current role in Keycloak.
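
A minimal sketch of the expected behavior, assuming the stored user is refreshed at each login from the roles carried by the Keycloak token. The StoredUser shape and the syncRole function are hypothetical and are not taken from Annotto's code; only the "admin" and "user" role names come from the steps above.

```typescript
type Role = "admin" | "user";

// Hypothetical shape of the user document persisted by the application.
interface StoredUser {
  id: string;
  role: Role;
}

// Returns the record to persist after a login, preferring the role carried
// by the identity provider over the one already stored.
function syncRole(stored: StoredUser, tokenRoles: string[]): StoredUser {
  const roleFromToken: Role = tokenRoles.indexOf("admin") !== -1 ? "admin" : "user";
  if (stored.role === roleFromToken) return stored; // nothing to update
  return { id: stored.id, role: roleFromToken };
}
```

In this sketch, a user stored as "admin" whose token now only carries "user" would be downgraded on the next login.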

Desktop (please complete the following information):

  • Version: 1.0.17

Add HTML datatype to the project and to its items

Add the html datatype to the project as an available type, as well as to the items.

There should be no impact on the labeling types.

You should use the sanitize function of shared/utils/htmlUtils in order to clean up the HTML code and safely insert external content into the annotation page.

Currently the TextItemContainer component is used to display HTML content. This functionality should be removed from that component, and a new component should be created so that content is displayed as text or as HTML depending on the type of project.
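
The split between text and HTML display could be sketched as a small decision function. Everything here is illustrative: the RenderPlan type and planItemRendering are hypothetical names, and the sanitizer is passed in as a parameter because the actual signature of the sanitize function in shared/utils/htmlUtils is assumed (string in, string out), not verified.

```typescript
// Assumed signature of a sanitizer such as the one in shared/utils/htmlUtils.
type Sanitizer = (html: string) => string;

// What the annotation page should render for one item.
type RenderPlan =
  | { kind: "text"; text: string }
  | { kind: "html"; safeHtml: string };

// Chooses the display mode from the project type; HTML is always sanitized
// before it is handed to the page, plain text is passed through untouched.
function planItemRendering(
  projectType: "text" | "html",
  body: string,
  sanitize: Sanitizer
): RenderPlan {
  if (projectType === "html") {
    return { kind: "html", safeHtml: sanitize(body) };
  }
  return { kind: "text", text: body };
}
```

A dedicated HTML component would consume the "html" plan, leaving TextItemContainer responsible only for the "text" plan.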

Annotto container fails to start

Describe the bug
The Annotto container does not start correctly. It exits just after startup, with the exception listed below.

To Reproduce
Steps to reproduce the behavior:

  1. docker run -d --name annotto -p 3000:3000 ljnrepo/annotto:latest
  2. docker logs annotto
  3. The following error appears in the logs:
09:41:23,478 INFO  [org.keycloak.services] (ServerService Thread Pool -- 59) KC-SERVICES0031: Import of realm 'annotto' requested. Strategy: IGNORE_EXISTING
09:41:23,688 FATAL [org.keycloak.services] (ServerService Thread Pool -- 59) Error during startup: java.lang.NullPointerException
	at [email protected]//org.keycloak.services.managers.RealmManager.setupMasterAdminManagement(RealmManager.java:304)
	at [email protected]//org.keycloak.services.managers.RealmManager.importRealm(RealmManager.java:525)
	at [email protected]//org.keycloak.exportimport.util.ImportUtils.importRealm(ImportUtils.java:110)
	at [email protected]//org.keycloak.exportimport.dir.DirImportProvider$4.runExportImportTask(DirImportProvider.java:138)
	at [email protected]//org.keycloak.exportimport.util.ExportImportSessionTask.run(ExportImportSessionTask.java:35)
	at [email protected]//org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:250)
	at [email protected]//org.keycloak.exportimport.dir.DirImportProvider.importRealm(DirImportProvider.java:134)
	at [email protected]//org.keycloak.exportimport.ExportImportManager.runImport(ExportImportManager.java:90)
	at [email protected]//org.keycloak.services.resources.KeycloakApplication.bootstrap(KeycloakApplication.java:207)
	at [email protected]//org.keycloak.services.resources.KeycloakApplication$1.run(KeycloakApplication.java:136)
	at [email protected]//org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:250)
	at [email protected]//org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:128)
	at [email protected]//org.keycloak.provider.wildfly.WildflyPlatform.onStartup(WildflyPlatform.java:36)
	at [email protected]//org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:114)

Desktop (please complete the following information):

  • OS: Linux

Add management of overlaps on NER annotations

Description:
Currently, Annotto only allows for assigning a single tag to a word or group of words during annotation. However, there is a need to enhance Annotto's annotation capabilities by introducing the ability to handle overlaps in Named Entity Recognition (NER) annotations. This feature would enable users to assign multiple tags to a word, even if it is already part of a tagged group of words.

Implementation Details:

UI/UX: Update the user interface to support the viewing and management of overlapping annotations. This could involve incorporating visual cues or indicators to differentiate between different annotations on the same word.

Backend: No modification is expected on the backend, as the annotation management can remain the same. The existing data structure and storage mechanism can continue to handle annotations effectively.

Additional context:

Improved Annotation Flexibility: Enabling the assignment of multiple tags to a word provides users with increased flexibility in their annotations. This allows for more precise labeling of complex entities or multiple aspects within a single word.

Enhanced Training Data Quality: With the ability to assign overlapping tags, Annotto can produce higher-quality training datasets. This is particularly useful in cases where multiple entities coexist within the same text span.

Streamlined Annotation Process: By eliminating the need for manual workarounds or separate annotations for overlapping entities, this feature simplifies the annotation process, saving time and effort for users.

Compatibility: Ensure backward compatibility with existing annotated datasets and models, allowing for a smooth transition to the new overlapping annotation functionality.

The addition of overlap management to Annotto's NER annotations would significantly enhance its annotation capabilities, making it a more powerful tool for training machine learning models that require precise entity recognition.
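
To make the notion of overlap concrete, here is a small sketch that treats each annotation as a character span with a label. The NerAnnotation shape (half-open start/end offsets) is an assumption for illustration and is not Annotto's actual annotation schema.

```typescript
// Hypothetical annotation shape: [start, end) character offsets plus a tag.
interface NerAnnotation {
  start: number;
  end: number;
  label: string;
}

// True when two spans share at least one character.
function overlaps(a: NerAnnotation, b: NerAnnotation): boolean {
  return a.start < b.end && b.start < a.end;
}

// All labels that cover a given character offset; the UI could use this
// to decide which visual cues to stack on a word.
function labelsAt(annotations: NerAnnotation[], offset: number): string[] {
  return annotations
    .filter((a) => a.start <= offset && offset < a.end)
    .map((a) => a.label);
}

// Example: in "New York University", the whole phrase is an ORG
// and "New York" (offsets 0 to 8) is also a LOC.
const example: NerAnnotation[] = [
  { start: 0, end: 19, label: "ORG" },
  { start: 0, end: 8, label: "LOC" },
];
```

Here labelsAt(example, 4) yields both tags for "York", while an offset inside "University" yields only the ORG tag.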

Add the possibility to annotate time series datasets (for anomaly detection problems)

Time series are structured data:

t1 v1
t2 v2
...

The goal would be to select a span ti -> tj that would be labelled as an anomaly.

The span could cover the whole time series (from start to end); in that case, this would be a time series categorisation task.

There could be multiple time series at once (not sure how to handle this case).
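
One possible way to represent this, sketched under the assumption that timestamps are numeric and that a labelled span is inclusive on both ends. The Point and SpanLabel types and both function names are hypothetical; the multi-series case is deliberately left out.

```typescript
// One (t, v) row of the series.
interface Point {
  t: number;
  v: number;
}

// A labelled span ti -> tj, inclusive on both ends.
interface SpanLabel {
  from: number;
  to: number;
  label: string;
}

// Points that fall inside the labelled span.
function pointsInSpan(series: Point[], span: SpanLabel): Point[] {
  return series.filter((p) => p.t >= span.from && p.t <= span.to);
}

// True when the span covers the series from its first to its last timestamp,
// i.e. the annotation amounts to categorising the whole series.
function coversWholeSeries(series: Point[], span: SpanLabel): boolean {
  if (series.length === 0) return false;
  const times = series.map((p) => p.t);
  return span.from <= Math.min(...times) && span.to >= Math.max(...times);
}
```

A span limited to a few timestamps would be an anomaly annotation; a span for which coversWholeSeries returns true corresponds to the categorisation case.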

Format of Item file (JSONL file) as described in the docs terminates Annotto container

Describe the bug
When adding items to a (new or existing) project using the 'item file' structure given as an example in the 'Create projet' section of the documentation (https://lajavaness.github.io/annotto-docs/fr/docs/user-manual/create-project),
an error is returned to the user and the Annotto Docker container is then found in the Exited state.

Note that a possible fix is shared at the end of this issue.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://lajavaness.github.io/annotto-docs/docs/user-manual/create-project#mode-texte
  2. Click on the 'copy' button that appears when hovering over the JSONL code example (items.jsonline)
  3. Paste into a new UTF-8 text file and save it as 'items.jsonline'
  4. In Annotto, select an example project, click '...', then 'Administration', and open the 'Files' tab of the project
  5. Go to 'Item file (JSONL file)', click on 'Add', select the 'items.jsonline' file you have just saved, and click Open
  6. Click on blue 'Save' button on the top of the page
  7. See error : << An error has occurred
    Could not parse row {"datatype": "text","uuid": "e0870093-180d-46ac-9d...>>
  8. Click on page reload in your browser
  9. See the error 'The connection failed'
  10. On the server, execute the command that lists both stopped and running containers:
    docker ps -a
  11. See the 'Exited' status of the annotto container

Expected behavior
Annotto should add three items to the selected project.

Screenshots

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser Firefox
  • Version 102.10.0esr (64 bits)

Additional context
Server is on Docker 23.0.1 on Debian GNU/Linux 11 (bullseye) with Annotto release 1.2.7 (2023-03-30)

Files in JSONL format normally have no comma between lines; records are separated only by LF or CR+LF. The 'items.jsonline' example in the documentation, however, ends each line with a comma, which would explain why the import fails.

After removing the commas at the end of each JSON line (the last line had none) and keeping the LF or CR+LF separators, adding the item file in Annotto seems to work well.

One suggestion is to update the Annotto documentation by removing the comma separators between lines in the example(s)*.
*: It has not been tested yet, but the same problem could affect other jsonline examples, such as images.jsonline, which also contain comma separators that might need to be removed.

See also:
https://github.com/lajavaness/annotto-docs/blob/main/docs/user-manual/create-project.md
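
The behavior described above can be reproduced with a small line-by-line parser. This is only a sketch of a tolerant approach (collecting per-line errors instead of throwing); parseJsonl and its result shape are hypothetical and do not describe how Annotto actually imports items.

```typescript
interface JsonlError {
  line: number; // 1-based line number in the file
  message: string;
}

interface JsonlResult {
  items: unknown[];
  errors: JsonlError[];
}

// Parses JSONL content line by line. Each non-empty line must be a complete
// JSON value; a trailing comma makes JSON.parse throw, which is recorded as
// an error for that line instead of aborting the whole import.
function parseJsonl(content: string): JsonlResult {
  const items: unknown[] = [];
  const errors: JsonlError[] = [];
  content.split(/\r?\n/).forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // ignore blank lines
    try {
      items.push(JSON.parse(line));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push({ line: index + 1, message });
    }
  });
  return { items, errors };
}
```

With this sketch, a file whose lines end with commas reports one error per offending line, while the same file without the commas parses cleanly with either LF or CR+LF line endings.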

Modify the configuration management

The goal is to change the way the configuration is handled. At the moment a merge system based on NODE_ENV is in place, but it does not correctly take into account the possible fallback to environment variables. More precisely, if NODE_ENV=development and ENV_TEST_XXXXX=titi, and the development.ts configuration file contains
{ test: 'toto' }

but the parent config contains
{ test: process.env.ENV_TEST_XXXXX }

the final merged configuration will have test=toto.

Describe the solution you'd like
We should refactor this by removing all the ["development.ts", "production.ts", ...] files and keeping only the parent config.ts file, to remove any ambiguity. Alternatively, we could use an existing, proven solution like convict.
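
A minimal sketch of the single-file idea, in which the environment variable always wins and the hard-coded value is only a default. The buildConfig function and the Env type are assumptions for this example; the environment is passed in as a parameter so the precedence is easy to see and to test.

```typescript
// The environment is injected so the function stays pure;
// in the application it would be called with process.env.
type Env = Record<string, string | undefined>;

interface AppConfig {
  test: string;
}

// Single source of truth: each key reads its environment variable first
// and falls back to a default only when the variable is not set.
function buildConfig(env: Env): AppConfig {
  return {
    test: env["ENV_TEST_XXXXX"] ?? "toto",
  };
}
```

With ENV_TEST_XXXXX=titi this yields test=titi, and test=toto only when the variable is absent, which is the precedence the merge system described above fails to provide.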

500 errors when opening any item

Describe the bug
When I open a project and click on an item, I get the error shown in the screenshot, with this stack trace:

RangeError [ERR_HTTP_INVALID_STATUS_CODE]: Invalid status code: undefined
    at new NodeError (node:internal/errors:387:5)
    at ServerResponse.writeHead (node:_http_server:314:11)
    at ServerResponse.writeHead (/usr/src/app/node_modules/on-headers/index.js:44:26)
    at ServerResponse._implicitHeader (node:_http_server:305:8)
    at write_ (node:_http_outgoing:867:9)
    at ServerResponse.end (node:_http_outgoing:977:5)
    at ServerResponse.send (/usr/src/app/node_modules/express/lib/response.js:221:10)
    at ServerResponse.json (/usr/src/app/node_modules/express/lib/response.js:267:15)
    at errorHandlerMiddleware (/usr/src/app/dist/src/utils/error.js:28:22)
    at Layer.handle_error (/usr/src/app/node_modules/express/lib/router/layer.js:71:5)
    at trim_prefix (/usr/src/app/node_modules/express/lib/router/index.js:315:13)
    at /usr/src/app/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:335:12)
    at next (/usr/src/app/node_modules/express/lib/router/index.js:275:10)
    at Layer.handle_error (/usr/src/app/node_modules/express/lib/router/layer.js:67:12)
    at trim_prefix (/usr/src/app/node_modules/express/lib/router/index.js:315:13)

Expected behavior
The item should open correctly.

Screenshots
Screenshot 2023-02-01 at 16 34 28

Desktop (please complete the following information):

  • Version 1.0.13, 1.21.1
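
The stack trace in this issue shows writeHead rejecting an undefined status code while errorHandlerMiddleware sends its JSON response. Assuming the handler takes the status from the error object (an assumption, since the handler's source is not shown here), a defensive fallback could be sketched as follows. The AppError shape and resolveStatusCode are hypothetical names.

```typescript
// Hypothetical shape of an error reaching the handler; status may be missing.
interface AppError {
  status?: unknown;
  message?: string;
}

// Returns a status code that is safe to send: the error's own status when it
// is an integer in the 100-599 range, and 500 in every other case.
function resolveStatusCode(err: AppError): number {
  const status = err.status;
  if (
    typeof status === "number" &&
    Number.isInteger(status) &&
    status >= 100 &&
    status <= 599
  ) {
    return status;
  }
  return 500;
}
```

An error handler would then use the resolved value when setting the response status, so that an error without a status still produces a regular 500 response instead of a RangeError.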

Upgrade to Keycloak 22

We would like to upgrade Keycloak to the latest version: 22.0.0

Keycloak 22 offers:

  1. A new bundled release with Quarkus, which improves loading time
  2. Optimizations to the security and realm settings
  3. A complete UI refactoring

All in one docker ljnrepo/annotto 1.2.6 is not working as expected

Bug description
We retrieve and run the latest ljnrepo/annotto image as specified:

docker run --rm -d --name annotto -p 3000:3000 ljnrepo/annotto:latest

We can log in through http://localhost:3000/ and we get the following home page listing existing projects:

[screenshot: home page listing existing projects]

Accessing any project by clicking on its name or on the related Annotate link leads to a never-ending loader view with an "An error occurred" notification.

To Reproduce
Steps to reproduce the behavior:

  1. Go to http://localhost:3000
  2. Sign in to the admin account, click on Sign In
  3. Click on DEMO Zone and Text : CV - Extraction (for instance, or any other project link to project/id/)
  4. See error

Expected behavior
We don't know yet what should happen here 😉

Screenshots
N/A see previous sections

Desktop (please complete the following information):

  • OS: GNU/Linux Debian 11
  • Browser: Firefox 112.0b4 and chrome 110.0.5481.177 among others tested
  • Version: N/A

Additional context
No particular suspicious logs from docker logs...

Did I miss something (at keycloak level or something)?

Images hosted on s3 don't load

Describe the bug
When annotating an image, the annotation page opens but the image doesn't load.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://annotto.lajavaness.com/project/63a1c552edb71dd1e908dc0e/annotation/63a1c552edb71d45b608dc21
  2. Click on any of the items to open the annotation screen.
  3. The "loading" animation appears but the image does not load.

Expected behavior
The image should load

Additional context
On this project, all images are hosted on the s3 bucket "s3://pollentrack", which is accessible with the rnd-data AWS profile (but not the default one). Special credentials have been created that can only see and access this specific bucket, and these have been loaded in the config file.
