Initial notes

Output value for the sample "Patient registration" screen as displayed on https://dbic-handbook.readthedocs.io/en/latest/mri/source/images/walkthrough-1/wt1-d.1.register.png

{
   "last_name": "SID000XXX",
   "first_name": "",
   "title": "",
   "patient_id": "SID000XXX",
   "date_of_birth": "1/1/1990",
   "sex": "other",
   "age": 28,
   "age_unit": "years",
   "height_ft": 5,
   "height_in": 4,
   "weight_lb": 103,
   "accession_no": "A000XXX",
   "request_id": "",
   "patient_position": "Head First - Supine",
   "performing_physician": "",
   "operator": ""
}

The output format might be different: e.g., instead of a single "definite answer" per field, it could include an array of the possible answers the OCR gives back, with a confidence for each.
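To make the "array of possible answers" idea concrete, here is a minimal sketch of what such a record could look like and how it could be collapsed back into the single-answer form. The layout (`value`/`confidence` keys), the helper names, and all confidence numbers are made up for illustration; nothing here is an agreed format.

```python
import json

# Hypothetical layout: every field keeps all OCR candidates with a confidence.
# The confidence values below are invented for illustration only.
record = {
    "patient_id": [
        {"value": "SID000XXX", "confidence": 0.91},
        {"value": "SIDOOOXXX", "confidence": 0.07},
    ],
    "weight_lb": [
        {"value": "103", "confidence": 0.88},
        {"value": "108", "confidence": 0.10},
    ],
}


def best_answers(rec):
    """Collapse candidate lists into the most confident value per field."""
    return {
        name: max(candidates, key=lambda c: c["confidence"])["value"]
        for name, candidates in rec.items()
    }


def ambiguous_fields(rec, margin=0.2):
    """List fields whose two best candidates are closer than `margin`."""
    flagged = []
    for name, candidates in rec.items():
        ranked = sorted((c["confidence"] for c in candidates), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] < margin:
            flagged.append(name)
    return flagged


print(json.dumps(best_answers(record), indent=2))
print("needs review:", ambiguous_fields(record))
```

Keeping the candidates around would let a later step decide per field whether to trust the top answer or to ask the user.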

Research existing solutions which might be used or integrated

https://github.com/con/noisseur#noisseur describes the goal of the project. Here we want to collect pointers and ideas on related projects.

Annotation of images for identification of the ones of interest

OCR

If none of the "complete" solutions has one built in

https://github.com/tesseract-ocr/tesseract
https://alternativeto.net/feature/screenshot-ocr/ reminded me about http://sikulix.com/ -- I believe it had some nice features to annotate a screenshot with what to OCR. https://github.com/RaiMan/SikuliX1/ has a fairly recent version. https://github.com/RaiMan/sikulix4python gives a way to use SikuliX from Python.
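If we end up calling tesseract directly, its TSV output (e.g. `tesseract screenshot.png stdout tsv`) reports a bounding box and a confidence per recognized word, which is the kind of information we would want in the output record. Below is a rough sketch of parsing that output with the standard library only. The sample rows are invented and only mimic the column layout; `words_with_confidence` is a hypothetical helper name, not part of any existing code.

```python
import csv
import io

# Invented sample rows in the column layout of tesseract's TSV output.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t120\t60\t90\t14\t91.5\tSID000XXX\n"
    "5\t1\t1\t1\t2\t1\t120\t90\t70\t14\t62.0\tA000XXX\n"
)


def words_with_confidence(tsv_text, min_conf=0.0):
    """Return (text, confidence, (left, top, width, height)) per word row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        conf = float(row["conf"])
        text = (row["text"] or "").strip()
        # Rows for pages/blocks/lines carry conf == -1 and no text: skip them.
        if conf < 0 or not text or conf < min_conf:
            continue
        box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
        words.append((text, conf, box))
    return words


for text, conf, box in words_with_confidence(SAMPLE_TSV):
    print(f"{text!r} conf={conf} box={box}")
```

The bounding boxes could then be matched against annotated "data entry" regions, and low-confidence words flagged for review.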

Some answers opengpt gave

What software exist to establish automated QC of data entered on GUI

There are a variety of software tools that can be used to establish automated quality control (QC) of data entered through a graphical user interface (GUI). These tools can help ensure that the data being entered is accurate, consistent, and complete. Some examples of software tools that can be used for automated QC of data entered through a GUI include:

Data validation software: This type of software can be used to check the data being entered for accuracy and completeness. It can validate that the data meets certain criteria, such as being in the correct format or falling within a certain range.
Data cleansing software: This type of software can be used to identify and correct errors or inconsistencies in the data being entered. It can help ensure that the data is clean and accurate before it is used for analysis or other purposes.
Data quality tools: These tools can be used to assess the overall quality of the data being entered and identify any issues that need to be addressed. They can check for things like duplicates, missing values, and inconsistencies in the data.
Data entry software: There are also specialized software tools that are designed specifically for data entry. These tools can help streamline the data entry process and provide features such as auto-fill and data verification to help ensure that the data being entered is accurate.

Is there anything else you would like to know about establishing automated QC for data entered through a GUI?

yes, any specific open source software for that?

Yes, there are several open source software tools that can be used for automated quality control (QC) of data entered through a graphical user interface (GUI). Some examples include:

OpenRefine: This is a free, open source tool for data cleansing and data transformation. It can be used to identify and correct errors or inconsistencies in data, and it has a user-friendly interface that makes it easy to work with large datasets.
Talend: This is a powerful data integration and data management platform that includes a range of tools for data cleansing, data quality, and data governance. It is open source and available for free download.
Apache Nutch: This is an open source web crawler and data scraping tool that can be used to extract data from websites and other online sources. It can be customized to suit the specific needs of the user, and it has a range of features for data cleansing and quality control.
OpenDataKit (ODK): This is an open source data collection platform that can be used to design and deploy data collection forms, and it includes tools for data quality control and validation. It is widely used in research and development projects, and it is available for free download.

Is there anything else you would like to know about open source software for automated QC of data entered through a GUI?

is any of those can do OCR on screenshots?

Yes, some of the open source software tools that can be used for automated quality control (QC) of data entered through a graphical user interface (GUI) also have the ability to perform optical character recognition (OCR) on screenshots. OCR is a technology that allows a computer to recognize and extract text from images and scanned documents.

One example of an open source tool that can perform OCR on screenshots is Tesseract. Tesseract is a widely used OCR engine that is available for free and can be integrated into other software applications. It is highly accurate and can recognize text in a variety of languages.

Another example is OCRopus, which is an open source OCR system that is designed for scientific documents and includes a range of tools for layout analysis and data extraction. It can be used to extract text from screenshots and other images.

Is there anything else you would like to know about open source software tools that can perform OCR on screenshots?

was sikuli used by anyone for such a purpose?

Sikuli is a software tool that can be used to automate graphical user interface (GUI) interactions using image recognition. It was developed at the University of Colorado and is available as open source software.

Sikuli can be used to perform tasks such as clicking on buttons, entering text into fields, and interacting with elements on a GUI. It works by taking a screenshot of the GUI and searching for specific images within the screenshot to identify the elements it needs to interact with.

Sikuli can potentially be used for automated quality control (QC) of data entered through a GUI by taking screenshots of the data entry form and using image recognition to verify that the data being entered is accurate and complete. However, it may not be the most efficient or reliable tool for this purpose, as it relies on image recognition and may be affected by changes in the appearance of the GUI or the data being entered.

Is there anything else you would like to know about using Sikuli for automated QC of data entered through a GUI?

is there software like sikuli but more actively developed and ideally in Python?

Yes, there are several software tools that are similar to Sikuli and are more actively developed and available in Python. Sikuli is a software tool that was developed to automate graphical user interface (GUI) interactions using image recognition, and it was originally implemented in Java. Python is a popular programming language that is widely used in a variety of applications, including data analysis and automation.

One example of a software tool that is similar to Sikuli and is actively developed and available in Python is PyAutoGUI. PyAutoGUI is a Python library that allows users to automate GUI interactions using a variety of methods, including image recognition. It can be used to perform tasks such as clicking on buttons, entering text into fields, and interacting with elements on a GUI.

Another example is Pywinauto, which is a Python library for automating Windows GUI applications. It includes a range of tools for interacting with GUI elements, including image recognition, and it can be used to automate a wide range of tasks on Windows systems.

From those notes

https://github.com/ocropus/ocropy looks interesting, but its last activity seems to be in 2020... there is the more recently developed https://github.com/ocropus/ocropus4/
https://github.com/asweigart/pyautogui - last activity in 2021; it seems to have relevant functions to take a screenshot and to locate parts of it that are similar to known images. So we could build on similar principles/functionality.

Do you know any GUI software (ideally written in Python) to annotate an image or screenshot with regions of interest to be automatically detected/processed?

Yes, there are several graphical user interface (GUI) software tools that can be used to annotate images or screenshots with regions of interest (ROIs) for automatic detection or processing. These tools typically allow users to draw or select regions on the image, and they may include additional features such as labeling or classification of the ROIs. Some examples of GUI software tools for annotating images or screenshots with ROIs, written in Python, include:

Labelbox (https://www.labelbox.com/): This is a cloud-based platform for creating and managing data sets for machine learning applications. It includes a user-friendly GUI for annotating images and other types of data, and it allows users to draw or select ROIs and label them with custom tags.
LabelImg (https://github.com/tzutalin/labelImg): This is an open source graphical annotation tool that can be used to label images for machine learning applications. It is written in Python and includes a GUI for drawing or selecting ROIs and labeling them with custom tags.
VGG Image Annotator (VIA) (http://www.robots.ox.ac.uk/~vgg/software/via/): This is an open source tool for annotating images and other types of data. It is written in Python and includes a GUI for drawing or selecting ROIs and labeling them with custom tags.

note to above

https://github.com/tzutalin/labelImg points to being a part of https://github.com/heartexlabs/label-studio and both sound like a great fit for our need. label-studio even talks about being able to annotate parts for OCR! Concern: it seems to be "large" and company/startup driven. Not yet sure what their profit model is and whether we could rely on it going forward.
https://gitlab.com/vgg/via - seems to be more truly an opensource project, but concentrates solely on labeling... points to w3c standard for annotations: https://www.w3.org/TR/annotation-model/

With either of those two it should actually be quite doable to create the desired prototype:

use labeler on the set of screenshots to label for
- dialog detection features (to identify the screen or modal window based on them)
- data entry items -- annotate for either "fixed placement" or "part of screenshot" and associated "data entry field"
for a "detected dialog" run OCR on the "data entry items" and place them into json output record
- as we figured out on a sample with tesseract, we should be able to make OCR learn font specifics (e.g. how to tell 0 (zero) from O (capital O)).
process those output records and possibly inform the user about problems or just display entered information.
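The steps above could be sketched roughly as follows. Everything here is a hypothetical illustration: `DialogModel`, the box coordinates, the "required fields" check, and the `ocr` callable are stand-ins (the real OCR would crop the screenshot and run e.g. tesseract on each box), and the annotations would come from whichever labeling tool we pick.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # left, top, width, height in pixels
OCR = Callable[[Box], str]        # stand-in: OCR text found inside a box


@dataclass
class DialogModel:
    """Annotations for one dialog (hypothetical structure)."""
    name: str
    anchors: List[Tuple[Box, str]]  # fixed labels used to detect the dialog
    fields: Dict[str, Box]          # data entry items to OCR


def detect_dialog(models: List[DialogModel], ocr: OCR) -> Optional[DialogModel]:
    """Return the first model whose anchor texts all appear on the screen."""
    for model in models:
        if all(ocr(box).strip() == text for box, text in model.anchors):
            return model
    return None


def extract_record(model: DialogModel, ocr: OCR) -> Dict[str, str]:
    """OCR every annotated data entry item into a flat output record."""
    return {name: ocr(box).strip() for name, box in model.fields.items()}


def check_record(record: Dict[str, str], required: List[str]) -> List[str]:
    """Collect problems worth reporting to the user (here: empty fields)."""
    return [f"{name} is empty" for name in required if not record.get(name)]


# Fake "screen" for demonstration: box -> text the OCR would return.
fake_screen = {
    (10, 10, 200, 20): "Patient Registration",
    (120, 60, 150, 20): "SID000XXX",
    (120, 90, 150, 20): "",
}


def fake_ocr(box: Box) -> str:
    return fake_screen.get(box, "")


registration = DialogModel(
    name="patient_registration",
    anchors=[((10, 10, 200, 20), "Patient Registration")],
    fields={"patient_id": (120, 60, 150, 20), "accession_no": (120, 90, 150, 20)},
)

dialog = detect_dialog([registration], fake_ocr)
record = extract_record(dialog, fake_ocr) if dialog else {}
problems = check_record(record, required=["patient_id", "accession_no"])
print(dialog.name if dialog else None, record, problems)
```

Passing the OCR in as a callable keeps dialog detection and record checks testable without a real screenshot or OCR engine.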

Collect grabbed videos from the console

so we could later review/train

targeted videos: where we go and enter all kinds of data across interfaces of interest
blindly collect all user interactions for a while. I think it would be useful though -- it would provide us with a nice check/training dataset to work on.

We have a 2nd Magewell card, which should be usable

seek permission (I will take care of that)
if all good -- coordinate with the MRI tech and install the grabbing device for the MRI console (attn @andycon). Ideally it should be done on the same reproin-er, so the scripts should be adjusted to be able to associate a specific Magewell card with a specific "grabbing" project.

What a solution on top of the #4 prototype would need

#4 from @vmdocua delivers a good core recognizer "service" to work with, but needs more work on

making OCR of the font more reliable. So far we think it is "Clinica Pro", likely "Thin" or "Light", and rendered without anti-aliasing on the screen. Knowing the font, we could train the model to become more reliable.
- maybe we can find the fonts in the installation (dialogs and fonts could be placed as resources in .dll/.exe but also could be shipped as .ttf files -- maybe we could locate them)
radio and checkbox buttons handling (see the patient_registration model)
more custom (masks etc) recognition of dates, units (kg, cm) etc where we know more about the expected/forced format/values
custom recognition for icons (people, linkage symbols between scans, etc) as can be seen in the dot-cockpit-editor-list_1 model
performance (need to parallelize ocr_screen across the different models and choose the best model according to its confidence probability)
start adding tests (there are some test Blueprints)
add CI (mypy, testing, codespell, etc)
write a module to screengrab/give to recognizer
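For the performance item, one possible shape is to run all models concurrently and keep the result with the highest confidence. This is only a sketch under assumptions: I do not rely on the actual `ocr_screen` signature from #4, so `Recognition`, `recognize_best`, and the stub recognizers below are hypothetical stand-ins, and the confidence numbers are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple


class Recognition(NamedTuple):
    """Hypothetical result of running one model on a screenshot."""
    model: str
    confidence: float
    fields: Dict[str, str]


Recognizer = Callable[[bytes], Recognition]


def recognize_best(screenshot: bytes, recognizers: List[Recognizer]) -> Recognition:
    """Run every recognizer on the screenshot concurrently, keep the best."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        results = list(pool.map(lambda rec: rec(screenshot), recognizers))
    return max(results, key=lambda r: r.confidence)


def make_stub(model: str, confidence: float) -> Recognizer:
    """Stand-in for a per-model OCR call; returns a fixed confidence."""
    def recognizer(screenshot: bytes) -> Recognition:
        return Recognition(model, confidence, {})
    return recognizer


stubs = [
    make_stub("patient_registration", 0.93),
    make_stub("dot-cockpit-editor-list_1", 0.41),
]
best = recognize_best(b"", stubs)
print(best.model, best.confidence)
```

Threads should help if the OCR runs in a subprocess or otherwise releases the GIL; if it turns out to be CPU-bound pure Python, `ProcessPoolExecutor` from the same module would be the thing to try instead.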