OpenMonarch currently makes its API calls as HTTP requests to Replicate, which hosts the model on its own platform. This adds complexity for users, who need to generate an API key and are limited in the number of requests they can make.
In the long run, the model also needs continued training, as it currently struggles with more abstract images and with identifying details in images where the specific subject matters (e.g., celebrities in newspapers or explanatory diagrams).
Currently, OpenMonarch only detects images with a src attribute. This leaves out images that are presented through other elements, such as a div or an iframe. The set of targeted images should be broadened.
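One way to widen the scope, assuming that images shown through a div are usually CSS background images, is sketched below. The helper name extractBackgroundImageUrls is hypothetical and not part of OpenMonarch; it only parses a CSS background-image value and does not touch the page itself.

```typescript
// Hypothetical helper (not part of OpenMonarch): pull image URLs out of a
// CSS `background-image` value such as `url("a.png"), url(b.jpg)`.
function extractBackgroundImageUrls(backgroundImage: string): string[] {
  const urls: string[] = [];
  // Matches url(...) with the URL in double quotes, single quotes, or bare.
  const pattern = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(backgroundImage)) !== null) {
    // Exactly one of the three capture groups is set for each match.
    const url = match[1] ?? match[2] ?? match[3] ?? "";
    if (url !== "") {
      urls.push(url);
    }
  }
  return urls;
}
```

In a content script, the input would most likely come from getComputedStyle(element).backgroundImage. Images inside an iframe would need separate handling, since a cross-origin frame's document cannot be read from the parent page.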
Images in any format other than .png, .jpg, or .gif are also ignored (due to Replicate limitations). If the model is moved to run locally, there should be a way to convert as many image formats as possible into ones the model understands.
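As a first step toward conversion, the extension would need to decide which images fall outside the supported formats. The sketch below does this from the URL's file extension alone, which is only a heuristic (the real format depends on the file contents). The names imageExtension and needsConversion are hypothetical.

```typescript
// Formats listed above as accepted by the hosted model.
const SUPPORTED_EXTENSIONS: string[] = ["png", "jpg", "gif"];

// Hypothetical helper: return the lowercase file extension of an image URL,
// ignoring any query string or fragment, or "" if the file name has none.
function imageExtension(url: string): string {
  const path = url.replace(/[?#].*$/, "");
  const fileName = path.substring(path.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.substring(dot + 1).toLowerCase();
}

// Hypothetical helper: true when the extension is not in the supported list,
// meaning the image would have to be converted before captioning.
function needsConversion(url: string): boolean {
  return !SUPPORTED_EXTENSIONS.includes(imageExtension(url));
}
```

The conversion itself is not shown. One possible approach in the browser, untested here, would be to draw the image onto a canvas and export it as PNG.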
Currently, content-script presents each generated image description to the user as an alert (which is only semi-accessible to screen readers). It should also set the description as the image's alt text, which would reduce the number of API calls needed and streamline the captioning process.
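A minimal sketch of writing the description into the alt text follows. The helper applyDescription is hypothetical, and the choice to leave existing alt text untouched is an assumption rather than something OpenMonarch does today.

```typescript
// Minimal shape of the element this sketch needs; a real HTMLImageElement
// has an `alt` string property and therefore satisfies it.
interface HasAlt {
  alt: string;
}

// Hypothetical helper: store a generated description as the alt text.
// Assumption: alt text already provided by the page author is kept as-is.
// Returns true if the description was written, false if it was skipped.
function applyDescription(image: HasAlt, description: string): boolean {
  if (image.alt.trim() !== "") {
    return false; // keep the author's alt text
  }
  image.alt = description.trim();
  return true;
}
```

Because the alt text stays on the element, a later request for the same image on the same page could read it back instead of calling the API again.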
In the long run, descriptions could also be cached, so that images that have already been described don't need to go through the same process again.
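Such a cache could be as simple as a bounded in-memory map keyed by image URL. The sketch below is one possible shape, not an existing part of OpenMonarch: DescriptionCache, describeWithCache, and the default limit of 500 entries are all assumptions, and the describe callback stands in for whatever function requests a caption.

```typescript
// Hypothetical in-memory cache of descriptions, keyed by image URL.
class DescriptionCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 500) {
    this.maxEntries = maxEntries;
  }

  get(url: string): string | undefined {
    return this.entries.get(url);
  }

  set(url: string, description: string): void {
    // Delete first so a re-inserted key moves to the end of insertion order.
    this.entries.delete(url);
    this.entries.set(url, description);
    if (this.entries.size > this.maxEntries) {
      // Evict the entry that was inserted longest ago.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Return the cached description if there is one; otherwise call `describe`
// (a stand-in for the captioning request) and remember its result.
async function describeWithCache(
  url: string,
  cache: DescriptionCache,
  describe: (url: string) => Promise<string>,
): Promise<string> {
  const cached = cache.get(url);
  if (cached !== undefined) {
    return cached;
  }
  const description = await describe(url);
  cache.set(url, description);
  return description;
}
```

This cache lives only as long as the script that holds it. Keeping descriptions across browser sessions would need persistent storage, for example the extension storage API.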
The ["<all_urls>"] pattern in the matches field for content-script delays review in the Chrome Web Store and poses a security risk, since it gives the script access to every page the user visits. The activeTab permission can instead be used together with the scripting permission to inject the content script only when a command is invoked.
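The activeTab approach might look roughly like the sketch below, which would run in the extension's background service worker. The command name "describe-image" and the file name "content-script.js" are assumptions, and the small interfaces only mirror the parts of the Chrome APIs the sketch uses so that it type-checks on its own.

```typescript
// Minimal slices of the extension APIs used here, declared locally so the
// sketch does not depend on the Chrome type definitions.
interface TabLike {
  id?: number;
}
interface ScriptingLike {
  executeScript(injection: {
    target: { tabId: number };
    files: string[];
  }): Promise<unknown>;
}

// Hypothetical command name and script path; both are assumptions.
const DESCRIBE_COMMAND = "describe-image";
const CONTENT_SCRIPT_FILE = "content-script.js";

// Inject the content script into the tab where the command was triggered.
// Returns true if an injection was requested, false if the call was ignored.
async function handleCommand(
  command: string,
  tab: TabLike | undefined,
  scripting: ScriptingLike,
): Promise<boolean> {
  if (command !== DESCRIBE_COMMAND) {
    return false;
  }
  const tabId = tab?.id;
  if (tabId === undefined) {
    return false;
  }
  await scripting.executeScript({
    target: { tabId },
    files: [CONTENT_SCRIPT_FILE],
  });
  return true;
}

// Wiring in the service worker (left as a comment because `chrome` only
// exists inside an extension):
// chrome.commands.onCommand.addListener((command, tab) => {
//   void handleCommand(command, tab, chrome.scripting);
// });
```

For this to work, the manifest would list "activeTab" and "scripting" under permissions and declare the command under commands, and the content script would no longer need a matches entry covering every URL.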