
hltcoe / turkle


Django-based clone of Amazon's Mechanical Turk service running in your local environment.

Home Page: https://turkle.readthedocs.io

License: Other

Python 65.14% CSS 10.78% HTML 11.26% Shell 0.13% JavaScript 2.36% Less 10.33%
amazon-mechanical-turk annotation crowdsourcing hlt labeling mechanical-turk nlp

turkle's Introduction

Turkle

Run a clone of Amazon's Mechanical Turk service in your local environment.

Turkle is implemented as a Django-based web application that can be deployed on your local network or hosted on a public server. It is compatible with Human Intelligence Tasks (HITs) from Amazon Mechanical Turk. Turkle can use the same HTML Task template files and CSV files as the MTurk requester web GUI. The results of the Tasks completed by the workers can be exported to CSV files.

Turkle's features include:

  • Authentication support for Users
  • Projects can be restricted to Users who are members of a particular Group
  • Projects can be configured so that each Task needs to be completed by multiple Workers (redundant annotations)
  • An admin GUI for managing Users, Groups, Projects, and Batches of Tasks
  • Scripts to automate the creation of Users, Projects, and Batches of Tasks
  • Docker images using the SQLite and MySQL database backends

Full documentation is available at Read the Docs: https://turkle.readthedocs.io

turkle's People

Contributors

adalmia123, cash, charman, cjmay, derekbelrose, exploy, lukeorland, tomlippincott


turkle's Issues

show previously entered values when redisplaying a HIT form

Once a HIT has been annotated, it would be really, really great if the annotator could go back and look at their annotations and modify them if need be. Currently, the only way to change an annotation on a particular HIT is to completely redo the entire HIT. Also, it would be great to be able to pull back up an annotated HIT later to show someone else or to ask questions.

install fails without the wheel package installed

On a fresh Python 3.5 install with an older pip (version 8), installation fails. unicodecsv does not include a pre-built wheel on PyPI, so you get a failure like this:

Building wheels for collected packages: unicodecsv
Running setup.py bdist_wheel for unicodecsv ... error
Complete output from command /home/username/envs/turkle/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-build-fsr2y88g/unicodecsv/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" bdist_wheel -d /tmp/tmpv7oyp396pip-wheel- --python-tag cp35:
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: -c --help [cmd1 cmd2 ...]
or: -c --help-commands
or: -c cmd --help

error: invalid command 'bdist_wheel'


Failed building wheel for unicodecsv

pull out common code in unit tests into small library

There is a lot of boilerplate code in the unit tests that create projects, batches, and users before performing the tests. This can be pulled out into a utility module used by the tests.

Poster: Cash Costello id: 192

How to access worker's data from turkle folder?

I want to access the annotations done by workers through the turkle folder. This would be useful if the server has issues that prevent me from downloading the results. Is there any way to do so? Are results saved inside db.sqlite3?
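If the deployment uses the SQLite backend, one way to check what the database file holds is to open it directly and list its tables. The sketch below uses only the standard library; `list_tables` is a hypothetical helper, the file name `db.sqlite3` is taken from the question above, and no Turkle table names are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Hypothetical usage: point this at a copy of the database file.
# print(list_tables("db.sqlite3"))
```

Working on a copy of the file rather than the live database avoids interfering with a running server.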

Can view and submit a task that has been completed

  1. Create project and batch. 1 assignment per task.
  2. Complete the task
  3. Press the back button

You will see the task again and, I think, be able to submit it again. I assume it overwrites the previous submission. I haven't done much testing with this, but I could see annotators using the back button accidentally. You can also view tasks that have been completed with a direct URL like http://localhost:8000/turkle/task/5/assignment/7/

Poster: Cash Costello id: 130

Prevent Django version upgrade issues

This has now happened twice in a row, so it seems warranted to try to stop it. PR #25 freezes my current versions in the requirements file. I think this is a reasonable solution, especially given the volume of commits on the repo. N.B.: depends on #22.

parse csv input data files equivalently to Amazon Mechanical Turk.

E.g., if double quotes in the CSV surround a field that contains commas, those commas aren't used as field delimiters.

Here is the only Amazon documentation on the CSV input file, which does not define the CSV file format:
http://docs.aws.amazon.com/AWSMechTurk/latest/RequesterUI/PublishingYourBatchofHITs.html

Here's a proposed CSV standard:
https://tools.ietf.org/html/rfc4180

Here's another suggestion:
https://en.wikipedia.org/wiki/Comma-separated_values#Toward_standardization
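As a rough illustration of the quoting behavior described above, Python's standard `csv` module treats commas inside double-quoted fields as data and a doubled quote as a literal quote, in the style of RFC 4180. Whether this matches Amazon's parser in every edge case is not established here; `parse_csv` is a hypothetical helper name.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text into a list of rows, honoring double-quoted fields."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv('id,text\r\n1,"Hello, world"\r\n2,"She said ""hi"""\r\n')
# The comma inside the quoted field stays part of the field value.
print(rows[1])
```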

Adding a new user via script does not work

Adding a new user via the script

https://github.com/hltcoe/turkle/blob/master/scripts/add_user.py

does not work for us, even though the script says "Success" at the end.

$ python add_user.py -u bricksdont user-test-from-script password-test-from-script --server [server name]
Admin password:
Success

After that, the users overview in the GUI does not show this new user.

The exact same happens with import_users.py. We also tried logging into the machine the server is running on, then running the script with localhost.

What could be potential reasons for this?

Implement external questions

Related to #98

MTurk does not provide a UI for creating a HIT that is an External Question. You have to use their API for that.

The data structure for creation is described here: https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_ExternalQuestionArticle.html

ExternalURL looks to be the key parameter.

MTurk adds these query parameters to the URL: assignmentId, hitId, turkSubmitTo, and workerId
When previewing a HIT, assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE

The HIT is submitted to https://www.mturk.com/mturk/externalSubmit

It must be a POST and include assignmentId.
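A minimal sketch of how the query parameters listed above could be appended to an ExternalURL, using only the standard library. The function names and the example.org URL are hypothetical, and this is not taken from Turkle's or MTurk's code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Value the issue says is sent as assignmentId while a HIT is being previewed.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def build_external_url(external_url, assignment_id, hit_id, worker_id, turk_submit_to):
    """Append the four parameters to external_url, keeping any existing query."""
    parts = urlsplit(external_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("assignmentId", assignment_id),
        ("hitId", hit_id),
        ("turkSubmitTo", turk_submit_to),
        ("workerId", worker_id),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_preview(assignment_id):
    """Return True when the assignment id marks a preview rather than real work."""
    return assignment_id == PREVIEW_ASSIGNMENT_ID


url = build_external_url(
    "https://example.org/task?lang=en",  # hypothetical external page
    PREVIEW_ASSIGNMENT_ID, "hit-1", "worker-1", "https://www.mturk.com",
)
```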

Poster: Cash Costello id: 135

turkle using mysql does not handle 4 byte emojis

Original comment: https://gitlab.hltcoe.jhu.edu/research/turkle/issues/173#note_28038

Pasting 😃 into a text field for a project results in a MySQL error. Doing the same in a CSV or in a field on a task does not create an error. The error for the project template field is 1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x83' for column 'html_template' at row 1"

The emoji is saved as \ud83d\ude03 in the database if submitted in a form.
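The byte values in the error and the escaped form above can be reproduced in plain Python. MySQL's legacy utf8 character set (utf8mb3) stores at most three bytes per character, while utf8mb4 stores up to four. That would be consistent with the error, although the column's actual character set is not stated in this issue. `needs_utf8mb4` is a hypothetical helper name.

```python
emoji = "\U0001F603"  # the smiley pasted into the project template field

# UTF-8 encoding: the four bytes quoted in the MySQL error message.
utf8_bytes = emoji.encode("utf-8")

# UTF-16 encoding: the surrogate pair seen when the value is submitted in a form.
utf16_bytes = emoji.encode("utf-16-be")
surrogates = [utf16_bytes[i:i + 2].hex() for i in (0, 2)]


def needs_utf8mb4(text):
    """Return True if any character in text takes four bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


print(utf8_bytes, surrogates, needs_utf8mb4(emoji))
```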

Poster: Cash Costello id: 175

Button "Save and add another" user does not work

While creating a new user in the GUI, after filling in all relevant fields, clicking

Save and add another

does not work as intended. It takes you back to the overview of all users, instead of opening an empty form for a new user, as I was expecting.

Downloading results via script fails

Downloading results for all batches gives me the following error:

$ python download_results.py -u [admin user name] --server [server address]
Admin password:
Traceback (most recent call last):
  File "download_results.py", line 21, in <module>
    result = client.download(args.dir)
  File "/Users/mathiasmuller/Desktop/turkle/scripts/client.py", line 13, in wrapper
    return func(*args, **kwargs)
  File "/Users/mathiasmuller/Desktop/turkle/scripts/client.py", line 70, in download
    for row in soup.find('table', id='result_list').tbody.findAll('tr'):
AttributeError: 'NoneType' object has no attribute 'tbody'

Looking at the HTML content in soup at

https://github.com/hltcoe/turkle/blob/master/scripts/client.py#L69

it seems the script never makes it past the login page. An excerpt:

<div class="form-row">
<label class="required" for="id_password">Password:</label> <input id="id_password" name="password" required="" type="password"/>
<input name="next" type="hidden" value="/admin/turkle/batch/"/>
</div>
<div class="submit-row">
<label>&nbsp;</label><input type="submit" value="Log in"/>
</div>
</form>
</div>
<br class="clear"/>
</div>

I am using current master, and my password is definitely correct.

Thanks so much for your help.

sign up option

Hello,

Is there an option to let people sign up themselves and then access the active batches?

I would like to ask some people to take a listening test by sending them a link from which they could do it. Something like a Google Form with audio samples.

But in Turkle, an admin has to create an account for each of them beforehand, unless there is a way to add a sign-up option so that they could register and then do the test.

There is also the option of not using the login. But then I cannot have each person work through a list of questions.

Thanks

Enable User-level access control

Currently, Turkle restricts access at the Group level.

Based on discussions with @ateichert and @vandurme, there are use cases where access to Tasks should be restricted at the individual User level, instead of the Group level.

Poster: Craig Harman id: 128

Versioning of templates

  1. Keep each version of the template for a project with a unique identifier
  2. Record the version of the template used for each task
  3. Include this template ID in the output CSV file
  4. Change the batch download to include the CSV file and all templates used for that batch
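One possible way to get a unique identifier per template version, as in step 1, is to derive it from the template's content. This is only a sketch under that assumption. The issue does not say how identifiers should be generated, and `template_version_id` is a hypothetical name.

```python
import hashlib


def template_version_id(html_template):
    """Derive a short, stable identifier from the template's content."""
    digest = hashlib.sha256(html_template.encode("utf-8")).hexdigest()
    return digest[:12]  # shortened for readability in a CSV column


v1 = template_version_id("<p>${question}</p>")
v2 = template_version_id("<p>${question}</p><input name='answer'>")
```

A content-derived identifier means re-saving an unchanged template keeps the same ID, while any edit produces a new one.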

charman vandurme I believe this summarizes what we discussed through email.

Poster: Cash Costello id: 156

Add resumable projects

Support users starting a task, saving intermediate results, and then coming back to it later.

  1. The admin UI needs a new checkbox for resumable
  2. This option conflicts with anonymous annotators so need client side/server side checks on that
  3. Need to think about how task expiration fits with this. Maybe no expiration for resumable tasks or expiration does not affect tasks that have intermediate results
  4. Exporting a batch should not include intermediate results
  5. Requesters changing the template could break loading intermediate results. No way around this. Just need to document this and maybe provide a warning when updating a project that has any intermediate results.
  6. Loading intermediate results is not trivial. Template builders may need to build special code to populate the form with the intermediate results. Need to consider that we have html only templates that only use input controls and richer templates that have plenty of javascript or store results as data attributes on the DOM or in memory.

Also, this would be the first feature that makes our templates incompatible with MTurk. I believe the goal would be to continue to support MTurk templates but offer a superset of features. So we won't require anything in a template that would make us incompatible, but we will add optional features to better support our use cases.

Poster: Cash Costello id: 171

Example

It would be really nice if there were an example subdirectory with an example template and set of HITs.

HIT template reload

I'm using Turkle for development, and it would be extremely helpful to me if there was an easy way to "reload" the HIT template for a batch. That is, I'm recreating a project + batch every time I want to check the template, a multi-step process; if I could instead sync the template for a batch to a file on disk, it would make my development cycle much faster. (It would help if there were just a "reload" button or script to press, but I could even imagine logic going in the view to re-read a specified file from disk every time.)

This would definitely add complexity, and I don't know if anyone does, or ever would, use Turkle in a similar way to how I'm using it, so this might not be worth the costs. And maybe there's a better way to develop HIT templates that I just haven't realized?

Thanks :)

Allow users to view contributions

It would be helpful if users had access to a page where they could see information such as how much time they'd spent on the system, how many tasks they'd done, and so on, perhaps broken down by week or month.

Add a form builder

WYSIWYG editor for forms similar to Google Forms. This allows people who don't know HTML to create simple form-based templates.

Poster: Cash Costello id: 169

Error due to missing trailing comma

The TEMPLATE_DIRS variable in turkle/settings.py needs a comma after its one item, to ensure that it's interpreted as a tuple: otherwise manage.py throws an error.
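The underlying Python behavior can be shown in isolation: parentheses alone do not make a tuple, the trailing comma does. The directory string below is hypothetical and is not the actual value in turkle/settings.py.

```python
# Without the trailing comma this is just a parenthesized string.
not_a_tuple = ("templates")

# With the trailing comma it is a one-item tuple.
one_item_tuple = ("templates",)

print(type(not_a_tuple).__name__, type(one_item_tuple).__name__)
```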

OverflowError when running migrate on Windows

When I run python manage.py migrate in an anaconda environment with the requirements recently installed on Windows 10, I get this stack trace:

Traceback (most recent call last):
  File "manage.py", line 29, in <module>
    execute_from_command_line(sys.argv)
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\site-packages\django\core\management\__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\site-packages\django\core\management\__init__.py", line 338, in execute
    django.setup()
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\site-packages\django\__init__.py", line 27, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\site-packages\django\apps\registry.py", line 108, in populate
    app_config.import_models()
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\site-packages\django\apps\config.py", line 202, in import_models
    self.models_module = import_module(models_module_name)
  File "C:\ProgramData\Anaconda3\envs\coref-annotation\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\cjmay\Documents\turkle\turkle\models.py", line 23, in <module>
    csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long
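The traceback points at `csv.field_size_limit(sys.maxsize)`. On platforms where a C long is narrower than `sys.maxsize`, a common workaround is to lower the value until the call succeeds. The sketch below shows that pattern. It is not necessarily the fix adopted in Turkle, and `set_max_csv_field_size` is a hypothetical name.

```python
import csv
import sys


def set_max_csv_field_size():
    """Set the csv field size limit to the largest value the platform accepts."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long here; try a smaller one.
            limit //= 2


applied_limit = set_max_csv_field_size()
```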

Creating a new user: fields are pre-populated

While creating a new user with the GUI, annoyingly, the user name and password fields are pre-populated with my own username and password.

Also, after clicking Save, my browser asks me to update my stored credentials, which is also annoying.

I don't know enough to know if 1) this is a problem with my browser settings or 2) a bug on Turkle's side.

Could you please clarify? Thanks so much.

Add template library

MTurk has a set of starting templates for common tasks


Poster: Cash Costello id: 170

mechanism for returning to and editing completed HITs

@charman @cash @vandurme

Is there such a mechanism, perhaps related to the CADET-style correction of existing NER tags etc? What I'm envisioning is: tasks have a boolean switch (e.g. "persistent") that, if set, keeps it in a user's landing page even if they've completed it, so they can go back and edit.

If not, does anyone see a particular reason not to have this functionality, or that it would be difficult to implement? If not, I'd take a shot at it.

FYI this is so that humanists can do the sort of annotation they're used to, but with very easy pivots to crowdsourcing (and the huge advantage of having people willing and able to implement interfaces for new data etc).

Hangs on HIT submission

When I use the template at the end of this issue, with the following batch file:

image_url
https://epsilon.aeon.co/images/2aadc0ca-7531-4d00-a9ef-0ff27280499c/idea_sized-regimentofprinces-tl.jpg
https://bloximages.newyork1.vip.townnews.com/eastoregonian.com/content/tncms/assets/v3/editorial/c/e5/ce526a60-d0cb-11e9-8f09-c7c27e38cb0b/5d729759017cc.image.jpg

it seems to work fine under the prototurk tool (I can add boxes, submit, and the generated JSON has the bounding coordinates etc). But in Turkle, when I submit, it hangs on "Loading next HIT..." (and the admin interface shows that the HIT has not been submitted).

<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form answer-format="flatten-objects">
    <crowd-bounding-box 
        src="${image_url}"
        labels="['Text', 'Dog', 'Bird']"
        header="Draw bounding boxes around the requested items"
        name="annotatedResult">
        <short-instructions>Draw boxes around the requested target of interest.</short-instructions>
        <full-instructions header="Bounding Box Instructions">
            <p>Use the bounding box tool to draw boxes around the requested target of interest:</p>
            <ol>
              	<li>Draw a rectangle using your mouse over each instance of the target.</li>
                <li>Make sure the box does not cut into the target, leave a 2 - 3 pixel margin</li>
               	<li>When targets are overlapping, draw a box around each object, include all 
                      contiguous parts of the target in the box. Do not include parts that are completely 
                      overlapped by another object.</li>
               	<li>Do not include parts of the target that cannot be seen, even though you think you 
                      can interpolate the whole shape of the target.</li>
               	<li>Avoid shadows, they're not considered as a part of the target.</li>
               	<li>If the target goes off the screen, label up to the edge of the image.</li>
            </ol>
        </full-instructions>
    </crowd-bounding-box>
</crowd-form>
<input type="hidden" name="answer">

Initial step migrate hangs

I'm installing Turkle on a new server, steps:

  • virtualenv with Python3, activated it, installed all requirements
  • cloned current master from Github

Then, for some reason,

python manage.py migrate

hangs indefinitely. It also cannot be stopped with Ctrl+C; it needs to be killed with SIGKILL, so I do not get a traceback either.

On my local computer, I did not have this problem; migration went smoothly.

Are there any requirements I am not aware of? Does migrate need write/read permissions in certain places?

Allow per-project static file uploads

This is one possible solution to a common issue, but I'm not certain it's the right solution. Feedback appreciated!

Turkle, like Mechanical Turk, only allows a single HTML template file per project. While third-party libraries (jQuery, Bootstrap) can be accessed via links to third-party CDNs, any project-specific CSS and JavaScript needs to be in the single HTML template file. There is no support for project-specific image files.

When Turkle is used on offline machines, the third-party CDNs are inaccessible, and users need to resort to things like copy-pasting entire jQuery libraries into an HTML template.

If a web application is built using a framework like React, there are potentially a lot of project-specific CSS and JavaScript files, and copying the contents of these files into a single HTML template adds complexity to the development process.

One way to address these copy-pasting issues is to allow Turkle users to upload static files that are tied to a specific project. In order to prevent path collisions, we would need to introduce a new Turkle-specific template variable, e.g. $TURKLE_STATIC_PATH. When the template is rendered, the Turkle template variable would be replaced with a Django-generated, project-specific URL prefix.

This behavior would obviously diverge from MTurk's behavior, but could make it easier to support offline environments.
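A rough sketch of the substitution step described in this proposal, using the standard library's `string.Template`. The prefix layout `/static/projects/<id>` and the function name are assumptions made for illustration. In a real implementation the prefix would come from Django, and this is not how Turkle currently renders templates.

```python
from string import Template


def render_static_path(html_template, project_id, static_root="/static/projects"):
    """Replace $TURKLE_STATIC_PATH with a project-specific URL prefix."""
    prefix = f"{static_root}/{project_id}"  # hypothetical URL layout
    # safe_substitute leaves other placeholders such as ${image_url} untouched.
    return Template(html_template).safe_substitute(TURKLE_STATIC_PATH=prefix)


rendered = render_static_path(
    '<script src="$TURKLE_STATIC_PATH/app.js"></script><img src="${image_url}">',
    project_id=7,
)
```

One caveat of this particular approach: `string.Template` collapses a literal `$$` into `$`, so templates containing that sequence would need separate handling.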

CC: ccostello dpennell vandurme

Poster: Craig Harman id: 143

Add API to Turkle

Currently, we have a few scripts that parse HTML, which is less than ideal. We should replace those with an API similar to MTurk's.

Documentation on their API: https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_OperationsArticle.html

I suggest we prototype a subset of the methods around HIT management (create, list, delete).

I'd like to try to maintain parameter compatibility with MTurk so that our scripts/clients are compatible. This likely means leaving several parameters per method as dummy parameters for things like managing payments to turkers.

We would also need to implement the same style of user authentication for scripts to be interoperable.
