Comments (21)
I notice that mTurk uses XML as its data format, but Turkle uses CSV with JSON-formatted data. Will this create a problem for us when it comes to reusing a prebuilt mTurk client? We would need either to replace the CSV format or to write a data emulation layer, i.e. for two-way XML <-> CSV+JSON translation. Personally I hate XML and prefer working with CSV+JSON because of its simplicity, but it depends on your project goals: how much do you want to clone mTurk in every aspect?
https://micropyramid.com/blog/how-to-convert-xml-content-into-json-using-xmltodict/
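As a rough illustration of one direction of the translation layer discussed above, here is a minimal sketch that uses only the Python standard library (rather than xmltodict from the linked post). It flattens an mTurk-style answers document into a dict that could be stored as JSON inside a CSV cell. The sample XML, its placeholder namespace, the function names, and the choice to handle only FreeText answers are all illustrative assumptions, not Turkle or mTurk code.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payload, loosely shaped like an mTurk answers document.
# The namespace below is a placeholder, not the real mTurk schema URL.
SAMPLE_XML = """<QuestionFormAnswers xmlns="http://example.com/answers">
  <Answer>
    <QuestionIdentifier>sentiment</QuestionIdentifier>
    <FreeText>positive</FreeText>
  </Answer>
  <Answer>
    <QuestionIdentifier>comment</QuestionIdentifier>
    <FreeText>looks fine</FreeText>
  </Answer>
</QuestionFormAnswers>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def answers_xml_to_dict(xml_text):
    """Map each QuestionIdentifier to its FreeText value."""
    root = ET.fromstring(xml_text)
    result = {}
    for answer in root:
        if local_name(answer.tag) != "Answer":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in answer}
        result[fields["QuestionIdentifier"]] = fields.get("FreeText", "")
    return result


answers = answers_xml_to_dict(SAMPLE_XML)
# JSON string suitable for storing in a single CSV column.
answers_json = json.dumps(answers, sort_keys=True)
```

The reverse direction (CSV+JSON back to XML) would need a matching builder, which is where most of the emulation effort would likely go.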
from turkle.
Usually in Django (using the Django Rest Framework or DRF), your API resources are mapped to your Models through serializers. I believe we would normally map out the existing models like this, but Turkle's application design seems really different from mTurk's:
- class Task
- class TaskAssignment
- class Batch
- class Project
Example:
GET /tasks # Returns a list of tasks
GET /tasks/<id> # Returns information for a specific task
POST /tasks # Create a new task
PUT /tasks/<id> # Completely modifies a specific task
PATCH /tasks/<id> # Partially updates a specific task
DELETE /tasks/<id> # Remove a specific task
So, we would need to build a bunch of custom serializers using the DRF, as an application layer which would respond to mTurk client requests, in order to access the existing models.
from turkle.
mTurk uses XML for the question parameter when creating a HIT using their API (https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_CreateHITOperation.html). Have you noticed it anywhere else?
from turkle.
These API actions require sending and/or receiving XML payloads:
- CreateHIT
- CreateHITWithHITType
- CreateQualificationType
- GetAssignmentsForHIT
- GetQualificationRequests
- GetQualificationType
- UpdateQualificationType
Here is a list of data structures which are in XML. https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_SchemaLocationArticle.html
We would need to decide how much of this to support / not support. For example, there is a pretty elaborate xml templating language which we may not support (in Formatted Content: XHTML doc), because plain HTML/javascript templates work pretty well anyway...
from turkle.
We're not interested in qualifications. Is the same true for your use cases?
GetAssignmentsForHIT is deprecated so that leaves CreateHIT and CreateHITWithHITType. Right now, we are only using html/javascript templates and are happy with them.
from turkle.
Turkle currently cares about Group membership, which might be functionally equivalent to the mTurk notion of a Qualification. Only Users who (are part of a particular Group|have a particular Qualification) can work on a particular task.
from turkle.
But, for our use cases, Group membership is always assigned by an Admin. We don't have a need for a process where Users complete a Qualification Task that is then reviewed to determine if they would be (assigned a Qualification|added to a Group).
I do think we want the API to support the programmatic creation of Tasks that are restricted to specific Groups, so limited API support for "Qualifications" (to the extent that they provide the same functionality as Groups) may be worthwhile.
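To make the Group/Qualification equivalence concrete, here is a minimal sketch of the access check being described. The function name and the "member of any required group" rule are assumptions for illustration; Turkle's actual permission logic may differ.

```python
def can_work_on(user_groups, required_groups):
    """Return True if the task is unrestricted or the user is in a required group.

    Assumption: membership in ANY one of the required groups is enough,
    which mirrors holding a single matching Qualification.
    """
    required = set(required_groups)
    if not required:
        return True  # no group restriction on this task
    return bool(required & set(user_groups))
```

For example, `can_work_on(["annotators"], ["annotators", "linguists"])` passes, while a user who is only in `"guests"` would be refused.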
from turkle.
I've been assuming that one motivation for implementing the mturk API is using boto as the client. Looks like boto has a schema file per service: https://github.com/boto/botocore/blob/develop/botocore/data/mturk/2017-01-17/service-2.json
The schema file is then used by a validator. This is fine as long as we use a subset of their API. But if we add to it by adding a Group parameter to CreateHIT, it won't validate. I haven't actually tried this yet.
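A simplified sketch of why an extra parameter would be rejected: the validator checks each supplied name against the members declared for the operation's input shape. The shape below is a hypothetical, cut-down stand-in, and `validate_params` is not botocore's real validator; the real CreateHIT shape has more members and required fields.

```python
# Simplified, hypothetical stand-in for one input shape in a service definition.
CREATE_HIT_SHAPE = {
    "required": ["Title", "Description"],
    "members": {"Title": {}, "Description": {}, "Question": {}},
}


def validate_params(shape, params):
    """Return a list of error strings; an empty list means the call validates."""
    errors = []
    for name in shape.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    for name in params:
        if name not in shape["members"]:
            errors.append(f"unknown parameter: {name}")
    return errors


# A call that stays within the declared members validates cleanly.
ok = validate_params(CREATE_HIT_SHAPE, {"Title": "t", "Description": "d"})

# Adding a Group parameter that the shape does not declare is reported.
bad = validate_params(
    CREATE_HIT_SHAPE, {"Title": "t", "Description": "d", "Group": "annotators"}
)
```

Under this model, supporting a Group parameter means adding it to the shape's members, not changing the client code.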
from turkle.
Seen in the clear light of day, the more I read the mTurk specs, the more I see an impedance mismatch with Turkle's specs. How would these Turkle script operations map to the mTurk API operations?
- add_user
- import_users
- upload_tasks
- download_results
I'm thinking that, maybe we should just let Turkle "be who he is", meaning we can write a custom api and client that allows us our feature set with the minimal fuss. This may be the shortest development path, even with the added burden of writing our own customized api client software. Thoughts?
from turkle.
add_user and import_users are both operations that are not supported on mTurk. They have a different user management approach.
upload_tasks is the same as create HITs. download_results is the same as get assignments.
I'm not quite ready to develop our own API, but am certainly open to that. I'd like to see if I can pass a custom service definition to boto and use it. I have a lot of meetings today, but will try to squeeze that in.
from turkle.
I've been working on an issue that we discovered related to unicode characters. I hope to get back to this soon...
from turkle.
I've been able to get the boto client to work with my mock mTurk site. I had to pass an endpoint_url to the client. It also required a fake region and a fake AWS access token and secret. The access token and secret are used for authentication, so we would have to implement the same authentication system in Turkle.
I added a parameter not in their spec and as expected, the boto validator failed. I haven't seen a way to turn that off. I then grabbed the mturk service definition, modified it, set an environment variable, and it worked. So we would be able to add parameters and methods to their API without much trouble.
Still not sure this is worth it. I'm checking up on the authentication code next - hoping it is some standard like OAuth.
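A minimal sketch of the client setup described above, assuming boto3 is installed and a mock server is listening locally. The URL, region, and credential strings are placeholders, and the helper names are hypothetical; only the keyword arguments themselves (`endpoint_url`, `region_name`, `aws_access_key_id`, `aws_secret_access_key`) are standard boto3 client parameters.

```python
def mock_client_kwargs(endpoint_url):
    """Build keyword arguments for a boto3 client aimed at a non-AWS endpoint."""
    return {
        "service_name": "mturk",
        "endpoint_url": endpoint_url,
        # Placeholder values: assumed to be accepted by the mock server.
        "region_name": "us-east-1",
        "aws_access_key_id": "fake-access-key",
        "aws_secret_access_key": "fake-secret-key",
    }


def make_client(endpoint_url):
    """Create the client; boto3 is imported lazily so the helper above has no dependency."""
    import boto3  # third-party: pip install boto3

    return boto3.client(**mock_client_kwargs(endpoint_url))


# Hypothetical local address for the mock mTurk site.
kwargs = mock_client_kwargs("http://localhost:8000/")
```

Requests made through such a client are still signed with the placeholder credentials, which is why the server side would need to understand the same authentication scheme.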
from turkle.
Hi Cash and Craig,
I hope you are both well. I've been studying up on Django APIs and clients. I think we could possibly use code generators to do the heavy lifting for both the server and client development. The generated code would have a nice, standard design across all models and views, etc. It could eliminate weeks of trial and error, if all goes as promised....
- Django Rest Framework (DRF) is the de facto API module, with the best support and best documentation. It can do everything you would want a Django REST API to do, including auth. I think it makes sense to use it rather than try to roll our own. https://github.com/encode/django-rest-framework
- There is a project for generating DRF api code! https://github.com/Brobin/drf-generators
- Here is a project to generate Swagger API documentation from a DRF project (in just a few additional lines), https://github.com/marcgibbons/django-rest-swagger/
- The Swagger client generator can read the generated docs, which then allows us to automatically generate client code in Python and/or multiple other languages, too. https://github.com/swagger-api/swagger-codegen
Thoughts?
from turkle.
Thanks @cfortune - I took a look at drf-generators and it seems to want to blow away the views.py file in the turkle app to do its magic. Maybe it is intended for API-only sites? Using the generator may not be possible for an HTML-first site.
I'm starting to read up on DRF - specifically applications that already have HTML views and want to add an api.
from turkle.
Hi @cash , drf-generator lets you choose which types of serializers to generate, but I think it assumes you will generate your files at the beginning on an empty project, so, ya, it would blow away existing files. Maybe the way to use it is to let it blow away all the files, then we merge those files with the existing Turkle functions. Git merge tools should allow for that. It would be great to make contact with the authors of drf-generators project to get their input on modifying Turkle.
from turkle.
drf-generators created an API for CRUD operations on projects, batches, tasks, and task assignments. I believe we would only want to keep the methods for projects and batches. I'm not expecting the workers to use the API, so having methods for working with tasks and task assignments doesn't make as much sense to me.
Maybe it's possible to create another app called api that imports the models from turkle for projects and batches and creates the API using that.
from turkle.
"Maybe it's possible to create another app called api that imports the models from turkle for projects and batches and creates the API using that."
I think that is probably the right approach.
I assume we could import the user and group models too, from django admin, for use by drf auth, and limit actions via a drf group, or add drf permissions (read only, read/write, no access).
from turkle.
brobin, author of drf-generators wrote this:
If the files already exist (urls, views, etc.) they would overwrite existing code. It will warn you before overwriting. You could always run it and then merge back your existing stuff.
from turkle.
After looking through the generated code, I'm less interested in this. It was really simple code that doesn't save that much time over doing it yourself with DRF.
I'd like to get a list of design requirements for the API. On our side we want support for:
- Managing user accounts and groups
- Creating projects and batches
- Monitoring progress
- Downloading results
The above list does not include
- Assigning a task to a specific user
- CRUD operations on tasks (right now this is done at the batch level)
- Completing assignments
@charman Do you have any comments on the above list?
@cfortune What are your highest priority items?
from turkle.
That's too bad about drf-generators, I thought they would have more introspection of the models and would generate more code. It still may be worthwhile to use them in order to create a nicely scoped scaffold initially.
The highest priority item for us is batch/task management. We can create projects and batches and do user management manually via CRUD, because they won't change much over time. Our AI system will be hitting the API day and night, though. I would be interested in the ability to do REST operations on individual tasks rather than on a batch of tasks as a whole. For example:
- put one or more tasks to an existing batch
- get and delete (archive) all completed tasks (in one step).
Can we simply reuse existing batches, or does the program assume that each new collection of tasks will need a new batch?
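The two operations requested above can be sketched in memory as follows. This is only an illustration of the intended semantics (append to an existing batch; fetch and archive completed tasks in one step). The `Batch` class and its method names are hypothetical and are not Turkle's models.

```python
class Batch:
    """In-memory stand-in for a batch that accepts tasks after creation."""

    def __init__(self, name):
        self.name = name
        self.tasks = {}    # task_id -> {"data": ..., "completed": bool}
        self.archive = {}  # completed tasks removed from the active set
        self._next_id = 1

    def add_tasks(self, rows):
        """Append one or more tasks to the existing batch; return the new ids."""
        new_ids = []
        for row in rows:
            task_id = self._next_id
            self._next_id += 1
            self.tasks[task_id] = {"data": row, "completed": False}
            new_ids.append(task_id)
        return new_ids

    def complete(self, task_id):
        """Mark a single task as completed."""
        self.tasks[task_id]["completed"] = True

    def pop_completed(self):
        """Return all completed tasks and archive them in the same step."""
        done = {tid: t for tid, t in self.tasks.items() if t["completed"]}
        for tid in done:
            self.archive[tid] = self.tasks.pop(tid)
        return done


batch = Batch("demo")
ids = batch.add_tasks([{"text": "a"}, {"text": "b"}])
batch.complete(ids[0])
finished = batch.pop_completed()
```

After this runs, the completed task lives only in `batch.archive`, and the unfinished one stays in `batch.tasks`, so the same batch can keep receiving new tasks.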
from turkle.
Using the HTML interface, you are restricted to one-time batch creation. It doesn't support adding new tasks to a batch, at least not currently.
The mTurk API is set up to work like you describe. It doesn't have the concept of batches. I'm looking through the code to see what assumptions we made on this.
from turkle.