localturk's Introduction

localturk

Local Turk implements Amazon's Mechanical Turk API on your own machine.

It's handy if you want to:

  1. Develop a Mechanical Turk template
  2. Do some repetitive tasks on your own, without involving Turkers.

You could use it, for instance, to generate test and training data for a Machine Learning algorithm.

Quick Start

Install:

npm install -g localturk

Run:

cd localturk/sample
localturk transcribe.html tasks.csv outputs.csv

Then visit http://localhost:4321/ to start Turking.

Templates and Tasks

Using Local Turk is just like using Amazon's Mechanical Turk. You create:

  1. An HTML template file with a <form>
  2. A CSV file of tasks

For example, say you wanted to record whether some images contained a red ball. You would make a CSV file containing the URLs for each image:

image_url
http://example.com/image_with_red_ball.png
http://example.com/image_without_red_ball.png

Then you'd make an HTML template for the task:

<img src="${image_url}" />
<input type=radio name=has_button value="yes" /> Has a red ball<br/>
<input type=radio name=has_button value="no" /> Does not have a red ball<br/>

Finally, you'd start up the Local Turk server:

$ localturk path/to/template.html path/to/tasks.csv path/to/output.csv

Now you can visit http://localhost:4321/ to complete each task. When you're done, the output.csv file will contain:

image_url,has_button
http://example.com/image_with_red_ball.png,yes
http://example.com/image_without_red_ball.png,no
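
The substitution of ${image_url} can be pictured with a small sketch. This is not localturk's actual implementation: fillTemplate and escapeHtml are hypothetical names, and HTML-escaping the values is an assumption about what a careful implementation might do.

```typescript
// Hypothetical helpers, not localturk's real code.
type Task = Record<string, string>;

// Escape characters that would otherwise break out of an HTML attribute.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Replace every ${column} placeholder with the matching value from one CSV row.
// Unknown placeholders are left untouched so that typos stay visible.
function fillTemplate(template: string, task: Task): string {
  return template.replace(/\$\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(task, key) ? escapeHtml(task[key]) : match
  );
}

const html = fillTemplate('<img src="${image_url}" />', {
  image_url: 'http://example.com/image_with_red_ball.png',
});
console.log(html);
```

Each row of tasks.csv would be passed through such a function once, producing one page per task.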

Image Classification

The use case described above (classifying images) is an extremely common one.

To expedite this, localturk provides a separate script for doing image classification. The example above could be written as:

classify-images --labels 'Has a red ball,Does not have a red ball' *.png

This will bring up a web server with a UI for assigning one of those two labels to each image on your local file system. The results will go in output.csv.

For more details, run classify-images --help.

Tips & Tricks

It can be hard to remember the exact format for template files. localturk can help! Run it with the --write-template argument to generate a template file for your input that you can edit:

localturk --write-template tasks.csv > template.html

When you're going through many tasks, keyboard shortcuts can speed things up tremendously. localturk supports these via the data-key attribute on form elements. For example, make your submit button look like this:

<input type="submit" name="result" value="Good" data-key="d">

Now, when you press d, it'll automatically click the "Good" button for you. Note that this feature is not available on Mechanical Turk itself!

Development

To make changes to localturk, clone it and set it up using yarn:

yarn

You can run localturk.ts or classify-images.ts directly using ts-node:

ts-node localturk.ts path/to/template.html path/to/tasks.csv path/to/output.csv

To type check and run the tests:

yarn tsc
yarn test

To publish a new version on npm, run:

yarn tsc
yarn publish

localturk's People

Contributors

danvk

localturk's Issues

Build a standalone binary for image classification

I often use localturk for some kind of image classification task. I might use it more often if this use case were pulled out into a standalone binary.

For example:

ls *.png | shuf | head -50 | classify --options Yes,No,Maybe > results.txt

Cannot find module 'escape-html'

Hi Dan,

Thanks for all the updates.
Updating my local version with sudo npm install -g localturk worked OK.

After the upgrade, the following error appeared:

module.js:544
    throw err;
    ^

Error: Cannot find module 'escape-html'
    at Function.Module._resolveFilename (module.js:542:15)
    at Function.Module._load (module.js:472:25)
    at Module.require (module.js:585:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/usr/local/lib/node_modules/localturk/node_modules/errorhandler/index.js:18:18)
    at Module._compile (module.js:641:30)
    at Object.Module._extensions..js (module.js:652:10)
    at Module.load (module.js:560:32)
    at tryModuleLoad (module.js:503:12)
    at Function.Module._load (module.js:495:3)

I've fixed it by running:
sudo apt-get install node-escape-html

Having said that, I've noticed that the file installed by npm points to a ".js" file and not the new ".ts".

Regards.

Line numbers should match up

If localturk stuck all its HTML and the first line of the template HTML on the first line of the output, then the line numbers would match up between the output and the template. This would be really handy!

Show a warning when nothing is submitted

It's easy to forget to put the name="..." attribute on your form items. Doing so can be a total disaster—I just labeled ~100 images into a black hole and could have used a warning! :(

Perhaps a warning could show up on the HTML page (and on the console) when no new columns are written to the output CSV.
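
A check along these lines could be sketched as follows. The function names are hypothetical, and the sketch assumes the server can see both the original task row and the submitted form fields as plain key/value objects.

```typescript
// Hypothetical check, assuming task rows and submissions are flat string maps.
type FormRow = Record<string, string>;

// Return the submitted field names that are not already columns of the task.
function newOutputKeys(task: FormRow, submission: FormRow): string[] {
  return Object.keys(submission).filter((key) => !(key in task));
}

// Return a warning message when the submission adds nothing, otherwise null.
function warnIfNothingNew(task: FormRow, submission: FormRow): string | null {
  return newOutputKeys(task, submission).length === 0
    ? 'No new keys in output. Make sure your <input> elements have "name" attributes'
    : null;
}

// A form whose inputs lack name="..." posts back only the task's own columns.
console.log(warnIfNothingNew({ image_url: 'a.png' }, { image_url: 'a.png' }));
```

The same message could be logged to the console and rendered on the next task page.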

line value in template for input processed incorrectly

The requirement is to read the filename and the default value for a text input from the tasks file.

Creating tasks.csv with:

input1,line1
./0001/010001.bin.png,abc
./0001/010002.bin.png,def

and template with:

<style>
input{font-size: x-large;}
</style>

<img src="${input1}" /><br>
<input type="text" name="line1" size=60 value=${line1} /><br>

Gives almost expected results.

Error is raised:
1 / 4 No new keys in output. Make sure your <input> elements have "name" attributes

Incorrect data is stored in output.csv.
The HTML for the first task is generated correctly (it has the correct "value"), but on submit localturk stores incorrect output and returns to the first line. The counter is incremented, but it loops and never reaches the next item.

Running local turk on http://localhost:4321
{ input1: './0001/010001.bin.png', line1: 'abc' }
Saved {"input1":"./0001/010001.bin.png","line1":["abc","changed_abc_to_this"]}
{ input1: './0001/010001.bin.png', line1: 'abc' }

Note that the original value was "abc" and it was changed to "changed_abc_to_this" before submission.
Both values are then stored as separate columns, and localturk returns to the first line again.

This is now more of a bug than an RFE, so I am opening it as a new bug issue and closing the original RFE.
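
The "Saved" line shows two values arriving under the single key line1. One possible explanation, which is an assumption and not confirmed by anything in this report, is that the form posts the task's original line1 column alongside the edited text input of the same name, and that repeated field names are grouped into an array. The sketch below only illustrates that grouping behavior; collectFields is a hypothetical name.

```typescript
// Hypothetical illustration of how repeated form field names are commonly
// grouped into an array. Not taken from localturk's source.
type FieldValue = string | string[];

function collectFields(pairs: Array<[string, string]>): Record<string, FieldValue> {
  const out: Record<string, FieldValue> = {};
  for (const [name, value] of pairs) {
    if (!(name in out)) {
      out[name] = value; // first occurrence: keep a plain string
    } else {
      const existing = out[name];
      // later occurrences: promote to an array and append
      out[name] = Array.isArray(existing) ? existing.concat([value]) : [existing, value];
    }
  }
  return out;
}

const saved = collectFields([
  ['input1', './0001/010001.bin.png'],
  ['line1', 'abc'],                 // assumed: original task value posted back
  ['line1', 'changed_abc_to_this'], // the edited text input
]);
console.log(JSON.stringify(saved));
// prints {"input1":"./0001/010001.bin.png","line1":["abc","changed_abc_to_this"]}
```

If that assumption holds, giving the input a name that is not a column of tasks.csv (a hypothetical line1_corrected, say) would avoid the collision.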

Is there parameter to specify port number

Dear Dan,

Many thanks for the great tool, localturk.

I'm trying to run two instances but don't know how to specify a different port number.

Is changing the port number already implemented?

If not, can you point me to a possible place to add this functionality?

BR,
Jimmy

install failure on LinuxMint/Ubuntu

Hi Dan,

Many thanks for the great tool, localturk.
When I try to install it on LinuxMint v18.2 (Sonya) with "sudo npm install localturk -g", I get the following error:

david@DP55L ~/OCR/localturk $ sudo npm install localturk -g
npm ERR! path /usr/local/lib/node_modules/localturk/dist/localturk.js
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! syscall chmod
npm ERR! enoent ENOENT: no such file or directory, chmod '/usr/local/lib/node_modules/localturk/dist/localturk.js'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent 

npm ERR! A complete log of this run can be found in:

And log shows:

...
1698 verbose stack Error: ENOENT: no such file or directory, chmod '/usr/local/lib/node_modules/localturk/dist/localturk.js'
1699 verbose cwd /home/user/
1700 verbose Linux 4.10.0-40-generic
1701 verbose argv "/usr/bin/node" "/usr/local/bin/npm" "install" "localturk" "-g"
1702 verbose node v9.2.1
1703 verbose npm  v5.6.0
1704 error path /usr/local/lib/node_modules/localturk/dist/localturk.js
1705 error code ENOENT
1706 error errno -2
1707 error syscall chmod
1708 error enoent ENOENT: no such file or directory, chmod '/usr/local/lib/node_modules/localturk/dist/localturk.js'
1709 error enoent This is related to npm not being able to find a file.
1710 verbose exit [ -2, true ]

I've tried creating the localturk directory under node_modules by hand, without luck. I trust it should be created automatically anyway.

classify-images captures Cmd-1

It should ignore keypresses when modifier keys are down. Cmd-1 is intended to be a tab change, but it's treated as "choose label 1".
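
A minimal sketch of the proposed behavior, assuming that digit keys 1 to 9 select labels. KeyPress mirrors only the fields of a browser KeyboardEvent that the sketch needs, and labelIndexForKey is a hypothetical name.

```typescript
// Minimal shape of the keyboard event fields this sketch relies on.
interface KeyPress {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
}

// Hypothetical helper: return the zero-based label index for a digit key,
// or null when the keypress should be left to the browser.
function labelIndexForKey(e: KeyPress, numLabels: number): number | null {
  if (e.metaKey || e.ctrlKey || e.altKey) return null; // e.g. Cmd-1 switches tabs
  if (!/^[1-9]$/.test(e.key)) return null;             // not a label shortcut
  const index = Number(e.key) - 1;
  return index < numLabels ? index : null;
}

console.log(labelIndexForKey({ key: '1', metaKey: true, ctrlKey: false, altKey: false }, 2));
```

A keydown handler would call preventDefault and choose a label only when the result is not null.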

Make `-q` the default

There's no reason for localturk to keep running after you've completed all the tasks.

Question: does localturk save partial progress?

I'm just wondering if I have a really large dataset (and I do), will it be possible to partially annotate it using localturk, but saving the output.csv file, and later run it again to annotate more data and expand the existing training set without starting over?
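
Resuming could in principle work as sketched below. This is not a statement about what localturk actually does: remainingTasks is a hypothetical helper, and it assumes that every output row repeats the input columns of its task, as in the output.csv example in the README.

```typescript
// Hypothetical resume logic: a task counts as done when some output row
// has the same value for every input column.
type CsvRow = Record<string, string>;

// Build a comparable key from the values of the given columns.
function taskKey(row: CsvRow, columns: string[]): string {
  return JSON.stringify(columns.map((c) => row[c]));
}

function remainingTasks(tasks: CsvRow[], completed: CsvRow[]): CsvRow[] {
  if (tasks.length === 0) return [];
  const columns = Object.keys(tasks[0]);
  const done: Record<string, boolean> = {};
  for (const row of completed) {
    done[taskKey(row, columns)] = true;
  }
  return tasks.filter((task) => !done[taskKey(task, columns)]);
}

const todo = remainingTasks(
  [{ image_url: 'a.png' }, { image_url: 'b.png' }],
  [{ image_url: 'a.png', has_button: 'yes' }]
);
console.log(todo.map((t) => t.image_url).join(','));
```

Under that assumption, keeping output.csv between runs would be enough to continue where the previous session stopped.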

Unhandled 'error' event

Trying to run on Windows 10, with the latest version of Node (fresh install), I see an unhandled error event.

C:\Data>classify-images -o test.csv -l "ok,bad" Washington10.png
Running  localturk --static-dir . C:\Users\Greg\AppData\Local\Temp\2020928-7916-u1cm55.xcmw.html C:\Users\Greg\AppData\Local\Temp\2020928-7916-14ghlsf.1fkz.csv test.csv
events.js:292
      throw er; // Unhandled 'error' event
      ^

Error: spawn localturk ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:269:19)
    at onErrorNT (internal/child_process.js:465:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:275:12)
    at onErrorNT (internal/child_process.js:465:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: -4058,
  code: 'ENOENT',
  syscall: 'spawn localturk',
  path: 'localturk',
  spawnargs: [
    '--static-dir',
    '.',
    'C:\\Users\\Greg\\AppData\\Local\\Temp\\2020928-7916-u1cm55.xcmw.html',
    'C:\\Users\\Greg\\AppData\\Local\\Temp\\2020928-7916-14ghlsf.1fkz.csv',
    'test.csv'
  ]
}

C:\Data>

We have reproduced this across three different machines now. Any suggestions?

I am not able to run localturk (ubuntu 14.10)

I followed this link from your writeup on "Training an Ocropus OCR model". You have done a great job in explaining the details! I tried using your localturk but it doesn't seem to work for me. I am using ubuntu 14.10. When I run it, I get "/usr/bin/env: node: No such file or directory". Do you know why this might happen?

Add a "back" feature

It would be nice if it were possible to recover from accidental submits without hand-editing the output CSV file.

RFE: allow OCRed suggestions

I'm not sure how to title this properly, but the idea is roughly this: as part of training, it would be useful if the job input could also contain OCRed text produced by the currently available model.

This would be extremely helpful when updating a model on long texts, where the existing model makes only small mistakes.
Instead of retyping the full text, it would be much easier to check it visually and correct it before submitting.
One additional checkbox would be required in this scenario.

This checkbox would be unchecked by default.
Typing anything in the text box (including any change to pre-filled OCRed text) would automatically tick the checkbox.
If no changes are made in the text box (i.e. it contains correctly recognized text), the checkbox would have to be ticked manually to confirm that the text has been reviewed and is correct. This avoids the situation where a "turk" submits unreviewed text by accident (or out of laziness).
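
The checkbox rule described above can be written as a small predicate. The names ReviewState and isReviewed are hypothetical and only illustrate the proposal.

```typescript
// Hypothetical state of one task in the proposed review workflow.
interface ReviewState {
  suggested: string;  // OCRed text pre-filled from the existing model
  current: string;    // text currently in the text box
  confirmed: boolean; // checkbox ticked manually by the reviewer
}

// A task may be submitted if the text was edited (which would tick the box
// automatically) or if the reviewer explicitly confirmed the unchanged text.
function isReviewed(state: ReviewState): boolean {
  return state.current !== state.suggested || state.confirmed;
}

console.log(isReviewed({ suggested: 'abc', current: 'abc', confirmed: false }));
```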

This is extremely helpful for repeated lines of text, e.g. when training Ocropus on different fonts.

I can see pros and cons to this approach, as it can lead to some mistakes, but the overall value seems very good.

Hopefully I'm not over-complicating anything.

Thanks for help if possible.

License?

I'd like to adapt some of this for use in a project at the Internet Archive. I am not seeing any license information in this repository, is it under Apache 2.0 or MIT by any chance?

classify-images doesn't work with absolute paths

If you run something like:

classify-images /Users/danvk/Downloads/images/*.jpg

Then you'll get something like <img src="/Users/danvk/Downloads/images/0000.jpg">, which doesn't work. classify-images could be smart enough to serve the images from that directory.
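
One way to approach this, sketched under the assumption of POSIX-style absolute paths, is to split the inputs into a shared directory and paths relative to it. splitCommonDir is a hypothetical helper. Whether the shared directory would then be passed to localturk through the --static-dir flag seen in the classify-images log output is a guess.

```typescript
// Hypothetical helper: split absolute POSIX paths into a shared directory
// (to serve statically) and paths relative to it (to use in <img src>).
function splitCommonDir(paths: string[]): { dir: string; relative: string[] } {
  // Directory components of each path, with the file name dropped.
  const parts = paths.map((p) => p.split('/').slice(0, -1));
  let common: string[] = parts.length > 0 ? parts[0] : [];
  for (const dirs of parts) {
    let i = 0;
    while (i < common.length && i < dirs.length && common[i] === dirs[i]) i++;
    common = common.slice(0, i); // keep only the shared leading components
  }
  const dir = common.join('/') || '/';
  const prefix = dir === '/' ? '/' : dir + '/';
  return { dir, relative: paths.map((p) => p.slice(prefix.length)) };
}

const split = splitCommonDir([
  '/Users/danvk/Downloads/images/0000.jpg',
  '/Users/danvk/Downloads/images/0001.jpg',
]);
console.log(split.dir, split.relative.join(','));
```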

--static_dir . and work with index.html as a template

If you set --static_dir . and name your template file index.html, then localturk will serve your template without filling in the values. This is quite surprising!

Better behavior would be to have the / handler take precedence over the static file handler.
