
davidfoerster / kaleidok-examples


KaleidOk invites participants to use a new kind of interactive media tool and take part in an emerging experience which explores speech recognition, media retrieval and visuals generating in a collaborative context (between people, and between people and machines).

Home Page: http://www.kaleidok.co/

Java 99.96% Batchfile 0.01% Shell 0.03%
java processing-library synesthesia art speech-processing linguistics affective-computing

kaleidok-examples's Introduction

Kaleidoscope

Note: Image discovery and retrieval no longer works, since Chromatik, the web service behind this function, has gone defunct itself.

Setup

  1. Request a Flickr API key and consult the Configuration section on where to put it for Kaleidoscope to find.

  2. Similarly request a Google API access key and provide it to Kaleidoscope.

  3. Prepare a configuration for Kaleidoscope.

  4. Some libraries need to be set up, but I'm not going into that now.

Usage

  1. Run KaleidoscopeApp as a stand-alone Java application or use the included IntelliJ run configuration.

  2. Click the recorder button in the controls window or press I to start recording from the default microphone line; click the recorder button again or press O to stop. The speech in the recorded section is transcribed, the resulting text is synesthetised, and the synesthetisation result is used to search for images with Chromatik.

  3. If you're unwilling or unable to speak to Kaleidoscope through a microphone, use the two text fields below the recorder button in the controls window. The upper one is for a message to synesthetise, the lower one for search terms for Chromatik's image search. You can perform a synesthetisation and image search by focusing the upper text field and pressing ENTER/RETURN.

  4. F11 (Windows/Linux) or ⌘+F (macOS) toggles full screen display of Kaleidoscope.

Configuration

The most convenient way to configure Kaleidoscope is through the integrated configuration editor:

  1. Click on the Configuration window tool button at the bottom of the controls window.

  2. Double-click the cells in the right column of the configuration table to edit their value. Most importantly, add the required API keys (see below).

Below is a description of some of the most important options. All parameters except API keys are optional and have sensible defaults.

  • Flickr > access key and Transcription service > API access key

    API access keys for Flickr and Google, respectively. If a key consists of multiple parts, separate them with a colon (:). Example:

     e0b92403f258c35c6b43d2e21c640f9f:bd7a0f0bcc5dfc25
    

    If the value of either parameter starts with @, the remainder is interpreted as the path to a resource file containing the key, with multiple parts separated by newline characters.

     e0b92403f258c35c6b43d2e21c640f9f
     bd7a0f0bcc5dfc25
    

    The special parameter value !MOCK selects a mock implementation of the service connector, which doesn't use the actual service but returns a valid, pre-recorded result. For Google's speech-to-text service, mock mode can only be set through the key com.google.developer.api.key in the properties file KaleidoscopeApp.properties.

    Note: The above example does not contain a working key, just a randomly generated look-alike.
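The value formats above can be illustrated with a small parser. This is a minimal sketch, not Kaleidoscope's actual configuration code: the class name `ApiKeyValue` is hypothetical, and it assumes that the path after @ is a plain file-system path (Kaleidoscope may instead resolve it as a class-path resource).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that interprets an API key value in the formats described above. */
final class ApiKeyValue {
    /** Special value that selects the mock service connector. */
    static final String MOCK = "!MOCK";

    private ApiKeyValue() {}

    static boolean isMock(String value) {
        return MOCK.equals(value);
    }

    /**
     * Returns the key parts: colon-separated for inline values, or one part per
     * non-empty line for values of the form "@path" (assumed to be a file path).
     */
    static List<String> parts(String value) throws IOException {
        if (value.startsWith("@")) {
            List<String> parts = new ArrayList<>();
            for (String line : Files.readAllLines(Paths.get(value.substring(1)), StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    parts.add(trimmed);
                }
            }
            return parts;
        }
        // Inline value: parts are separated by colons.
        return Arrays.asList(value.split(":"));
    }
}
```

With the example value above, `ApiKeyValue.parts("e0b92403f258c35c6b43d2e21c640f9f:bd7a0f0bcc5dfc25")` yields the two parts, and a file containing the same two lines gives the same result when passed as `@` followed by its path.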


kaleidok-examples's Issues

Image downloads are slow

…for multiple reasons:

  1. always loading a fixed amount of images independently of how many are needed,
  2. sequential downloads,
  3. always choosing the image variant with the biggest pixel area, independently of file size.
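Point 3 could be addressed by choosing the smallest variant that is still large enough for the display. The sketch below is hypothetical (`ImageVariant` and `VariantChooser` are illustrative names); it uses pixel area as a stand-in for file size and treats a variant as large enough when both of its dimensions reach the target size, ignoring aspect-ratio cropping.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical description of one downloadable size of an image. */
final class ImageVariant {
    final int width;
    final int height;

    ImageVariant(int width, int height) {
        this.width = width;
        this.height = height;
    }

    long area() {
        return (long) width * height;
    }
}

final class VariantChooser {
    private VariantChooser() {}

    /**
     * Picks the variant with the smallest pixel area that still covers the target
     * size; if none does, falls back to the largest available variant.
     */
    static Optional<ImageVariant> choose(List<ImageVariant> variants, int targetWidth, int targetHeight) {
        Optional<ImageVariant> covering = variants.stream()
                .filter(v -> v.width >= targetWidth && v.height >= targetHeight)
                .min(Comparator.comparingLong(ImageVariant::area));
        return covering.isPresent()
                ? covering
                : variants.stream().max(Comparator.comparingLong(ImageVariant::area));
    }
}
```

For a 1000×700 target, variants of 240×180, 1024×768 and 2048×1536 would yield the 1024×768 one instead of the largest.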

Migrate to Java 1.8

Java 1.7 is no longer supported by Oracle, the OpenJDK project and current versions of Processing, so we need to switch to a newer version in the foreseeable future.

Preliminary tests show no serious issues when running KaleidOK with Processing 3.0a5 on OpenJDK 1.8/Linux. More program functions and platforms (esp. OracleJDK 1.8/OS X) need testing.

Remove text field

Remove the text field from the application output when speech transcription is functioning. An alternative is to print the transcription to the console for experimentation purposes.

Choose a more suitable indexed image archive

Chromatik doesn't appear to be a reliable image index, because of the lack of support from its operator, Exalead. Possible alternatives are:

  • Flickr
    • well supported and reliable
    • only 16 colours available and multiple colours per query (undocumented, but available)
    • search by (visual) types and categories (undocumented, but available)
  • Imagga multi-colour search
    • not yet available, but free access as soon as testing begins
    • intelligent colour clustering
    • full RGB range, up to 5 weighted colours per query
    • can distinguish between foreground and background colours
  • Picturelicious
    • requires self-hosting
    • only clandestine image archive use readily available
    • very simple indexing approach, no clustering
    • full RGB range and an arbitrary amount of colours per query

Keyword selection

The keyword selection picks weak keywords (be, in, as); it should pick the strongest ones.

Conflict between keyword specificity and empty query results

There is a conflict when using the more poetic words detected in the Synesketch lexicon: they are too abstract for Chromatik tags, so we end up with no results.

Please implement a compromise: issue queries with one or more keywords by default and, in the case of empty results, lower the keyword count and try again.
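The requested compromise can be sketched as a loop that retries with fewer keywords. This is a minimal illustration under assumptions: the keyword list is taken to be sorted by descending strength, and the search itself is a hypothetical function parameter standing in for the image service request.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class KeywordFallbackSearch {
    private KeywordFallbackSearch() {}

    /**
     * Queries with the first maxKeywords keywords; on an empty result, drops the
     * weakest remaining keyword and tries again, down to a single keyword.
     */
    static <R> List<R> search(List<String> keywords, int maxKeywords,
                              Function<List<String>, List<R>> query) {
        for (int n = Math.min(maxKeywords, keywords.size()); n >= 1; n--) {
            List<R> results = query.apply(keywords.subList(0, n));
            if (!results.isEmpty()) {
                return results;
            }
        }
        return Collections.emptyList();
    }
}
```

Whether a final query without any keywords (colours only) should follow is left open here, as the issue does not specify it.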

Add descriptions

Add short descriptions in the form of a README file to each library and sketch.

Automatic, threshold-based speech interval detection

Automatic speech detection is needed, or an alternative if it conflicts with something else. It is too difficult to press In and Out to record (or just In, now that the 8-second cut-off is implemented).
Pressing the record In key once to start a session would be sufficient; could it then loop-record every 8 seconds until the user presses a final Out key when finished?
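A threshold-based detector could look roughly like the sketch below. It assumes mono float samples in [-1, 1] and uses frame-wise RMS energy; the threshold, frame length and "hangover" (quiet frames tolerated inside a segment) are hypothetical tuning parameters. Segments are also closed at a maximum length, which gives the loop-recording behaviour described above (e.g. 8 seconds' worth of samples).

```java
import java.util.ArrayList;
import java.util.List;

final class SpeechIntervalDetector {
    /** Half-open interval [start, end) in sample indices. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private SpeechIntervalDetector() {}

    static double rms(float[] s, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += s[i] * s[i];
        }
        return Math.sqrt(sum / Math.max(1, to - from));
    }

    /** Segment length may exceed maxSegmentSamples by less than one frame. */
    static List<Interval> detect(float[] samples, int frameLength, double threshold,
                                 int hangoverFrames, int maxSegmentSamples) {
        List<Interval> result = new ArrayList<>();
        int start = -1;        // first sample of the open segment, or -1 if none is open
        int lastLoudEnd = -1;  // end of the most recent frame above the threshold
        int quietFrames = 0;   // consecutive quiet frames inside the open segment
        for (int from = 0; from < samples.length; from += frameLength) {
            int to = Math.min(from + frameLength, samples.length);
            boolean loud = rms(samples, from, to) >= threshold;
            if (loud) {
                if (start < 0) {
                    start = from;
                }
                lastLoudEnd = to;
                quietFrames = 0;
            } else if (start >= 0) {
                quietFrames++;
            }
            // Close the segment after too much silence or when it reaches the maximum length.
            if (start >= 0 && (quietFrames > hangoverFrames || to - start >= maxSegmentSamples)) {
                result.add(new Interval(start, lastLoudEnd));
                start = -1;
                quietFrames = 0;
            }
        }
        if (start >= 0) {
            result.add(new Interval(start, lastLoudEnd));
        }
        return result;
    }
}
```

Each returned interval could then be sent to the transcription service as a separate record.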

Images for neutral emotions are too colourful

Emotionally neutral phrases are mapped to grey colours and associated images are supposed to be grey and of low colour saturation. At the moment they turn out too colourful.

One remedy would be to raise the influence of the (grey) colours on the image search, but this reduces the result set size (sometimes drastically), which is undesirable since we need to rely on keywords alone instead of emotions.

An alternative solution is to desaturate these images at some point between retrieval and display.
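Desaturation between retrieval and display could be done per pixel by blending each colour towards its grey value. This is a sketch under assumptions: it uses the Rec. 601 luma weights for the grey value, operates on a `java.awt.image.BufferedImage`, and the blend `amount` (0 keeps the colours, 1 makes the image fully grey) is a hypothetical parameter that would only be applied to images for emotionally neutral phrases.

```java
import java.awt.image.BufferedImage;

final class Desaturate {
    private Desaturate() {}

    /** Blends an ARGB pixel towards its grey value; alpha is preserved. */
    static int pixel(int argb, double amount) {
        amount = Math.max(0, Math.min(1, amount)); // keep channels within 0..255
        int a = argb >>> 24;
        int r = (argb >> 16) & 0xff;
        int g = (argb >> 8) & 0xff;
        int b = argb & 0xff;
        double grey = 0.299 * r + 0.587 * g + 0.114 * b; // Rec. 601 luma weights
        r = (int) Math.round(r + (grey - r) * amount);
        g = (int) Math.round(g + (grey - g) * amount);
        b = (int) Math.round(b + (grey - b) * amount);
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    /** Desaturates an image in place. */
    static void apply(BufferedImage image, double amount) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                image.setRGB(x, y, pixel(image.getRGB(x, y), amount));
            }
        }
    }
}
```

A partial amount such as 0.7 would keep a hint of the original colours while still matching the grey palette.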

Audio analysis is incompatible with speech transcription

Due to limitations of the libraries (TarsosDSP and Speech-to-Text) performing these two tasks, they currently cannot be performed simultaneously.

A simple solution is to implement the speech transcription (service) as an AudioSource instance. Because of STT's architecture, this will very likely require a rewrite, though I expect to retain the large majority of its crucial code fragments. Unfortunately, the author has not released the source code of the current version of STT. If he doesn't respond positively to our request, a rewrite based on earlier source code and decompiled Java byte code should be possible, even though it would be more tedious.

Speech transcription with Sphinx takes a long time

The transcription duration could be much better (at least on my machine), but it's about on par with Google's transcription service considering network latency. This may be tweaked with smaller training sets and specialised configuration parameters at the cost of some quality, which seems to be alright at the moment. It looks like a lot of trial and error will be necessary for that.

Resources:

Recording not enabled when on Full Screen mode

KaleidOK recording cannot be started when running the sketch in full screen mode on internal or external displays. Also, to exit full screen mode I must quit the application. There used to be a shortcut for getting out of full screen mode.

Screenshot export

It would be nice to have the option to make “posters” of the sketch overlaid with the supplied text message.

Possible options to distribute the resulting poster:

  • save as a file and handle the distribution manually (copy to medium or one of the options below),
  • print,
  • send as e-mail attachment,
  • upload to a web site.
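The first distribution option (saving to a file) could start from a sketch like the one below. It assumes the current frame is available as a `java.awt.image.BufferedImage` (how to obtain it from the sketch is not covered here); `PosterExport`, the font size and the text placement are hypothetical choices, and long messages are not wrapped.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class PosterExport {
    private PosterExport() {}

    /** Returns a copy of the frame with the message drawn centred near the bottom edge. */
    static BufferedImage compose(BufferedImage frame, String message) {
        BufferedImage poster = new BufferedImage(frame.getWidth(), frame.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = poster.createGraphics();
        try {
            g.drawImage(frame, 0, 0, null);
            g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
            g.setFont(new Font(Font.SANS_SERIF, Font.BOLD, Math.max(12, frame.getHeight() / 20)));
            FontMetrics fm = g.getFontMetrics();
            int x = (frame.getWidth() - fm.stringWidth(message)) / 2;
            int y = frame.getHeight() - fm.getDescent() - frame.getHeight() / 20;
            g.setColor(Color.WHITE);
            g.drawString(message, x, y);
        } finally {
            g.dispose();
        }
        return poster;
    }

    /** Saves the poster as a PNG file; returns false if no PNG writer is available. */
    static boolean save(BufferedImage poster, File file) throws IOException {
        return ImageIO.write(poster, "png", file);
    }
}
```

The saved file could then be printed, attached to an e-mail or uploaded by separate code.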

Give meaning to Foobar Layer

From @Disastergirl on May 26, 2015 12:47

Options:

  • Perlin Noise here could be replaced by environmental noise, "Rustle Noise", consisting of random pulses characterised by the rustle time (the mean interval between pulses), with little or no pitch.
  • Using the clock-face tonality scale, the movement of the triangle strip would transform
  • Harmonics?
  • more coming…

Copied from original issue: Disastergirl/Kaleidoscope#7
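Rustle noise as described in the first option can be sketched as a random pulse train. This assumes that the pulse spacing is exponentially distributed with the rustle time as its mean (a Poisson process), which is one common way to model it, and that pulse amplitudes are uniform in [-1, 1); both are assumptions, as is how the result would drive the Foobar layer.

```java
import java.util.Random;

final class RustleNoise {
    private RustleNoise() {}

    /**
     * Returns a buffer of random pulses whose spacing is exponentially distributed
     * with mean rustleTime (in samples); all other samples are zero.
     */
    static float[] generate(int length, double rustleTime, Random random) {
        if (rustleTime <= 0) {
            throw new IllegalArgumentException("rustleTime must be positive");
        }
        float[] out = new float[length];
        // 1 - nextDouble() lies in (0, 1], so the logarithm is finite and the interval non-negative.
        double t = -rustleTime * Math.log(1 - random.nextDouble());
        while (t < length) {
            out[(int) t] = (float) (random.nextDouble() * 2 - 1); // random polarity and amplitude
            t += -rustleTime * Math.log(1 - random.nextDouble());
        }
        return out;
    }
}
```

With 100,000 samples and a rustle time of 100 samples, about 1,000 pulses are expected.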

Image size

Retrieved images are thumbnails, which are too small for this application. Images should be retrieved at full size and resized to cover the screen.

Can we make use of the Flickr API as alternative to using Chromatik?

  • Would it be more efficient?
  • Would this open up more opportunities for KaleidOK presentations?
  • We could simplify the colour palette by hand or automatically to work with the limited colour set in Flickr.
    If we use the current automatic procedure, there's no guarantee that it will be representative.
  • Could we make use of the other features (Orientation, Size, Texture, Patterns and Tags) in our search query?
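Automatically simplifying a palette to a limited colour set could be done by mapping each colour to its nearest palette entry. This sketch uses squared Euclidean distance in RGB, which is a simple but not perceptually uniform choice (a space like CIELAB would match human perception better); the palette passed in is a placeholder, not Flickr's actual colour list.

```java
final class PaletteMapper {
    private PaletteMapper() {}

    /** Returns the index of the palette colour (0xRRGGBB) closest to rgb, or -1 for an empty palette. */
    static int nearest(int rgb, int[] palette) {
        int r = (rgb >> 16) & 0xff;
        int g = (rgb >> 8) & 0xff;
        int b = rgb & 0xff;
        int best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (int i = 0; i < palette.length; i++) {
            int dr = r - ((palette[i] >> 16) & 0xff);
            int dg = g - ((palette[i] >> 8) & 0xff);
            int db = b - (palette[i] & 0xff);
            long distance = (long) dr * dr + (long) dg * dg + (long) db * db;
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }
}
```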

Thread for every image result

From @Disastergirl on May 26, 2015 11:37

Threading is needed to continue the animation while the requests for data continue to run.
Create a bunch of "worker threads" and have those worker threads go through a common job queue.

  • be aware of array lengths
  • bring in additional variables
  • set a new boolean to avoid errors

Copied from original issue: Disastergirl/Kaleidoscope#5
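In Java, "worker threads going through a common job queue" is what `Executors.newFixedThreadPool` provides. The sketch below is a hypothetical `BackgroundLoader`: the job function (for example an image download) and the result type are placeholders. Finished results are handed to the animation loop through a thread-safe queue, which the draw loop polls once per frame, instead of shared arrays and boolean flags.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class BackgroundLoader<T, R> {
    // A fixed pool keeps its pending jobs in a single shared queue.
    private final ExecutorService workers;
    private final Queue<R> finished = new ConcurrentLinkedQueue<>();
    private final Function<T, R> job;

    BackgroundLoader(int threads, Function<T, R> job) {
        this.workers = Executors.newFixedThreadPool(threads);
        this.job = job;
    }

    /** Enqueues one job per item and returns immediately, so the animation keeps running. */
    void submitAll(List<T> items) {
        for (T item : items) {
            workers.execute(() -> {
                R result = job.apply(item);
                if (result != null) {
                    finished.add(result); // ConcurrentLinkedQueue does not accept null
                }
            });
        }
    }

    /** Non-blocking; returns null if no result is ready yet. */
    R poll() {
        return finished.poll();
    }

    /** Stops accepting jobs and waits for running ones; returns false on timeout. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeout, unit);
    }
}
```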

Regex Extract Noun

From @Disastergirl on May 26, 2015 11:23

Use a sentence chunker or tree parser for extracting noun phrases to enhance contextual result. Either use this to replace Max Keyword or in combination (also as "any of these words"). *This needs to be conceptualised further.

Copied from original issue: Disastergirl/Kaleidoscope#2
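As a simpler alternative to a full tree parser, noun phrases can be chunked with a regular expression over part-of-speech tags. This sketch assumes the text has already been tagged with Penn Treebank tags in `word/TAG` form by some tagger (not shown); each tag is reduced to one letter so that a pattern like `D?J*N+` (optional determiner, adjectives, one or more nouns) can be matched and mapped back to words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NounPhraseChunker {
    // Noun phrase: optional determiner, any number of adjectives, one or more nouns.
    private static final Pattern NP = Pattern.compile("D?J*N+");

    private NounPhraseChunker() {}

    private static char classify(String tag) {
        if (tag.equals("DT")) return 'D';
        if (tag.startsWith("JJ")) return 'J';
        if (tag.startsWith("NN")) return 'N';
        return 'O';
    }

    /** Extracts noun phrases from tokens of the form "word/TAG". */
    static List<String> extract(List<String> tokens) {
        List<String> words = new ArrayList<>();
        StringBuilder classes = new StringBuilder(); // one letter per token
        for (String token : tokens) {
            int slash = token.lastIndexOf('/');
            words.add(slash < 0 ? token : token.substring(0, slash));
            classes.append(classify(slash < 0 ? "" : token.substring(slash + 1)));
        }
        List<String> phrases = new ArrayList<>();
        Matcher m = NP.matcher(classes);
        while (m.find()) {
            phrases.add(String.join(" ", words.subList(m.start(), m.end())));
        }
        return phrases;
    }
}
```

For "the/DT dark/JJ ocean/NN waves/NNS crash/VBP on/IN rocks/NNS" this yields "the dark ocean waves" and "rocks".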

Translate Transcription

In which situation which text is supposed to be translated?

A user speaks in their language, this is then transcribed to text and then the text is translated into English. This translated English text is then used for the remaining processes.

How should the application determine, from which language to translate or whether there is translation necessary at all?

If automatic language detection is not possible (as is the case when using Google Translate), then we would have to manually pre-determine which language the upcoming users will speak. Ideally, any language other than English would be detected and translated into English, unless we translate the whole lexicon.

Are there any dependencies or interactions with the other speech or text processing steps already performed?

Not sure.

How simple does an explicit language switch need to be during operation? I'll try to suggest some possible user interface designs with different levels of required implementation effort as questions:

  • Is an application restart for each language switch feasible?
  • Is an obscure (but documented) key combination with interaction feedback in the terminal output sufficient?
  • If neither is the case, do you see yourself capable of designing and/or implementing a user interface element fit to the task?

For this state an application restart is feasible. I just want to try it out.
Key combination would be great.
I'm not even thinking of a user interface for the task until it has been tried out.

I'd like to do interface design for all optional features at a later stage in the project, most likely next year.

Choose a redistribution license

While preparing a distributable package for KaleidOK I looked for a license under which we can distribute it. For this we must find a license that is compatible with the licenses of all the libraries that are distributed together with KaleidOK. At the moment we use runtime libraries with the following licenses¹,² (emphasis on the most frequently used):

  • AGPL v3 (only iText for PDF output)
  • GPL v3 (e. g. TarsosDSP)
  • GPL v2+ (e. g. Synesketch)
  • Apache License v2
  • BSD License (e. g. JOGL 2, the graphics rendering library)
  • MIT License

If we want to maintain most of KaleidOK's current functionality we're left with two options:

  1. the Affero General Public License, version 3, the most restrictive license, at the top of the list.

    Its additional restrictions over the (non-Affero) GPL v3 are unlikely to affect the future use of KaleidOK itself³, but they make (partial) integration into other works much more difficult because they force those works to be released under a less common and very restrictive license themselves.

  2. Alternatively we can choose GPL v3 if we remove or "decouple" the PDF export (which is not in a finished, usable state anyway).

    A possible option for decoupling is to leave the source code for PDF export in the source code repository (which doesn't include iText) but remove it and iText from the stand-alone application package.

I prefer option 2 even though it requires some work and research.


¹ The list is sorted according to the “hierarchy of license compatibility”, i. e. works published under a license appearing lower on the list may be included in or used to derive works published under a license appearing higher on the list. This correlates roughly with the restrictiveness of the license.

² There are some libraries published under LGPL variants (e. g. Processing Core and Minim) which may be bundled and redistributed (as easily separable entities with attribution and source code access) together with works under any or no license.

³ “[Compared to the GPL the AGPL] has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there. The purpose of the GNU Affero GPL is to prevent a problem that affects developers of free programs that are often used on servers.” (source and example)

User record feedback

Please build something that gives the user feedback about the recorder state without being displayed on the sketch.

Introduce some fuzz to frequently reappearing Chromatik requests

We have some frequently reoccurring images due to equally frequently reoccurring search requests to Chromatik. From experience we know that these requests usually result from emotionally completely neutral phrases with grey colours and no keywords.

To prevent the most frequently reoccurring images, we should introduce some fuzz into either the search requests or the selection of result subsets. E. g., we could randomize the value of the start parameter of the search to something different than 0 in these cases.
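The randomised start parameter could be chosen as in this sketch. It is hypothetical: it treats a request without keywords as the "neutral" case, and `maxStart` is an assumed upper bound that should not exceed the number of results the service actually returns for such a query.

```java
import java.util.List;
import java.util.Random;

final class SearchFuzz {
    private SearchFuzz() {}

    /** Start offset for a search: 0 normally, random in [0, maxStart] for keyword-less requests. */
    static int startOffset(List<String> keywords, int maxStart, Random random) {
        if (!keywords.isEmpty() || maxStart <= 0) {
            return 0;
        }
        return random.nextInt(maxStart + 1);
    }
}
```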

Speech transcription fails with records of more than ~8 seconds

When sending speech records longer than roughly 8 seconds to Google's speech-to-text web service, it returns an error. No cut-off to a suitable length is performed.

Kaleidoscope should work around this limitation and cut off the recording after a suitable interval.

A preliminary implementation of such a feature is available in b508abd (branch feature/restrict-speech-record-duration) and requires further testing before merging into master.
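A cut-off by duration needs the byte length of a given number of seconds of audio. The sketch below is not the implementation from b508abd; it assumes the recording is a raw PCM byte array described by a `javax.sound.sampled.AudioFormat`, and the limit (for example slightly below 8 seconds, as a safety margin) is a hypothetical parameter.

```java
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;

final class RecordLimit {
    private RecordLimit() {}

    /** Number of bytes in maxSeconds of audio, rounded down to whole frames. */
    static long maxBytes(AudioFormat format, double maxSeconds) {
        if (format.getFrameSize() <= 0 || format.getFrameRate() <= 0) {
            throw new IllegalArgumentException("frame size and frame rate must be specified");
        }
        long frames = (long) Math.floor(format.getFrameRate() * maxSeconds);
        return frames * format.getFrameSize();
    }

    /** Returns the recording unchanged if short enough, otherwise a copy cut off at the limit. */
    static byte[] cutOff(byte[] pcm, AudioFormat format, double maxSeconds) {
        long limit = maxBytes(format, maxSeconds);
        return pcm.length <= limit ? pcm : Arrays.copyOf(pcm, (int) limit);
    }
}
```

For 16 kHz, 16-bit mono audio, 8 seconds correspond to 16000 × 8 × 2 = 256,000 bytes.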

Full screen display and multi-screen support

Please add support for the following features:

  • Full screen mode of the application for presentation purposes.
  • Moving the window to a different screen without (permanent) graphics glitches.
  • Possibly spanning the sketch over multiple screens.

Image downloads are slow

Currently, it takes a long time to download images, because

  1. they're downloaded sequentially and
  2. each probed image size requires a new request.

We can solve or alleviate this through the following means:

  1. Issue requests and transfer data concurrently (preferably outside the UI event handler thread).
  2. Query the available image sizes with a single request through the Flickr API (requires developer account and an API key).
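Flickr's REST API offers the method `flickr.photos.getSizes`, which lists all available sizes of a photo in one response. The sketch below only builds the request URL (response parsing and the HTTP call are omitted); the API key and photo ID are placeholders, and such requests could be issued concurrently from a worker pool.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FlickrSizesRequest {
    static final String ENDPOINT = "https://api.flickr.com/services/rest/";

    private FlickrSizesRequest() {}

    /** Builds a flickr.photos.getSizes request URL asking for a plain JSON response. */
    static String url(String apiKey, String photoId) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("method", "flickr.photos.getSizes");
        params.put("api_key", apiKey);
        params.put("photo_id", photoId);
        params.put("format", "json");
        params.put("nojsoncallback", "1");
        StringBuilder sb = new StringBuilder(ENDPOINT);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```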
