davidfoerster / kaleidok-examples

KaleidOk invites participants to use a new kind of interactive media tool and take part in an emerging experience that explores speech recognition, media retrieval and visual generation in a collaborative context (between people, and between people and machines).

Home Page: http://www.kaleidok.co/

Java 99.96% Batchfile 0.01% Shell 0.03%

java processing-library synesthesia art speech-processing linguistics affective-computing

kaleidok-examples's Introduction

Kaleidoscope

Note: Image discovery and retrieval no longer works because Chromatik, the web service behind this function, has shut down.

Setup

Request a Flickr API key and see the Configuration section for where to put it so that Kaleidoscope can find it.
Similarly, request a Google API access key and provide it to Kaleidoscope.
Prepare a configuration for Kaleidoscope.
Some libraries need to be set up, but I'm not going into that now.

Usage

Run KaleidoscopeApp as a stand-alone Java application or use the included IntelliJ run configuration.
Click the recorder button in the controls window or press I to start recording from the default microphone line; click the recorder button again or press O to stop. The speech in the recorded section is transcribed, the resulting text is synesthetised, and the synesthetiation result is used to search for images with Chromatik.
If you're unwilling or unable to speak to Kaleidoscope through a microphone, use the two text fields below the recorder button in the controls window. The upper one is for a message to synesthetise, the lower one for search terms for Chromatik's image search. You can perform a synesthetiation and image search by pressing ENTER/RETURN while the upper text field is in focus.
F11 (Windows/Linux) or ⌘+F (macOS) toggles full screen display of Kaleidoscope.

Configuration

The most convenient way to configure Kaleidoscope is through the integrated configuration editor:

Click on the Configuration window tool button at the bottom of the controls window.
Double-click the cells in the right column of the configuration table to edit their value. Most importantly, add the required API keys (see below).

Below is a description of some of the most important options. All parameters except API keys are optional and have sensible defaults.

Flickr > access key and Transcription service > API access key

API access keys for Flickr and Google respectively. If the key consists of multiple parts, separate them with a colon (:). Example:
```
 e0b92403f258c35c6b43d2e21c640f9f:bd7a0f0bcc5dfc25
```
If the value of one of these parameters starts with @, the remainder is interpreted as the path to a resource file containing the key, where multiple parts are separated by newline characters.
```
 e0b92403f258c35c6b43d2e21c640f9f
 bd7a0f0bcc5dfc25
```
The special parameter value !MOCK selects a mock implementation of the service connectors that doesn't use the actual service but returns a valid, pre-recorded result. For Google's speech-to-text service, the mock mode can only be set through the key com.google.developer.api.key in the properties file KaleidoscopeApp.properties.

Note: The above example does not contain a working key, just a randomly generated look-alike.
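
The value formats described in this section could be parsed roughly as follows. This is a minimal sketch, not Kaleidoscope's actual parser: the class and method names are hypothetical, and it assumes that the path after @ is a plain file path, whereas the text above only speaks of a "resource file".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of Kaleidoscope's code base. */
final class ApiKeyValue {
  static final String MOCK = "!MOCK";

  private ApiKeyValue() {}

  /** Returns true if the value requests the mock service connector. */
  static boolean isMock(String value) {
    return MOCK.equals(value.trim());
  }

  /** Splits an inline value such as "part1:part2" into its parts. */
  static List<String> parseInline(String value) {
    return Arrays.asList(value.trim().split(":"));
  }

  /**
   * Resolves a parameter value to its key parts. A leading '@' means the
   * remainder is a path (assumption: a plain file path) to a file holding
   * one key part per line; otherwise the value is split at colons.
   */
  static List<String> resolve(String value) throws IOException {
    String v = value.trim();
    if (!v.startsWith("@")) {
      return parseInline(v);
    }
    List<String> parts = new ArrayList<>();
    for (String line : Files.readAllLines(Paths.get(v.substring(1)), StandardCharsets.UTF_8)) {
      String part = line.trim();
      if (!part.isEmpty()) {
        parts.add(part); // blank lines are skipped
      }
    }
    return parts;
  }
}
```

Callers would check `isMock` first and only call `resolve` for real keys.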

kaleidok-examples's Issues

Image downloads are slow

…for multiple reasons:

always loading a fixed amount of images independently of how many are needed,
sequential downloads,
always choosing the image variant with the biggest pixel area, independently of file size.
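
One way to address the last point is to pick the smallest variant that still covers the display area instead of the largest one. The following is only a sketch under assumptions: `ImageVariant` is a hypothetical value class, and pixel area stands in for file size, which the image service may or may not report.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical description of one size variant of an image. */
final class ImageVariant {
  final String url;
  final int width;
  final int height;

  ImageVariant(String url, int width, int height) {
    this.url = url;
    this.width = width;
    this.height = height;
  }

  long area() {
    return (long) width * height;
  }

  /**
   * Picks the smallest variant that is at least as large as the target in
   * both dimensions. If none is large enough, falls back to the largest one.
   */
  static Optional<ImageVariant> choose(List<ImageVariant> variants, int targetWidth, int targetHeight) {
    Comparator<ImageVariant> byArea = Comparator.comparingLong(ImageVariant::area);
    Optional<ImageVariant> covering = variants.stream()
        .filter(v -> v.width >= targetWidth && v.height >= targetHeight)
        .min(byArea);
    return covering.isPresent() ? covering : variants.stream().max(byArea);
  }
}
```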

Migrate to Java 1.8

Java 1.7 is not supported any longer by Oracle, the OpenJDK project and current versions of Processing, so we need to switch to a newer version in the foreseeable future.

Preliminary tests show no serious issues when running KaleidOK with Processing 3.0a5 on OpenJDK 1.8/Linux. More program functions and platforms (esp. OracleJDK 1.8/OS X) need testing.

Only detecting Neutral or Confidence

For at least 20 tries, only neutral or confidence has been detected, no matter what I say.

Remove text field

Remove the text field from the application output when speech transcription is functioning. An alternative is to print the transcription to the console for experimentation purposes.

Choose a more suitable indexed image archive

Chromatik doesn't appear to be a reliable image index, because of the lack of support by its operator Exalead. Possible alternatives are:

Flickr
- well supported and reliable
- only 16 colours available and multiple colours per query (undocumented, but available)
- search by (visual) types and categories (undocumented, but available)
Imagga multi-colour search
- not yet available, but free access, as soon as testing begins
- intelligent colour clustering
- full RGB range, up to 5 weighted colours per query
- can distinguish between foreground and background colours
Picturelicious
- requires self-hosting
- only clandestine image archive use readily available
- very simple indexing approach, no clustering
- full RGB range and an arbitrary amount of colours per query

Keyword selection

The keyword selection currently picks weak keywords (be, in, as); it should select the strongest ones instead.

Conflict between keyword specificity and empty query results

There is a conflict when more poetic words are detected: they are in the Synesketch lexicon but are too abstract for Chromatik tags, so we end up with no results.

Please implement a compromise: issue queries with one or more keywords by default and, in the case of empty results, lower the keyword count and try again.
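
The requested fallback could look roughly like this. It is a sketch under assumptions: the keyword list is taken to be ordered from strongest to weakest, and `query` is a hypothetical stand-in for the actual image search call.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper; the names are not taken from the KaleidOK sources. */
final class KeywordFallback {
  private KeywordFallback() {}

  /**
   * Queries with up to maxKeywords keywords. While the result is empty, the
   * last (weakest) keyword is dropped and the query is repeated; the final
   * attempt uses no keywords at all.
   */
  static <R> List<R> search(List<String> keywords, int maxKeywords,
      Function<List<String>, List<R>> query) {
    for (int n = Math.min(maxKeywords, keywords.size()); n >= 0; n--) {
      List<R> results = query.apply(keywords.subList(0, n));
      if (!results.isEmpty()) {
        return results;
      }
    }
    return Collections.emptyList();
  }
}
```

Each retry costs one more request, so a small `maxKeywords` keeps the worst case short.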

Background Texture: Noise, Gradient or Hooloovoo

From @Disastergirl on May 26, 2015 11:33

Develop a concept and find a solution for Perlin noise generation based on emotion or vocal features. Alternatively or provisionally, use a gradient or Hooloovoo in the background.

Copied from original issue: Disastergirl/Kaleidoscope#4

Curved tips of spikes on the FFT - Spectrogram Layer

From @Disastergirl on May 26, 2015 11:27

The tips of the spikes have square ends. Ideally these tips would be curved or rounded, or pointed as they were with the Tarsos FFT.

Copied from original issue: Disastergirl/Kaleidoscope#3

Document Applet parameters

f26ea52 exposed many configuration settings as Applet parameters. To date they're not documented in either README.md or the return value of Applet#getParameterInfo().

Add descriptions

Add short descriptions in the form of a README file to each library and sketch.

Automatic, threshold-based speech interval detection

Automatic speech detection is needed, or an alternative if it conflicts with something else. It is too difficult to operate the record In and Out keys (or just In, now that the 8-second cut-off is implemented).
Pressing the record In key once to start a session would be sufficient. Could it then loop-record every 8 seconds until a final Out key press when the user is finished?
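
A threshold-based detector can be sketched as an energy gate over audio frames. Everything here is an assumption for illustration: the class name, samples normalised to [-1, 1], and the idea of tolerating a few quiet frames before closing an interval. It is not wired to TarsosDSP or the transcription service.

```java
/** Hypothetical energy gate; parameter values need tuning on real recordings. */
final class EnergyGate {
  private final double threshold;   // RMS level at or above which a frame counts as speech
  private final int hangoverFrames; // quiet frames tolerated inside a speech interval
  private int quietFrames;
  private boolean speaking;

  EnergyGate(double threshold, int hangoverFrames) {
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  /** Root mean square of one frame of samples. */
  static double rms(float[] frame) {
    if (frame.length == 0) {
      return 0;
    }
    double sum = 0;
    for (float s : frame) {
      sum += (double) s * s;
    }
    return Math.sqrt(sum / frame.length);
  }

  /** Feeds one frame and returns true while inside a speech interval. */
  boolean accept(float[] frame) {
    if (rms(frame) >= threshold) {
      speaking = true;
      quietFrames = 0;
    } else if (speaking && ++quietFrames > hangoverFrames) {
      speaking = false; // too many quiet frames in a row: the interval ends
      quietFrames = 0;
    }
    return speaking;
  }
}
```

A transition of `accept` from false to true would start a recording, and the opposite transition would stop it and trigger transcription.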

Images for neutral emotions are too colourful

Emotionally neutral phrases are mapped to grey colours and associated images are supposed to be grey and of low colour saturation. At the moment they turn out too colourful.

One remedy would be to raise the influence of the (grey) colours on the image search but this reduces the result set size (sometimes drastically) which is undesirable since we need to rely on key words alone instead of emotions.

An alternative solution is to desaturate these images at some point between retrieval and display.
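
Such a desaturation step could operate on packed ARGB pixels, for example on the `pixels` array of a Processing `PImage` after `loadPixels()`. The sketch below is an assumption about how this might be done, not existing KaleidOK code; the class name and the Rec. 601 luma weights are choices made for illustration.

```java
/** Hypothetical pixel-level desaturation helper. */
final class Desaturate {
  private Desaturate() {}

  /**
   * Blends one ARGB pixel towards its grey value.
   * amount 0 leaves the pixel unchanged, amount 1 makes it fully grey.
   */
  static int pixel(int argb, float amount) {
    float t = Math.max(0f, Math.min(1f, amount));
    int a = argb >>> 24;
    int r = (argb >> 16) & 0xFF;
    int g = (argb >> 8) & 0xFF;
    int b = argb & 0xFF;
    float grey = 0.299f * r + 0.587f * g + 0.114f * b; // Rec. 601 luma weights
    int r2 = clamp(Math.round(r + (grey - r) * t));
    int g2 = clamp(Math.round(g + (grey - g) * t));
    int b2 = clamp(Math.round(b + (grey - b) * t));
    return (a << 24) | (r2 << 16) | (g2 << 8) | b2;
  }

  /** Desaturates all pixels of an ARGB array in place. */
  static void inPlace(int[] pixels, float amount) {
    for (int i = 0; i < pixels.length; i++) {
      pixels[i] = pixel(pixels[i], amount);
    }
  }

  private static int clamp(int v) {
    return Math.max(0, Math.min(255, v));
  }
}
```

The amount could be derived from the emotion weight, so that only images for neutral phrases are affected.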

Audio analysis is incompatible with speech transcription

Due to limitations of the libraries (TarsosDSP and Speech-to-Text) performing these two tasks, they currently cannot be performed simultaneously.

A simple solution to this is to implement the speech transcription (service) as an AudioSource instance. Because of STT's architecture, this will very likely require a rewrite, though I expect to retain the large majority of its crucial code fragments. Unfortunately the author has not released the source code of the current version of STT. If he doesn't react positively to our request, a rewrite based on earlier source code and decompiled Java byte code should be possible, even though it would be more tedious.

Display Images in Intervals with Transitions

From @Disastergirl on May 26, 2015 11:41

Each image should be displayed at different intervals so as to create continuity in the animation. Ideally, they will have a cross-fade/multiply transition.

Copied from original issue: Disastergirl/Kaleidoscope#6

Show emotion noun and its weight in console

As in the previous applications, show the emotion and its weighting in the console.

Speech transcription with Sphinx takes a long time

The transcription duration could be much better (at least on my machine), but it's about on par with Google's transcription service considering network latency. This may be tweaked with smaller training sets and specialised configuration parameters at the cost of some quality, which seems to be alright at the moment. It looks like a lot of trial and error will be necessary for that.

Resources:

The Incomplete Guide to Sphinx-3 Performance Tuning (outdated)
Tuning the performance (not yet written)
Performance Optimization for Sphinx 4 (not beginner-friendly, but seemingly useful nonetheless)
http://sourceforge.net/p/cmusphinx/discussion/search/?q=tune+performance
- How do you enhance performance by decreasing the running time? (looks quite promising)

Application parameters are suddenly case-sensitive

Recording not enabled when on Full Screen mode

Cannot commence kaleidOk recording when running the sketch in full screen mode on internal or external displays. Also, to exit full screen I must quit the application. There used to be a shortcut for getting out of full screen.

Max Keyword Limitation

From @Disastergirl on May 26, 2015 11:17

When increasing the number of keywords, match either any of these words or all of these words to increase the number of potential context-specific image results.

Copied from original issue: Disastergirl/Kaleidoscope#1

Build a graphical user interface for layer properties

Pan background images to cover the whole canvas area

See Pan & Scan.

Weird key collisions in on-disk HTTP cache

The hash function produces far more collisions than it should. It's probably an implementation bug in KeyHasher.
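
Until the bug in KeyHasher is found, a cryptographic digest of the request URL is one collision-resistant way to derive cache keys. This is a sketch of that alternative, not the project's KeyHasher; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical replacement for deriving on-disk cache keys. */
final class CacheKeys {
  private CacheKeys() {}

  /** Returns the SHA-256 digest of the URL as 64 lower-case hex digits. */
  static String forUrl(String url) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(url.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }
}
```

The hex string is safe to use as a file name on all common file systems.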

Screenshot export

It would be nice to have the option to make “posters” of the sketch overlaid with the supplied text message.

Possible options to distribute the resulting poster:

save as a file and handle the distribution manually (copy to medium or one of the options below),
print,
send as e-mail attachment,
upload to a web site.

Give meaning to Foobar Layer

From @Disastergirl on May 26, 2015 12:47

Options:

Perlin Noise here could be replaced by environmental noise- "Rustle Noise", including random pulses characterised by the rustle time (the mean interval between pulses), with little or no pitch.
Using the clock-face tonality scale, the movement of the triangle strip would transform
Harmonics?
more coming…

Copied from original issue: Disastergirl/Kaleidoscope#7

Image size

Retrieved images are thumbnails, too small for this application. Images should be at their full size, and resized to cover screen.
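
Resizing an image so that it covers the screen while keeping its aspect ratio comes down to taking the larger of the two scale factors. A minimal sketch, with hypothetical names and the assumption that the scaled image is centred on the canvas:

```java
/** Hypothetical helper for "cover" scaling of background images. */
final class CoverFit {
  private CoverFit() {}

  /** Scale factor at which the image covers the whole canvas. */
  static double scale(int imgW, int imgH, int canvasW, int canvasH) {
    return Math.max((double) canvasW / imgW, (double) canvasH / imgH);
  }

  /**
   * Returns {x, y, width, height} of the scaled image, centred on the canvas.
   * x or y is negative when the image overflows the canvas on that axis.
   */
  static double[] rect(int imgW, int imgH, int canvasW, int canvasH) {
    double s = scale(imgW, imgH, canvasW, canvasH);
    double w = imgW * s;
    double h = imgH * s;
    return new double[] {(canvasW - w) / 2, (canvasH - h) / 2, w, h};
  }
}
```

Thumbnails scaled this way will still look blurry, so the full-size variants are needed in any case.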

Can we make use of the Flickr API as alternative to using Chromatik?

Would it be more efficient?
Would this open up more opportunities for KaleidOK presentations?
We could simplify the colour palette by hand or automatically to work with the limited colour set in Flickr.
If we use the current automatic procedure there's no guarantee that it will be representative.
Could we make use of the other features; Orientation, Size, Texture, Patterns and Tags in our search query?

Move user control elements to separate window for unobstructed visual presentation

We've been working on moving the user control elements into a separate window from the presentation in feature/control-frame. The work is unfinished and both the visual and behavioural design could use some improvement. This is mostly a design and to a lesser extent a technical issue.

Log program output and audio records for later analysis

This might facilitate research with data from user tests and even make them reproducible to some extent.

Thread for every image result

From @Disastergirl on May 26, 2015 11:37

Threading is needed to continue the animation while the requests for data continue to run.
Create a bunch of "worker threads" and have those worker threads go through a common job queue.

be aware of array lengths
bring in additional variables
set a new boolean to avoid errors
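
The worker-thread idea can be sketched with a fixed thread pool, whose workers share one job queue, and a second queue for finished results that the animation thread polls once per frame without blocking. The class and method names are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical background loader for image requests. */
final class BackgroundLoader<T> {
  private final ExecutorService pool;
  private final Queue<T> finished = new ConcurrentLinkedQueue<>();

  BackgroundLoader(int workers) {
    pool = Executors.newFixedThreadPool(workers);
  }

  /** Queues a job; its result becomes available through poll(). */
  void submit(Callable<T> job) {
    pool.execute(() -> {
      try {
        finished.add(job.call());
      } catch (Exception e) {
        // A failed job is skipped here; real code should log the error.
      }
    });
  }

  /** Non-blocking; returns the next finished result or null if none is ready. */
  T poll() {
    return finished.poll();
  }

  /** Stops accepting jobs and waits for running ones; returns false on timeout. */
  boolean shutdownAndWait(long millis) {
    pool.shutdown();
    try {
      return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

In a Processing sketch, `draw()` would call `poll()` each frame and swap in any image that has arrived, so the animation never waits for the network.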

Copied from original issue: Disastergirl/Kaleidoscope#5

Regex Extract Noun

From @Disastergirl on May 26, 2015 11:23

Use a sentence chunker or tree parser for extracting noun phrases to enhance contextual result. Either use this to replace Max Keyword or in combination (also as "any of these words"). *This needs to be conceptualised further.

Copied from original issue: Disastergirl/Kaleidoscope#2

Translate Transcription

In which situation is which text supposed to be translated?

A user speaks in their language, this is then transcribed to text and then the text is translated into English. This translated English text is then used for the remaining processes.

How should the application determine from which language to translate or whether translation is necessary at all?

If automatic language detection is not possible (as is the case when using Google Translate), then we would have to manually pre-determine which language will be used by the upcoming users. Ideally any language other than English would be detected and translated to English. Unless we translate the whole lexicon.

Are there any dependencies or interactions with the other speech or text processing steps already performed?

Not sure.

How simple does an explicit language switch need to be during operation? I'll try to suggest some possible user interface designs with different levels of required implementation effort as questions:

Is an application restart for each language switch feasible?

Is an obscure (but documented) key combination and an interaction feedback in the terminal output sufficient?

If neither is the case, do you see yourself capable of designing and/or implementing a user interface element fit to the task?

For this state an application restart is feasible. I just want to try it out.
Key combination would be great.
I'm not even thinking of a user interface for the task until it has been tried out.

--I'd like to do interface design for all optionable features at a later stage in the project.. next year most likely

Create stand-alone application packages of KaleidOK

Choose a redistribution license

While preparing a distributable package for KaleidOK I looked for a license under which we can distribute it. For this we must find a license that is compatible with all the licenses of the libraries that are distributed together with KaleidOK. At the moment we use runtime libraries with the following licenses¹ ² (emphasis on the most frequently used):

AGPL v3 (only iText for PDF output)
GPL v3 (e. g. TarsosDSP)
GPL v2+ (e. g. Synesketch)
Apache License v2
BSD License (e. g. JOGL 2, the graphics rendering library)
MIT License

If we want to maintain most of KaleidOK's current functionality we're left with two options:

The Affero General Public License, version 3, the most restrictive license at the top of the list.

Its additional restrictions over the (non-Affero) GPL v3 are unlikely to affect the future use of KaleidOK itself³, but it makes (partial) integration into other works much more difficult because it forces them to be released under a less common and very restrictive license themselves.
Alternatively, we can choose GPL v3 if we remove or "decouple" the PDF export (which is not in a finished, usable state anyway).

A possible option for decoupling is to leave the source code for PDF export in the source code repository (which doesn't include iText) but remove it and iText from the stand-alone application package.

I prefer option 2 even though it requires some work and research.

¹ The list is sorted according to the “hierarchy of license compatibility”, i. e. works published under a license appearing lower on the list may be included in or used to derive works published under a license appearing higher on the list. This correlates roughly with the restrictiveness of the license.

² There are some libraries published under LGPL variants (e. g. Processing Core and Minim) which may be bundled and redistributed (as easily separable entities with attribution and source code access) together with works under any or no license.

³ “[Compared to the GPL the AGPL] has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there. The purpose of the GNU Affero GPL is to prevent a problem that affects developers of free programs that are often used on servers.” (source and example)

Intelligent window placement triggers a bug in OS X

It's disabled in 5276c3c as a work-around for now, but this needs more investigation. Revert the commit to exhibit it again.

User record feedback

Please build something that provides feedback to the user about the recorder state, that isn't displayed on the sketch.

Logo and window icon for KaleidOK

We could use a pretty logo and application window icon.

Only 4 layer images are updated following a chromasthetiation even though 5 are downloaded

Audio processing thread freezing and going idle

kaleidOk is freezing a lot and going idle very often. This is new behaviour and I think it is unrelated to my graphics card.

See the video:
https://drive.google.com/file/d/0BwbcYwZE23T1SHg1MVR0SFZkY1E/view?usp=sharing

Introduce some fuzz to frequently reappearing Chromatik requests

We have some frequently reoccurring images due to equally frequently reoccurring search requests to Chromatik. From experience we know that these requests usually result from emotionally completely neutral phrases with grey colours and no keywords.

To prevent the most frequently reoccurring images, we should introduce some fuzz into either the search requests or the selection of result subsets. E. g., we could randomize the value of the start parameter of the search to something different than 0 in these cases.

A preliminary implementation of such a feature is available in b508abd (branch feature/restrict-speech-record-duration) and requires further testing before merging into master.
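
For illustration only, randomising the start parameter could look like the sketch below. It is not the implementation from b508abd; the names and the upper bound `maxStart` are assumptions.

```java
import java.util.Random;

/** Hypothetical helper for adding fuzz to repeated search requests. */
final class StartFuzz {
  private StartFuzz() {}

  /**
   * Returns the value for the search's start parameter: 0 for ordinary
   * requests, a random offset in [0, maxStart] for neutral requests
   * without keywords.
   */
  static int startOffset(boolean hasKeywords, boolean neutral, int maxStart, Random rnd) {
    if (hasKeywords || !neutral || maxStart <= 0) {
      return 0;
    }
    return rnd.nextInt(maxStart + 1);
  }
}
```

`maxStart` should stay below the expected number of results, otherwise the request may come back empty.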

Full screen display and multi-screen support

Please add support for the following features:

Full screen mode of the application for presentation purposes.
Moving the window to a different screen without (permanent) graphics glitches.
Possibly spanning the sketch over multiple screens.