omesa's Introduction

👋 Hi

I'm an Assistant Professor at the Department of Cognitive Science & Artificial Intelligence at Tilburg University.

โš—๏ธ Research

I'm interested in the effect of intelligent systems on our lives: systems that uncover our personal information, monitor and change our behavior, subtly restrict our exposure to information, and treat us unfairly. My dissertation focused on the dual use of computational stylometry, a field that aims to infer information from writing for good but can prove harmfully invasive at the same time. I develop(ed) open-source tools to better understand, and defend against, such techniques invading one's privacy.

📫 Contact

See info in the dooblidoo ⬅️.

omesa's People

Contributors

cmry, fkunneman, verhoevenben

omesa's Issues

JSON Serialization

Store models in .json rather than .pickle. JSON is a more transparent format, is less prone to corruption, and will allow for a pure JSON backend such as blitzdb.
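A minimal sketch of what JSON storage could look like, assuming the model can be reduced to plain values (name, parameters, coefficients, class labels). The function names and the payload layout are hypothetical and not part of omesa; a fitted estimator object cannot be passed to `json.dumps` directly, so its attributes have to be extracted first.

```python
import json


def model_to_json(name, params, coef, classes):
    """Serialize a (hypothetical) linear model description to a JSON string."""
    payload = {
        "name": name,
        "params": params,
        # Cast to plain floats so array-like inputs become JSON-friendly.
        "coef": [float(w) for w in coef],
        "classes": list(classes),
    }
    return json.dumps(payload, sort_keys=True, indent=2)


def model_from_json(text):
    """Load the stored description back into a plain dict."""
    return json.loads(text)


stored = model_to_json("LinearSVC", {"C": 1.0}, [0.5, -1.25], ["female", "male"])
restored = model_from_json(stored)
```

Unlike a pickle, the stored text can be read and diffed by hand, and loading it does not execute arbitrary code.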

Class labels not retrievable in Vectorizer (for two-class problem)

I wanted to retrieve the feature names and class labels of a Vectorizer object in order to do some feature analysis. After searching for a while, I found that the feature names are available in the 'hasher' attribute. However, I was unable to retrieve the class labels. I think they're supposed to be in the 'encoder' attribute, but this seems to be bypassed for binary (but also two-class) problems. Yet in a two-class problem (e.g. male vs. female) you still need to know which of these is 0 and which is 1.
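One possible way to address this, sketched under the assumption that labels are plain strings: always keep an explicit label-to-index mapping, including for two-class problems. The helper names below are hypothetical and do not reflect the Vectorizer's actual internals. Sorting the labels gives a deterministic assignment of 0 and 1.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def encode(labels, mapping):
    """Replace every label by its integer index."""
    return [mapping[label] for label in labels]


labels = ["male", "female", "female", "male"]
mapping = fit_label_mapping(labels)
encoded = encode(labels, mapping)

# The inverse mapping answers "which class is 0 and which is 1?".
inverse = {index: label for label, index in mapping.items()}
```

With the inverse mapping stored next to the model, feature weights can be attributed to a named class rather than to an anonymous 0 or 1.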

Classifier Stack + Best Pick through Grid

Currently, the only way to select a classifier in Experiment is through:

Experiment({
    ...
    "classifier": LinearSVC(),
    "params": {"C": np.logspace(...)},
    ...
})

This should be more generic and allow multiple classifiers to be added to the grid search mix, from which the best will be chosen. Something like:

    ...
    "classifiers": {
        LinearSVC(): {"C": ...}
    }
    ...
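The selection logic behind such a "classifiers" entry could be sketched as follows. This is an illustration only: it uses string names as keys instead of estimator instances, and `toy_score` is a made-up stand-in for whatever cross-validated score the framework would compute. `expand_grid`, `pick_best`, and the example grids are all hypothetical.

```python
from itertools import product


def expand_grid(grid):
    """Yield every parameter combination in a {name: [values]} grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def pick_best(classifiers, score):
    """Return (score, classifier name, params) of the best-scoring setting."""
    best = None
    for name, grid in classifiers.items():
        for params in expand_grid(grid):
            result = score(name, params)
            if best is None or result > best[0]:
                best = (result, name, params)
    return best


def toy_score(name, params):
    """Made-up scores; a real run would train and evaluate each setting."""
    base = {"svm": 0.80, "nb": 0.75}[name]
    return base + 0.01 * params.get("C", 0) - 0.02 * params.get("alpha", 0)


classifiers = {
    "svm": {"C": [1, 10]},
    "nb": {"alpha": [0.1, 1.0]},
}
best_score, best_name, best_params = pick_best(classifiers, toy_score)
```

Each classifier keeps its own parameter grid, so settings that only make sense for one classifier are never tried on another.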

Major Overhaul

Some big things that are in the works:

  • Remove Environment.
  • Add Experiment that is controlled with a dict.
  • Generalize Dataloader.
  • Make generic CSV reader.
  • Write model wrapper for storage.

visualizations and storage for experiments

This is a big one, which requires the set-up of a database back-end (done already) and a (web) front-end:

  • Dump serialized config files to JSON database.
  • Represent the performance in a way that general metrics can be shown.
    • project
    • name
    • training_set (i.a.)
    • testing_set (i.a.) -- this and above probably need to be abstracted from loaders
    • string representation of the used features
    • string NAME of the classifier used
    • POS / NEG F1-scores (could be put in a graph), micro F1
  • Able to overview and compare experiments visually.
    • Flat Performance bar.
    • Plotting performance on data proportions.
    • Summary of experiment configurations.
    • Confusion matrices.
    • Aggregate performances in one report.
    • t-SNE?
  • Insight into feature importances.
    • LIME evaluation.
    • Coefficient representations.
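The metrics named above (per-class F1, micro F1, and the confusion counts behind a confusion matrix) can be computed from true and predicted labels as in the sketch below. The function names and the toy labels are assumptions made for illustration and are not taken from omesa.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def f1_for_class(counts, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    tp = counts[(label, label)]
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Micro-averaged F1; for single-label data this equals accuracy."""
    total = sum(counts.values())
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / total if total else 0.0


y_true = ["POS", "POS", "NEG", "NEG", "NEG"]
y_pred = ["POS", "NEG", "NEG", "NEG", "POS"]
counts = confusion_counts(y_true, y_pred)
pos_f1 = f1_for_class(counts, "POS")
neg_f1 = f1_for_class(counts, "NEG")
overall = micro_f1(counts)
```

Storing the raw confusion counts per experiment would let the front-end derive any of these metrics later without rerunning the experiment.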

Breaks through refactor

See differences between old and current. The classify function broke because old attribute names were still being used.

cross-platform operations

Especially in the db and app interactions, there are still many directory pointers that are only compatible with Unix systems. These can be changed via os (relevant links here and here).
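As a small illustration of the idea, paths can be assembled from their parts instead of being written with hard-coded "/" separators. The directory and file names below are placeholders, not omesa's actual layout.

```python
import os
from pathlib import Path


def db_path(base_dir, *parts):
    """Join path parts with the separator of the current platform."""
    return os.path.join(base_dir, *parts)


def db_path_pathlib(base_dir, *parts):
    """Same idea using pathlib, which returns a Path object."""
    return Path(base_dir).joinpath(*parts)


joined = db_path("omesa_data", "db", "experiments.json")
as_path = db_path_pathlib("omesa_data", "db", "experiments.json")
```

Both variants produce backslash-separated paths on Windows and slash-separated paths on Unix without any platform checks in the calling code.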

full documentation update

A lot of changes have been made since porting the framework to GitHub. All components have to be rechecked for documentation validity, as the documentation still describes the old classes in many cases.
