scarfi / orie-4741-course-project Goto Github PK


GCC Machine Description 1.37% TeX 14.65% Jupyter Notebook 83.98%

orie-4741-course-project's Introduction

ORIE-4741-Course-Project

Exploring Novel Applications and Modifications to the "Information Sieve"

Kendrick Cancio (kdc57) Skylar Carfi (swc74)

In this project, we propose to explore novel applications of, and possible improvements to, the "Information Sieve" first described by Greg Ver Steeg and Aram Galstyan. The Information Sieve is a method for unsupervised learning that passes data through a series of progressively finer "sieves", where each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. Applications mentioned in the paper include discrete Independent Component Analysis, lossy and lossless compression, and signal processing.
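To make "maximally informative about multivariate dependence" concrete, here is a minimal sketch, assuming the dependence measure is total correlation, TC(X) = sum_i H(X_i) - H(X), and that a factor Y is scored by TC(X) - TC(X | Y). The function names and the toy data are hypothetical and are not taken from the authors' code. The sketch only evaluates this score for a given factor; it does not perform the optimization that a sieve layer would use to find one.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy in bits of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), estimated from discrete rows."""
    columns = list(zip(*rows))
    joint = [tuple(row) for row in rows]
    return sum(entropy(col) for col in columns) - entropy(joint)

def conditional_total_correlation(rows, factor):
    """TC(X | Y): TC within each group of rows sharing a Y value, weighted by group size."""
    groups = {}
    for row, y in zip(rows, factor):
        groups.setdefault(y, []).append(row)
    n = len(rows)
    return sum(len(g) / n * total_correlation(g) for g in groups.values())

# Toy data (hypothetical): column 2 copies column 1, column 3 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
factor = [row[0] for row in rows]  # candidate latent factor Y = first column

tc = total_correlation(rows)
explained = tc - conditional_total_correlation(rows, factor)
```

In this toy case the only dependence is between the first two columns, so conditioning on the first column leaves no dependence among the variables within each group.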

orie-4741-course-project's People

Contributors

Watchers

Forkers

23ken23

orie-4741-course-project's Issues

Proposal Peer Review: Trevor McDonald

3 Things I Liked

The use of a new algorithm that does not have a lot of testing yet is exciting!
The lack of dependencies on the actual data is intriguing as well; it seems like you could throw pretty weird data at this approach.
Compression implementations are a good practical way to measure learning algorithms.

3 Areas for Improvement

You say the Information Sieve makes no guarantee about an optimal solution. How do you plan to account for and measure this, and how much do you expect this to impact your results?
Are there specific areas or 'types' of data this algorithm will tend to learn better?
You mention this algorithm will have faster run times than comparable techniques. Will you be comparing them and seeing how far the Information Sieve can go in terms of scale?

Midterm report peer review by Pihu Yadav

I enjoyed reading your project about the Information Sieve. The introduction was very well written and it was great that you explained the mathematics of the algorithm in such detail. It is also nice that you are going to be talking to the original creator of the algorithm.
However, I think you should find a dataset and actually implement the algorithm to get some idea of the results. Right now your project seems very theoretical, and it does not look like there is any preliminary data analysis yet. I think you were supposed to cover details about the dataset, dealing with missing data, etc. in this report, so I hope you are able to find a dataset soon.

Also, I didn't get a sense of the exact aim of the project. Since you will be using code already available on the creators' GitHub, are you planning on just running that on different datasets to understand it better? Or do you actually intend to improve the algorithm?

Overall, I think it is great that you are doing an algorithm development project; it looks pretty interesting. Good luck!

Final Review

I think that this is a pretty ambitious project, and I was very interested to read about the comparison between Information Sieves and Generalized Low Rank Models. One thing that really caught my attention was using the MNIST database to compress the images, and then using both Generalized Low Rank Models and Information Sieves to reconstruct the data. I believe that this kind of experimental analysis is definitely worth doing when reviewing a new model, to see how its behavior in practice compares with theory.

Another interesting thing is the MADELON dataset, which is artificially created to be confounding. This is the first time I've read about such a dataset, and it seems really helpful for testing new algorithms, knowing that it is intentionally designed to be confounding. As for improvements, I think you could add an explanation of exactly how the Information Sieve works (linking to that part of the midterm report) and give reasons why it works better or worse than GLRMs at certain tasks.
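As an illustration of the compress-then-reconstruct experiment discussed in this review, here is a minimal sketch of the simplest low-rank case: a rank-1 model with quadratic loss and no regularization, fit by alternating least squares. This is not the project's code and not a full GLRM implementation. The function names and the tiny matrix are hypothetical stand-ins; a real experiment would use image data such as MNIST and a higher rank.

```python
def fit_rank1(matrix, iterations=20):
    """Fit matrix ~ outer(x, y) under quadratic loss by alternating least squares."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Arbitrary nonzero start. Assumes the matrix is not all zeros and that
    # the updates never produce an all-zero x or y (no guard is included).
    y = [1.0] * n_cols
    x = [0.0] * n_rows
    for _ in range(iterations):
        # With y fixed, each x[i] has a closed-form least-squares solution.
        y_norm = sum(v * v for v in y)
        x = [sum(matrix[i][j] * y[j] for j in range(n_cols)) / y_norm
             for i in range(n_rows)]
        # With x fixed, each y[j] has the symmetric closed-form solution.
        x_norm = sum(v * v for v in x)
        y = [sum(matrix[i][j] * x[i] for i in range(n_rows)) / x_norm
             for j in range(n_cols)]
    return x, y

def reconstruct(x, y):
    """Rebuild the full matrix from the two compressed factors."""
    return [[xi * yj for yj in y] for xi in x]

def squared_error(a, b):
    """Sum of squared entrywise differences between two equal-shape matrices."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical data: an exactly rank-1 matrix, so the fit should be near-perfect.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
x, y = fit_rank1(data)
error = squared_error(data, reconstruct(x, y))
```

The compressed representation stores len(x) + len(y) numbers instead of every entry of the matrix, and the squared error of the reconstruction is the quantity one would compare across models.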

Proposal Peer Review: Zicheng Men

3 Things that I liked:

You have a clear introduction to what you would like to study and the paper you want to refer to.
You have mentioned several methods that have already been used to study this topic, which is good, because you can develop your own way.
This is an interesting topic because it sounds like it has a lot of applications.

3 Things to improve:

It would be great if you could find more papers about this topic, because relying on a single paper might give a biased view.
I think you should be clearer about how you want to study this topic rather than just mentioning the names of the methods.
Rather than referring only to current research, it would be better if you could include your own ideas.

Midterm report peer review

I learned a lot of new concepts from the report. Although I did not completely understand the algorithm you proposed, I believe it could be a very exciting project. In the report, you briefly introduce some concepts related to the project and compare the algorithm with ICA. However, there seem to be no specific examples of how you are going to apply the algorithm to datasets. Perhaps in your final report you will have the opportunity to include some examples and explain how you optimize the algorithm.

Proposal Peer Review: Qin Lu

The Information Sieve is an exciting tool that tries to capture the remainder information. It's a little similar to a neural network, but with more explicit inner-layer parameters. A great thing about the Information Sieve is that it has fewer limitations than PCA and ICA. A great topic to explore!

Things that might need a little more attention: what is the data set, and what are the expected results? How are you going to evaluate the model? Also, improving the current techniques may take a large amount of time. I hope you can finish the project in time.

Midterm Review

I like how you focused more on improving an algorithm instead of applying known algorithms to a new dataset. I learned a lot from reading your clear explanations of the conceptual background of this algorithm, and I like how you are thinking about the computational complexity.
A number of issues raised at the proposal stage are not addressed: choosing a dataset, the lack of an optimality guarantee, and the lack of a clear benchmark for evaluating the model.

Final Peer Review

The project involves an innovative attempt to construct the Information Sieve. Their goal is to construct the Information Sieve and to compare it with the Generalized Low Rank Model on three benchmark applications: lossy compression, imputation, and classification. Although the concept is unfamiliar, the report is pretty clear and easy to follow. Their exploration in applying the two models to the MADELON data set seems interesting.
To improve the project, I think they could explain from a theoretical perspective why one model outperformed the other and why certain phenomena occurred. They could also investigate which kinds of datasets each of the two models performs well on.
Good job!
Shan He sh2375

Proposal Peer Review: Akash Nadan

3 Things that I liked:

I liked how detailed you got in your introduction about the current situation regarding your topic.
I liked that you are building your project on a current research paper.
Your overall idea of trying to find a better way to choose features from a data set is very exciting and has a lot of broader applications.

3 Areas of improvement:

I would make it clearer what exactly you plan on predicting as an outcome of this project.
I think you should state more explicitly what data set you plan on using for this project and how it will help you achieve your goal.
I think you should state how exactly you plan on improving upon the techniques used in the research paper.

Final Review_nd367

Interesting project. You guys are really cool and smart; both topics, the Information Sieve and GLRM, are new to me (even though I learned a little from Prof. Udell). It seems that you did a lot of work on the mathematical background. I suggest adding more explicit references to your final report, since I had to keep Chrome open to look up most of the techniques in your paper. I know you have read other academic papers, but I haven't, so it would be great to provide more details when you describe how you handle the data. That would make your paper more solid.

You mentioned limited computing power; have you tried a GPU or parallel computing? When talking about GLRM, could you be more specific, since GLRM is quite a large framework? For the sieve algorithm, it would be more helpful to add some explanation rather than just describing the algorithm. I can't provide more feedback because my knowledge is also limited. :( Maybe your project is really too hard.

Proposal Peer Review: Anne Ng

3 things I liked:

Clear explanations of the different algorithms and existing problems with them.
New topic that hasn't been explored by other classmates.
In-depth research on the algorithms.

3 areas for improvement:

What kind of existing dataset will benefit from this exploration? What is the potential result (for example, by how much are you hoping to increase accuracy)?
How is the testing going to be carried out?
I suggest that you explain more about the feasibility and procedures.

Final peer review

An ambitious plan, and the team partially achieved their goal. I am not very familiar with graphic unsupervised methods, so it's not for me to say. But I can see the team put a lot of effort into it.

Also, it would be better if they produced more quantitative performance metrics and added more explanation so that readers can better understand the report. Right now, they include many generated images alongside the originals for comparison; however, an untrained eye can't really tell the difference, so readers don't know how to measure the performance of the model.
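As one example of the kind of quantitative metric suggested here, the following is a minimal sketch of mean squared error and peak signal-to-noise ratio (PSNR) between an original image and its reconstruction. It assumes images are flattened to lists of pixel intensities in the range 0 to 255. The function names and sample pixel values are hypothetical and are not results from the project.

```python
from math import inf, log10

def mean_squared_error(original, reconstructed):
    """Average squared difference between two equal-length pixel sequences."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer reconstruction."""
    mse = mean_squared_error(original, reconstructed)
    # A perfect reconstruction has zero error, so PSNR is unbounded.
    return inf if mse == 0 else 10 * log10(max_value ** 2 / mse)

# Hypothetical pixel values for illustration only.
original = [0, 128, 255, 64]
reconstructed = [0, 130, 250, 64]

mse = mean_squared_error(original, reconstructed)
score = psnr(original, reconstructed)
```

Reporting a single number such as MSE or PSNR per model, next to the image comparisons, lets readers rank reconstructions without judging them by eye.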
