
CSC412 Computer Science Communication

Student Name: Aparna Gopalakrishnan

Student Number: 1004692941

I am comfortable sharing my submissions with:

  • Peers on Forum
  • Anonymously via Course Twitter: @ProbablyLearn
  • My Personal Twitter: @aparna_gee (which @ProbablyLearn can retweet!)

This submission will consist of memes with short explanations for each.

  1. draft_meme5

Variable Elimination (VE) is an exact inference algorithm on graphical models. It computes the distribution of a query variable given observed variables by summing out the rest:

$$p(X_Q \mid X_E = e) \propto \sum_{X_R} p(X_Q, X_E = e, X_R)$$

where $X_E$ are the observed variables, $X_Q$ is the query variable, and $X_R$ are the remaining variables (neither observed nor queried). The complexity of VE depends on the ordering chosen while summing out, and is exponential in the worst case. Minimum degree ordering is one such 'good' ordering: it aims to reduce storage and computation requirements by reducing the number of non-zero factors in the Cholesky decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. While there are heuristics for finding a good ordering (like minimum degree ordering), finding the optimal ordering is an NP-hard problem.
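To make the sum-out operation concrete, here is a minimal sketch on a hypothetical chain $A \to B \to C$, eliminating $A$ and then $B$ to obtain $p(C)$ (the probability tables are made-up numbers):

```python
import numpy as np

# Hypothetical chain A -> B -> C with binary variables.
p_a = np.array([0.6, 0.4])                  # p(A)
p_b_given_a = np.array([[0.7, 0.3],         # p(B | A): rows index A, cols index B
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],         # p(C | B): rows index B, cols index C
                        [0.5, 0.5]])

# Eliminate A: phi(B) = sum_a p(A=a) p(B | A=a)
phi_b = p_a @ p_b_given_a

# Eliminate B: p(C) = sum_b phi(B=b) p(C | B=b)
p_c = phi_b @ p_c_given_b
print(p_c, p_c.sum())                       # a valid distribution over C
```

On a chain the ordering barely matters, but on a general graph each elimination creates a new factor over the neighbours of the eliminated variable, which is exactly the fill-in that a good ordering tries to keep small.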

  2. draft6_meme

Generative Adversarial Networks (GANs) are a generative modelling approach which aims to model the distribution of the data itself. A GAN is composed of a generator network and a discriminator network. The generator takes a random noise vector as input and aims to generate a sample in the input domain by learning the latent parameters that define the data distribution. The discriminator takes a sample point, either a 'real' point from the input domain or one produced by the generator, and tries to differentiate between them, i.e. predicts a binary class label (real/fake). These networks are trained together: the generator generates samples which are given to the discriminator along with real examples. The networks learn by how well they can 'fool' each other: the discriminator is trained to get better at differentiating between real and fake samples, and the generator is updated based on how successfully it fools the discriminator.
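As a minimal sketch of that alternating training loop in PyTorch (the 1-D toy data, layer sizes, and learning rates are illustrative assumptions, not part of the submission):

```python
import torch
import torch.nn as nn

# Toy setup: 'real' data are samples from N(4, 1); the generator maps noise to 1-D samples.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 4 + torch.randn(64, 1)             # samples from the data distribution
    fake = G(torch.randn(64, 8))              # generator samples from noise

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: update G so that D labels its samples as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```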

Mode collapse refers to when the generator produces outputs with little to no diversity that are nonetheless good at fooling the discriminator. One of the loss measures used in training is the (reverse) KL-divergence: for a random variable $x$ with target distribution $p$ and learned distribution $q$,

$$D_{KL}(q \,\|\, p) = \mathbb{E}_{x \sim q}\!\left[\log \frac{q(x)}{p(x)}\right]$$

which encourages the learned distribution to model a single mode of the target distribution, resulting in low-diversity sampling, i.e. mode collapse (Theis et al., 2016).
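A quick numerical illustration of this mode-seeking behaviour, with a made-up bimodal target $p$ and two candidate approximations $q$ evaluated on a grid:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

# Bimodal target p and two candidate approximations q.
p = 0.5 * norm.pdf(x, -3, 1) + 0.5 * norm.pdf(x, 3, 1)
q_mode = norm.pdf(x, 3, 1)       # sits on a single mode of p
q_wide = norm.pdf(x, 0, 3.2)     # spreads mass over both modes

def reverse_kl(q, p):
    """D_KL(q || p) ~= sum q log(q/p) dx on the grid."""
    mask = q > 1e-12
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))) * dx

print(reverse_kl(q_mode, p))  # ~log 2 ~= 0.69: sitting on one mode is cheap
print(reverse_kl(q_wide, p))  # larger (~0.96): q is penalized for mass where p is small
```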

  3. mcmc

We are already familiar with intractable integrals/sums when sampling from probability distributions or in optimization. Markov chain Monte Carlo (MCMC) methods like Metropolis–Hastings sampling create chains of samples from a random variable whose pdf is known only up to proportionality (the unnormalized density is tractable even when the normalizing integral is not), and estimate the intractable integral as an expectation using these samples. By the law of large numbers, the more steps included in the chain, the closer the estimate gets.
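A minimal Metropolis–Hastings sketch (the unnormalized bimodal target and the Gaussian random-walk proposal are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def unnormalized_p(x):
    """Target density known only up to a constant: a bimodal example."""
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

x = 0.0
samples = []
for _ in range(50_000):
    proposal = x + rng.normal(scale=1.0)       # symmetric random-walk proposal
    accept_prob = min(1.0, unnormalized_p(proposal) / unnormalized_p(x))
    if rng.random() < accept_prob:             # Metropolis acceptance step
        x = proposal
    samples.append(x)

samples = np.array(samples[5_000:])            # drop burn-in
print(samples.mean(), (samples ** 2).mean())   # Monte Carlo estimates of E[x], E[x^2]
```

Note that the normalizing constant cancels in the acceptance ratio, which is exactly why the method only needs the density up to proportionality.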

MCMC methods were created to handle sampling from higher-dimensional intractable distributions, but they suffer from the curse of dimensionality: the exponential increase in volume with dimension concentrates the majority of the mass of the posterior probability distribution away from its mode, in its typical set. This occurs because the increase in volume dominates the density. MCMC methods try to overcome this difficulty by exploiting the structure of the posterior distribution to concentrate sampling in a small subset of the overall distribution. This can lead to the implicit assumption that the samples are generated from a single mode, i.e. that the distribution does not have multiple modes.
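A small sketch of that concentration effect, using a standard Gaussian as a convenient example: the distance of samples from the mode grows like $\sqrt{d}$, so in high dimension essentially no mass sits near the mode.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (1, 10, 100, 1000):
    x = rng.standard_normal((10_000, d))   # samples from a d-dimensional N(0, I)
    r = np.linalg.norm(x, axis=1)          # distance of each sample from the mode
    print(d, r.mean(), np.sqrt(d))         # mean distance tracks sqrt(d)
```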

  4. meme8

Parallel WaveNet is a state-of-the-art realistic speech synthesis model which improved the sampling efficiency of the original WaveNet model (by over 1000 times!). The original model is a convolutional autoregressive network (causal/masked, i.e. each output depends only on previous entries) modelling the joint distribution of high-dimensional data as a product of conditional distributions:

$$p(x) = \prod_{t} p(x_t \mid x_1, \ldots, x_{t-1}, \theta)$$

where $\theta$ are the parameters of the model, which receives $x_1, \ldots, x_{t-1}$ as input and outputs a distribution over possible $x_t$.

This structure allows for efficient parallel training, since the network can process its whole input in parallel, but leads to slow, sequential sample generation, since sample $x_{t-1}$ is required for producing $x_t$.

Parallel WaveNet aims to achieve efficiency in both sampling and training by using Inverse Autoregressive Flows (IAFs). IAFs are a type of normalizing flow which model a multivariate distribution as a non-linear transformation $x = f(z)$ of a simple tractable distribution; by the change-of-variables formula, the resulting random variable has log probability:

$$\log p_X(x) = \log p_Z(z) - \log \left|\det \frac{\partial f(z)}{\partial z}\right|$$

The chosen $f$ is invertible, with the determinant of its Jacobian $\frac{\partial f(z)}{\partial z}$ being easy to compute (e.g. a triangular matrix, so the determinant is simply the product of its diagonal entries). Parallel WaveNet is able to achieve efficiency in both sampling and training using probability density distillation:

A fully-trained WaveNet model is used to teach a smaller and parallelized "student" network. In training, the student network is given random noise $z$ as input, to which the following transformation is applied:

$$x_t = z_t \cdot s(z_{<t}; \theta) + \mu(z_{<t}; \theta)$$

producing output sample $x_t$, where $\mu(\cdot)$ and $s(\cdot)$ have the convolutional autoregressive network structure of the original WaveNet.

The student network aims to match the teacher's performance (as opposed to the zero-sum adversarial game in GANs). The aim is to minimize

$$D_{KL}(P_S \,\|\, P_T) = H(P_S, P_T) - H(P_S)$$

where $H(P_S, P_T)$ is the cross-entropy between the student $P_S$ and teacher $P_T$, and $H(P_S)$ is the entropy of the student distribution, which prevents the student from collapsing to the teacher's mode (van den Oord et al.).
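A minimal numpy sketch of the affine IAF transform above and its change-of-variables log-density; the shift and scale networks $\mu$ and $s$ are replaced here by hypothetical fixed functions of $z_{<t}$ purely for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mu(z_prev):
    """Stand-in for the autoregressive shift network mu(z_{<t})."""
    return 0.5 * np.sum(np.tanh(z_prev))

def s(z_prev):
    """Stand-in for the autoregressive scale network s(z_{<t}); kept positive."""
    return np.exp(0.1 * np.sum(np.tanh(z_prev)))

T = 8
z = rng.standard_normal(T)             # base noise z ~ N(0, I)
x = np.empty(T)
log_det = 0.0
for t in range(T):
    scale = s(z[:t])
    x[t] = z[t] * scale + mu(z[:t])    # x_t = z_t * s(z_{<t}) + mu(z_{<t})
    log_det += np.log(scale)           # Jacobian is triangular: det = product of scales

# Change of variables: log p_X(x) = log p_Z(z) - log|det dx/dz|
log_px = norm.logpdf(z).sum() - log_det
print(x, log_px)
```

Because every $x_t$ depends only on the noise $z$, all outputs can be computed in parallel at sampling time; it is the inverse direction (recovering $z$ from $x$) that is sequential, which is why the student is trained by having the teacher score the student's own samples rather than by maximum likelihood on data.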

  5. extra_meme5

Hidden Markov Models (HMMs) are (directed) graphical models describing a Markov process: a single hidden (unobservable) discrete random variable $Z$ and a discrete observed variable $X$ whose behaviour depends on $Z$; $Z_t$ and $X_t$ denote the values of the random variables at state $t$. Additionally, the probability distribution of $Z_{t+1}$ only depends on $Z_t$, i.e. only on the previous state. HMMs are represented by the transition probabilities $p(Z_{t+1} \mid Z_t)$, the observation probabilities $p(X_t \mid Z_t)$, and the initial state distribution $p(Z_1)$.
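As a sketch of how those three ingredients are used, here is the forward algorithm computing the likelihood of an observation sequence for a hypothetical two-state HMM (the numbers are made up):

```python
import numpy as np

# Hypothetical 2-state HMM: transition A[i, j] = p(Z_{t+1}=j | Z_t=i),
# observation B[i, k] = p(X_t=k | Z_t=i), initial distribution pi.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def forward_likelihood(obs):
    """p(X_1..X_T = obs) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = p(Z_1=i) p(x_1 | Z_1=i)
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]  # sum out the previous state, then emit
    return alpha.sum()

print(forward_likelihood([0, 1, 0]))
```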

A Dynamic Bayesian Network (DBN) is a graphical model representing conditional independencies between a set of sequential/time-series random variables using a directed acyclic graph. HMMs are a special case of DBNs, since they represent a restrictive type of system. For example, DBNs allow more than one hidden variable and can represent continuous random variables as well, not just discrete random variables. DBNs also allow us to extend Markov models with higher-order connections: for example, a connection from $Z_{t-2}$ to $Z_t$, i.e. a dependency spanning more than one time step (Ghahramani 1997).

  6. kl-div-cat

KL-divergence, a measure of the difference between two probability distributions $P$ and $Q$, is defined as

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

It is always non-negative, i.e. $D_{KL}(P \,\|\, Q) \ge 0$, but does not satisfy the symmetry and triangle-inequality properties of a metric; in general,

$$D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$$

However, KL-divergence can be modified to satisfy the symmetry condition as follows:

$$\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q)$$

This can be read as the expected information gain about $x$ from discovering which probability distribution it is drawn from, $P$ or $Q$, if both have probability 0.5. This gives the Jensen–Shannon Divergence, which is symmetric.
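A small numerical check of these properties on two made-up discrete distributions:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for discrete distributions on the same support."""
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

print(kl(p, q), kl(q, p))    # both non-negative, but not equal: asymmetric
print(jsd(p, q), jsd(q, p))  # equal: symmetric by construction
```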

Extra stray memes: Here are some extra memes that I made just for fun (also because I wanted to use more It's Always Sunny meme templates but most of them are too inappropriate):

mode_collapse

extra_meme6

draft_meme4

extra2
