mudit-1999 (Mudit Agarwal), GitHub

albif-active-learning-with-bandit-feedbacks

An algorithm that actively learns a multiclass classifier in the bandit feedback setting.

With the onset of the internet era, online learning has boomed. For a better viewing experience, a soft copy of the lecture slides is often embedded into the lecture video. However, most of this slide matching is done manually, which is laborious. This project is a small contribution toward automating slide matching.

backend-acad

Rippling Acad

basic_ml

Assignment

blackbox

A basic ticket booking app

clustering_project

compilers-nimic

Implemented a functional compiler that converts C code to x86.

delaytron-efficient-learning-of-multiclass-classifiers-with-delayed-bandit-feedbacks

In this paper, we present an online algorithm called {\it Delaytron} for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^Td_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. When the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^Td_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^Td_t$. For the case when $T$ and $\sum_{t=1}^Td_t$ are unknown, we use the doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^Td_t}\right)$. We demonstrate the effectiveness of our approach through experiments on various datasets and comparisons with state-of-the-art approaches.
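The delayed-feedback protocol described above can be replayed with a small simulator. This is a minimal sketch of the interaction loop only, not the Delaytron update rule (which the abstract does not spell out). It assumes 0-indexed rounds, so the missing condition $t+d_t>T$ becomes `t + d >= T`, and the learner interface (`predict`, `update`) and the toy `ConstantLearner` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def run_delayed_bandit(labels, delays, learner):
    """Replay the delayed bandit-feedback protocol over T = len(labels) rounds.

    Rounds are 0-indexed. Feedback for round t is delivered at the end of
    round t + delays[t]; if that is past the last round, it is missing.
    `learner` is any object with predict(t) and update(t, y_hat, feedback).
    Returns the list of rounds whose feedback never arrived.
    """
    T = len(labels)
    pending = defaultdict(list)  # delivery round -> [(t, y_hat, feedback)]
    missing = []
    for t in range(T):
        y_hat = learner.predict(t)
        feedback = int(y_hat == labels[t])  # bandit feedback I[y_hat == y_t]
        due = t + delays[t]
        if due >= T:  # 0-indexed form of the condition t + d_t > T
            missing.append(t)
        else:
            pending[due].append((t, y_hat, feedback))
        # Deliver every feedback scheduled for the end of this round.
        for s, past_pred, fb in pending.pop(t, []):
            learner.update(s, past_pred, fb)
    return missing


class ConstantLearner:
    """Hypothetical toy learner: always predicts label 0, logs feedback."""

    def __init__(self):
        self.received = []

    def predict(self, t):
        return 0

    def update(self, t, y_hat, feedback):
        self.received.append((t, feedback))


learner = ConstantLearner()
missing = run_delayed_bandit([0, 1, 0, 2], [0, 2, 1, 1], learner)
print(learner.received, missing)  # [(0, 1), (1, 0), (2, 1)] [3]
```

With labels [0, 1, 0, 2] and delays [0, 2, 1, 1], the learner receives feedback for rounds 0, 1, and 2 (the last two only at round 3), while round 3 is missing because its feedback would arrive past the horizon.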

dqn_agent

Implementation of a Deep Q-Network (DQN) that plays Atari Breakout from OpenAI Gym

fighter-jet

An attempt to emulate a 3D fighter jet game using OpenGL in C++

gan

Designed and trained a GAN to generate data from the given normal distribution.

geometry

Boost.Geometry - Generic Geometry Library

jetpack-joyride

An attempt to replicate the famous Jetpack Joyride game using OpenGL in C++

learning-multiclass-classifier-under-noisy-bandit-feedback-code

This algorithm addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner may not receive the true feedback; instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to deal with noisy bandit feedback, based on the unbiased estimator technique. The algorithm can also efficiently estimate the noise rates, thus providing an end-to-end framework. The proposed algorithm enjoys a mistake bound of the order of O(√T) in the high-noise case and of the order of O(T^(2/3)) in the worst case.
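The unbiased estimator idea can be illustrated for a single binary feedback bit. This sketch assumes a simple class-conditional flip model (a true 1 is observed as 0 with probability `rho_pos`, a true 0 is observed as 1 with probability `rho_neg`, and `rho_pos + rho_neg < 1`). It is the standard de-biasing trick from the label-noise literature, not necessarily the exact estimator used in this repository, and the function names are hypothetical.

```python
def unbiased_feedback(observed, rho_pos, rho_neg):
    """De-biased estimate of a true 0/1 feedback bit from its noisy copy.

    Assumed noise model: a true 1 is observed as 0 with probability rho_pos,
    and a true 0 is observed as 1 with probability rho_neg.
    """
    if rho_pos + rho_neg >= 1:
        raise ValueError("the estimator needs rho_pos + rho_neg < 1")
    return (observed - rho_neg) / (1.0 - rho_pos - rho_neg)


def expected_estimate(true_bit, rho_pos, rho_neg):
    """Exact expectation of unbiased_feedback over the flip noise."""
    p_one = 1.0 - rho_pos if true_bit == 1 else rho_neg  # P(observed == 1)
    return (p_one * unbiased_feedback(1, rho_pos, rho_neg)
            + (1.0 - p_one) * unbiased_feedback(0, rho_pos, rho_neg))


print(expected_estimate(1, 0.2, 0.1), expected_estimate(0, 0.2, 0.1))
```

With rho_pos = 0.2 and rho_neg = 0.1, the expected estimate is 1 for a true 1 and 0 for a true 0 (up to floating-point rounding), even though the raw noisy bit is biased. Individual estimates can fall outside [0, 1]; only their expectation is corrected.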

mario-

An attempt to recreate the famous Mario game

memory-game

A memory game

mining-frequent-itemset

quiz-portal

A fully functional quiz portal

random_points_generation

An algorithm to generate random points inside any type of polygon
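One common way to sample points uniformly inside an arbitrary simple polygon, convex or concave, is rejection sampling: draw points uniformly from the bounding box and keep those that pass a point-in-polygon test. The sketch below uses the even-odd ray-casting test. It is an assumed approach for illustration, not necessarily the algorithm used in the repository, and it will loop forever on a polygon with zero area.

```python
import random


def point_in_polygon(x, y, poly):
    """Even-odd ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def random_points_in_polygon(poly, k, rng=None):
    """Draw k points uniformly from the polygon's area by rejection sampling."""
    rng = rng or random.Random()
    xs = [px for px, _ in poly]
    ys = [py for _, py in poly]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < k:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, poly):  # keep only points inside
            points.append((x, y))
    return points


# A concave, L-shaped polygon: the square [1, 2] x [1, 2] is cut out.
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
samples = random_points_in_polygon(L_SHAPE, 5, random.Random(0))
print(samples)
```

Rejection sampling stays uniform for any shape the inside test handles correctly; its cost grows as the polygon fills less of its bounding box (here about 1/4 of the draws are rejected).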

reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and solutions to accompany Sutton's book and David Silver's course.

reward-corruption

Reinforcement Learning with a corrupted reward channel

rl_solution_for_real_world_problem

Application of policy iteration and value iteration to a real-world problem
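As a reference point for the value iteration half of this project, here is a minimal sketch on a hypothetical two-state MDP encoded as transition lists `P[s][a] = [(prob, next_state), ...]` and rewards `R[s][a]`. The encoding and the function name are assumptions for illustration, and policy iteration is not shown.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """In-place (Gauss-Seidel) value iteration.

    P[s][a] is a list of (prob, next_state) pairs and R[s][a] is the expected
    immediate reward for taking action a in state s.
    Returns (V, policy) where policy[s] is a greedy action under V.
    """
    def q(s, a, V):
        # One-step lookahead: reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(q(s, a, V) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once a full sweep barely changes V
            break
    policy = [max(range(len(P[s])), key=lambda a: q(s, a, V))
              for s in range(len(P))]
    return V, policy


# Hypothetical MDP: in state 0, action 0 stays (reward 0) and action 1 moves
# to state 1 (reward 1); state 1 is absorbing with reward 2 per step.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [2.0]]
V, policy = value_iteration(P, R, gamma=0.5)
print([round(v, 6) for v in V], policy)  # [3.0, 4.0] [1, 0]
```

With gamma = 0.5, the absorbing state earns 2 per step, so its value is 2 / (1 - 0.5) = 4; moving there from state 0 is worth 1 + 0.5 * 4 = 3, which beats staying put (0 + 0.5 * 3 = 1.5), so the greedy policy is [1, 0].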

sql-engine-

A mini SQL engine that runs a subset of SQL queries from the command line

subway-surfer

An attempt to write a shader or lighting module in WebGL

time-series-and-long-term-dependensncies

Predicting time series and learning temporal dependencies using vanilla and stacked RNN and LSTM models.

tmbsh

A basic C-shell

xtreme-tictactoe-bot

AI bot for Xtreme Tic-Tac-Toe (a slight variant of Ultimate Tic-Tac-Toe with two big boards) using alpha-beta pruning, a winning heuristic, and quiescence search
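The core of such a bot's search can be illustrated with a generic alpha-beta (minimax) sketch over an abstract game tree. The `children` and `evaluate` callbacks are hypothetical hooks standing in for the board's move generator and the winning heuristic; quiescence search is not shown, and this is not the repository's own code.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node` with alpha-beta pruning.

    children(node) returns successor nodes (empty for terminal nodes) and
    evaluate(node) scores a node from the maximizing player's viewpoint.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # minimizer will never allow this line
                break
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:  # maximizer already has a better option
            break
    return value


# Toy tree: nested lists are internal nodes, integers are leaf scores.
visited = []


def leaf_score(node):
    visited.append(node)  # record which leaves were actually evaluated
    return node


TREE = [[3, 5], [6, 9], [1, 2]]
best = alphabeta(TREE, 2, -math.inf, math.inf, True,
                 lambda n: n if isinstance(n, list) else [], leaf_score)
print(best, visited)  # 6 [3, 5, 6, 9, 1]
```

On this tree, with the maximizer to move, the search returns 6 and evaluates only five leaves: once the leaf 1 shows the minimizer can hold the third branch to at most 1, which is worse than the 6 already secured, the leaf 2 is pruned.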
