ivanliu1989,Tianxiang(Ivan) Liu,github

Clustering algorithms. Implements K-means, DBSCAN, and spectral clustering.

coupon-purchase-prediction

Using past purchase and browsing behavior, this competition asks you to predict which coupons a customer will buy in a given period of time. The resulting models will be used to improve Ponpare's recommendation system, so they can make sure their customers don't miss out on their next favorite thing.

credit_api

criteo_logistic_regression

Logistic regression model built with SAS

cxxnet

fast, concise, distributed deep learning framework

d3

A JavaScript visualization library for HTML and SVG.

d3js_practice

A d3.js repository containing practical examples from beginner level to advanced level

data-analysis-and-machine-learning-projects

Repository of teaching materials, code, and data for my data analysis and machine learning projects.

data-science-data-products

A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm, or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience.

data-science-london

Data Science London is hosting a meetup on Scikit-learn. This competition is a practice ground for trying, sharing, and creating examples of sklearn's classification abilities (if this turns into something useful, we can follow it up with regression or more complex classification problems).

data-science-practical-machine-learning

Prediction and machine learning are among the most common tasks performed by data scientists and data analysts. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.

data-science-regression-models

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models including scatterplot smoothing.

data-science-statistical-inference

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies, and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data.

data.table

R package data.table extends data.frame. Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by reference by group using no copies at all, cells can contain vectors, chained queries and a fast file reader (fread). However, the main benefit is its natural syntax: DT[where, select|update, by].
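
The `DT[where, select|update, by]` form reads as "filter rows, then compute something, grouped by a key". As a rough illustration only, the pure-Python sketch below mimics that query shape; it is not data.table code and says nothing about its performance. The `query` helper, its parameter names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def query(rows, where, select, by):
    """Hypothetical helper mirroring the where / select / by query shape."""
    groups = defaultdict(list)
    for row in rows:
        # "where": keep only the rows that pass the filter
        if where(row):
            # "by": bucket each surviving row under its group key
            groups[by(row)].append(row)
    # "select": compute one value per group
    return {key: select(group) for key, group in groups.items()}


# Made-up sample data, for illustration only
rows = [
    {"city": "A", "year": 2014, "sales": 10},
    {"city": "A", "year": 2015, "sales": 20},
    {"city": "B", "year": 2015, "sales": 5},
    {"city": "B", "year": 2015, "sales": 7},
]

result = query(
    rows,
    where=lambda r: r["year"] == 2015,
    select=lambda g: sum(r["sales"] for r in g),
    by=lambda r: r["city"],
)
print(result)
```

Here the filter, the aggregate, and the grouping key are passed as three separate arguments, which is the same division of labor the bracket syntax expresses in one expression.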