Index – Analytics & Data Science Resources

Business/Enterprise Analytics

Analytics as a Strategy, Horizontal & Vertical Analytics, Data Science, Performance Management, Decision Making, Analytics Process, Tools

Bias in Algorithms - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/computer-scientists-find-bias-in-algorithms/a-qM2nZiSHTRWPqsqHwkO9kg%3Aa%3A285596943-34d0bed3ae%2Fieee.org

http://data-informed.com/a-tale-of-two-disciplines-data-scientist-and-business-analyst/

Challenges: http://www.kdnuggets.com/2016/08/data-science-challenges.html

Telling stories - http://mediashift.org/2015/08/when-telling-data-driven-stories-let-readers-ask-questions-too/

BI Maturity level - https://medium.com/the-data-of-things/machine-learning-and-cognitive-systems-part-2-big-data-analytics-d3ce7023325b#.73q66f6jr

Storytelling - http://www.dataversity.net/best-practices-in-data-storytelling/

Domain experts - Why We Need More Domain Experts In The Data Sciences http://www.forbes.com/sites/kalevleetaru/2016/06/12/why-we-need-more-domain-experts-in-the-data-sciences/#31656a8f3374

Analytics Cautions - http://www.informationweek.com/software/productivity-collaboration-apps/13-ways-machine-learning-can-steer-you-wrong/d/d-id/1326646?_mc=RSS%5FIWK%5FEDT&image_number=2

Analytics Cautions - Biased training data is a huge (ahem, yuge) problem for machine-learning. Cities that use data from racist frisking practices to determine who the police should stop end up producing algorithmic racism; court systems that use racist sentencing records to train a model that makes sentencing recommendations get algorithmic racism, too.

Algorithms – accountable - https://www.propublica.org/article/making-algorithms-accountable

Recency Bias - http://www.bbc.com/future/story/20160605-the-trouble-with-big-data-its-called-the-recency-bias

Analytics programs - https://hbr.org/2016/07/how-ceos-can-keep-their-analytics-programs-from-being-a-waste-of-time

Value - http://www.pcmag.com/article/345858/predictive-analytics-big-data-and-how-to-make-them-work-fo

Perils and Promises - http://www.forbes.com/sites/bernardmarr/2016/04/04/the-future-perils-and-promises-of-big-data-cloudera-co-founder-mike-olson-shares-his-views/#34bb7863b267

Discover hidden bias - http://www.theverge.com/2016/5/25/11773108/research-method-measure-algorithm-bias

Data Storytelling - http://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/#6059bf1f0c8a

Eight ways you're failing at data science - http://www.informationweek.com/big-data/big-data-analytics/8-ways-youre-failing-at-data-science/d/d-id/1323312

Data science questions - http://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html

100 interview questions for data science - https://www.dezyre.com/article/100-data-science-interview-questions-and-answers-general-for-2016/184

http://www.data-mania.com/blog/a-5-step-checklist-for-multiple-linear-regression/

About Data, Data Measurement Concepts, Big Data, Spreadsheet Analytics

http://bigthink.com/errors-we-live-by/is-all-the-truth-we-need-in-the-data

Power of small data vs big data: http://www.huffingtonpost.com/hollie-russon-gilman/the-power-of-small-data-b_b_8512954.html

“garbage in produces garbage out.”

Mark Twain popularized the saying, “There are three kinds of lies: lies, damned lies, and statistics.” It is true that data are frequently used selectively to give arguments a false sense of support. Knowingly misusing data or concealing important information about the way data and data summaries have been obtained is, of course, highly unethical.

For example, Google Flu Trends debuted to great excitement in 2008, but turned out to overestimate the prevalence of influenza by nearly 50%, largely due to bias caused by the way the data were collected; see Harford [8], for example. http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html

data security – fine/coarse grain, Anonymization

data prep - http://www.dbta.com/Editorial/News-Flashes/Trillium-Integrates-Data-Prep-and-Data-Quality-for-Big-Data-Analytics-109134.aspx

Storage and Retrieval, Data Preparation, Structured Query language (SQL)

R for Everyone: Advanced Analytics and Graphics by Jared P. Lander (2013)

Data handling with Python - http://code.tutsplus.com/courses/data-handling-with-python

Data Science in Python - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/15-python-libraries-for-data-science/a-paXAXvxxSkSIWwFsQNHNgw%3Aa%3A285596943-1ad98051c2%2Fbusiness2community.com

R - http://tutorials.iq.harvard.edu/R/Rintro/Rintro.html

R & Python for data science

Excel & Big Data - http://www.winbeta.org/news/microsoft-defines-big-data-outlines-excels-role-managing

Feather - https://www.r-bloggers.com/feather-fast-interoperable-data-importexport-for-r/

Wrangle - http://www.inc.com/bill-carmody/former-facebook-data-scientist-shares-how-to-wrangle-your-data.html

Pandas/JSON - https://www.dataquest.io/blog/python-json-tutorial/

Visualization and Exploration

http://www.datasciencecentral.com/profiles/blogs/10-features-all-dashboards-should-have

Power BI/Excel/mapping - http://geoawesomeness.com/geoawesomehowto-how-to-make-a-killer-map-using-excel-in-under-5-minutes-with-powermap-plugin/

• Now You See It: Simple Visualization Techniques for Quantitative Analysis (2009)
• Show Me the Numbers: Designing Tables and Graphs to Enlighten (2012)
• Information Dashboard Design: Displaying Data for At-a-Glance Monitoring (2013)
• Signal: Understanding What Matters in a World of Noise by Stephen Few (2015)

View and summarize data in R - https://msdn.microsoft.com/en-us/library/mt629161.aspx

Public records - http://mentalfloss.com/article/78120/dive-public-records-data-visualization-website

Zoomdata - http://fortune.com/2016/03/14/zoomdata-adds-amazon-data-sources-to-its-menu/

http://dataremixed.com/2016/07/a-first-look-at-google-data-studio/

Power BI - http://www.zdnet.com/article/power-bi-features-released-microsoft-data-science-summit-announced/

http://www.jenunderwood.com/2016/07/05/popular-d3-js-4-0-release/

http://thenextweb.com/us/2016/04/04/mits-new-visualization-tool-is-a-goldmine-for-data-nerds/

PowerBI – Sundance http://www.fastcodesign.com/3058185/microsofts-new-data-viz-tool-puts-excel-charts-to-shame

Descriptive Statistics

Statistical Methods Should Enable Data to Answer Scientific Questions

This shift in perspective from statistical technique to scientific question may change the way one approaches data collection and analysis.

After learning about the questions, statistical experts discuss with their scientific collaborators the ways that data might answer these questions and, thus, what kinds of studies might be most useful.

http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1004961#sec002

Probability Distributions and Data Modeling

Sampling and Estimation

Statistical Inference, A/B Testing

“Treat statistics as a science, not a recipe.” This is a great candidate for Rule 0.

Quote from biostatistician Andrew Vickers in [21]: Baker M (2016) Statisticians issue warning over misuse of P values. Nature 531: 151. doi: 10.1038/nature.2016.19503.

http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1004961#pcbi.1004961.ref021

Computer Age Statistical Inference - http://web.stanford.edu/%7Ehastie/CASI/index.html (Trevor Hastie)

7 step guide to A/B testing - http://conversionxl.com/lies-analytics-tool/
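To make the inference step behind an A/B test concrete, here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. The function name and the visitor and conversion counts are illustrative assumptions, not taken from the linked guide, and a real experiment would also need a pre-planned sample size and stopping rule.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in variants A and B.
    n_a, n_b: number of visitors shown variants A and B.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.3f}, p = {p:.5f}")
```

The normal approximation used here is only reasonable when each variant has a fair number of conversions and non-conversions; for small counts an exact test is the safer choice.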

Trend Analysis, Regression

"Typically, a subject matter expert and a data scientist construct regression models to look at the relationships between independent variables that drive future events. They have a hypothesis that guides the analysis [so] they come up with relatively predictable outcomes. Machine learning picks up unexpected relationships," said Larry Schor, SVP at population health management solution provider Medecision, in an interview. http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=3

Google trends - https://medium.com/google-news-lab/what-is-google-trends-data-and-what-does-it-mean-b48f07342ee8#.hcqv8wi4c

Forecasting Techniques and Tools: Excel, R, SAP

Introduction to Data Mining

http://www.datasciencecentral.com/profiles/blogs/22-tips-for-better-data-science

Field guide to data science: http://www.slideshare.net/BoozAllen/booz-allen-field-guide-to-data-science

Learning systems - http://www.kdnuggets.com/2015/09/questions-data-science-can-answer.html

20 lessons – ML - http://www.kdnuggets.com/2015/12/xamat-20-lessons-building-machine-learning-systems.html

Which model to use (regression) - http://www.datavizualization.com/blog/10-types-of-regressions-which-one-to-use

7 steps to mastering machine learning in Python - http://www.kdnuggets.com/2015/11/seven-steps-machine-learning-python.html/2

ML Algorithms - http://www.mo-data.com/a-tour-of-machine-learning-algorithms-bigdata-machinelearning/

Model selection - http://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html

Machine Learning 101 - http://marketingland.com/how-machine-learning-works-150366

Methods - https://gab41.lab41.org/the-10-algorithms-machine-learning-engineers-need-to-know-f4bb63f5b2fa#.lvo50engr

Learning Curve – Stanford, Ang

Model selection - http://www.bigdataanalyticsguide.com/2016/07/27/selecting-right-machine-learning-algorithm-predictive-analytics-needs-classification-vs-regression-vs-clustering/

Limitations - https://icrunchdata.com/4-current-limitations-artificial-intelligence-machine-learning/

ML Framework - http://www.kdnuggets.com/2016/04/top-15-frameworks-machine-learning-experts.html

Next in ML - http://www.huffingtonpost.com/quora/whats-next-in-machine-lea_b_9499600.html

Machine Learning http://www.forbes.com/sites/louiscolumbus/2016/06/04/machine-learning-is-redefining-the-enterprise-in-2016/#42c705b55fc0

Unsupervised Learning

Intro to machine learning - http://www.r2d3.us/visual-intro-to-machine-learning-part-1/?_lrsc=67983a20-0202-4272-a95c-7ca30faa5ba6&cmp=em-data-na-na-newsltr_ai_20160808&imm_mid=0e684b&trk=elevate_tw

https://re-work.co/blog/deep-learning-roland-memisevic-unlabelled-datasets-rethinking-unsupervised-learning

Unsupervised learning/future - https://www.eiuperspectives.economist.com/technology-innovation/%E2%80%9Cunsupervised-learning%E2%80%9D-and-future-analytics

Spreadsheet Modeling and Analysis

Supervised Learning (Trees & Neural Networks)

https://flipboard.com/@aj1ujm1/data-science-v706t99nz/machine-learning-for-large-scale-sem-accounts/a-wiBfPVD9Taejd8gWbCfnug%3Aa%3A285596943-48b2820599%2Fsearchengineland.com

Soft decision trees - http://www.cs.cornell.edu/%7Eoirsoy/softtree.html

Decision trees vs. Neural Networks - http://programmers.stackexchange.com/questions/157324/decision-trees-vs-neural-networks

Monte Carlo Simulation and Risk Analysis

Linear and Integer Optimization

ADMM solvers over Spark - https://yahooresearch.tumblr.com/post/147013834176/open-sourcing-sparkadmm-a-massively-parallel

Decision Analysis, Business Process Analysis

Customer Intelligence, Analytics Deployment Methods, AI, Analytical Trends

Deployment/Azure - https://azure.microsoft.com/en-us/documentation/articles/machine-learning-model-progression-experiment-to-web-service/

Deployment – Python & Docker - https://medium.com/@mattvonrohr/from-dev-to-ops-building-a-text-classifier-using-python-and-docker-part-1-docker-6de5d27a0a20#.ul15kaq7i

Designing great data products - https://www.oreilly.com/ideas/drivetrain-approach-data-products

Deploying Python apps in VS - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/why-write-python-in-visual-studio%3F/a-F-O9Zq0pR-OabRFbLekxng%3Aa%3A285596943-26179a62cc%2Fmsdn.com

Serverless microservice - http://blog.algorithmia.com/cloud-hosted-deep-learning-models/

AI in the enterprise - https://techcrunch.com/2016/05/12/clarifying-the-uses-of-artificial-intelligence-in-the-enterprise/

AI vs ML vs DL - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/a-comparative-roundup%3A-artificial-intelligence-vs.-machine-learning-vs.-deep-lea/a-MoDuTkJtQ9q2lT1Pyi8tZQ%3Aa%3A285596943-6cd6ecaaed%2Fdataversity.net

AI - http://www.popsci.com/listen-to-donald-trump-sing-about-obama-through-magic-ai

AI – inventory checker - http://business.financialpost.com/entrepreneur/fp-startups/artificial-intelligence-is-helping-silicon-valley-reinvent-itself?__lsa=c403-74b6

Social Machines - http://www.slideshare.net/jahendler/social-machines-the-coming-collision-of-artificial-intelligence-social-networking-and-humanity

AI/ML - https://techcrunch.com/2016/07/06/key-trends-in-machine-learning-and-ai/

AI/Deep Mind - http://www.recode.net/2016/7/19/12231776/google-energy-deepmind-ai-data-centers

Watson Analytics – free edition

Cognitive Analytics – Microsoft

Google TensorFlow on iOS and Android - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/google-extends-tensorflow-machine-learning-to-ios/a-wjruugpQTNS9EbNii7SF3Q%3Aa%3A285596943-3902127546%2Finfoworld.com

Applications at Scale - https://turi.com/products/create/

Minecraft AI research tool - http://www.slashgear.com/microsoft-opens-up-minecraft-ai-research-tool-to-the-public-08447530/

https://www.datanami.com/2016/07/07/investments-fast-data-analytics-surge/

AI – supercomputer watching computers - http://www.ibtimes.co.uk/machine-learning-markets-when-intelligent-algorithms-start-spoofing-each-other-regulation-becomes-1567986

Smart AI in the home - http://www.popsci.com/jibo-adds-michael-i-jordan-to-advisory-board

IoT - http://iq.intel.com/drone-data-sparks-a-new-industrial-revolution/?cid=sm-FLIPBOARD-Q22016GCONATIVE_Revolution&utm_campaign=sm-Q22016GCONATIVE&utm_medium=social&utm_source=flipboard

Digital Transformation - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/%E2%80%9Cthe-digital-transformation-playbook%3A%E2%80%9D-fast%2C-furious-innovation/a-j2KnVaj6RaeFCFWDEGJoVw%3Aa%3A285596943-f701a9ad23%2Fsmallbiztrends.com

Advanced Data analysis - http://fedscoop.com/advanced-data-analytics

Sentiment Analysis on Trump - http://datascienceplus.com/sentiment-analysis-on-donald-trump-using-r-and-tableau/

Data Science

Research

Current Trends

Artificial Neurons - http://singularityhub.com/2016/08/14/ibms-new-artificial-neurons-a-big-step-toward-brain-like-computers/

IoT - https://www.linkedin.com/pulse/why-internet-things-drive-knowledge-revolution-david-evans

Use-cases

Project Cycle

Presentation

Situation & Business Understanding

Data Acquisition

SQL

http://www.kdnuggets.com/2015/08/beginners-guide-sql.html

Data Mining

Profiling & Summary Statistics

Cleaning

Transformation

Python - http://www.slideshare.net/GaelVaroquaux/scientit-meets-web-dev-how-python-became-the-language-of-data

Exploration & Visualization

Preliminary Exploration, Trend Analysis & Slice and Dicing

http://www.forbes.com/sites/bernardmarr/2016/08/16/how-to-use-analytics-to-identify-trends-in-your-market/#42892afd228b

https://blog.taucharts.com/taucharts-data-focused-charting-library/

http://bokeh.pydata.org/en/latest/

Big Data Visualization

Methods Selection

http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared

Model parallelism (fitting different models on the same data) - Part I: Dask & scikit-learn: Model Parallelism

Data parallelism (fitting a single model on larger datasets) - Part II: Dask & scikit-learn: Data Parallelism

Distributed learning and grid search on a real dataset - Part III: Dask & scikit-learn: Putting it All Together
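The difference between the two kinds of parallelism named above can be sketched in a few lines. This is a toy illustration using only the Python standard library, not the Dask or scikit-learn API from the linked series: the data values, the one-parameter ridge model, and all function names are assumptions made for the example. Threads are used only to show how the work is divided; they will not speed up this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data where y is roughly 2 * x (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]


def fit_ridge_slope(x, y, penalty):
    """Closed-form slope for y ~ w * x with an L2 penalty on w."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


def squared_error(w, x, y):
    """Sum of squared residuals of the model y ~ w * x."""
    return sum((b - w * a) ** 2 for a, b in zip(x, y))


def model_parallel_search(x_train, y_train, x_val, y_val, penalties):
    """Model parallelism: the same data, one task per candidate model."""
    with ThreadPoolExecutor() as pool:
        slopes = list(
            pool.map(lambda p: fit_ridge_slope(x_train, y_train, p), penalties)
        )
    # Score every candidate on held-out data and keep the best one.
    errors = [squared_error(w, x_val, y_val) for w in slopes]
    best = min(range(len(penalties)), key=errors.__getitem__)
    return penalties[best], slopes[best]


def partial_sums(chunk):
    """Per-chunk sufficient statistics for the unpenalised slope."""
    cx, cy = chunk
    return sum(a * b for a, b in zip(cx, cy)), sum(a * a for a in cx)


def data_parallel_fit(x, y, n_chunks=3):
    """Data parallelism: one model, with the data split across tasks."""
    size = -(-len(x) // n_chunks)  # ceiling division
    chunks = [(x[i:i + size], y[i:i + size]) for i in range(0, len(x), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_sums, chunks))
    sxy = sum(p[0] for p in parts)
    sxx = sum(p[1] for p in parts)
    return sxy / sxx


print(model_parallel_search(xs[:4], ys[:4], xs[4:], ys[4:], [0.0, 0.1, 1.0, 10.0]))
print(data_parallel_fit(xs, ys))
```

In the first function each task sees all of the training data but a different hyperparameter, which is the shape of a grid search. In the second, each task sees only a slice of the data and the partial results are combined into a single fit.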

Feature Engineering

deep feature engineering - http://www.kdnuggets.com/2015/10/data-science-machine.html

Pre-processing

A/B Testing

Algorithms

https://algorithmia.com/algorithms/nlp/LDA

Distributed Machine Learning Toolkit - http://www.winbeta.org/news/microsoft-makes-distributed-machine-learning-toolkit-open-source

https://en.m.wikipedia.org/wiki/Viterbi_algorithm#/search

Random numbers – true - http://fortune.com/2016/03/14/zoomdata-adds-amazon-data-sources-to-its-menu/

Artificial Intelligence

Creative AI - http://iq.intel.com/getting-creative-ai-and-machine-learning/

AI vs ML vs DL - http://www.bigdataanalyticsguide.com/2016/07/31/week-machine-learning-july-31-2016/

Locator AI - http://www.fastcodesign.com/3058793/take-a-photo-and-this-crazy-neural-network-can-deduce-where-you-are

Bayesian Analysis

https://thinkinator.com/2016/01/12/r-users-will-now-inevitably-become-bayesians/

Changepoint - https://www.r-bloggers.com/a-simple-intro-to-bayesian-change-point-analysis/

Deep Learning

http://www.forbes.com/sites/anthonykosner/2014/12/29/tech-2015-deep-learning-and-machine-intelligence-will-eat-the-world/?linkId=11530714#68e15e6f282c

Top 5 arXiv Deep Learning Papers, Explained - http://www.kdnuggets.com/2015/10/top-arxiv-deep-learning-papers-explained.html

Deep learning – big deal - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/the-big-deal-with-deep-learning/a-nXOUoIP_TVOILdb-QZ1eDg%3Aa%3A285596943-0c08a0f3c7%2Fdzone.com

Deep learning in R - http://www.r-tutor.com/deep-learning/introduction

Deep learning - http://www.teglor.com/b/deep-learning-libraries-language-cm569/

Deep learning for improved predictions - http://www.smartdatacollective.com/richardsmith/425731/leveraging-deep-learning-improved-predictive-analysis

Cutting edge Deep learning - http://www.forbes.com/sites/quora/2016/08/05/this-is-the-cutting-edge-of-deep-learning-research/#3415f37e27cb

Color photos - https://fstoppers.com/science/deep-learning-algorithm-automatically-colorizes-photos-138500

Amazon Deep learning - http://venturebeat.com/2016/05/11/amazon-open-sources-its-own-deep-learning-software-dsstne/

Statistical view of deep learning - http://www.kdnuggets.com/2015/11/statistical-view-deep-learning.html

Canonical patterns - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/google-extends-tensorflow-machine-learning-to-ios/a-wjruugpQTNS9EbNii7SF3Q%3Aa%3A285596943-3902127546%2Finfoworld.com

DNN Framework - http://www.tomshardware.com/news/ceva-cdnn2-tensorflow-embedded-systems,32158.html

DNN with Sparknet - http://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html

Ensemble Models

Forecasting/Time-series

Fourier Analysis

Kalman Filter

http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

Machine Intelligence

HTM (Hierarchical Temporal Memory) - https://jaxenter.com/machine-intelligence-vs-machine-learning-128157.html

Matrix Factorization

https://www.r-bloggers.com/matrix-factorization/

Network Analysis

https://graph-tool.skewed.de/performance

http://blogs.scientificamerican.com/sa-visual/visualizing-the-global-network-of-languages/

https://blog.hootsuite.com/tracking-social-media-in-google-analytics/

Inferential Statistics

Markov Models

On-line Learning

Operations Research and Analysis – Queuing

Optimization & Multiple Goals

Operations Research Superhighway

Linear Programming and Optimal Network Flow

Stochastic Optimization

Non-linear Optimization

Integer Optimization

Linear Optimization

Solving linear programs in a spreadsheet

Faster optimization - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/faster-optimization/a-70qXlyyeSZe4DJsGGpa3cg%3Aa%3A285596943-257a03fc96%2Fmit.edu

https://visualstudiomagazine.com/articles/2015/01/01/multi-swarm-optimization.aspx

Probability

Queuing

Recommender systems

https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/

Reinforcement Learning

Rule-based Learning

Scheduling & Multi-period Planning

Sensitivity Analysis

Simulation and Random Walks

Statistics

statistics in python - http://gael-varoquaux.info/stats_in_python_tutorial/

Stochastic Optimization

Supervised Learning

http://www.kdnuggets.com/2014/03/machine-learning-7-pictures.html

Tree-based from scratch - https://www.analyticsvidhya.com/blog/2016/04/complete-tutorial-tree-based-modeling-scratch-in-python/

Naïve Bayes - https://www.analyticsvidhya.com/blog/2015/09/naive-bayes-explained/

Python cheat sheet - http://www.dummies.com/programming/python/python-for-data-science-for-dummies-cheat-sheet/

Thresholds - http://www.kdnuggets.com/2015/10/best-blogs-analytics-big-data-science-machine-learning.html

Regression - http://www.datavizualization.com/blog/10-types-of-regressions-which-one-to-use

Fuzzy Forests - https://mran.microsoft.com/package/fuzzyforest/

Neural Networks

Ordinal Logistic Regression - http://www.ats.ucla.edu/stat/r/dae/ologit.htm

Tensors

http://www.kdnuggets.com/2016/08/gentlest-introduction-tensorflow-part-2.html

Unsupervised Learning

Model-based Clustering - http://www.sthda.com/english/wiki/model-based-clustering-unsupervised-machine-learning

Hierarchical Clustering - http://www.sthda.com/english/wiki/hierarchical-clustering-essentials-unsupervised-machine-learning

Computational Fundamentals

Computation – Popular Equations http://www.independent.co.uk/news/business/the-17-equations-that-changed-the-course-of-history-a7190351.html

Techniques and Methods

Application Notes

Mapping – Society: Scientists at Stanford University apply machine learning algorithms to satellite data to automatically map out impoverished areas of the world.

Computer Vision - https://techcrunch.com/2016/08/16/intels-joule-platform-lets-makers-build-computer-vision-into-almost-anything/

Face recognition - https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.6akvb1k8f

Image recognition - http://firsttimeprogrammer.blogspot.com/2016/07/image-recognition-in-r-using.html

Insights

Actions

Results

Data science

http://www.emc.com/microsites/data-scientist-interactive-guide/index.htm?cmp=soc-cor-glbl-us-sprinklr-TWITTER-Big+Data-EMCcorp-328230962

http://blog.mypath.io/how-to-become-data-scientist-for-free/

http://www.kdnuggets.com/2016/08/become-type-a-data-scientist.html

Data Engineering

GPU computing - http://www.slideshare.net/continuumio/gpu-computing-with-apache-spark-and-python

Bash

Python

http://snip.ly/nurk#http://www.swaroopch.com/notes/python/

Databases

Sampling

Surveys

Tutorials

http://www.datasciencecentral.com/profiles/blogs/17-short-tutorials-all-data-scientists-should-read-and-practice

Open Data

Scaling

Spark – Scala vs. Pyspark - https://www.dezyre.com/article/scala-vs-python-for-apache-spark/213

Scaling R - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/scalable-data-science-with-r/a-cPIU0ZgPQbW7BqQGVl5SrQ%3Aa%3A285596943-a1f62f4e58%2Foreilly.com

Parallel processing Python - http://homes.cs.washington.edu/%7Ejmschr/lectures/Parallel_Processing_in_Python.html

Real-time

https://www.datanami.com/2016/07/07/investments-fast-data-analytics-surge/

ML Cloud Tools

Azure – R, Python - http://blog.revolutionanalytics.com/2016/08/ml-studio-mro-python3.html

ML - https://msdn.microsoft.com/en-us/library/azure/dn905812.aspx

Systems Engineering

Cases

General-Sort

http://www.business2community.com/big-data/16-case-studies-companies-proving-roi-big-data-01408654#EIxbh7UKywYEPX5B.97

Retail - http://www.businessinsider.com/zara-has-the-best-business-model-2015-12

Cool - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=8

Uses - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/24-uses-of-statistical-modeling-(part-i)/a-piVeaDFrRwaGnjLd2Xc2Sw%3Aa%3A285596943-d74f05a352%2Fdatasciencecentral.com

Vertical

http://www.impactlab.net/2015/03/09/how-machine-learning-will-feed-innovation/

Food - http://www.fastcompany.com/3062262/how-machine-learning-will-change-what-you-eat

Banking – https://flipboard.com/@aj1ujm1/data-science-v706t99nz/how-apache-spark%2C-scala%2C-and-functional-programming-made-hard-problems-easy-at-b/a-i2a0y-aNSa6ojqfY21jdBQ%3Aa%3A285596943-87c72345e5%2Fcloudera.com

IoT/manufacturing - http://www.economist.com/news/leaders/21678786-manufacturers-must-learn-behave-more-tech-firms-machine-learning

Energy - https://medium.com/the-data-of-things/machine-learning-and-cognitive-systems-part-2-big-data-analytics-d3ce7023325b#.73q66f6jr

Guess patient's age – http://scienmag.com/artificial-neural-networks-guess-patients-age-with-surprising-accuracy/

Civil/public transportation

Healthcare - http://www.healthcareitnews.com/news/ehr-costs-proving-be-roadblock-big-data-and-predictive-analytics

Civil/roads - https://techcrunch.com/2016/07/13/how-iot-and-machine-learning-can-make-our-roads-safer/

Medical Services - http://www.cnet.com/news/machine-learning-helped-create-new-map-human-brain-tomorrow-daily-396-show-notes/

Military - China develops missiles powered by machine learning and artificial intelligence that can autonomously change directions or targets after being fired.

Automotive - http://www.forbes.com/sites/bernardmarr/2016/07/18/how-the-connected-car-is-forcing-volvo-to-rethink-its-data-strategy/#c96056479c12

Automobiles - UK-based FiveAI receives a $2.7 million round of funding to build new AI for self-driving cars, promising a more autonomous approach less reliant on pre-made maps.

Energy – Environment: Researchers in Germany examine whether machine learning and big data analytics can be used to create more usable sources of renewable energy for the power grid.

Healthcare - The FDA issues new guidelines on medical innovation, aiming in part to preserve the rigor of medical research while allowing machine learning to play a larger role.

Diabetes - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=3

Smart-cities - http://www.forbes.com/sites/danielnewman/2016/08/15/big-data-and-the-future-of-smart-cities/#2bf987e73f2d

Law - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=4

Litigation - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=12

Insurance - https://channels.theinnovationenterprise.com/articles/data-analytics-in-insurance

Banking/prevent money laundering - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=5

E-commerce fraud - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=9

Security screening - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=10

Agriculture - http://www.theverge.com/2016/8/4/12369494/descartes-artificial-intelligence-crop-predictions-usda

http://splendidtable.org/story/how-to-feed-10000-people-from-food-grown-on-3-acres-in-the-city

https://techcrunch.com/2016/07/06/the-land-grab-for-farm-data/

Law enforcement - North Carolina’s Charlotte-Mecklenburg Police Department pilot tests a new machine learning method for identifying officers at risk of initiating adverse events from staff records.

Autism-related gene identification - Scientists at Princeton University use machine learning to speed up the identification of genes that correlate with the presence of autism spectrum disorder.

Identify sales pitches - People.ai applies machine learning methods to traditional sales operations, compiling a “playbook” for more effective and efficient sales pitches for representatives.

Horizontal

Advertising - http://www.kdnuggets.com/2015/08/big-data-influencing-data-driven-advertising.html

Marketing - http://www.huffingtonpost.com/hollie-russon-gilman/the-power-of-small-data-b_b_8512954.html

Employee engagement - https://hbr.org/2015/12/ideos-employee-engagement-formula&cm_sp=Article-_-Links-_-End%20of%20Page%20Recirculation

Retail Analytics - http://info.data-informed.com/definitive-guide-to-retail-analytics?__hstc=248701080.8958bc34f3dcd084f2a3ec8284f90216.1471761680184.1471761680184.1471761680184.1&__hssc=248701080.1.1471761680184&__hsfp=1897583675&hsCtaTracking=7ad00c43-5d8d-44cd-9c6c-a6380d9096fe%7Cdebb8186-06ef-4385-9ff8-adff19cdc66f

Sales Analytics/relevant experience - http://www.business.com/sales/sales-analytics-how-companies-can-capitalize-on-data-to-create-more-relevant-experiences/

Online advertising - https://medium.com/the-data-of-things/machine-learning-and-cognitive-systems-part-2-big-data-analytics-d3ce7023325b#.73q66f6jr

General – http://www.telegraph.co.uk/sponsored/business/digital-leaders/future-series/12170238/business-analytics-decision-making.html

Winning with Analytics – MIT - https://www.accenture.com/us-en/~/media/Accenture/next-gen/hp-analytics/pdf/Accenture-Linking-Analytics-to-High-Performance-Executive-Summary.pdf

Customer insights into growth - http://www.ibmbigdatahub.com/blog/predictive-analytics-sparks-sizable-bottom-line-benefits?CT=ISM0056

Innovation - http://www.informationweek.com/big-data/big-data-analytics/10-ways-predictive-analytics-improves-innovation/d/d-id/1324508?image_number=2

Customer Journey mapping - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/how-machine-learning-improves-customer-journey-mapping/a-GJ9kZt9QTY-Z6iX7l1jShg%3Aa%3A285596943-e5f0988d36%2Fcmswire.com

Supply Chain Management - http://www.forbes.com/sites/bernardmarr/2016/04/22/how-big-data-and-analytics-are-transforming-supply-chain-management/#6d5afd194c2d

Customer Experience - http://www.forbes.com/sites/stanphelps/2016/04/25/less-big-brother-more-big-mother-three-ways-to-use-big-data-to-enhance-customer-experience/#6fb8c31e3cb2

Production - http://www.forbes.com/sites/drillinginfo/2016/06/13/production-forecasting-predictive-analytics-and-todays-oilfield/#5f05469c410e

Behavioral Analytics - https://channels.theinnovationenterprise.com/articles/how-behavioral-analytics-is-driving-marketing

Manufacturing - http://www.forbes.com/sites/louiscolumbus/2016/06/26/10-ways-machine-learning-is-revolutionizing-manufacturing/#7d93c5482d7f

Marketing - http://www.prdaily.com/Main/Articles/A_marketers_guide_to_data_mining_21006.aspx

Security Analytics - http://www.securityweek.com/increasing-importance-security-analytics

Cybersecurity - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=6

https://techcrunch.com/2016/07/01/exploiting-machine-learning-in-cybersecurity/

Customer Service - http://www.informationweek.com/strategic-cio/executive-insights-and-innovation/11-cool-ways-to-use-machine-learning/d/d-id/1323375?image_number=11

Auto replies to e-mails - Zendesk releases Automatic Answers, allowing businesses to generate automated replies to customer emails instead of requiring human technical support for routine questions.

Upsell/Recommend - eBay acquires SalesPredict, a predictive analytics firm and eBay’s second machine learning acquisition in two months, to better match products to potential buyers.

Logistics – fraud detection - WEX, a provider of payment systems used by commercial shipping firms, uses machine learning to detect fuel waste and fraud among truck drivers.

Security – id bots - Distil Networks uses machine learning algorithms to begin to defend against Advanced Persistent Bots, bots whose interactions are difficult to discern from real human users.

Zero day exploits - http://www.forbes.com/sites/kevinmurnane/2016/08/08/machine-learning-goes-dark-and-deep-to-find-zero-day-exploits-before-day-zero/#14b6880e6d76

Discover data breach - https://techcrunch.com/2016/07/25/how-predictive-analytics-discovers-a-data-breach-before-it-happens/

General-unsorted

DARDEN BUSINESS PUBLISHING-CASES - http://store.darden.virginia.edu/

IVEY PUBLISHING-CASES - https://www.iveycases.com/

IVEY PUBLISHING-TEACHING TOOLS - https://www.iveycases.com/TeachingAuthoringTools.aspx

HARVARD BUSINESS PUBLISHING - https://cb.hbsp.harvard.edu/cbmp/pages/home

http://pubsonline.informs.org/page/ited/cases

Case Article—Medication Waste Reduction in an In-Hospital Pharmacy: A Case That Bridges Problem Solving Between a Traditional Case and an Industry Project. Gregory Dobson, Vera Tilson. 16 (2), pp. 68–70. http://dx.doi.org/10.1287/ited.2015.0147ca Keywords: developing analytical skills, spreadsheet modeling, teaching with projects, teaching inventory management, teaching healthcare operations

Case Article—Mapping Business Problems to Analytics Solutions: Surrogate Experiential Learning in an MBA Introductory Data Science and Business Analytics Course. Dessislava A. Pachamanova. 16 (1), pp. 15–22. http://dx.doi.org/10.1287/ited.2015.0146ca Keywords: business analytics, data science, problem framing, data analytics lifecycle, experiential learning, role playing

Case—Managing Staffing Inefficiencies Using Analytics. Dessislava A. Pachamanova. 16 (1), pp. 23–23. http://dx.doi.org/10.1287/ited.2015.0146cs Keywords: business analytics, data science, problem framing, data analytics lifecycle, experiential learning, role playing

Case Article—Production Scheduling at Falcon Die Casting: A Comprehensive Example on the Application of Linear Programming and Its Extensions. B. Madhu Rao, Jeroen Beliën. 15 (1), pp. 150–153. http://dx.doi.org/10.1287/ited.2014.0132ca Keywords: linear optimization, production planning, case study, spreadsheet optimization

Case—Production Scheduling at Falcon Die Casting. B. Madhu Rao, Jeroen Beliën. 15 (1), pp. 154–155. http://dx.doi.org/10.1287/ited.2014.0132cs Keywords: linear optimization, production planning, case study, spreadsheet optimization

Case Article—Markdown Management at Sports Unlimited. Masoud Talebian, Garrett van Ryzin. 14 (2), pp. 96–99. http://dx.doi.org/10.1287/ited.2013.0121ca Keywords: markdown management, retail management, teaching service operations management, teaching revenue management, developing analytical skills

Case Article—Acusis: Medical Transcription Outsourcing. Prakash Mirchandani, Tobias (Tim) Ehlich. 13 (3), pp. 162–164. http://dx.doi.org/10.1287/ited.2013.0110ca Keywords: teaching service operations management, outsourcing, risk management, service guarantee, medical transcription

Case Article—KEY Electronics—Sourcing and Warehouse Analysis. Tim Kraft, Yenho T. Chung, Feryal Erhun. 12 (2), pp. 89–91. http://dx.doi.org/10.1287/ited.1110.0065ca Keywords: product sourcing, inventory modeling, global operations

Case—KEY Electronics—Sourcing and Warehouse Analysis. Tim Kraft, Yenho T. Chung, Feryal Erhun. 12 (2), pp. 92–99. http://dx.doi.org/10.1287/ited.1110.0065cs Keywords: product sourcing, inventory modeling, global operations

Case Article—Quantifying Operational Risk in Financial Institutions. Brian Keller, Güzin Bayraksan. 12 (2), pp. 100–105. http://dx.doi.org/10.1287/ited.1110.0075ca Keywords: risk modeling, teaching statistics, maximum likelihood estimation, teaching simulation

Case—Quantifying Operational Risk in Financial Institutions. Brian Keller, Güzin Bayraksan. 12 (2), pp. 106–113. http://dx.doi.org/10.1287/ited.1110.0075cs Keywords: risk modeling, teaching statistics, maximum likelihood estimation, teaching simulation

Case—Flight Delays at RegionEx. Amr Farahat, Susan E. Martonosi. 11 (3), pp. 103–105. http://dx.doi.org/10.1287/ited.1110.0066cs Keywords: exploratory data analysis, Simpson's paradox, airline flight delays, cases, developing analytical skills, teaching decision analysis, teaching statistics

Case Article—Inkjet Printer Pricing. Thin-Yin Leong, Nang-Laik Ma. 11 (3), pp. 132–135. http://dx.doi.org/10.1287/ited.1100.0052ca Keywords: tied products pricing, spreadsheet modeling

Case Article—Keeping Logistics Under Wraps. Matthew J. Drake, Paul M. Griffin, Julie L. Swann. 11 (2), pp. 57–62. http://dx.doi.org/10.1287/ited.1100.0051ca Keywords: distribution network design, integer programming, ethics

Case—Keeping Logistics Under Wraps. Matthew J. Drake, Paul M. Griffin, Julie L. Swann. 11 (2), pp. 63–67. http://dx.doi.org/10.1287/ited.1100.0051cs Keywords: distribution network design, integer programming, ethics

Teaching Note—Keeping Logistics Under Wraps. Matthew J. Drake, Paul M. Griffin, Julie L. Swann. 11 (2), pp. 68–76. http://dx.doi.org/10.1287/ited.1100.0051tn Keywords: distribution network design, integer programming, ethics

Case Article—Revenue Management at the Hong Kong Grand: The Dine in Grandeur Dilemma. Sheryl E. Kimes, Rohit Verma, Christopher W. Hart. 10 (3), pp. 126–127. http://dx.doi.org/10.1287/ited.1100.0046ca Keywords: cases, developing analytical skills, interdisciplinary teaching, teaching revenue management, teaching service operations management

Case—Revenue Management at the Hong Kong Grand: The Dine in Grandeur Dilemma. Sheryl E. Kimes, Rohit Verma, Christopher W. Hart. 10 (3), pp. 128–132. http://dx.doi.org/10.1287/ited.1100.0046cs Keywords: cases, developing analytical skills, interdisciplinary teaching, teaching revenue management, teaching service operations management

Teaching Note—Revenue Management at the Hong Kong Grand: The Dine in Grandeur Dilemma. Sheryl E. Kimes, Rohit Verma, Christopher W. Hart. 10 (3), pp. 133–139. http://dx.doi.org/10.1287/ited.1100.0046tn Keywords: cases, developing analytical skills, interdisciplinary teaching, teaching revenue management, teaching service operations management

Case—Forecasting Beer Demand at Anadolu Efes. Murat Köksalan, Selin Özpeynirci, Haldun Süral. 10 (3), pp. 142–145. http://dx.doi.org/10.1287/ited.1100.0048cs Keywords: forecasting, regression model, demand estimation

Case Article—Forecasting Beer Demand at Anadolu Efes. Murat Köksalan, Selin Özpeynirci, Haldun Süral. 10 (3), pp. 140–141. http://dx.doi.org/10.1287/ited.1100.0048ca Keywords: forecasting, regression model, demand estimation

Teaching Note—Forecasting Beer Demand at Anadolu Efes. Murat Köksalan, Selin Özpeynirci, Haldun Süral. 10 (3), pp. 146–155. http://dx.doi.org/10.1287/ited.1100.0048tn Keywords: forecasting, regression model, demand estimation

Case Article—Process Control and Design of Experiments/ANOVA. Prakash Mirchandani. 10 (2), pp. 74–78. http://dx.doi.org/10.1287/ited.1090.0041ca Keywords: process control charts, process capability indices, six sigma, design of experiments (DOE), interaction and main effects

Case Article—Introductory Integrative Cases on Airline Revenue Management. Robert A. Shumsky. 9 (3), pp. 135–139. http://dx.doi.org/10.1287/ited.1090.0033ca Keywords: revenue management, airlines, forecasting, optimization, simulation

Case Series—BlueSky Airlines: Single-Leg Revenue Management. Robert A. Shumsky. 9 (3), pp. 140–144. http://dx.doi.org/10.1287/ited.1090.0033cs1

Case Series—BlueSky Airlines: Network Revenue Management. Robert A. Shumsky. 9 (3), pp. 145–147. http://dx.doi.org/10.1287/ited.1090.0033cs2

Teaching Notes—BlueSky Airlines: Single-Leg Revenue Management (A–C). Robert A. Shumsky. 9 (3), pp. 148–157. http://dx.doi.org/10.1287/ited.1090.0033tn

Case Article—Revenue Management at Harrah's Entertainment, Inc. Narendra Agrawal, Morris A. Cohen, Noah Gans. 9 (3), pp. 158–159. http://dx.doi.org/10.1287/ited.1090.0031ca Keywords: revenue management, gaming industry, clearing prices, bid prices, hotels

Case—Revenue Management at Harrah's Entertainment, Inc. Narendra Agrawal, Morris A. Cohen, Noah Gans. 9 (3), pp. 160–168. http://dx.doi.org/10.1287/ited.1090.0031cs

Teaching Note—Revenue Management at Harrah's Entertainment, Inc. Narendra Agrawal, Morris A. Cohen, Noah Gans. 9 (3), pp. 169–179. http://dx.doi.org/10.1287/ited.1090.0031tn

Case Article—Seagate–Quantum: Encroachment Strategies. Glen M. Schmidt, Jan A. Van Mieghem. 5 (2), pp. 64–67. http://dx.doi.org/10.1287/ited.5.2.64ca Keywords: disruptive technology, product diffusion, new product development, technology management, marketing strategy, operations strategy, cases

Beer in the Classroom: A Case Study of Location and Distribution Decisions. Murat Köksalan, F. Sibel Salman. 4 (1), pp. 65–77. http://dx.doi.org/10.1287/ited.4.1.65 Keywords: facility location, distribution network, linear programming, integer programming, strategic planning, cases

Data Sets

https://www.analyticsvidhya.com/blog/2014/11/data-science-projects-learn/
• Source code for our Big Data keyword correlation API
• Great statistical analysis: forecasting meteorite hits
• Fast clustering algorithms for massive datasets
• 53.5 billion clicks dataset available for benchmarking and testing
• Over 5,000,000 financial, economic and social datasets
• New pattern to predict stock prices, multiplies return by factor 5
• 3.5 billion web pages
• Another large data set - 250 million data points - available for do...
• 125 Years of Public Health Data Available for Download
• Two big datasets to challenge your data science expertise
• From the trenches: real data science project (Google Analytics)
• Data sets and other machine learning resources from UC Irvine

INFORMS MARKETING SCIENCE SOCIETY-ISMS RESEARCH DATASETS https://www.informs.org/Community/ISMS/ISMS-Research-Datasets

KD NUGGETS http://www.kdnuggets.com/datasets/index.html

DATA.GOV http://www.data.gov/

MARKETING EDGE http://www.marketingedge.org/marketing-programs/data-set-library

http://www.datasciencecentral.com/profiles/blogs/great-github-list-of-public-data-sets.

GOOGLE PUBLIC DATA EXPLORER http://www.google.com/publicdata/directory

KDD CUP http://www.sigkdd.org/kddcup/index.php (1997-2010, 2011, 2012, 2013, 2014)

Courses

http://cs.calstatela.edu/wiki/index.php/Courses/CS_461/Spring_2011 http://www.digitaltrends.com/computing/microsoft-launches-data-science-curriculum/

Machine learning is poised to be the next frontier, so don't miss this opportunity to gain this invaluable and in-demand skillset. Save hundreds on The Complete Machine Learning Bundle, now just $39.99 in the GDGT Deals store.
• An Introduction to Machine Learning & NLP in Python ($99 value)
• An Introduction To Deep Learning & Computer Vision ($49)
• Learn By Example: Statistics and Data Science in R ($99)
• Learn By Example: Hadoop & MapReduce for Big Data Problems ($99)
• Byte Size Chunks: Java Object-Oriented Programming & Design ($79)
• Byte-Sized-Chunks: Twitter Sentiment Analysis in Python ($69)
• Byte-Sized-Chunks: Decision Trees and Random Forests ($69)
• Byte-Sized-Chunks: Recommendation Systems ($69)
• From 0 to 1: Learn Python Programming—Easy as Pie ($49)
• Quant Trading Using Machine Learning ($99)

Conferences:

http://www.kdd.org/kdd2016/ (Aug 13-17)

https://flipboard.com/@aj1ujm1/data-science-v706t99nz/the-7-conferences-data-scientists-shouldn%E2%80%99t-miss/a-7BoauZ99SoawPPJDGu1s3g%3Aa%3A285596943-3affa662b0%2Frjmetrics.com

Instructor Resources:

SAS-INSTRUCTOR RESOURCES http://support.sas.com/learn/ap/tkit/list.html

Books

Requested books:

- June 27 - http://www.cambridge.org/us/academic/subjects/computer-science/pattern-recognition-and-machine-learning/bayesian-reasoning-and-machine-learning?format=HB&isbn=9780521518147

- http://www.cambridge.org/us/academic/subjects/economics/econometrics-statistics-and-mathematical-economics/time-series-models-business-and-economic-forecasting-2nd-edition?format=PB&isbn=9780521520911

https://leanpub.com/artofdatascience

Free books - https://ai.icymi.email/60-free-books-on-bigdata-datascience-datamining-machinelearning-python-r-and-more/

Free books - http://www.kdnuggets.com/2015/03/free-data-mining-data-science-books-resources.html

Free books - https://flipboard.com/@aj1ujm1/data-science-v706t99nz/10-free-machine-learning-books/a-zUSQxSTQRMi-tLUGgiH1Yg%3Aa%3A285596943-294230a527%2Fdatasciencecentral.com

Financial Analytics - http://www.cambridge.org/pa/academic/subjects/statistics-probability/statistics-econometrics-finance-and-insurance/financial-analytics-r-building-laptop-laboratory-data-science?format=HB

http://web.stanford.edu/%7Ehastie/CASI/contents.html

INFORMS subject matter experts compiled the following list of key references that may help you prepare for the CAP® exam. A select committee of subject matter experts who have earned the CAP® credential has developed a Study Guide that can be used to help candidates prepare for the exam. The Study Guide is available on the CAP® website and has information relating to each of the Domain areas listed here.

Domain I – Business Problem (Question) Framing
Kirkwood CW (1997) Strategic Decision Making: Multiobjective Decision Analysis with Spreadsheets (Duxbury Press, Pacific Grove, CA).

Domain II – Analytics Problem Framing
Albright SC, Winston W, Zappe C (2011) Data Analysis and Decision Making, 4th ed. (South-Western Cengage Learning, Mason, OH).

Domain III – Data
Hubbard DW (2010) How to Measure Anything: Finding the Value of “Intangibles” in Business, 2nd ed. (John Wiley & Sons, Hoboken, NJ).
Hillier F, Hillier M (2013) Introduction to Management Science: A Modeling and Case Study Approach, 5th ed. (McGraw-Hill Higher Education, New York).
Vose D (2008) Risk Analysis: A Quantitative Guide, 3rd ed. (John Wiley & Sons, Chichester, UK).

Domain IV – Methodology (Approach) Selection
Neter J, Kutner M, Nachtsheim C, Wasserman W (1996) Applied Linear Statistical Models, 4th ed. (McGraw-Hill/Irwin, New York).

Domain V – Model Building and Domain VII – Model Life Cycle Management
Hillier FS, Lieberman GJ (2010) Introduction to Operations Research, 9th ed. (McGraw-Hill, New York).
Ross SM (2010) Introductory Statistics, 3rd ed. (Academic Press, Burlington, MA).
Clemen RT (1997) Making Hard Decisions: An Introduction to Decision Analysis, 2nd ed. (Duxbury Press, Pacific Grove, CA).
Law AM, Kelton DW (2006) Simulation Modeling and Analysis, 4th ed. (McGraw-Hill, New York).

Domain VI – Deployment
Laursen GHN, Thorlund J (2010) Business Analytics for Managers: Taking Business Intelligence Beyond Reporting (John Wiley & Sons, Hoboken, NJ).

Bartlett R (2013) A Practitioner's Guide to Business Analytics: Using Data Analysis Tools to Improve Your Organization's Decision Making and Strategy (McGraw-Hill, New York).
Breeden J (2013) Tipping Sacred Cows: Kick the Bad Work Habits that Masquerade as Virtues (Jossey-Bass, San Francisco, CA).
Brohaugh W (2007) Write Tight: Say Exactly What You Mean With Precision and Power (Sourcebooks, Naperville, IL).
Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think (Houghton Mifflin, New York).
Davenport T, Harris J (2010) Analytics at Work: Smarter Decisions, Better Results (Harvard Business Review Press, Boston).
Davenport T, Kim J (2013) Keeping up with the Quants: Your Guide to Understanding and Using Analytics (Harvard Business Review Press, Boston).
Duarte N (2012) HBR Guide to Persuasive Presentations (Harvard Business Review Press, Boston).
Eckerson W (2012) Secrets of Analytical Leaders: Insights from Information Insiders (Technics Publications, Westfield, NJ).
Franks B (2012) Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (John Wiley & Sons, Hoboken, NJ).
Jarman K (2013) The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics (John Wiley & Sons, Hoboken, NJ).
Phillips J (2013) Building a Digital Analytics Organization: Creating Value by Integrating Analytical Processes, Technology, and People into Business Operations (Pearson, Upper Saddle River, NJ).
Provost F, Fawcett T (2013) Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, Sebastopol, CA).
Redman T (2001) Data Quality: The Field Guide (Digital Press, Woburn, MA).
Sashihara S (2011) The Optimization Edge: Reinventing Decision Making to Maximize All Your Company's Assets (McGraw-Hill, New York).
Savage S (2012) The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty (John Wiley & Sons, Hoboken, NJ).
Saxena R, Srinivasan A (2012) Business Analytics: A Practitioner’s Guide (Springer, New York).
Shmueli (2012) Practical Time Series Forecasting: A Hands-On Guide (Springer, New York).
Siegel E (2013) Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (Wiley, New York).
Silver N (2012) The Signal and the Noise: Why Most Predictions Fail but Some Don’t (Penguin Press, New York).
Soares S (2013) Big Data Governance: An Emerging Imperative (MC Press Online, Boise, ID).
Spitzer DR (2007) Transforming Performance Management: Rethinking the Way We Measure and Drive Organizational Success (AMACOM, New York).
Taylor J (2011) Decision Management Systems: A Practical Guide to Using Business Rules and Predictive

Question pool

1. Which of the following BEST describes the data and information flow within an organization?
a) Information assurance
b) Information strategy
c) Information mapping
d) Information architecture

2. A multiple linear regression was built to try to predict customer expenditures based on 200 independent variables (behavioral and demographic). 10,000 rows of data were fed into a stepwise regression, each row representing one customer. 1,000 customers were male, and 9,000 customers were female. The final model had an adjusted R-squared of 0.27 and seven independent variables. Increasing the number of rows of data to 100,000 and rerunning the stepwise regression will MOST likely:
a) have negligible impact upon the adjusted R-squared.
b) increase the impact of the male customers.
c) change the heteroskedasticity of the residuals in a favorable manner.
d) decrease the number of independent variables in the final model.

3. A clothing company wants to use analytics to decide which customers to send a promotional catalogue in order to attain a targeted response rate. Which of the following techniques would be the most appropriate to use for making this decision?
a) Integer programming
b) Logistic regression
c) Analysis of variance
d) Linear regression

4. Which of the following is an effective optimization method?
a) Analysis of variance (ANOVA)
b) Generalized linear regression model (GLM)
c) Box-Jenkins Method (ARIMA)
d) Mixed integer programming (MIP)

5. A box and whisker plot for a dataset will MOST clearly show:
a) the difference between the second quartile and the median.
b) the 90% confidence interval around the mean.
c) where the [actual-predicted] error value is not zero.
d) if the data is skewed and, if so, in which direction.

6. In the initial project meeting with a client, which of the following is the MOST important information to discuss?
a) Timeline and implementation plan
b) Analytical model to use
c) Business issue and project goal
d) Available budget

7. Which of the following statements is true of modeling a multi-server checkout line?
a) A queuing model can be used to estimate service rates.
b) A queuing model can be used to estimate average arrivals.
c) Variability in arrival and service times will tend to play a critical role in congestion.
d) Poisson distributions are not relevant.

8. A company is considering designing a new automobile. Their options are a design based on current gasoline engine technology or a government proposed “Green” technology. You are a government official whose job is to encourage automakers to adopt the “Green” technology. You cannot provide funding for development or production costs, but you can provide a subsidy for every car sold. The development costs and the wholesale price, in USD ($), of the cars are shown in the table following:

                                      Gasoline Technology   “Green” Technology
Production Wholesale Price/vehicle    25,000                40,000
Variable Cost/vehicle                 15,000                35,000
Fixed Development Cost                100,000,000           200,000,000

How large a subsidy per vehicle sold will be required, assuming there will be enough demand to motivate the switch?
a) Greater than $5000
b) Less than $5000
c) Cannot be determined
d) Equal to $5000

9. A furniture maker would like to determine the most profitable mix of items to produce. There are well-known budgetary constraints. Each piece of furniture is made of a predetermined amount of material with known costs, and demand is known. Which of the following analytical techniques is the MOST appropriate one to solve this problem?
a) Optimization
b) Multiple regression
c) Data mining
d) Forecasting

10. You have simulated the net present value (NPV) of a decision. It ranges between -$10 million and +$10 million. To BEST present the likelihood of possible outcomes, you should:
a) present a single NPV estimate to avoid confusion.
b) present a histogram to show the distribution of various NPV estimates.
c) trim all outliers to present the most balanced diagram.
d) relax constraints associated with extreme points in the simulation.

11. A company ships products from a single dock at their warehouse. The time to load shipments depends on the experience of the crew, products being shipped and weather. The company thinks there is significant unmet demand for their products and would like to build another dock in order to meet this demand. They ask you to build a model and determine if the revenue from the additional products sold will cover the cost of the second dock within two years of it becoming operational. Which of the following is the MOST appropriate modeling approach?
a) Optimization because it is a transportation problem.
b) Optimization because the company’s objective is to maximize profit and capacity at the dock is a limited resource.
c) Forecasting because you can determine the throughput at the dock, calculate the net revenue and compare this with the cost of the new dock.
d) Discrete event simulation because there is a sequence of discrete random events through time.

12. Two investors who have the same information about the stock market buy an equal number of shares of a stock. Which of the following statements MUST be true?
a) The risks for the two investors are statistically independent.
b) Both investors are subject to the same risks.
c) Both investors are subject to the same uncertainty.
d) If the investors are optimistic, they should have borrowed, rather than bought the shares.

13. A project seeks to build a predictive data-mining model of customer profitability based upon a series of independent variables including customer transaction history, demographics, and externally purchased credit-scoring information. There are currently 100,000 unique customers available for use in building the predictive model. Which of the following strategies would reflect the BEST allocation of these 100,000 customer data points?
a) Use 70,000 randomly selected data points when building the model, and hold the remaining 30,000 as a test dataset.
b) Build the model using all 100,000 data points.
c) Randomly partition the data into 4 datasets of equal size, build four models and take their average.
d) Use 1,000 randomly selected data points when building the model.

14. Conjoint analysis in market research applications can:
a) give its best estimates of customer preference structure based on in-depth interviews with a small number of carefully chosen subjects.
b) only trade off relative importance to customers of features with similar scales.
c) allow calculation of relative importance of varying features and attributes to customers.
d) only trade off among a limited number of attributes and levels.

15. One of the main advantages of tree-based models and neural networks is that they:
a) are easy to interpret, use, and explain.
b) build models with higher R squared than other regression techniques.
c) reveal interactions without having to explicitly build them into the model.
d) can be modeled even when there is a significant amount of missing data.

16. The monthly profit made by a clothing manufacturer is proportional to the monthly demand, up to a maximum demand of 1000 units, which corresponds to the plant producing at full capacity. (Any excess demand over 1000 units will be satisfied by some other manufacturer, and hence yield no additional profit.) The monthly demand is uncertain, but the average demand is reliably estimated at 1000 units. At this level of demand the monthly profit is $3,000,000. Which of the following statements must be true of the expected monthly profit, P?
a) P can have any positive value.
b) P is possibly greater than $3,000,000.
c) P is equal to $3,000,000.
d) P is less than $3,000,000.

17. After building a predictive model and testing it on new data, an under-prediction by a forecasting system can be detected by its:
a) negative-squared.
b) bias.
c) mean absolute deviation.
d) mean squared error.

18. All times in the decision tree below are given in hours. What is the expected travel time (in hours) of the optimal (minimum travel time) decision?

[Figure: decision tree, not reproduced]

a) 7.8
b) 6.9
c) 7.4
d) 7.0

19. An analytics professional is responsible for maintaining a simulation model that is used to determine the staffing levels required for a specific operational business process. Assuming that the operational team always uses the number of staff determined by the model, which of the following is the MOST important maintenance activity?
a) Ensure that all the model input data items are available when needed.
b) Determine if there has been a change in model accuracy over time.
c) Ensure that all users are reviewing the model results in a timely fashion.
d) Determine that the model’s reports are understood by the users.

20. A segmentation of customers who shop at a retail store may be performed using which of the following methods?
a) Monte Carlo Markov Chain and ANOVA
b) Clustering, factor and control charts
c) Decision tree and recursive function analyses
d) Clustering and decision tree

21. In the following diagram, what is true of Strategy B compared to Strategy A?

[Figure: Cumulative Probability Curves for Strategy A and Strategy B. x-axis: NPV, Millions US$, from (400) to 1,600; y-axis: cumulative probability, from 0 to 1. Curves not reproduced.]

a) Strategy B exhibits stochastic (probabilistic) dominance over Strategy A.
b) Strategy B has the same downside risk as Strategy A since the curves have the same shape.
c) Strategy B must have the same uncertainties impacting it as Strategy A because the curves are so similar in shape.
d) Strategy A exhibits stochastic (probabilistic) dominance over Strategy B.

22. Each month you generate a list of marketing leads for direct mail campaigns. Which of the following should you do before the list is used?
a) Exclude people who were on the list the previous month.
b) Retain x% of the leads as control for performance measurement.
c) Remove opt-outs.
d) Exclude people who were never on the list.

23. When analyzing responses of a survey of why people like a certain restaurant, factor analysis could reduce the dimension in which of the following ways?
a) Collapse several survey questions regarding food taste, health value, ingredients and consistency into one general unobserved “food quality” variable.
b) Condense similar survey respondent answers into clusters of like-minded customers for market segment analysis.
c) Reduce the variability of individual subject ratings by centering each respondent’s ratings around his or her average rating.
d) Decrease variability by analyzing inter-rater reliability on the question items before offering the survey to a wide number of respondents.

24. A preferred method or best practice for organizing data in a data warehouse for reporting and analysis is:
a) transactional-based modeling.
b) multidimensional modeling.
c) relation-based modeling.
d) tuple-based modeling.

Answer key

1. d    7. c    13. a   19. b
2. a    8. a    14. c   20. d
3. b    9. a    15. c   21. a
4. d    10. b   16. d   22. c
5. d    11. d   17. b   23. a
6. c    12. c   18. d   24. b

Domain I: Business Problem Framing - Questions 6, 8, 10, 12
Domain II: Analytics Problem Framing - Questions 7, 14, 16, 20
Domain III: Data - Questions 1, 2, 5, 23, 24
Domain IV: Methodology (Approach) Selection - Questions 3, 4, 9, 11
Domain V: Model Building - Questions 13, 15, 18, 21
Domain VI: Deployment - Questions 17, 22
Domain VII: Model Lifecycle Management - Question 19

Rationale for Correct Answers

1. d) Information architecture CORRECT: Information architecture refers to the analysis and design of the data stored by information systems, concentrating on entities, their attributes, and their interrelationships. It refers to the modeling of data for an individual database and to the corporate data models that an enterprise uses to coordinate the definition of data in several (perhaps scores or hundreds) distinct databases.

2. a) have negligible impact upon the adjusted R-squared. CORRECT: The increase in size of the data will not impact the adjusted R-squared calculation because both samples are sufficiently large randomly selected subsets of data.

3. b) Logistic regression CORRECT: This type of classification model is often used to predict the outcome of a categorical dependent variable (response vs. no response) based on one or more predictor variables, so this is the most appropriate answer. The goal of the analytics in the stated problem is to determine who is most likely to respond, and the binary nature of this predicted outcome is provided by logistic regression.

4. d) Mixed integer programming (MIP) CORRECT: This is a mathematical optimization technique used when one or more of the variables are restricted to be integers. It is an effective optimization model.

5. d) if the data is skewed and, if so, in which direction. CORRECT: A box and whisker plot, sometimes just called a “box plot,” was invented by John Tukey as a way to graphically display the distribution of data. The ends of the box are at the first and third quartiles, and there is a line somewhere in the box representing the median value. The whiskers extend either to the minimum and maximum values in the data set, or possibly less if they do not include points identified as outliers.
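The box plot rationale above can be made concrete with a short sketch. It computes the five values a box plot draws and uses one rough heuristic for skew: comparing the two halves of the box around the median. The function names and the sample data are made up for illustration, and the heuristic is an assumption, not part of the exam material.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def skew_direction(data):
    """Rough skew guess: which half of the box is wider around the median."""
    _, q1, median, q3, _ = five_number_summary(data)
    upper_half = q3 - median
    lower_half = median - q1
    if upper_half > lower_half:
        return "right"
    if lower_half > upper_half:
        return "left"
    return "symmetric"

# Made-up sample with a long right tail.
sample = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]
print(five_number_summary(sample))
print(skew_direction(sample))
```

A fuller check would also compare whisker lengths, which is what a reader does by eye on the plot.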

6. c) Business issue and project goal CORRECT: Understanding the business issue and project goal provides a sound foundation on which to base the project.

7. c) Variability in arrival and service times will tend to play a critical role in congestion. CORRECT: Arrival and service time distributions are inputs to a queuing model that would be used to model a checkout line and directly influence congestion.
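A minimal simulation sketch of the queuing rationale above: two checkout lines with the same average arrival and service times, one with no variability and one with exponentially distributed times. The server count, the mean times, and the function name `mean_wait` are illustrative assumptions, not values from the exam.

```python
import heapq
import random

def mean_wait(n_customers, servers, interarrival, service, seed=0):
    """Average time customers wait in a first-come, first-served multi-server line."""
    rng = random.Random(seed)
    free_at = [0.0] * servers      # heap of times at which each server becomes free
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        clock += interarrival(rng)             # next customer arrives
        earliest = heapq.heappop(free_at)      # first server to become available
        start = max(clock, earliest)           # wait only if every server is busy
        total_wait += start - clock
        heapq.heappush(free_at, start + service(rng))
    return total_wait / n_customers

# Same averages in both runs: one arrival per time unit, 1.8 time units of
# service, two servers (90% utilization). Only the variability differs.
steady = mean_wait(20_000, 2, lambda rng: 1.0, lambda rng: 1.8)
variable = mean_wait(
    20_000, 2,
    lambda rng: rng.expovariate(1.0),          # exponential interarrival, mean 1.0
    lambda rng: rng.expovariate(1 / 1.8),      # exponential service, mean 1.8
)
print(steady, variable)
```

With constant times nobody waits, while the variable run shows substantial waiting at the same utilization, which is the point the rationale makes.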

8. a) Greater than $5000 CORRECT: If we consider the profit from an individual vehicle to be the wholesale price minus the variable cost, we see that the profit from a Gasoline Technology vehicle is $25K-$15K = $10K. Similarly, the profit from a “Green” Technology vehicle is $40K-$35K = $5K. In order to make up for this difference in lost profit, the subsidy provided to the automaker would have to be at least $5K (the difference between $10K and $5K). In addition, the subsidy would need to be greater than $5000 so that the automakers would be able to recover their increased fixed costs at a reasonable level of demand.
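The arithmetic in the subsidy rationale above can be written as a break-even condition. With sales volume n and subsidy s, the “Green” design is at least as profitable when (5,000 + s)·n − 200,000,000 ≥ 10,000·n − 100,000,000, that is, s ≥ 5,000 + 100,000,000/n. The sketch below evaluates that bound; the function name and the example volume are assumptions chosen for illustration.

```python
def breakeven_subsidy(volume):
    """Smallest per-vehicle subsidy that makes the Green design as profitable
    as the gasoline design at the given sales volume (figures from the table)."""
    gasoline_margin = 25_000 - 15_000               # wholesale price minus variable cost
    green_margin = 40_000 - 35_000
    extra_fixed_cost = 200_000_000 - 100_000_000    # additional development cost
    return (gasoline_margin - green_margin) + extra_fixed_cost / volume

# Hypothetical volume of 100,000 vehicles: the margin gap plus the
# extra fixed cost spread over each vehicle sold.
print(breakeven_subsidy(100_000))  # 6000.0
```

Because the fixed-cost term is positive for any finite volume, the required subsidy always exceeds $5,000.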

9. a) Optimization CORRECT: The problem statement describes an optimization problem: the furniture maker’s objective function is to maximize his profit. The decision variables are the amount of each item to produce, and the constraints are that he must meet demand and be within his budget. Optimization is the most appropriate technique to solve this problem.

10. b) present a histogram to show the distribution of various NPV estimates. CORRECT: Net Present Value (NPV) takes as input a time series of cash flow (both incoming and outgoing) and a discount rate and outputs a price. By showing a histogram (a graphical representation of the distribution of data), it is possible to see how likely various NPVs (beyond the given minimum and maximum) are to occur. This would be useful information to have when considering a decision, especially since the range of outcomes includes $0, meaning the decision could result in a profit or a loss.

11. d) Discrete event simulation because there is a sequence of random events through time. CORRECT: The time to load shipments depends on the experience of the crew, products being shipped, and weather. Given that there is a sequence of random events through time, discrete event simulation is the most appropriate modeling approach.

  1. c) Both investors are subject to the same uncertainty. CORRECT: Both investors are subject to the same uncertainty regarding the stock market.

  2. a) Use 70,000 randomly selected data points when building the model, and hold the remaining 30,000 out as a test dataset. CORRECT: This split provides sufficient data to build the model and sufficient data to test the model. This is the best allocation of the customer data points. (A common ‘rule of thumb’ is to use about two thirds of the data to build the model and one third to test it.)

  3. c) allow calculation of relative importance of varying features and attributes to customers. CORRECT: Conjoint analysis by definition maps consumer preference structures into mathematical tradeoffs, and was designed to allow a marketer to compare the relative utility of varying features and attributes.

  4. c) reveal interactions without having to explicitly build them into the model. CORRECT: Tree-based models and neural networks are employed to find patterns in the data that were not previously identified (or input into the model building process).

  5. d) P is less than $3,000,000. CORRECT: When the demand is 1000 or greater, the profit is $3,000,000. But when the demand is less than 1000, the profit is less than $3,000,000. Given this and that the average demand is 1000 units, the expected monthly profit must be less than $3,000,000.

  6. b) bias. CORRECT: The bias measures the difference between the estimate and the right answer, including the direction of that difference. Its sign (positive or negative) shows whether the estimate is an overestimate or an underestimate.

  7. d) 7.0 CORRECT: To answer this question, one needs to solve the decision tree using the “roll back” technique. Continuing back the bottom branch of the tree, the expected time if you fly is (0.5)(9.0) + (0.5)(5) = 7.0 hours. Now, when faced with the “drive or fly” decision, you should choose to fly (since 7.0 hours is less than 7.35 hours). Thus, answer d) 7.0 hours is the expected travel time of the optimal (or minimal travel time) decision.

  8. b) Determine if there has been a change in model accuracy over time. CORRECT: The most important maintenance activity for the analytics professional responsible for maintaining the simulation model is to monitor the accuracy of the model over time. If there has been a change in accuracy, the analytics professional may need to revisit the assumptions of the model.

  9. d) Clustering and decision trees CORRECT: Customer segmentation consists of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, e.g., age, gender, interests, spending habits and so on. The purpose of customer segmentation is to allow a company to target specific groups of customers effectively and allocate marketing resources to best effect. Two ways to do this segmentation are clustering and decision trees.

  10. a) Strategy B exhibits stochastic (probabilistic) dominance over Strategy A. CORRECT: Because the cumulative probability curve for Strategy B is below (or to the right of) the corresponding curve for Strategy A, it can be said that Strategy B exhibits stochastic dominance (SD) over Strategy A. B stochastically dominates A when, for any good outcome x, B gives at least as high a probability of receiving at least x as does A, and for some x, B gives a higher probability of receiving at least x. Since the curves do not cross, B stochastically dominates A.

  44. c) Remove opt-outs. CORRECT: The list of marketing leads should not include people or organizations that have opted out.
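The stochastic dominance definition given in the Strategy A versus Strategy B answer above can be checked mechanically. The sketch below assumes each strategy is a discrete distribution written as `(outcome, probability)` pairs. The three example strategies are hypothetical, not taken from the original question. For discrete distributions it is enough to compare the "at least x" probabilities at every outcome that appears in either strategy.

```python
def prob_at_least(strategy, x):
    """P(outcome >= x) for a list of (outcome, probability) pairs."""
    return sum(p for outcome, p in strategy if outcome >= x)


def dominates(b, a, tol=1e-12):
    """True if b (first-order) stochastically dominates a.

    b must give at least as high a probability of receiving at least x
    for every x, and a strictly higher probability for some x.
    """
    xs = sorted({outcome for outcome, _ in a} | {outcome for outcome, _ in b})
    never_worse = all(prob_at_least(b, x) >= prob_at_least(a, x) - tol for x in xs)
    sometimes_better = any(prob_at_least(b, x) > prob_at_least(a, x) + tol for x in xs)
    return never_worse and sometimes_better


# Hypothetical strategies for illustration only.
strategy_a = [(10, 0.5), (20, 0.5)]
strategy_b = [(10, 0.2), (20, 0.5), (30, 0.3)]
strategy_c = [(0, 0.5), (40, 0.5)]   # its curve crosses A's curve

print(dominates(strategy_b, strategy_a))
print(dominates(strategy_c, strategy_a), dominates(strategy_a, strategy_c))
```

When the cumulative curves cross, as with the hypothetical `strategy_c`, neither strategy dominates the other and the check returns `False` in both directions.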

  1. a) Collapse several survey questions regarding food taste, health value, ingredients, and consistency into one general unobserved "food quality" variable. CORRECT: Factor analysis is a statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset.

  2. b) multidimensional modeling. CORRECT: Multidimensional modeling is the optimum way to organize data in a data warehouse for analysis. It is associated with OLAP (On-line Analytical Processing). OLAP data is organized in cubes that can be taken directly from the data warehouse for analysis.
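As a rough illustration of the multidimensional (cube) idea in the answer above, the sketch below rolls a tiny fact table up along chosen dimensions. The rows, dimension names, and the `roll_up` helper are all hypothetical. A real OLAP engine would precompute and index these aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, product, quarter) and one measure (sales).
FACTS = [
    {"region": "East", "product": "Chairs", "quarter": "Q1", "sales": 120},
    {"region": "East", "product": "Tables", "quarter": "Q1", "sales": 80},
    {"region": "West", "product": "Chairs", "quarter": "Q1", "sales": 100},
    {"region": "West", "product": "Chairs", "quarter": "Q2", "sales": 150},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure over every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region = roll_up(FACTS, ["region"])
by_region_product = roll_up(FACTS, ["region", "product"])
grand_total = roll_up(FACTS, [])
print(by_region)
print(by_region_product)
print(grand_total)
```

Each call corresponds to viewing the same cube at a different level of detail: fewer dimensions in the key means a coarser summary.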

Other Resources:

90 blogs: http://www.kdnuggets.com/2015/10/best-blogs-analytics-big-data-science-machine-learning.html

KAGGLE IN-CLASS INITIATIVE (http://inclass.kaggle.com/)

KAGGLE-KDD CUP (1997-2010) (http://www.sigkdd.org/kddcup/index.php) 1997-2010 2011 2012 2013 2014

INFORMS JOURNALS http://pubsonline.informs.org/action/doSearch?AllField=%22Data%2C+as+supplemental+material%2C+ are+available+%22

Question Pool

index's People

Contributors

tonyj4102
