Below is a list of data science resources to introduce you to the world of data analytics, data science, and programming. I'll update it periodically as I learn of new resources and tools. R, Python, and SQL are programming languages used for data collection, data manipulation, data analytics, statistical analysis, web scraping, machine learning, and artificial intelligence. If you have any suggestions, send me a note at [email protected]. Thanks!
- Codecademy: This is one of the best online interactive learning platforms that I've used. Codecademy offers free (and paid) coding classes in several programming languages, including Python, Java, JavaScript, SQL, and more
- Kaggle: World's largest community of data scientists and machine learners, with discussion boards, downloadable datasets, and competitions to reinforce your programming skills with R, Python, etc.
- Khan Academy: Statistics is critical for any data scientist, and the Khan Academy statistics and probability lessons will provide you with the basics
- Super Data Science Courses: I'm a huge fan of Kirill Eremenko's SuperDataScience, and now they're offering online intro courses (their $1 Premium 30-day trial will get you access to over 20 courses)
- freeCodeCamp's Best Courses: freeCodeCamp ranks online data science courses on their Medium blog
- Udacity's Data Science Nanodegree: I know two people who have taken this course and raved about it, but it requires some understanding of statistics and experience working with data (Udacity has other data science courses that are less difficult)
- Fast.ai: For the more advanced data science students, Fast.ai provides free, high-quality, highly-reviewed machine learning and deep learning courses
Outside of the classroom, podcasts help me stay current with data science news and local events. If you have any podcast suggestions that touch on analytics careers, data science, machine learning, or artificial intelligence, I'd love to hear from you!
- Super Data Science Podcast: Hosted by Kirill Eremenko, the goal of the Super Data Science podcast is to bring you the most inspiring data scientists and analysts from around the world
- Data Skeptic: Hosted by Kyle Polich, Data Skeptic aims to reveal the science of fake news
- The Science of Social Media: Created by Buffer (a social media account management and post scheduling tool), this podcast is great for understanding how data science is being used in marketing and advertising, and explores technologies around social media analytics
- Linear Digressions: This podcast is great for students and practitioners, with each episode covering the complexities of a specific data science topic, e.g., A/B testing and compliance bias, data privacy laws, text analysis (and tools like Word2Vec), computational limitations, and AutoML tools
- Eremenko, Kirill. Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career. Kogan Page, 2018.
  - From the creator of Super Data Science, this was an easy "for fun" read that is light on math and provides an overview of data analytics and careers in this field. I swear I'm not on Kirill's payroll(!)... he's just a data science career guru with fresh perspectives.
- Provost, Foster and Fawcett, Tom. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc., 2013.
  - This was a textbook for Foundations of Data Science, a UCI graduate course that I recently completed. Examples in this textbook are denser and more math-heavy than those in Confident Data Skills. This would be most appropriate for anyone actively trying to move into a data science role.
- Murray, Scott. Interactive Data Visualization for the Web: An Introduction to Designing with D3. O'Reilly Media, Inc., 2nd Edition, 2017.
  - Ideal for beginners, Scott Murray takes you through easy-to-consume fundamental concepts and methods of D3, the most powerful JavaScript library for web-based visualizations.
  - Note: This 2nd Edition (October 2017) release was updated with D3's v4 syntax, and D3 v5 was released in April 2018. While some of the syntax in this edition may now be obsolete, it's still a valuable resource for learning the fundamentals of D3 and data design.
- RStudio: To get started with R, download and install RStudio's open-source desktop IDE
- swirl: Once RStudio (or your IDE of choice) is installed, installing the swirl package is as easy as typing install.packages("swirl") in the RStudio console
  - This R package includes 15 interactive lessons (about 30 minutes each) that teach you the fundamentals of R programming and its data structures
- R for Excel Users: As I pivoted from a financial modeling background to data science, this blog helped me understand how to do common "Excel stuff" in R
- RStudio shortcuts: Keyboard shortcuts make everything faster, and this reference certainly helps
  - Tip for Excel users: If you're using Excel regularly, stop using your mouse!
- Introduction to dplyr: After installing the dplyr package, you can copy/paste sample R code from this site to learn how to use dplyr functions
  - dplyr is one of the Tidyverse packages, and it makes data manipulation and summarization easier and faster to do in R
- ggplot2: While R's base graphing capabilities are powerful, the ggplot2 package makes it easier to produce high-quality plots and visualizations
Many online Python tutorials were created when Python 2.x was the standard. Python 3.x is the future of the language, so don't bother installing Python 2.x (the syntax of the two is similar enough that you can still follow older tutorials). Ping me if you need help.
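If you follow an older tutorial while running Python 3, a few differences come up again and again. The short sketch below is my own illustration (not taken from any of the courses listed here) of three common ones; the variable names are made up for the example.

```python
# Python 3 syntax; the comments note what a Python 2-era tutorial would show.

# print is a function in Python 3 (Python 2 tutorials write: print "Hello")
print("Hello, data science!")

# "/" is true division in Python 3, and "//" is floor division.
# In Python 2, 7 / 2 with two integers evaluated to 3.
half = 7 / 2
floored = 7 // 2
print(half, floored)

# range() is lazy in Python 3; wrap it in list() when you need an actual list
first_five = list(range(5))
print(first_five)
```

Most other beginner-level code from Python 2 tutorials runs unchanged, which is why the older material is still usable.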
- PyCharm: A plethora of Python IDEs exist; PyCharm just happens to be my Python desktop IDE of choice
  - The full version is $89 USD for individual users, but if you're an active student, you can download the entire suite for free
  - You can download a free, stripped-down community version, but it lacks full web development, database, and SQL support
- Google for Education: Google's Python classes (YouTube links below) are loaded with great content, and three downloadable scripts accompany the lessons. These will take you a few days to complete. If you have a second monitor, load the YouTube videos on it so you can pause and easily reference the instructor's syntax
  - Google's Python Class (Day 1, Part 1): Introduction and strings
  - Google's Python Class (Day 1, Part 2): Lists and sorting
  - Google's Python Class (Day 1, Part 3): Dictionary and files
  - Google's Python Class (Day 2, Part 1): Regular Expressions
  - Google's Python Class (Day 2, Part 2): Utilities: OS Modules and Commands
  - Google's Python Class (Day 2, Part 3): URLs and HTTP, Exceptions
  - Google's Python Class (Day 2, Part 4): Conclusion
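To give a feel for how the Day 1 and Day 2 topics fit together, here is a small word-count sketch that touches strings, regular expressions, dictionaries, and sorting. It is my own minimal example, not one of the class's downloadable scripts, and the sample text is made up.

```python
import re

text = "Data science is fun. Data analysis is fun, too!"  # made-up sample text

# Strings and regular expressions: lowercase the text and pull out the words
words = re.findall(r"[a-z]+", text.lower())

# Dictionaries: count how often each word appears
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Lists and sorting: order by count (descending), then alphabetically
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Show the three most frequent words
for word, count in ranked[:3]:
    print(word, count)
```

Swapping the hard-coded string for the contents of a text file is a natural next step once you reach the lesson on files.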
- SQLZOO: Learn SQL in stages with these free, interactive tutorials
- The Try-SQL Editor: The SQL Tryit Editor lets you practice SQL statements with pre-populated data sets
- 33mail: I’ve learned a lot about the harvesting, sale, and use of personal digital data for commercial purposes. Privacy-enhancing services such as 33mail allow you to create new email addresses instantly for forwarding purposes.
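If you'd like to practice SQL statements like those in the SQL tutorials listed above without leaving Python, the standard library's sqlite3 module can run them against an in-memory database. This is only a sketch: the customers table and its rows are hypothetical and are not the data sets used by SQLZOO or the Tryit Editor.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for this example
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ana", "Mexico"), ("Ben", "Germany"), ("Chloe", "Mexico")],
)

# A typical beginner query: count customers per country
rows = conn.execute(
    "SELECT country, COUNT(*) AS n FROM customers "
    "GROUP BY country ORDER BY n DESC, country"
).fetchall()

for country, n in rows:
    print(country, n)

conn.close()
```

Because the database lives in memory, you can rerun the script as often as you like and always start from the same rows.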