I want to learn data science and become a data analyst. Preferably from free online sources. Bear in mind I come from a mechanical engineering background. So I am not familiar with software or any programming language. The sources need to start from the most basic level because of it.
Thank you in advance.
Hi everyone! I’m a 3rd year student looking to break into data science. I know Python and basic stats but feel overwhelmed by where to go next. Could you share
-
A structured roadmap (topics, tools, projects)?
-
Best free/paid resources (MOOCs, books)?
-
How much SQL/ML is needed for entry-level roles? Thanks in advance!
-
Should I focus more on stats or coding first?
-
What projects would make my portfolio strong?
-
Are there any free/paid resources you recommend?
I worked as a web programmer in the past (PHP, Javascript, SQL).
Now I am a PhD student in Psychology.
I like Data Science very much and I am trying to learn Excel, R, Python, and Matlab, but to understand how these algorithms work I would also need some Math knowledge.
A few decades ago, I studied Calculus in high school which I have almost completely forgotten, but never Linear Algebra, and I passed a few exams in Statistics.
Since English is not my first language, what (video) course would you suggest to learn Data Science, including Calculus and Linear Algebra, which is not too complex to understand, not too long, and not very expensive?
Thank you very much!
[skipping over background information like I resigned my job already, now looking for a new job, but I don’t have python, R, and analytical skills]
My phone has been listening to me, so I've been getting ads for DataCamp, Codecademy, etc. I'm thinking of subscribing to DataCamp, but before I do, are there any other courses, certifications, or sites you'd recommend I consider?
Thank you for considering my post.
I've handpicked more than 60 free online resources to learn data science: DataPen.io
You can find resources for data analysis, statistics, machine learning, programming, cheat sheets and more.
A group of my friends and I want to learn data science, and we would like to know the best resources.
I checked out Udacity's Data Science Course, watched the first 20 videos or so and it seemed helpful, working with pandas is great, pandas is useful. Looked at some reviews though and learned that the entire course scratches the surface and doesn't really teach much about real data science.
Checked out Coursera's Intro Data Science course and watched the first 10 videos or so; this was mostly just explaining the field and its subfields. Which is great, I just knew I wouldn't start coding until probably 20 videos in or so. Checked out the reviews for this one too, and pretty much the same things were said of this course: scratches the surface, too complex too fast, bad instructor, would be better off learning each subject individually, etc. (one person even said this was the worst course on Coursera!)
Reading ThinkStats at the moment and checked the reviews for this one too, they all seem to be good with a couple of bad ones talking about how it is an introduction to pandas rather than an introduction to statistics. Which is fine. ThinkBayes is on our reading list too.
I am wondering what would be the absolute best course of action to learn data science, so that we don't waste any more time.
Would really appreciate some quality advice
Hello everyone!
Staying on top of the constantly growing skill requirements in Data Science is quite a challenge. To manage my own learning and growth, I've been curating a list of useful resources and tools that cover the full spectrum of the field — from data analysis and engineering to deep learning and AI.
I'd love to get your professional opinion. Could you please take a look? Have I missed anything crucial? What else would you recommend adding or focusing on?
To give you an immediate sense of the list's scope and structure, I've attached screenshots of the table of contents below.
The full version with all the active links and additional resources is available on GitHub. You can find the link at the end of the post.
I'd be happy if this list is useful to others.
You can view the full list here: View on GitHub
Thanks for your time! Your advice is invaluable!
Hey everyone, over the past couple of months I have compiled a list of resources by reading this subreddit and asking around. Hopefully, this list helps somewhat even though there have been a ton of similar posts. If anyone has anything to add, I would really appreciate it!!
Math
1)Calc I-III
Resources: Look up Professor Leonard's videos on YouTube (he is the best resource for Calc I and II IMO), Paul's Notes, and PatrickJMT's YouTube account. Paul's Notes has a bunch of practice sets with answers that I can try to find if anyone is interested.
4)Linear Algebra (Free textbooks: Linear Algebra from UC Davis, Linear Algebra from Saint Michael's College, and Linear Algebra Done Wrong)
Statistics Foundation:
-
Intro to Statistics
a)(https://www.openstaxcollege.org/textbooks/introductory-statistics)
b) Stanford's online Statistical Learning with R (started this week and is free)
-
Probability (preferably using R)
Books (http://ipsur.org/index.html & http://publicifsv.sund.ku.dk/~pd/ISwR.html)
3) Bonus: Econometrics
R
-
http://www.ats.ucla.edu/stat/r/
-
http://tryr.codeschool.com/
-
http://swirlstats.com/
4)https://www.datacamp.com/
Python Intro:
1)Automate the Boring Stuff Book/ Udemy course (Make sure to find a coupon for the class, the book is free online)
2)Programming for Everybody (University of Michigan)
3) Rice University- An Introduction to Interactive Programming in Python (there is a part one and two)
4)Introduction to Computer Science and Programming Using Python(MIT course on edx) - maybe a little more advanced.
Data Analysis
1)Data Science Class at Harvard (CS 109/ Stat 221)
2)Introduction to Computational Thinking and Data Science(MIT course on edx) follow-up to the other MIT course for python
3) Data Analysis and Statistical Inference (coursera)
4) Codecademy for Data Scientists
https://www.dataquest.io/
Machine Learning
1)Pretty much everyone recommends starting with Andrew Ng's class
2)http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
3) Neural Networks for Machine Learning (Coursera)
SQL
-
SAMS Teach Yourself SQL in 10 Minutes
-
Khan Academy course for introductory SQL
-
W3Schools
Extra Resources:
-
Join meet-up groups. For example, I just joined a group called DataPhilly
-
The book Data Science from Scratch
-
Finding data: data.gov, r/datasets, R10 - Yahoo News Feed dataset, version 1.0 (1.5TB), http://archive.ics.uci.edu/ml/datasets.html
-
100 free data science books:
http://www.learndatasci.com/free-books/
EDIT: Added more resources that people suggested.
It was requested in the past to put together some resources for new R users. Below is a collection of everything I could find to help others out.
Data Science Learning Resources
Sections
-
Programming
-
Machine Learning
-
Leadership & Strategy
Programming
General
-
The Pragmatic Programmer (Book)
-
Clean Code (Book)
-
Architecture Playbook (Online guide)
Python
-
A Whirlwind Tour of Python (Book)
-
Python Data Science Handbook
-
Python Tricks (Book)
-
Learning Python (Book)
-
Effective Python (Book)
R
-
R for Data Science (Book)
-
Advanced R (Book)
-
R Markdown: The Definitive Guide (Book)
-
bookdown: Authoring Books and Technical Documents with R Markdown (Book)
-
Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)
-
Automated Data Collection with R (Book)
-
Introduction to Data Science (Book)
Spark
-
Spark: The Definitive Guide: Big Data Processing Made Simple (Book)
-
Learning Spark: Lightning-Fast Big Data Analysis (Book)
-
Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling (Book)
Command Line
-
The Missing Semester of Your CS Education (Online course)
-
Learning the bash Shell (Book)
-
The Art of the Command Line (GitHub resources)
-
explainshell.com (Online help)
Containers
-
Docker tips & tricks or just useful commands (Online article)
-
Rocker: R configurations for Docker (GitHub resources)
-
Docker and Python: making them play nicely and securely for Data Science and ML (PyCon Talk)
Functional Programming
-
An Introduction to the Basic Principles of Functional Programming (Online article)
-
R for Data Science, Ch. 21 (Book)
-
Advanced R, Ch. 9 (Book)
-
Jenny Bryan's purrr tutorials (Online tutorial)
-
Foundations of Functional Programming with purrr (DataCamp)
-
Intermediate Functional Programming with purrr (DataCamp)
Version Control
-
Excuse me, do you have a moment to talk about version control? (Paper)
-
Happy Git and GitHub for the useR (Book)
-
Learn Git (Online tutorial)
-
Git Commit Message Style Guide (Online guide)
Code Packaging
-
Python Packaging Authority
-
Python Packaging User Guide
-
R Packages
Style Guide, Readability, Best Practices
-
The Art of Readable Code (Book)
-
The Tidyverse Style Guide (Online book)
-
PEP 8 -- Style Guide for Python Code (Online guide)
-
Guidelines for code reviews (README)
-
Code Review Best Practices (Blog post)
Testing
-
Testing R Code (Book)
-
Python Testing with pytest (Book)
-
Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)
-
Test-Driven Development (Book)
Machine Learning
General
-
Introduction to Statistical Learning (Book)
-
Applied Predictive Modeling (Book)
-
Elements of Statistical Learning (Book)
-
Computer Age Statistical Inference (Book)
-
Statistical Modeling: The Two Cultures (Paper)
-
Deep Learning (Book)
-
Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)
-
Hands-On Machine Learning with R (Book)
-
Google's Machine Learning Crash Course (MOOC)
Unsupervised Modeling
-
ISLR: Ch. 10.3 Clustering Methods (Book chapter)
-
A K-Means Clustering Algorithm (Paper)
-
Generalized Low Rank Models (Paper)
-
Deep Learning Ch. 15 Autoencoders (Book chapter)
-
Hands-On Mach. Learning with Scikit-Learn Ch. 15 Autoencoders (Book chapter | GitHub resource)
-
Sparse autoencoder (Andrew Ng CS294A lecture notes)
A/B Testing
-
Lessons from Running Thousands of A/B Tests (Online presentation with many references)
-
Online Controlled Experiments at Large Scale (Paper)
-
Peeking at A/B Tests (Paper)
-
Multi-armed Bandit (Online tutorial)
-
A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
-
Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
-
Evaluating Retrieval Performance using Clickthrough Data (Paper)
Multivariate Adaptive Regression Splines
-
Multivariate Adaptive Regression Splines (Friedman's original paper)
-
APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
-
ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
-
Notes on the earth package (Paper)
K-Nearest Neighbor
-
k-Nearest neighbour classifiers (Paper)
-
APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
-
ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)
Random Forests
-
An Introduction to Recursive Partitioning Using the RPART Routines (Paper)
-
Random Forests - Leo Breiman's original research paper (Paper)
Gradient Boosting Machines
-
How to explain gradient boosting (Online tutorial)
-
Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
-
Trevor Hastie - Data Science of GBM (2013) (slides)
-
Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
-
Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
-
Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)
Deep Learning
-
Deep Learning with R (Book)
-
Deep Learning with Python (Book)
-
Deep Learning Specialization (MOOC)
-
keras.rstudio.com (Online articles & tutorials)
-
blogs.rstudio.com/tensorflow (Online articles & tutorials)
-
Illustrated Guide to Recurrent Neural Networks (Blog)
-
Illustrated Guide on Vanishing Gradients (Blog)
-
Illustrated Guide to LSTMs and GRUs (Blog)
-
Understanding LSTMs (Blog)
-
Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)
-
The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)
-
Revisiting Small Batch Training for Deep Neural Networks (Paper)
-
On Loss Functions for Deep Neural Networks in Classification (Paper)
-
Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
-
Efficient BackProp (Paper)
-
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)
-
Cyclical Learning Rates for Training Neural Networks (Paper)
-
A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)
-
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)
Ensembles / Model Stacking / Super Learners
-
Ensemble Methods in Machine Learning (Paper)
-
Stacked Regressions (Paper)
-
Super Learner (Paper)
Natural Language Processing / Text Mining
-
Text Mining with R (Book)
-
Probabilistic Topic Models (Paper)
-
The Illustrated Word2vec (Online tutorial)
-
Sebastian Ruder's series on Word Embeddings (Online articles & tutorials)
-
Neural Models for Information Retrieval (Paper)
-
Why do we use word embeddings in NLP? (Blog)
Tuning
-
Hyperparameters and Tuning Strategies for Random Forest (Paper)
-
Tunability: Importance of Hyperparameters of Machine Learning Algorithms (Paper)
-
Machine Learning Benchmarks and Random Forest Regression (Paper)
-
Random Search for Hyperparameter Optimization (Paper)
Feature Engineering
-
Feature Engineering for Machine Learning (Book)
-
Feature Engineering and Selection: A Practical Approach for Predictive Models (Book)
Feature Selection
-
Feature Selection with the Boruta Package (Paper)
-
APM: Ch. 19 An Introduction to Feature Selection (Book chapter)
Machine Learning Interpretability
-
Scott Lundberg's presentation on SHAP
-
H2O.ai Machine Learning Interpretability Resources (GitHub resources)
-
Patrick Hall's Awesome Machine Learning Interpretability Resources (GitHub resources)
-
Interpretable Machine Learning (Book)
-
Visualizing the Feature Importance for Black Box Models (Paper)
-
A Simple and Effective Model-Based Variable Importance Measure (Paper)
-
Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)
-
pdp: An R Package for Constructing Partial Dependence Plots (Paper)
-
"Why Should I Trust You?": Explaining the Predictions of Any Classifier (Paper)
-
A Unified Approach to Interpreting Model Predictions (Paper)
-
Consistent Individualized Feature Attribution for Tree Ensembles (Paper)
-
On the Art and Science of Machine Learning Explanations (Paper)
-
Explanation in artificial intelligence: Insights from the social sciences (Paper)
-
Please Stop Permuting Features: An Explanation and Alternatives (Paper)
-
A Stratification Approach to Partial Dependence for Codependent Variables (Paper)
-
Explaining Machine Learning Classifiers through Diverse Counterfactual Examples (Paper)
Auto ML
-
A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values (Paper)
-
Learning Multiple Defaults for Machine Learning Algorithms (Paper)
Benchmarking
-
The Design and Analysis of Benchmark Experiments (Paper)
-
Szilard Pafka's ML Benchmarking Research (GitHub resources)
-
Data-driven advice for applying machine learning to bioinformatics problems (Paper)
Resampling Procedures
-
Futility Analysis in the Cross-Validation of Machine Learning Models (Paper)
-
Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap (Paper)
Productionalization
-
150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
-
Hidden Technical Debt in Machine Learning Systems (Paper)
-
Deep Learning in Production (Github resources)
Leadership & Strategy
-
Platform Revolution (Book)
-
No Rules Rules: Netflix and the Culture of Reinvention (Book)
-
The Influential Product Manager: How to Lead and Launch Successful Technology Products (Book)
-
Mastering Product Management: A Step-by-Step Guide (Book)
Just a heads-up: the link no longer works for this guide.
I've curated resources for Data science / Machine learning.
Hope it helps.
(I assume you have a decent background in math, these math resources are more like a refresher)
Probability and Statistics:
https://seeing-theory.brown.edu/
Statistics - crash course
Essence of linear algebra:
Essence of linear algebra - 3b1b
Essence of calculus:
Essence of calculus - 3b1b
Neural networks:
Neural networks - 3b1b
Differential equations:
Differential equations - 3b1b
Python for programmers:
https://jakevdp.github.io/WhirlwindTourOfPython/
Numpy, Pandas, Matplotlib, Scikit-learn:
https://jakevdp.github.io/PythonDataScienceHandbook/
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Link to amazon(paid but Highly recommended)
Online Courses:
Applied DS in python by University of Michigan (Only If you don't read the book above)
How to Win a Data Science Competition: Learn from Top Kagglers
Deep Learning Specialization - deeplearning.ai
I came across someone's learning list and after checking a few courses, I think he has a pretty good learning path. https://github.com/amitness/learning
But, some of the MOOCs are a bit pricey for me, or they only have annual packages which I might not need. Are there any alternative ways to study those subjects? Maybe books, YouTube, blogs, other MOOCs, etc.?
If you had two years to start afresh, which data skills would you focus on and why?
Hey guys,
We're updating our awesome-python-for-data-science repository.
Some things we're hoping to add:
-
Best books and repositories to find resources
-
Best open source tools (teaching tools, preferably free)
-
Best interactive resources --> especially this one, what are you using nowadays?
-
I've heard about Virgilio, but it feels like TL;DR; we're looking for practice-based learning!
Hey everyone, I want to learn Data Science from scratch, help me to learn it from best resources so I can start my career...
Hello, I'm currently a senior at my college, majoring in applied math. I know tons of programming languages, but only at a basic level. I've honed my SQL and Excel skills. I know a little pandas, but not to the point where I can remember things. Are there any good resources or interactive courses online where I can learn this without having to pay too much money?
Hey Reddit,
I am sharing a curriculum I created and followed that has helped me transition from a non technical job (marketing) to a career where I am now building deep learning training pipelines, prototyping apps and deploying them online.
Resources are based on 2 years of constantly searching for the best online materials whether they're a course, a book, a YouTube channel, or even a newsletter. This is the 3rd edition of this curriculum, updated two weeks ago.
It is intended to be equivalent to a Master's degree in Data Science and to serve as an alternative to attending college. It is focused on being practical but without neglecting math, learning how to code, and also learning how to learn.
I'd love to hear your feedback and to know if anyone else has made a complete career change from a non technical position?
Here's the link:
https://julienbeaulieu.github.io/2019/09/25/comprehensive-project-based-data-science-curriculum/