Answer from IndependentTeach9008 on reddit.com
🌐
Reddit
reddit.com › r/learndatascience › what are the best sources to self learn data science from scratch?
r/learndatascience on Reddit: What are the best sources to self learn data science from scratch?
March 17, 2022 -

I want to learn data science and become a data analyst. Preferably from free online sources. Bear in mind I come from a mechanical engineering background. So I am not familiar with software or any programming language. The sources need to start from the most basic level because of it.

Thank you in advance.

Top answer
1 of 15
27
Since you're coming from a mechanical engineering background with no prior programming experience, the best approach is to start with foundational skills before diving into full-fledged data science. I will list the free and the few paid resources that are really good for self-learners.

  1. Learn Python basics (No coding experience? No problem!)

    • Automate the Boring Stuff with Python: free book and Udemy course (perfect for beginners).

    • CS50 Introduction to Programming (Harvard, free on edX): great foundation in coding.

  2. Data analysis and visualization

    • Pandas and NumPy (Kaggle free micro-courses): essential for handling data.

    • Matplotlib and Seaborn: learn to visualize data for insights.

    • Google Data Analytics Certificate (Coursera, free trial): structured learning.

  3. Learn SQL (super important for data analysts)

    • SQLBolt (free interactive lessons): great for beginners.

    • Mode Analytics SQL Tutorial: real-world SQL queries and exercises.

  4. Work on real-world projects and build a portfolio

    • Kaggle (free datasets + projects): best way to gain hands-on experience.

    • Data Analysis with Python (freeCodeCamp, free full course on YouTube)

  5. Learn business intelligence (optional, but helpful!)

    • Power BI (Microsoft Learn, free) or Tableau (free public version): these tools make you stand out in job applications.

Sometimes you might find the content a bit unstructured; I faced the same issue while going through it after a few months of preparation. So I took help from some courses that strengthened both my foundation and my practical project work. The LogicMojo Data Science Course was quite helpful as well; it covers algorithms, tools, and projects very well. To work with real data, explore Kaggle's free micro-courses and practice SQL (SQLBolt, Mode Analytics). Finally, apply what you learn by working on small projects: analyzing trends, visualizing real datasets, or even automating reports. The key is to learn by doing, not just watching tutorials.
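To give a feel for the SQL practice recommended in this answer, here is a minimal sketch using Python's built-in sqlite3 module, so nothing needs to be installed. The `parts` table, its columns, and the sample rows are hypothetical, invented only for this illustration; they are not taken from SQLBolt or Mode Analytics.

```python
import sqlite3


def average_mass_by_material(rows):
    """Load hypothetical (name, material, mass_kg) rows into an in-memory
    SQLite table and return the average mass per material."""
    conn = sqlite3.connect(":memory:")  # temporary database, gone when closed
    try:
        conn.execute(
            "CREATE TABLE parts (name TEXT, material TEXT, mass_kg REAL)"
        )
        # Parameterized insert: one "?" placeholder per column
        conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", rows)
        query = """
            SELECT material, AVG(mass_kg) AS avg_mass
            FROM parts
            GROUP BY material
            ORDER BY material
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical sample data for the sketch
sample_rows = [
    ("bracket", "steel", 2.0),
    ("shaft", "steel", 4.0),
    ("housing", "aluminium", 1.5),
]

for material, avg_mass in average_mass_by_material(sample_rows):
    print(material, avg_mass)
```

The same SELECT / GROUP BY / ORDER BY pattern is what the beginner SQL tutorials drill, so a tiny in-memory table like this can be a low-friction place to try queries before moving to a real dataset.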
2 of 15
17
You can try LogicMojo Data Science Live Classes; they're good.
r/learndatascience on Reddit: Best resources to Learn Data Science
February 2, 2025 - 48K subscribers in the learndatascience community. Learn Data Science using Reddit!
r/learnmachinelearning on Reddit: What’s the best Data Science learning path for 2025?
May 10, 2025 -

Hi everyone! I’m a 3rd year student looking to break into data science. I know Python and basic stats but feel overwhelmed by where to go next. Could you share

  1. A structured roadmap (topics, tools, projects)?

  2. Best free/paid resources (MOOCs, books)?

  3. How much SQL/ML is needed for entry-level roles? Thanks in advance!

  4. Should I focus more on stats or coding first?

  5. What projects would make my portfolio strong?

  6. Are there any free/paid resources you recommend?

r/learnmachinelearning on Reddit: What course would you suggest to learn Data Science?
March 20, 2024 -

I worked as a web programmer in the past (PHP, Javascript, SQL).

Now I am a PhD student in Psychology.

I like Data Science very much and I am trying to learn Excel, R, Python, and Matlab, but to understand how these algorithms work I would also need some Math knowledge.

A few decades ago, I studied Calculus in high school which I have almost completely forgotten, but never Linear Algebra, and I passed a few exams in Statistics.

Since English is not my first language, what (video) course would you suggest to learn Data Science, including Calculus and Linear Algebra, which is not too complex to understand, not too long, and not very expensive?

Thank you very much!

Top answer
1 of 10
18
I would suggest taking the "Mathematics for Machine Learning and Data Science" Specialization offered by deeplearning.ai on Coursera. It teaches the linear algebra, calculus, stats, and probability behind various ML/DL concepts in a very simple and visual manner. Since you also mentioned understanding the ML algorithms, try searching for "Patrick Loeber" on YouTube and start with the "ML from Scratch" playlist, where he writes the ML algorithms using just Python and NumPy. This will help you understand the working logic behind each ML concept.
2 of 10
4
Psychology major here who became a software developer, then an MLE four years ago, and am about to start as a data scientist in a month. Like you, I took Calculus years ago, and like you, it was decades (I'm in my 40s now). Here's what I've done and am doing…

  • ($$) As others have said, I'd start with the Mathematics for Machine Learning Coursera course. This is a quick crash course that should get you a bit of practice again.

  • ($) Book: The Art of Statistics, by David Spiegelhalter. This book is exceptional. It's a good, lay-level rundown of basic statistics, and can serve as a fantastic first pass for later statistical analysis, helping to intuitively understand what certain statistical things do, without delving deeply into the math.

  • (Free online PDF) Textbook: I'm reading and completing the exercises in "Calculus Made Easy", which is a book from 1910, and has been highly useful in getting back up to speed in calculus.

  • (Free online PDF) Textbook (with online videos as well): Gilbert Strang's Linear Algebra book. While I've tried the calculus book, it's just a bit much for the direction I'm trying to go, which is highly statistical. All I need is enough math knowledge to get the statistical knowledge, so I'm using Strang for linear algebra, but Calculus Made Easy for calculus.

  • (Free online PDF) Textbook: Finally, All of Statistics (which can also be found in PDF form online). It's a book I'm going through to get the deeper mathematical concepts of statistical things. I wouldn't recommend this until you're back up to speed on your calculus though.

  • (Free on YouTube) For an intuitive understanding of things, you can't get much better than the 3Blue1Brown videos. Their explanation of neural networks alone is worth watching.

For ML, you may not need the All of Statistics book, but it certainly wouldn't hurt.
reddit.com › r › learndatascience
Learn data science
October 9, 2014 - Additionally, if you want to learn a specific topic, you can explore our extensive collection, including SQL, Python, Java, C++, and more. One of the best parts is that everything can be learned directly in your browser. Another key feature is that each student gets a personalized AI tutor, trained specifically on data science and programming tasks.
r/Python on Reddit: Best way to learn data science?
December 12, 2014 -

A group of my friends and I want to learn data science, and we would like to know the best resources.

I checked out Udacity's Data Science course and watched the first 20 videos or so. It seemed helpful: working with pandas is great, and pandas is useful. I looked at some reviews, though, and learned that the entire course only scratches the surface and doesn't really teach much about real data science.

I checked out Coursera's Intro Data Science course and watched the first 10 videos or so; this was mostly just explaining the field and its subfields. Which is great, I just knew I wouldn't start coding until probably 20 videos in or so. I checked out the reviews for this one too, and pretty much the same things were said of this course: scratches the surface, too complex too fast, bad instructor, would be better off learning each subject individually, etc. (one person even said this was the worst course on Coursera!)

Reading ThinkStats at the moment and checked the reviews for this one too, they all seem to be good with a couple of bad ones talking about how it is an introduction to pandas rather than an introduction to statistics. Which is fine. ThinkBayes is on our reading list too.

I am wondering what would be the absolute best course of action to learn data science, so that we don't waste any more time.

Would really appreciate some quality advice

Top answer
1 of 5
37
While my official title is not "Data Scientist" (I'm a postdoc at a US DOE national lab), about 75% of my day-to-day involves what I would consider data science using numpy, scipy, scikit-image, some pandas, matplotlib, etc. I would suggest finding something you are interested in and doing some "data science" on it. My personal opinion (which is worth what you have paid for it) is that it is best to learn by doing, rather than just reading. The reading and courses will help, but that is only a tiny fraction of it. Things you may be able to do:

  • Analyze stock tick data

  • Find out some information about sports players and their statistics

  • Look at currency market data (there is a lot of historical data for bitcoin readily available for various exchanges)

  • Analyze ebook data (for common words, sentence length, ...)

  • Analyze Twitter feeds/trends (similar stuff to ebooks, and you can throw in some info about geospatial location)

  • Look at price data of a product or products as a function of time on something like Amazon or Newegg (you can learn some simple URL scraping with this too)

  • Learn something about your local region with weather data

I'm sure there are more options that others can think of too. Good luck!
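As one possible starting point for the ebook idea in this answer (common words, sentence length), here is a rough sketch using only Python's standard library. The function names and the sample text are made up for illustration, and the sentence splitting is deliberately naive (it just splits on ".", "!" and "?"), so treat it as a first experiment rather than a proper text-processing pipeline.

```python
import re
from collections import Counter


def common_words(text, n=3):
    """Return the n most frequent words, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


def average_sentence_length(text):
    """Return the mean number of words per sentence (naive splitting)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)


# Hypothetical sample; a real project would read a whole ebook file instead
sample = "The cat sat. The dog ran fast. The end!"

print(common_words(sample, 1))
print(average_sentence_length(sample))
```

Swapping `sample` for the contents of a public-domain book read from a text file would turn this into the kind of small "learn by doing" project the answer describes.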
2 of 5
24
Here's what I'd recommend.

GETTING STARTED WITH DATA SCIENCE

If you're interested in learning data science I'd suggest the following:

Tools

I'd recommend learning R before Python (although Python is an exceptional tool). Here are a few reasons.

  1. Many of the hot tech companies in SF, the Valley, and NYC like Google, Apple, FB, LinkedIn, and Twitter are using R for much of their data science (not all of it, but a lot).

  2. R is the most common programming language among data scientists. O'Reilly Media just released their 2014 Data Science Salary Survey. I'll caveat, though, that Python came in at a close second.

  3. Which leads me to the third reason: R has 2 packages that dramatically streamline the DS workflow:

    • dplyr for data manipulation

    • ggplot2 for data visualization

Learning these has several benefits: they streamline your workflow. They speed up your learning process, since they are very easy to use. And perhaps most importantly, they really teach you how to think about analyzing data. ggplot2 has a deep underlying structure to the syntax, based on the Grammar of Graphics theoretical framework. I won't go into that too much, but suffice it to say, when you learn the ggplot2 syntax, you're actually learning how to think about data visualization in a very deep way. You'll eventually understand how to create complex visualizations without much effort.

Skill Areas

My recommendations are:

Learn basic data visualizations first.

Start with the essential plots:

  • the scatter plot

  • the bar chart

  • the line chart

(But, again, I recommend learning these in R's ggplot2.)

The reasons I recommend these are:

  • They are, hands down, the most common plots. For entry-level jobs, you'll use these every day.

  • They are "foundational" in the sense that when you learn about the underlying structure of these plots, it begins to open up the world of complex data visualizations. As with any discipline, you need to learn the foundations first; this will dramatically speed your progress in the intermediate to advanced stages.

  • You'll need these plots as "data exploration" tools. Whether you're finding insights for your business partners or investigating the results of a sophisticated ML algorithm, you'll likely be exploring your data visually.

  • These plots are your best "data communication" tools. As noted elsewhere in this thread, C-level execs need you to translate your data-driven insights into simple language that can be understood in a 1-hour meeting. Communicating visually with the basic plots will be your best method for communicating to a non-technical audience. Communicating to non-technical audiences is a critical (and rare) auxiliary skill, so if you can learn to do this you will be very highly valued by management.

I usually suggest learning these with dummy data (for simplicity), but if you have a simple .csv file, that should work too.

Learn data management second (AKA data wrangling, data munging).

After you learn data visualization, I suggest that you "back into" data management. For this, you should find a dataset and learn to reshape it. The core data management skills:

  • subsetting (filtering out rows)

  • selecting columns

  • sorting

  • adding variables

  • aggregating

  • joining

You can start learning these here. Again, I recommend learning these in R's dplyr because dplyr makes these tasks very straightforward. It also teaches you how to think about data wrangling in terms of workflow: the "chaining operator" in dplyr helps you wire these commands together in a way that really matches the analytics workflow. dplyr makes it seamless.

Learn machine learning last.

ML is sort of like the "data science 301" course vs. the 102 and 103 levels of the data-vis and data manipulation stuff I outlined above. Here, I'll just give book recos:

  • An Introduction to Statistical Learning. This is a highly regarded introduction.

  • Machine Learning with R

I've also heard that there is some foundational ML information in R in Action, though I haven't read it myself. After you get these foundations, then you can move on to specialize in a particular area.

OTHER RESOURCES:

Data Visualization

  • Nathan Yao of Flowing Data is great. His blog shows excellent data visualization examples. Also, I highly recommend his books. In particular, Data Points. Data Points will help you learn how to think about visualization.

  • The book ggplot2 by Hadley Wickham. This is a great resource (though a little outdated, as Hadley has updated the ggplot package).

  • I also really like Randal Olson's work (AKA /u/rhiever). He creates some great data visualizations that can serve as inspiration as you start learning.

TL;DR

I'd recommend learning R for data science before Python. Learn data visualization first (with R's ggplot2), using simple data or dummy data. Then find a more complicated dataset. Learn data manipulation second (with R's dplyr), and practice data manipulation on your more complex data. Learn machine learning last.
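For readers who want to see the six core data management skills from this answer in concrete form, here is a small sketch. Note the swap: the answer recommends R's dplyr, whereas this uses plain Python lists and dictionaries purely so it runs without installing anything. The `sales` rows and the `managers` lookup are hypothetical dummy data, in the spirit of the answer's advice to start with dummy data.

```python
from collections import defaultdict

# Hypothetical dummy data: one dictionary per row
sales = [
    {"region": "north", "product": "bolt", "units": 10, "unit_price": 0.5},
    {"region": "south", "product": "bolt", "units": 4, "unit_price": 0.5},
    {"region": "north", "product": "gear", "units": 3, "unit_price": 8.0},
]
# Hypothetical lookup table used for the join step
managers = {"north": "Ana", "south": "Raj"}

# 1. Subsetting: keep only the rows that match a condition
north = [row for row in sales if row["region"] == "north"]

# 2. Selecting columns: keep only some fields of every row
slim = [{"product": r["product"], "units": r["units"]} for r in sales]

# 3. Sorting: order rows by a column, largest first
by_units = sorted(sales, key=lambda r: r["units"], reverse=True)

# 4. Adding variables: derive a new column from existing ones
with_revenue = [{**r, "revenue": r["units"] * r["unit_price"]} for r in sales]

# 5. Aggregating: total revenue per region
revenue_by_region = defaultdict(float)
for r in with_revenue:
    revenue_by_region[r["region"]] += r["revenue"]

# 6. Joining: attach the manager for each row's region
joined = [{**r, "manager": managers[r["region"]]} for r in sales]

print(dict(revenue_by_region))
```

In dplyr these steps correspond roughly to `filter`, `select`, `arrange`, `mutate`, `summarise` (with `group_by`), and the join functions, which is where the chaining the answer mentions comes in.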
r/datascience on Reddit: Resources for Data Science & Analysis: A curated list of roadmaps, tutorials, Python libraries, SQL, ML/AI, data visualization, statistics, cheatsheets
October 16, 2025 -

Hello everyone!

Staying on top of the constantly growing skill requirements in Data Science is quite a challenge. To manage my own learning and growth, I've been curating a list of useful resources and tools that cover the full spectrum of the field — from data analysis and engineering to deep learning and AI.

I'd love to get your professional opinion. Could you please take a look? Have I missed anything crucial? What else would you recommend adding or focusing on?

To give you an immediate sense of the list's scope and structure, I've attached screenshots of the table of contents below.

The full version with all the active links and additional resources is available on GitHub. You can find the link at the end of the post.

I'd be happy if this list is useful to others.

You can view the full list on GitHub.

Thanks for your time! Your advice is invaluable!

r/datascience on Reddit: List of Data Science Resources
January 22, 2016 -

Hey everyone, over the past couple of months I have compiled a list of resources by reading this subreddit and asking around. Hopefully, this list helps somewhat even though there have been a ton of similar posts. If anyone has anything to add, I would really appreciate it!!

Math

1) Calc I-III

Resources: Look up Professor Leonard's videos on YouTube (he is the best resource for Calc I and II IMO), Paul's Notes, and PatrickJMT's YouTube account. Paul's Notes has a bunch of practice sets with answers that I can try to find if anyone is interested.

4) Linear Algebra (Free textbooks: Linear Algebra from UC Davis, Linear Algebra from Saint Michael's College, and Linear Algebra Done Wrong)

Statistics Foundation:

  1. Intro to Statistics

    a)(https://www.openstaxcollege.org/textbooks/introductory-statistics)

    b) Stanford's online Statistical Learning with R (started this week and is free)

  2. Probability (preferably using R)

Books (http://ipsur.org/index.html & http://publicifsv.sund.ku.dk/~pd/ISwR.html)

3) Bonus: Econometrics

R

  1. http://www.ats.ucla.edu/stat/r/

  2. http://tryr.codeschool.com/

  3. http://swirlstats.com/

  4. https://www.datacamp.com/

Python Intro:

1) Automate the Boring Stuff book/Udemy course (make sure to find a coupon for the class; the book is free online)

2) Programming for Everybody (University of Michigan)

3) Rice University: An Introduction to Interactive Programming in Python (there is a part one and two)

4) Introduction to Computer Science and Programming Using Python (MIT course on edX), maybe a little more advanced.

Data Analysis

1) Data Science Class at Harvard (CS 109/Stat 221)

2) Introduction to Computational Thinking and Data Science (MIT course on edX), follow-up to the other MIT course for Python

3) Data Analysis and Statistical Inference (Coursera)

4) Codecademy for Data Scientists

https://www.dataquest.io/

Machine Learning

1) Pretty much everyone recommends starting with Andrew Ng's class

2) http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/

3) Neural Networks for Machine Learning (Coursera)

SQL

  1. SAMS Teach Yourself SQL in 10 Minutes

  2. Khan Academy course for introductory SQL

  3. W3Schools

Extra Resources:

  1. Join meet-up groups. For example, I just joined a group called DataPhilly

  2. The book Data Science from Scratch

  3. Finding data: data.gov, r/datasets, R10 - Yahoo News Feed dataset, version 1.0 (1.5TB), http://archive.ics.uci.edu/ml/datasets.html

  4. 100 free data science books:

http://www.learndatasci.com/free-books/

EDIT: Added more resources that people suggested.

r/rprogramming on Reddit: Data Science Learning Resources
December 11, 2020 -

It was requested in the past to put together some resources for new R users. Below is a collection of everything I could find to help others out.

Data Science Learning Resources

Sections

  • Programming

  • Machine Learning

  • Leadership & Strategy

Programming

General

  • The Pragmatic Programmer (Book)

  • Clean Code (Book)

  • Architecture Playbook (Online guide)

Python

  • A Whirlwind Tour of Python (Book)

  • Python Data Science Handbook

  • Python Tricks (Book)

  • Learning Python (Book)

  • Effective Python (Book)

R

  • R for Data Science (Book)

  • Advanced R (Book)

  • R Markdown: The Definitive Guide (Book)

  • bookdown: Authoring Books and Technical Documents with R Markdown (Book)

  • Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)

  • Automated Data Collection with R (Book)

  • Introduction to Data Science (Book)

Spark

  • Spark: The Definitive Guide: Big Data Processing Made Simple (Book)

  • Learning Spark: Lightning-Fast Big Data Analysis (Book)

  • Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling (Book)

Command Line

  • The Missing Semester of Your CS Education (Online course)

  • Learning the bash Shell (Book)

  • The Art of the Command Line (GitHub resources)

  • explainshell.com (Online help)

Containers

  • Docker tips & tricks or just useful commands (Online article)

  • Rocker: R configurations for Docker (GitHub resources)

  • Docker and Python: making them play nicely and securely for Data Science and ML (PyCon Talk)

Functional Programming

  • An Introduction to the Basic Principles of Functional Programming (Online article)

  • R for Data Science, Ch. 21 (Book)

  • Advanced R, Ch. 9 (Book)

  • Jenny Bryan's purrr tutorials (Online tutorial)

  • Foundations of Functional Programming with purrr (DataCamp)

  • Intermediate Functional Programming with purrr (DataCamp)

Version Control

  • Excuse me, do you have a moment to talk about version control? (Paper)

  • Happy Git and GitHub for the useR (Book)

  • Learn Git (Online tutorial)

  • Git Commit Message Style Guide (Online guide)

Code Packaging

  • Python Packaging Authority

  • Python Packaging User Guide

  • R Packages

Style Guide, Readability, Best Practices

  • The Art of Readable Code (Book)

  • The Tidyverse Style Guide (Online book)

  • PEP 8 -- Style Guide for Python Code (Online guide)

  • Guidelines for code reviews (README)

  • Code Review Best Practices (Blog post)

Testing

  • Testing R Code (Book)

  • Python Testing with pytest (Book)

  • Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)

  • Test-Driven Development (Book)

Machine Learning

General

  • Introduction to Statistical Learning (Book)

  • Applied Predictive Modeling (Book)

  • Elements of Statistical Learning (Book)

  • Computer Age Statistical Inference (Book)

  • Statistical Modeling: The Two Cultures (Paper)

  • Deep Learning (Book)

  • Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)

  • Hands-On Machine Learning with R (Book)

  • Google's Machine Learning Crash Course (MOOC)

Unsupervised Modeling

  • ISLR: Ch. 10.3 Clustering Methods (Book chapter)

  • A K-Means Clustering Algorithm (Paper)

  • Generalized Low Rank Models (Paper)

  • Deep Learning Ch. 15 Autoencoders (Book chapter)

  • Hands-On Mach. Learning with Scikit-Learn Ch. 15 Autoencoders (Book chapter | GitHub resource)

  • Sparse autoencoder (Andrew Ng CS294A lecture notes)

A/B Testing

  • Lessons from Running Thousands of A/B Tests (Online presentation with many references)

  • Online Controlled Experiments at Large Scale (Paper)

  • Peeking at A/B Tests (Paper)

  • Multi-armed Bandit (Online tutorial)

  • A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)

  • Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)

  • Evaluating Retrieval Performance using Clickthrough Data (Paper)

Multivariate Adaptive Regression Splines

  • Multivariate Adaptive Regression Splines (Friedman's original paper)

  • APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)

  • ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)

  • Notes on the earth package (Paper)

K-Nearest Neighbor

  • k-Nearest neighbour classifiers (Paper)

  • APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)

  • ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)

Random Forests

  • An Introduction to Recursive Partitioning Using the RPART Routines (Paper)

  • Random Forests - Leo Breiman's original research paper (Paper)

Gradient Boosting Machines

  • How to explain gradient boosting (Online tutorial)

  • Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)

  • Trevor Hastie - Data Science of GBM (2013) (slides)

  • Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)

  • Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)

  • Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)

Deep Learning

  • Deep Learning with R (Book)

  • Deep Learning with Python (Book)

  • Deep Learning Specialization (MOOC)

  • keras.rstudio.com (Online articles & tutorials)

  • blogs.rstudio.com/tensorflow (Online articles & tutorials)

  • Illustrated Guide to Recurrent Neural Networks (Blog)

  • Illustrated Guide on Vanishing Gradients (Blog)

  • Illustrated Guide to LSTMs and GRUs (Blog)

  • Understanding LSTMs (Blog)

  • Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)

  • The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)

  • Revisiting Small Batch Training for Deep Neural Networks (Paper)

  • On Loss Functions for Deep Neural Networks in Classification (Paper)

  • Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)

  • Efficient BackProp (Paper)

  • Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)

  • Cyclical Learning Rates for Training Neural Networks (Paper)

  • A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)

  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)

Ensembles / Model Stacking / Super Learners

  • Ensemble Methods in Machine Learning (Paper)

  • Stacked Regressions (Paper)

  • Super Learner (Paper)

Natural Language Processing / Text Mining

  • Text Mining with R (Book)

  • Probabilistic Topic Models (Paper)

  • The Illustrated Word2vec (Online tutorial)

  • Sebastian Ruder's series on Word Embeddings (Online articles & tutorials)

  • Neural Models for Information Retrieval (Paper)

  • Why do we use word embeddings in NLP? (Blog)

Tuning

  • Hyperparameters and Tuning Strategies for Random Forest (Paper)

  • Tunability: Importance of Hyperparameters of Machine Learning Algorithms (Paper)

  • Machine Learning Benchmarks and Random Forest Regression (Paper)

  • Random Search for Hyperparameter Optimization (Paper)

Feature Engineering

  • Feature Engineering for Machine Learning (Book)

  • Feature Engineering and Selection: A Practical Approach for Predictive Models (Book)

Feature Selection

  • Feature Selection with the Boruta Package (Paper)

  • APM: Ch. 19 An Introduction to Feature Selection (Book chapter)

Machine Learning Interpretability

  • Scott Lundberg's presentation on SHAP

  • H2O.ai Machine Learning Interpretability Resources (GitHub resources)

  • Patrick Hall's Awesome Machine Learning Interpretability Resources (GitHub resources)

  • Interpretable Machine Learning (Book)

  • Visualizing the Feature Importance for Black Box Models (Paper)

  • A Simple and Effective Model-Based Variable Importance Measure (Paper)

  • Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)

  • pdp: An R Package for Constructing Partial Dependence Plots (Paper)

  • "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Paper)

  • A Unified Approach to Interpreting Model Predictions (Paper)

  • Consistent Individualized Feature Attribution for Tree Ensembles (Paper)

  • On the Art and Science of Machine Learning Explanations (Paper)

  • Explanation in artificial intelligence: Insights from the social sciences (Paper)

  • Please Stop Permuting Features: An Explanation and Alternatives (Paper)

  • A Stratification Approach to Partial Dependence for Codependent Variables (Paper)

  • Explaining Machine Learning Classifiers through Diverse Counterfactual Examples (Paper)

Auto ML

  • A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values (Paper)

  • Learning Multiple Defaults for Machine Learning Algorithms (Paper)

Benchmarking

  • The Design and Analysis of Benchmark Experiments (Paper)

  • Szilard Pafka's ML Benchmarking Research (GitHub resources)

  • Data-driven advice for applying machine learning to bioinformatics problems (Paper)

Resampling Procedures

  • Futility Analysis in the Cross-Validation of Machine Learning Models (Paper)

  • Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap (Paper)

Productionalization

  • 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)

  • Hidden Technical Debt in Machine Learning Systems (Paper)

  • Deep Learning in Production (GitHub resources)

Leadership & Strategy

  • Platform Revolution (Book)

  • No Rules Rules: Netflix and the Culture of Reinvention (Book)

  • The Influential Product Manager: How to Lead and Launch Successful Technology Products (Book)

  • Mastering Product Management: A Step-by-Step Guide (Book)

r/learnmachinelearning on Reddit: Data Science roadmap with resources.
April 22, 2021 -

I've curated resources for Data science / Machine learning.

Hope it helps.

(I assume you have a decent background in math, these math resources are more like a refresher)

Probability and Statistics:

  • https://seeing-theory.brown.edu/

  • Statistics - crash course

Essence of linear algebra:

  • Essence of linear algebra - 3b1b

Essence of calculus:

  • Essence of calculus - 3b1b

Neural networks:

  • Neural networks - 3b1b

Differential equations:

  • Differential equations - 3b1b

Python for programmers:

  • https://jakevdp.github.io/WhirlwindTourOfPython/

Numpy, Pandas, Matplotlib, Scikit-learn:

  • https://jakevdp.github.io/PythonDataScienceHandbook/

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:

  • Link to Amazon (paid but highly recommended)

Online Courses:

  • Applied Data Science in Python by University of Michigan (only if you don't read the book above)

  • How to Win a Data Science Competition: Learn from Top Kagglers

  • Deep Learning Specialization - deeplearning.ai

r/learndatascience on Reddit: Affordable way to learn data science?
October 26, 2023 -

I came across someone's learning list and after checking a few courses, I think he has a pretty good learning path. https://github.com/amitness/learning

But, some of the MOOCs are a bit pricey for me, or they only have annual packages which I might not need. Are there any alternative ways to study those subjects? Maybe books, YouTube, blogs, other MOOCs, etc.?

r/MachineLearning on Reddit: [D] Best tools to learn data science nowadays?
July 26, 2023 -

Hey guys,

We're updating our awesome-python-for-data-science repository.

Some things we're hoping to add:

  • Best books and repositories to find resources

  • Best open source tools (teaching tools, preferably free)

  • Best interactive resources --> especially this one, what are you using nowadays?

    • I've heard about Virgilio, but it feels like TL;DR; we're looking for practice-based learning!

r/learnmachinelearning on Reddit: Hey learners! I have curated some of the best data science resources and created a curriculum out of them. If you're transitioning from a non technical background, this is for you.
November 9, 2020 -

Hey Reddit,

I am sharing a curriculum I created and followed that has helped me transition from a non technical job (marketing) to a career where I am now building deep learning training pipelines, prototyping apps and deploying them online.

Resources are based on 2 years of constantly searching for the best online materials whether they're a course, a book, a YouTube channel, or even a newsletter. This is the 3rd edition of this curriculum, updated two weeks ago.

It is intended to be equivalent to a Master's degree in Data Science and to serve as an alternative to attending college. It is focused on being practical, but without neglecting math, learning how to code, and also learning how to learn.

I'd love to hear your feedback and to know if anyone else has made a complete career change from a non technical position?

Here's the link:

https://julienbeaulieu.github.io/2019/09/25/comprehensive-project-based-data-science-curriculum/

Learning Data Science - where to start : r/learndatascience
October 18, 2024 - There are free resources enough ... project portion either from kaggle or some other opensource github resources. Do feel free to contact me for mentorship on this. ... If you will learn data science again, where will you start?...