I've just completed FreeCodeCamp's Machine Learning with Python certificate, and I think there are a bunch of unnecessary roadblocks that might prevent people from being able to do the same. This guide isn't meant to hand-hold anyone through the whole thing or give away all the answers. However, I want to provide some pointers/direction to anyone looking for it because I don't think the lessons do a good job.
If you're planning on completing this course, I recommend bookmarking this post. It will save you some headaches.
The course contains 36 video lessons and 5 projects. The video lessons each end with a simple quiz question that's really just there to ensure you are paying attention. I don't think anyone needs help with any of those quiz questions. This post is here to give information on the 5 projects.
Project #1: Rock Paper Scissors
This might look easy at first glance, but it might actually be the most difficult project of the five, and the video lessons offer zero help in solving this problem. The challenge: using Python, you need to create an algorithm (or a rudimentary AI) that beats four different opponents at least 60% of the time.
It's worth testing out and implementing different strategies, and I think after some work you shouldn't have trouble beating 2 or 3 of the four opponents. Your biggest struggle will come with the opponent named Abbey.
Abbey uses something called a Markov Chain, and the only way I figured out how to beat it is to implement my own Markov Chain (Abbey looks at 2-move sets, and I could only seem to beat her if I looked at 3-move sets). Here is a blog which explains Markov Chains that I highly recommend reading. You can also look at Abbey's code in order to figure out how to implement it.
You might need to implement some other strategies as well, but a Markov Chain should be your main focus (and if there are other ways to beat Abbey, let me know!). Also, I highly recommend you don't introduce any randomness into your algorithm; it will probably just make your results inconsistent.
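To make the Markov Chain idea a bit more concrete, here is a minimal sketch of a move predictor that counts which move tends to follow each run of the last few moves. This is not my actual solution and I'm not claiming it beats Abbey as written. The names MarkovPredictor, counter_move and BEATS are made up for illustration, and I'm assuming moves are the strings "R", "P" and "S". Which sequence you feed it (your opponent's moves or your own) and how you plug it into the project's player function is up to you.

```python
from collections import defaultdict

# BEATS[x] is the move that defeats x (paper beats rock, and so on).
# Assumption: moves are represented as the strings "R", "P" and "S".
BEATS = {"R": "P", "P": "S", "S": "R"}


class MarkovPredictor:
    """Order-n Markov chain over a sequence of moves (hypothetical helper)."""

    def __init__(self, order=3):
        self.order = order      # how many previous moves form the context
        self.history = []       # every move observed so far, oldest first
        # counts[context][next_move] = times next_move followed that context
        self.counts = defaultdict(lambda: {"R": 0, "P": 0, "S": 0})

    def update(self, move):
        """Record one observed move and update the transition counts."""
        if len(self.history) >= self.order:
            context = "".join(self.history[-self.order:])
            self.counts[context][move] += 1
        self.history.append(move)

    def predict(self, default="R"):
        """Return the move seen most often after the latest context."""
        if len(self.history) < self.order:
            return default
        context = "".join(self.history[-self.order:])
        followers = self.counts.get(context)  # .get() avoids creating an entry
        if followers is None or sum(followers.values()) == 0:
            return default
        # No randomness: ties resolve in the fixed dict order R, P, S.
        return max(followers, key=followers.get)


def counter_move(predictor):
    """Play whatever beats the predicted next move."""
    return BEATS[predictor.predict()]


# Usage: feed in observed moves one at a time, then ask for a counter.
predictor = MarkovPredictor(order=3)
for observed in "RPSRPS":
    predictor.update(observed)
next_play = counter_move(predictor)
```

Changing order from 2 to 3 is the only difference between looking at 2-move sets and 3-move sets, so it's easy to experiment with both.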
Project #2: Cat and Dog Image Classifier
Luckily, the video lessons mostly cover how to complete this project, though the link to the instructor's Google Colab isn't given (which is annoying). The project is also very close to one from the TensorFlow docs. Read through and follow their tutorial here and that'll help with 90% of the project.
Project #3: Book Recommendation Engine Using KNN
The instructions for this project are confusing and unclear. First, the instructions say that the dataset includes ratings on a "scale of 1-10", but you might notice a lot of 0s in the data. I thought that maybe I had to clean all the 0 ratings out, but that's a mistake. Leave the 0s in there!
Second, the instructions say "remove from the dataset users with less than 200 ratings and books with less than 100 ratings." But it matters how you go about this. For example, if a book has 120 ratings before you remove the users with < 200 ratings, and 90 ratings after you remove those users, are we supposed to keep that book in the data or filter it out?
The only solution that worked for me is to do the filtering immediately on the df_ratings data like so:
userCounts = df_ratings['user'].value_counts()
isbnCounts = df_ratings['isbn'].value_counts()
# remove all users with less than 200 reviews
df_ratings = df_ratings[~df_ratings['user'].isin(userCounts[userCounts < 200].index)]
# remove all books with less than 100 ratings
df_ratings = df_ratings[~df_ratings['isbn'].isin(isbnCounts[isbnCounts < 100].index)]
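If the ordering question still feels abstract, here is a small sketch on made-up data. Both counts in my snippet are taken from the unfiltered data, so a book is judged on its original number of ratings. The toy DataFrame, the shrunken thresholds (2 and 3 instead of 200 and 100) and the two function names are all invented for illustration; only the column names come from the project.

```python
import pandas as pd

# Toy stand-in for df_ratings. Thresholds are shrunk so the effect shows
# up on five rows (assumption: the real ones are 200 and 100).
MIN_USER_RATINGS = 2
MIN_BOOK_RATINGS = 3

df_ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3"],
    "isbn": ["A", "B", "A", "B", "A"],
    "rating": [0, 5, 7, 0, 9],
})


def filter_counts_up_front(df, min_user, min_book):
    """Count users and books on the original data, then filter both."""
    user_counts = df["user"].value_counts()
    isbn_counts = df["isbn"].value_counts()
    keep_users = user_counts[user_counts >= min_user].index
    keep_books = isbn_counts[isbn_counts >= min_book].index
    return df[df["user"].isin(keep_users) & df["isbn"].isin(keep_books)]


def filter_with_recount(df, min_user, min_book):
    """Filter users first, then recount books on what is left."""
    user_counts = df["user"].value_counts()
    df = df[df["user"].isin(user_counts[user_counts >= min_user].index)]
    isbn_counts = df["isbn"].value_counts()
    return df[df["isbn"].isin(isbn_counts[isbn_counts >= min_book].index)]


up_front = filter_counts_up_front(df_ratings, MIN_USER_RATINGS, MIN_BOOK_RATINGS)
recounted = filter_with_recount(df_ratings, MIN_USER_RATINGS, MIN_BOOK_RATINGS)
print(len(up_front), len(recounted))
```

Book "A" starts with three ratings, so the up-front version keeps it even though only two of those ratings survive the user filter. The recount version sees two ratings and drops it. The up-front behaviour is the one that matches the snippet that worked for me.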
The other weird issue is with the way FCC wants you to output your data. Instead of just appending the results to a list, you have to reverse what the model puts out. Not sure why they included this step (if that sounds confusing, you'll see what I mean when you get to the end of the project).
For any other issues: Read through this blog post, and it will help you with the project quite a bit.
Project #4: Linear Regression Health Costs Calculator
I think this one is pretty straightforward. You just need to remember to use the Sequential model from Keras. You'll have to play with the number of Dense layers, and when you train the model you'll have to play with the number of epochs and the validation split. I'm not sure what the best numbers for those are, but fiddle with them for a while and you can figure it out!
Project #5: Neural Network SMS Text Classifier
This one was broken out of the box for me. In the first block, there is a line of boilerplate code !pip install tf-nightly which didn't work. I changed it to just !pip install tensorflow and everything worked fine after that.
Besides that, I think this project is a good challenge. Here is a tutorial that sort of helped me. But it's a little different (it classifies whether movie reviews are positive or negative), so you have to read through it carefully to adapt it to this FCC project.
Honestly, it took me a while just to figure out how to get data in tsv format into the script. It's actually pretty easy, so I'll save you the trouble:
train_data = pd.read_csv(train_file_path, sep="\t", names=["class", "text"])
test_data = pd.read_csv(test_file_path, sep="\t", names=["class", "text"])
Then just remember to convert the categorical data ("class") to numeric data. And again, you'll use a Sequential model from Keras.
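For the categorical-to-numeric step, one simple option is a plain dictionary mapping. The sketch below runs on two made-up rows rather than the real files, and it assumes the labels are the strings "ham" and "spam", so check the values in your own data first. The "label" column name and LABEL_TO_ID are just names I picked.

```python
import io

import pandas as pd

# Two made-up rows in the same tab-separated "class<TAB>text" layout.
sample_tsv = "ham\tsee you at lunch\nspam\tyou won a free prize\n"
data = pd.read_csv(io.StringIO(sample_tsv), sep="\t", names=["class", "text"])

# Assumption: the only labels are "ham" (normal message) and "spam".
LABEL_TO_ID = {"ham": 0, "spam": 1}
data["label"] = data["class"].map(LABEL_TO_ID)

# .map() turns any label missing from the dict into NaN, so fail loudly
# here instead of silently training on bad targets.
if data["label"].isna().any():
    raise ValueError("unexpected class label in data")
data["label"] = data["label"].astype(int)

print(data[["class", "label"]])
```

Apply the same mapping to both the training and the test data so the numbers mean the same thing in each.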
This model can be slow to train. On the menu click "runtime" then select "change runtime type" and you can add a GPU accelerator. That should speed things up.
That's all I've got to help. Any questions, feel free to ask.
Hey, just wanted to ask if anyone has completed the "Machine Learning with Python" course on the FreeCodeCamp website and if it is worth it or not. The certification is what I'm most interested in, but I'm not sure if I should be devoting my time to this or building a project of my own, as they have you build a few of your own projects. Would the certification make me stand out on LinkedIn or on my resume? If anyone has a strong opinion, let me know!