Thoughts On The Data Science And Machine Learning Courses I Have Taken So Far

In January 2019, after a long career in the wireless communications industry, I decided to leave my job and focus on transitioning into the fields of Data Science and Machine Learning (I discuss what led me to this decision in my previous post). However, I did not start studying Data Science and Machine Learning right away. I felt burned out: the last proper vacation I had taken was in August 2012. So, I decided to take a six-month break to get my head straight, come up with a plan, and then start executing it.

It wasn’t until September that I started working on Data Science and Machine Learning subjects. Even then, I spent a significant amount of time researching the future of Data Science and Machine Learning, the different areas in these fields, the roles of Data Scientist and Machine Learning Engineer, the opportunities, and so on.

  • First decision made: Picking the programming language best suited for this field. After relatively brief research, Python was my choice. I did not have prior Python coding experience, but I had more than 15 years of coding experience with Matlab, and almost everybody on the web agreed that the transition from Matlab to Python is very easy (it is, but I still find myself writing Python code “Matlab style”). The most important factor, though, was the predominant opinion that Python is the most widely used coding language in Data Science and Machine Learning, with great libraries and a devoted community constantly developing open source code.
  • First lesson learned: YouTube tutorials on Data Science/Machine Learning with Python did not work for me. I spent about two weeks going over random tutorials on the subject and had a hard time following the presentations and building knowledge in any meaningful way. It is possible that I picked bad tutorials, but it is my opinion that for a person without some prior knowledge of coding in Python and a reasonably good understanding of the different aspects of Data Science and Machine Learning, the YouTube way of learning is ineffective.
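
To make the “Matlab style” remark above concrete, here is a small illustrative sketch. It is not taken from any of the courses, and the function names and numbers are made up for the example. It shows one common habit carried over from Matlab (preallocating a result and filling it by index) next to a more idiomatic Python version of the same computation.

```python
# Made-up sample values, used only for illustration.
readings = [3.2, 4.8, 5.1, 2.9, 6.4]


def above_mean_matlab_style(values):
    """Index-based loops and a preallocated result, as one might write in Matlab."""
    total = 0.0
    for i in range(len(values)):
        total = total + values[i]
    avg = total / len(values)

    result = [0.0] * len(values)  # preallocate, then fill by position
    count = 0
    for i in range(len(values)):
        if values[i] > avg:
            result[count] = values[i]
            count = count + 1
    return result[:count]  # trim the unused tail


def above_mean_pythonic(values):
    """Same logic using built-ins and a list comprehension."""
    avg = sum(values) / len(values)
    return [v for v in values if v > avg]


print(above_mean_matlab_style(readings))
print(above_mean_pythonic(readings))
```

Both functions return the same values; the second is simply shorter and easier to read, which is the kind of difference that becomes visible with more Python practice.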

After my experience with the YouTube tutorials and other videos, I decided that I needed to find a good course or a series of courses on the subject and learn from there. I wanted a course that wouldn’t break the bank but would still provide a good level of knowledge. After a short search on the web and reading various reviews, Udemy appeared to be one of the most popular choices. I signed up with Udemy the same day and, as it happened, they had a big price reduction on their courses: a course cost $10.99 instead of the original $199. These “sales” are very common for Udemy, so my advice is that if you want to sign up for one or several of their courses, wait for such a sale. It will happen sooner rather than later. I took my first course, I liked it, and since then I have taken several more. Below I will share my opinion about the Udemy courses I have taken so far, and what I liked and did not like.

Udemy courses I have taken so far in chronological order:

  • Python for Data Science and Machine Learning Bootcamp by Jose Portilla
  • Machine Learning, Data Science and Deep Learning with Python by Frank Kane
  • Machine Learning A-Z: Hands-on Python and R in Data Science by Kirill Eremenko
  • The Complete SQL Bootcamp by Jose Portilla
  • PyTorch for Deep Learning with Python Bootcamp by Jose Portilla

I also bought the course “Spark and Python for Big Data with PySpark” by Jose Portilla. However, I had some serious issues with making things work at the very beginning of the course (the course is quite outdated). After struggling with it for a while, I decided that it would take me too much time to go through the course and opted for his SQL course instead.

I will focus mostly on the first three courses, which are very close in content and purpose and, thus, can be adequately compared. The SQL course is a quite basic introductory course on SQL. In my opinion, the name “Complete SQL Bootcamp” is seriously misleading. I will share my opinion on some parts of the PyTorch course in a later post.

As one can gather from the course titles, the first three courses have significant overlap and similarities.

  • Common topics like Regression, Classification, Clustering, Natural Language Processing are discussed in all of them to a greater or lesser extent.
  • Courses are structured in a similar fashion providing first a brief introduction and basic explanation of a particular algorithm, followed up by an example project dealing with typical data sets, modeling using the ML algorithm and results from the model.
  • All three courses in my opinion are of introductory to low-intermediate level.
  • All three lecturers have many years of experience in Data Science and Machine Learning.
  • All courses are in the form of video lessons with supporting material in the form of Python notebooks and data sets for all projects discussed in the course. The only exception is the SQL course, which has video lectures only, without supporting notebooks.

My thoughts on each course:

  • Python for Data Science and Machine Learning Bootcamp by Jose Portilla: The course is well structured, with a logical progression through the different topics. In my opinion, it has the best balance in terms of algorithm introduction and explanation, data exploration and visualization, modeling, and presentation of the results. However, there are a few spots in the algorithm sections where the explanations are somewhat vague. These spots are not numerous enough to significantly lower my rating of the course. It was my first course, I learned a lot from it, and I would recommend it to all beginners in this field.
  • Machine Learning, Data Science and Deep Learning with Python by Frank Kane: This is the course I liked the least. The course lacked logical structure and good algorithm explanations, and in some cases models were created using outdated home-written code instead of the well-established and powerful scikit-learn models. One of the things that turned me off from this course was the lecturer’s poor explanation of the algorithms. The explanations were vague and incomplete, and almost always would finish with “That’s all there is to this algorithm. Don’t be confused or intimidated by the name. It’s just a bunch of big words, but in reality it is that simple.” I am quoting loosely here, but the above captures well the essence of the lecturer’s statements. I view these statements as a great disservice to the brilliant mathematicians and scientists who have put a tremendous amount of work into developing these algorithms. I have to admit that I did not finish his course. I would browse through a section and only go over the things I thought were interesting or that contributed in some way to my learning. I would not have taken this course had I found adequate reviews of it before I bought it. I will touch briefly on the course reviews later.
  • Machine Learning A-Z: Hands-on Python and R in Data Science by Kirill Eremenko: I cannot comment on the R sections of this course because I followed the Python-related material only. As far as the sections I studied are concerned, I think this course is perhaps a level above the course by Jose Portilla. This assessment is based on the incredibly good algorithm explanations by the lecturer and the streamlined project process. The algorithm explanation sections are called “Intuition.” Despite not going deep into the math behind the algorithms, they provide a very clear explanation of each algorithm, the way it works, and its strengths and drawbacks. It is evident that the lecturer has a very good understanding of the algorithms discussed. As the saying goes: you know something well when you can explain it to a layman. Because of this, I think this is a must-take course for beginners and even intermediate-level students. What is lacking, in my opinion, is EDA, data preprocessing and visualization: the project sections contain very little of them. The focus is mostly on the modeling and the results. Regardless of this drawback, I think very highly of this course and I often go back to it to refresh my memory on the fundamentals of different algorithms. I believe that taking this course together with the course by Jose Portilla will provide you with a very good foundation in Data Science and Machine Learning.

Some other notes and comments:

  • Course ratings: All three courses have a rating of 4.5/4.6 on the Udemy website, with 5 being the maximum rating. It is my belief that when considering ratings we need to keep in mind the audience these courses are intended for. In my opinion, these are courses for beginners. Despite all three courses targeting the same audience, they deliver different quality. This makes me wonder how it is possible for all three courses to have essentially the same rating. Perhaps the bar for such online courses is set too low, which shouldn’t be the case. Of the three courses, Frank Kane’s course is the weakest. Jose Portilla’s course is about two grades above it. And as I mentioned earlier, Kirill Eremenko’s course is a level above Jose’s. A rating scale of 1 to 5 does not allow for clear differentiation, so I would rather use a scale of 1 to 10. In this case, my grade for Frank would be 6, for Jose 8, and for Kirill 9 (there are no perfect courses, so no 10 assigned).
  • Course relevancy: In terms of how up-to-date these courses are, I would have to say that they are all outdated to some degree. From what I recall of the material presented, these courses were developed about two years ago. For well-established algorithms such as Linear, Logistic, Random Forest or other regressions, SVM, KNN Classification, Naive Bayes Classification, K-Means Clustering, etc., this is not an issue. However, when it comes to more modern algorithms such as XGBoost, don’t expect much there. The same goes for Deep Learning, and Natural Language Processing algorithms in particular, where something from two years ago is ancient history today. Because of this, my strong recommendation is not to limit yourself to these courses, thinking that you have mastered the best and latest in Data Science and Machine Learning. You have to use all available resources to research the latest and most powerful/efficient algorithms and techniques, such as Transfer Learning, Transformers, etc., and learn and master these as well.
  • Course projects code quality: I would like to mention this as a warning to beginners. Do not think that every line of code is the gold standard in coding. I have encountered some examples of code in different projects from these courses which are far from best practice. Even in the beginning, when I had just started my studies and did not have much experience in Python, I could still see that there were non-optimal lines of code. Later on, as I finished the courses and expanded my studies by learning from sources beyond the course material, I could easily identify the bad lines of code in my early projects, where I had followed the project’s notebook exactly. I am still far from writing “world-class” Python code, but I have improved greatly since my early days. The moral of the story here is that one has to be critical of what they see and not take it for granted that the code presented is top level. This is difficult for complete beginners, but with time, if you are critical and work on improving, you will grow beyond the level of coding presented in some sections of these courses.
  • Time it takes to finish a course: Depending on your level and the time you can spend daily or weekly on studying, it could take anywhere from two to six weeks. I found that for my first courses (the first three) I spent on average about a month on each. I would finish the material in less time, but because I was still at the beginning of my studies, I would go back over most of the material again. This time, I would write the code by myself after the video lecture, consulting the project’s notebook very little. For some of the projects I have 2 or 3 different notebooks with variations and improvements in the data preprocessing and visualization. I also went back to the early projects and added cross-validation and hyperparameter tuning to the models after I had learned about these techniques, which are covered late in the courses. All of this takes time.
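
For readers who have not yet reached the cross-validation and hyperparameter tuning sections, the idea mentioned in the last bullet can be sketched as follows. This is a minimal, dependency-free illustration and not code from any of the courses: the toy data set, the tiny k-nearest-neighbours classifier, and all function names are assumptions made up for the example. In real projects one would normally rely on scikit-learn for this (for example `GridSearchCV` or `cross_val_score`) rather than hand-written loops.

```python
from statistics import mean

# Toy 1-D data set invented for this sketch: (feature, label) pairs.
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (4.0, 1), (4.5, 1), (5.0, 1), (5.5, 1), (6.0, 1), (6.5, 1)]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if votes_for_one * 2 > k else 0


def k_fold_accuracy(data, k_neighbors, n_folds=4):
    """Cross-validation: hold out each fold once, train on the rest."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]  # every n_folds-th row
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


def grid_search(data, candidates):
    """Hyperparameter tuning: score each candidate, keep the best one."""
    results = {k: k_fold_accuracy(data, k) for k in candidates}
    best = max(results, key=results.get)
    return best, results


best_k, cv_scores = grid_search(DATA, [1, 3, 5])
print("best k:", best_k)
print("mean accuracy per k:", cv_scores)
```

The point of the sketch is the structure: every candidate setting is judged by its average score on data the model did not see during training, and only then is the best setting chosen.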

In conclusion:

I would like to finish with the statement that all of the above is my personal opinion and experience, which is strongly influenced by my background. Clearly, different people (as I have witnessed from reading many of the course reviews) have different opinions and different experiences. My advice is to use critical thinking and not to settle for a single source (e.g. a single course), thinking that’s all there is to learn. Take two or more courses, compare them, and take from each course what is best for your experience level.

Finally, do not limit your learning to the course work alone. I wish I had discovered the usefulness of Quora’s forums on these subjects much earlier. They helped me find many Machine Learning and Data Science sites on the web with useful information that will help you grow beyond the course level. Some sites I have found useful, some not so much. Currently, I have about 40 bookmarked Data Science/Machine Learning sites, and that’s after deleting a significant number of my early bookmarks once I had outgrown them. It is up to everybody to apply their own filters and decide what works best for them.

One of the things I am extremely happy about is that while studying and working on various projects I rediscovered the excitement and eagerness to learn and work. This can be attested to by the many times I have started at 9:00 in the morning and continued until the early morning hours of the next day, searching for a new data set for my next project, trying to see if one algorithm provides better results with a particular data set than another, or if the model can reach better accuracy with additional tuning, and so on.

To everybody following this path: good luck!
