A Mathematical Foundation for Data Science

Alvaro Henriquez
2 min read · Dec 1, 2020

Get the math you need.

In this post I’d like to share some math resources to help those pursuing a Data Science education on their own. You might take a full university course, in which case these resources may or may not be useful. Or you might enroll in a shorter course, whether in person or online, which might or might not include all of the math required to fully understand the subject. Or perhaps you are trying to learn it on your own by reading online resources, books, or some other way.

There is no doubt that one can learn things like coding a model without understanding the math behind the algorithms. There are plenty of code examples out there that one can just copy and have a working model. And perhaps you can learn to apply these models to solve specific problems.

But what happens when the problem requires something that you just can’t get out of the box? What if you need a custom loss function, or a custom optimization function? And how about parameter tuning? Or dimensionality reduction? Shouldn’t you understand how these work in order to get the best out of your model?
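To make the custom loss function question a little more concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than a recipe from any particular library: the function names, the sample data, and the choice of an asymmetric squared loss that penalizes under-predictions more heavily than over-predictions. The point is that writing the gradient by hand takes a bit of calculus, and knowing that the loss is convex is what tells you gradient descent will settle on the right answer.

```python
def asymmetric_loss(prediction, targets, under_weight=3.0):
    """Mean squared error that weights under-predictions by `under_weight`."""
    total = 0.0
    for y in targets:
        error = y - prediction
        # error > 0 means we predicted too low, so apply the heavier weight.
        weight = under_weight if error > 0 else 1.0
        total += weight * error ** 2
    return total / len(targets)


def asymmetric_loss_gradient(prediction, targets, under_weight=3.0):
    """Derivative of asymmetric_loss with respect to the prediction."""
    total = 0.0
    for y in targets:
        error = y - prediction
        weight = under_weight if error > 0 else 1.0
        # d/dp [w * (y - p)^2] = -2 * w * (y - p)
        total += -2.0 * weight * error
    return total / len(targets)


def fit_constant(targets, under_weight=3.0, learning_rate=0.05, steps=2000):
    """Find the single best constant prediction with plain gradient descent."""
    prediction = 0.0
    for _ in range(steps):
        gradient = asymmetric_loss_gradient(prediction, targets, under_weight)
        prediction -= learning_rate * gradient
    return prediction


# Hypothetical sample data, chosen only for illustration.
targets = [1.0, 2.0, 3.0, 4.0, 10.0]

# With equal weights this is ordinary squared error, so the fit is the mean.
symmetric_fit = fit_constant(targets, under_weight=1.0)

# Penalizing under-predictions pulls the fit upward, toward the large value.
asymmetric_fit = fit_constant(targets, under_weight=3.0)

print("symmetric fit: ", round(symmetric_fit, 4))
print("asymmetric fit:", round(asymmetric_fit, 4))
```

With `under_weight=1.0` the loss reduces to the usual mean squared error, and the best constant is simply the mean of the data. Raising the weight shifts the answer upward. You can predict that shift ahead of time if you know how to set the derivative to zero and solve.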

You can buy a performance car and drive it, and you should be just fine. But if you want to get the best out of it, you should know something about how the things under the hood work. It would be advantageous if you knew how to tune the car.

That’s the point here. The more you know about the inner workings of your model, the better you can tune or customize it. Just like a sports car.

So, what are the tools to get the job done?

Data Science is all about math at its core. Therefore one should have a solid foundation in certain areas. These are:

  1. Statistics and Probabilities
  2. Linear Algebra
  3. Multivariate Calculus

Now that can sound a bit intimidating, I know. But there are many online resources that one can leverage, and they come at all different levels. You’re not looking for a math degree here, just a good foundation.

So here is a list of resources that I find useful:

Statistics and Probabilities:

  1. StatQuest - Josh Starmer is a wonderful teacher who makes it all fun and easy.
  2. jbstatistics - Like this channel as well. Very clear explanations.
  3. Zedstatistics - Good as well.
  4. Statistics and Probabilities Full Course | Statistics for data Science

Linear Algebra:

  1. Essence of Linear Algebra - Great course, brought to you by the legendary Grant Sanderson of 3Blue1Brown. Take this one first.
  2. Mathematics for Machine Learning: Linear Algebra - From Imperial College London.
  3. Gilbert Strang Lectures on Linear Algebra (MIT)
  4. Khan Academy: Linear Algebra - It’s Grant Sanderson again. Fan of his no matter where he’s teaching.

Multivariate Calculus:

  1. Mathematics for Machine Learning: Multivariate Calculus - Imperial College London.
  2. Khan Academy: Multivariate Calculus - Grant Sanderson yet again.

So there you have it.

Building a good mathematical foundation for a Data Science education is well within your reach. Go out and learn!

I hope that you found this post helpful.
