I want to echo the recommendation of Andrej Karpathy's YouTube channel.
Before I started watching his videos, I thought that understanding how gradient descent actually worked and what autograd actually does under the hood was unimportant - after all, I can get a working network by just slapping together some layers and letting my ML framework of choice handle the rest (and, to the credit of modern frameworks, you can get impressively far with that assumption). Andrej's micrograd video was what changed my mind - once I understood the basics of how gradients are calculated and how they flow backward through a network, so much else about deep learning started to make sense.
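For anyone who hasn't watched it yet, the core idea is small enough to sketch in a comment. Here's a rough illustration in the spirit of micrograd (the names and details below are my own simplification, not Karpathy's actual code): each scalar value remembers how it was produced, and backward() walks the graph in reverse, applying the chain rule at each step.

```python
# Illustrative sketch of a scalar autograd engine, micrograd-style.
# Each Value records its parents and a local gradient rule; backward()
# topologically sorts the graph and applies the chain rule in reverse.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # no-op for leaf nodes

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # addition passes the gradient through unchanged
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then run each node's local rule
        # from the output back toward the leaves.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# y = a * b + a  ->  dy/da = b + 1 = 4, dy/db = a = 2
a, b = Value(2.0), Value(3.0)
y = a * b + a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

That's essentially the whole trick: once you see that every operation only needs to know its own local derivative, backprop through an arbitrarily deep network stops feeling like magic.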
If the classes at my university had been as good as what that man publishes on his YouTube channel for free, I would've actually finished my degree instead of dropping out.