🔄Gradient descent computes the gradient of the loss function with respect to each weight and updates the weights by stepping in the opposite direction.
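As a minimal sketch in plain NumPy (the toy loss and its gradient here are illustrative stand-ins for what backpropagation would supply), one vanilla gradient-descent step looks like this:

```python
import numpy as np

def gd_step(weights, grad, lr=0.1):
    """One vanilla gradient-descent update: move against the gradient."""
    return weights - lr * grad

# Toy example: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    grad = 2 * w                    # gradient of the toy loss
    w = gd_step(w, grad, lr=0.1)
print(w)                            # approaches the minimizer [0, 0]
```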
🌊Momentum adds a velocity term to the update rule, smoothing the trajectory and accelerating progress along directions where gradients consistently agree.
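A sketch of the classic (heavy-ball) momentum update, assuming a decay factor of 0.9; exact conventions vary between libraries:

```python
import numpy as np

def momentum_step(weights, grad, velocity, lr=0.1, beta=0.9):
    """Accumulate a velocity from past gradients, then step along it."""
    velocity = beta * velocity - lr * grad   # blend old velocity with the new gradient
    weights = weights + velocity
    return weights, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    grad = 2 * w                             # gradient of the toy loss ||w||^2
    w, v = momentum_step(w, grad, v)
print(w)                                     # converges toward [0, 0]
```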
🎢RMSprop divides each gradient step by the square root of an exponentially decaying average of past squared gradients, giving every parameter its own adaptive learning rate.
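A sketch of the RMSprop update under typical defaults (decay rho = 0.9, a small epsilon for numerical stability); the hyperparameter names here are illustrative:

```python
import numpy as np

def rmsprop_step(weights, grad, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    """Divide the gradient by the root of a running average of its squares."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2       # moving average of grad^2
    weights = weights - lr * grad / (np.sqrt(sq_avg) + eps)
    return weights, sq_avg

w = np.array([1.0, -2.0])
s = np.zeros_like(w)
for _ in range(500):
    grad = 2 * w                                        # gradient of the toy loss ||w||^2
    w, s = rmsprop_step(w, grad, s)
print(w)  # parameters with large recent gradients take proportionally smaller steps
```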
📈Adam combines momentum's running average of gradients with RMSprop's per-parameter scaling (plus bias correction) to achieve efficient, adaptive optimization.
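A sketch of the Adam update with the commonly cited defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), including the bias-correction step:

```python
import numpy as np

def adam_step(weights, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Momentum-style first moment plus RMSprop-style second moment, bias-corrected."""
    m = beta1 * m + (1 - beta1) * grad            # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # correct the startup bias toward zero
    v_hat = v / (1 - beta2 ** t)
    weights = weights - lr * m_hat / (np.sqrt(v_hat) + eps)
    return weights, m, v

w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w                                  # gradient of the toy loss ||w||^2
    w, m, v = adam_step(w, grad, m, v, t)
print(w)                                          # approaches [0, 0] with adaptive steps
```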
💨Choosing the right optimization algorithm and learning rate is crucial for reaching a low value of the loss quickly and stably; because neural-network losses are non-convex, a good minimum is the practical goal rather than a guaranteed global one.
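As one illustration of what that choice looks like in practice, here is how the optimizers above might be instantiated in PyTorch; the model and learning rates are typical starting points for illustration, not recommendations:

```python
import torch

model = torch.nn.Linear(10, 1)   # stand-in model for illustration

# The optimizers differ only in how they turn gradients into weight updates;
# the surrounding training loop stays the same.
sgd      = torch.optim.SGD(model.parameters(), lr=0.1)
momentum = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.01)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)
```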