Stochastic Gradient Descent
Update the weights using the gradient from a single randomly chosen training example (or a small subset) instead of the whole dataset.
Much faster in practice than full-batch gradient descent for machine learning.
Works because samples are redundant: the gradient from one sample is a noisy but usable estimate of the full gradient (see the sketch below).
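A minimal sketch of the single-sample update, assuming a least-squares linear model; `X`, `y`, `lr`, and `n_epochs` are illustrative names, not from the notes:

```python
import numpy as np

def sgd(X, y, lr=0.01, n_epochs=10):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in np.random.permutation(n):   # shuffle each epoch
            pred = X[i] @ w                  # prediction for ONE sample
            grad = (pred - y[i]) * X[i]      # gradient of 0.5*(pred - y)^2
            w -= lr * grad                   # update from that one sample
    return w
```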
The main reason for minibatching is that hardware is more efficient at it: GPUs and vectorized code process a batch almost as fast as a single sample.
The per-sample gradient computation parallelizes in a simple way, and batching is the natural way to exploit that parallelism (sketch below).
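A hedged sketch of the minibatch variant under the same assumptions: the whole batch gradient is one matrix-vector product, which is what the hardware parallelizes well. `batch_size` is an illustrative parameter:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, n_epochs=10):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            preds = X[b] @ w                          # one vectorized pass over the batch
            grad = X[b].T @ (preds - y[b]) / len(b)   # gradient averaged over the batch
            w -= lr * grad
    return w
```

The only change from the single-sample version is that the per-sample gradients are computed and averaged in one vectorized step, which is why batching costs little extra wall-clock time.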