Vanishing gradient is a common problem when training deep neural networks, especially very deep architectures. It makes it difficult for the model to learn from data during training.

## What Is Vanishing Gradient?

In deep learning, training happens through a method called backpropagation, where the model adjusts its weights using gradients (a kind of slope) of the loss function with respect to each weight. These gradients tell the model how much to change each weight to improve performance.

However, in deep neural networks with many layers, the gradients can become very small as they are propagated backward through the layers. This is called the vanishing gradient problem. As a result:

- Early layers (closer to the input) receive almost no updates.
- The network stops learning, or learns very slowly.

## When Does Vanishing Gradient Happen?

- Very deep networks: the more layers there are, the more the gradients can shrink as they are propagated backward.
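The shrinking effect described above can be observed numerically. The sketch below (illustrative, not a full training loop) passes an input through a stack of sigmoid layers, then backpropagates a gradient by hand and prints its norm near the output versus near the input. The layer count, sizes, and weight scale are arbitrary choices for the demonstration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_layers, width = 20, 8
# Xavier-style random weights (scale 1/sqrt(width)); chosen only for illustration
weights = [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
           for _ in range(n_layers)]

# Forward pass: keep each layer's activation for the backward pass
x = rng.normal(size=width)
activations = []
for W in weights:
    x = sigmoid(W @ x)
    activations.append(x)

# Backward pass: start from a unit gradient at the output and apply the
# chain rule; sigmoid'(z) = a * (1 - a), which is at most 0.25
grad = np.ones(width)
norms = []
for W, a in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * a * (1 - a))
    norms.append(np.linalg.norm(grad))

print(f"gradient norm at the last hidden layer:  {norms[0]:.3e}")
print(f"gradient norm at the first hidden layer: {norms[-1]:.3e}")
```

Because the sigmoid derivative never exceeds 0.25, each backward step multiplies the gradient by a factor that is typically well below 1, so the norm printed for the first layer is many orders of magnitude smaller than the one for the last layer.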