Xavier Initialization
Xavier initialization (Glorot & Bengio, 2010) is a weight initialization heuristic that sets the variance of a layer's weights so that the variance of the layer's output matches the variance of its input. This keeps the scale of activations (and, by a symmetric argument, of backpropagated gradients) roughly constant throughout the network.
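Concretely, the heuristic is driven by the layer's fan-in and fan-out. A minimal sketch of the two closed forms behind the uniform and normal variants; the helper names here are illustrative, not part of any API:

import math

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    # Std of the normal variant: gain * sqrt(2 / (fan_in + fan_out))
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Half-width a of the uniform variant U(-a, a): gain * sqrt(6 / (fan_in + fan_out))
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

print(xavier_normal_std(256, 128))     # ~0.0722
print(xavier_uniform_bound(256, 128))  # 0.125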
PyTorch Usage
PyTorch offers both uniformly and normally distributed variants of the Xavier heuristic.
conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)
torch.nn.init.xavier_uniform_(conv_layer.weight, gain=1)
torch.nn.init.constant_(conv_layer.bias, 0)
or
conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)
torch.nn.init.xavier_normal_(conv_layer.weight, gain=1)
torch.nn.init.constant_(conv_layer.bias, 0)
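As a quick sanity check of the variance-preserving claim, one can compare the input and output variance of a freshly initialized layer. The snippet below is an illustrative sketch using a square nn.Linear layer, not part of the examples above:

import torch

layer = torch.nn.Linear(512, 512)
torch.nn.init.xavier_normal_(layer.weight, gain=1)
torch.nn.init.constant_(layer.bias, 0)

x = torch.randn(4096, 512)  # unit-variance input
with torch.no_grad():
    y = layer(x)
print(x.var().item(), y.var().item())  # both should be close to 1.0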
The gain value depends on the nonlinearity used in the layer and can be obtained with PyTorch's torch.nn.init.calculate_gain() function. The default gain=1 corresponds to a linear (identity) activation; note that for ReLU layers, calculate_gain('relu') returns sqrt(2) rather than 1.
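For reference, a few gains returned by calculate_gain(), followed by how the gain plugs into the initializer (the Conv2d layer below is illustrative):

import torch

print(torch.nn.init.calculate_gain('linear'))  # 1.0
print(torch.nn.init.calculate_gain('tanh'))    # 5/3 ~ 1.667
print(torch.nn.init.calculate_gain('relu'))    # sqrt(2) ~ 1.414

conv_layer = torch.nn.Conv2d(16, 16, kernel_size=3)
torch.nn.init.xavier_uniform_(
    conv_layer.weight,
    gain=torch.nn.init.calculate_gain('relu'))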