Neural Network Layers - Linear Layer

One of the basic components of a neural network is a layer. A layer is a collection of computational units, each of which computes a dot product of the inputs and a set of weights. Layers are crucial because they determine the architecture of the neural network. The example below explores a simple linear layer to understand its computation and properties.

Defining a Linear Layer

We define layers using PyTorch's torch.nn API. The example below defines a linear layer with $5$ inputs and $3$ outputs.

import torch
import torch.nn as nn

linear_layer = nn.Linear(5, 3)
print(linear_layer)
Linear(in_features=5, out_features=3, bias=True)
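
Under the hood, nn.Linear implements the affine transformation $y = x W^\top + b$, where $W$ is a $3 \times 5$ weight matrix and $b$ is a bias vector of length $3$.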

Linear Weights and Biases

When initialized, a layer contains a weight matrix with one row of weights per output (each row holding one weight per input) and a bias vector with one entry per output. We can access these weights and biases directly as attributes of the layer.

linear_layer.weight
Parameter containing:
tensor([[-0.2769, -0.3803, -0.0929,  0.0815,  0.0497],
        [-0.3134,  0.3714,  0.0171, -0.0938, -0.3852],
        [ 0.1670,  0.0836,  0.4242,  0.2093, -0.2870]], requires_grad=True)
linear_layer.bias
Parameter containing:
tensor([-0.2017,  0.1331, -0.2384], requires_grad=True)

Computing Layer Outputs

We can see that in the layer above, each of the $3$ outputs has its own set of $5$ weights, one per input. Collectively, that gives $15$ weights plus $3$ biases, for $18$ parameters in total, which we can verify with a quick programmatic count:
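
sum(p.numel() for p in linear_layer.parameters())
18

We can compute the output of the linear layer by passing in an input tensor whose last dimension matches the number of inputs.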

sample_tensor = torch.rand(1, 5)
sample_tensor
tensor([[0.1519, 0.1639, 0.6413, 0.7559, 0.7305]])

Compute the output

linear_layer(sample_tensor)
tensor([[-0.2677, -0.1950, 0.0212]], grad_fn=<AddmmBackward0>)
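
The same layer also accepts batched inputs, treating any leading dimensions as batch dimensions. For example, passing a batch of four samples yields four rows of output:

batch = torch.rand(4, 5)
linear_layer(batch).shape
torch.Size([4, 3])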

Computing the Layer Output with Dot Products

To truly understand the operation above, let's reproduce the output by computing the dot products directly. We do this three times, once per output, each time using the corresponding row of weights and its bias. Note that we need to flatten the sample input tensor, since torch.dot expects 1-D tensors.

torch.dot(linear_layer.weight[0], sample_tensor.flatten()) + linear_layer.bias[0]
tensor(-0.2677, grad_fn=<AddBackward0>)
torch.dot(linear_layer.weight[1], sample_tensor.flatten()) + linear_layer.bias[1]
tensor(-0.1950, grad_fn=<AddBackward0>)
torch.dot(linear_layer.weight[2], sample_tensor.flatten()) + linear_layer.bias[2]
tensor(0.0212, grad_fn=<AddBackward0>)
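
These three dot products are just the rows of a single matrix-vector product, so we can also reproduce the full output in one step:

sample_tensor @ linear_layer.weight.T + linear_layer.bias
tensor([[-0.2677, -0.1950,  0.0212]], grad_fn=<AddBackward0>)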

Note that we get the same results as when we passed the sample tensor through the linear layer.