{ "cells": [ { "cell_type": "markdown", "id": "given-quilt", "metadata": {}, "source": [ "# PilotNet Sigma-Delta Neural Network (SDNN) Training\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
PilotNet: Predict the car's steering angle from the dashboard view.
PilotNet dataset is available freely here. © MIT License.\n", "
\n", "\n", "## What are SDNNs?\n", "\n", "\n", "\n", "\n", "\n", "__Sigma-delta neural networks__ consist of two main units: a _sigma_ decoder in the dendrite and a _delta_ encoder in the axon. The delta encoder applies differential encoding to the output of a regular ANN activation, e.g. ReLU. In addition, it only sends a message to the next layer when the magnitude of the encoded change exceeds its threshold. The sigma unit accumulates the sparse event messages to restore the original value.\n", "\n", "\n", "\n", "\n", "
\n", "\n", "A sigma-delta neuron is simply a regular activation wrapped by a sigma unit at its input and a delta unit at its output.\n", "\n", "When the input to the network is a temporal sequence, the activations change little between consecutive time-steps. Therefore, the messages between layers are reduced, which in turn reduces the synaptic computation in the next layer. In addition, the graded event values can encode the change in magnitude in one time-step. Therefore, unlike rate-coded spiking neural networks, there is no increase in latency from needing additional time-steps.\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Credit Eadweard Muybridge © Public Domain
\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "mature-massage", "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import h5py\n", "\n", "import torch\n", "from torch.utils.data import DataLoader\n", "from torchvision import transforms\n", "import torch.nn.functional as F\n", "\n", "import lava.lib.dl.slayer as slayer\n", "\n", "from pilotnet_dataset import PilotNetDataset\n", "import utils" ] }, { "cell_type": "code", "execution_count": 2, "id": "expressed-juice", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.manual_seed(4205)" ] }, { "cell_type": "markdown", "id": "dying-strength", "metadata": {}, "source": [ "# Event sparsity loss\n", "\n", "A sparsity loss that penalizes the network when the mean absolute event magnitude exceeds `max_rate`." ] }, { "cell_type": "code", "execution_count": 3, "id": "nonprofit-charger", "metadata": {}, "outputs": [], "source": [ "def event_rate_loss(x, max_rate=0.01):\n", "    # mean absolute value of the layer's output messages\n", "    mean_event_rate = torch.mean(torch.abs(x))\n", "    # penalize only the portion of the rate above max_rate\n", "    return F.mse_loss(F.relu(mean_event_rate - max_rate), torch.zeros_like(mean_event_rate))" ] }, { "cell_type": "markdown", "id": "comfortable-corrections", "metadata": {}, "source": [ "# Network description\n", "\n", "__SLAYER 2.0__ (__`lava.lib.dl.slayer`__) provides a variety of learnable _neuron models_, _synapses_, _axons_ and _dendrites_ that support quantized training. \n", "For easier use, it also provides a __`block`__ interface, which packages the associated neuron, synapse, axon and dendrite features into a single module. \n", "\n", "__Sigma-delta blocks__ are available as `slayer.block.sigma_delta.{Dense, Conv, Pool, Input, Output, Flatten, ...}`, which can be easily composed to create a variety of sequential network descriptions as shown below. 
The blocks can easily enable _synaptic weight normalization_ and _neuron normalization_, and they provide useful _gradient monitoring_ and _hdf5 network export_ utilities.\n", "\n", "\n", "\n", "These blocks can be used to create a network using the standard PyTorch procedure." ] }, { "cell_type": "code", "execution_count": 4, "id": "mature-doubt", "metadata": {}, "outputs": [], "source": [ "class Network(torch.nn.Module):\n", "    def __init__(self):\n", "        super(Network, self).__init__()\n", "\n", "        sdnn_params = { # sigma-delta neuron parameters\n", "                'threshold' : 0.1, # delta unit threshold\n", "                'tau_grad' : 0.5, # delta unit surrogate gradient relaxation parameter\n", "                'scale_grad' : 1, # delta unit surrogate gradient scale parameter\n", "                'requires_grad' : True, # trainable threshold\n", "                'shared_param' : True, # layer wise threshold\n", "                'activation' : F.relu, # activation function\n", "            }\n", "        sdnn_cnn_params = { # conv layer has additional mean only batch norm\n", "                **sdnn_params, # copy all sdnn_params\n", "                'norm' : slayer.neuron.norm.MeanOnlyBatchNorm, # mean only quantized batch normalization\n", "            }\n", "        sdnn_dense_params = { # dense layers have additional dropout units enabled\n", "                **sdnn_cnn_params, # copy all sdnn_cnn_params\n", "                'dropout' : slayer.neuron.Dropout(p=0.2), # neuron dropout\n", "            }\n", "\n", "        self.blocks = torch.nn.ModuleList([# sequential network blocks \n", "                # delta encoding of the input\n", "                slayer.block.sigma_delta.Input(sdnn_params), \n", "                # convolution layers\n", "                slayer.block.sigma_delta.Conv(sdnn_cnn_params, 3, 24, 3, padding=0, stride=2, weight_scale=2, weight_norm=True),\n", "                slayer.block.sigma_delta.Conv(sdnn_cnn_params, 24, 36, 3, padding=0, stride=2, weight_scale=2, weight_norm=True),\n", "                slayer.block.sigma_delta.Conv(sdnn_cnn_params, 36, 64, 3, padding=(1, 0), stride=(2, 1), weight_scale=2, weight_norm=True),\n", "                slayer.block.sigma_delta.Conv(sdnn_cnn_params, 64, 64, 3, padding=0, stride=1, 
weight_scale=2, weight_norm=True),\n", "                # flatten layer\n", "                slayer.block.sigma_delta.Flatten(),\n", "                # dense layers\n", "                slayer.block.sigma_delta.Dense(sdnn_dense_params, 64*40, 100, weight_scale=2, weight_norm=True),\n", "                slayer.block.sigma_delta.Dense(sdnn_dense_params, 100, 50, weight_scale=2, weight_norm=True),\n", "                slayer.block.sigma_delta.Dense(sdnn_dense_params, 50, 10, weight_scale=2, weight_norm=True),\n", "                # linear readout with sigma decoding of output\n", "                slayer.block.sigma_delta.Output(sdnn_dense_params, 10, 1, weight_scale=2, weight_norm=True)\n", "            ])\n", "\n", "\n", "    def forward(self, x):\n", "        count = []\n", "        event_cost = 0\n", "\n", "        for block in self.blocks: \n", "            # forward computation is as simple as calling the blocks in a loop\n", "            x = block(x)\n", "            if hasattr(block, 'neuron'):\n", "                event_cost += event_rate_loss(x)\n", "                count.append(torch.sum(torch.abs((x[..., 1:]) > 0).to(x.dtype)).item())\n", "\n", "        return x, event_cost, torch.FloatTensor(count).reshape((1, -1)).to(x.device)\n", "\n", "    def grad_flow(self, path):\n", "        # helps monitor the gradient flow\n", "        grad = [b.synapse.grad_norm for b in self.blocks if hasattr(b, 'synapse')]\n", "\n", "        plt.figure()\n", "        plt.semilogy(grad)\n", "        plt.savefig(path + 'gradFlow.png')\n", "        plt.close()\n", "\n", "        return grad\n", "\n", "    def export_hdf5(self, filename):\n", "        # network export to hdf5 format\n", "        h = h5py.File(filename, 'w')\n", "        layer = h.create_group('layer')\n", "        for i, b in enumerate(self.blocks):\n", "            b.export_hdf5(layer.create_group(f'{i}'))\n", " \n", " " ] }, { "cell_type": "markdown", "id": "closed-willow", "metadata": {}, "source": [ "# Training parameters" ] }, { "cell_type": "code", "execution_count": 5, "id": "brown-twins", "metadata": {}, "outputs": [], "source": [ "batch = 8 # batch size\n", "lr = 0.001 # learning rate\n", "lam = 0.01 # lagrangian for event rate loss\n", "epochs = 200 # training epochs\n", "steps = [60, 120, 160] # learning rate 
reduction milestones\n", "\n", "trained_folder = 'Trained'\n", "logs_folder = 'Logs'\n", "\n", "os.makedirs(trained_folder, exist_ok=True)\n", "os.makedirs(logs_folder , exist_ok=True)\n", "\n", "device = torch.device('cuda')" ] }, { "cell_type": "markdown", "id": "hidden-spring", "metadata": {}, "source": [ "# Instantiate Network, Optimizer, Dataset and Dataloader" ] }, { "cell_type": "code", "execution_count": 6, "id": "exposed-claim", "metadata": {}, "outputs": [], "source": [ "net = Network().to(device)\n", "\n", "optimizer = torch.optim.RAdam(net.parameters(), lr=lr, weight_decay=1e-5)\n", "\n", "# Datasets\n", "training_set = PilotNetDataset(\n", "    train=True, \n", "    transform=transforms.Compose([\n", "        transforms.Resize([33, 100]),\n", "        transforms.ToTensor(),\n", "        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n", "    ]), \n", ")\n", "testing_set = PilotNetDataset(\n", "    train=False, \n", "    transform=transforms.Compose([\n", "        transforms.Resize([33, 100]),\n", "        transforms.ToTensor(),\n", "        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n", "    ]),\n", ")\n", "\n", "train_loader = DataLoader(dataset=training_set, batch_size=batch, shuffle=True, num_workers=8)\n", "test_loader = DataLoader(dataset=testing_set , batch_size=batch, shuffle=True, num_workers=8)\n", "\n", "stats = slayer.utils.LearningStats()\n", "assistant = slayer.utils.Assistant(\n", "        net=net,\n", "        error=lambda output, target: F.mse_loss(output.flatten(), target.flatten()),\n", "        optimizer=optimizer,\n", "        stats=stats,\n", "        count_log=True,\n", "        lam=lam\n", "    )" ] }, { "cell_type": "markdown", "id": "fifty-comfort", "metadata": {}, "source": [ "# Training loop\n", "\n", "The training loop mainly consists of looping over epochs and calling the `assistant.train` and `assistant.test` utilities over the training and testing datasets. 
The `assistant` utility takes care of the standard backpropagation procedure internally.\n", "\n", "* `stats` can be used in a print statement to get a formatted stats printout.\n", "* `stats.testing.best_loss` can be used to find out if the current iteration has the best testing loss. Here, we use it to save the best model.\n", "* `stats.update()` updates the stats collected for the epoch.\n", "* `stats.save` saves the stats in files." ] }, { "cell_type": "code", "execution_count": 7, "id": "antique-combining", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Epoch 49/200] Train loss = 0.08724 (min = 0.06125) | Test loss = 0.07638 (min = 0.07048)\n", "[Epoch 59/200] Train loss = 0.05042 (min = 0.06125) | Test loss = 0.06043 (min = 0.05631)\n", "Learning rate reduction from 0.001\n", "[Epoch 99/200] Train loss = 0.03670 (min = 0.03420) | Test loss = 0.04726 (min = 0.04006)\n", "[Epoch 119/200] Train loss = 0.03588 (min = 0.03177) | Test loss = 0.04133 (min = 0.03812)\n", "Learning rate reduction from 0.0003\n", "[Epoch 149/200] Train loss = 0.03995 (min = 0.02701) | Test loss = 0.04514 (min = 0.03812)\n", "[Epoch 159/200] Train loss = 0.03351 (min = 0.02701) | Test loss = 0.04028 (min = 0.03812)\n", "Learning rate reduction from 8.999999999999999e-05\n", "[Epoch 199/200] Train loss = 0.03122 (min = 0.02523) | Test loss = 0.04434 (min = 0.03812)\n" ] } ], "source": [ "for epoch in range(epochs):\n", "    if epoch in steps:\n", "        for param_group in optimizer.param_groups: \n", "            print('\\nLearning rate reduction from', param_group['lr'])\n", "            param_group['lr'] /= 10/3\n", "\n", "    for i, (input, ground_truth) in enumerate(train_loader): # training loop\n", "        assistant.train(input, ground_truth)\n", "        print(f'\\r[Epoch {epoch:3d}/{epochs}] {stats}', end='')\n", "\n", "    for i, (input, ground_truth) in enumerate(test_loader): # testing loop\n", "        assistant.test(input, ground_truth)\n", "        print(f'\\r[Epoch {epoch:3d}/{epochs}] {stats}', 
end='')\n", "\n", "    if epoch%50==49: print() \n", "    if stats.testing.best_loss: \n", "        torch.save(net.state_dict(), trained_folder + '/network.pt')\n", "    stats.update()\n", "    stats.save(trained_folder + '/')\n", "\n", "    # gradient flow monitoring\n", "    net.grad_flow(trained_folder + '/')\n", "\n", "    # checkpoint saves\n", "    if epoch%10 == 0:\n", "        torch.save({'net': net.state_dict(), 'optimizer': optimizer.state_dict()}, logs_folder + f'/checkpoint{epoch}.pt') " ] }, { "cell_type": "markdown", "id": "favorite-float", "metadata": {}, "source": [ "# Learning plots\n", "\n", "Plotting the learning curves is as easy as calling `stats.plot()`." ] }, { "cell_type": "code", "execution_count": 8, "id": "boolean-command", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "stats.plot(figsize=(15, 5))" ] }, { "cell_type": "markdown", "id": "fleet-monday", "metadata": {}, "source": [ "# Export the best trained model\n", "\n", "Load the best model found during training and export it as an hdf5 network. `lava.lib.dl.netx` supports this format and can automatically load the network as a Lava process." ] }, { "cell_type": "code", "execution_count": 9, "id": "maritime-manchester", "metadata": {}, "outputs": [], "source": [ "net.load_state_dict(torch.load(trained_folder + '/network.pt'))\n", "net.export_hdf5(trained_folder + '/network.net')" ] }, { "cell_type": "markdown", "id": "reserved-quilt", "metadata": {}, "source": [ "# Operation count of trained model\n", "\n", "Here, we compare the synaptic operations and neuron activity of the trained SDNN with those of an ANN of the same architecture." ] }, { "cell_type": "markdown", "id": "quarterly-technical", "metadata": {}, "source": [ "## Event statistics on testing dataset" ] }, { "cell_type": "code", "execution_count": 10, "id": "southeast-character", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Event count : 2170.5430, 224.2953, 372.4000, 507.9524, 28.2095, 4.7143, 1.0000, 0.1333, 0.4286 | loss = 0.04975 (min = 0.03812)" ] } ], "source": [ "counts = []\n", "for i, (input, ground_truth) in enumerate(test_loader):\n", "    _, count = assistant.test(input, ground_truth)\n", "    count = (count.flatten()/(input.shape[-1]-1)/input.shape[0]).tolist() # count skips first events\n", "    counts.append(count) \n", "    print('\\rEvent count : ' + ', '.join([f'{c:.4f}' for c in count]), f'| {stats.testing}', end='') \n", "\n", "counts = np.mean(counts, axis=0)" ] }, { "cell_type": "markdown", "id": "frequent-sight", "metadata": {}, "source": [ "# Event and Synops comparison with ANN" ] }, { "cell_type": "code", "execution_count": 11, "id": "reasonable-volleyball", "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "|-----------------------------------------------------------------------------|\n", "| | SDNN | ANN |\n", "|-----------------------------------------------------------------------------|\n", "| | Shape | Events | Synops | Activations| MACs |\n", "|-----------------------------------------------------------------------------|\n", "| layer-0 | (100, 33, 3) | 2475.93 | | 9900 | |\n", "| layer-1 | ( 49, 16, 24) | 239.46 | 133700.12 | 18816 | 534600 |\n", "| layer-2 | ( 24, 7, 36) | 422.39 | 19395.86 | 6048 | 1524096 |\n", "| layer-3 | ( 22, 4, 64) | 558.71 | 121649.28 | 5632 | 1741824 |\n", "| layer-4 | ( 20, 2, 64) | 29.90 | 321818.47 | 2560 | 3244032 |\n", "| layer-5 | ( 1, 1,100) | 4.80 | 2989.97 | 100 | 256000 |\n", "| layer-6 | ( 1, 1, 50) | 1.35 | 240.00 | 50 | 5000 |\n", "| layer-7 | ( 1, 1, 10) | 0.16 | 13.46 | 10 | 500 |\n", "| layer-8 | ( 1, 1, 1) | 0.44 | 0.16 | 1 | 10 |\n", "|-----------------------------------------------------------------------------|\n", "| Total | | 3733.14 | 599807.33 | 43117 | 7306062 |\n", "|-----------------------------------------------------------------------------|\n", "\n", "\n", "MSE : 0.038123 sq. radians\n", "Total neurons : 43117\n", "Events sparsity: 11.55x\n", "Synops sparsity: 12.18x\n" ] } ], "source": [ "utils.compare_ops(net, counts, mse=stats.testing.min_loss)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }