Andrew Trask has an old article called "A neural network in 11 lines of Python", which I thought was rather cute, so I knocked up a little runnable demo of the 2-layer part of the article. It lets you step through training epoch by epoch and watch it calculate preds->error->delta->new weights.

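It’s an old article and it uses a sigmoid activation, but if you play with the demo while reading the article, the sigmoid actually makes it easier to visualise what’s happening. Each update is scaled by the slope of the curve at the prediction, so the weights change more when the network is less certain (predictions near 0.5, where the curve is steepest) and less when it is more certain (predictions near 0 or 1, where the curve flattens out).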
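To make the "position on the curve" point concrete, here is a small sketch of my own (not from the article) that evaluates the slope term the training loop multiplies the error by, `p*(1-p)` where `p` is the sigmoid output, at a few hypothetical prediction values. The error is held fixed at 1.0 so that only the slope changes.

```python
import numpy as np

def sigmoid_slope(p):
    # slope of the sigmoid expressed in terms of its output p = sigmoid(x)
    return p * (1 - p)

# hypothetical predictions, from uncertain (0.5) to very confident (0.99)
preds = np.array([0.5, 0.9, 0.99])
error = 1.0  # same error for every prediction, to isolate the effect of the slope

for p, s in zip(preds, sigmoid_slope(preds)):
    print(f"prediction {p:.2f}: slope {s:.4f}, delta {error * s:.4f}")
```

The slope is 0.25 at a prediction of 0.5 but only about 0.0099 at 0.99, so for the same error a confident prediction moves the weights roughly 25 times less.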

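import numpy as np

# sigmoid; with deriv=True, x is assumed to already be a sigmoid output
def sigmoid(x,deriv=False):
  return x*(1-x) if deriv else 1/(1+np.exp(-x))

X = np.array([ [0,0,1],                  # inputs: 4 examples x 3 features
               [0,1,1],
               [1,0,1],
               [1,1,1] ])

y = np.array([[0,0,1,1]]).T              # targets: 4 x 1

w1 = 2*np.random.random((3,1))-1         # init weights with mean 0

for i in range(10000):
  l1 = sigmoid(np.dot(X,w1))             # forward
  l1_error = y - l1                      # error
  l1_delta = l1_error * sigmoid(l1,True) # how much we missed * slope of the sigmoid
  w1 += np.dot(X.T,l1_delta)             # update weights

print(l1)                                # after training, should be close to y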
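The stepping in the demo can be approximated by wrapping one iteration of the loop above in a function that returns every intermediate, so you can print preds, error, delta and new weights epoch by epoch. This is a minimal sketch assuming the same `X`, `y` and `sigmoid` as the code above; `train_step` and the fixed seed are hypothetical choices of mine for illustration, not part of the article or the demo. Because every iteration uses all four examples at once, one step here is one epoch.

```python
import numpy as np

def sigmoid(x, deriv=False):
    # with deriv=True, x is assumed to already be a sigmoid output
    return x * (1 - x) if deriv else 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

def train_step(w):
    """Hypothetical helper: one full-batch epoch, returning every intermediate."""
    preds = sigmoid(np.dot(X, w))          # forward pass
    error = y - preds                      # how much we missed
    delta = error * sigmoid(preds, True)   # error scaled by the sigmoid slope
    new_w = w + np.dot(X.T, delta)         # weight update (does not modify w)
    return preds, error, delta, new_w

np.random.seed(1)                          # fixed seed so runs are repeatable
w = 2 * np.random.random((3, 1)) - 1       # init weights with mean 0

# step through the first few epochs and show each stage
for epoch in range(3):
    preds, error, delta, w = train_step(w)
    print(f"epoch {epoch}")
    print("  preds:      ", preds.ravel().round(3))
    print("  error:      ", error.ravel().round(3))
    print("  delta:      ", delta.ravel().round(3))
    print("  new weights:", w.ravel().round(3))

# keep stepping to watch the predictions converge towards y
for _ in range(10000):
    preds, error, delta, w = train_step(w)
print("after training:", sigmoid(np.dot(X, w)).ravel().round(3))
```

Returning a new weight array instead of updating in place keeps each epoch's before and after weights available, which is what you want when stepping through by hand.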

You can play with a step-by-step runnable version here:

If you teach, this might be a useful visual aid for you. It's a little like the NN equivalent of