I am back. After a hiatus of over two weeks, I finally found the time to write a new post. The saga continues. This is what everybody has been waiting for: LLT: AI #2!

After the last episode, in which we let our computer figure out correlation on its own using linear regression, I got some mails from people complaining that this wasn’t actually artificial intelligence, but rather “just basic math”. This blog entry, my dear math snobs, is dedicated to you.

Disclaimer: this blog series is not really meant for people who want to learn AI themselves; it is much rather there to document my learning process and my descent into the rabbit hole that is artificial intelligence. I will link all the resources I use to learn somewhere in my blog posts.

I had hit a dead-end and really didn’t know what to learn next, so I asked a colleague from school for help. He introduced me to the marvelous world of perceptrons and neural networks. I needed some time to figure this out, but I think that I have a pretty good understanding of it now.

So what is a perceptron?

A perceptron is basically like a neuron. A single cell-ish unit that does some predefined computation. A perceptron has inputs and outputs. Every input value has a weight and every perceptron has a bias value.

So when a perceptron predicts something, it multiplies each input by its corresponding weight, adds those products together and then adds the bias value to the result.
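In code, that boils down to a weighted sum plus the bias. A tiny standalone sketch (the inputs, weights and bias here are made-up example numbers, not the ones we’ll use later):

```python
# Minimal sketch of a perceptron prediction: weighted sum plus bias
# (all values here are made-up examples)
inputs = [2.0, 3.0]
weights = [0.5, -1.0]
bias = 1.0

prediction = sum(x * w for x, w in zip(inputs, weights)) + bias
print(prediction)  # 2*0.5 + 3*(-1.0) + 1.0 = -1.0
```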

A perceptron with a single input would look like so:

prediction = input * weight + bias

Wait…am I the only one seeing this? This is our old friend again, the good old y = m * x + b! Absolutely mind-blowing.


Unfortunately, we get more than a single input in most cases, so our prediction formula looks a bit different for multi-input perceptrons:

prediction = input1 * weight1 + input2 * weight2 + … + inputN * weightN + bias

But where do the weights and the bias get their values from? Good question!
The answer is quite simple: the training loop.

What is a training loop?
Stop asking so many questions, damn it!

A training loop is basically a teacher. It teaches our perceptron how to predict new values, based on the values we provide in our dataset. We call this method of training supervised learning.

The training loop makes use of something called gradient descent. Gradient descent is an optimization algorithm that tries to minimize the error of our perceptron.

Okay…let’s go through this once more, but slowly. In order to use gradient descent, we need an error function. Our error function is really simple, as it is just the difference between the expected value and our prediction:

error = expected - prediction

As soon as we know our error, we can proceed with calculating the gradient. Our gradient is composed of the error, the last input our perceptron received and something called the learning rate (or lr):

gradient = input * error * lr

The learning rate is just a fixed value that scales down the gradient and, in turn, limits the effect our gradient has on the weights and the bias per training iteration.

Now that we know the gradient, we can make changes to the weights and bias.
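To make this concrete, here is one training step spelled out with the starting values and the first data point from the code below (the update rule mirrors the one in the training loop):

```python
# One training step, spelled out with concrete numbers
# (same starting values and first data point as in the code below)
weight, bias, lr = 10.0, 100.0, 0.000001
x, target = 245.0, 1400.0

prediction = x * weight + bias   # 245 * 10 + 100 = 2550.0
error = target - prediction      # 1400 - 2550 = -1150.0
gradient = x * error * lr        # about -0.28175

# nudge bias and weight towards a smaller error
bias += gradient                 # about 99.71825
weight += weight * gradient      # about 7.1825
```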

If we visualize this, our training loop would look like so:

But enough with all the math bla bla…let’s start coding.

Let’s start with all the helper stuff. I’ve generated a data-set and checked its correlation in GeoGebra beforehand, so we actually get a nice result this time.

import matplotlib.pyplot as plt


# Helper function to separate the columns of our training data
def column(matrix, i):
    return [row[i] for row in matrix]


# Our training data
points = [[245, 1400],
          [312, 1600],
          [279, 1700],
          [308, 1875],
          [199, 1350],
          [219, 1550],
          [405, 2350],
          [324, 1780],
          [319, 1600],
          [255, 1700]]

xa = column(points, 0)
ya = column(points, 1)

Next, we need to initialize our values (weight, bias, learning rate). To get to these initial values, I wrote the perceptron, initialized the values with 1 and then figured out what numbers they approach. For the learning rate I had a different approach: I just made it smaller by a factor of 10 until I stopped getting an OverflowError from my prediction.

weight = 10
bias = 100
lr = 0.000001
errorAvg = 0

The prediction itself and the training loop are fairly trivial.

def predict(xp):
    return xp * weight + bias


# Train
for iteration in range(10):
    errorAvg = 0
    for x, y in points:
        prediction = predict(x)
        error = y - prediction
        errorAvg += error
        gradient = x * error * lr

        # Change bias and weight (the weight change is scaled by the current weight)
        bias += gradient
        weight += weight * gradient
    errorAvg = errorAvg / len(xa)
    print("Iteration: {}\n--------------\nError: {}; Weight: {}; Bias: {}\n".format(iteration, errorAvg, weight, bias))
    
print("Final\n--------------\nError: {}; Weight: {}; Bias: {}".format(errorAvg,weight, bias))

Now we just need to visualize it all:

lineX = []
lineY = []
for i in range(len(xa)):
    lineX += [xa[i]]
    lineY += [predict(xa[i])]

plt.plot(xa, ya, 'ro')
plt.plot(lineX, lineY)
plt.show()

We should see something like this:

[lltai2_plot: the training data as red dots with the perceptron’s line running through them]

This is cool and all…but I want more. I want a multi-input perceptron! Let’s first make the code more modular by making a perceptron class. Also, let’s change our value initialization to random values.

from random import random


class Perceptron:
    def __init__(self, nrOfInputs):
        # One randomly initialized weight per input (no bias value this time)
        self.weight = []
        for i in range(nrOfInputs):
            self.weight += [random()]

    def predict(self, input):
        prediction = 0
        for i, val in enumerate(input):
            prediction += self.weight[i] * val

        return prediction

    def train(self, inputs, outputs, lr, ti):
        errorAvg = 0
        for it in range(ti):
            errorAvg = 0
            for input_i, input in enumerate(inputs):
                prediction = self.predict(input)
                error = outputs[input_i] - prediction
                errorAvg += error

                # One gradient per weight
                for val_i, val in enumerate(input):
                    gradient = val * error * lr
                    self.weight[val_i] += self.weight[val_i] * gradient

            errorAvg = errorAvg / len(inputs)
        return errorAvg, self.weight
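Before we wire up a training function, a quick sanity check of predict with hand-picked weights (this is my own example; the class here is a condensed, standalone copy of the one above so the snippet runs on its own):

```python
from random import random


class Perceptron:
    # Condensed copy of the class above (predict only), so this runs standalone
    def __init__(self, nrOfInputs):
        self.weight = [random() for _ in range(nrOfInputs)]

    def predict(self, input):
        return sum(w * v for w, v in zip(self.weight, input))


p = Perceptron(2)
p.weight = [2.0, 3.0]          # overwrite the random weights for a known result
print(p.predict([1.0, 1.0]))   # 2*1 + 3*1 = 5.0
```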

Now we need a function that builds and trains a perceptron for us.

def generate(nrOfInputs, inputs, outputs, ti=1000000, lr=0.001):
    perceptron = Perceptron(nrOfInputs)
    err, weights = perceptron.train(inputs, outputs, lr, ti)
    print("Error: {}; Weights: {}".format(err, weights))
    return perceptron

Let’s test this. I generated test data from simple formulas and let the perceptron figure out which formula I used.

# Test with two inputs; (a + b) * 2 = result
print("Two inputs:")
inputs = [[1, 1], [2, 3], [5, 8], [13, 21], [4, 4], [5, 5], [4, 8]]
outputs = [4, 10, 26, 68, 16, 20, 24]
print(generate(2, inputs, outputs).predict([10, 10]))

# Test with three inputs; (a + b) * 2 + c = result
print("\nThree inputs:")
inputs = [[1, 1, 1], [2, 3, 4], [5, 8, 9], [13, 21, 22], [4, 4, 5], [5, 5, 6], [4, 8, 9]]
outputs = [5, 14, 35, 90, 21, 26, 33]
print(generate(3, inputs, outputs).predict([10, 10, 5]))

# Test with four inputs; (a + b) * 2 + c + d * 0.1 = result
print("\nFour inputs:")
inputs = [[1, 1, 1, 1], [2, 3, 4, 5], [5, 8, 9, 9], [13, 21, 22, 7], [4, 4, 5, 6], [5, 5, 6, 7], [4, 8, 9, 3]]
outputs = [5.1, 14.5, 35.9, 90.7, 21.6, 26.7, 33.3]
print(generate(4, inputs, outputs).predict([10, 10, 5, 4]))

Running this takes a bit longer than running the single-input perceptron (even on my machine).

The output should look like this:

Two inputs:
Error: 6.587323279442596e-15; Weights: [1.999999999999973, 2.0000000000000178]
39.99999999999991

Three inputs:
Error: -2.6626848873926672e-14; Weights: [1.9999999999999627, 1.999999999999918, 1.0000000000000988]
44.999999999999304

Four inputs:
Error: -5.3290705182007514e-14; Weights: [2.0000000000000413, 1.9999999999991658, 1.0000000000007956, 0.09999999999988507]
45.399999999995586

Pretty good! I’d say this is a good first success. I had planned to include a simple neural network in this blog post, but since it doesn’t really work as well as I want it to yet, and the post would’ve become quite big, I decided to dedicate the next LLT: AI to it (I won’t let you wait as long this time…I promise 😅).