While I am quite proficient in some fields, I’m a hopeless n00b in others (there…I just described every human ever). One of those fields is machine learning.

I mean…I get the gist of it. You feed some clever algorithm a bunch of data and expect a good result. I also know that it is the buzzwordiest of buzzwords (next to “cloud computing”, “NoSQL”, “Blockchain” and other Pokémon) and people slap it onto virtually everything.

There are a couple cool libraries like TensorFlow or PyTorch that make things, to my knowledge, a heck of a lot easier for beginners and allow anyone who can follow a basic copy-and-paste style tutorial to do machine learning. But only using these libraries without understanding them is, in my opinion, just cheating yourself. I want to know what I’m doing and not just follow a tutorial.

Now, learning ML was always up there on my to-do list and I always found excuses not to do it. But recently, I thought to myself: screw it! I have this blog and I need to give my 3 readers some good reading material by absolutely failing at every aspect of learning how to AI.

In this series I will attempt to document everything I learn. Not to educate others (as I said…I’m currently learning it myself) but rather to share my experiences in learning something completely new and to show people that getting into something like AI isn’t as scary and complicated as it looks. So follow me through the adventure that is artificial intelligence!

Disclaimer before we start this: this particular blog series is absolutely not meant for people who want to learn AI themselves. I will link all the resources I use to learn somewhere in my blog posts.

Now…I’ve been watching some tutorials and reading some books about AI and the first thing all of those teach you is regression. So let’s just start with that.

I really don’t know how to efficiently explain linear regression (I don’t even know if I understood everything correctly), but to my understanding it’s just having a bunch of points in a coordinate system and “eyeballing” (through some really-not-so-complicated math) a linear function into it. So as we all, hopefully, know, a linear function looks like this:

$$y = kx + d$$

k is the slope and d is the y-intercept of the function. So how do we AI the crap out of this? In theory, we construct a line which roughly lines up with our training points, and then we can insert new x-values into this function and get something useful out. Of course, this only works if the x and y values in your data actually correlate with one another. Sure, you can correlate nearly everything (relevant XKCD), but this will usually result in completely useless predictions.

So let’s start by implementing something like this. I will be using C# for this. I could use Python but I really don’t want to.

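As previously mentioned, the math for getting that function is really not too complicated. You can get k with this rather pretty formula (a bar over something means the average of that thing over all training points):

$$k = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}$$

And we can get d like so:

$$d = \bar{y} - k\,\bar{x}$$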

In C# these formulae can be implemented like so:

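// The usings go at the top of the file; the methods go into the same class as Main.
using System;
using System.Collections.Generic;
using System.Linq;

// Fits y = k * x + d to the points and returns the line as a function plus its two parameters.
public static (Func<decimal, decimal> function, decimal k, decimal d) GetLine(List<(decimal x, decimal y)> points)
{
    var k = GetKValue(points);
    var d = GetDValue(k.k, k.xAverage, k.yAverage);
    return (i => k.k * i + d, k.k, d);
}

// Computes the slope k from the averages of x, y, x * y and x * x.
// If all x values are identical, the denominator is 0 and the division throws.
public static (decimal k, decimal xAverage, decimal yAverage) GetKValue(List<(decimal x, decimal y)> points)
{
    var xAverage = points.Average(a => a.x);
    var yAverage = points.Average(a => a.y);
    var xyAverage = points.Average(a => a.x * a.y);
    var x2Average = points.Average(a => a.x * a.x);

    return (((xAverage * yAverage) - xyAverage) / ((xAverage * xAverage) - x2Average), xAverage, yAverage);
}

// Computes the y-intercept d so that the line passes through (xAverage, yAverage).
public static decimal GetDValue(decimal k, decimal xAverage, decimal yAverage)
{
    return yAverage - k * xAverage;
}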
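
For anyone wondering where these two formulas come from: as far as I understand it, they are what you get when you ask for the line with the smallest sum of squared vertical distances to the training points (ordinary least squares). Here is a sketch of that derivation. The names S and n are just labels introduced here (S is the sum of squared errors, n the number of points), and a bar means the average over all points:

```latex
\begin{aligned}
S(k, d) &= \sum_{i=1}^{n} \bigl(y_i - (k x_i + d)\bigr)^2 \\
\frac{\partial S}{\partial d} = 0 &\;\Rightarrow\; d = \bar{y} - k\,\bar{x} \\
\frac{\partial S}{\partial k} = 0 &\;\Rightarrow\; \overline{xy} = k\,\overline{x^2} + d\,\bar{x} \\
&\;\Rightarrow\; k = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
   = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}
\end{aligned}
```

The last step only multiplies the numerator and the denominator by -1, which gives the form used in GetKValue. The denominator becomes zero when every point has the same x-value, so in that case there is no slope to compute.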


Let’s check if this really works. We’ll check if our implementation is correct by taking some points, throwing them at GeoGebra, throwing them at our implementation and then comparing the two results.

For our testing data I really just took the first couple of points I could find on the internet. I ended up with x values that describe the price of a house in 1k$ and y values that describe the size of said house in square feet. Of course, this is far from being precise since it doesn’t account for different neighborhoods, the housing market etc., but this is not supposed to be a good example, just a working one.

House price in 1k$ (x)    Square feet (y)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700

The testing code would look like so:

static void Main(string[] args)
{
    // x = house price in 1k$, y = size in square feet (same values as the table above)
    var points = new List<(decimal x, decimal y)>
    {
        (245, 1400),
        (312, 1600),
        (279, 1700),
        (308, 1875),
        (199, 1100),
        (219, 1550),
        (405, 2350),
        (324, 2450),
        (319, 1425),
        (255, 1700)
    };

    var result = GetLine(points);

    // Print the input points, then the slope and the y-intercept of the fitted line.
    points.ForEach(a => Console.WriteLine($"x: {a.x}; y: {a.y}"));
    Console.WriteLine($"k = {result.k}");
    Console.WriteLine($"d = {result.d}");
}
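
Since the whole point of the line is to plug new x-values into it, here is a minimal sketch of what that could look like. It is self-contained, so it repeats the fit in condensed form. The names PredictionSketch, Fit and Predict, as well as the x-value of 300, are made up for this illustration and are not part of the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredictionSketch
{
    // Condensed version of the fit from this post: returns slope k and y-intercept d.
    public static (decimal k, decimal d) Fit(List<(decimal x, decimal y)> points)
    {
        var xAverage = points.Average(p => p.x);
        var yAverage = points.Average(p => p.y);
        var xyAverage = points.Average(p => p.x * p.y);
        var x2Average = points.Average(p => p.x * p.x);

        var k = (xAverage * yAverage - xyAverage) / (xAverage * xAverage - x2Average);
        return (k, yAverage - k * xAverage);
    }

    // Evaluates y = k * x + d for a single x-value.
    public static decimal Predict((decimal k, decimal d) line, decimal x)
    {
        return line.k * x + line.d;
    }

    public static void Main()
    {
        // Same ten points as in the test above: x = price in 1k$, y = square feet.
        var points = new List<(decimal x, decimal y)>
        {
            (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100),
            (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700)
        };

        var line = Fit(points);

        // 300 is a hypothetical house price (in 1k$), chosen only for illustration.
        var predicted = Predict(line, 300m);
        Console.WriteLine($"Predicted size for x = 300: {predicted:F1} square feet");
    }
}
```

With the function returned by GetLine, the same thing is a one-liner: result.function(300) evaluates the fitted line at x = 300. Keep in mind that a prediction far outside the range of the training data (here roughly 199 to 405) is even less trustworthy than one inside it.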


In GeoGebra, you import the data by going into the Spreadsheet mode (Ctrl+Shift+S), pasting the content of the table, selecting all the points and then choosing “Two Variable Regression Analysis” in the menu bar. Then you just need to choose what regression model you want to use (“Linear” in our case) and you’re good to go.

Nice. Now we have a reference value. Let’s see if our implementation is as good as this:

Woah! Did we just manage to do this first try?! I think we did! Perfect!
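
The same numbers can also be checked by hand, independent of both GeoGebra and the C# code. This is a sketch of that arithmetic using only the ten points from the table. The values are worked out here rather than copied from either tool, and the rounding to a few digits is my choice:

```latex
\begin{aligned}
\bar{x} &= 286.5, \qquad \bar{y} = 1715, \qquad \overline{xy} = 508597.5, \qquad \overline{x^2} = 85342.3 \\
k &= \frac{286.5 \cdot 1715 - 508597.5}{286.5^2 - 85342.3} = \frac{-17250}{-3260.05} \approx 5.2913 \\
d &= \bar{y} - k\,\bar{x} = 1715 - k \cdot 286.5 \approx 199.03
\end{aligned}
```

The value of d uses the unrounded k. So, assuming the arithmetic holds, the fitted line is roughly y = 5.29x + 199, meaning that each additional 1k$ of price goes along with about 5.3 more square feet in this data set.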

Ok…I think that’s pretty much it for now. I am quite honestly pretty impressed by how easy this was (I was writing this code while writing this blog entry) and am now more hyped than ever before. Updates will follow whenever I want to and am not too tired for this.