🤖 Implement your first AI
A Machine Learning approach in predicting diabetes
We will go through the whole process of an implementation with this framework.
The Problem
First, we have to think about the problem we want to solve. In this example, we will try to predict whether someone has diabetes based on some simple inputs:
Pregnancies
Glucose
BloodPressure
SkinThickness
Insulin
BMI
DiabetesPedigreeFunction
Age
The required dataset can be downloaded here.
The data, like all of the source code, can also be found in the GitHub repository.
Step 1. Input Parsing
Data Object
Let's start by parsing the input into a Java class. For this, we create a class called DiabetesDataSet.
It will be used to hold the data.
The toVector() method converts our object into something that a neural network can take as input. The constructor of Vector takes double varargs.
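A minimal sketch of what such a data class could look like. The field names follow the input list above. The hasDiabetes label field and the small Vector class are assumptions made so that the example compiles on its own; in the real project, Vector comes from the framework.

```java
import java.util.Arrays;

// Hypothetical stand-in for the framework's Vector class so that this
// sketch compiles on its own. The real class comes from the framework.
final class Vector {
    private final double[] values;

    Vector(double... values) {            // double varargs, as described above
        this.values = values.clone();
    }

    int size() { return values.length; }

    double get(int index) { return values[index]; }

    @Override
    public String toString() { return Arrays.toString(values); }
}

// Holds one row of the dataset: eight input values plus the known outcome.
final class DiabetesDataSet {
    private final double pregnancies;
    private final double glucose;
    private final double bloodPressure;
    private final double skinThickness;
    private final double insulin;
    private final double bmi;
    private final double diabetesPedigreeFunction;
    private final double age;
    private final boolean hasDiabetes;    // label; the field name is an assumption

    DiabetesDataSet(double pregnancies, double glucose, double bloodPressure,
                    double skinThickness, double insulin, double bmi,
                    double diabetesPedigreeFunction, double age, boolean hasDiabetes) {
        this.pregnancies = pregnancies;
        this.glucose = glucose;
        this.bloodPressure = bloodPressure;
        this.skinThickness = skinThickness;
        this.insulin = insulin;
        this.bmi = bmi;
        this.diabetesPedigreeFunction = diabetesPedigreeFunction;
        this.age = age;
        this.hasDiabetes = hasDiabetes;
    }

    boolean hasDiabetes() { return hasDiabetes; }

    // Converts the eight inputs (not the label) into the network's input vector.
    Vector toVector() {
        return new Vector(pregnancies, glucose, bloodPressure, skinThickness,
                insulin, bmi, diabetesPedigreeFunction, age);
    }
}
```

The label is deliberately left out of toVector(), since the network should never see the answer it is supposed to predict.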
Reading the Input
As the next step, we will read the data from the file.
In Java, there are several ways to read data from a file. Since Java 8, however, we have the convenient method Files.lines(...), which returns a Stream of the lines of the file. Using this, the resulting class InputParser may look like this:
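A minimal sketch of such a parser. It assumes a CSV file with one header row followed by rows of nine comma-separated numbers, where the last column is the outcome (1 = diabetes). The trimmed-down DiabetesDataSet and the method names parseFile, parse and parseLine are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Trimmed-down data holder so that this sketch compiles on its own.
final class DiabetesDataSet {
    final double[] inputs;      // the eight input values, in file order
    final boolean hasDiabetes;  // the outcome column

    DiabetesDataSet(double[] inputs, boolean hasDiabetes) {
        this.inputs = inputs;
        this.hasDiabetes = hasDiabetes;
    }
}

final class InputParser {

    // Reads the file line by line; try-with-resources closes the file handle.
    static List<DiabetesDataSet> parseFile(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return parse(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumes a header row followed by rows of nine comma-separated numbers.
    static List<DiabetesDataSet> parse(Stream<String> lines) {
        return lines.skip(1)                              // skip the header row
                .filter(line -> !line.trim().isEmpty())   // ignore blank lines
                .map(InputParser::parseLine)
                .collect(Collectors.toList());
    }

    static DiabetesDataSet parseLine(String line) {
        double[] values = Arrays.stream(line.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
        if (values.length != 9) {
            throw new IllegalArgumentException("Expected 9 columns: " + line);
        }
        double[] inputs = Arrays.copyOf(values, 8);       // first eight columns
        boolean hasDiabetes = values[8] == 1.0;           // last column: 1 = diabetes
        return new DiabetesDataSet(inputs, hasDiabetes);
    }
}
```

Splitting the work into parse(Stream) and parseFile(Path) keeps the parsing logic testable without touching the file system.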
Let's add one more method that returns one random element from our training data:
The input data is heavily biased towards datasets without diabetes (around 65%). If we don't make sure that the sampled input is split roughly 50/50, our algorithm will learn that always predicting "no diabetes" has a success rate of 65%. We want our algorithm to base its predictions on the input values, though, and not on statistical probabilities.
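One way to get a roughly 50/50 split is to first pick the class at random and then pick an element within that class. The following is only a sketch of that idea; the class name BalancedSampler and its API are assumptions for this example and not part of the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Draws random elements so that both classes are picked about equally often,
// no matter how unbalanced the underlying data is.
final class BalancedSampler<T> {
    private final List<T> positives = new ArrayList<>();
    private final List<T> negatives = new ArrayList<>();
    private final Random random;

    BalancedSampler(List<T> data, Predicate<T> isPositive, Random random) {
        for (T item : data) {
            (isPositive.test(item) ? positives : negatives).add(item);
        }
        if (positives.isEmpty() || negatives.isEmpty()) {
            throw new IllegalArgumentException("Need at least one sample per class");
        }
        this.random = random;
    }

    // Flips a fair coin to choose the class, then picks uniformly within it.
    T next() {
        List<T> pool = random.nextBoolean() ? positives : negatives;
        return pool.get(random.nextInt(pool.size()));
    }
}
```

With the parsed data, this could be used as new BalancedSampler<>(dataSets, d -> d.hasDiabetes(), new Random()), assuming the data class exposes its label that way.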
Step 2. Defining a Fitness Function
The next thing is to define the fitness function for our diabetes prediction. This is actually much easier than it may seem. Considering there are only two outcomes (has diabetes or not), we only need to return 1 or 0 as fitness values.
nn is an instance of NeuralNet. The calcOutput(...) method returns a Vector. In this model, the Vector has size 1 and contains only one value: our prediction.
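A minimal sketch of such a fitness function. Two things here are assumptions: the network is represented by a plain ToDoubleFunction that stands in for nn.calcOutput(...), and a prediction above 0.5 is read as "has diabetes".

```java
import java.util.function.ToDoubleFunction;

final class DiabetesFitness {
    // Assumed cut-off: outputs above 0.5 are read as "has diabetes".
    static final double THRESHOLD = 0.5;

    // 'network' stands in for nn.calcOutput(...): it maps the eight inputs
    // to the single prediction value.
    static double fitness(ToDoubleFunction<double[]> network,
                          double[] inputs, boolean hasDiabetes) {
        double prediction = network.applyAsDouble(inputs);
        boolean predictedDiabetes = prediction > THRESHOLD;
        // 1 for a correct prediction, 0 for a wrong one.
        return predictedDiabetes == hasDiabetes ? 1.0 : 0.0;
    }
}
```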
Step 3. Create and Configure the Algorithm
The Brain (Neural Network)
To create a brain for our machine learning algorithm, we have to create a NeuralNetSupplier using the Builder of the NeuralNet class.
This neural network will have an input size of 8, no hidden layers, and an output size of 1.
To add a hidden layer of size n, just add .addLayer(n) to the code.
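To get a feeling for what a network with 8 inputs, no hidden layers and 1 output computes, here is a hand-written sketch: a weighted sum of the inputs plus a bias, passed through an activation function. This is not the framework's NeuralNet. The class name SingleLayerNet is made up, and the sigmoid activation is an assumption.

```java
// Illustration only: a network without hidden layers is a single neuron.
final class SingleLayerNet {
    private final double[] weights; // one weight per input value
    private final double bias;

    SingleLayerNet(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    // Weighted sum of the inputs plus bias, squashed into the range (0, 1).
    double calcOutput(double[] inputs) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException(
                    "Expected " + weights.length + " inputs, got " + inputs.length);
        }
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return 1.0 / (1.0 + Math.exp(-sum)); // sigmoid (assumed activation)
    }
}
```

Training then means finding good values for the eight weights and the bias, which is the job of the genetic algorithm.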
NeuralNetSupplier
This class will provide our algorithm with its first population. We have actually already implemented everything we need for this:
Selector, Recombiner and Mutator
We now have to decide which implementations we want to use for those interfaces. Feel free to alter the values or even choose completely different implementations!
If you don't know what they do, that is no problem. Just copy them and try out some things on your own. You can't break anything, since every implementation works with every other one!
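If you are curious what these three roles do in a genetic algorithm in general, the following sketch shows one simple variant of each, working on plain double[] genomes. These are not the framework's implementations; the class name, method names and the chosen strategies (truncation selection, uniform crossover, Gaussian mutation) are picked just for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class GeneticOperators {

    // Selector: keeps the fittest fraction of the population.
    static List<double[]> select(List<double[]> population,
                                 ToDoubleFunction<double[]> fitness,
                                 double keepFraction) {
        List<double[]> sorted = new ArrayList<>(population);
        sorted.sort(Comparator.comparingDouble(fitness).reversed()); // fittest first
        int keep = Math.max(1, (int) (sorted.size() * keepFraction));
        return new ArrayList<>(sorted.subList(0, keep));
    }

    // Recombiner: each gene of the child is copied from one parent at random.
    // Assumes both parents have the same length.
    static double[] recombine(double[] a, double[] b, Random random) {
        double[] child = new double[a.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = random.nextBoolean() ? a[i] : b[i];
        }
        return child;
    }

    // Mutator: with probability 'rate', a gene is nudged by Gaussian noise.
    static double[] mutate(double[] genome, double rate, double strength, Random random) {
        double[] mutated = genome.clone();
        for (int i = 0; i < mutated.length; i++) {
            if (random.nextDouble() < rate) {
                mutated[i] += random.nextGaussian() * strength;
            }
        }
        return mutated;
    }
}
```

In short: the selector decides who survives, the recombiner mixes survivors into children, and the mutator adds small random changes so that new solutions can appear.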
Genetic Algorithm
Now we can put everything together with the Builder of the GeneticAlgorithm class.
The GENS field determines how many generations the training process will run. But be careful: there is a phenomenon called overfitting, which can happen if the algorithm is trained for too long on the sample data.
So more is not necessarily better!
With geneticAlgorithm.solve(), we finally start the solving process!
As a return value, we get our trained neural network.
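To show how a generation count like GENS drives the training, here is a stripped-down generation loop on plain double[] genomes. It is a generic illustration under simplifying assumptions (keep the better half, refill with mutated copies) and not the framework's solve() implementation. TinyGeneticAlgorithm and the value 200 are made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class TinyGeneticAlgorithm {
    // Number of generations, analogous to the GENS field (value is arbitrary).
    static final int GENS = 200;

    static double[] solve(List<double[]> initialPopulation,
                          ToDoubleFunction<double[]> fitness,
                          int generations,
                          Random random) {
        List<double[]> population = new ArrayList<>(initialPopulation);
        Comparator<double[]> fittestFirst = Comparator.comparingDouble(fitness).reversed();
        for (int gen = 0; gen < generations; gen++) {
            population.sort(fittestFirst);
            int survivors = Math.max(1, population.size() / 2);
            // Replace the weaker half with mutated copies of random survivors.
            for (int i = survivors; i < population.size(); i++) {
                double[] child = population.get(random.nextInt(survivors)).clone();
                int gene = random.nextInt(child.length);
                child[gene] += random.nextGaussian() * 0.1;
                population.set(i, child);
            }
        }
        population.sort(fittestFirst);
        return population.get(0); // the fittest genome after the last generation
    }
}
```

Because the best genomes are carried over unchanged, the best fitness on the training data never gets worse from one generation to the next. That is exactly why training for too long can fit the sample data too closely.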
Step 4. Check the result
Hurray, you wrote your first AI! But wait, so far we have only trained it and have not used it yet! This is exactly why we saved 100 unseen datasets earlier in this tutorial. Let's write a method that tests our newly trained neural network:
It works much like the fitness function we wrote earlier. The only difference is that we count the correct results, and we added some extra logging as well.
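A minimal sketch of such a test method. As in the fitness sketch, the trained network is represented by a plain ToDoubleFunction standing in for calcOutput(...), and the 0.5 threshold is an assumption. AccuracyCheck and Sample are names made up for this example.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

final class AccuracyCheck {

    // One labelled test row; a trimmed-down stand-in for DiabetesDataSet.
    static final class Sample {
        final double[] inputs;
        final boolean hasDiabetes;

        Sample(double[] inputs, boolean hasDiabetes) {
            this.inputs = inputs;
            this.hasDiabetes = hasDiabetes;
        }
    }

    // Counts the correct predictions on unseen data and returns the share (0..1).
    static double accuracy(ToDoubleFunction<double[]> network, List<Sample> testData) {
        int correct = 0;
        for (Sample sample : testData) {
            boolean predicted = network.applyAsDouble(sample.inputs) > 0.5; // assumed threshold
            if (predicted == sample.hasDiabetes) {
                correct++;
            }
        }
        double accuracy = testData.isEmpty() ? 0.0 : (double) correct / testData.size();
        System.out.printf("Correct: %d of %d (%.1f%%)%n",
                correct, testData.size(), accuracy * 100);
        return accuracy;
    }
}
```

When reading the result, keep the class balance of the test data in mind: if about 65% of the rows are negative, a model that always predicts "no diabetes" already reaches about 65%.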
Step 5. Run the code
It's finally time to execute our code. To do so, just call the geneticAlgorithm.solve() method and pass its result to our test method:
For me, the test method always yielded an accuracy of roughly 65% to 70%.
Note, though, that a genetic algorithm is based on randomness, so results may differ considerably each time you execute the code!