We will go through the whole process of an implementation for this Framework:
The Problem
Firstly, we have to think about the Problem we want to solve:
For this Example we will try to predict if someone has diabetes based on some simple input
Pregnancies
Glucose
BloodPressure
SkinThickness
Insulin
BMI
DiabetesPedigreeFunction
Age
The needed dataset can be downloaded here
The data can - just as any other source code - also be found in the GitHub Repository
Step 1. Input Parsing
Data Object
Let's start with parsing the Input in a Java Class. For this, we create a class called DiabetesDataSet. It will be used to hold the data.
The toVector() Method is used to parse our Object into something that a neural Network can take as Input. The constructor of Vector takes double varags.
Reading the Input
As the next step we will read the data from the file.
In Java, there are several ways to read data from a File. Since Java 8 we do have the convenient method Files.lines(...) though, which returns us a Stream of each line of the file. Using this the resulting class InputParser may look like this:
publicclassInputParser {privatestaticfinalPath dataOrigin =Path.of(<source>);publicstaticfinalint amountOfUnseenData =100;privatefinalList<DiabetesDataSet> trainingsData;privatefinalList<DiabetesDataSet> unseenData; publicInputParser() throwsIOException {List<DiabetesDataSet> inputData =Files.lines(dataOrigin).skip(1) //Skip the headers of the .csv file.map(set ->set.split(",")).map(DiabetesDataSet::new).collect(Collectors.toList());//We don't want our Algorithm to just "remember" the data//Therefore we split the data up in trainings data and in unseen data,//with which we can check our result afterwardsthis.trainingsData=inputData.subList(amountOfUnseenData,inputData.size());this.unseenData=inputData.subList(0, amountOfUnseenData); }//Getter and Setter}
Let's add one more method that returns us one random element from our trainings data:
The input data is quite heavily biased on having datasets without diabetes (around 65%)
If we don't make sure that the input is roughly 50% our Algorithm will learn that always predicting "no diabetes" has a success rate of 65%
We want our Algorithm to be based on the input values though and not on statistical probabilities
Step 2. Defining a Fitness Function
The next thing is to define the Fitness Function for our Diabetes Prediction. This is actually much easier than it may seem. Considering there are only 2 outcomes (has diabetes or not) we may only return 1 or 0 as fitness values.
finalInputParser inputParser =newInputParser();NeuralNetFitnessFunction fitnessFunction = (nn) -> {DiabetesDataSet data =inputParser.getRandomTrainingsDataSet();//Due to the default activation function in the Neural Network the output is between 0 and 1//If the output is greater than 0.5 we will interpret this as "Patient has diabetes"boolean prediction =nn.calcOutput(data.toVector()).get(0) >0.5;//If the prediction was correct return 1 otherwise return 0return prediction ==data.hasDiabetes() ?1:0; };
nn is a instance of aNeuralNet. The calcOutput(...) method will return us a Vector. In this model the Vector has the size 1 and only contains 1 value - our prediction
Step 3. Create and Configure the Algorithm
The Brain (Neural Network)
To create a brain for our Machine Learning Algorithm, we will have to create a NeuralNetSupplier using the Builder of the NeuralNet class.
This Neural Network will have a input size of 8, no hidden layers, and a output size of 1.
To add a hidden layer of size n just add.addLayer(n) to the code
NeuralNetSupplier
This class will provide our Algorithm with its first population. We did actually already implement everything we need for this:
We now have to decide which implementations we want to use for those Interfaces. Feel free to alter the values or choose even completely other implementations!
If you dont know what they do this is no Problem. Just copy them and try out some stuff on your own. You cant break anything, all implementations work with every other!
Genetic Algorithm
Now we can put everything together with the Builder of the GeneticAlgorithm class
publicstaticfinalint GENS =2000;//...GeneticAlgorithm<NeuralNetIndividual> geneticAlgorithm =newGeneticAlgorithm.Builder<>(nnPopSup, GENS, SELECTOR).withRecombiner(RECOMBINER).withMutator(MUTATOR).withMultiThreaded(16) //uses 16 Threads to process.withLoggers(newIntervalConsoleLogger(100)) //used to print logging info in the console every 100th generation.build();
The GENS field will decide how many generations the trainings process will take. But be carefull there is such thing called overfitting which can happen if the algorithm is trained too long on the sample data.
So more is not necessarily better!
With geneticAlgorithm.solve() we finally start the solving process!
As a return value, we get our trained Neural Network.
Step 4. Check the result
Hurray you wrote your first AI! But wait, so far we only trained it and have not used it yet! Exactly for this, we saved 100 unseen datasets earlier in this tutorial.
Let's write a method that tests our new trained Neural Network:
publicstaticvoidtestModel(InputParser inputParser,NeuralNet model){long correctGuesses =inputParser.getUnseenData().stream().filter(data -> {boolean prediction =model.calcOutput(data.toVector()).get(0) >0.5;System.out.println("prediction is "+ prediction +" - patient has diabetes: "+data.hasDiabetes());return prediction ==data.hasDiabetes(); }).count();System.out.println("The model guessed "+ ((100d*correctGuesses)/InputParser.amountOfUnseenData) +"% correct");}
It basically works similar to our Fitness Function we wrote earlier. The only difference is that we are counting the correct results, and we added some extra logging as well.
Step 5. Run the code
It's finally time to execute our code. To do so, just call the geneticAlgorithm.solve() method and give its result to our test method:
NeuralNetIndividual result =geneticAlgorithm.solve();testModel(inputParser,result.getNN());
For me the test method always yielded an accuracy of roughly 65%- 70%.
The model guessed 67.0% correct
Notice though that a Genetic Algorithm is based on randomness, results may therefore highly differ from each time you execute the code!