๐Ÿค–Implement your first AI

A Machine Learning approach in predicting diabetes

We will go through the whole process of an implementation for this Framework:

The Problem

Firstly, we have to think about the Problem we want to solve: For this Example we will try to predict if someone has diabetes based on some simple input

  • Pregnancies

  • Glucose

  • BloodPressure

  • SkinThickness

  • Insulin

  • BMI

  • DiabetesPedigreeFunction

  • Age

The needed dataset can be downloaded here

The data can - just as any other source code - also be found in the GitHub Repository

Step 1. Input Parsing

Data Object

Let's start with parsing the Input in a Java Class. For this, we create a class called DiabetesDataSet. It will be used to hold the data.

public class DiabetesDataSet {
    private final int pregnancies;
    private final int glucose;
    private final int bloodPressure;
    private final int skinThickness;
    private final int insulin;
    private final double bmi;
    private final double diabetesPedigreeFunction;
    private final int age;
    private final boolean diabetes;

    public DiabetesDataSet(String[] values){
        pregnancies = Integer.parseInt(values[0]);
        glucose = Integer.parseInt(values[1]);
        bloodPressure = Integer.parseInt(values[2]);
        skinThickness = Integer.parseInt(values[3]);
        insulin = Integer.parseInt(values[4]);
        bmi = Double.parseDouble(values[5]);
        diabetesPedigreeFunction = Double.parseDouble(values[6]);
        age = Integer.parseInt(values[7]);
        diabetes = values[8].equals("1");
    }
    
    public Vector toVector(){
        return new Vector(pregnancies,
            glucose,
            bloodPressure,
            skinThickness,
            insulin,
            bmi,
            diabetesPedigreeFunction,
            age);
    }
    
    //Getter and Setter...
}

The toVector() Method is used to parse our Object into something that a neural Network can take as Input. The constructor of Vector takes double varags.

Reading the Input

As the next step we will read the data from the file. In Java, there are several ways to read data from a File. Since Java 8 we do have the convenient method Files.lines(...) though, which returns us a Stream of each line of the file. Using this the resulting class InputParser may look like this:

public class InputParser {
    private static final Path dataOrigin = Path.of(<source>);
    public static final int amountOfUnseenData = 100;

    private final List<DiabetesDataSet> trainingsData;
    private final List<DiabetesDataSet> unseenData;  

    public InputParser() throws IOException {
        List<DiabetesDataSet> inputData = Files.lines(dataOrigin)
                .skip(1) //Skip the headers of the .csv file
                .map(set -> set.split(","))
                .map(DiabetesDataSet::new)
                .collect(Collectors.toList());

        //We don't want our Algorithm to just "remember" the data
        //Therefore we split the data up in trainings data and in unseen data,
        //with which we can check our result afterwards
        this.trainingsData = inputData.subList(amountOfUnseenData, inputData.size());
        this.unseenData = inputData.subList(0, amountOfUnseenData);
    }
    //Getter and Setter
}

Let's add one more method that returns us one random element from our trainings data:

private final AtomicBoolean lastOneHadDiabetes = new AtomicBoolean();
private final Random random = new Random();

public DiabetesDataSet getRandomTrainingsDataSet(){
   DiabetesDataSet returnValue = trainingsData.get(random.nextInt(trainingsData.size()));
   while(returnValue.hasDiabetes() == lastOneHadDiabetes.get())
       returnValue = trainingsData.get(random.nextInt(trainingsData.size()));
   lastOneHadDiabetes.set(returnValue.hasDiabetes());
   return returnValue;
}

The input data is quite heavily biased on having datasets without diabetes (around 65%) If we don't make sure that the input is roughly 50% our Algorithm will learn that always predicting "no diabetes" has a success rate of 65% We want our Algorithm to be based on the input values though and not on statistical probabilities

Step 2. Defining a Fitness Function

The next thing is to define the Fitness Function for our Diabetes Prediction. This is actually much easier than it may seem. Considering there are only 2 outcomes (has diabetes or not) we may only return 1 or 0 as fitness values.

final InputParser inputParser = new InputParser();

NeuralNetFitnessFunction fitnessFunction = (nn) -> {
    DiabetesDataSet data = inputParser.getRandomTrainingsDataSet();
    //Due to the default activation function in the Neural Network the output is between 0 and 1
    //If the output is greater than 0.5 we will interpret this as "Patient has diabetes"
    boolean prediction = nn.calcOutput(data.toVector()).get(0) > 0.5;
    //If the prediction was correct return 1 otherwise return 0
    return  prediction == data.hasDiabetes() ? 1 : 0; 
};

nn is a instance of aNeuralNet. The calcOutput(...) method will return us a Vector. In this model the Vector has the size 1 and only contains 1 value - our prediction

Step 3. Create and Configure the Algorithm

The Brain (Neural Network)

To create a brain for our Machine Learning Algorithm, we will have to create a NeuralNetSupplier using the Builder of the NeuralNet class.

NeuralNetSupplier neuralNetSupplier = ( ) -> 
    new NeuralNet.Builder( 8, 1 )
//  .addLayer(4)
    .build( );

This Neural Network will have a input size of 8, no hidden layers, and a output size of 1. To add a hidden layer of size n just add.addLayer(n) to the code

NeuralNetSupplier

This class will provide our Algorithm with its first population. We did actually already implement everything we need for this:

public static final int POP_SIZE = 1000;
//...
NeuralNetPopulationSupplier nnPopSup = new NeuralNetPopulationSupplier(neuralNetSupplier, fitnessFunction, POP_SIZE);

Selector, Recombiner and Mutator

We now have to decide which implementations we want to use for those Interfaces. Feel free to alter the values or choose even completely other implementations!

private static final Selector<NeuralNetIndividual> SELECTOR = new EliteSelector<>( 0.3 );
private static final Recombiner<NeuralNetIndividual> RECOMBINER = new NNUniformCrossoverRecombiner(2);
private static final Mutator<NeuralNetIndividual> MUTATOR = new NNRandomMutator( 0.3, 0.10, new Randomizer( -0.5, 0.5 ), 0.01 );

If you dont know what they do this is no Problem. Just copy them and try out some stuff on your own. You cant break anything, all implementations work with every other!

Genetic Algorithm

Now we can put everything together with the Builder of the GeneticAlgorithm class

public static final int GENS = 2000;
//...
GeneticAlgorithm<NeuralNetIndividual> geneticAlgorithm = new GeneticAlgorithm.Builder<>(nnPopSup, GENS, SELECTOR)
        .withRecombiner(RECOMBINER)
        .withMutator(MUTATOR)
        .withMultiThreaded(16) //uses 16 Threads to process
        .withLoggers(new IntervalConsoleLogger(100)) //used to print logging info in the console every 100th generation
        .build();

The GENS field will decide how many generations the trainings process will take. But be carefull there is such thing called overfitting which can happen if the algorithm is trained too long on the sample data. So more is not necessarily better!

With geneticAlgorithm.solve() we finally start the solving process! As a return value, we get our trained Neural Network.

Step 4. Check the result

Hurray you wrote your first AI! But wait, so far we only trained it and have not used it yet! Exactly for this, we saved 100 unseen datasets earlier in this tutorial. Let's write a method that tests our new trained Neural Network:

public static void testModel(InputParser inputParser, NeuralNet model){
    long correctGuesses = inputParser.getUnseenData()
            .stream()
            .filter(data -> {
                boolean prediction = model.calcOutput(data.toVector()).get(0) > 0.5;
                System.out.println("prediction is " + prediction + " - patient has diabetes: " + data.hasDiabetes());
                return prediction == data.hasDiabetes();
            })
            .count();
    System.out.println("The model guessed " + ((100d*correctGuesses)/InputParser.amountOfUnseenData) + "% correct");
}

It basically works similar to our Fitness Function we wrote earlier. The only difference is that we are counting the correct results, and we added some extra logging as well.

Step 5. Run the code

It's finally time to execute our code. To do so, just call the geneticAlgorithm.solve() method and give its result to our test method:

NeuralNetIndividual result = geneticAlgorithm.solve();
testModel(inputParser, result.getNN());

For me the test method always yielded an accuracy of roughly 65%- 70%.

The model guessed 67.0% correct

Notice though that a Genetic Algorithm is based on randomness, results may therefore highly differ from each time you execute the code!

Last updated