Explore the Use of the Kullback-Leibler Divergence

In this problem, we explore the use of the Kullback-Leibler divergence (KLD) to derive a supervised-learning algorithm for multilayer perceptrons (Hopfield, 1987; Baum and Wilczek, 1988). To be specific, consider a multilayer perceptron consisting of an input layer, a hidden layer, and an output layer. Given a case or example α presented to the input, the output of neuron k in the output layer is assigned the probabilistic interpretation

$$y_{k|\alpha} = p_{k|\alpha}$$

where $p_{k|\alpha}$ is the network's estimate of the conditional probability that proposition $k$ is true, given the input case $\alpha$.

Correspondingly, let $q_{k|\alpha}$ denote the actual (true) value of the conditional probability that proposition $k$ is true, given the input case $\alpha$. The KLD for the multilayer perceptron is defined by

$$D_{q\|p} = \sum_{\alpha} p_{\alpha} \sum_{k} \left[ q_{k|\alpha} \ln\!\left(\frac{q_{k|\alpha}}{p_{k|\alpha}}\right) + \left(1 - q_{k|\alpha}\right) \ln\!\left(\frac{1 - q_{k|\alpha}}{1 - p_{k|\alpha}}\right) \right]$$

where $p_{\alpha}$ is the a priori probability of occurrence of input case $\alpha$.
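A minimal numeric sketch of this cost, assuming each output neuron represents a binary (true/false) proposition, with $p_{k|\alpha}$ the network's output probability and $q_{k|\alpha}$ the true conditional probability; the function name and argument layout here are illustrative, not part of the original problem:

```python
import math

def kl_divergence_cost(p_alpha, q, p):
    """KLD cost for a multilayer perceptron with probabilistic outputs.

    p_alpha[a] : a priori probability of presenting input case a
    q[a][k]    : true conditional probability that proposition k holds given case a
    p[a][k]    : network output for neuron k on case a, interpreted as the
                 estimated conditional probability that proposition k holds
    """
    total = 0.0
    for a, prob_a in enumerate(p_alpha):
        for k in range(len(q[a])):
            qk, pk = q[a][k], p[a][k]
            # Each output neuron is a two-state variable, so both the "true"
            # and "false" branches contribute to the divergence; the guards
            # apply the convention 0 * ln(0/x) = 0.
            if qk > 0.0:
                total += prob_a * qk * math.log(qk / pk)
            if qk < 1.0:
                total += prob_a * (1.0 - qk) * math.log((1.0 - qk) / (1.0 - pk))
    return total
```

The cost vanishes exactly when the network outputs match the true conditional probabilities for every case, and is positive otherwise, which is what makes it usable as a supervised-learning criterion.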
