Neural Network Homework

Complete all the questions in the document. Write code if needed. If you write code, please include both the code and its output. Also, if there is a coding part,
do not copy the code from Google. Thank you very much!

Homework # 5 Neural Network
Suppose we have a single-neuron perceptron as follows. In this one-layer perceptron
model, the neuron calculates a weighted sum of its inputs and applies a threshold to the
result: if the sum is larger than zero, the output is 1; otherwise, the output is 0.
Consider the following examples, where O is the desired output.
X1 X2 O
0 0 0
0 1 1
1 0 1
1 1 1
In this question, you will apply on-line gradient descent to automatically learn the
network’s weights, so that it correctly classifies all the training examples. The algorithm
is simple: iterate through the training examples one by one (if the last example has been used
and the algorithm hasn’t converged yet, start again from the first example, and so forth).
For every example i:
(1) Calculate the net’s output Zi.
(2) Multiply the error (Oi − Zi) by the learning rate η. Add this correction to any
weight for which the input in the example was non-zero.
That is, if, for the current example i, X1 = 1, then update W1ʹ = W1 + η(Oi − Zi),
etc.
(3) If the network outputs the correct result for all the training set examples, conclude.
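
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it assumes the error is the desired output minus the net's output, and it treats "correct on all examples" as one full pass with no mistakes. The function names, the `max_passes` safety cap, and the starting weights and learning rate in the usage lines are all hypothetical, chosen only so the arithmetic is exact in floating point.

```python
def predict(weights, inputs):
    """Threshold unit: output 1 if the weighted sum is strictly positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(examples, weights, eta, max_passes=100):
    """Apply the on-line update rule until one full pass produces no errors.

    examples: list of ((x1, x2), desired_output) pairs.
    max_passes is a safety cap added for this sketch (an assumption, not part
    of the algorithm as stated in the text).
    """
    weights = list(weights)  # work on a copy so the caller's list is untouched
    for passes in range(1, max_passes + 1):
        mistakes = 0
        for inputs, desired in examples:
            error = desired - predict(weights, inputs)  # assumed sign: O - Z
            if error != 0:
                mistakes += 1
                for j, x in enumerate(inputs):
                    if x != 0:  # only weights with a non-zero input are corrected
                        weights[j] += eta * error
        if mistakes == 0:
            return weights, passes
    return weights, None  # no error-free pass within max_passes


# Hypothetical starting point (not the values from the questions below).
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
final_weights, passes = train_perceptron(OR_EXAMPLES, [-0.5, -0.25], eta=0.25)
print(final_weights, passes)  # prints [0.25, 0.25] 3
```

Returning `None` for the pass count when the cap is reached makes it easy to tell a converged run from one that was cut off.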
Questions:
1. Please apply the algorithm to the training examples provided above. Use learning
rate η = 0.2. Suppose the initial values are W1 = 0.1, W2 = 0.3. Give your results in
the table below. You should expect to reach the final weights
within only a few passes over the training examples. (Please give the details of your
calculation.)
X1 X2 W1 W2 O Z Error W1ʹ W2ʹ
0 0 0.1 0.3
0 1
1 0
1 1
0 0
0 1
1 0
1 1

2. The perceptron training algorithm is in fact a simple gradient descent update. In
this question, you will derive this algorithm. The approach for training a
perceptron here is to minimize a squared error function.
• Give the definition of a squared error function, E, in terms of W1, W2, Xi1, Xi2
and Oi.
• Each weight should now be updated by taking a small step in the opposite
direction of its gradient (so as to minimize the error): Wʹ = W − η∇E(W).
Show how this translates into the algorithm that you applied in the previous
question. (You do not need to write code; present your derivation mathematically.)
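
As a generic illustration of the update Wʹ = W − η∇E(W), the sketch below runs gradient descent on a simple one-dimensional function. The function f(w) = (w − 3)^2, the helper name `gradient_descent`, and the step size and step count are all hypothetical choices for this example; it is not the perceptron error function asked for above.

```python
def gradient_descent(grad, w0, eta, steps):
    """Repeat w <- w - eta * grad(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # small step against the gradient
    return w


# Hypothetical example: f(w) = (w - 3)^2 has gradient 2 * (w - 3)
# and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, eta=0.25, steps=20)
print(w_final)  # close to 3
```

Each step moves w against the sign of the gradient, so the value of f decreases as long as η is small enough.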
3. In practice, the training examples may be noisy. Suppose that there are
contradictory examples in the training set: for example, an additional example
where X1 = 1, X2 = 1, O = 0. How do you think this will affect the algorithm’s
behavior?
