## Neural Network Homework


Complete all the questions in the document, coding where needed. If you write code, please include both the code and its output. Also, if there is a coding part, do not copy from Google. Thank you very much!

Homework #5: Neural Networks

Suppose we have a single-neuron perceptron, as shown below. In this one-layer perceptron model, the neuron computes a weighted sum of its inputs and applies a threshold to the result: if the sum is larger than zero, the output is 1; otherwise, the output is 0.

Consider the following examples, where O is the desired output.

| X1 | X2 | O |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 1 |

In this question, you will apply on-line gradient descent to learn the network's weights automatically, so that it classifies all the training examples correctly. The algorithm is simple: iterate through the training examples one by one (if the last example has been used and the algorithm has not yet converged, start again from the first example, and so on).

For every example i:

(1) Calculate the net's output Z_i.

(2) Multiply the error (O_i − Z_i) by the learning rate η. Add this correction to any weight for which the input in the example was non-zero. That is, if for the current example i we have X1 = 1, then update W1′ → W1 + η(O_i − Z_i), etc.

(3) If the network outputs the correct result for all the training-set examples, conclude.

[Figure: a single-neuron perceptron with inputs X1 and X2, weights W1 and W2, and a thresholded output.]
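The steps above can be sketched in code. This is a minimal illustration of the described update loop, not a solution to the questions below; the function and variable names are my own, and the 0/1 step threshold follows the problem statement.

```python
def step(s):
    # Threshold from the problem statement: 1 if the weighted sum
    # is larger than zero, otherwise 0.
    return 1 if s > 0 else 0

def train(examples, w1, w2, eta, max_passes=20):
    """On-line perceptron training: cycle through the examples,
    updating weights until every example is classified correctly."""
    for _ in range(max_passes):
        converged = True
        for x1, x2, o in examples:
            z = step(w1 * x1 + w2 * x2)   # (1) the net's output Z_i
            err = o - z                   # error (O_i - Z_i)
            if err != 0:
                converged = False
                # (2) correct only the weights whose input was non-zero
                if x1 != 0:
                    w1 += eta * err
                if x2 != 0:
                    w2 += eta * err
        if converged:                     # (3) all examples correct: stop
            break
    return w1, w2

# Usage with the table above (eta and initial weights as in question 1):
examples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w1, w2 = train(examples, w1=0.1, w2=0.3, eta=0.2)
```

Note that question 1 asks for the calculation details by hand, so the code is only a way to check your trace, not a substitute for it.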

Questions:

1. Please apply the algorithm to the training examples provided above. Use learning rate η = 0.2 and initial values W1 = 0.1, W2 = 0.3. Give your results in the table below. You should expect to get the final weights within only a few passes over the training examples. (Please give the details of your calculation.)

| X1 | X2 | W1  | W2  | O | Z | Error | W1′ | W2′ |
|----|----|-----|-----|---|---|-------|-----|-----|
| 0  | 0  | 0.1 | 0.3 |   |   |       |     |     |
| 0  | 1  |     |     |   |   |       |     |     |
| 1  | 0  |     |     |   |   |       |     |     |
| 1  | 1  |     |     |   |   |       |     |     |
| 0  | 0  |     |     |   |   |       |     |     |
| 0  | 1  |     |     |   |   |       |     |     |
| 1  | 0  |     |     |   |   |       |     |     |
| 1  | 1  |     |     |   |   |       |     |     |
| …  |    |     |     |   |   |       |     |     |

2. The perceptron training algorithm is in fact a simple gradient-descent update. In this question, you will derive this algorithm. The approach for training a perceptron here is to minimize a squared error function.

• Give the definition of a squared error function, E, in terms of W1, W2, Xi1, Xi2, and O_i.

• Each weight should now be updated by taking a small step in the opposite direction of its gradient (so as to minimize the error): W′ = W − η∇E(W). Show how this translates into the algorithm that you applied in the previous question. (You do not need to write code; present your algorithm in mathematical notation.)

3. In practice, training examples may be noisy. Suppose that there are contradicting examples in the training set: for example, an additional example where X1 = 1, X2 = 1, O = 0. How do you think this will affect the algorithm's behavior?