Multi-Layer ANNs

Multi-Layer Networks Built from Perceptron Units

- Perceptrons not able to learn certain concepts
  - Can only learn linearly separable functions
- But they can be the basis for larger structures
  - Which can learn more sophisticated concepts
  - We say that such networks have "perceptron units"


Problem With Perceptron Units

- The learning rule relies on differential calculus
  - Finding minima by differentiating, etc.
- Step functions aren't differentiable
  - They are not continuous at the threshold
- Alternative threshold function sought
  - Must be differentiable
  - Must be similar to the step function
    - i.e., exhibit a threshold so that units can "fire" or not fire
- Sigmoid units used for backpropagation
  - There are other alternatives that are often used

Sigmoid Units

- Take in the weighted sum of inputs, S, and output:

  σ(S) = 1/(1 + e^-S)

- Advantages:
  - Looks very similar to the step function
  - Is differentiable
  - Derivative easily expressible in terms of σ itself:

    dσ/dS = σ(S)(1 - σ(S))

  (see the short code sketch below)
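A minimal Python sketch (not part of the original slides) of the sigmoid activation and its derivative:

```python
import math

def sigmoid(s):
    """Sigmoid (logistic) activation: maps any weighted sum to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """Derivative of the sigmoid, expressed via sigma itself: sigma(s) * (1 - sigma(s))."""
    out = sigmoid(s)
    return out * (1.0 - out)

print(sigmoid(7))    # ~0.999  (the unit "fires")
print(sigmoid(-5))   # ~0.0067 (the unit does not fire)
```

The two printed values are the hidden-unit outputs that appear in the worked example later in these slides.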


Example ANN with Sigmoid Units

- Feed-forward network
  - Feed inputs in on the left, propagate numbers forward
- Suppose we have this ANN, with weights set arbitrarily

  [Figure: a feed-forward network with three input units, two hidden units H1 and H2,
   and two output units O1 and O2; the weights are those used in the propagation below]

Propagation of Example

- Suppose the input to the ANN is (10, 30, 20)
- First calculate the weighted sums into the hidden layer:
  - S_H1 = (0.2*10) + (-0.1*30) + (0.4*20) = 2 - 3 + 8 = 7
  - S_H2 = (0.7*10) + (-1.2*30) + (1.2*20) = 7 - 36 + 24 = -5
- Next calculate the output from the hidden layer, using σ(S) = 1/(1 + e^-S):
  - σ(S_H1) = 1/(1 + e^-7) = 1/(1 + 0.000912) = 0.999
  - σ(S_H2) = 1/(1 + e^5) = 1/(1 + 148.4) = 0.0067
  - So, H1 has fired, H2 has not


Propagation of Example (continued)

- Next calculate the weighted sums into the output layer:
  - S_O1 = (1.1 * 0.999) + (0.1 * 0.0067) = 1.0996
  - S_O2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047
- Finally, calculate the output from the ANN:
  - σ(S_O1) = 1/(1 + e^-1.0996) = 1/(1 + 0.333) = 0.750
  - σ(S_O2) = 1/(1 + e^-3.1047) = 1/(1 + 0.045) = 0.957
- Output from O2 > output from O1
  - So, the ANN predicts the category associated with O2
  - For the example input (10, 30, 20)

(A code sketch of this forward pass follows below.)
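An illustrative Python sketch (not from the slides) that reproduces this forward pass, using the weights from the worked example:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Weights from the worked example
w_input_hidden = [[0.2, -0.1, 0.4],    # into H1
                  [0.7, -1.2, 1.2]]    # into H2
w_hidden_output = [[1.1, 0.1],         # into O1 (from H1, H2)
                   [3.1, 1.17]]        # into O2 (from H1, H2)

def forward(x):
    """One forward pass: weighted sums, then sigmoid, layer by layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_input_hidden]
    output = [sigmoid(sum(w * hi for w, hi in zip(ws, hidden))) for ws in w_hidden_output]
    return hidden, output

hidden, output = forward([10, 30, 20])
print(hidden)   # ~[0.999, 0.0067]
print(output)   # ~[0.750, 0.957]  -> O2 wins, so predict O2's category
```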

Backpropagation Learning Algorithm

- Same task as with perceptrons
  - Learn a multi-layer ANN to correctly categorise unseen examples
  - We'll concentrate on ANNs with one hidden layer
- Overview of the routine (a skeleton sketch follows below)
  - Fix the architecture and the sigmoid units within it
    - i.e., the number of units in the hidden layer; the way the input units represent an example; the way the output units categorise examples
  - Randomly assign weights to the whole network
    - Use small values (between -0.5 and 0.5)
  - Use each example in the training set to retrain the weights
  - Run multiple epochs (iterations through the training set)
    - Until some termination condition is met (not necessarily 100% accuracy)
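A skeleton of this routine in Python (an illustrative sketch, not the slides' own code; the update and accuracy functions are hypothetical helpers supplied by the caller):

```python
import random

def init_weights(n_weights):
    """Small random starting weights, between -0.5 and 0.5 as the slides suggest."""
    return [random.uniform(-0.5, 0.5) for _ in range(n_weights)]

def train(network, training_set, update_fn, accuracy_fn,
          max_epochs=1000, target_accuracy=0.95):
    """Run multiple epochs; stop once a termination condition is met.

    update_fn(network, example, target) performs one backprop weight update;
    accuracy_fn(network, training_set) reports the current training accuracy.
    Both are assumed helpers in this sketch.
    """
    for epoch in range(max_epochs):
        for example, target in training_set:
            update_fn(network, example, target)
        if accuracy_fn(network, training_set) >= target_accuracy:
            break  # termination condition met (not necessarily 100% accuracy)
    return network
```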


Weight Training Calculations (Overview)

- Use the notation w_ij to specify the weight between unit i and unit j
- Look at the calculation with respect to a single example E
- Going to calculate a value Δ_ij for each w_ij
  - And add Δ_ij on to w_ij
- Do this by calculating error terms for each unit
  - The error terms for the output units are found first
  - Then this information is used to calculate the error terms for the hidden units
- So, the error is propagated back through the ANN

Propagate E through the Network

- Feed E through the network (as in the example above)
  - i.e., determine the weighted sums from the hidden units, do the sigmoid calculation
- Record the target and observed values for example E
  - Let t_i(E) be the target value for output unit i
  - Let o_i(E) be the observed value for output unit i
- Note that for categorisation learning tasks:
  - Each t_i(E) will be 0, except for a single t_j(E), which will be 1
  - But o_i(E) will be a real-valued number between 0 and 1
- Also record the outputs from the hidden units
  - Let h_i(E) be the output from hidden unit i


Error terms for each unit

- The error term for output unit k is calculated as:

  δ_Ok = o_k(E) (1 - o_k(E)) (t_k(E) - o_k(E))

- The error term for hidden unit k is:

  δ_Hk = h_k(E) (1 - h_k(E)) * Σ_i (w_ki * δ_Oi)

  [where w_ki is the weight between hidden unit k and output unit i, and the sum is over the output units]

- In English:
  - For hidden unit k, add together all the error terms for the output units, each multiplied by the appropriate weight
  - Then multiply this sum by h_k(E)(1 - h_k(E))

Final Calculations

- Choose a learning rate, η (= 0.1 again, perhaps)
- For each weight w_ij between input unit i and hidden unit j, calculate:

  Δ_ij = η * δ_Hj * x_i

  where x_i is the input to the system at input unit i for E
- For each weight w_ij between hidden unit i and output unit j, calculate:

  Δ_ij = η * δ_Oj * h_i(E)

  where h_i(E) is the output from hidden unit i for E
- Finally, add each Δ_ij on to w_ij

(A code sketch of these error terms and weight updates follows below.)
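Putting the error terms and weight changes together, a minimal Python sketch (illustrative, not the slides' code) for a one-hidden-layer network:

```python
def output_error(target, observed):
    """delta_Ok = o(1 - o)(t - o)"""
    return observed * (1.0 - observed) * (target - observed)

def hidden_error(h_out, weights_to_outputs, output_deltas):
    """delta_Hk = h(1 - h) * sum_i(w_ki * delta_Oi)"""
    back = sum(w * d for w, d in zip(weights_to_outputs, output_deltas))
    return h_out * (1.0 - h_out) * back

def weight_change(eta, delta, unit_input):
    """Delta_ij = eta * delta_j * (the input arriving along weight w_ij)."""
    return eta * delta * unit_input
```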


Worked Backpropagation Example

- Start with the previous ANN
- We will retrain the weights
  - In the light of the example E = (10, 30, 20)
  - Stipulate that E should have been categorised as O1
  - Will use a learning rate of η = 0.1

Previous Calculations

- Need the calculations from when we propagated E through the ANN:
  - t1(E) = 1 and t2(E) = 0 [from the categorisation]
  - o1(E) = 0.750 and o2(E) = 0.957


Error Values for Output Units

- t1(E) = 1 and t2(E) = 0 [from the categorisation]
- o1(E) = 0.750 and o2(E) = 0.957
- So:
  - δO1 = o1(E)(1 - o1(E))(t1(E) - o1(E)) = 0.750 * 0.250 * 0.250 = 0.0469
  - δO2 = o2(E)(1 - o2(E))(t2(E) - o2(E)) = 0.957 * 0.043 * (-0.957) = -0.0394

Error Values for Hidden Units

- δO1 = 0.0469 and δO2 = -0.0394
- h1(E) = 0.999 and h2(E) = 0.0067
- So, for H1, we add together:
  - (w11*δO1) + (w12*δO2) = (1.1*0.0469) + (3.1*(-0.0394)) = -0.0706
  - And multiply by h1(E)(1 - h1(E)) to give us:
    -0.0706 * (0.999 * (1 - 0.999)) = -0.0000705 = δH1
- For H2, we add together:
  - (w21*δO1) + (w22*δO2) = (0.1*0.0469) + (1.17*(-0.0394)) = -0.0414
  - And multiply by h2(E)(1 - h2(E)) to give us:
    -0.0414 * (0.0067 * (1 - 0.0067)) = -0.000276 = δH2


Calculation of Weight Changes

- For the weights between the input and hidden layer:

  Δ_ij = η * δ_Hj * x_i   (with η = 0.1 and inputs x = (10, 30, 20))

- For the weights between the hidden and output layer:

  Δ_ij = η * δ_Oj * h_i(E)   (with h1(E) = 0.999 and h2(E) = 0.0067)

  (a code sketch below computes these weight changes for the worked example)

- The weight changes are not very large
  - Small differences in weights can make big differences in the calculations
  - But it might be a good idea to increase η
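A Python sketch (illustrative, not from the slides) that carries out the full update for E = (10, 30, 20), target category O1 and η = 0.1, using the error-term and Δ formulas above. It works to full precision, so the printed error terms differ slightly from the hand-rounded values on the slides:

```python
import math

eta = 0.1
x = [10, 30, 20]                      # example E
t = [1, 0]                            # target: category O1
w_ih = [[0.2, -0.1, 0.4],             # weights into H1
        [0.7, -1.2, 1.2]]             # weights into H2
w_ho = [[1.1, 0.1],                   # weights into O1 (from H1, H2)
        [3.1, 1.17]]                  # weights into O2 (from H1, H2)

sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

# Forward pass
h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
o = [sigmoid(sum(w * hi for w, hi in zip(ws, h))) for ws in w_ho]

# Error terms
delta_o = [o_k * (1 - o_k) * (t_k - o_k) for o_k, t_k in zip(o, t)]
delta_h = [h_k * (1 - h_k) * sum(w_ho[i][k] * delta_o[i] for i in range(len(o)))
           for k, h_k in enumerate(h)]

# Weight changes, added on to the weights
for j in range(len(h)):                     # input -> hidden
    for i in range(len(x)):
        w_ih[j][i] += eta * delta_h[j] * x[i]
for j in range(len(o)):                     # hidden -> output
    for i in range(len(h)):
        w_ho[j][i] += eta * delta_o[j] * h[i]

print(delta_o)   # ~[0.047, -0.039]   (hand-rounded above as 0.0469 and -0.0394)
print(delta_h)   # ~[-6.4e-05, -2.7e-04]
```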


Neural Network Example

[Figure: a 2-2-1 network with inputs A = 0.35 and B = 0.9; weights into the top hidden
 neuron are 0.1 (from A) and 0.8 (from B), into the bottom hidden neuron 0.4 (from A)
 and 0.6 (from B), and into the output neuron 0.3 (from the top hidden neuron) and
 0.9 (from the bottom hidden neuron)]

Neural Network Example (Contd.)

- Assume that the neurons have a sigmoid activation function, and:
  (i) Perform a forward pass on the network.
  (ii) Perform a reverse pass (training) once (target = 0.5).
  (iii) Perform a further forward pass and comment on the result.


Neural Network Example (Contd.)

(i) Forward pass:
- Input to top neuron = (0.35 x 0.1) + (0.9 x 0.8) = 0.755. Out = 0.68 (1/(1 + e^-0.755) = 0.68).
- Input to bottom neuron = (0.35 x 0.4) + (0.9 x 0.6) = 0.68. Out = 0.6637.
- Input to final neuron = (0.3 x 0.68) + (0.9 x 0.6637) = 0.80133. Out = 0.69.

Neural Network Example (Contd.)

(ii) Reverse pass (target t = 0.5):
- Output error: δ = (t - o)(1 - o)o = (0.5 - 0.69)(1 - 0.69)(0.69) = -0.0406.
- New weights for the output layer:
  - w1+ = w1 + (δ x input) = 0.3 + (-0.0406 x 0.68) = 0.272392
  - w2+ = w2 + (δ x input) = 0.9 + (-0.0406 x 0.6637) = 0.87305
- Errors for the hidden layer (each multiplied by that neuron's (1 - o)o):
  - δ1 = δ x w1+ x (1 - 0.68)(0.68) = -0.0406 x 0.272392 x 0.2176 = -2.406 x 10^-3
  - δ2 = δ x w2+ x (1 - 0.6637)(0.6637) = -0.0406 x 0.87305 x 0.2232 = -7.916 x 10^-3
- New hidden layer weights:
  - w3+ = 0.1 + (-2.406 x 10^-3 x 0.35) = 0.09916
  - w4+ = 0.8 + (-2.406 x 10^-3 x 0.9) = 0.7978
  - w5+ = 0.4 + (-7.916 x 10^-3 x 0.35) = 0.3972
  - w6+ = 0.6 + (-7.916 x 10^-3 x 0.9) = 0.5928


Neural Network Example (iii)

- Now the network after the first pass is given below:

  [Figure: the same 2-2-1 network with the updated weights: 0.099 and 0.798 into the
   top hidden neuron, 0.397 and 0.593 into the bottom hidden neuron, and 0.272 and
   0.873 into the output neuron]

Neural Network Example (Contd.)

(iii) Second forward pass:
- Input to top neuron = (0.35 x 0.099) + (0.9 x 0.798) = 0.7528. Out = 0.6797 (1/(1 + e^-0.7528) = 0.6797).
- Input to bottom neuron = (0.35 x 0.397) + (0.9 x 0.593) = 0.67265. Out = 0.662.
- Input to final neuron = (0.272 x 0.6797) + (0.873 x 0.662) = 0.7628. Out = 0.682.
- Error = 0.5 - 0.682 = -0.182
- Thus the error has been reduced from -0.19 to about -0.18

(A code sketch reproducing this example end to end follows below.)
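A Python sketch (illustrative, not from the slides) of this worked example: one forward pass, one reverse pass towards target 0.5, and a second forward pass. As in the slides, no separate learning rate is used (effectively η = 1) and the hidden-layer errors are computed with the already-updated output weights:

```python
import math

sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

A, B, target = 0.35, 0.9, 0.5
w_top = [0.1, 0.8]       # weights from A, B into the top hidden neuron
w_bot = [0.4, 0.6]       # weights from A, B into the bottom hidden neuron
w_out = [0.3, 0.9]       # weights from the top, bottom hidden neurons into the output

def forward():
    top = sigmoid(w_top[0] * A + w_top[1] * B)
    bot = sigmoid(w_bot[0] * A + w_bot[1] * B)
    out = sigmoid(w_out[0] * top + w_out[1] * bot)
    return top, bot, out

# (i) First forward pass
top, bot, out = forward()
print(out, target - out)                  # ~0.69, error ~ -0.19

# (ii) Reverse pass
delta = (target - out) * (1 - out) * out  # output error term
w_out[0] += delta * top                   # update output-layer weights first
w_out[1] += delta * bot
d_top = delta * w_out[0] * (1 - top) * top   # hidden errors (using the updated
d_bot = delta * w_out[1] * (1 - bot) * bot   #  output weights, as the slides do)
w_top = [w_top[0] + d_top * A, w_top[1] + d_top * B]
w_bot = [w_bot[0] + d_bot * A, w_bot[1] + d_bot * B]

# (iii) Second forward pass: the error shrinks from about -0.19 to about -0.18
top, bot, out = forward()
print(out, target - out)
```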


Calculation of Network Error

- Could calculate the network error as:
  - The proportion of mis-categorised examples
- But there are multiple output units, with numerical output
- So we use a more sophisticated measure, the sum of squared errors:

  Error = Σ over examples E, Σ over output units k, of (t_k(E) - o_k(E))^2

- Not as complicated as it looks:
  - Square the difference between target and observed
    - Squaring ensures we get a positive number
  - Add up all the squared differences
    - For every output unit and every example in the training set
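A direct Python translation of that measure (a sketch; the nested lists holding t_k(E) and o_k(E) per example are an assumed data layout):

```python
def network_error(targets, outputs):
    """Sum of squared differences over every example and every output unit."""
    return sum((t_k - o_k) ** 2
               for t_example, o_example in zip(targets, outputs)
               for t_k, o_k in zip(t_example, o_example))

# e.g. for the single example above, with targets (1, 0) and outputs (0.750, 0.957):
print(network_error([[1, 0]], [[0.750, 0.957]]))   # (0.25)^2 + (0.957)^2 ~ 0.978
```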

Problems with Local Minima

- Backpropagation follows the error gradient downhill, therefore it:
  - Can find its way into local minima of the error surface
- One partial solution:
  - Random re-start: learn lots of networks
    - Starting with different random weight settings
  - Can take the best network
  - Or can set up a "committee" of networks to categorise examples
- Another partial solution: momentum


Adding Momentum

- Imagine rolling a ball down a hill

  [Figure: an error surface with a small dip before a deeper valley; without momentum
   the ball gets stuck in the dip, with momentum it rolls through]

Momentum in Backpropagation

- For each weight:
  - Remember what was added to it in the previous epoch
- In the current epoch:
  - Add on a small amount of the previous Δ
- The amount is determined by:
  - The momentum parameter, denoted α
  - α is taken to be between 0 and 1

(A sketch of the momentum-augmented update follows below.)
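A minimal sketch of the momentum-augmented weight update in Python, assuming the standard formulation (plain backprop step plus a fraction α of the previous change); δ and the incoming input are as in the earlier update rules:

```python
def momentum_update(weight, eta, delta, unit_input, previous_change, alpha=0.5):
    """Return (new weight, change applied), where the change is the plain
    backprop step plus alpha times the change applied in the previous epoch."""
    change = eta * delta * unit_input + alpha * previous_change
    return weight + change, change   # remember `change` for the next epoch

# Repeated steps in the same direction compound:
w, prev = 0.3, 0.0
for _ in range(3):
    w, prev = momentum_update(w, eta=0.1, delta=0.05, unit_input=1.0, previous_change=prev)
    print(w, prev)   # applied change grows: 0.005, 0.0075, 0.00875
```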


How Momentum Works

- If the direction of the weight change doesn't change:
  - Then the movement of the search gets bigger
  - The amount of additional extra is compounded in each epoch
  - May mean that narrow local minima are avoided
  - May also mean that the convergence rate speeds up
- Caution:
  - May not add enough momentum to get out of some local minima
  - Also, too much momentum might carry the search back out of the global minimum, and into a local minimum

Problems with Overfitting

- Plot training example error versus test example error:

  [Figure: training error and test error plotted against training epochs]

- Test set error is increasing!
  - The network is overfitting the data
  - Learning idiosyncrasies in the data, not general principles
  - A big problem in Machine Learning (ANNs in particular)


Avoiding Overfitting

- Bad idea to use training set accuracy to terminate
- One alternative: use a validation set
  - Hold back some of the training set during training
  - Like a miniature test set (not used to train the weights at all)
  - If the validation set error stops decreasing, but the training set error continues decreasing:
    - Then it's likely that overfitting has started to occur, so stop
  - Be careful, because the validation set error could get into a local minimum itself
    - Worthwhile running the training for longer, and waiting to see

(A sketch of this early-stopping check follows below.)
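One way to express this check in Python (a sketch, assuming the validation-set error is recorded once per epoch; the `patience` idea implements the "run the training for longer and wait and see" advice):

```python
def should_stop(validation_errors, patience=5):
    """Stop once the validation error has not improved for `patience` epochs.

    validation_errors: validation-set error measured after each epoch so far.
    Waiting for several epochs guards against reacting to a brief blip.
    """
    if len(validation_errors) <= patience:
        return False
    best = min(validation_errors[:-patience])
    recent_best = min(validation_errors[-patience:])
    return recent_best >= best   # no recent improvement -> likely overfitting

# Example: error falls, then starts rising again
errors = [0.9, 0.7, 0.55, 0.5, 0.49, 0.5, 0.52, 0.55, 0.58, 0.6]
print(should_stop(errors))   # True
```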

Suitable Problems for ANNs

- Examples and the target categorisation can be expressed as real values
  - ANNs are just fancy numerical functions
- Predictive accuracy is more important than understanding what the machine has learned
  - Black box, non-symbolic approach, not easy to digest
- Slow training times are OK
  - It can take hours or days to train networks
- Execution of the learned function must be quick
  - Learned networks can categorise very quickly
  - Very useful in time-critical situations (is that a tank, a car or an old lady?)
- Training examples may contain errors
  - ANNs are fairly robust to noise in the data

