The delta rule

Learn from your mistakes

If it ain’t broke, don’t fix it.

Outline

Supervised learning problem

Delta rule

Delta rule as gradient descent

Hebb rule

Supervised learning

Given examples: pairs of an input vector x and a desired output y.

Example: handwritten digits

Find a perceptron that detects “two”s.

Delta rule

Learning from mistakes.

“delta”: difference between desired and actual output.

Also called the “perceptron learning rule”.
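A minimal sketch of one delta-rule step, assuming a threshold unit with outputs in {0, 1}. The learning rate `eta`, the helper names, and the convention that w·x = 0 counts as output 1 are illustrative assumptions, not taken from the slides.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; treating w.x == 0 as "on" is an assumed convention.
    return 1 if dot(w, x) >= 0 else 0

def delta_rule_step(w, x, y, eta=1.0):
    """Return the weights after one step: w + eta * (desired - actual) * x."""
    delta = y - output(w, x)  # +1 (false negative), 0 (correct), -1 (false positive)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

w0 = [0.0, 0.0, 0.0]
x = [1.0, -2.0, 1.0]                # last component 1.0 acts as a bias input
w1 = delta_rule_step(w0, x, y=0)    # actual output is 1, desired is 0: a mistake
w2 = delta_rule_step(w1, x, y=0)    # now the output is 0: nothing to fix
```

The first step corrects a false positive by subtracting x. The second step leaves the weights unchanged because the output already matches the desired value.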

Two types of mistakes

False positive

False negative

The update is always proportional to x.
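The two cases can be written out explicitly. This is a sketch under assumed notation: outputs in {0, 1}, actual output \hat{y} = H(w \cdot x) with H the threshold function, and learning rate \eta.

```latex
\Delta w = \eta \, (y - \hat{y}) \, x =
\begin{cases}
-\eta \, x & \text{false positive } (y = 0,\ \hat{y} = 1) \\
+\eta \, x & \text{false negative } (y = 1,\ \hat{y} = 0) \\
0          & \text{no mistake } (y = \hat{y})
\end{cases}
```

In both kinds of mistake the weight change is a multiple of x; only the sign differs.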

Objective function

Gradient update

Stochastic gradient descent on the objective function E.

E=0 means no mistakes.
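One objective consistent with these statements, assumed here for illustration, is the perceptron criterion e = -(y - H(w·x)) (w·x) summed over examples. The sketch below checks numerically that its negative gradient matches the delta-rule step; the function names and the finite-difference step `h` are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; w.x == 0 counts as output 1 (assumed convention).
    return 1 if dot(w, x) >= 0 else 0

def example_error(w, x, y):
    """Assumed per-example objective: zero when correct, |w.x| on a mistake."""
    return -(y - output(w, x)) * dot(w, x)

def total_error(w, examples):
    """E is the sum of the per-example errors."""
    return sum(example_error(w, x, y) for x, y in examples)

def numerical_gradient(w, x, y, h=1e-6):
    """Central finite-difference gradient of example_error with respect to w."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        diff = example_error(w_plus, x, y) - example_error(w_minus, x, y)
        grad.append(diff / (2 * h))
    return grad

w = [0.5, -1.0]
x = [2.0, 3.0]      # w.x = -2, so the actual output is 0
y = 1               # desired output is 1: a false negative
grad = numerical_gradient(w, x, y)
delta_step = [(y - output(w, x)) * xi for xi in x]   # delta rule with eta = 1
```

Here `delta_step` equals the negative of `grad`, so each delta-rule update is a gradient step on the single-example error. Processing one example at a time is what makes the descent stochastic.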

Perceptron convergence theorem

Cycle through a set of examples.

Suppose a solution with zero error exists.

The perceptron learning rule finds a solution in finite time.
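The procedure can be sketched on a small separable problem. The logical AND data set, the bias input of 1.0, and the `max_epochs` cap are assumptions made for this illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; w.x == 0 counts as output 1 (assumed convention).
    return 1 if dot(w, x) >= 0 else 0

def train_perceptron(examples, eta=1.0, max_epochs=100):
    """Cycle through the examples until one full pass makes no mistakes.

    Returns (weights, epochs_used); epochs_used is None if the cap is hit.
    """
    w = [0.0] * len(examples[0][0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in examples:
            delta = y - output(w, x)
            if delta != 0:
                mistakes += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, epoch
    return w, None

# Logical AND; the third input is a constant 1.0 that plays the role of a bias.
and_examples = [
    ([0.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 1.0], 0),
    ([1.0, 0.0, 1.0], 0),
    ([1.0, 1.0, 1.0], 1),
]
w, epochs = train_perceptron(and_examples)
```

Because AND is linearly separable, the loop stops after a finite number of passes with every example classified correctly.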

If examples are nonseparable

The delta rule does not converge.

The objective function is not equal to the number of mistakes.

So there is no reason to believe that the delta rule minimizes the number of mistakes.
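A small illustration of the nonseparable case, using XOR as an assumed example (no threshold unit can classify it). The sketch counts mistakes per pass; the epoch count of 50 is arbitrary.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; w.x == 0 counts as output 1 (assumed convention).
    return 1 if dot(w, x) >= 0 else 0

def mistakes_per_epoch(examples, eta=1.0, n_epochs=50):
    """Run the delta rule for n_epochs passes and record mistakes in each pass."""
    w = [0.0] * len(examples[0][0])
    history = []
    for _ in range(n_epochs):
        mistakes = 0
        for x, y in examples:
            delta = y - output(w, x)
            if delta != 0:
                mistakes += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        history.append(mistakes)
    return history

# XOR with a constant bias input; not linearly separable.
xor_examples = [
    ([0.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 1.0], 1),
    ([1.0, 0.0, 1.0], 1),
    ([1.0, 1.0, 1.0], 0),
]
history = mistakes_per_epoch(xor_examples)
```

Every pass contains at least one mistake, so the weights never settle. A pass without mistakes would require a weight vector that classifies all four XOR points, which does not exist.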

Memorization & generalization

Prescription: minimize error on the training set of examples

What is the error on a test set of examples?

Vapnik-Chervonenkis theory
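The distinction between training and test error can be sketched as follows. The data are hypothetical: points in the unit square labeled by whether x1 + x2 > 1, with points near the boundary discarded so that the training set is separable. The seed, set sizes, and function names are assumptions.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; w.x == 0 counts as output 1 (assumed convention).
    return 1 if dot(w, x) >= 0 else 0

def make_examples(n, rng):
    """Hypothetical data: label 1 when x1 + x2 > 1; near-boundary points skipped."""
    examples = []
    while len(examples) < n:
        x1, x2 = rng.random(), rng.random()
        if abs(x1 + x2 - 1.0) < 0.1:
            continue  # keep a margin so the data are linearly separable
        examples.append(([x1, x2, 1.0], 1 if x1 + x2 > 1.0 else 0))
    return examples

def train(examples, eta=1.0, max_epochs=2000):
    """Delta rule, cycling until a pass with no mistakes or the epoch cap."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            delta = y - output(w, x)
            if delta != 0:
                mistakes += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break
    return w

def error_rate(w, examples):
    """Fraction of examples that are misclassified."""
    return sum(output(w, x) != y for x, y in examples) / len(examples)

rng = random.Random(0)
train_set = make_examples(40, rng)
test_set = make_examples(200, rng)
w = train(train_set)
train_error = error_rate(w, train_set)   # memorization: error on seen examples
test_error = error_rate(w, test_set)     # generalization: error on unseen examples
```

The training error reaches zero because the training set is separable. The test error is measured on examples the rule never saw, and nothing in the training procedure guarantees that it is also zero.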

Contrast with Hebb rule

Assume that the teacher can drive the perceptron to produce the desired output.

What are the objective functions?
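A sketch of the contrast, assuming the teacher clamps the output to the desired value y so that the Hebbian update is eta * y * x. The function names and the example numbers are illustrative.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def output(w, x):
    # Threshold unit; w.x == 0 counts as output 1 (assumed convention).
    return 1 if dot(w, x) >= 0 else 0

def hebb_step(w, x, y, eta=1.0):
    """Hebb rule with the output forced to y: the update ignores the current w."""
    return [wi + eta * y * xi for wi, xi in zip(w, x)]

def delta_step(w, x, y, eta=1.0):
    """Delta rule: the update is nonzero only when the output is wrong."""
    delta = y - output(w, x)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

w = [1.0, 1.0]
x = [1.0, 2.0]
y = 1                            # the perceptron already outputs 1 for this x

w_hebb = hebb_step(w, x, y)      # keeps strengthening the weights anyway
w_delta = delta_step(w, x, y)    # leaves the weights alone
```

On an example that is already classified correctly, the Hebb rule keeps changing the weights while the delta rule does nothing.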

Is the delta rule biological?

Actual output: anti-Hebbian

Desired output: Hebbian

Contrastive
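The three labels can be read off by splitting the delta rule into two terms. This is a sketch under assumed notation: H is the threshold function and \eta the learning rate.

```latex
\Delta w = \eta \, \bigl( y - H(w \cdot x) \bigr) \, x
         = \underbrace{\eta \, y \, x}_{\text{Hebbian, desired output}}
         \; - \;
           \underbrace{\eta \, H(w \cdot x) \, x}_{\text{anti-Hebbian, actual output}}
```

The rule is contrastive in the sense that it is the difference of two Hebbian terms, one computed with the desired output and one with the actual output.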

Objective function

Hebb rule

Delta rule
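Plausible forms for the two objectives, chosen as assumptions so that gradient descent on each reproduces the corresponding rule (\mu indexes examples, H is the threshold function):

```latex
E_{\text{Hebb}}(w)  = - \sum_{\mu} y^{\mu} \, (w \cdot x^{\mu})
\qquad
E_{\text{delta}}(w) = - \sum_{\mu} \bigl( y^{\mu} - H(w \cdot x^{\mu}) \bigr) \, (w \cdot x^{\mu})
```

Under these forms the Hebbian objective is unbounded below, so the weights can grow without limit, whereas the delta-rule objective is bounded below by zero and reaches zero exactly when there are no mistakes.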

Supervised vs. unsupervised

Classification vs. generation

“I shall not today attempt further to define the kinds of material [pornography] … but I know it when I see it.” (Justice Potter Stewart, Jacobellis v. Ohio, 1964)

Smooth activation function

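Same as the delta rule for the threshold unit, except for an extra factor: the slope of f.

For a saturating f, the update is small when the argument of f has large magnitude, because f is nearly flat there.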
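A sketch of the smooth version, assuming f is the logistic function and the update is eta * (y - f(u)) * f'(u) * x with u = w·x. The choice of f, the learning rate, and the example inputs are assumptions.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def f(u):
    """Logistic activation (an assumed choice of smooth f)."""
    return 1.0 / (1.0 + math.exp(-u))

def f_prime(u):
    """Slope of the logistic function: f(u) * (1 - f(u))."""
    s = f(u)
    return s * (1.0 - s)

def smooth_delta_step(w, x, y, eta=0.5):
    """Delta rule with a smooth activation: the error is scaled by the slope of f."""
    u = dot(w, x)
    scale = eta * (y - f(u)) * f_prime(u)
    return [wi + scale * xi for wi, xi in zip(w, x)]

x = [1.0]
y = 0

# Argument of f is 0: the error is 0.5 and the slope is at its maximum, 0.25.
step_small_arg = smooth_delta_step([0.0], x, y)[0] - 0.0

# Argument of f is 10: the error is close to 1, but the slope is nearly zero.
step_large_arg = smooth_delta_step([10.0], x, y)[0] - 10.0
```

Although the error is larger when the argument is 10, the weight change is far smaller, because the slope of f is close to zero in the saturated region.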

Objective function

Gradient update

Stochastic gradient descent on the objective function E.

E=0 means zero error.

Smooth activation functions are important for generalizing the delta rule to multilayer perceptrons.
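For a smooth f, one objective that yields such an update is the squared error. The form below is an assumption consistent with “E=0 means zero error”; \mu indexes examples and e is the single-example term of E.

```latex
E(w) = \frac{1}{2} \sum_{\mu} \bigl( y^{\mu} - f(w \cdot x^{\mu}) \bigr)^{2}
\qquad
\Delta w = -\eta \, \frac{\partial e}{\partial w}
         = \eta \, \bigl( y - f(w \cdot x) \bigr) \, f'(w \cdot x) \, x
```

The factor f'(w \cdot x) comes from the chain rule. It exists only because f is differentiable, which is what allows the same gradient computation to be carried through several layers.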