The delta rule
Learn from your mistakes
If it ain’t broke, don’t fix it.
Outline
Supervised learning problem
Delta rule
Delta rule as gradient descent
Hebb rule
Supervised learning
Given examples
Example: handwritten digits
Find a perceptron that detects “two”s.
Delta rule
Learning from mistakes.
“delta”: difference between desired and actual output.
Also called “perceptron learning rule”
Two types of mistakes
False positive: make w less like x.
False negative: make w more like x.
The update is always proportional to x.
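The two cases above can be written as a single update. The following is a minimal sketch, assuming a threshold unit with outputs in {0, 1}, a threshold at zero, and no separate bias term; the names `perceptron_output`, `delta_rule_update`, and the learning rate `eta` are illustrative and do not come from the slides.

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit (assumed convention): 1 if w.x > 0, else 0."""
    return 1 if np.dot(w, x) > 0 else 0

def delta_rule_update(w, x, y, eta=1.0):
    """Return the weights after seeing one example (x, y), with y in {0, 1}."""
    # delta is +1 for a false negative, -1 for a false positive, 0 if correct
    delta = y - perceptron_output(w, x)
    return w + eta * delta * x

x = np.array([1.0, 2.0])

# False negative: w.x = -1, output 0, desired 1 -> w moves toward x
w_fn = delta_rule_update(np.array([1.0, -1.0]), x, y=1)

# False positive: w.x = 3, output 1, desired 0 -> w moves away from x
w_fp = delta_rule_update(np.array([1.0, 1.0]), x, y=0)

# Correct answer: no change ("if it ain't broke, don't fix it")
w_ok = delta_rule_update(np.array([1.0, -1.0]), x, y=0)
```

In every case the change in w is a multiple of x, and it is zero when the output is already correct.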
Objective function
Gradient update
Stochastic gradient descent on E
E = 0 means no mistakes.
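The formulas for this slide did not survive in the text. One standard choice that is consistent with the bullets is the perceptron criterion, sketched below under assumptions: H is the Heaviside step function (output 1 when its argument is positive, else 0), y is the desired output in {0, 1}, and η is a learning rate. The gradient treats H as piecewise constant.

```latex
E(\mathbf{w}) = -\bigl(y - H(\mathbf{w}\cdot\mathbf{x})\bigr)\,(\mathbf{w}\cdot\mathbf{x}),
\qquad
\Delta\mathbf{w} = -\eta\,\frac{\partial E}{\partial \mathbf{w}}
                 = \eta\,\bigl(y - H(\mathbf{w}\cdot\mathbf{x})\bigr)\,\mathbf{x}
```

With this form, E is zero on a correctly classified example and positive on a mistake (apart from the boundary case w·x = 0), and the gradient step is the delta rule update.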
Perceptron convergence theorem
Cycle through a set of examples.
Suppose a solution with zero error exists.
The perceptron learning rule finds a solution in finite time.
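As an illustration of cycling through a separable set until no mistakes remain, here is a minimal sketch. It assumes 0/1 outputs, a threshold at zero, and a constant input of 1 standing in for a bias; the function `train_perceptron`, the `max_epochs` cap, and the AND example are illustrative choices, not part of the slides.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Cycle through the examples until a full pass makes no mistakes."""
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            out = 1 if np.dot(w, x_i) > 0 else 0
            if out != y_i:
                w += eta * (y_i - out) * x_i  # delta rule update
                mistakes += 1
        if mistakes == 0:
            return w, epoch + 1, True   # a solution with zero error was found
    return w, max_epochs, False         # gave up before finding one

# Logical AND is linearly separable; the last column is a constant bias input.
X_and = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w, epochs, converged = train_perceptron(X_and, y_and)
predictions = [1 if np.dot(w, x_i) > 0 else 0 for x_i in X_and]
```

The cap on epochs is only a safeguard for the sketch; the theorem itself says a finite number of updates suffices when a solution exists.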
If examples are nonseparable
The delta rule does not converge.
Objective function is not equal to the number of mistakes.
No reason to believe that the delta rule minimizes the number of mistakes.
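A small sketch of the nonseparable case, using XOR as the example (an illustrative choice, not from the slides). It assumes the same 0/1 threshold unit with a constant bias input, and simply counts mistakes per pass; `mistakes_per_epoch` is a hypothetical helper name.

```python
import numpy as np

def mistakes_per_epoch(X, y, eta=1.0, n_epochs=50):
    """Run the delta rule for a fixed number of passes and log the mistakes."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            out = 1 if np.dot(w, x_i) > 0 else 0
            if out != y_i:
                w += eta * (y_i - out) * x_i
                mistakes += 1
        history.append(mistakes)
    return w, history

# XOR is not linearly separable; the last column is a constant bias input.
X_xor = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

w, history = mistakes_per_epoch(X_xor, y_xor)
never_clean = min(history) >= 1  # no pass is ever mistake-free
```

Because no weight vector classifies all four points correctly, every pass contains at least one update and the weights keep changing.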
Memorization & generalization
Prescription: minimize error on the training set of examples
What is the error on a test set of examples?
Vapnik-Chervonenkis theory
assumption: examples are drawn from a probability distribution
conditions for generalization
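To make the training-set versus test-set question concrete, here is a minimal sketch under strong assumptions: examples are drawn from a Gaussian distribution, labels come from a hypothetical `teacher` weight vector, and the split sizes and number of passes are arbitrary. None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data-generating process: a hypothetical teacher labels random inputs.
teacher = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = (X @ teacher > 0).astype(int)

# Train on a small sample, test on examples from the same distribution.
X_train, y_train = X[:50], y[:50]
X_test, y_test = X[50:], y[50:]

def error_rate(w, X, y):
    """Fraction of examples the threshold unit gets wrong."""
    return float(np.mean((X @ w > 0).astype(int) != y))

w = np.zeros(3)
for _ in range(200):  # passes over the training set
    for x_i, y_i in zip(X_train, y_train):
        out = 1 if x_i @ w > 0 else 0
        w += (y_i - out) * x_i  # delta rule update

train_err = error_rate(w, X_train, y_train)
test_err = error_rate(w, X_test, y_test)
```

Comparing `train_err` with `test_err` shows how well the memorized training set carries over to unseen examples from the same distribution.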
Contrast with Hebb rule
Assume that the teacher can drive the perceptron to produce the desired output.
What are the objective functions?
Is the delta rule biological?
Actual output: anti-Hebbian
Desired output: Hebbian
Contrastive
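The contrastive reading can be checked numerically: the delta rule update equals a Hebbian term computed with the desired output plus an anti-Hebbian term computed with the actual output. This is a sketch assuming a 0/1 threshold unit and a Hebbian term of the form η · output · x; the function names are illustrative.

```python
import numpy as np

def hebbian_term(x, output, eta=1.0):
    """Hebbian change: proportional to the product of input and output."""
    return eta * output * x

def delta_rule_as_contrast(w, x, y, eta=1.0):
    """Delta rule written as Hebbian (desired) plus anti-Hebbian (actual)."""
    actual = 1 if np.dot(w, x) > 0 else 0
    hebb_desired = hebbian_term(x, y, eta)             # Hebbian on desired output
    anti_hebb_actual = -hebbian_term(x, actual, eta)   # anti-Hebbian on actual output
    return w + hebb_desired + anti_hebb_actual

w = np.array([0.5, 0.5])
x = np.array([1.0, -3.0])
y = 1

contrast = delta_rule_as_contrast(w, x, y)

# Direct form of the delta rule for comparison
actual = 1 if np.dot(w, x) > 0 else 0
direct = w + 1.0 * (y - actual) * x
```

When the actual output already equals the desired output, the two terms cancel and the weights do not change.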
Objective function
Hebb rule: distance from inputs
Delta rule: error in reproducing the output
Supervised vs. unsupervised
Classification vs. generation
I shall not today attempt further to define the kinds of material [pornography] … but I know it when I see it.
Justice Potter Stewart
Smooth activation function
Same except for slope of f
Update is small when the argument of f has large magnitude.
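A minimal sketch of the smooth version, assuming f is the logistic sigmoid (the slides do not name a specific f) and a learning rate `eta`. The only change from the threshold case is the extra factor f'(w·x), which shrinks the update when the argument is far from zero.

```python
import numpy as np

def sigmoid(u):
    """Logistic function, used here as an assumed smooth activation f."""
    return 1.0 / (1.0 + np.exp(-u))

def smooth_delta_update(w, x, y, eta=0.5):
    """Delta rule with the extra slope factor f'(w.x)."""
    u = np.dot(w, x)
    f = sigmoid(u)
    slope = f * (1.0 - f)  # derivative of the sigmoid at u
    return w + eta * (y - f) * slope * x

x = np.array([1.0, 1.0])
y = 1.0

# Argument near zero: the slope is at its largest
w_small = np.array([0.0, 0.0])
step_small = np.linalg.norm(smooth_delta_update(w_small, x, y) - w_small)

# Argument of large magnitude (w.x = -10): the slope is nearly zero,
# so the update is tiny even though the output is badly wrong
w_large = np.array([-5.0, -5.0])
step_large = np.linalg.norm(smooth_delta_update(w_large, x, y) - w_large)
```

Here `step_large` comes out far smaller than `step_small`, which is the saturation effect the slide describes.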
Objective function
Gradient update
Stochastic gradient descent on E
E = 0 means zero error.
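The formulas for this slide are missing from the text. A common form that matches the bullets is the squared error, sketched here as an assumption: f is the smooth activation function, y the desired output, and η a learning rate.

```latex
E(\mathbf{w}) = \tfrac{1}{2}\bigl(y - f(\mathbf{w}\cdot\mathbf{x})\bigr)^{2},
\qquad
\Delta\mathbf{w} = -\eta\,\frac{\partial E}{\partial \mathbf{w}}
                 = \eta\,\bigl(y - f(\mathbf{w}\cdot\mathbf{x})\bigr)\,
                   f'(\mathbf{w}\cdot\mathbf{x})\,\mathbf{x}
```

This is the threshold-unit update with one extra factor, the slope f'(w·x), and E is zero exactly when the output reproduces the desired value.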
Smooth activation functions are important for generalizing the delta rule to multilayer perceptrons.