The Economist
April 22nd 2023
Science & technology
“Language Models are Few-Shot Learners”.
Before it sees any training data, the weights in GPT-3’s neural network are mostly random. As a result, any text it generates will be gibberish. Pushing its output towards something which makes sense, and eventually something that is fluent, requires training.
GPT-3 was trained on several sources of data, but the bulk of it comes from snapshots of the entire internet between 2016 and 2019 taken from a database called Common Crawl. There’s a lot of junk text on the internet, so the initial 45 terabytes were filtered using a different machine-learning model to select just the high-quality text: 570 gigabytes of it, a dataset that could fit on a modern laptop. In addition, GPT-4 was trained on an unknown quantity of images, probably several terabytes. By comparison AlexNet, a neural network that reignited image-processing excitement in the 2010s, was trained on a dataset of 1.2m labelled images, a total of 126 gigabytes—less than a tenth of the size of GPT-4’s likely dataset.
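The filtering idea can be sketched in a few lines. The real pipeline used a trained machine-learning classifier; here a crude, made-up `quality_score` (the share of purely alphabetic words) stands in for it, purely for illustration.

```python
# A minimal sketch of web-text filtering. The real filter was a trained
# classifier; quality_score below is a hypothetical stand-in.

def quality_score(document):
    """Crude proxy for text quality: fraction of alphabetic words."""
    words = document.split()
    if not words:
        return 0.0
    return sum(w.isalpha() for w in words) / len(words)

crawl = [
    "Ice cream is a frozen dessert made from milk and cream",
    "click HERE >>> $$$ w1n fr33 pr1zes $$$ <<<",
]

# Keep only documents that score above a threshold.
kept = [doc for doc in crawl if quality_score(doc) > 0.8]
```

Run at internet scale, a filter like this is what shrank 45 terabytes of crawled text down to 570 gigabytes of higher-quality material.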
To train, the LLM quizzes itself on the text it is given. It takes a chunk, covers up some words at the end, and tries to guess what might go there. Then the LLM uncovers the answer and compares it to its guess. Because the answers are in the data itself, these models can be trained in a “self-supervised” manner on massive datasets without requiring human labellers.
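The quizzing step above can be sketched as code: slide over a list of words, and for each position pair the visible chunk with the covered-up word that follows it. The labels come from the text itself, which is why no human annotator is needed.

```python
# A minimal sketch of self-supervised example creation for
# next-word prediction. Function and variable names are illustrative.

def make_examples(tokens, context_len=4):
    """Pair each chunk of context with the word the model must guess."""
    examples = []
    for i in range(context_len, len(tokens)):
        context = tokens[i - context_len:i]  # the visible chunk
        target = tokens[i]                   # the covered-up word
        examples.append((context, target))
    return examples

text = "the model quizzes itself on the text it is given".split()
for context, target in make_examples(text)[:3]:
    print(context, "->", target)
```

A single sentence already yields several training examples, which is how a fixed corpus can supply billions of guess-and-check rounds.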
The model’s goal is to make its guesses
as good as possible by making as few errors
as possible. Not all errors are equal,
though. If the original text is “I love ice
cream”, guessing “I love ice hockey” is better
than “I love ice are”. A number called the
loss captures how bad each guess is.
After a few guesses, the loss is sent back
into the neural network and used to nudge
the weights in a direction that will produce
better answers.
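A toy calculation makes the loss and the nudge concrete. This is not GPT’s actual training code; the numbers are invented. A standard choice of loss is the negative log of the probability the model assigned to the true next word, so confidently wrong guesses are penalised hardest, and each weight is then nudged against the slope of the loss.

```python
import math

# Toy illustration with made-up numbers: the loss for one guess is the
# negative log of the probability the model gave to the true next word.

def loss(prob_of_truth):
    return -math.log(prob_of_truth)

# A model that rated "cream" likely is penalised far less than one
# that gave it almost no probability:
good_guess = loss(0.5)   # model put 50% on the true word
bad_guess = loss(0.01)   # model put 1% on the true word

# Gradient descent then nudges each weight against the slope of the
# loss; a small learning rate keeps the nudges gentle.
weight = 0.8
gradient = 0.25          # d(loss)/d(weight), from backpropagation
learning_rate = 0.1
weight -= learning_rate * gradient   # weight is now 0.775
```

Repeated over billions of guesses, these tiny nudges are what turn mostly random weights into a fluent model.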