The Economist
April 22nd 2023
Science & technology
vertigo caused by software which has improved suddenly to the point where it can perform tasks that had been exclusively in the domain of human intelligence.

Despite that feeling of magic, an LLM is, in reality, a giant exercise in statistics. Prompt ChatGPT to finish the sentence: "The promise of large language models is that they…" and you will get an immediate response. How does it work?
These written words
First, the language of the query is converted from words, which neural networks cannot handle, into a representative set of numbers (see graphic). GPT-3, which powered an earlier version of ChatGPT, does this by splitting text into chunks of characters, called tokens, which commonly occur together. These tokens can be words, like "love" or "are", affixes, like "dis" or "ised", and punctuation, like "?". GPT-3's dictionary contains details of 50,257 tokens.

GPT-3 is able to process a maximum of 2,048 tokens at a time, which is around the length of a long article in The Economist. GPT-4, by contrast, can handle inputs up to 32,000 tokens long—a novella. The more text the model can take in, the more context it sees, and the better its answers will be. There is a catch—the required computation rises quadratically with the length of the input, meaning slightly longer inputs need much more computing power.
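As a rough sketch of the idea—not GPT-3's actual byte-pair-encoding algorithm, and with a tiny vocabulary invented for illustration—tokenization can be pictured as greedily matching the longest known chunk of characters, left to right:

```python
# Toy illustration of tokenization: greedy longest-match against a
# hand-made vocabulary. GPT-3's real tokenizer uses byte-pair encoding
# over a dictionary of 50,257 tokens; this tiny vocabulary is invented
# purely for the sketch.
VOCAB = {"love": 0, "are": 1, "dis": 2, "ised": 3, "?": 4,
         "token": 5, "s": 6, " ": 7}

def tokenize(text):
    """Split text into the longest vocabulary chunks, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # try the longest possible chunk first, shrinking until one matches
        for j in range(len(text), i, -1):
            chunk = text[i:j]
            if chunk in VOCAB:
                tokens.append(chunk)
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

print(tokenize("tokens are love?"))
# -> ['token', 's', ' ', 'are', ' ', 'love', '?']
# each token then maps to an integer id via VOCAB
```

A real tokenizer is learned from data so that frequent character sequences become single tokens, which is why whole common words and mere affixes sit side by side in the dictionary.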
The tokens are then assigned the equivalent of definitions by embedding them into a "meaning space" (as shown in the top right quadrant of the graphic) where words that have similar meanings are located in nearby areas.
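"Located in nearby areas" can be made concrete with cosine similarity between vectors. The three-dimensional vectors below are hand-picked for illustration—real embeddings have hundreds or thousands of dimensions and are learned during training:

```python
from math import sqrt

# Toy "meaning space": hand-picked 3-d vectors. In a real LLM these
# coordinates are learned, not chosen by hand.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

# words with similar meanings sit close together in the space
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"]))
```

With these made-up coordinates, "king" scores far higher against "queen" than against "apple", which is the property the embedding step is designed to produce.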
The LLM then deploys its "attention network" to make connections between different parts of the prompt. Someone reading our prompt, "the promise of large language models is that they…", would know how English grammar works and understand the concepts behind the words in the sentence. It would be obvious to them which words relate to each other—it is the model that is large, for example. An LLM, however, must learn these associations from scratch during its training phase—over billions of training runs, its attention network slowly encodes the structure of the language it sees as numbers (called "weights") within its neural network. If it understands language at all, an LLM only does so in a statistical, rather than a grammatical, way. It is much more like an abacus than it is like a mind.
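The core arithmetic of an attention network—scaled dot-product attention—can be sketched in a few lines. This is a bare-bones version with toy two-dimensional token vectors; in a real LLM the query, key and value vectors come from learned weight matrices, the "weights" the article describes:

```python
from math import exp, sqrt

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each token's output is a weighted
    mix of all the value vectors, weighted by query-key similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        # score this query against every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in keys]
        weights = softmax(scores)
        # blend the value vectors according to the attention weights
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# three "tokens" as 2-d vectors; tokens 0 and 1 are similar, token 2 is not
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed = attention(vecs, vecs, vecs)
```

Because token 0 resembles token 1 more than token 2, its output draws mostly on the first two vectors—this weighting is how "the model is large" gets connected across the sentence.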
Once the prompt has been processed, the LLM initiates a response. At this point, for each of the tokens in the model's vocabulary, the attention network has produced a probability of that token being the most appropriate one to use next in the sentence it is generating. The token with the highest probability score is not always the one chosen.
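One common selection mechanism—an illustrative assumption here, not something this passage specifies—is "temperature" sampling: raw scores become probabilities, and a token is drawn at random in proportion to them, so the top-scoring token usually, but not always, wins. The function name and toy scores below are invented for the sketch:

```python
import random
from math import exp

def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw a token id from raw scores. Higher temperature flattens the
    distribution, so lower-scoring tokens are picked more often; very low
    temperature approaches always picking the top-scoring token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one token id in proportion to its probability
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# toy scores for a 4-token vocabulary; token 2 scores highest but is
# not guaranteed to be the one chosen
logits = [1.0, 0.5, 3.0, 0.1]
token_id = sample_next_token(logits, temperature=0.8)
```

Drawing rather than always taking the maximum is what keeps generated text from becoming repetitive.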
GPT-4, for example, passed the American Uniform Bar Examination, designed to test the skills of lawyers before they become licensed, in the 90th percentile. The slightly smaller GPT-3.5 flunked it.
Emergent abilities are exciting, because they hint at the untapped potential of LLMs. Jonas Degrave, an engineer at DeepMind, an AI research company owned by Alphabet, has shown that ChatGPT can be convinced to act like the command line terminal of a computer, appearing to compile and run programs accurately. Just a little bigger, goes the thinking, and the models may suddenly be able to do all manner of useful new things. But experts worry for the same reason. One analysis shows that certain social biases emerge when models become large. It is not easy to tell what harmful behaviours might be lying dormant, waiting for just a little more scale in order to be unleashed.