The LLM’s attention network is key to learning from such vast amounts of data. It builds into the model a way to learn and use associations between words and concepts even when they appear at a distance from each other within a text, and it allows it to process reams of data in a reasonable amount of time. Many different attention networks operate in parallel within a typical LLM, and this parallelisation allows the process to be run across multiple GPUs.
Older, non-attention-based versions of language models would not have been able to process such a quantity of data in a reasonable amount of time. “Without attention, the scaling would not be computationally tractable,” says Yoshua Bengio, scientific director of Mila, a prominent AI research institute in Quebec.
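To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention, the basic building block of such networks, written in Python with NumPy. It is illustrative only: the function name, array names and sizes are assumptions for the sketch, not details of any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head, sketched with illustrative shapes.

    Q, K, V: arrays of shape (sequence_length, d), one row per token,
    holding that token's query, key and value vectors.
    """
    d = Q.shape[-1]
    # Every token scores its affinity with every other token, however far
    # apart the two sit in the text; this is how associations between
    # distant words and concepts are picked up.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax turns each token's scores into weights that sum to one.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token's output is a weighted blend of all the value vectors.
    # These matrix products are independent per head, which is why many
    # heads can run in parallel across multiple GPUs.
    return weights @ V

# Toy usage: eight tokens, 16-dimensional vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (8, 16)
```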
The sheer scale at which LLMs can process data has been driving their recent growth. GPT-3 has hundreds of layers, billions of weights, and was trained on hundreds of billions of words. By contrast, the first version of GPT, created five years ago, was just one ten-thousandth of the size.
But there are good reasons, says Dr Bengio, to think that this growth cannot continue indefinitely. The inputs of LLMs, namely data, computing power, electricity and skilled labour, all cost money. Training GPT-3, for example, used 1.3 gigawatt-hours of electricity (enough to power 121 homes in America for a year) and cost OpenAI an estimated $4.6m. GPT-4, which is a much larger model, will have cost disproportionately more (in the realm of $100m) to train.
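The household comparison is easy to sanity-check. A quick back-of-the-envelope calculation, assuming an average American home uses roughly 10.7 MWh of electricity a year (an assumed figure, not one from the article):

```python
# Sanity check of the electricity comparison. Assumed figure: an average
# American home uses about 10.7 MWh of electricity a year.
training_energy_mwh = 1.3 * 1000            # 1.3 GWh, expressed in MWh
home_annual_use_mwh = 10.7                  # assumption, not from the article
print(training_energy_mwh / home_annual_use_mwh)  # ~121 homes for a year
```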
Since computing-power requirements scale up dramatically faster than the input data, training LLMs gets expensive faster than it gets better. Indeed, Sam Altman, the boss of OpenAI, seems to think an inflection point has already arrived. On April 13th he told an audience at the Massachusetts Institute of Technology: “I think we’re at the end of the era where it’s going to be these, like, giant, giant models. We’ll make them better in other ways.”
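A common rule of thumb from the scaling-law literature (an outside assumption; the article does not cite it) illustrates why compute outruns data: training a transformer takes roughly 6 × N × D floating-point operations for a model with N weights trained on D tokens, and because bigger datasets are usually paired with bigger models, compute grows far faster than the data alone.

```python
# Rule of thumb from the scaling-law literature (an assumption, not a
# figure from the article): training cost is roughly 6 * N * D FLOPs
# for a model with N weights trained on D tokens.
def training_flops(n_weights: float, n_tokens: float) -> float:
    return 6 * n_weights * n_tokens

# Widely reported approximate figures for GPT-3: 175bn weights,
# ~300bn training tokens.
print(training_flops(175e9, 300e9))  # ~3.15e+23 FLOPs

# Grow the dataset tenfold and pair it with a tenfold-bigger model:
# compute rises a hundredfold, far faster than the data alone.
print(training_flops(1.75e12, 3e12) / training_flops(175e9, 300e9))  # 100.0
```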
But the most important limit to the continued improvement of LLMs is the amount of training data available. GPT-3 has already been trained on what amounts to all of the high-quality text that is available to download from the internet. A paper published in October 2022 concluded that “the stock of high-quality language data will be exhausted soon; likely before 2026”. There is certainly more text available, but it is locked away in small amounts in corporate databases or on personal devices, inaccessible at the scale and low cost that Common Crawl allows.
Computers will get more powerful over time, but there is no new hardware forthcoming which offers a leap in performance as large as that which came from using GPUs in the early 2010s, so training larger models will probably be increasingly expensive, which is perhaps why Mr Altman is not enthused by the idea. Improvements are possible, including new kinds of chips such as Google’s Tensor Processing Units, but the manufacturing of chips is no longer improving exponentially through Moore’s law and shrinking circuits.
There will also be legal issues. Stability AI, a company which produces an image-generation model called Stable Diffusion, has been sued by Getty Images, a photography agency. Stable Diffusion’s training data comes from the same place as GPT-3’s and GPT-4’s, Common Crawl, and it processes it in very similar ways, using attention networks. Some of the most striking examples of AI’s generative prowess have been images. People on the internet are now regularly getting caught up in excitement about apparent photos of scenes that never took place: the pope in a Balenciaga jacket; Donald Trump being arrested.
Getty points to images produced by Stable Diffusion which contain its copyright watermark, suggesting that Stable Diffusion has ingested and is reproducing copyrighted material without permission (Stability AI has not yet commented publicly on the lawsuit). The same level of evidence is harder to come by when examining ChatGPT’s text output, but there is no doubt that it has been trained on copyrighted material. OpenAI will be hoping that its text generation is covered by “fair use”, a provision in copyright law that allows limited use of copyrighted material for “transformative” purposes. That idea will probably one day be tested in court.