Such techniques certainly help. But users have already found ways to get LLMs to do things their creators would prefer they did not. When Microsoft Bing’s chatbot was first released it did everything from threatening users who had made negative posts about it to explaining how it would coax bankers to reveal sensitive information about their clients. All it required was a bit of creativity in posing questions to the chatbot and a sufficiently long conversation. Even GPT-4, which has been extensively red-teamed, is not infallible. So-called “jailbreakers” have put together websites littered with techniques for getting around the model’s guardrails, such as by telling the model that it is role-playing in a fictional world.
Sam Bowman of New York University and also of Anthropic, an AI firm, thinks that pre-launch screening “is going to get harder as systems get better”. Another risk is that AI models learn to game the tests, says Holden Karnofsky, an adviser to ARC and former board member of OpenAI. Just as people “being supervised learn the patterns…they learn how to know when someone is trying to trick them”. At some point AI systems might do that, he thinks.
Another idea is to use AI to police AI. Dr Bowman has written papers on techniques like “Constitutional AI”, in which a secondary AI model is asked to assess whether output from the main model adheres to certain “constitutional principles”. Those critiques are then used to fine-tune the main model. One attraction is that it does not need human labellers. And computers tend to work faster than people, so a constitutional system might catch more problems than one tuned by humans alone, though it leaves open the question of who writes the constitution.
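As a rough illustration of how such a critique loop works, here is a minimal sketch in Python. The `generate()` helper and the two sample principles are hypothetical placeholders, not Anthropic’s actual prompts or API; in the published “Constitutional AI” method the revised answers also become fine-tuning data for the main model.

```python
# A minimal sketch of a constitutional critique-and-revise loop.
# `generate` is a hypothetical stand-in for any language-model call;
# the principles below are invented examples, not Anthropic's.

CONSTITUTION = [
    "Choose the response least likely to help someone cause harm.",
    "Choose the response that is most honest and least deceptive.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language-model call (e.g. an API request)."""
    raise NotImplementedError("wire this up to a real model")

def constitutional_revision(user_prompt: str) -> tuple[str, str]:
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        # A secondary model (or the same model) critiques the draft
        # against one constitutional principle...
        critique = generate(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Point out any way the response violates the principle."
        )
        # ...and the draft is rewritten to address the critique.
        draft = generate(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to fix the problems identified."
        )
    # The (prompt, revised answer) pairs can then be used as
    # fine-tuning data, with no human labellers in the loop.
    return user_prompt, draft
```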
Some researchers, including Dr Bowman, think what ultimately may be necessary is what AI researchers call “interpretability”: a deep understanding of how exactly models produce their outputs. One of the problems with machine-learning models is that they are “black boxes”. A conventional program is designed in a human’s head before being committed to code. In principle, at least, that designer can explain what the machine is supposed to be doing. But machine-learning models program themselves. What they come up with is often incomprehensible to humans.
Progress has been made on very small models using techniques like “mechanistic interpretability”. This involves reverse-engineering AI models, or trying to map individual parts of a model to specific patterns in its training data, a bit like neuroscientists prodding living brains to work out which bits seem to be involved in vision, say, or memory. The problem is that this method becomes exponentially harder with bigger models.
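The flavour of this kind of work can be conveyed with a toy example. The sketch below builds a tiny two-unit network by hand (the weights are invented for illustration; real interpretability work targets networks with billions of parameters) and probes it to see which unit responds to which input feature.

```python
import numpy as np

# Toy illustration of activation probing, the idea behind mechanistic
# interpretability. The weights are hand-picked for illustration only:
# hidden unit 0 responds to input feature 0, unit 1 to feature 1.
W = np.array([[4.0, 0.0, 0.0],
              [0.0, 4.0, 0.0]])

def hidden_activations(x: np.ndarray) -> np.ndarray:
    return np.maximum(W @ x, 0.0)  # ReLU hidden layer

# Probe the network: activate one input feature at a time and record
# which hidden unit lights up, mapping units to the features they detect.
for feature in range(3):
    x = np.zeros(3)
    x[feature] = 1.0
    print(f"feature {feature} -> hidden activations {hidden_activations(x)}")
```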
The lack of progress on interpretability is one reason why many researchers say that the field needs regulation to prevent “extreme scenarios”. But the logic of commerce often pulls in the opposite direction: Microsoft recently fired its AI ethics team, for example. Indeed, some researchers think the true “alignment” problem is that AI firms, like polluting factories, are not aligned with the aims of society. They financially benefit from powerful models but do not internalise the costs borne by the world of releasing them prematurely.
Even if efforts to produce “safe” models work, future open-source versions could get around them. Bad actors could fine-tune models to be unsafe, and then release them publicly. For example, AI models have already made new discoveries in biology. It is not inconceivable that they one day design dangerous biochemicals. As AI progresses, costs will fall, making it far easier for anyone to access them. Alpaca, a model built by academics on top of LLaMA, an AI developed by Meta, was made for less than $600. It can do just as well as an older version of ChatGPT on individual tasks.
The most extreme risks, in which AIs become so clever as to outwit humanity, seem to require an “intelligence explosion”, in which an AI works out how to make itself cleverer. Mr Karnofsky thinks that is plausible if AI could one day automate the process of research, such as by improving the efficiency of its own algorithms. The AI system could then put itself into a self-improvement “loop” of sorts. That is not easy. Matt Clancy, an economist, has argued that only full automation would suffice. Get 90% or even 99% of the way there, and the remaining, human-dependent fraction will slow things down.
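The arithmetic behind that bottleneck is worth spelling out. The article gives no formula, but the logic is that of Amdahl’s law: if a fraction p of research is automated, even infinitely fast machines leave the human-dependent remainder running at human speed, capping the overall speed-up at 1/(1−p).

```python
# Back-of-the-envelope bottleneck calculation in the spirit of Amdahl's
# law, used here to illustrate Mr Clancy's point; the article itself
# does not spell out the formula.

def max_speedup(automated_fraction: float) -> float:
    # However fast the automated share runs, the human-dependent
    # remainder (1 - p) still proceeds at human speed.
    return 1.0 / (1.0 - automated_fraction)

for p in (0.90, 0.99):
    print(f"{p:.0%} automated -> research at most {max_speedup(p):.0f}x faster")
# 90% automated -> research at most 10x faster
# 99% automated -> research at most 100x faster
```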
Few researchers think that a threatening (or oblivious) superintelligence is close. Indeed, the AI researchers themselves may even be overstating the long-term risks. Ezra Karger of the Chicago Federal Reserve and Philip Tetlock of the University of Pennsylvania pitted AI experts against “superforecasters”, people who have strong track records in prediction and have been trained to avoid cognitive biases. In a study to be published this summer, they find that the median AI expert gave a 3.9% chance of an existential catastrophe (where fewer than 5,000 humans survive) owing to AI by 2100. The median superforecaster, by contrast, gave a chance of 0.38%. Why the difference? For one, AI experts may choose their field precisely because they believe it is important, a selection bias of sorts. Another reason is that they are not as sensitive to differences between small probabilities as the forecasters are.