73
The Economist
April 22nd 2023
Science & technology
duce the likelihood of producing harmful
content when given similar prompts in the
future. One obvious drawback of this
method is that humans themselves often
disagree about what counts as “appropri
ate”. An irony, says one
AI
researcher, is
that
RLHF
also made Chat
GPT
far more ca
pable in conversation, and therefore
helped propel the
AI
race.
Another approach, borrowed from war
gaming, is called “redteaming”. Open
AI
worked with the Alignment Research Cen
tre (
ARC
), a nonprofit, to put its model
through a battery of tests. The redteamer’s
job was to “attack” the model by getting it
to do something it should not, in the hope
of anticipating mischief in the real world.
Dostları ilə paylaş: