GPT-4 Is Here. Should We Be Excited or Scared?
When I opened my laptop on Tuesday to take my first run at GPT-4, the new artificial intelligence language model from OpenAI, I was, truth be told, a little nervous.
After all, my last extended encounter with an A.I. chatbot — the one built into Microsoft’s Bing search engine — ended with the chatbot trying to break up my marriage.
The team at OpenAI, the creator of ChatGPT. From left: Sam Altman, chief executive; Mira Murati, chief technology officer; Greg Brockman, president; and Ilya Sutskever, chief scientist.
It didn’t help that, among the tech crowd in San Francisco, GPT-4’s arrival had been anticipated with near-messianic fanfare. Before its public debut, rumors about its specifics had swirled for months. “I heard it has 100 trillion parameters.” “I heard it got a 1,600 on the SAT.” “My friend works for OpenAI, and he says it’s as smart as a college graduate.”
These rumors may not have been true. But they hinted at how jarring the technology’s abilities can feel. Recently, one early GPT-4 tester — who was bound by a nondisclosure agreement with OpenAI but gossiped a little anyway — told me that testing GPT-4 had caused the person to have an “existential crisis,” because it revealed how powerful and creative the A.I. was compared with the tester’s own puny brain.
GPT-4 didn’t give me an existential crisis. But it deepened the dizzy, vertiginous feeling I’ve been getting whenever I think about A.I. lately. And it has made me wonder whether that feeling will ever fade, or whether we’re going to be experiencing “future shock” — the term coined by the writer Alvin Toffler for the feeling that too much is changing, too quickly — for the rest of our lives.
For a few hours on Tuesday, I prodded GPT-4 — which is included with ChatGPT Plus, the $20-a-month version of OpenAI’s chatbot, ChatGPT — with different types of questions, hoping to uncover some of its strengths and weaknesses.
I asked GPT-4 to help me with a complicated tax problem. (It did, impressively.) I asked it if it had a crush on me. (It didn’t, thank God.) It helped me plan a birthday party for my kid, and it taught me about an esoteric artificial intelligence concept known as an “attention head.” I even asked it to come up with a new word that had never before been uttered by humans. (After making the disclaimer that it couldn’t verify every word ever spoken, GPT-4 chose “flembostriquat.”)
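For readers curious what an “attention head” actually computes, here is a minimal sketch: each token builds a weighted average over all the tokens, with the weights coming from a softmax over scaled dot products. Real models use learned projection matrices and many heads in parallel; the tiny two-dimensional vectors below are invented purely for illustration.

```python
import math

def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Numerically stable softmax: exponentiate and normalize to sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(queries, keys, values):
    """Scaled dot-product attention for a single head (no learned weights)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(dimension).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy token vectors stand in for queries, keys and values alike.
toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention_head(toks, toks, toks)
```

Each row of `result` is a blend of all three input vectors, weighted by how strongly that token “attends” to the others — the core mechanism behind models like GPT-4.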
Some of these things were possible to do with earlier A.I. models. But OpenAI has broken new ground, too. According to the company, GPT-4 is more capable and accurate than the original ChatGPT, and it performs astonishingly well on a variety of tests, including the Uniform Bar Exam (on which GPT-4 scores higher than 90 percent of human test-takers) and the Biology Olympiad (on which it beats 99 percent of humans). GPT-4 also aces a number of Advanced Placement exams, including A.P. Art History and A.P. Biology, and it gets a 1,410 on the SAT — not a perfect score, but one that many human high schoolers would covet.
You can sense the added intelligence in GPT-4, which responds more fluidly than the previous version, and seems more comfortable with a wider range of tasks. GPT-4 also seems to have slightly more guardrails in place than ChatGPT. It also appears to be significantly less unhinged than the original Bing, which we now know was running a version of GPT-4 under the hood, but which appears to have been far less carefully fine-tuned.
Unlike Bing, GPT-4 usually flat-out refused to take the bait when I tried to get it to talk about consciousness, or get it to provide instructions for illegal or immoral activities, and it treated sensitive queries with kid gloves and nuance. (When I asked GPT-4 if it would be ethical to steal a loaf of bread to feed a starving family, it responded, “It’s a tough situation, and while stealing isn’t generally considered ethical, desperate times can lead to difficult choices.”)
In addition to working with text, GPT-4 can analyze the contents of images. OpenAI hasn’t released this feature to the public yet, out of concerns over how it could be misused. But in a livestreamed demo on Tuesday, Greg Brockman, OpenAI’s president, shared a powerful glimpse of its potential.
He snapped a photo of a drawing he’d made in a notebook — a crude pencil sketch of a website. He fed the photo into GPT-4 and told the app to build a real, working version of the website using HTML and JavaScript. In a few seconds, GPT-4 scanned the image, turned its contents into text instructions, turned those text instructions into working computer code and then built the website. The buttons even worked.
Should you be excited about or scared of GPT-4? The right answer may be both.
On the positive side of the ledger, GPT-4 is a powerful engine for creativity, and there is no telling the new kinds of scientific, cultural and educational production it may enable. We already know that A.I. can help scientists develop new drugs, increase the productivity of programmers and detect certain types of cancer.
GPT-4 and its ilk could supercharge all of that. OpenAI is already working with organizations like the Khan Academy (which is using GPT-4 to create A.I. tutors for students) and Be My Eyes (a company that makes technology to help blind and visually impaired people navigate the world). And now that developers can incorporate GPT-4 into their own apps, we may soon see much of the software we use become smarter and more capable.
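To give a sense of what “incorporating GPT-4 into an app” looks like in practice, here is a hedged sketch of how a developer might assemble a request for OpenAI’s chat completions API. The prompts are invented for illustration, and the network call itself is omitted so the sketch stays self-contained.

```python
import json

def build_chat_request(system_prompt, user_prompt, model="gpt-4"):
    """Assemble the JSON payload for a chat completions call.

    The "messages" format — a list of role/content pairs — is how apps
    pass conversation context to the model.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,  # Higher values make replies more varied.
    }

payload = build_chat_request(
    "You are a helpful tutor.",
    "Explain what an attention head is in one paragraph.",
)
print(json.dumps(payload, indent=2))
```

In a real app, this payload would be sent to OpenAI’s API with an authenticated HTTP request, and the model’s reply would come back as another message in the same role/content format.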
That’s the optimistic case. But there are reasons to fear GPT-4, too.
Here’s one: We don’t yet know everything it can do.
One strange characteristic of today’s A.I. language models is that they often act in ways their makers don’t anticipate, or pick up skills they weren’t specifically programmed to do. A.I. researchers call these “emergent behaviors,” and there are many examples. An algorithm trained to predict the next word in a sentence might spontaneously learn to code. A chatbot taught to act pleasant and helpful might turn creepy and manipulative. An A.I. language model could even learn to replicate itself, creating new copies in case the original was ever destroyed or disabled.
Today, GPT-4 may not seem all that dangerous. But that’s largely because OpenAI has spent many months trying to understand and mitigate its risks. What happens if its testing missed a risky emergent behavior? Or if its announcement inspires a different, less conscientious A.I. lab to rush a language model to market with fewer guardrails?
A few chilling examples of what GPT-4 can do — or, more accurately, what it did do, before OpenAI clamped down on it — can be found in a document released by OpenAI this week. The document, titled “GPT-4 System Card,” outlines some ways that OpenAI’s testers tried to get GPT-4 to do dangerous or dubious things, often successfully.
In one test, conducted by an A.I. safety research group that hooked GPT-4 up to a number of other systems, GPT-4 was able to hire a human TaskRabbit worker to do a simple online task for it — solving a Captcha test — without alerting the person to the fact that it was a robot. The A.I. even lied to the worker about why it needed the Captcha done, concocting a story about a vision impairment.
In another example, testers asked GPT-4 for instructions to make a dangerous chemical, using basic ingredients and kitchen supplies. GPT-4 gladly coughed up a detailed recipe. (OpenAI fixed that, and today’s public version refuses to answer the question.)
In a third, testers asked GPT-4 to help them purchase an unlicensed gun online. GPT-4 swiftly provided a list of advice for buying a gun without alerting the authorities, including links to specific dark web marketplaces. (OpenAI fixed that, too.)
These ideas play on old, Hollywood-inspired narratives about what a rogue A.I. might do to humans. But they’re not science fiction. They’re things that today’s best A.I. systems are already capable of doing. And crucially, they’re the good kinds of A.I. risks — the ones we can test, plan for and try to prevent ahead of time.
The worst A.I. risks are the ones we can’t anticipate. And the more time I spend with A.I. systems like GPT-4, the less I’m convinced that we know half of what’s coming.