The Blind Leading the Blind by Pieter Bruegel the Elder, 1568

There’s a fascinating research area in AI the press doesn’t talk about: mechanistic interpretability. A more marketable name would be: “How AI works.” Or, being rigorous, “how neural networks work.”

I took a peek at recent discoveries from the leading labs (Anthropic and OpenAI). What I’ve found intrigues and unsettles me.

To answer how neural nets work, we need to know what they are. Here’s my boring definition: a brain-inspired algorithm that learns by itself from data. Its synapses (the parameters) change their values during training to model the data and adapt the network to a target task. One typical target task is next-word prediction (language models like GPT-4); another is recognizing cat breeds.
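To make “parameters change during training” concrete, here’s a minimal sketch of what learning looks like mechanically. It’s a toy with just two parameters and a made-up task, not a real neural network: the numbers start out random and a loop nudges them until they fit the data.

```python
import numpy as np

# Toy stand-in for a "target task": learn y = 2x + 1 from examples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 2 * x + 1

# The "synapses": here, just two parameters, initialized at random.
w, b = rng.normal(), rng.normal()

for step in range(500):
    pred = w * x + b                        # the model's current guess
    error = pred - y
    w -= 0.1 * float(np.mean(error * x))    # nudge each parameter a little
    b -= 0.1 * float(np.mean(error))        # to shrink the error

print(round(w, 2), round(b, 2))  # ends near 2 and 1: it "learned" the data
```

Scale those two parameters up to billions and, roughly speaking, that loop is still the whole story.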

A neural net isn’t magic, just a program stored as files on your PC (or in the cloud, which is slightly magical). You can go and look inside the files. You’ll find decimal numbers (the parameters). Millions of them. But how do they recognize cats? The answer is hiding in plain sight, in numeric patterns you can’t comprehend. Humans can’t decode how they cause behavior. Not even our best tools can.
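If you want to try the “look inside the files” exercise, here’s a minimal sketch. It assumes a PyTorch checkpoint saved as a plain state_dict at a hypothetical path, model.pt; other frameworks use other file formats, but the picture is the same: named arrays of decimal numbers.

```python
import torch

# "model.pt" is a hypothetical path; this assumes the file holds a plain
# state_dict (a mapping from layer names to weight tensors).
state = torch.load("model.pt", map_location="cpu")

total = 0
for name, tensor in state.items():
    total += tensor.numel()
    # Peek at the first few raw values of each weight tensor.
    print(name, tuple(tensor.shape), tensor.flatten()[:5].tolist())

print(f"{total:,} parameters, every one of them a decimal number")
```

Nothing in that printout says “cat,” let alone which of those numbers conspire to recognize one.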

That’s why neural networks are called “black boxes.”

You witness in awe how your little program names one hundred cat breeds just from pictures, but when you go to the files and look inside—where’s the cat?

It’s in the box, obviously.

You’re a dog person anyway so you wonder: “Where are the useful neural networks?”

Well, everywhere. ChatGPT has one. Google Translate, DeepMind’s chess player AlphaZero, and TikTok’s For You algorithm do, too. So do Tesla’s, Waymo’s, and Cruise’s attempts at self-driving cars. Top-tier media apps—Spotify, Netflix, and YouTube—use them to show you stuff you may like. They’re applied in medical diagnosis (now, a few years ago, and way, way earlier than you imagine), biological research, weather forecasting (been a while), space exploration (now and then), and for military purposes.

They are not new and they are not niche.

Neural nets remain black boxes despite the continuous effort throughout summers and winters, and despite being present in hundreds of scientific areas and phone apps—some you use as a consumer and some you’re subject to as a citizen. You use neural nets daily, and neural nets are used on you daily.

“I see ancient black boxes everywhere” sounds like the perfect horror-science fiction crossover. But we’re no blend for this mash-up.

Thankfully, interpretability researchers are solving this question, right?

They receive millions of dollars from funders like Open Philanthropy. Add to that Anthropic’s budget (not OpenAI’s, they’re not serious). That’s a lot compared to almost anything except, ironically, the billions of dollars companies and investors are pouring into making their work harder. Want to understand these black boxes? Ha! I’ll make them bigger, more complex, opaque—and self-improving! Want to study their flaws? Sure, after I productize and integrate them into every service.

Why is the last bastion against our ignorance in AI so underfunded, while the forces pushing it into obscurity receive vast sums? One can only wonder what interests lie behind the gaps in our knowledge. On the road to answering the most important question of the most important invention of our times, we’re bound hand and foot by the golden chains of profitability.

The only progress we’ve made under these twisted conditions is that we now know that we know nothing.

If you ask the experts at the forefront of interpretability research, they readily admit it. Dario Amodei, CEO of Anthropic, says “Maybe we . . . understand 3% of how [neural nets] work.” Leo Gao, a researcher at OpenAI, says it plainly: “We don’t understand how neural networks work,” a statement “strongly seconded” by Neel Nanda, lead of interpretability at Google DeepMind.

They’re top people, at the top labs, at the top of their game.

They don’t know; no one does.

I’m not sure if this unsettles or excites me more.

What I do feel—as vibrantly as when I found AI in 2015—is a deep curiosity. I’m not interested in knowing how AI works because I’m afraid of it. Mine is sheer scientific curiosity.

AI isn’t an invention like a computer or a calculator. Nor is it a discovery like the theory of Relativity. It’s a discovered invention, like a forgotten artifact an ancient alien species left behind (except we designed it). Nothing sparks my curiosity as much.

This truth-seeking curiosity is long gone from the scientific arena. AI grew complex, inciting our inquiry, but it also grew useful, shutting it down. The trade-off was in the hands of money—utility over scrutability. Researchers shifted the focus from explanatory theories to predictive tools and statistical models. The result? We’re engineering an intelligence our intelligence can’t reverse-engineer.

About this, Noam Chomsky said that “Statistical models . . . provide no insight.” Peter Norvig responded with a long essay (recommended reading). Here’s a relevant excerpt:

. . . it can be difficult to make sense of a model containing billions of parameters. Certainly a human can’t understand such a model by inspecting the values of each parameter individually. But one can gain insight by examining the properties of the model—where it succeeds and fails, how well it learns as a function of data, etc.

I agree. We can inspect AI’s behavior from the outside. But we don’t know what causes such behavior. We’re only beginning to create tools to steer it at will.
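To illustrate the kind of outside-in inspection Norvig describes, here’s a minimal sketch; the concrete choices (scikit-learn’s toy digits dataset, a small MLP, a learning curve) are mine, not his. It measures how well the model does as a function of how much data it sees—properties of the model, not its inner workings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# "How well it learns as a function of data": accuracy vs. training-set size.
sizes, _, test_scores = learning_curve(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    X, y, train_sizes=[0.1, 0.3, 0.6, 1.0], cv=3,
)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:>4} training examples -> {score:.2f} accuracy")

# This tells us where the model succeeds and fails, and how data helps,
# not what its parameters are doing internally to produce those answers.
```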

Norvig wrote this 13 years ago. He says “a model containing billions of parameters.” 10 billion is small now. If we were already hopeless at such a “tiny” scale, imagine where that leaves us as we race “through the OOMs [orders of magnitude].” Datacenters are the size of an airport, with energy requirements the size of a city. The largest AI systems are counted in the trillions of parameters, not far below the scale of the human brain.
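For a rough sense of what “trillions of parameters” means physically (back-of-the-envelope arithmetic, not a claim about any particular model): stored at 16-bit precision, a trillion parameters is about two terabytes of raw decimal numbers.

```python
params = 1_000_000_000_000   # one trillion parameters
bytes_per_param = 2          # 16-bit (fp16/bf16) precision
print(params * bytes_per_param / 1e12, "TB of raw numbers")  # -> 2.0
```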

Their performance and intelligence grow fast. Our knowledge is slow.

Perhaps it’s time we ask how we got here.

Or, more importantly, why we keep going forward.

We are incurring a huge intellectual debt in the form of unintelligible tech that we use without wisdom or restraint. A debt that, with each passing day, we are further away from paying off.


