This article originally appeared on Replicate.
There are some weeks where it seems like open-source AI will never catch up. The powerful models everyone’s talking about are locked behind a $20 subscription, if they’re released to the public at all. Open-weight models close the capability gap after a year or so. Actual open-source models, with weights, datasets, training code, and inference code? Forget about it.
Don’t be discouraged, though. Sure, the megacorps are training multimodal world models in datacenters the size of cities. Maybe they really will build a god in there. I’m not qualified to say.
But those huge datacenters need electricity, and climate control, and security personnel. They need GPUs, new ones, bigger ones, and there’s ever more competition for those. They need the compute to do research, not just write your emails.
And let’s be honest: do you need a god to write your emails? Does that require superhuman intelligence? Not really. You don’t even need a whole human’s worth of intelligence — otherwise you’d be doing it yourself!
What if, instead of building the most intelligent thing ever, we unbundled intelligence? An eye here, a mouth there. Document-scanning intelligence, decision-making intelligence, email-sending intelligence. A world of tiny smart functions, lubricating the frictions of our daily lives.
That’s something that’s possible now. All we have to do is build.
The big news of the week in AI models for coding was — well, if I’m honest, it was the new Claude 3.5 Sonnet model, which everyone says you absolutely must try, it’s so good it’s uncanny, it makes them scared for their jobs, etc.
But before that came out, the big news was DeepSeek-Coder-V2, the open-weights model that finally beat GPT-4o at coding tasks. It held the top spot on the leaderboard for a respectable four days before being beaten by the new Claude.
The Mixture-of-Experts model comes in Full and Lite sizes, both with a 128K-token context length.
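If you want to poke at it yourself, here’s a minimal sketch for running the Lite model with Hugging Face transformers. The checkpoint ID deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct and the hardware assumptions are mine, so check the model card before running:

```python
# Minimal sketch: running DeepSeek-Coder-V2 Lite with transformers.
# The checkpoint ID below is an assumption; confirm it (and the memory
# requirements) on the model card before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```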
Another step in the march toward bigger models running faster on smaller devices. PowerInfer-2 takes advantage of locality and sparsity in neural networks: some “hot” neurons are always activated, while others respond only to specific inputs. They keep the hot neurons on the GPU and the rest on the CPU, reducing memory consumption and GPU-CPU transfers.
Along with the inference engine, they release TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B, models tuned so that their activation patterns are easier for the engine to predict.
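To make the hot/cold idea concrete, here’s a toy PyTorch sketch — not PowerInfer-2’s actual code. It uses a fixed split based on profiled activation counts, whereas the real engine predicts which neurons will fire for each input; the function names and the 20% hot fraction are illustrative:

```python
# Toy sketch of the hot/cold neuron split, NOT PowerInfer-2's real code.
# This is a static split from profiled activation counts; the real engine
# predicts per-input which neurons will fire.
import torch

def split_hot_cold(weight, activation_counts, hot_fraction=0.2):
    """Partition a layer's rows (neurons) by how often they fired in profiling."""
    n_hot = int(weight.shape[0] * hot_fraction)
    hot_idx = torch.topk(activation_counts, n_hot).indices
    cold_mask = torch.ones(weight.shape[0], dtype=torch.bool)
    cold_mask[hot_idx] = False
    cold_idx = cold_mask.nonzero(as_tuple=True)[0]
    # hot rows live in GPU memory; cold rows stay in larger, cheaper CPU memory
    return hot_idx.cuda(), weight[hot_idx].cuda(), cold_idx.cuda(), weight[cold_idx]

def hybrid_forward(x_gpu, hot_idx, hot_w, cold_idx, cold_w, out_dim):
    out = torch.zeros(out_dim, device="cuda")
    out[hot_idx] = hot_w @ x_gpu                    # frequent neurons: GPU matmul
    out[cold_idx] = (cold_w @ x_gpu.cpu()).cuda()   # rare neurons: CPU, copy back
    return out
```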
This has been the buzz in San Francisco for a while now, but it seemed like nobody wanted to reveal their edge before anyone else did.
That is, until Aidan McLaughlin dropped AI Search: The Bitter-er Lesson. The essay walks through a brief history of scaling vs. search, and how they will interact to create superhuman intelligence even sooner than we think.
What if we could start automating AI research today? What if we didn’t have to wait for a 2030 supercluster to cure cancer? What if ASI was in the room with us already?
Granting foundation models ‘search’ (the ability to think for longer) might upend Scaling Laws and change AI’s trajectory.
The story goes like this: prediction heuristics allow LLMs to “see” further down the branches of possible texts without actually generating every branch to reject the bad ones. This means they can get more leverage out of their compute, and thus advance their AI research faster than everyone else.
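A minimal version of that idea looks like beam search guided by a cheap value heuristic. In this toy sketch, propose and value are stand-ins for a generator LLM and a lightweight scoring model; the point is that low-scoring branches get pruned before you pay to expand them:

```python
# Toy sketch of heuristic-guided search over text continuations.
# propose(text) -> a few short candidate continuations (e.g. from an LLM)
# value(text)   -> cheap heuristic score for a partial text, higher is better
import heapq

def guided_search(prompt, propose, value, beam_width=4, depth=3):
    beam = [(value(prompt), prompt)]
    for _ in range(depth):
        candidates = []
        for _, text in beam:
            for chunk in propose(text):
                extended = text + chunk
                candidates.append((value(extended), extended))
        # prune: only the most promising branches are ever expanded further
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]
```

The heuristic does the “seeing ahead”: instead of fully generating every branch, you spend generation compute only on the beam_width branches the value model likes.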
This post seems to have opened Pandora’s box. Several papers have been published or popularized in the days since. A Twitter user posted a one-page version using JAX. And OpenAI’s silence is not one I find comforting…
post | paper | paper | paper | paper | paper | code
As you may know, we have a cool AI support bot with access to our documentation and public code. It’s available 24/7 to answer your questions or teach you more about Replicate on our support page, where you can also contact a human.
We also have a great Discord server where people hang out, help each other and show off cool stuff they’re building.
Now you can get helpful bot answers right in the Discord server by typing the prefix !support before your question.
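For example, you might ask something like: !support how do I get started with the API?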
Goodbye, dear reader. Very real reader. Reader who exists and experiences thoughts. Send me your thoughts. Real thoughts, that you definitely had, because someone is definitely reading these things, and I’m not just hucking them into the void.
Right? Right. Of course. Real.
Thanks for reading (probably)
— deepfates