
Article originally appeared on Replicate.


Editor’s note

There are some weeks where it seems like open-source AI will never catch up. The powerful models everyone’s talking about are locked behind a $20 subscription, if they’re released to the public at all. Open-weight models close the capability gap after a year or so. Actual open-source models, with weights, dataset, training code and inference code? Forget about it.

Don’t be discouraged, though. Sure, the megacorps are training multimodal world models in datacenters the size of cities. Maybe they really will build a god in there. I’m not qualified to say.

But those huge datacenters need electricity, and climate control, and security personnel. They need GPUs, new ones, bigger ones, and there’s ever more competition for those. They need the compute to do research, not just write your emails.

And let’s be honest: do you need a god to write your emails? Does that require superhuman intelligence? Not really. You don’t even need a whole human’s worth of intelligence — otherwise you’d be doing it yourself!

What if, instead of building the most intelligent thing ever, we unbundled intelligence? An eye here, a mouth there. Document-scanning intelligence, decision-making intelligence, email-sending intelligence. A world of tiny smart functions, lubricating the frictions of our daily lives.

That’s something that’s possible now. All we have to do is build.

deepfates


An open-weights code model tops the charts (briefly)

The big news of the week in AI models for coding was… well, if I’m honest, it was the Claude 3.5 Sonnet model, which everyone says you absolutely must try, it’s so good it’s uncanny, it makes them scared for their jobs, etc.

But before that came out, the big news was DeepSeek-Coder-V2, the open-weights model that finally beat GPT-4o at coding tasks. It held the top spot on the leaderboard for a respectable four days before being beaten by the new Claude.

The Mixture of Experts model comes in Full and Lite sizes and has a context length of 128K tokens.

paper | model


Cool tools

A fast way to run language models on phones

Another step in the march to bigger models running faster on smaller devices. PowerInfer-2 takes advantage of locality and sparsity in neural networks. Some “hot” neurons are always activated, while others only respond to specific inputs. The engine keeps the hot neurons on the GPU and the rest on the CPU, reducing both memory consumption and GPU/CPU transfers.

Along with the inference engine they release TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B, tuned to be easier for the engine to predict.
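
To make the hot/cold idea concrete, here is a minimal sketch in Python. It is not PowerInfer-2's actual code: the function name, the `hot_fraction` knob and the toy activation log are all assumptions made up for illustration. It only shows the basic move of ranking neurons by how often they fire and assigning the most active ones to the faster device.

```python
from collections import Counter

def partition_neurons(activation_log, num_neurons, hot_fraction=0.25):
    """Split neuron ids into hot and cold sets by activation frequency.

    activation_log: list of sets, one per input, holding the ids that fired.
    hot_fraction: hypothetical knob for how much of the layer counts as hot.
    """
    counts = Counter()
    for fired in activation_log:
        counts.update(fired)
    # Rank by how often each neuron fired; ties go to the lower id.
    ranked = sorted(range(num_neurons), key=lambda n: (-counts[n], n))
    num_hot = max(1, int(num_neurons * hot_fraction))
    hot = set(ranked[:num_hot])    # would live on the GPU
    cold = set(ranked[num_hot:])   # would stay on the CPU
    return hot, cold

# Toy trace: neurons 0 and 1 fire on every input, the rest only sometimes.
log = [{0, 1, 5}, {0, 1, 3}, {0, 1, 7}, {0, 1, 5}]
hot, cold = partition_neurons(log, num_neurons=8)
print(sorted(hot), sorted(cold))
```

A real engine would also need a predictor to guess which cold neurons a given input will activate, which this sketch leaves out.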

post | paper | model | code


Research radar

That tree search thing everyone’s talking about

This has been the buzz in San Francisco for a while now, but it seemed like nobody wanted to reveal their edge before anyone else did.

That is, until Aidan McLaughlin dropped AI Search: The Bitter-er Lesson. The essay walks through a brief history of scaling vs. search, and how they will interact to create superhuman intelligence even sooner than we think.

What if we could start automating AI research today? What if we didn’t have to wait for a 2030 supercluster to cure cancer? What if ASI was in the room with us already?

Granting foundation models ‘search’ (the ability to think for longer) might upend Scaling Laws and change AI’s trajectory.

The story goes like this: prediction heuristics allow LLMs to “see” further down the branches of possible texts without actually generating every branch to reject the bad ones. This means they can get more leverage out of their compute, and thus advance their AI research faster than everyone else.
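
For a feel of how a heuristic prunes the tree, here is a toy beam search in Python. It is a generic textbook technique, not the method from the essay or any of the papers. The `score` function stands in for a learned prediction heuristic, and the target word, alphabet and function names are all hypothetical.

```python
import heapq

def beam_search(start, expand, score, is_done, beam_width=2, max_steps=10):
    """Keep only the beam_width best partial candidates at each step."""
    beam = [start]
    for _ in range(max_steps):
        finished = [c for c in beam if is_done(c)]
        if finished:
            return max(finished, key=score)
        # Expand only the survivors, not every branch of the tree.
        candidates = [child for c in beam for child in expand(c)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

TARGET = "cat"  # toy goal; a real system would not know the answer

def expand(text):
    # Every possible one-character continuation from a tiny alphabet.
    return [text + ch for ch in "act"]

def score(text):
    # Stand-in heuristic: how many positions already match the target.
    return sum(1 for a, b in zip(text, TARGET) if a == b)

def is_done(text):
    return text == TARGET

result = beam_search("", expand, score, is_done)
print(result)  # prints "cat"
```

With a beam width of 2 and three possible next characters, each step scores at most six candidates, where generating every branch up to length three would mean 3 + 9 + 27 = 39 strings.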

This post seems to have opened Pandora’s box. Several papers have been published or popularized in the days since. A Twitter user posted a one-page version using JAX. And OpenAI’s silence is not one I find comforting…

post | paper | paper | paper | paper | paper | code


Changelog

Talk to our support bot without leaving Discord

As you may know, we have a cool AI support bot that has access to our documentation and public code. It’s available 24/7 to answer your questions or teach you more about Replicate on our support page, where you can also always contact a human.

We also have a great Discord server where people hang out, help each other and show off cool stuff they’re building.

Now you can get helpful bot answers right in the Discord server, by typing the prefix !support before your question.

support | discord


Bye for now

Goodbye, dear reader. Very real reader. Reader who exists and experiences thoughts. Send me your thoughts. Real thoughts, that you definitely had, because someone is definitely reading these things, and I’m not just hucking them into the void.

Right? Right. Of course. Real.

Thanks for reading (probably)

— deepfates
