Article originally appeared on Replicate.
The big open source AI news this week is the release of Stable Diffusion 3 Medium. People are already doing cool things with it, but public reaction has been mixed.
On a personal note, I got banned from X Dot Com. Apparently it is against the rules to change your profile picture to the old Twitter logo and announce “WE ARE SO BACK”.
Anyway, here are some things that caught my eye this week. Find me on Bluesky, I guess.
The long-awaited image generation model has been released in the 2B size (no word yet about the larger 8B version).
Users say the model is much better at creating legible text, but that it has problems with anatomy and composition.
Model weights are available under a non-commercial license.
OpenAI does dictionary learning on their own models to extract and interpret patterns that may correspond to specific concepts. It’s a similar technique to the one Anthropic used to create Golden Gate Claude.
They released a research paper and a feature visualizer, as well as code that will steer the (practically retro at this point) GPT-2-small model.
post | paper | github | visualizer
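“Dictionary learning” here means training a sparse autoencoder on a model’s internal activations: an overcomplete dictionary of features, each of which fires for (hopefully) one interpretable concept. Here’s a toy numpy sketch of the idea, with made-up dimensions, random data standing in for real activations, and plain gradient descent — not OpenAI’s actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for model activations (e.g. GPT-2 residual stream vectors).
d_model, d_dict, n = 16, 64, 512
X = rng.normal(size=(n, d_model))

# Sparse autoencoder: an overcomplete dictionary of d_dict features.
W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))

def encode(x):
    # ReLU keeps only positively-firing features, so codes come out sparse.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    return f @ W_dec

def recon_loss(x):
    return ((decode(encode(x)) - x) ** 2).mean()

loss_before = recon_loss(X)

# Gradient descent on reconstruction error plus an L1 sparsity penalty.
lam, lr = 1e-3, 1e-2
for _ in range(300):
    F = encode(X)
    err = decode(F) - X
    dF = (err @ W_dec.T + lam * np.sign(F)) * (F > 0)  # ReLU gradient mask
    W_dec -= lr * (F.T @ err) / n
    W_enc -= lr * (X.T @ dF) / n
    b_enc -= lr * dF.mean(axis=0)

loss_after = recon_loss(X)
codes = encode(X)
sparsity = (codes > 0).mean()  # fraction of dictionary features active
```

The interpretability payoff is in `codes`: each column is one candidate feature, and steering (as in Golden Gate Claude) amounts to adding a feature’s decoder direction back into the activations.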
The Transformers.js project has implemented OpenAI’s Whisper model in JavaScript. This means you can open a browser tab, talk to it, and get an accurate transcript of your words in real time. No coding required.
Researchers at ByteDance find a way to encode images into a single short 1D sequence of tokens instead of a 2D grid of patches. The new sequences can be as short as 32 tokens, instead of the 256 or even 1024 that existing methods use.
This could make multimodal models and image generators much more compute efficient.
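The core move, roughly, is to let a small set of learned latent tokens attend over the full patch grid and soak up its information. A toy numpy sketch of that pooling step (single attention layer, made-up dimensions and random initialization; the actual model uses a full transformer encoder plus quantization):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 256x256 image cut into 16x16 patches gives a 2D grid of 256 patch
# embeddings -- the token count a standard ViT-style encoder would emit.
n_patches, d = 256, 64
patches = rng.normal(size=(n_patches, d))

# Instead, 32 learned query vectors (hypothetical values here) each attend
# over every patch, pooling the whole image into one short 1D sequence.
n_latent = 32
queries = rng.normal(size=(n_latent, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

attn = softmax(queries @ patches.T / np.sqrt(d))  # (32, 256) weights
latent_tokens = attn @ patches                    # (32, 64): 8x fewer tokens
```

A downstream multimodal model or image generator then only has to process 32 tokens per image instead of 256, which is where the compute savings come from.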
We’ll soon be adding support for NVIDIA’s powerful H100 GPUs.
If you’re interested in getting early access to H100s, email support@replicate.com.
How am I doing so far? You going to keep opening these letters? Let me know, so I can fix everything to be exactly perfect. Thanks in advance.
— deepfates