The meme search problem is more important than I thought.
We’re keeping our personal and collective memories in screenshots, memes, and photos. Yet all you can do is scroll through and rely on Recognition: look for the right day or the right color and hope you don’t blink. There’s no Recall.
This is a world-scale problem. How many minutes, hours, days are lost to people scrolling through photos? How many datasets have biased or junk images in them but are too big to sift by hand? How many messages are lost to deletion or account bans, but stored in screenshots?
We could be so much smarter.
The Google Photos app, on my Google-brand phone, is pretty good. It does custom face recognition for people and pets, and it can detect various categories. These are probably different neural nets, trained on-device or in the cloud and deployed differently based on what you shoot.
Mine can detect these “Things”: Selfies, Cliffs, Sky, Mountains, Canyons, Hiking, Screenshots, Cooking, Cars, Cats, Posters (actually books), Deserts, Backpacking, Bikes, Menus (also books), Sunset, Food, Skylines, Forests, Baking, Flowers (mostly AI generated).
If anyone wants to send me a screenshot or a list of their Things, that would be cool, btw. Or whatever you use to organize your photos, memes, and screenshots. I’ve been working on a tool for this, but I need help making it usable for everyone.
The Google phone also has Lens, which is a specialized neural-network camera. It can do real-time OCR on text and place it in mostly the right alignment on the page. It can translate text from every language I’ve tried it on. It can search the internet for objects. It can do your homework.
But Google is big and slow and ships their org chart. And they have to be conservative, because they’re going for a broad market. An open source version could move faster, lead the way. And then if they want to copy it, oh well. At least it is available to all the people.
My answer is a neural search library called memery, which is still alpha but usable. It lets you search through thousands of images with natural language. You can find memery at http://github.com/deepfates/memery
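For a taste of what that looks like from Python, here’s a minimal sketch based on the alpha-era README. The module path and function name (memery.core, query_flow) are my assumption and may have changed, so check the repo for the current interface.

```python
# Hypothetical usage sketch -- memery is alpha, so this API may differ;
# see http://github.com/deepfates/memery for the real interface.
from memery import core

# Assumed signature: query_flow(folder, query) -> image paths ranked best-first
ranked = core.query_flow('./memes', 'surprised anime girl')

print(ranked[:10])  # ten closest matches
```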
Memery uses CLIP under the hood, which marries image and text transformers to make a semantic space representing both. I’ve written about this before. When they released CLIP they must have had this same feeling. You’re transforming humanity forever.

CLIP encodes visual image and text about that image into the same embedding space. You've probably seen this image, maybe with a caption that says "AI is too dumb to tell the difference between an apple and an iPod lol" but this is actually amazing pic.twitter.com/upCTu9OQJV

— web weaver (@deepfates) May 25, 2021
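Stripped of memery’s indexing and caching, the underlying trick looks roughly like this: encode every image and the text query into that shared space, then rank by cosine similarity. This is a sketch using OpenAI’s clip package and PyTorch rather than memery’s actual code; the folder path and query are placeholders.

```python
# Sketch of CLIP-based image search -- the idea behind memery, not its actual code.
from pathlib import Path

import clip   # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image_paths = sorted(Path("./memes").glob("*.jpg"))  # placeholder folder of images

with torch.no_grad():
    # Encode all images into the shared embedding space and normalize them.
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
    image_vecs = model.encode_image(images)
    image_vecs = image_vecs / image_vecs.norm(dim=-1, keepdim=True)

    # Encode the text query into the *same* space.
    text_vecs = model.encode_text(clip.tokenize(["a cat wearing sunglasses"]).to(device))
    text_vecs = text_vecs / text_vecs.norm(dim=-1, keepdim=True)

    # Cosine similarity is just a dot product of normalized vectors.
    scores = (image_vecs @ text_vecs.T).squeeze(1)

# Print the ten best matches, highest similarity first.
for score, path in sorted(zip(scores.tolist(), image_paths), reverse=True)[:10]:
    print(f"{score:.3f}  {path}")
```

For thousands of images you’d batch the encoding and cache the vectors on disk rather than re-encode everything on every search, but the core idea is just that dot product.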
This is a great point (https://x.com/ThePatanoiac/status/1403818500334907392?s=20). Memery is awesome for curating datasets, and AI artists and artificers can use it to add or subtract images from their collections more easily than ever.
You can just ask it for “noise” and find the messy datapoints you need to clean up.
searched circles dataset for "noise" and got some pretty accurate results actually https://t.co/Oje3USdANi pic.twitter.com/E7D6rTIAWx
— web weaver (@deepfates) April 5, 2021
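Continuing the sketch above (and reusing its model, image_vecs, image_paths, and device), the same scores become a cleanup tool: score everything against a junk query and set the top matches aside for review. The query wording and the 0.25 cutoff are arbitrary assumptions you’d tune by eye.

```python
# Sketch: flag images that match a "junk" query so a human can review them.
# Continues the previous snippet; assumes model, image_vecs, image_paths, device exist.
import shutil
from pathlib import Path

review_dir = Path("./review")  # placeholder destination for suspect images
review_dir.mkdir(exist_ok=True)

with torch.no_grad():
    noise_vec = model.encode_text(clip.tokenize(["noise, static, corrupted image"]).to(device))
    noise_vec = noise_vec / noise_vec.norm(dim=-1, keepdim=True)
    noise_scores = (image_vecs @ noise_vec.T).squeeze(1)

for score, path in zip(noise_scores.tolist(), image_paths):
    if score > 0.25:  # arbitrary threshold; adjust after eyeballing the results
        shutil.move(str(path), str(review_dir / path.name))
```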
It’s not about meme indexing, it’s about world knowledge.
We're discussing (and trying) this with @JinaAI_ . I'm guessing one issue is that memes mutate so rapidly that getting a good dataset to index is tough.
— Alex C-G (@alexcg) June 12, 2021
We can already do image-to-text and all kinds of cross-modal search.
I mean who coulda seen the Attack of the Clones meme coming?
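To make the image-to-text direction concrete: hand CLIP one image and a few candidate captions, and it picks the most plausible one with no meme-specific training data. Another sketch with the same clip package; the file name and captions are made up.

```python
# Sketch: image-to-text via CLIP zero-shot matching -- no meme dataset required.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

captions = [
    "a screenshot of a text conversation",
    "a reaction meme with white impact-font captions",
    "a photo of a sunset over mountains",
]

image = preprocess(Image.open("mystery.jpg")).unsqueeze(0).to(device)  # placeholder file

with torch.no_grad():
    # The model scores the one image against every caption in one shot.
    logits_per_image, _ = model(image, clip.tokenize(captions).to(device))
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for caption, prob in zip(captions, probs.tolist()):
    print(f"{prob:.2f}  {caption}")
```

Swap in your own captions and the same trick covers whatever format the memes mutate into next week.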