Note

This post is a thought. It's a short note that I make about someone else's content online. Learn more about the process thoughts

Here's my thought on ๐Ÿ’ญ Simon Willison on X: "Anyone got a lead on a good embedding model that can embed both images and text into the same space, so you can search for "dog" and get back images most likely to contain a dog? It looks like VisualBERT is one, what are others?" / X


Kinda mindblown that this is even possible. This is so far outside of my current thinking that i didn't even think of an elegant way to implement semantic search accross images and text at the same time. I know it happens at Google, but I envision that as still text search accross tags and meta data about the image.

Based on the number of responses CLIP is the thing that does this.


This post was a thought by Waylon Walker see all my thoughts at https://waylonwalker.com/thoughts