
Embedding models: OpenAI vs Google Cloud

Bilal
2 min read · Sep 23, 2023


An embedding is a list of floating-point numbers represented as a vector; it quantifies the relatedness of text strings. The relatedness of two vectors is gauged by the distance between them: smaller distances imply stronger relatedness, while larger distances signify weaker relatedness. Uses of embeddings include:

  1. Search: Arranging results based on their relevance to a search query.
  2. Clustering: Grouping text strings based on their similarity.
  3. Recommendations: Suggesting items associated with similar text strings.
  4. Anomaly detection: Identifying outliers with minimal relatedness.
  5. Diversity assessment: Analyzing the distribution of similarities.
  6. Classification: Categorizing text strings based on their closest label.
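All of the use cases above boil down to comparing embedding vectors by distance. As a minimal sketch (the article does not specify a metric, but cosine similarity is the metric commonly used with these models), here is how relatedness between two vectors can be computed; the toy 3-dimensional vectors are made up for illustration, while real models return hundreds to thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); values near 1.0 mean
    # the vectors point in almost the same direction (strong relatedness).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models return e.g. 768 or 1536 dims).
cat = [0.8, 0.1, 0.1]
kitten = [0.75, 0.15, 0.1]
car = [0.1, 0.9, 0.0]

print(cosine_similarity(cat, kitten))  # high -> strongly related
print(cosine_similarity(cat, car))     # lower -> weakly related
```

Search, clustering, and classification then reduce to ranking or grouping items by this score.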

Both OpenAI and Google Cloud provide embedding models that return embedding vectors for text data. OpenAI’s latest model at the time of writing is text-embedding-ada-002; on Google Cloud (Vertex AI), the latest model is accessed as textembedding-gecko@latest.

I decided to run a small benchmark to compare the response times of the two embedding models. Both models cost the same: $0.0001 per 1K tokens. The text that I use for generating embeddings is:
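The article does not show its benchmarking code, but measuring response times can be sketched as a small timing harness like the one below. Everything here is an assumption for illustration: `fake_embed` is a hypothetical stand-in for a real client call (such as an OpenAI or Vertex AI embedding request), and the run count is arbitrary.

```python
import time

def time_embedding_call(embed_fn, text, runs=5):
    # Measure wall-clock latency of an embedding call over several runs,
    # returning the best and average times in seconds.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        embed_fn(text)
        latencies.append(time.perf_counter() - start)
    return min(latencies), sum(latencies) / len(latencies)

# Hypothetical stand-in for a real API call; a real benchmark would
# substitute the OpenAI or Vertex AI client here.
def fake_embed(text):
    time.sleep(0.01)       # simulate network latency
    return [0.0] * 1536    # ada-002-sized placeholder vector

best, avg = time_embedding_call(fake_embed, "hello world")
print(f"best: {best * 1000:.1f} ms, avg: {avg * 1000:.1f} ms")
```

Taking the best of several runs helps smooth out one-off network jitter when comparing two hosted APIs.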



Written by Bilal

Learning new things every day. Writing about things that I learn and observe. PhD in computer science. https://www.linkedin.com/in/mbilalce/
