Embeddings and vectors

I've been playing Semantle with my partner lately, not realizing I'd been interacting with a game built using an NLP embedding method. I should've figured, really, but I had an epiphany while studying LangChain and Salesforce Agentforce at work.

The theory I learned is interesting, but here's what really fascinates me.

I went down a rabbit hole after realizing that Semantle uses Word2Vec, and came across an example from the original research paper that said the vector for the word "Queen" should roughly equal "King - Man + Woman." I knew that bias in training data creates biased models, but this further solidified the concept for me. The way we use words to uplift or degrade gets embedded in the model's relationships between words themselves. Hate and ignorance have the potential to shape meaning in the eyes of the algorithm.
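The analogy can be demonstrated with plain vector arithmetic. Below is a minimal sketch using hand-made toy vectors (not real Word2Vec weights — the dimensions and values are invented for illustration): subtract "man" from "king," add "woman," and find the nearest remaining word by cosine similarity.

```python
import numpy as np

# Toy embedding table: hand-made 3-d vectors, NOT trained Word2Vec weights.
# The dimensions loosely encode (royalty, maleness, person-ness) for illustration.
embeddings = {
    "king":  np.array([0.9, 0.9, 0.7]),
    "queen": np.array([0.9, 0.1, 0.7]),
    "man":   np.array([0.1, 0.9, 0.8]),
    "woman": np.array([0.1, 0.1, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: how closely two vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

With real trained embeddings (e.g. via gensim's `KeyedVectors.most_similar`), the same arithmetic recovers "queen" from a vocabulary of hundreds of thousands of words — which is exactly why biases in the training text end up encoded in those vector relationships too.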

Can we truly quantify the intricacies of all language, the basis of human interaction? I think not, given the complexity and diversity of human thought.

I used to dislike the idea of adopting AI because of its cognitive impacts and resource consumption. But I've come to believe it's an amazing tool for facilitating information search and retrieval. Instead of it taking over human creativity, I hope we can use it to build a more organized and harmonious society, and I dream of a day when its implementation, from hardware to software, is ethical and mindful of life beyond the tech bubble.