ThinkNimble Research

AI Search with LLM Embeddings
Archived

Semantic search with vector embeddings

A prototype application demonstrating natural-language semantic search: job descriptions are matched through vector embeddings stored in PostgreSQL with pgvector.

Overview

AI Search with LLM Embeddings is a prototype application that demonstrates semantic search with vector embeddings. Users search job descriptions with natural-language queries, and results are matched on the intent and context of the query rather than on exact keywords.

Video Tutorials

Quick Introduction

Watch this short introduction to get a quick overview of the AI search capabilities and see the demo in action.

Deep Dive Tutorial

This comprehensive tutorial walks through the entire implementation, explaining the concepts behind vector embeddings, semantic search, and how to build similar applications.

Features

How It Works

1. Data Ingestion

Job descriptions are processed and converted into vector embeddings using OpenAI’s embedding models. These high-dimensional vectors capture the semantic meaning of the text.
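
A minimal sketch of this step, assuming the official `openai` Python package (v1 or later). The model name `text-embedding-3-small` and the batch size of 100 are assumptions; the project does not state which embedding model or batching it uses. The optional `client` parameter lets a stand-in object replace the real API client when testing.

```python
from typing import Any, Sequence

# Assumed model; the project only says "OpenAI's embedding models".
DEFAULT_MODEL = "text-embedding-3-small"


def embed_texts(texts: Sequence[str], client: Any = None,
                model: str = DEFAULT_MODEL,
                batch_size: int = 100) -> list[list[float]]:
    """Return one embedding vector per input text, in input order."""
    if client is None:
        # Imported lazily so the function can run with a stand-in client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embeddings.create(model=model, input=batch)
        # Each returned item carries the index of its input text;
        # sorting on it keeps vectors aligned with the inputs.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors
```

Sending texts in batches keeps the number of API calls low when many job descriptions are ingested at once.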

2. Storage

Embeddings are stored in PostgreSQL using the pgvector extension, which provides efficient indexing and similarity search capabilities for vector data.
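
Because the project runs Django migrations, its real schema is most likely declared in a Django model (the pgvector Python package provides a `VectorField` for that). The plain-SQL sketch below shows the same idea. The table name, columns, and the 1536 dimension (the default output size of `text-embedding-3-small`, itself an assumed model) are hypothetical.

```python
# Hypothetical schema: table and column names are illustrative, and 1536 is
# the default output size of text-embedding-3-small (an assumed model).
EMBEDDING_DIM = 1536


def schema_sql(table: str = "job_descriptions",
               dim: int = EMBEDDING_DIM) -> list[str]:
    """Return statements that create a pgvector-backed table and index.

    `table` is interpolated directly, so pass only trusted identifiers.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "title text NOT NULL, "
        "description text NOT NULL, "
        f"embedding vector({dim}))",
        # HNSW index using cosine distance; needs pgvector 0.5.0 or later.
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
        f"ON {table} USING hnsw (embedding vector_cosine_ops)",
    ]


def to_vector_literal(values) -> str:
    """Format numbers in pgvector's text input syntax, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

Each statement can be run through any PostgreSQL driver's `cursor.execute`, and `to_vector_literal` produces the text form that an INSERT can cast with `::vector`.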

3. Query Processing

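When a user enters a search query:

- The query text is converted into an embedding with the same OpenAI model used for the job descriptions.
- pgvector compares the query embedding with the stored job-description embeddings using a vector distance measure.
- The job descriptions are ranked by how close their embeddings are to the query embedding.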

4. Results

The most semantically similar job descriptions are returned, even if they don’t contain the exact keywords from the query.
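
Steps 3 and 4 together amount to embedding the query and asking PostgreSQL for the nearest stored vectors. A minimal sketch with psycopg-style `%s` placeholders, assuming a hypothetical `job_descriptions` table with `id`, `title`, and `embedding` columns and a caller-supplied `embed` function (for example, one wrapping the OpenAI embeddings endpoint):

```python
from typing import Any, Callable, Sequence


def semantic_search(query: str,
                    embed: Callable[[str], Sequence[float]],
                    cursor: Any,
                    limit: int = 10) -> list:
    """Return (id, title, distance) rows for the closest job descriptions."""
    # The query must be embedded with the same model as the stored
    # descriptions; otherwise the vectors are not comparable.
    vector = "[" + ",".join(repr(float(v)) for v in embed(query)) + "]"
    # <=> is pgvector's cosine distance operator: 0 means same direction,
    # so ascending order returns the most similar rows first.
    cursor.execute(
        "SELECT id, title, embedding <=> %s::vector AS distance "
        "FROM job_descriptions "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s",
        (vector, vector, limit),
    )
    return cursor.fetchall()
```

Ordering by the distance expression itself, rather than filtering on keywords, is what lets a description phrased as "ML engineer with Python expertise" rank highly for a query that never uses those words. The pgvector Python package also offers `register_vector` adapters for psycopg, which would replace the manual string formatting.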

Technical Architecture

┌─────────────┐     ┌─────────────┐     ┌──────────────┐
│   Frontend  │────▶│   Django    │────▶│  PostgreSQL  │
│    (Web)    │     │   Backend   │     │  + pgvector  │
└─────────────┘     └─────────────┘     └──────────────┘
                            │
                            ▼
                    ┌─────────────┐
                    │  OpenAI API │
                    │ (Embeddings)│
                    └─────────────┘

Installation

# Clone the repository
git clone https://github.com/thinknimble/embeddings-search-demo.git
cd embeddings-search-demo

# Set up Python environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up PostgreSQL with pgvector
# Ensure PostgreSQL is installed with pgvector extension
createdb embeddings_demo
psql -d embeddings_demo -c "CREATE EXTENSION vector;"

# Configure environment variables
cp .env.example .env
# Add your OpenAI API key and database credentials to .env

# Run migrations
python manage.py migrate

# Load sample data (if provided)
python manage.py load_sample_data

# Start the development server
python manage.py runserver
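
The .env step above does not say which variable names the project expects. As a hypothetical sketch, assuming the key is stored in `OPENAI_API_KEY` (the variable the `openai` package reads by default) and the database in a single `DATABASE_URL`, a Django `settings.py` could turn them into settings as shown. Loading the .env file into the environment is usually handled by a library such as python-dotenv; which one this project uses is not stated.

```python
import os
from typing import Mapping
from urllib.parse import unquote, urlparse


def database_settings(url: str) -> dict:
    """Convert a postgresql:// URL into a Django DATABASES["default"] entry."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        # Credentials may be percent-encoded in the URL.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the hypothetical OPENAI_API_KEY and DATABASE_URL variables."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return {
        "OPENAI_API_KEY": api_key,
        "DATABASES": {"default": database_settings(env["DATABASE_URL"])},
    }
```

Failing fast on a missing key surfaces a misconfigured .env at startup rather than on the first search request.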

Usage Example

# Example: find job descriptions that match a candidate profile
query = "Experienced Python developer with machine learning background and strong communication skills"

# The system will find job descriptions that semantically match this query,
# even if they use different terminology like:
# - "ML engineer with Python expertise"
# - "Data scientist proficient in Python and deep learning"
# - "Software engineer with AI/ML experience"

Key Technologies

pgvector

PostgreSQL extension that adds a vector column type, distance operators, and approximate nearest-neighbor indexes (IVFFlat and HNSW) for vector similarity search.
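
pgvector's core distance operators are `<->` (Euclidean distance), `<#>` (negative inner product), and `<=>` (cosine distance); a similarity query orders rows by one of them and keeps the smallest values. The pure-Python functions below reproduce the value each operator returns, plus a brute-force ranking that mirrors an exact `ORDER BY embedding <=> query LIMIT k` without an index. They illustrate the arithmetic and are not pgvector's implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # pgvector rejects comparisons between vectors of different dimensions.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, the value returned by <->."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """<#> negates the inner product so ascending order puts the largest first."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity, the value returned by <=>."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    if norms == 0:
        return math.nan  # cosine is undefined for a zero vector
    return 1.0 - dot / norms


def nearest(query: Vector, rows, distance=cosine_distance, k: int = 3):
    """Exact k-nearest search over (id, vector) pairs, smallest distance first."""
    return sorted(rows, key=lambda row: distance(row[1], query))[:k]
```

With an HNSW or IVFFlat index, pgvector returns approximate neighbors instead of this exact ranking, trading some recall for speed on large tables.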

OpenAI Embeddings

Django

Use Cases

This demo showcases techniques applicable to:

Learning Resources

This project serves as an educational resource for developers interested in:

Performance Considerations

Future Enhancements

Potential areas for expansion:

Contributing

This is a demonstration project designed for learning. We welcome contributions that:

Visit our GitHub repository to explore the code and contribute!