diff --git a/README.md b/README.md
index 2b397e6..cb4add3 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,8 @@
 # EmbeddingBuddy
 
-A Python Dash application for interactive exploration and visualization of
-embedding vectors through dimensionality reduction techniques.
+A web application for interactive exploration and visualization of embedding
+vectors through dimensionality reduction techniques. Compare documents and prompts
+in the same embedding space to understand semantic relationships.
 
 ![Screenshot of 3d graph and UI for Embedding Buddy](./embedding-buddy-screenshot.png)
 
@@ -9,28 +10,45 @@ embedding vectors through dimensionality reduction techniques.
 
 EmbeddingBuddy provides an intuitive web interface for analyzing high-dimensional
 embedding vectors by applying various dimensionality reduction algorithms and
-visualizing the results in interactive 2D and 3D plots.
+visualizing the results in interactive 2D and 3D plots. The application supports
+dual dataset visualization, allowing you to compare documents and prompts to
+understand how queries relate to your content.
 
 ## Features
 
-- **Dimensionality Reduction**: Support for PCA, t-SNE, and UMAP algorithms
-- **Interactive Visualizations**: 2D and 3D plots using Plotly
-- **Web Interface**: Built with Python Dash for easy accessibility
-- **Vector Analysis**: Tools for exploring embedding vector relationships and
-  patterns
+- **Dual file upload** - separate drag-and-drop for documents and prompts
+- **Multiple dimensionality reduction methods**: PCA, t-SNE, and UMAP
+- **Interactive 2D/3D visualizations** with toggle between views
+- **Color coding options** by category, subcategory, or tags
+- **Visual distinction**: Documents appear as circles, prompts as diamonds with desaturated colors
+- **Prompt visibility toggle** - show/hide prompts to reduce visual clutter
+- **Point inspection** - click points to view full content and identify document vs prompt
+- **Reset functionality** - clear all data to start fresh
+- **Sidebar layout** with controls on left, large visualization area on right
+- **Real-time visualization** optimized for small to medium datasets
 
 ## Data Format
 
-EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files where each line contains an embedding document with the following structure:
+EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files for both documents
+and prompts. Each line contains an embedding with the following structure:
+
+**Documents:**
 
 ```json
 {"id": "doc_001", "embedding": [0.1, -0.3, 0.7, ...], "text": "Sample text content", "category": "news", "subcategory": "politics", "tags": ["election", "politics"]}
 {"id": "doc_002", "embedding": [0.2, -0.1, 0.9, ...], "text": "Another example", "category": "review", "subcategory": "product", "tags": ["tech", "gadget"]}
 ```
 
+**Prompts:**
+
+```json
+{"id": "prompt_001", "embedding": [0.15, -0.28, 0.65, ...], "text": "Find articles about machine learning applications", "category": "search", "subcategory": "technology", "tags": ["AI", "research"]}
+{"id": "prompt_002", "embedding": [0.72, 0.18, -0.35, ...], "text": "Show me product reviews for smartphones", "category": "search", "subcategory": "product", "tags": ["mobile", "reviews"]}
+```
+
 **Required Fields:**
 
-- `embedding`: Array of floating-point numbers representing the vector
+- `embedding`: Array of floating-point numbers representing the vector (must be same dimensionality for both documents and prompts)
 - `text`: String content associated with the embedding
 
 **Optional Fields:**
@@ -40,15 +58,7 @@ EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files where each line con
 - `subcategory`: Secondary classification
 - `tags`: Array of string tags for flexible labeling
 
-## Features
-
-- **Drag-and-drop file upload** for NDJSON embedding datasets
-- **Multiple dimensionality reduction methods**: PCA, t-SNE, and UMAP
-- **Interactive 2D/3D visualizations** with toggle between views
-- **Color coding options** by category, subcategory, or tags
-- **Point inspection** - click points to view full document content
-- **Sidebar layout** with controls on left, large visualization area on right
-- **Real-time visualization** optimized for small to medium datasets
+**Important:** Document and prompt embeddings must have the same number of dimensions to be visualized together.
 
 ## Installation & Usage
 
@@ -68,7 +78,10 @@ uv run python app.py
 
 3. **Open your browser** to http://127.0.0.1:8050
 
-4. **Test with sample data** by dragging and dropping the included `sample_data.ndjson` file
+4. **Test with sample data**:
+ - Upload `sample_data.ndjson` (documents)
+ - Upload `sample_prompts.ndjson` (prompts) to see dual visualization
+ - Use the "Show prompts" toggle to compare how prompts relate to documents
 
 ## Tech Stack
 
diff --git a/embedding-buddy-screenshot.png b/embedding-buddy-screenshot.png
index 89185b7..8ecee66 100644
Binary files a/embedding-buddy-screenshot.png and b/embedding-buddy-screenshot.png differ