update readme and screenshot

2025-08-12 18:53:09 -07:00
parent 64685b9b4f
commit 76be59254c
2 changed files with 33 additions and 20 deletions

@@ -1,7 +1,8 @@
# EmbeddingBuddy
A Python Dash application for interactive exploration and visualization of
embedding vectors through dimensionality reduction techniques.
A web application for interactive exploration and visualization of embedding
vectors through dimensionality reduction techniques. Compare documents and prompts
in the same embedding space to understand semantic relationships.
![Screenshot of 3d graph and UI for Embedding Buddy](./embedding-buddy-screenshot.png)
@@ -9,28 +10,45 @@ embedding vectors through dimensionality reduction techniques.
EmbeddingBuddy provides an intuitive web interface for analyzing high-dimensional
embedding vectors by applying various dimensionality reduction algorithms and
visualizing the results in interactive 2D and 3D plots.
visualizing the results in interactive 2D and 3D plots. The application supports
dual dataset visualization, allowing you to compare documents and prompts to
understand how queries relate to your content.
## Features
- **Dimensionality Reduction**: Support for PCA, t-SNE, and UMAP algorithms
- **Interactive Visualizations**: 2D and 3D plots using Plotly
- **Web Interface**: Built with Python Dash for easy accessibility
- **Vector Analysis**: Tools for exploring embedding vector relationships and
patterns
- **Dual file upload** - separate drag-and-drop for documents and prompts
- **Multiple dimensionality reduction methods**: PCA, t-SNE, and UMAP
- **Interactive 2D/3D visualizations** with toggle between views
- **Color coding options** by category, subcategory, or tags
- **Visual distinction**: Documents appear as circles, prompts as diamonds with desaturated colors
- **Prompt visibility toggle** - show/hide prompts to reduce visual clutter
- **Point inspection** - click points to view full content and identify document vs prompt
- **Reset functionality** - clear all data to start fresh
- **Sidebar layout** with controls on left, large visualization area on right
- **Real-time visualization** optimized for small to medium datasets
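
The visual distinction between documents and prompts described above could be sketched as follows. This is a minimal illustration, not the application's actual code: the function names `desaturate` and `marker_style`, the 0.5 default factor, and the use of HLS saturation are all assumptions; only the circle/diamond shapes and the desaturated prompt colors come from the feature list.

```python
import colorsys


def desaturate(hex_color: str, factor: float = 0.5) -> str:
    """Scale the HLS saturation of a '#rrggbb' color by `factor` (0.0 to 1.0)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    r, g, b = colorsys.hls_to_rgb(h, l, s * factor)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


def marker_style(kind: str, base_color: str) -> dict:
    """Return a marker description for a point (hypothetical helper).

    Documents keep the base color and use circles; prompts use diamonds
    with a desaturated version of the same color.
    """
    if kind == "prompt":
        return {"symbol": "diamond", "color": desaturate(base_color)}
    return {"symbol": "circle", "color": base_color}
```

A dictionary of this shape could be passed as the `marker` argument of a Plotly scatter trace, since `circle` and `diamond` are both valid Plotly marker symbols.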
## Data Format
EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files where each line contains an embedding document with the following structure:
EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files for both documents
and prompts. Each line contains an embedding with the following structure:
**Documents:**
```json
{"id": "doc_001", "embedding": [0.1, -0.3, 0.7, ...], "text": "Sample text content", "category": "news", "subcategory": "politics", "tags": ["election", "politics"]}
{"id": "doc_002", "embedding": [0.2, -0.1, 0.9, ...], "text": "Another example", "category": "review", "subcategory": "product", "tags": ["tech", "gadget"]}
```
**Prompts:**
```json
{"id": "prompt_001", "embedding": [0.15, -0.28, 0.65, ...], "text": "Find articles about machine learning applications", "category": "search", "subcategory": "technology", "tags": ["AI", "research"]}
{"id": "prompt_002", "embedding": [0.72, 0.18, -0.35, ...], "text": "Show me product reviews for smartphones", "category": "search", "subcategory": "product", "tags": ["mobile", "reviews"]}
```
**Required Fields:**
- `embedding`: Array of floating-point numbers representing the vector
- `embedding`: Array of floating-point numbers representing the vector (must have the same dimensionality for both documents and prompts)
- `text`: String content associated with the embedding
**Optional Fields:**
@@ -40,15 +58,7 @@ EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files where each line con
- `subcategory`: Secondary classification
- `tags`: Array of string tags for flexible labeling
## Features
- **Drag-and-drop file upload** for NDJSON embedding datasets
- **Multiple dimensionality reduction methods**: PCA, t-SNE, and UMAP
- **Interactive 2D/3D visualizations** with toggle between views
- **Color coding options** by category, subcategory, or tags
- **Point inspection** - click points to view full document content
- **Sidebar layout** with controls on left, large visualization area on right
- **Real-time visualization** optimized for small to medium datasets
**Important:** Document and prompt embeddings must have the same number of dimensions to be visualized together.
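
To check a pair of files against the format above before uploading, something like the following could be used. It is a rough sketch using only the standard library; the helper names (`load_ndjson`, `embedding_dimension`, `check_compatible`) are hypothetical and are not part of EmbeddingBuddy itself. It treats `embedding` and `text` as the required fields, as listed above.

```python
import json

# Required fields as listed in the Data Format section.
REQUIRED_FIELDS = ("embedding", "text")


def load_ndjson(lines):
    """Parse an iterable of NDJSON lines into a list of dict records."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing field(s) {missing}")
        records.append(record)
    return records


def embedding_dimension(records):
    """Return the single embedding length shared by all records."""
    dims = {len(record["embedding"]) for record in records}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()


def check_compatible(documents, prompts):
    """Raise ValueError unless documents and prompts share one dimensionality."""
    doc_dim = embedding_dimension(documents)
    prompt_dim = embedding_dimension(prompts)
    if doc_dim != prompt_dim:
        raise ValueError(
            f"documents have {doc_dim} dimensions, prompts have {prompt_dim}"
        )
    return doc_dim
```

An open file object can be passed directly as `lines`, for example `load_ndjson(open("sample_data.ndjson", encoding="utf-8"))`.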
## Installation & Usage
@@ -68,7 +78,10 @@ uv run python app.py
3. **Open your browser** to http://127.0.0.1:8050
4. **Test with sample data** by dragging and dropping the included `sample_data.ndjson` file
4. **Test with sample data**:
- Upload `sample_data.ndjson` (documents)
- Upload `sample_prompts.ndjson` (prompts) to see dual visualization
- Use the "Show prompts" toggle to compare how prompts relate to documents
## Tech Stack