EmbeddingBuddy

A Python Dash application for interactive exploration and visualization of embedding vectors through dimensionality reduction techniques.

Overview

EmbeddingBuddy provides an intuitive web interface for analyzing high-dimensional embedding vectors by applying various dimensionality reduction algorithms and visualizing the results in interactive 2D and 3D plots.

Features

  • Dimensionality Reduction: Support for PCA, t-SNE, and UMAP algorithms
  • Interactive Visualizations: 2D and 3D plots using Plotly
  • Web Interface: Built with Python Dash for easy accessibility
  • Vector Analysis: Tools for exploring embedding vector relationships and patterns

Data Format

EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files where each line contains an embedding document with the following structure:

{"id": "doc_001", "embedding": [0.1, -0.3, 0.7, ...], "text": "Sample text content", "category": "news", "subcategory": "politics", "tags": ["election", "politics"]}
{"id": "doc_002", "embedding": [0.2, -0.1, 0.9, ...], "text": "Another example", "category": "review", "subcategory": "product", "tags": ["tech", "gadget"]}

Required Fields:

  • embedding: Array of floating-point numbers representing the vector
  • text: String content associated with the embedding

Optional Fields:

  • id: Unique identifier (auto-generated if missing)
  • category: Primary classification
  • subcategory: Secondary classification
  • tags: Array of string tags for flexible labeling
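The field rules above can be enforced while an upload is read. The following is a minimal sketch, not EmbeddingBuddy's actual loader. Several details are assumptions made for illustration:

  • the function name load_ndjson
  • the "doc_001", "doc_002", ... scheme for filling missing ids
  • the empty-list default for tags
  • the requirement that every embedding has the same length

```python
import json
from typing import Any, Iterable


def load_ndjson(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse NDJSON embedding documents and enforce the required/optional fields.

    Hypothetical helper. Assumptions: missing ids become "doc_001", "doc_002", ...
    by position, missing tags default to [], and all embeddings share one length.
    """
    docs: list[dict[str, Any]] = []
    dim = None
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        doc = json.loads(line)
        if not isinstance(doc, dict):
            raise ValueError(f"line {line_no}: each line must be a JSON object")
        emb = doc.get("embedding")
        # Required: "embedding" is a non-empty array of numbers (JSON true/false rejected).
        if not isinstance(emb, list) or not emb or not all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in emb
        ):
            raise ValueError(f"line {line_no}: 'embedding' must be a non-empty array of numbers")
        # Required: "text" is a string.
        if not isinstance(doc.get("text"), str):
            raise ValueError(f"line {line_no}: 'text' must be a string")
        # Assumed rule: all vectors share one dimensionality so they can be projected together.
        if dim is None:
            dim = len(emb)
        elif len(emb) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(emb)}")
        # Optional fields: fill a missing id (hypothetical scheme) and default tags to [].
        doc.setdefault("id", f"doc_{len(docs) + 1:03d}")
        doc.setdefault("tags", [])
        docs.append(doc)
    return docs


sample = [
    '{"id": "doc_001", "embedding": [0.1, -0.3, 0.7], "text": "Sample text content", "category": "news"}',
    '{"embedding": [0.2, -0.1, 0.9], "text": "Another example"}',
]
print([d["id"] for d in load_ndjson(sample)])  # ['doc_001', 'doc_002']
```

Because the helper accepts any iterable of strings, an open file object such as the one returned by open("sample_data.ndjson") can be passed directly.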

Features (Initial Version)

  • Drag-and-drop file upload for NDJSON embedding datasets
  • PCA dimensionality reduction (automatically applied)
  • Interactive 2D/3D visualizations with toggle between views
  • Color coding options by category, subcategory, or tags
  • Point inspection: click points to view full document content
  • Real-time visualization optimized for small to medium datasets
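The automatic PCA step and the 2D/3D toggle amount to projecting each vector onto its top two or three principal components. The app itself uses scikit-learn's PCA. The sketch below uses NumPy's SVD instead so the computation is visible, and its axes may differ from scikit-learn's by a sign flip.

The helper names pca_project and color_labels are hypothetical. Coloring a multi-tag document by its first tag is an assumed policy, not documented behavior.

```python
import numpy as np


def pca_project(vectors, n_components: int = 2) -> np.ndarray:
    """Project embedding vectors onto their top principal components.

    Hypothetical helper: n_components=2 gives the 2D view, 3 the 3D view.
    Uses NumPy SVD rather than scikit-learn, so axis signs may differ.
    """
    if n_components not in (2, 3):
        raise ValueError("n_components must be 2 or 3")
    X = np.asarray(vectors, dtype=float)
    if X.ndim != 2 or min(X.shape) < n_components:
        raise ValueError("need a 2D array with at least n_components rows and columns")
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    # Coordinates of each point along the leading directions.
    return X_centered @ vt[:n_components].T


def color_labels(docs, field: str) -> list[str]:
    """Return one color label per document for the chosen field.

    Assumption: a point takes a single color, so multi-tag documents are
    colored by their first tag; documents without a value get "unlabeled".
    """
    if field not in ("category", "subcategory", "tags"):
        raise ValueError("field must be 'category', 'subcategory', or 'tags'")
    labels = []
    for doc in docs:
        value = doc.get(field)
        if field == "tags":
            value = value[0] if value else None
        labels.append(str(value) if value else "unlabeled")
    return labels
```

The returned coordinates and labels could then be handed to Plotly Express. For example, px.scatter(x=coords[:, 0], y=coords[:, 1], color=labels) would produce the 2D view, and px.scatter_3d with an added z column would produce the 3D view.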

Installation & Usage

This project uses uv for dependency management.

  1. Install dependencies:
       uv sync
  2. Run the application:
       uv run python app.py
  3. Open your browser to http://127.0.0.1:8050
  4. Test with sample data by dragging and dropping the included sample_data.ndjson file

Tech Stack

  • Python Dash: Web application framework
  • Plotly: Interactive plotting and visualization
  • scikit-learn: PCA implementation
  • NumPy/Pandas: Data manipulation and analysis
  • uv: Modern Python package and project manager

Description
A webapp that lets you view your embedding vectors
Latest release: v0.3.0 (2025-08-14 19:08:01 -07:00)