Latent Space Cartography

Latent Space Cartography applies standard TransE-style relational displacement analysis to frozen general-purpose text embeddings over Wikidata. Relations run as cheap vector displacements on embeddings that were never trained for them. Along the way, the work surfaced a silent production defect that makes mxbai-embed-large nearly useless for any text with diacritics.

1 · Relational inference on frozen embeddings

Relations are implemented as displacement-vector operations (h + r ≈ t) on existing embeddings, which is orders of magnitude cheaper than full model inference.
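
As a rough illustration of h + r ≈ t, the sketch below estimates a relation vector and uses it to pick a tail entity. It assumes r is the mean of (t - h) over known pairs and that candidates are ranked by cosine similarity; both choices, the function names, and the tiny hand-made vectors are hypothetical and are not taken from the project.

```python
import numpy as np

def relation_vector(heads, tails):
    """Estimate r as the mean displacement t - h over known (head, tail) pairs."""
    return (np.asarray(tails, float) - np.asarray(heads, float)).mean(axis=0)

def predict_tail(head, r, candidates):
    """Return the index of the candidate most cosine-similar to head + r."""
    query = np.asarray(head, float) + r
    cands = np.asarray(candidates, float)
    sims = cands @ query / (np.linalg.norm(cands, axis=1) * np.linalg.norm(query))
    return int(np.argmax(sims))

# Toy stand-ins for frozen embeddings (made up, not real model output).
heads = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [1, 1, 0, 0]], dtype=float)
tails = heads + np.array([0, 0, 0, 1], dtype=float)

# Fit r on the first three pairs, then query the held-out fourth head.
r = relation_vector(heads[:3], tails[:3])
best = predict_tail(heads[3], r, tails)
```

Inference here is one vector addition plus one matrix-vector product over the candidate set, with no forward pass through the embedding model.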

2 · Cross-model relational structure

Three independent general-purpose models (mxbai-embed-large, nomic-embed-text, all-minilm) encode the same 30 universal relations as consistent vector displacements, which points to a property of the semantic relationships rather than of any single model.
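
One way to make "consistent vector displacement" measurable is sketched below. Because the models have different embedding dimensions, this sketch scores each model separately and compares the scores. The metric (mean pairwise cosine similarity among the t - h vectors of one relation) is an assumption for illustration, not necessarily the project's metric, and the two toy "models" are hand-made arrays.

```python
import numpy as np

def displacement_consistency(heads, tails):
    """Mean pairwise cosine similarity among the displacement vectors t - h."""
    d = np.asarray(tails, float) - np.asarray(heads, float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit-length displacements
    sims = d @ d.T
    n = len(d)
    # Drop the diagonal (self-similarity is 1) and average the remaining entries.
    return float((sims.sum() - n) / (n * (n - 1)))

# Hypothetical model A (2-d): every pair shares the same displacement.
heads_a = np.array([[0, 0], [1, 0], [0, 1]], dtype=float)
tails_a = heads_a + np.array([1, 1], dtype=float)

# Hypothetical model B (3-d): displacements are mutually orthogonal.
heads_b = np.zeros((3, 3))
tails_b = np.eye(3)

score_a = displacement_consistency(heads_a, tails_a)
score_b = displacement_consistency(heads_b, tails_b)
```

A score near 1 means the relation behaves like a single shared offset in that model, while a score near 0 means the pairwise offsets point in unrelated directions.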

3 · A silent production defect

The [UNK]-token dominance defect causes 147,687 cross-entity embedding collisions on text with diacritics when mxbai-embed-large is served via Ollama. Standard benchmarks such as MTEB miss it.
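
A collision check of this kind can be sketched as follows. The sketch assumes a collision means that two distinct texts receive the same embedding after rounding; the project's exact definition is not stated here. The function name, the rounding tolerance, and the sample vectors are hypothetical, and the vectors are not real model output.

```python
from collections import defaultdict
import numpy as np

def find_collisions(embeddings, decimals=6):
    """Group distinct texts whose embeddings agree after rounding."""
    buckets = defaultdict(list)
    for text, vec in embeddings.items():
        key = tuple(np.round(np.asarray(vec, float), decimals))
        buckets[key].append(text)
    return [sorted(texts) for texts in buckets.values() if len(texts) > 1]

# Made-up vectors: two unrelated diacritic-heavy names share one embedding.
embeddings = {
    "Dvořák": [0.1, 0.2],
    "Łódź": [0.1, 0.2],
    "Paris": [0.9, 0.3],
}

collisions = find_collisions(embeddings)
# Each group of n colliding texts contributes n * (n - 1) / 2 colliding pairs.
pair_count = sum(len(g) * (len(g) - 1) // 2 for g in collisions)
```

Running a check like this over entity labels from the serving stack is cheap, and it catches a failure mode that aggregate benchmark scores can hide.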
