Entity Resolution: Connecting the Intelligence Dots
How Prismatic resolves entities across multiple data sources using graph databases, confidence scoring, and the Nabla epistemic framework to build verified intelligence profiles.
Tomas Korcak (korczis)
Prismatic Platform
When you query a company name across 157 OSINT sources, you get back hundreds of records. Some are duplicates. Some refer to different entities with the same name. Some contain partial information that only makes sense when combined. Entity resolution is the process of determining which records refer to the same real-world entity and merging them into a coherent profile.
The Entity Resolution Pipeline
Prismatic's entity resolution pipeline operates in four stages:
Stage 1: Normalization
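Raw records from different sources use different formats: the same company can appear with or without its legal form, in different letter cases, with irregular spacing, or with and without diacritics.
Normalization strips prefixes, suffixes, and legal form designators, standardizes case, and expands abbreviations. The excerpt below covers legal forms, case, whitespace, and diacritics: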
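```elixir
defmodule PrismaticOsintCore.EntityNormalizer do
  @legal_forms ~w(s.r.o. a.s. k.s. v.o.s. s.p. z.s. o.p.s.)
  # Downcase before stripping so "S.R.O." matches the lowercase list above.
  def normalize_company(name) do
    name
    |> String.trim()
    |> String.downcase()
    |> remove_legal_form()
    |> transliterate_diacritics()
    |> collapse_whitespace()
  end
  # Strip a trailing legal form, including a separating comma ("Navigara, s.r.o.").
  defp remove_legal_form(name) do
    Enum.reduce(@legal_forms, name, fn form, acc ->
      String.replace(acc, ~r/,?\s+#{Regex.escape(form)}$/u, "")
    end)
  end
  # NFD splits "č" into "c" plus a combining caron; drop the combining marks.
  defp transliterate_diacritics(name),
    do: name |> :unicode.characters_to_nfd_binary() |> String.replace(~r/\p{Mn}/u, "")
  defp collapse_whitespace(name), do: name |> String.split() |> Enum.join(" ")
end
```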
Stage 2: Candidate Generation
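Comparing every record against every other record is O(n^2): 10,000 records yield n(n-1)/2 = 49,995,000 unique pairs, roughly 50 million comparisons (100 million if every ordered pair is counted). We use blocking strategies to reduce the search space, grouping records by a shared key and comparing only records within the same group.
This restricts comparisons to candidate pairs that have a reasonable chance of matching.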
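As a minimal sketch of blocking, assume records are maps with an already-normalized `:name` and use a hypothetical blocking key: the first three characters of that name. The article does not say which keys Prismatic uses, and `BlockingSketch` with its functions is an illustrative name, not part of the platform.

```elixir
defmodule BlockingSketch do
  # Hypothetical blocking key: first three characters of the normalized name.
  def blocking_key(%{name: name}), do: String.slice(name, 0, 3)

  # Group records by key and emit unique pairs only within each group.
  def candidate_pairs(records) do
    records
    |> Enum.group_by(&blocking_key/1)
    |> Map.values()
    |> Enum.flat_map(&unique_pairs/1)
  end

  # Number of unique pairs without blocking: n(n - 1) / 2.
  def all_pairs_count(n), do: div(n * (n - 1), 2)

  # Every unordered pair {a, b} from a list, each pair emitted once.
  defp unique_pairs([]), do: []
  defp unique_pairs([head | tail]), do: Enum.map(tail, &{head, &1}) ++ unique_pairs(tail)
end
```

`BlockingSketch.all_pairs_count(10_000)` returns 49,995,000. A single prefix key trades recall for speed (two spellings that differ in their first letters never meet), which is one reason to combine several blocking keys in practice.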
Stage 3: Similarity Scoring
For each candidate pair, we compute similarity across multiple dimensions and combine the per-dimension scores into a single weighted score.
A weighted score above 0.85 triggers an automatic merge. Between 0.65 and 0.85, the pair is flagged for human review. Below 0.65, the entities are treated as distinct.
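The scoring and thresholds can be sketched as follows. The 0.85 and 0.65 thresholds come from the text. Several details are assumptions for illustration only, since the article does not list them: the dimensions (`:name`, `:ico`, `:address`), their weights, and the choice of Elixir's built-in `String.jaro_distance/2` for name similarity.

```elixir
defmodule ScoringSketch do
  # Hypothetical dimensions and weights (they sum to 1.0).
  @weights %{name: 0.5, ico: 0.3, address: 0.2}

  # Per-dimension similarity in [0.0, 1.0]: Jaro for names, exact match otherwise.
  def similarity(:name, a, b) when is_binary(a) and is_binary(b), do: String.jaro_distance(a, b)
  def similarity(_field, same, same) when not is_nil(same), do: 1.0
  def similarity(_field, _a, _b), do: 0.0

  # Weighted sum of the per-dimension similarities.
  def weighted_score(rec_a, rec_b) do
    Enum.reduce(@weights, 0.0, fn {field, weight}, acc ->
      acc + weight * similarity(field, Map.get(rec_a, field), Map.get(rec_b, field))
    end)
  end

  # Thresholds from the text; exact boundary values (0.85, 0.65) go to review here.
  def decide(score) when score > 0.85, do: :merge
  def decide(score) when score >= 0.65, do: :review
  def decide(_score), do: :distinct
end
```

A missing value scores 0.0 in this sketch, so a record without an ICO can never reach the merge threshold on the ICO dimension alone; the article does not say how Prismatic treats missing fields.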
Stage 4: Graph Integration
Resolved entities are stored in KuzuDB, a graph database that captures relationships:
```text
[Company A] ──owns──► [Company B]
     │                     │
     └──director──► [Person X] ◄──shareholder── [Company C]
```
Graph queries reveal relationships that are invisible in tabular data: ownership chains, circular ownership, beneficial ownership through intermediaries.
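In Prismatic these queries run in KuzuDB. As a rough in-memory illustration of what an ownership-chain and circular-ownership query computes (this is not KuzuDB's API), the sketch below represents `owns` edges as a map from owner to owned entities and walks them depth-first. `OwnershipSketch` is a hypothetical module that ignores `director` and `shareholder` edges.

```elixir
defmodule OwnershipSketch do
  # In-memory stand-in for the graph, not KuzuDB: owner => list of owned entities.

  # Every entity reachable from `start` via "owns" edges (depth-first, list as stack).
  def ownership_chain(graph, start), do: walk(graph, Map.get(graph, start, []), MapSet.new())

  # Circular ownership: an entity that directly or indirectly owns itself.
  def circular?(graph, entity), do: MapSet.member?(ownership_chain(graph, entity), entity)

  defp walk(_graph, [], seen), do: seen

  defp walk(graph, [node | rest], seen) do
    if MapSet.member?(seen, node) do
      walk(graph, rest, seen)
    else
      walk(graph, Map.get(graph, node, []) ++ rest, MapSet.put(seen, node))
    end
  end
end
```

The `seen` set both deduplicates results and stops the walk from looping forever on a cycle, which is exactly the structure a circular-ownership check needs to detect.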
Confidence with Nabla
Every entity resolution carries uncertainty. The Nabla epistemic framework quantifies this:
```elixir
%NablaConfidence{
  value: 0.87,
  epistemic: 0.05,  # Uncertainty from incomplete data
  aleatoric: 0.08,  # Uncertainty from inherent ambiguity
  sources: [:ares, :justice, :whois],
  evidence_count: 12
}
```
Epistemic uncertainty decreases as more data becomes available. If we have only a name match, epistemic uncertainty is high; adding a match on the ICO (the Czech company identification number) reduces it.
Aleatoric uncertainty reflects inherent ambiguity. Two companies with the same name in the same city might genuinely be different entities. No amount of additional data eliminates this uncertainty.
The distinction matters for decision-making: epistemic uncertainty suggests we should gather more data, while aleatoric uncertainty suggests we should present both possibilities to the analyst.
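This decision rule can be sketched minimally. The sketch assumes a stand-in `NablaConfidence` struct with only the fields shown above, plus a hypothetical `NablaTriage.next_action/1` with an arbitrary 0.05 tolerance. Neither reflects Nabla's actual API.

```elixir
defmodule NablaConfidence do
  # Hypothetical stand-in struct with only the fields shown in the article.
  defstruct value: 0.0, epistemic: 0.0, aleatoric: 0.0, sources: [], evidence_count: 0
end

defmodule NablaTriage do
  # Hypothetical tolerance below which remaining uncertainty is accepted.
  @tolerance 0.05

  # Epistemic-dominated -> gather more data; aleatoric-dominated -> show both readings.
  def next_action(%NablaConfidence{epistemic: e, aleatoric: a}) do
    cond do
      max(e, a) <= @tolerance -> :accept
      e >= a -> :gather_more_data
      true -> :present_alternatives
    end
  end
end
```

With the values from the struct above (epistemic 0.05, aleatoric 0.08), `next_action/1` returns `:present_alternatives`, because inherent ambiguity dominates and more data would not resolve it.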
Cross-Source Verification
Prismatic's 157 OSINT adapters span six source categories.
When an entity appears in multiple categories with consistent information, confidence increases. When sources conflict (e.g., different addresses), the system flags the discrepancy for investigation.
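The sketch below shows one way to check for conflicts and corroboration. It assumes each record is a map tagged with its `:source` (atoms such as `:ares` and `:justice`, as in the confidence struct above) and that the caller passes in the fields to compare. `CrossSourceSketch` and its functions are hypothetical.

```elixir
defmodule CrossSourceSketch do
  # Records are maps tagged with :source, e.g. %{source: :ares, address: "..."}.

  # Fields whose non-nil values disagree across sources => their distinct values.
  def conflicts(records, fields) do
    fields
    |> Enum.map(fn field -> {field, distinct_values(records, field)} end)
    |> Enum.filter(fn {_field, values} -> length(values) > 1 end)
    |> Map.new()
  end

  # How many distinct sources report exactly this value for a field.
  def corroboration(records, field, value) do
    records
    |> Enum.filter(&(Map.get(&1, field) == value))
    |> Enum.map(& &1.source)
    |> Enum.uniq()
    |> length()
  end

  defp distinct_values(records, field) do
    records |> Enum.map(&Map.get(&1, field)) |> Enum.reject(&is_nil/1) |> Enum.uniq()
  end
end
```

A source that simply lacks a field is not treated as a conflict; only two different non-nil values are. Exact string comparison is a simplification, and addresses would normally be normalized first.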
Real-World Example
A due diligence investigation on "Navigara s.r.o." produces:
2. Justice Registry: Same ICO, 2 directors, 1 shareholder (foreign entity)
3. Trade Register: Same ICO, industry code 6201 (IT services)
4. Insolvency Registry: No records (positive signal)
5. Domain WHOIS: navigara.cz registered to same address
6. LinkedIn: Company page with 15 employees
Entity resolution merges these into a single profile with confidence 0.94 (high -- multiple identifier matches across official registries). The graph database records the shareholder relationship to the foreign entity, enabling ownership chain analysis.
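Here is a hypothetical sketch of the merge step. It assumes the records have already been matched (for example on the shared ICO, as the registry hits above are) and are maps tagged with `:source`. The profile keeps the first non-nil value per field and accumulates the contributing sources. `ProfileMergeSketch` is an illustrative name. It does not compute the 0.94 confidence, and a real merge would also need to surface conflicting values rather than silently keep the first one.

```elixir
defmodule ProfileMergeSketch do
  # Fold already-matched records into one profile: keep the first non-nil value
  # per field and accumulate contributing sources. Conflicts are not surfaced here.
  def merge(records) do
    Enum.reduce(records, %{sources: []}, fn record, profile ->
      {source, fields} = Map.pop(record, :source)

      profile
      |> Map.merge(fields, fn _key, existing, new -> if is_nil(existing), do: new, else: existing end)
      |> Map.update!(:sources, &(&1 ++ [source]))
    end)
  end
end
```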
Conclusion
Entity resolution transforms raw intelligence from multiple sources into verified, confidence-scored entity profiles. The combination of blocking strategies for performance, multi-dimensional similarity scoring for accuracy, and the Nabla epistemic framework for uncertainty quantification produces profiles that analysts can trust -- and understand the limits of.
Explore the [OSINT Capabilities](/osint/) or try the [Interactive Labs](/lab/) for hands-on entity resolution exercises.