Hawkeye: Visitor Intelligence with Privacy by Design
Building a visitor analytics system with IP geolocation, browser fingerprinting, session tracking, and anomaly detection while maintaining GDPR compliance.
Tomas Korcak (korczis)
Prismatic Platform
Web analytics tools tell you what happened. Visitor intelligence tells you what it means. Hawkeye is Prismatic's visitor intelligence subsystem, designed to transform raw HTTP request data into actionable insights about who visits your application, where they come from, and whether their behavior is normal.
Why Build Our Own
We evaluated existing solutions: Plausible, Matomo, PostHog. Each has merit, but none satisfied our requirements simultaneously:
2. Real-time anomaly detection: We need to identify suspicious patterns as they happen, not in a daily report.
3. Integration with OSINT: Visitor intelligence feeds into our broader intelligence pipeline. A suspicious IP should automatically trigger enrichment through our 157 OSINT adapters.
4. GDPR by architecture: Privacy compliance should be a structural property of the system, not a checkbox.
Architecture
Hawkeye operates as two umbrella apps: prismatic_hawkeye (backend logic) and prismatic_hawkeye_web (LiveView dashboards). The data flow is:
HTTP Request → Plug Pipeline → Event Collector → Enrichment → Storage → Analysis → Dashboard
Event Collection
Every request passes through a Plug that extracts visitor metadata without blocking the response:
    defmodule PrismaticHawkeye.Plug.Collector do
      @moduledoc """
      Non-blocking visitor event collection plug.
      Extracts request metadata and dispatches to the
      async enrichment pipeline.
      """
      @behaviour Plug

      @impl true
      def init(opts), do: opts

      @impl true
      def call(conn, _opts) do
        event = %{
          ip_hash: hash_ip(conn.remote_ip),
          user_agent: get_user_agent(conn),
          path: conn.request_path,
          method: conn.method,
          referer: get_header(conn, "referer"),
          accept_language: get_header(conn, "accept-language"),
          timestamp: DateTime.utc_now()
        }

        # Fire-and-forget: start_child/2 sends no reply message back to the
        # request process, unlike async_nolink/2.
        Task.Supervisor.start_child(
          PrismaticHawkeye.TaskSupervisor,
          fn -> PrismaticHawkeye.Pipeline.process(event) end
        )

        conn
      end

      defp get_user_agent(conn), do: get_header(conn, "user-agent")

      # Returns the first value of a request header, or nil if it is absent.
      defp get_header(conn, name) do
        conn |> Plug.Conn.get_req_header(name) |> List.first()
      end

      # Salted SHA-256 of the textual IP, truncated to 16 hex characters.
      defp hash_ip(ip_tuple) do
        ip_string = :inet.ntoa(ip_tuple) |> to_string()
        salt = Application.get_env(:prismatic_hawkeye, :ip_salt)

        :crypto.hash(:sha256, "#{salt}:#{ip_string}")
        |> Base.encode16(case: :lower)
        |> binary_part(0, 16)
      end
    end
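Notice the critical design decision: we never store raw IP addresses. The IP is immediately hashed with a rotating salt. This gives us consistent visitor identification within a salt rotation period (default: 24 hours). Recovering the original IP from a stored hash requires the salt: the IPv4 address space is small enough to brute-force once the salt is known, so the protection depends on keeping the salt secret and discarding it after rotation.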
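The rotation itself is not shown in the collector, which only reads `:ip_salt` from the application environment. A minimal sketch of one way to rotate, assuming a random salt that is regenerated when the UTC date changes and never persisted, could look like this. The module name `SaltRotationSketch` and both function names are hypothetical, not part of Hawkeye.

```elixir
defmodule SaltRotationSketch do
  @moduledoc """
  Hypothetical sketch of daily salt rotation. State is a `{date, salt}` tuple;
  a fresh random salt replaces the old one when the date changes.
  """

  # Keep the current salt while the date is unchanged.
  def rotate({date, _salt} = state, today) when date == today, do: state

  # First call (any non-matching state) or a new day: generate a new salt.
  # The previous salt is dropped, so old hashes can no longer be recomputed.
  def rotate(_state, today), do: {today, :crypto.strong_rand_bytes(32)}

  # Same hashing scheme as the collector: salted SHA-256, 16 hex characters.
  def hash_ip(ip_tuple, salt) do
    ip_string = ip_tuple |> :inet.ntoa() |> to_string()

    :crypto.hash(:sha256, [salt, ":", ip_string])
    |> Base.encode16(case: :lower)
    |> binary_part(0, 16)
  end
end
```

With this scheme the same IP maps to the same hash for the whole day, and to an unrelated hash after rotation. The trade-off is that a node restart also loses the salt, which splits that day's visitors into two sets of hashes.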
IP Geolocation
Geolocation runs against a local MaxMind GeoLite2 database, ensuring no external API calls:
    defmodule PrismaticHawkeye.Geo do
      @moduledoc """
      Local IP geolocation using MaxMind GeoLite2 database.
      No external API calls. Database refreshed weekly.
      """

      @spec locate(tuple()) :: {:ok, map()} | {:error, :not_found}
      def locate(ip_tuple) do
        case Geolix.lookup(ip_tuple, where: :city) do
          %{city: city, country: country, location: location} ->
            {:ok, %{
              country_code: country.iso_code,
              country_name: country.name,
              city: city.name,
              latitude: location.latitude,
              longitude: location.longitude,
              accuracy_radius_km: location.accuracy_radius
            }}

          _ ->
            {:error, :not_found}
        end
      end
    end
Geolocation data is computed from the raw IP before hashing. The result is stored alongside the hashed IP, so subsequent analytics queries can answer "how many visitors from Prague?" without ever knowing actual IP addresses.
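The ordering described here (locate first, then hash, then drop the raw IP) can be sketched as a single enrichment step. This is an illustration only: the collector shown earlier hashes the IP inside the plug, so where the lookup actually runs is an assumption, and `EnrichmentOrderSketch`, the `:remote_ip` key, and the injected `locate` and `hash` functions are hypothetical stand-ins.

```elixir
defmodule EnrichmentOrderSketch do
  @moduledoc """
  Hypothetical sketch: resolve geolocation from the raw IP, hash the IP,
  and remove the raw value before the event goes anywhere else.
  """

  # `locate` returns {:ok, geo} | {:error, reason}; `hash` returns a string.
  def enrich(%{remote_ip: ip} = raw_event, locate, hash) do
    geo =
      case locate.(ip) do
        {:ok, geo} -> geo
        {:error, _reason} -> nil
      end

    raw_event
    |> Map.delete(:remote_ip)
    |> Map.put(:ip_hash, hash.(ip))
    |> Map.put(:geo, geo)
  end
end
```

Because the returned map no longer has a `:remote_ip` key, nothing downstream of this step can store the raw address by accident.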
Browser Fingerprinting
We compute a lightweight browser fingerprint from headers alone -- no JavaScript required:
    defmodule PrismaticHawkeye.Fingerprint do
      @moduledoc """
      Server-side browser fingerprinting from HTTP headers.
      Generates a consistent device identifier without client-side JavaScript.
      """

      @spec compute(map()) :: String.t()
      def compute(event) do
        components = [
          event.user_agent,
          event.accept_language,
          extract_ua_platform(event.user_agent),
          extract_ua_browser(event.user_agent)
        ]

        :crypto.hash(:sha256, Enum.join(components, "|"))
        |> Base.encode16(case: :lower)
        |> binary_part(0, 12)
      end
    end
This yields lower uniqueness than JavaScript-based fingerprinting (which can access canvas, WebGL, and installed fonts), but it requires zero client-side code and works even when JavaScript is disabled. For our use case -- distinguishing device types and detecting session anomalies -- the header-based approach provides sufficient granularity.
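The fingerprint module calls `extract_ua_platform/1` and `extract_ua_browser/1` without showing them. A rough sketch of what such helpers might do, using simple substring checks, is below. The lookup tables and the labels they return are assumptions for illustration, and real User-Agent parsing has many more cases than this.

```elixir
defmodule UserAgentSketch do
  @moduledoc """
  Hypothetical substring-based User-Agent classification.
  """

  # Order matters: Edge UAs also contain "Chrome/", and Chrome UAs contain "Safari/".
  @browsers [
    {"Edg/", "edge"},
    {"Firefox/", "firefox"},
    {"Chrome/", "chrome"},
    {"Safari/", "safari"}
  ]

  # Order matters: Android UAs contain "Linux", and iPhone UAs contain "Mac OS X".
  @platforms [
    {"Android", "android"},
    {"iPhone", "ios"},
    {"iPad", "ios"},
    {"Windows", "windows"},
    {"Mac OS X", "macos"},
    {"Linux", "linux"}
  ]

  def extract_ua_browser(ua), do: first_match(ua, @browsers)
  def extract_ua_platform(ua), do: first_match(ua, @platforms)

  # A missing header is classified as "unknown" rather than raising.
  defp first_match(nil, _table), do: "unknown"

  defp first_match(ua, table) do
    Enum.find_value(table, "unknown", fn {needle, label} ->
      if String.contains?(ua, needle), do: label
    end)
  end
end
```

Returning `"unknown"` for a missing or unrecognized header keeps the fingerprint computable for every request, at the cost of grouping all such clients together.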
Session Tracking
Sessions are reconstructed server-side using a combination of the IP hash and browser fingerprint:
    defmodule PrismaticHawkeye.SessionTracker do
      @moduledoc """
      Server-side session reconstruction using IP hash
      and browser fingerprint correlation.
      """
      use GenServer

      @session_timeout_ms :timer.minutes(30)

      @spec track_event(map()) :: {:ok, String.t()}
      def track_event(enriched_event) do
        session_key = "#{enriched_event.ip_hash}:#{enriched_event.fingerprint}"
        GenServer.call(__MODULE__, {:track, session_key, enriched_event})
      end

      @impl true
      def handle_call({:track, key, event}, _from, state) do
        now = System.monotonic_time(:millisecond)

        {session_id, sessions} =
          case Map.get(state.sessions, key) do
            %{last_activity: last} = session when now - last < @session_timeout_ms ->
              updated = %{session |
                last_activity: now,
                page_count: session.page_count + 1,
                pages: [event.path | session.pages]
              }

              {session.id, Map.put(state.sessions, key, updated)}

            _ ->
              id = generate_session_id()

              new_session = %{
                id: id,
                started_at: now,
                last_activity: now,
                page_count: 1,
                pages: [event.path],
                geo: event.geo
              }

              {id, Map.put(state.sessions, key, new_session)}
          end

        {:reply, {:ok, session_id}, %{state | sessions: sessions}}
      end
    end
Anomaly Detection
The anomaly detection system runs continuously, analyzing visitor patterns for three categories of suspicious behavior:
Rate Anomalies
A visitor making 100 requests per minute to different endpoints is likely a bot or scraper:
    defmodule PrismaticHawkeye.Anomaly.RateDetector do
      @moduledoc """
      Detects abnormal request rates using sliding window counters.
      """

      @window_size_ms :timer.minutes(1)
      @threshold_requests 60

      @spec check(String.t(), integer()) :: :normal | {:anomaly, :high_rate, map()}
      def check(ip_hash, current_count) do
        if current_count > @threshold_requests do
          {:anomaly, :high_rate, %{
            ip_hash: ip_hash,
            count: current_count,
            window_ms: @window_size_ms,
            threshold: @threshold_requests
          }}
        else
          :normal
        end
      end
    end
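The detector receives `current_count` as an argument, so the sliding window counter itself lives elsewhere. A minimal sketch of such a counter, kept as a plain map of timestamps per IP hash, is shown below. The module `SlidingWindowSketch` and its `record/3` function are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```elixir
defmodule SlidingWindowSketch do
  @moduledoc """
  Hypothetical sliding window counter. `windows` maps an IP hash to the
  list of request timestamps in milliseconds, newest first.
  """

  @window_ms 60_000

  # Records one request and returns {count_in_window, updated_windows}.
  def record(windows, ip_hash, now_ms) do
    recent =
      windows
      |> Map.get(ip_hash, [])
      # The list is newest first, so we can stop at the first expired entry.
      |> Enum.take_while(fn ts -> now_ms - ts < @window_ms end)

    timestamps = [now_ms | recent]
    {length(timestamps), Map.put(windows, ip_hash, timestamps)}
  end
end
```

The returned count is what would be passed to `check/2` as `current_count`. Storing every timestamp is simple but costs memory proportional to the request rate, so a production counter might use fixed-size buckets instead.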
Geographic Anomalies
A session that originates in Prague and then appears in Tokyo 5 minutes later indicates either a VPN switch or credential sharing:
    defp detect_geo_anomaly(session, new_event) do
      case {session.geo, new_event.geo} do
        {%{country_code: same}, %{country_code: same}} ->
          :normal

        {%{latitude: lat1, longitude: lon1}, %{latitude: lat2, longitude: lon2}} ->
          distance_km = haversine(lat1, lon1, lat2, lon2)
          elapsed_hours = elapsed_time_hours(session.last_activity)
          max_possible_km = elapsed_hours * 900 # ~speed of commercial flight

          if distance_km > max_possible_km do
            {:anomaly, :impossible_travel, %{distance_km: distance_km}}
          else
            :normal
          end

        _ ->
          :normal
      end
    end
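The function above relies on `haversine/4` and `elapsed_time_hours/1`, neither of which is shown. The following is a sketch of the standard haversine great-circle formula, assuming a mean Earth radius of 6371 km. The module name is hypothetical, and `elapsed_hours/2` takes an explicit "now" argument, which differs from the one-argument helper used in the source.

```elixir
defmodule HaversineSketch do
  @moduledoc """
  Hypothetical helpers for the impossible-travel check.
  """

  @earth_radius_km 6371.0

  # Great-circle distance in kilometres between two points given in degrees.
  def haversine(lat1, lon1, lat2, lon2) do
    dlat = to_rad(lat2 - lat1)
    dlon = to_rad(lon2 - lon1)

    a =
      :math.pow(:math.sin(dlat / 2), 2) +
        :math.cos(to_rad(lat1)) * :math.cos(to_rad(lat2)) *
          :math.pow(:math.sin(dlon / 2), 2)

    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    2 * @earth_radius_km * :math.asin(min(1.0, :math.sqrt(a)))
  end

  # Hours between two millisecond timestamps.
  def elapsed_hours(last_ms, now_ms), do: (now_ms - last_ms) / 3_600_000

  defp to_rad(degrees), do: degrees * :math.pi() / 180
end
```

For the Prague-to-Tokyo example, 5 minutes at 900 km/h allows at most 75 km of travel, while the great-circle distance between the two cities is roughly 9,000 km, so the check reports `:impossible_travel`.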
Behavioral Anomalies
Visitors who exclusively hit API endpoints without ever loading a page, or who access admin paths without authentication, trigger behavioral flags.
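No code is given for the behavioral checks, so the following is only a sketch of how the two rules described here might be expressed over a session's page list. The path prefixes `/api/` and `/admin`, the minimum of 5 pages before the API-only rule applies, and the `:authenticated?` session key are all assumptions for illustration.

```elixir
defmodule BehaviorSketch do
  @moduledoc """
  Hypothetical behavioral flags computed from a session map.
  """

  @api_prefix "/api/"
  @admin_prefix "/admin"
  @min_pages 5

  # Returns a list of flags; an empty list means nothing suspicious was found.
  def flags(%{pages: pages} = session) do
    authenticated? = Map.get(session, :authenticated?, false)

    api_only? =
      length(pages) >= @min_pages and
        Enum.all?(pages, &String.starts_with?(&1, @api_prefix))

    admin_probe? =
      not authenticated? and
        Enum.any?(pages, &String.starts_with?(&1, @admin_prefix))

    []
    |> add_if(api_only?, :api_only)
    |> add_if(admin_probe?, :unauthenticated_admin)
  end

  defp add_if(flags, true, flag), do: [flag | flags]
  defp add_if(flags, false, _flag), do: flags
end
```

The minimum page count avoids flagging a legitimate API client after its first request.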
Privacy Compliance
GDPR compliance is enforced at the architectural level through several mechanisms:
Data minimization: We collect the minimum data needed for analysis. No cookies, no localStorage, no tracking pixels.
Pseudonymization: IP addresses are hashed before storage. The salt rotates daily, making cross-day correlation impossible without the historical salt.
Right to erasure: Because we store IP hashes rather than raw IPs, individual erasure requests are handled by rotating the salt and purging all sessions older than the retention period.
Retention limits: All visitor data is automatically purged after 90 days. The in-memory session tracker purges inactive sessions after 30 minutes:
    defp cleanup_expired_sessions(state) do
      now = System.monotonic_time(:millisecond)

      active_sessions =
        state.sessions
        |> Enum.reject(fn {_key, session} ->
          now - session.last_activity > @session_timeout_ms
        end)
        |> Map.new()

      %{state | sessions: active_sessions}
    end
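The cleanup function needs something to call it periodically. One common pattern, sketched here under the assumption of a one-minute sweep interval, is a GenServer that sends itself a `:cleanup` message with `Process.send_after/3`. The module name and the interval are hypothetical, and the cleanup function takes `now` as an argument here so it can be exercised directly.

```elixir
defmodule CleanupSchedulerSketch do
  @moduledoc """
  Hypothetical GenServer that sweeps expired sessions on a timer.
  """
  use GenServer

  @session_timeout_ms :timer.minutes(30)
  @cleanup_interval_ms :timer.minutes(1)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    schedule_cleanup()
    {:ok, %{sessions: %{}}}
  end

  @impl true
  def handle_info(:cleanup, state) do
    # Re-arm the timer first so one slow sweep does not stop future sweeps.
    schedule_cleanup()
    now = System.monotonic_time(:millisecond)
    {:noreply, cleanup_expired_sessions(state, now)}
  end

  # Drops every session whose last activity is older than the timeout.
  def cleanup_expired_sessions(state, now) do
    active =
      state.sessions
      |> Enum.reject(fn {_key, session} ->
        now - session.last_activity > @session_timeout_ms
      end)
      |> Map.new()

    %{state | sessions: active}
  end

  defp schedule_cleanup do
    Process.send_after(self(), :cleanup, @cleanup_interval_ms)
  end
end
```

With a one-minute interval, an idle session can outlive the 30-minute timeout by up to a minute before it is removed.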
Consent: Hawkeye operates without cookies and without client-side tracking, so it falls under the "strictly necessary" exemption in most EU jurisdictions. However, we still provide a privacy notice and opt-out mechanism for transparency.
Dashboard
The Hawkeye dashboard at /hawkeye provides real-time visitor intelligence through Phoenix LiveView.
All dashboard data is computed from the pseudonymized storage -- the dashboard never has access to raw IP addresses or personally identifiable information.
OSINT Integration
When Hawkeye detects a suspicious pattern, it can trigger an OSINT enrichment workflow. The IP hash is not useful for OSINT (by design), but the associated metadata -- geographic origin, user agent patterns, access patterns -- can be correlated against threat intelligence feeds:
    def handle_anomaly({:anomaly, type, details}) do
      enrichment_request = %{
        source: :hawkeye,
        anomaly_type: type,
        geo: details[:geo],
        user_agent: details[:user_agent],
        timestamp: DateTime.utc_now()
      }

      Phoenix.PubSub.broadcast(
        Prismatic.PubSub,
        "osint:enrichment_requests",
        {:enrich, enrichment_request}
      )
    end
This integration means that a suspicious visitor pattern can automatically trigger checks against abuse databases, VPN exit node lists, and known bot networks without exposing any personal data to the OSINT pipeline.
Performance
Hawkeye processes visitor events with minimal overhead.
The system is designed to scale horizontally. Multiple nodes can run independent collectors, with session data synchronized through a shared PostgreSQL store for cross-node analytics.
Hawkeye demonstrates that meaningful visitor intelligence does not require invasive tracking. By designing privacy into the architecture rather than bolting it on as a compliance afterthought, we achieve both operational insight and regulatory compliance.