Engineering March 22, 2026 | 9 min read

Telemetry-Driven Development: Observability from Day One

Building observable systems with the OTEL doctrine and :telemetry

Prismatic Engineering

Prismatic Platform

Observability Is Not Optional


The OTEL doctrine (Observability Telemetry Enforcement Layer) is one of the platform's 18 enforcement pillars. It mandates that every GenServer, controller, LiveView, and external API call must emit telemetry events. Observability is not something you bolt on after a production incident -- it is a design constraint from day one.


The :telemetry Foundation


The :telemetry library provides a lightweight event system that runs inside the VM. Each event consists of a name (a list of atoms), a map of measurements, and a map of metadata:



:telemetry.execute(
  [:prismatic, :osint, :adapter, :execute],
  %{duration: duration_ms, result_count: length(results)},
  %{adapter: adapter_name, query: sanitized_query}
)


The key principle is separation of emission and handling. The code that does work emits events. Completely separate code decides what to do with those events -- log them, aggregate them, send them to Prometheus, or trigger alerts.
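To make the separation concrete, here is a minimal sketch of the handling side. The `HandlerSketch` module, the handler ID, and the choice to forward events as messages are illustrative assumptions, not the platform's actual handlers; `:telemetry.attach/4` and `:telemetry.execute/3` are the library's real entry points.

```elixir
# Assumes a standalone script; Mix.install fetches and starts :telemetry.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule HandlerSketch do
  # Hypothetical handler: forwards each event to the pid given in the handler config.
  # A real handler might log, update a counter, or feed an exporter instead.
  def handle_event(event, measurements, metadata, %{notify: pid}) do
    send(pid, {:telemetry_event, event, measurements, metadata})
  end
end

# Handling side: registered once, independent of the code that emits.
:ok =
  :telemetry.attach(
    "handler-sketch",
    [:prismatic, :osint, :adapter, :execute],
    &HandlerSketch.handle_event/4,
    %{notify: self()}
  )

# Emitting side: knows nothing about who is listening.
:ok =
  :telemetry.execute(
    [:prismatic, :osint, :adapter, :execute],
    %{duration: 42, result_count: 3},
    %{adapter: :example_adapter, query: "[redacted]"}
  )
```

Handlers run synchronously in the process that emits the event, so they should stay cheap; anything slow is better handed off to another process.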


GenServer Telemetry


Every GenServer in the platform emits telemetry for three lifecycle events:


init/1



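def init(config) do
  start_time = System.monotonic_time()

  # ... initialization logic that builds `state` from `config` ...
  state = %{config: config} # placeholder state for illustration

  # Duration is in native time units; convert with System.convert_time_unit/3 if needed.
  duration = System.monotonic_time() - start_time

  :telemetry.execute(
    [:prismatic, :genserver, :init],
    %{duration: duration},
    # Assumes `config` is a map.
    %{module: __MODULE__, config_keys: Map.keys(config)}
  )

  {:ok, state}
end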


handle_call/3 and handle_cast/2


Every message handler emits duration, queue length, and result status. This data reveals which GenServers are bottlenecks and which message types are slowest.
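A handler instrumented this way might look like the sketch below. The event name `[:prismatic, :genserver, :handle_call]`, the `{:fetch, key}` message, and the `status/1` helper are assumptions made for illustration; the source only shows the `:init` event name.

```elixir
# Assumes a standalone script; Mix.install fetches and starts :telemetry.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule InstrumentedServerSketch do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call({:fetch, key}, _from, state) do
    start_time = System.monotonic_time()
    reply = Map.fetch(state, key)

    # Number of messages still waiting in this process's mailbox.
    {:message_queue_len, queue_length} = Process.info(self(), :message_queue_len)

    :telemetry.execute(
      # Hypothetical event name, modelled on the :init event.
      [:prismatic, :genserver, :handle_call],
      %{duration: System.monotonic_time() - start_time, queue_length: queue_length},
      %{module: __MODULE__, message: :fetch, status: status(reply)}
    )

    {:reply, reply, state}
  end

  defp status({:ok, _value}), do: :ok
  defp status(:error), do: :error
end
```

Tagging the event with the message type rather than the full message keeps cardinality low and avoids leaking payloads into telemetry.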


terminate/2


Termination events capture the reason and the final state size. Unexpected terminations (any reason other than :normal, :shutdown, or {:shutdown, term}) trigger escalation alerts.
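As a rough sketch of such a callback: the event name, the `expected?/1` helper, and the use of `:erlang.external_size/1` as the size measure are all assumptions, since the source does not say how state size is computed.

```elixir
# Assumes a standalone script; Mix.install fetches and starts :telemetry.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule TerminateSketch do
  use GenServer

  @impl true
  def init(state) do
    # Trapping exits lets terminate/2 run when the parent shuts this process down.
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def terminate(reason, state) do
    :telemetry.execute(
      # Hypothetical event name, modelled on the :init event.
      [:prismatic, :genserver, :terminate],
      # Size in bytes of the state encoded in the external term format (one possible measure).
      %{state_size: :erlang.external_size(state)},
      %{module: __MODULE__, reason: reason, expected: expected?(reason)}
    )
  end

  # Exit reasons that OTP treats as orderly shutdowns.
  def expected?(:normal), do: true
  def expected?(:shutdown), do: true
  def expected?({:shutdown, _detail}), do: true
  def expected?(_other), do: false
end
```

An alerting handler could then escalate only on events whose metadata has `expected: false`.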


Controller and LiveView Instrumentation


Phoenix already emits telemetry for HTTP requests via Plug.Telemetry. The platform extends this with:


Request Context



:telemetry.execute(
  [:prismatic, :request, :complete],
  %{duration: duration_ms, status: status_code},
  %{
    path: conn.request_path,
    method: conn.method,
    user_id: get_user_id(conn),
    request_id: Logger.metadata()[:request_id]
  }
)


LiveView Mount and Event Handling


LiveView mounts and event handlers emit timing data that feeds into the performance gate system. The PERF doctrine mandates:


  • LiveView mount: less than 150 ms
  • Event handling: less than 100 ms
  • Page load: less than 250 ms

Any violation is flagged in the telemetry dashboard and blocks deployment if it persists across multiple measurements.
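The thresholds lend themselves to a simple budget check. The sketch below is an assumption about how such a check could be written, not the platform's performance gate; `PerfBudgetSketch` and its return shapes are hypothetical. It compares durations in native time units, which is what differences of `System.monotonic_time/0` produce.

```elixir
defmodule PerfBudgetSketch do
  # Budgets in milliseconds, taken from the PERF thresholds listed above.
  @budgets_ms %{mount: 150, event: 100, page_load: 250}

  # Returns :ok when the duration is strictly under the budget for `kind`,
  # otherwise {:violation, kind, duration_in_ms}. Raises on an unknown kind.
  def check(kind, duration_native) do
    budget_ms = Map.fetch!(@budgets_ms, kind)
    budget_native = System.convert_time_unit(budget_ms, :millisecond, :native)

    if duration_native < budget_native do
      :ok
    else
      {:violation, kind, System.convert_time_unit(duration_native, :native, :millisecond)}
    end
  end
end
```

Deciding whether a violation "persists across multiple measurements" would need state on top of this, which the source does not specify.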


Span Creation for Distributed Tracing


For operations that span multiple processes or external calls, the platform creates trace spans:


    
    

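def execute_investigation(case_id) do
  # generate_span_id/0 is the platform's own helper (not shown here).
  metadata = %{case_id: case_id, span_id: generate_span_id()}

  :telemetry.span(
    [:prismatic, :investigation, :execute],
    metadata,
    fn ->
      result = do_investigation(case_id)
      # The returned map becomes the stop event's metadata; the start metadata
      # is merged in explicitly so case_id and span_id appear on both events.
      {result, Map.put(metadata, :entity_count, length(result.entities))}
    end
  )
end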


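:telemetry.span/3 automatically emits start and stop events (or an exception event if the function raises), with the duration measured on the monotonic clock and reported in native time units. The stop event's metadata is whatever the span function returns, so any start metadata needed on the stop event must be included in that return value. Spans can be nested and correlated using span IDs.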
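To observe a span, a handler attaches to the three derived event names. The following is a minimal sketch: the `SpanHandlerSketch` module, the handler ID, and the sample case ID and entities are made up for illustration.

```elixir
# Assumes a standalone script; Mix.install fetches and starts :telemetry.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule SpanHandlerSketch do
  # Hypothetical handler: tags each message with the event suffix (:start, :stop, :exception).
  def handle_event(event, measurements, metadata, %{notify: pid}) do
    send(pid, {List.last(event), measurements, metadata})
  end
end

prefix = [:prismatic, :investigation, :execute]

:ok =
  :telemetry.attach_many(
    "span-handler-sketch",
    [prefix ++ [:start], prefix ++ [:stop], prefix ++ [:exception]],
    &SpanHandlerSketch.handle_event/4,
    %{notify: self()}
  )

metadata = %{case_id: "case-123"}

result =
  :telemetry.span(prefix, metadata, fn ->
    entities = [:alpha, :beta]
    # Carry the start metadata over so the stop event also has case_id.
    {entities, Map.put(metadata, :entity_count, length(entities))}
  end)
```

The span returns the function's result unchanged, so wrapping existing code in `:telemetry.span/3` does not alter its return value.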


Metric Collection and Aggregation


Raw telemetry events are ephemeral -- they fire and are gone. The platform uses Telemetry.Metrics to define persistent aggregations:


| Metric Type | Example | Purpose |
|-------------|---------|---------|
| Counter | prismatic.osint.adapter.execute.count | How many times each adapter runs |
| Sum | prismatic.osint.adapter.execute.result_count | Total results produced |
| Last Value | prismatic.genserver.mailbox_length | Current mailbox depth |
| Distribution | prismatic.request.duration | Latency percentiles (p50, p95, p99) |

These metrics are exported to Prometheus for long-term storage and visualized in Grafana.
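Expressed with Telemetry.Metrics, the four definitions in the table could look roughly like this. The `MetricsSketch` module, the `:adapter` tag, and the mapping of `prismatic.request.duration` onto the `[:prismatic, :request, :complete]` event are assumptions; the source does not show the platform's actual metric definitions or the event behind the mailbox metric.

```elixir
# Assumes a standalone script; Mix.install fetches telemetry_metrics.
Mix.install([{:telemetry_metrics, "~> 1.0"}])

defmodule MetricsSketch do
  import Telemetry.Metrics

  # Metric definitions only describe aggregations; a reporter
  # (for example a Prometheus exporter) performs the actual aggregation.
  def metrics do
    [
      counter("prismatic.osint.adapter.execute.count", tags: [:adapter]),
      sum("prismatic.osint.adapter.execute.result_count", tags: [:adapter]),
      # Assumes some process emits [:prismatic, :genserver] with a :mailbox_length measurement.
      last_value("prismatic.genserver.mailbox_length"),
      distribution("prismatic.request.duration",
        # Assumed mapping: the metric name differs from the event name shown earlier.
        event_name: [:prismatic, :request, :complete],
        measurement: :duration,
        unit: :millisecond
      )
    ]
  end
end
```

By default the event name is the metric name minus its last segment, which is why only the distribution needs an explicit `:event_name`.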


Structured Logging


The platform uses structured logging exclusively, with no unstructured string interpolation:


    
    

# Correct: structured metadata
Logger.info("Investigation completed",
  case_id: case_id,
  entity_count: length(entities),
  duration_ms: duration,
  source: :dd_engine
)

# Incorrect: unstructured string
# Logger.info("Investigation #{case_id} found #{length(entities)} entities in #{duration}ms")


Structured logs are machine-parseable, searchable, and can be correlated across services using request IDs and span IDs.


The ErrorFeed Real-Time Dashboard


The ErrorFeed is a LiveView dashboard at /admin/error-feed that provides real-time visibility into platform errors:


Features


  • Live stream: Errors appear in real time via a PubSub subscription to "error_patterns"
  • Pattern detection: The PatternTracker identifies recurring error signatures and groups them
  • Severity classification: Errors are classified as critical, warning, or info based on type and frequency
  • Root cause linking: Each error links to the telemetry span that produced it, enabling one-click drill-down
  • Trend analysis: Rolling 1-hour, 6-hour, and 24-hour error rate charts


Architecture


    
    

Application Code
      |
      v
StreamLoggerBackend (captures all log levels)
      |
      v
PatternTracker (classifies and groups)
      |
      v
PubSub "error_patterns" topic
      |
      v
ErrorFeedLive (renders in browser)


The StreamLoggerBackend is a custom Logger backend that intercepts all log events at runtime. It filters for error-level events and forwards them to the PatternTracker, which maintains a sliding window of recent errors and identifies patterns.
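The sliding-window idea can be sketched as a small pure data structure. This is not the PatternTracker's implementation, which the source does not show; `PatternWindowSketch`, its fields, and the use of plain atoms as error signatures are hypothetical.

```elixir
defmodule PatternWindowSketch do
  # entries is a list of {signature, timestamp_ms}, newest first.
  defstruct window_ms: 60_000, entries: []

  def new(window_ms), do: %__MODULE__{window_ms: window_ms}

  # Records one error signature at `now_ms` and drops entries that have
  # fallen out of the window (timestamp at or before now_ms - window_ms).
  def record(%__MODULE__{} = tracker, signature, now_ms) do
    cutoff = now_ms - tracker.window_ms
    kept = Enum.filter(tracker.entries, fn {_sig, at} -> at > cutoff end)
    %{tracker | entries: [{signature, now_ms} | kept]}
  end

  # Groups the entries currently held by signature and counts them.
  def counts(%__MODULE__{entries: entries}) do
    Enum.frequencies_by(entries, fn {sig, _at} -> sig end)
  end
end
```

Passing the timestamp in explicitly keeps the structure deterministic and easy to test; a process wrapping it would supply `System.monotonic_time(:millisecond)`.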


OTEL Doctrine Enforcement


The OTEL pillar is enforced at two levels:


  1. Pre-commit: The doctrine checker scans modified files for GenServers without telemetry emissions, controllers without request logging, and rescue blocks without error logging.
  2. CI/CD: mix check.doctrines --pillar otel runs a comprehensive audit of all modules, flagging any that lack the required telemetry integration.


Violations are advisory in pre-commit (warning) but blocking in CI (the build fails). This gives developers a chance to fix issues before pushing while ensuring nothing reaches production without proper observability.
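To give a feel for the GenServer check, here is a deliberately naive, text-based sketch. The real doctrine checker is not shown in the source and presumably does more than substring matching; `OtelCheckSketch` and both of its functions are hypothetical.

```elixir
defmodule OtelCheckSketch do
  # Flags source text that defines a GenServer but never calls into :telemetry
  # (matches both :telemetry.execute and :telemetry.span).
  def genserver_without_telemetry?(source) when is_binary(source) do
    String.contains?(source, "use GenServer") and
      not String.contains?(source, ":telemetry.")
  end

  # Takes a list of {path, source} pairs and returns the paths that violate the rule.
  def violations(sources) do
    for {path, source} <- sources, genserver_without_telemetry?(source), do: path
  end
end
```

A pre-commit hook could print the returned paths as warnings, while CI could fail the build when the list is non-empty.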




If you cannot observe it, you cannot improve it. If you cannot measure it, you cannot manage it.


Tags

telemetry observability otel metrics structured-logging