GenServer Patterns for Production Systems
Battle-tested GenServer patterns covering state management, handle_continue, timeout strategies, Registry usage, and bottleneck avoidance in production Elixir systems.
Tomas Korcak (korczis)
Prismatic Platform
GenServer is the workhorse of OTP applications. Every stateful process, every background worker, every cache in an Elixir system is likely a GenServer under the hood. But the gap between a tutorial GenServer and a production-grade one is significant. This post covers the patterns we have refined across the Prismatic Platform's 94 umbrella applications.
The Init Trap
The most common GenServer anti-pattern is doing heavy work in init/1. The init callback blocks the supervisor, which blocks the entire application startup if the operation is slow or fails:
# BAD: Blocks supervisor during startup
@impl true
def init(_opts) do
  data = HttpClient.fetch_large_dataset!() # Could take 30 seconds
  {:ok, %{data: data}}
end
Use handle_continue/2 to defer initialization:
@impl true
def init(_opts) do
  {:ok, %{data: nil, status: :initializing}, {:continue, :load_data}}
end

@impl true
def handle_continue(:load_data, state) do
  case load_data() do
    {:ok, data} ->
      {:noreply, %{state | data: data, status: :ready}}

    {:error, reason} ->
      Logger.error("Failed to load data: #{inspect(reason)}")
      Process.send_after(self(), :retry_load, :timer.seconds(5))
      {:noreply, %{state | status: :degraded}}
  end
end
The process starts immediately, the supervisor moves on, and data loading happens asynchronously. The status field lets callers know whether the process is ready.
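The snippet above schedules a :retry_load message but never shows its handler. A minimal, self-contained sketch of the full pattern, including the retry handler and a status-aware call clause so callers get :not_ready instead of nil data (the module name LazyLoader and the load_data/0 stub are hypothetical):

```elixir
defmodule LazyLoader do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def get_data(pid), do: GenServer.call(pid, :get_data)

  @impl true
  def init(_opts) do
    {:ok, %{data: nil, status: :initializing}, {:continue, :load_data}}
  end

  @impl true
  def handle_continue(:load_data, state) do
    case load_data() do
      {:ok, data} ->
        {:noreply, %{state | data: data, status: :ready}}

      {:error, _reason} ->
        Process.send_after(self(), :retry_load, :timer.seconds(5))
        {:noreply, %{state | status: :degraded}}
    end
  end

  # Handler for the scheduled retry: re-enter the same continue
  # until the load succeeds
  @impl true
  def handle_info(:retry_load, state) do
    {:noreply, state, {:continue, :load_data}}
  end

  # Callers see :not_ready instead of nil data while loading
  @impl true
  def handle_call(:get_data, _from, %{status: :ready} = state) do
    {:reply, {:ok, state.data}, state}
  end

  def handle_call(:get_data, _from, state) do
    {:reply, {:error, :not_ready}, state}
  end

  # Stub standing in for the real (possibly failing) fetch
  defp load_data, do: {:ok, :example_dataset}
end
```

Because handle_continue runs before any other message is processed, a caller can never observe the process between init and the first load attempt.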
State Design
GenServer state should be a struct, not a bare map. Structs enforce required keys and make the state shape explicit:
defmodule Prismatic.OSINT.AdapterWorker do
  use GenServer

  defmodule State do
    @moduledoc false
    @enforce_keys [:adapter_name, :config]
    defstruct [
      :adapter_name,
      :config,
      :last_query_at,
      status: :idle,
      query_count: 0,
      error_count: 0,
      rate_limit_remaining: nil
    ]
  end

  @impl true
  def init(opts) do
    state = %State{
      adapter_name: Keyword.fetch!(opts, :adapter_name),
      config: Keyword.fetch!(opts, :config)
    }

    {:ok, state}
  end
end
This catches typos like %State{stauts: :idle} at compile time and documents exactly what state the process manages.
Registry vs Named Processes
For singleton processes (one per application), use a simple name:
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
For dynamic pools of processes (one per entity, per adapter, per connection), use Registry:
# In application.ex
children = [
  {Registry, keys: :unique, name: Prismatic.OSINT.AdapterRegistry}
]

# Starting a worker
def start_link(adapter_name) do
  GenServer.start_link(
    __MODULE__,
    [adapter_name: adapter_name],
    name: via_tuple(adapter_name)
  )
end

defp via_tuple(adapter_name) do
  {:via, Registry, {Prismatic.OSINT.AdapterRegistry, adapter_name}}
end

# Looking up a worker
def get_worker(adapter_name) do
  case Registry.lookup(Prismatic.OSINT.AdapterRegistry, adapter_name) do
    [{pid, _value}] -> {:ok, pid}
    [] -> {:error, :not_found}
  end
end
Registry is preferred over :global or :pg for node-local process lookups because it is faster and does not involve distributed consensus.
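Since the via tuple encodes the lookup, callers often do not need an explicit lookup function at all: passing the via tuple straight to GenServer.call/2 makes Registry resolve the name at call time (the call exits with :noproc if no such worker is registered). A runnable sketch with hypothetical names DemoRegistry and EchoWorker:

```elixir
# Registry is normally started in the supervision tree; started
# directly here so the sketch is self-contained
{:ok, _} = Registry.start_link(keys: :unique, name: DemoRegistry)

defmodule EchoWorker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, nil, name: via(name))
  def via(name), do: {:via, Registry, {DemoRegistry, name}}

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:echo, msg}, _from, state), do: {:reply, msg, state}
end

{:ok, _pid} = EchoWorker.start_link("adapter_a")

# The via tuple routes the call through Registry - no manual lookup needed
"hello" = GenServer.call(EchoWorker.via("adapter_a"), {:echo, "hello"})
```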
Timeout Patterns
GenServer supports three timeout mechanisms. Each serves a different purpose:
Inactivity Timeout
An integer (or :timer helper) returned as the last element of a callback tuple triggers handle_info(:timeout, state) if no message arrives within that period. Any incoming message cancels the pending timeout, so it must be returned again from each callback to stay armed:
@impl true
def handle_call(:get_status, _from, state) do
  {:reply, state.status, state, :timer.minutes(5)}
end

@impl true
def handle_info(:timeout, state) do
  # No messages for 5 minutes, clean up
  {:noreply, %{state | cache: %{}}}
end
Periodic Timer
Use Process.send_after/3 rather than :timer.send_interval/2 for periodic work; re-arming the timer after each run ensures a slow check never overlaps the next one:
@impl true
def init(_opts) do
  schedule_health_check()
  {:ok, %State{}}
end

@impl true
def handle_info(:health_check, state) do
  new_state = perform_health_check(state)
  schedule_health_check()
  {:noreply, new_state}
end

defp schedule_health_check do
  Process.send_after(self(), :health_check, :timer.seconds(30))
end
Call Timeout
Protect callers from slow GenServers:
# Caller side - default timeout is 5000 ms
def get_status(server) do
  GenServer.call(server, :get_status, :timer.seconds(10))
catch
  :exit, {:timeout, _} ->
    {:error, :timeout}
end
Avoiding the Bottleneck
A GenServer processes one message at a time. If every operation goes through a single GenServer, it becomes a bottleneck. Strategies to avoid this:
Read-Heavy: ETS Table
Store frequently-read data in ETS, update it from the GenServer:
@impl true
def init(_opts) do
  # :named_table is required so readers can look the table up by
  # name without knowing the owner pid
  table =
    :ets.new(:adapter_cache, [:set, :protected, :named_table, read_concurrency: true])

  {:ok, %{table: table}, {:continue, :populate}}
end

@impl true
def handle_continue(:populate, state) do
  Enum.each(load_adapters(), fn adapter ->
    :ets.insert(state.table, {adapter.name, adapter})
  end)

  {:noreply, state}
end

# Public read function - bypasses GenServer entirely
def get_adapter(name) do
  case :ets.lookup(:adapter_cache, name) do
    [{^name, adapter}] -> {:ok, adapter}
    [] -> {:error, :not_found}
  end
end
Any number of processes can read from ETS concurrently without going through the GenServer.
Write-Heavy: Sharding
Partition work across multiple GenServers:
defmodule Prismatic.Events.ShardedCounter do
  @shard_count 16

  def increment(key) do
    shard = :erlang.phash2(key, @shard_count)
    GenServer.cast(:"counter_shard_#{shard}", {:increment, key})
  end

  def count(key) do
    0..(@shard_count - 1)
    |> Enum.map(fn shard ->
      GenServer.call(:"counter_shard_#{shard}", {:get, key})
    end)
    |> Enum.sum()
  end
end
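The module above assumes sixteen named shard processes are already running. One way to start them, sketched with a hypothetical CounterShard worker (the real shard module would live in the supervision tree):

```elixir
defmodule CounterShard do
  # Hypothetical per-shard worker backing the sharded counter above
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, %{}, name: Keyword.fetch!(opts, :name))
  end

  @impl true
  def init(counts), do: {:ok, counts}

  @impl true
  def handle_cast({:increment, key}, counts) do
    {:noreply, Map.update(counts, key, 1, &(&1 + 1))}
  end

  @impl true
  def handle_call({:get, key}, _from, counts) do
    {:reply, Map.get(counts, key, 0), counts}
  end
end

# One child per shard, named to match the :"counter_shard_#{n}" convention
children =
  for shard <- 0..15 do
    Supervisor.child_spec(
      {CounterShard, name: :"counter_shard_#{shard}"},
      id: {:counter_shard, shard}
    )
  end

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```

Each child needs a distinct id, hence the explicit Supervisor.child_spec/2 call.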
Compute-Heavy: Offload to Tasks
Do not block the GenServer mailbox with expensive computations:
@impl true
def handle_call({:analyze, data}, from, state) do
  Task.start(fn ->
    result = expensive_analysis(data)
    GenServer.reply(from, {:ok, result})
  end)

  {:noreply, state}
end
Telemetry Integration
The OTEL doctrine requires telemetry in GenServers. Emit events for lifecycle and key operations:
@impl true
def handle_call({:query, params}, _from, state) do
  start_time = System.monotonic_time()
  {result, new_state} = execute_query(params, state)

  :telemetry.execute(
    [:prismatic, :osint, :adapter, :query],
    %{duration: System.monotonic_time() - start_time},
    %{adapter: state.adapter_name, status: elem(result, 0)}
  )

  {:reply, result, new_state}
end
Graceful Shutdown
Implement terminate/2 for cleanup, but remember it is not guaranteed to run (e.g., if the process is killed with :kill):
@impl true
def terminate(reason, state) do
  Logger.info("#{state.adapter_name} shutting down: #{inspect(reason)}")
  flush_pending_writes(state)
  :ok
end
For critical cleanup, call Process.flag(:trap_exit, true) in init/1 so that terminate/2 also runs when the supervisor shuts the process down; without it, the supervisor's exit signal kills the process before terminate/2 is ever invoked.
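A minimal, runnable sketch of the trap-exit setup (the Flusher module is hypothetical, and the message send stands in for flush_pending_writes/1):

```elixir
defmodule Flusher do
  use GenServer

  def start_link(caller), do: GenServer.start_link(__MODULE__, caller)

  @impl true
  def init(caller) do
    # With exits trapped, a supervisor-initiated :shutdown reaches terminate/2
    Process.flag(:trap_exit, true)
    {:ok, caller}
  end

  @impl true
  def terminate(reason, caller) do
    # Stand-in for flush_pending_writes/1
    send(caller, {:flushed, reason})
    :ok
  end
end

# Trap exits in this demo shell too, so the child's :shutdown
# exit signal doesn't propagate back and kill it
Process.flag(:trap_exit, true)
{:ok, pid} = Flusher.start_link(self())

# Simulate the supervisor shutting the worker down
Process.exit(pid, :shutdown)
```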
Process Hibernation
For processes that are mostly idle but hold large state, hibernation reduces memory usage:
@impl true
def handle_info(:timeout, state) do
  {:noreply, state, :hibernate}
end
The BEAM garbage-collects the process heap and puts it to sleep. The next message wakes it up with a fresh heap. Use this for processes like per-user session servers that may be idle for minutes or hours.
Summary
| Pattern | Problem it solves |
|---|---|
| handle_continue deferred init | Slow or failing work blocking the supervisor |
| {:via, Registry, ...} naming | Addressing dynamic pools of processes |
| ETS-backed reads | Read-heavy single-process bottleneck |
| Sharded GenServers | Write-heavy single-process bottleneck |
| Task offload | Compute-heavy calls blocking the mailbox |
| :hibernate | Memory held by mostly idle processes |

These patterns compose: a production GenServer often combines handle_continue initialization, ETS-backed reads, periodic timers, telemetry, and structured state in a single module. The key is knowing which patterns to apply and when.