Blockchain Indexing Explained
Reading data directly from a blockchain is slow and expensive. Indexing makes it fast. Here's how the stack works, from raw blocks to production APIs.
Why blockchains need indexers
Blockchains are optimized for writes, not reads. Answering a historical question such as 'show me all transfers from this address' or 'what was the TVL of this pool at block 18M' can mean scanning blocks all the way back to genesis. That's fine for a one-off, but unusable for a dApp that needs to render a dashboard in under a second. Indexers process blocks as they're produced, extract events and state changes, and store them in a queryable database. They're the read layer of Web3.
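To make the difference concrete, here is a minimal in-memory sketch. The Block and Transfer shapes and the TransferIndex class are hypothetical and heavily simplified; a real indexer would write to a database rather than a Map. It contrasts a query that walks the whole chain with one that reads from an index built once, as blocks arrive.

```typescript
// Hypothetical, simplified shapes: real blocks carry far more data.
interface Transfer {
  from: string;
  to: string;
  amount: string; // decimal string, since token amounts overflow JS numbers
  blockNumber: number;
}

interface Block {
  number: number;
  transfers: Transfer[];
}

// Without an index: every query walks every block.
function scanTransfersFrom(chain: Block[], address: string): Transfer[] {
  const out: Transfer[] = [];
  for (const block of chain) {
    for (const t of block.transfers) {
      if (t.from === address) out.push(t);
    }
  }
  return out;
}

// With an index: each block is processed once, then queries are a lookup.
class TransferIndex {
  private bySender = new Map<string, Transfer[]>();

  // Called once per block as it is produced (or replayed during backfill).
  ingest(block: Block): void {
    for (const t of block.transfers) {
      const list = this.bySender.get(t.from) ?? [];
      list.push(t);
      this.bySender.set(t.from, list);
    }
  }

  transfersFrom(address: string): Transfer[] {
    return this.bySender.get(address) ?? [];
  }
}
```

Both functions return the same rows. The scan costs time proportional to the length of the chain on every request, while the index pays that cost once at ingest time.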
The Graph: subgraphs and GraphQL
The Graph is one of the most widely used indexing protocols in Web3. You define a subgraph: a manifest that specifies which smart contracts to index, which events to track, and how to map those events to entities in a GraphQL schema. Graph Node processes blocks, runs your mappings (written in AssemblyScript), and exposes a GraphQL endpoint. The hosted service, which used to be free for development, was sunset in 2024; production subgraphs now run on the decentralized network, where indexers are paid query fees in GRT.
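On the consuming side, a subgraph is just a GraphQL endpoint. The sketch below builds and sends a query using the standard GraphQL-over-HTTP request shape (a JSON body with query and variables). The transfers entity and its fields are a hypothetical schema, not something every subgraph exposes, and the address is lowercased on the assumption that the schema stores addresses as lowercase hex. The first, where, orderBy, and orderDirection arguments are the filtering and paging arguments The Graph generates for entity collections.

```typescript
interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

// Builds a request for a hypothetical `transfers` entity.
function buildTransfersQuery(from: string, first: number): GraphQLRequest {
  const query = `
    query TransfersFrom($from: Bytes!, $first: Int!) {
      transfers(
        where: { from: $from }
        first: $first
        orderBy: blockNumber
        orderDirection: desc
      ) {
        id
        from
        to
        value
        blockNumber
      }
    }`;
  return { query, variables: { from: from.toLowerCase(), first } };
}

// Sends the request to a subgraph endpoint (the URL is supplied by the caller).
async function querySubgraph<T>(endpoint: string, request: GraphQLRequest): Promise<T> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const json = (await res.json()) as { data?: T; errors?: { message: string }[] };
  // GraphQL reports query errors in the body, often with a 200 status.
  if (json.errors && json.errors.length > 0) throw new Error(json.errors[0].message);
  return json.data as T;
}
```

Capping first keeps response sizes bounded, which is the pagination half of query performance.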
Ponder and Envio: next-gen indexers
Ponder is a newer indexing framework that uses TypeScript instead of AssemblyScript for mappings, a significant developer experience improvement. It runs on your own infrastructure (no hosted service dependency) and supports multi-chain indexing from a single codebase. Envio takes a similar approach and adds HyperSync, which is designed to backfill months of historical data in minutes by fetching blocks in parallel. Both are gaining traction with teams that outgrew The Graph's hosted service.
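The idea behind parallelized backfill can be sketched without any framework. This is not how HyperSync or Ponder is implemented; it is a generic illustration in which splitRange, backfill, and the fetchRange callback are hypothetical names. The block range is cut into chunks, a bounded pool of workers fetches chunks concurrently, and results are reassembled in block order.

```typescript
// Caller-supplied function that fetches everything in [fromBlock, toBlock].
type RangeFetcher<T> = (fromBlock: number, toBlock: number) => Promise<T[]>;

// Splits an inclusive block range into inclusive chunks of at most chunkSize.
function splitRange(fromBlock: number, toBlock: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  const chunks: Array<[number, number]> = [];
  for (let start = fromBlock; start <= toBlock; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize - 1, toBlock)]);
  }
  return chunks;
}

// Fetches all chunks with at most `concurrency` requests in flight.
async function backfill<T>(
  fromBlock: number,
  toBlock: number,
  chunkSize: number,
  concurrency: number,
  fetchRange: RangeFetcher<T>,
): Promise<T[]> {
  const chunks = splitRange(fromBlock, toBlock, chunkSize);
  const results: T[][] = new Array(chunks.length);
  let next = 0; // index of the next unclaimed chunk

  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const i = next++; // safe: JavaScript runs this synchronously
      const [start, end] = chunks[i];
      results[i] = await fetchRange(start, end);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, chunks.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);

  // Results are stored by chunk index, so output stays in block order.
  return ([] as T[]).concat(...results);
}
```

The concurrency cap matters in practice: without it, a large backfill would fire every request at once and run straight into provider rate limits.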
EVM event indexing
On EVM chains (Ethereum, Polygon, Arbitrum, Optimism, Base), indexing starts with events. Smart contracts emit events, which are recorded as logs in transaction receipts. An indexer: (1) subscribes to new blocks, (2) filters each block's logs by contract address and event signature, (3) decodes the event data using the contract ABI, (4) transforms it into your data model, and (5) stores it in PostgreSQL or similar. For historical data, the indexer replays blocks from a starting point. The RPC provider you use for block fetching is usually the bottleneck, so its rate limits and reliability matter.
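Steps (2) through (4) can be sketched for the most common case, an ERC-20 Transfer event, without any library. The RawLog and TransferRow shapes are simplified assumptions (JSON-RPC returns block numbers and log indexes as hex strings, which are assumed to be parsed already), and a real indexer would normally use an ABI decoder rather than slicing hex by hand. The topic constant is the keccak256 hash of the signature Transfer(address,address,uint256).

```typescript
// Simplified log shape, with numeric fields already parsed from hex.
interface RawLog {
  address: string;
  topics: string[];
  data: string;
  blockNumber: number;
  transactionHash: string;
  logIndex: number;
}

// Hypothetical row shape for a `transfers` table.
interface TransferRow {
  id: string;
  token: string;
  from: string;
  to: string;
  value: string; // decimal string, since uint256 does not fit in a JS number
  blockNumber: number;
}

// keccak256("Transfer(address,address,uint256)")
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// Indexed address parameters are left-padded to 32 bytes; keep the last 20.
function topicToAddress(topic: string): string {
  return "0x" + topic.slice(-40).toLowerCase();
}

// Returns a row for ERC-20 Transfer logs and null for anything else.
function decodeTransfer(log: RawLog): TransferRow | null {
  // ERC-721 shares the same topic0 but indexes tokenId, giving 4 topics.
  if (log.topics.length !== 3 || log.topics[0] !== TRANSFER_TOPIC) return null;
  return {
    id: `${log.transactionHash}-${log.logIndex}`,
    token: log.address.toLowerCase(),
    from: topicToAddress(log.topics[1]),
    to: topicToAddress(log.topics[2]),
    value: BigInt(log.data).toString(), // the non-indexed uint256 amount
    blockNumber: log.blockNumber,
  };
}
```

Using the transaction hash plus log index as the row id makes inserts idempotent, so replaying a block during backfill does not create duplicates.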
Non-EVM chains
Solana uses a different architecture: accounts and programs instead of contracts and events. Indexing Solana typically relies on a Geyser plugin or RPC polling to track account changes. NEAR uses a chunk-based, sharded architecture that requires different extraction patterns. Fuel uses a UTXO-based state model. The indexing principles are the same (extract, transform, store, query), but the extraction layer is chain-specific. Multi-chain dApps need an abstraction layer that normalizes across these differences.
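One way to picture that abstraction layer is an adapter per chain that emits a shared event shape. Everything in this sketch is hypothetical: the NormalizedEvent fields, the ChainAdapter interface, and the simplified EvmBlock and SolanaSlotUpdate inputs are illustrative and do not correspond to any particular node or plugin API.

```typescript
// Chain-agnostic shape that the rest of the pipeline works with.
interface NormalizedEvent {
  chain: string;
  height: number; // block number on EVM chains, slot on Solana
  txId: string; // transaction hash or signature
  source: string; // contract address or account public key
  payload: string; // still chain-specific bytes (hex or base64)
}

// One adapter per chain hides that chain's extraction details.
interface ChainAdapter<Raw> {
  readonly chain: string;
  extract(raw: Raw): NormalizedEvent[];
}

// Hypothetical, simplified raw inputs.
interface EvmBlock {
  number: number;
  logs: { transactionHash: string; address: string; data: string }[];
}

interface SolanaSlotUpdate {
  slot: number;
  accounts: { signature: string; pubkey: string; dataBase64: string }[];
}

const evmAdapter: ChainAdapter<EvmBlock> = {
  chain: "ethereum",
  extract(raw: EvmBlock): NormalizedEvent[] {
    return raw.logs.map((log) => ({
      chain: "ethereum",
      height: raw.number,
      txId: log.transactionHash,
      source: log.address,
      payload: log.data,
    }));
  },
};

const solanaAdapter: ChainAdapter<SolanaSlotUpdate> = {
  chain: "solana",
  extract(raw: SolanaSlotUpdate): NormalizedEvent[] {
    return raw.accounts.map((acct) => ({
      chain: "solana",
      height: raw.slot,
      txId: acct.signature,
      source: acct.pubkey,
      payload: acct.dataBase64,
    }));
  },
};

// The transform and store stages only ever see NormalizedEvent.
function ingest<Raw>(adapter: ChainAdapter<Raw>, raw: Raw, sink: NormalizedEvent[]): void {
  for (const event of adapter.extract(raw)) sink.push(event);
}
```

Adding a chain then means writing one more adapter, while the transform, store, and query stages stay untouched.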
Production considerations
Running an indexer in production means handling four things: chain reorganizations (blocks that get reverted, so the rows derived from them must be rolled back), RPC provider failures (configure at least two providers with automatic failover), historical backfill performance (can you re-index 2 years of blocks in under an hour?), and GraphQL query performance (add database indexes on frequently queried fields, implement pagination, set query depth limits). An indexer that falls behind the chain head serves stale data, which makes it unfit for production traffic.
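Reorg handling is the item most often gotten wrong, so here is a minimal sketch of one common approach: remember recent block hashes and compare each incoming block's parent hash against them. The ReorgTracker class and its default window of 64 blocks are assumptions for illustration, and the sketch assumes the caller feeds blocks of the new fork in order, starting from the first block after the common ancestor.

```typescript
// Minimal header shape; real headers carry many more fields.
interface BlockHeader {
  number: number;
  hash: string;
  parentHash: string;
}

class ReorgTracker {
  // Recent canonical headers, oldest first.
  private recent: BlockHeader[] = [];
  private maxDepth: number;

  constructor(maxDepth: number = 64) {
    this.maxDepth = maxDepth;
  }

  // Accepts the next block and returns the numbers of previously indexed
  // blocks that it invalidates, newest first. Empty means no reorg.
  apply(block: BlockHeader): number[] {
    // Walk back from the head to find the incoming block's parent.
    let i = this.recent.length - 1;
    while (i >= 0 && this.recent[i].hash !== block.parentHash) i--;

    // Parent not found in a non-empty window: the fork is deeper than we track.
    if (i < 0 && this.recent.length > 0) {
      throw new Error(`parent ${block.parentHash} not in tracked window; re-sync required`);
    }

    // Everything after the parent belongs to the abandoned fork.
    const reverted = this.recent
      .splice(i + 1)
      .map((b) => b.number)
      .reverse();

    this.recent.push(block);
    if (this.recent.length > this.maxDepth) this.recent.shift();
    return reverted;
  }
}
```

The caller would delete the rows derived from the returned block numbers and insert the new block's rows in a single database transaction, so readers never observe a half-applied reorg.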
