Introduction
Elasticsearch has become the go-to solution for real-time search and analytics. In 2026, scalability and latency requirements demand precise control over cluster configuration, mappings, and indexing strategies. This tutorial walks you through setting up a robust production environment, from JVM sizing to advanced aggregations. You will learn how to avoid common memory pitfalls and optimize performance on multi-terabyte datasets.
Prerequisites
- Java 21+ and Elasticsearch 8.15+
- Solid knowledge of Linux and YAML
- Access to a cluster with at least 3 nodes
- curl or the official Elasticsearch client
Cluster Configuration
cluster.name: prod-cluster-2026
node.name: ${HOSTNAME}
network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
indices.memory.index_buffer_size: 30%
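thread_pool.write.queue_size: 1000
This file configures a three-node, high-availability cluster. Nodes find each other through the static discovery.seed_hosts list, and the indexing buffer is raised from the 10% default to 30% of the heap for write-heavy workloads. The names in cluster.initial_master_nodes must match each node's node.name (here, its hostname), and the setting should be removed once the cluster has formed for the first time.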
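To make the sizing behind this configuration concrete, the sketch below works through two pieces of arithmetic: how many master-eligible nodes can fail before a three-node cluster loses its majority, and how much memory a 30% index buffer represents. The function names and the 16 GiB heap are illustrative assumptions, not values taken from the configuration.

```python
def master_quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - master_quorum(master_eligible)


def index_buffer_bytes(heap_bytes: int, percent: int = 30) -> int:
    """Heap share reserved for the indexing buffer (shared by all shards on a node)."""
    return heap_bytes * percent // 100


# Assumption: a 16 GiB heap, chosen only for illustration.
heap = 16 * 1024**3
print("quorum:", master_quorum(3))
print("tolerated failures:", tolerated_failures(3))
print("index buffer (GiB):", round(index_buffer_bytes(heap) / 1024**3, 2))
```

With three master-eligible nodes the majority is two, so the cluster keeps an elected master through the loss of a single node, which is why three is the usual minimum for high availability.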
Advanced Index Template
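{
  "index_patterns": ["logs-2026-*"],
  "template": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text", "analyzer": "standard" },
        "level": { "type": "keyword" }
      }
    }
  }
}
Every index whose name matches logs-2026-* is created with 5 primary shards, 1 replica, and a 30-second refresh interval. Refreshing less often than the 1-second default produces fewer small segments, which reduces merge activity and disk I/O during heavy indexing. The settings and mappings are nested under "template", as the composable index template API (_index_template) in Elasticsearch 8 expects.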
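As a rough illustration of what the template implies, the sketch below checks which index names fall under the logs-2026-* pattern and counts the shard copies each matching index creates. It uses Python's fnmatch as an approximation of Elasticsearch wildcard matching (an assumption: Elasticsearch patterns only support *), and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Values copied from the template; helper names are illustrative.
INDEX_PATTERNS = ["logs-2026-*"]
NUMBER_OF_SHARDS = 5
NUMBER_OF_REPLICAS = 1


def template_applies(index_name: str, patterns=INDEX_PATTERNS) -> bool:
    """Approximate check of whether an index name matches any template pattern."""
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)


def shard_copies(primaries: int = NUMBER_OF_SHARDS,
                 replicas: int = NUMBER_OF_REPLICAS) -> int:
    """Total shards (primaries plus replica copies) created per index."""
    return primaries * (1 + replicas)


for name in ["logs-2026-01", "logs-2025-12", "metrics-2026-01"]:
    print(name, template_applies(name))

# Twelve monthly indices, each with primaries and replicas.
print("shards per year:", 12 * shard_copies())
```

Counting shard copies this way is a quick sanity check before choosing a rollover period, since every copy consumes heap and file handles on the node that hosts it.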
Optimized Bulk Indexing
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' --data-binary @- << EOF
{ "index" : { "_index" : "logs-2026-01" } }
{ "@timestamp": "2026-01-15T10:00:00Z", "message": "Critical error", "level": "ERROR" }
{ "index" : { "_index" : "logs-2026-01" } }
{ "@timestamp": "2026-01-15T10:00:01Z", "message": "Request processed", "level": "INFO" }
EOF
Each document is preceded by an action line, and the request body must end with a newline. This example sends only two documents; in production, batches of around 1000 documents per request reduce per-request overhead and can improve throughput by up to 10x compared with indexing documents one at a time.
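The batching idea can be sketched in a few lines: split the documents into groups of 1000 and serialize each group in the newline-delimited format the _bulk endpoint accepts. This is a minimal sketch using only the standard library; the helper names chunked and bulk_body are hypothetical, and sending each body over HTTP is left out.

```python
import json
from itertools import islice


def chunked(docs, size=1000):
    """Yield successive lists of at most `size` documents."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_body(index, docs):
    """Serialize documents as action/source line pairs, each ending in a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical sample data: 2500 small log documents.
docs = [{"message": f"event {i}", "level": "INFO"} for i in range(2500)]
for batch in chunked(docs):
    body = bulk_body("logs-2026-01", batch)
    print(len(batch), "documents,", len(body.splitlines()), "lines")
```

Each body would then be posted to /_bulk as in the curl command above. The response contains an "errors" flag, so a caller should inspect it rather than rely on the HTTP status alone.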
Query with Aggregations
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-1h" } } },
  "aggs": {
    "errors_per_minute": {
      "date_histogram": { "field": "@timestamp", "calendar_interval": "1m" },
      "aggs": { "error_count": { "filter": { "term": { "level": "ERROR" } } } }
    }
  }
}
This date_histogram aggregation groups the last hour of logs into one-minute buckets, and the nested filter aggregation counts the ERROR documents in each bucket. Setting "size" to 0 returns only the aggregation results, without any document hits.
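To show how the result of this query might be consumed, the sketch below pulls the per-minute error counts out of a search response. The sample response is hand-written to mirror the usual shape of a date_histogram with a nested filter aggregation; it is an illustration, not real cluster output, and the function name is hypothetical.

```python
def errors_per_minute(response):
    """Return (bucket timestamp, error count) pairs from the aggregation response."""
    buckets = response["aggregations"]["errors_per_minute"]["buckets"]
    return [(b["key_as_string"], b["error_count"]["doc_count"]) for b in buckets]


# Hand-written sample response (assumption: values are invented for illustration).
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "errors_per_minute": {
            "buckets": [
                {
                    "key_as_string": "2026-01-15T10:00:00.000Z",
                    "doc_count": 40,
                    "error_count": {"doc_count": 3},
                },
                {
                    "key_as_string": "2026-01-15T10:01:00.000Z",
                    "doc_count": 35,
                    "error_count": {"doc_count": 0},
                },
            ]
        }
    },
}

for minute, count in errors_per_minute(sample_response):
    print(minute, count)
```

Note that each bucket carries two counts: doc_count is every log line in that minute, while error_count.doc_count is only the lines that matched the ERROR filter.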
JVM Monitoring Script
#!/bin/bash
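curl -s localhost:9200/_nodes/stats/jvm | jq '.nodes[].jvm.mem.heap_used_percent'
The script prints the current heap usage percentage of every node. Run it on a schedule (for example from cron) and alert on sustained high values, before heap pressure causes long garbage collection pauses or circuit breaker trips.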
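An alerting step on top of this output could look like the sketch below, which takes the parsed JSON from _nodes/stats/jvm and lists the nodes at or above a heap threshold. The 75% threshold, the function name, and the sample payload are assumptions made for illustration; fetching the stats over HTTP is left out.

```python
def nodes_over_threshold(stats, threshold=75):
    """Return sorted (node name, heap percent) pairs at or above the threshold."""
    return sorted(
        (node["name"], node["jvm"]["mem"]["heap_used_percent"])
        for node in stats["nodes"].values()
        if node["jvm"]["mem"]["heap_used_percent"] >= threshold
    )


# Hand-written sample shaped like a node stats response (values are invented).
sample_stats = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 62}}},
        "b2": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 81}}},
        "c3": {"name": "node-3", "jvm": {"mem": {"heap_used_percent": 75}}},
    }
}

for name, percent in nodes_over_threshold(sample_stats):
    print(f"ALERT {name}: heap at {percent}%")
```

A single high reading is normal just before a collection, so a real alert would typically require the value to stay above the threshold across several consecutive checks.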
Best Practices
- Always explicitly define the number of shards and replicas
- Use index templates to standardize mappings
- Configure refresh_interval based on data volume
- Monitor JVM heap and circuit breakers
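- Prefer filter context (for example, bool filter clauses) over scoring queries when relevance scores are not needed, since filters skip scoring and can be cached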
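The last practice can be illustrated with a query body that places every clause in filter context, so no relevance score is computed. This is a minimal sketch that only builds the request body as a Python dictionary; the function name and default time range are assumptions, while the field names come from the mapping used in this tutorial.

```python
import json


def filter_only_query(level, since="now-1h"):
    """Build a search body whose clauses all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        }
    }


print(json.dumps(filter_only_query("ERROR"), indent=2))
```

Because level is a keyword field, the term clause matches the exact value, which suits this yes-or-no kind of filtering better than a scored match query.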
Common Mistakes to Avoid
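- Forgetting to limit result size: from + size is capped at 10,000 by default (index.max_result_window), and deep pagination should use search_after instead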
- Creating too many shards on small indexes
- Ignoring circuit breaker warnings
- Using custom analyzers without testing relevance
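The first mistake in the list above can be guarded against in client code. The sketch below checks a from + size request against the default result window and builds a follow-up page with search_after instead. The helper names are hypothetical, and sorting on @timestamp alone is a simplification: a real deployment would add a unique tiebreaker field to the sort so that documents with equal timestamps are not skipped.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def within_result_window(from_, size, window=MAX_RESULT_WINDOW):
    """True if a from + size page stays inside the allowed result window."""
    return from_ + size <= window


def next_page_body(last_hit, size=100):
    """Build the next page request from the sort values of the last hit seen."""
    return {
        "size": size,
        "sort": [{"@timestamp": "asc"}],  # assumption: add a tiebreaker in practice
        "search_after": last_hit["sort"],
    }


print(within_result_window(9_900, 100))
print(within_result_window(9_901, 100))

# Hand-written example of the last hit on a page (values are invented).
last_hit = {"_id": "abc", "sort": [1768471200000]}
print(next_page_body(last_hit))
```

Unlike from + size, search_after does not require the cluster to collect and discard all preceding hits, so the cost of each page stays roughly constant however deep the pagination goes.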