Introduction
In 2026, Elasticsearch remains a leading search engine for scalable applications, serving queries over billions of documents with low latency. This expert tutorial walks you through deploying a high-availability cluster with Docker, defining explicit mappings for structured and unstructured data, writing advanced queries such as nested boolean searches, building hierarchical aggregations for analytics, and optimizing with ILM and sharding. Unlike general-purpose NoSQL databases, Elasticsearch excels at full-text search powered by Lucene, with modern features like vector search and built-in security. You'll learn to avoid performance pitfalls (over-sharding, poorly chosen analyzers) and build production-ready clusters. Bookmark this guide to scale your logs, e-commerce search, or AI embeddings.
Prerequisites
- Docker and Docker Compose 2.20+ installed
- curl 7.80+ or Postman for testing APIs
- Advanced knowledge of JSON, YAML, and Lucene (analyzers, tokenizers)
- Machine with at least 8GB RAM (for 3-node cluster)
- Basics of distributed scaling (shards, replicas)
Deploy the Cluster with Docker Compose
version: '3.8'
services:
  elasticsearch1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es1
    environment:
      - node.name=es1
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es2,es3
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      # Bootstrap password for the elastic user (8.x has no built-in default)
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
    ulimits:
      memlock: -1
    volumes:
      # The image stores its data under /usr/share/elasticsearch/data
      - es1_data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es2
    environment:
      - node.name=es2
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es1,es3
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
    ulimits:
      memlock: -1
    volumes:
      - es2_data:/usr/share/elasticsearch/data
    networks:
      - elastic
  elasticsearch3:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es3
    environment:
      - node.name=es3
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es1,es2
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
    ulimits:
      memlock: -1
    volumes:
      - es3_data:/usr/share/elasticsearch/data
    networks:
      - elastic
volumes:
  es1_data:
    driver: local
  es2_data:
    driver: local
  es3_data:
    driver: local
networks:
  elastic:
    driver: bridge
This docker-compose file deploys a secured 3-node cluster with X-Pack security, a 2GB heap per node, and seed-host discovery. Named volumes persist data under /usr/share/elasticsearch/data; adjust ES_JAVA_OPTS to your available RAM. ELASTIC_PASSWORD sets the bootstrap password for the elastic user. Because transport TLS is enabled, each node also needs a certificate and key configured (for example, generated with elasticsearch-certutil) before it will start. Run docker compose up -d, then check curl -u elastic:changeme http://localhost:9200/_cat/health.
Verification and Initial Setup
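After docker compose up -d, wait 2-3 minutes for the nodes to discover each other and elect a master. Authenticate as elastic with the bootstrap password (changeme in this tutorial, supplied through the ELASTIC_PASSWORD environment variable, since Elasticsearch 8.x has no built-in default) and replace it before going to production, for example with elasticsearch-setup-passwords. Then test cluster health: green means every primary and replica shard is allocated. With three nodes, the cluster keeps serving requests when one node fails, unlike a single-node setup.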
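Rather than waiting a fixed time, you can poll the health endpoint until the cluster turns green. The helper below is a minimal sketch: ES_URL, ES_AUTH, and wait_for_green are illustrative names, and the defaults assume the localhost port mapping and tutorial credentials used above.

```shell
# Assumed defaults; override them in your environment.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:changeme}"

# wait_for_green [attempts] [delay_seconds]
# Polls _cat/health and returns 0 once the status column reads "green".
wait_for_green() {
  local attempts delay status i
  attempts="${1:-30}"
  delay="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # h=status limits the _cat/health output to the status column
    status=$(curl -s -u "$ES_AUTH" "$ES_URL/_cat/health?h=status" | tr -d '[:space:]')
    if [ "$status" = "green" ]; then
      echo "cluster is green after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "cluster not green after $attempts attempts (last status: ${status:-unreachable})" >&2
  return 1
}
```

For example, wait_for_green 36 5 polls every 5 seconds for up to 3 minutes and returns a non-zero status if the cluster never reaches green.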
Generate Secure Passwords
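#!/bin/bash
# Generate random passwords for the built-in users and save them on the host.
# HTTP TLS is disabled in the compose file, so the URL uses http://.
docker exec es1 /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto \
  --batch \
  --url http://localhost:9200 \
  > /tmp/elasticsearch-passwords.txt
cat /tmp/elasticsearch-passwords.txt
# Password lines look like "PASSWORD elastic = <value>"; take the fourth field.
PASS=$(awk '$1 == "PASSWORD" && $2 == "elastic" { print $4 }' /tmp/elasticsearch-passwords.txt)
docker exec es1 curl -s -u "elastic:$PASS" -X GET "localhost:9200/_cluster/health?pretty"
This script generates random passwords for the built-in users (elastic, kibana_system, and others) and tests connectivity with the new elastic password. Run chmod +x setup-passwords.sh && ./setup-passwords.sh. Note the credentials for subsequent queries; in production, store them in Docker secrets or Vault. The tool is deprecated in 8.x, where elasticsearch-reset-password is its replacement.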
Create Index with Advanced Mapping
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2,
    "analysis": {
      "analyzer": {
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "french_stop", "asciifolding", "french_stemmer"]
        }
      },
      "filter": {
        "french_stop": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      }
    },
    "index.lifecycle.name": "warm_delete_policy"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "french_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "french_analyzer"
      },
      "tags": {
        "type": "keyword"
      },
      "nested_tags": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "score": { "type": "float" }
        }
      },
      "vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      },
      "timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}
This mapping targets French-language content (French stop words removed before stemming, since the built-in stop filter defaults to English), nested objects, and dense vectors for AI search. Three primary shards with two replicas each place a full copy of the data on every node of the 3-node cluster; index.lifecycle.name attaches the ILM policy defined later in this guide. dims must equal the length of every indexed vector, so it is 3 here to match the toy sample data; use your embedding model's dimension (for example 768) in practice. Save the body as create-index.json and run: curl -u elastic:PASS -X PUT "localhost:9200/learni_articles" -H 'Content-Type: application/json' -d @create-index.json.
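Before indexing real content, it helps to see what the custom analyzer actually emits. The sketch below posts a phrase to the index-level _analyze endpoint; ES_URL, ES_AUTH, analyze_body, and analyze_text are illustrative names, and the JSON escaping is deliberately naive (backslashes and double quotes only).

```shell
# Assumed defaults; override them in your environment.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:changeme}"

# analyze_body <analyzer> <text>
# Builds the JSON request body. Only backslashes and double quotes are
# escaped, which is enough for plain test phrases but not a full JSON encoder.
analyze_body() {
  local analyzer text
  analyzer="$1"
  text=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"analyzer":"%s","text":"%s"}' "$analyzer" "$text"
}

# analyze_text <index> <analyzer> <text>
# Calls _analyze on the index so analyzers defined in its settings resolve.
analyze_text() {
  curl -s -u "$ES_AUTH" -X POST "$ES_URL/$1/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}
```

Calling analyze_text learni_articles french_analyzer "Les recherches optimisées" returns the token list, which lets you check that stop words are dropped and stems look reasonable.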
Index Realistic Data
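With the index created, bulk-index documents for testing (1000+ for a realistic run). The _bulk API sends many operations in a single request, which is typically far faster than issuing one POST per document. The script below simulates an e-commerce/articles dataset with nested tags and embeddings.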
Bulk Indexing Documents
#!/bin/bash
# Read the generated elastic password (line format: "PASSWORD elastic = <value>").
PASS=$(awk '$1 == "PASSWORD" && $2 == "elastic" { print $4 }' /tmp/elasticsearch-passwords.txt)
# Quote the heredoc delimiter so the shell leaves the JSON untouched.
cat > bulk-data.ndjson << 'EOF'
{"index":{"_index":"learni_articles"}}
{"title":"Tutoriel Elasticsearch avancé","content":"Optimisez vos recherches full-text avec mappings nested et vectors.","tags":["elasticsearch","search"],"nested_tags":[{"name":"expert","score":9.5},{"name":"2026","score":10}],"vector":[0.1,0.2,0.3],"timestamp":"2026-01-01T10:00:00Z"}
{"index":{"_index":"learni_articles"}}
{"title":"Scaling clusters ES","content":"Sharding et replicas pour petabytes.","tags":["cluster","scale"],"nested_tags":[{"name":"prod","score":8.8}],"vector":[0.4,0.5,0.6],"timestamp":"2026-02-01T12:00:00Z"}
EOF
# refresh=wait_for makes the documents searchable before the count runs.
curl -s -u "elastic:$PASS" -X POST "localhost:9200/_bulk?refresh=wait_for" -H "Content-Type: application/x-ndjson" --data-binary @bulk-data.ndjson
curl -s -u "elastic:$PASS" "localhost:9200/learni_articles/_count?pretty"
This script writes an NDJSON bulk file (the format _bulk expects: one action line followed by one source line) with 2 example docs covering all mapped fields. Run it after creating the index; each vector must have exactly as many elements as the mapping's dims. For larger tests, generate the file programmatically. Check the count, and the errors flag in the bulk response, to confirm ingestion without mapping errors.
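To go beyond two hand-written documents, the bulk file can be generated in a loop. This is a minimal sketch: generate_bulk is a hypothetical helper, the field values are synthetic placeholders, and the vectors keep the same 3-element length as the sample documents.

```shell
# generate_bulk <count> [index]
# Prints <count> action/source line pairs in NDJSON bulk format.
generate_bulk() {
  local count index i
  count="$1"
  index="${2:-learni_articles}"
  i=1
  while [ "$i" -le "$count" ]; do
    # Action line: which index the following source line goes to
    printf '{"index":{"_index":"%s"}}\n' "$index"
    # Source line: synthetic values derived from the loop counter
    printf '{"title":"Article %d","content":"Contenu de test %d","tags":["demo"],"nested_tags":[{"name":"generated","score":%d}],"vector":[0.%d,0.5,0.5],"timestamp":"2026-01-01T00:00:00Z"}\n' \
      "$i" "$i" "$((i % 10))" "$((i % 10))"
    i=$((i + 1))
  done
}
```

For example, generate_bulk 1000 > bulk-data.ndjson produces 2000 lines that can be posted with the same curl command. A shell loop is slow for millions of documents, so for that scale consider a faster generator and split the output into several requests.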
Advanced Nested Boolean Query
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "optimise scaling",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        },
        {
          "nested": {
            "path": "nested_tags",
            "query": {
              "bool": {
                "must": [
                  { "match": { "nested_tags.name": "expert" } },
                  { "range": { "nested_tags.score": { "gte": 9 } } }
                ]
              }
            }
          }
        }
      ],
      "filter": [
        { "term": { "tags": "elasticsearch" } },
        { "range": { "timestamp": { "gte": "2026-01-01" } } }
      ],
      "should": [
        {
          "script_score": {
            "query": { "match_all": {} },
            "script": {
              "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
              "params": { "query_vector": [0.15, 0.25, 0.35] }
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [{ "_score": "desc" }]
}
This query combines a fuzzy full-text must clause, a nested query, exact term and range filters, and a script_score should clause that adds exact (brute-force, not approximate kNN) cosine similarity on the vector field. The title field is boosted x2; fuzziness tolerates typos. The + 1.0 keeps the script score non-negative, which Elasticsearch requires, and query_vector must have the same length as the mapped dims. Curl: curl -u elastic:PASS -X GET "localhost:9200/learni_articles/_search" -H 'Content-Type: application/json' -d @advanced-query.json. The result is hybrid text+vector scoring.
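To make the script_score expression concrete, the arithmetic behind cosineSimilarity(...) + 1.0 can be reproduced locally. The function below is an illustrative sketch (cosine_plus_one is a hypothetical name, and it uses awk's floating point rather than Elasticsearch's implementation), so treat its output as an approximation of the score contribution.

```shell
# cosine_plus_one <query_vector> <doc_vector>
# Vectors are comma-separated numbers, e.g. "0.15,0.25,0.35".
# Prints cosine similarity + 1.0, which lies in the range [0, 2].
cosine_plus_one() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ",")
    m = split(b, y, ",")
    if (n != m || n == 0) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # dot product
      nx  += x[i] * x[i]   # squared norm of the query vector
      ny  += y[i] * y[i]   # squared norm of the document vector
    }
    if (nx == 0 || ny == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(nx) * sqrt(ny)) + 1.0
  }'
}
```

Running cosine_plus_one "0.15,0.25,0.35" "0.1,0.2,0.3" gives a value just under 2, because the query vector points in almost the same direction as the first sample document's vector.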
Aggregations and Analytics
Move to aggs for dashboards: nested terms, date_histogram buckets. This powers Kibana without SQL.
Hierarchical Aggregations
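{
  "aggs": {
    "tags_agg": {
      "terms": {
        "field": "tags",
        "size": 10,
        "order": { "_count": "desc" }
      },
      "aggs": {
        "avg_nested_score": {
          "nested": { "path": "nested_tags" },
          "aggs": {
            "avg_score": { "avg": { "field": "nested_tags.score" } },
            "top_tags": {
              "terms": {
                "field": "nested_tags.name",
                "size": 5
              }
            }
          }
        },
        "date_buckets": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          }
        }
      }
    }
  },
  "size": 0
}
Nested aggs: a terms bucket per tag, then inside each bucket a nested aggregation (average score and top sub-tag names) plus a monthly date histogram. Because tags and nested_tags.name are already mapped as keyword, they are aggregated directly rather than through a .keyword subfield, and terms buckets are ordered by the _count key. size: 0 skips returning hits when only aggregations are needed. Run it with the same _search curl command as the previous query; ideal for global metrics without pagination.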
Performance Tuning and ILM Policy
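{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 },
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
This ILM policy moves an index from hot to warm after 7 days (shrink to one shard, force-merge to one segment), to cold after 30 days (lowest recovery priority, freeze), and deletes it after 90 days. Create it with: curl -u elastic:PASS -X PUT "localhost:9200/_ilm/policy/warm_delete_policy" -H 'Content-Type: application/json' -d @ilm-policy.json. The index attaches to it through index.lifecycle.name in its settings. Shrinking and force-merging reduce cold data storage, by up to 80% depending on the data.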
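The min_age thresholds define a simple timeline: an index enters a phase once its age reaches that phase's min_age. Since this policy has no rollover action, age is counted from index creation. The sketch below encodes the 7/30/90-day thresholds from the policy; ilm_phase_for_age is a hypothetical helper that ignores the time ILM needs to execute each action.

```shell
# ilm_phase_for_age <age_in_days>
# Maps an index age to the phase it should have entered under
# warm_delete_policy (hot: 0d, warm: 7d, cold: 30d, delete: 90d).
ilm_phase_for_age() {
  local age
  age="$1"
  if [ "$age" -ge 90 ]; then
    echo "delete"
  elif [ "$age" -ge 30 ]; then
    echo "cold"
  elif [ "$age" -ge 7 ]; then
    echo "warm"
  else
    echo "hot"
  fi
}
```

To see where a real index stands, compare this expectation with the output of GET learni_articles/_ilm/explain.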
Best Practices
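- Shard sizing: Aim for roughly 20-50GB per primary shard; estimate the primary shard count as total index size / target shard size.
- Custom analyzers: Always test with the _analyze API before committing a mapping.
- Monitoring: Poll _cluster/stats and integrate a Prometheus exporter.
- Backups: Register a snapshot repository (S3/MinIO) with PUT /_snapshot/my_repo and snapshot regularly.
- Vectors: dense_vector fields with index: true are searched through an HNSW graph; use cosine when vectors are not unit-normalized, since dot_product requires unit-length vectors.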
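The shard-sizing rule of thumb reduces to a ceiling division. The helper below is a minimal sketch: primary_shards_for is a hypothetical name, sizes are whole gigabytes, and the 40GB default target is an assumption picked from the middle of the 20-50GB range.

```shell
# primary_shards_for <total_index_size_gb> [target_shard_size_gb]
# Prints the number of primary shards needed so that no shard exceeds
# the target size (ceiling division, minimum of 1).
primary_shards_for() {
  local total target
  total="$1"
  target="${2:-40}"
  if [ "$target" -le 0 ]; then
    echo "target shard size must be positive" >&2
    return 1
  fi
  if [ "$total" -le 0 ]; then
    echo 1
    return 0
  fi
  echo $(( (total + target - 1) / target ))
}
```

For example, primary_shards_for 600 40 prints 15. Replicas multiply the storage and node requirements but do not change the primary count.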
Common Errors to Avoid
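- Over-sharding: More than about 50 shards per node strains the heap; start with few shards and reshard (split or reindex) as data grows.
- Mapping explosion: Avoid dynamic: true in prod; use strict mappings.
- Yellow health: Unallocated replicas? Add nodes or lower the replica count with PUT /index/_settings {"number_of_replicas":1}.
- OOM kills: Keep the heap under the compressed-oops threshold (just below 32GB) and at no more than half of the machine's RAM; monitor GC logs.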
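For heap sizing, a common rule of thumb is to give the JVM at most half of the machine's RAM while staying under the compressed-oops threshold. The sketch below encodes that rule; recommended_heap_gb is a hypothetical helper, and the 30GB default cap is a conservative assumption, since the exact threshold depends on the JVM and platform.

```shell
# recommended_heap_gb <ram_gb> [cap_gb]
# Prints min(ram / 2, cap) in whole gigabytes, for use in
# ES_JAVA_OPTS as -Xms<N>g -Xmx<N>g.
recommended_heap_gb() {
  local ram cap half
  ram="$1"
  cap="${2:-30}"
  half=$((ram / 2))
  if [ "$half" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$half"
  fi
}
```

For an 8GB machine this prints 4, and for a 128GB machine it prints 30. When several nodes share one host, as in the Docker Compose cluster in this guide, divide the host's RAM between them first.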
Next Steps
Dive into Elastic Cloud for managed clusters or integrate with LangChain for RAG. Check the official Elastic 8.15 docs. Explore our expert Learni trainings on Elasticsearch and OpenSearch for ECE certification.