How to Deploy and Optimize Elasticsearch in 2026

Introduction

In 2026, Elasticsearch remains a leading search engine for scalable applications, able to serve low-latency queries over billions of documents when the cluster is sized correctly. This expert tutorial walks you through deploying a high-availability cluster with Docker, defining explicit mappings for structured and unstructured data, writing advanced queries such as nested booleans, building hierarchical aggregations for analytics, and optimizing storage with ILM and sharding. Unlike traditional NoSQL databases, Elasticsearch excels at full-text search powered by Lucene, and it adds modern features such as vector search and built-in security. You'll learn to avoid the performance pitfalls (over-sharding, poor text analysis) that keep clusters from being production-ready. Bookmark this guide to scale your logs, e-commerce search, or AI embeddings.

Prerequisites

  • Docker and Docker Compose 2.20+ installed
  • curl 7.80+ or Postman for testing APIs
  • Advanced knowledge of JSON, YAML, and Lucene (analyzers, tokenizers)
  • Machine with at least 8GB RAM (for 3-node cluster)
  • Basics of distributed scaling (shards, replicas)

Deploy the Cluster with Docker Compose

docker-compose.yml
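services:
  # One-shot job: creates a CA and one transport certificate shared by all three nodes.
  # Transport TLS is mandatory once security is enabled on a multi-node cluster.
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    user: "0"
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
    command: >
      bash -c '
        if [ ! -f config/certs/ca/ca.crt ]; then
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip -o config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f config/certs/node/node.crt ]; then
          bin/elasticsearch-certutil cert --silent --pem --name node --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key -out config/certs/node.zip;
          unzip -o config/certs/node.zip -d config/certs;
        fi;
        chown -R root:root config/certs;
        find config/certs -type d -exec chmod 750 {} \;;
        find config/certs -type f -exec chmod 640 {} \;
      '
  elasticsearch1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es1
    depends_on:
      setup:
        condition: service_completed_successfully
    environment:
      - node.name=es1
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es2,es3
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      # Bootstrap password of the elastic user (8.x has no default); change it
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      # One cert shared by all nodes: verify the CA signature, skip hostname checks
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.key=certs/node/node.key
      - xpack.security.transport.ssl.certificate=certs/node/node.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    ulimits:
      memlock: -1
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - es1_data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es2
    depends_on:
      setup:
        condition: service_completed_successfully
    environment:
      - node.name=es2
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es1,es3
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.key=certs/node/node.key
      - xpack.security.transport.ssl.certificate=certs/node/node.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    ulimits:
      memlock: -1
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - es2_data:/usr/share/elasticsearch/data
    networks:
      - elastic
  elasticsearch3:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: es3
    depends_on:
      setup:
        condition: service_completed_successfully
    environment:
      - node.name=es3
      - cluster.name=learni-cluster
      - discovery.seed_hosts=es1,es2
      - cluster.initial_master_nodes=es1,es2,es3
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms2g -Xmx2g'
      - ELASTIC_PASSWORD=changeme
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.key=certs/node/node.key
      - xpack.security.transport.ssl.certificate=certs/node/node.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    ulimits:
      memlock: -1
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - es3_data:/usr/share/elasticsearch/data
    networks:
      - elastic
volumes:
  certs:
    driver: local
  es1_data:
    driver: local
  es2_data:
    driver: local
  es3_data:
    driver: local
networks:
  elastic:
    driver: bridge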

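This Compose file deploys a 3-node cluster with security enabled: password authentication, plus TLS on the transport layer between nodes, which Elasticsearch requires once security is on in a multi-node cluster. HTTP stays unencrypted, which is acceptable for local testing only. Each node gets a 2GB heap; keep ES_JAVA_OPTS at no more than half of the memory available to each container. Named volumes persist the data. On Linux hosts, first raise the kernel limit enforced by the bootstrap checks with sudo sysctl -w vm.max_map_count=262144, otherwise the nodes refuse to start. Then run docker compose up -d and check curl -u elastic:changeme "http://localhost:9200/_cat/health?v".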

Verification and Initial Setup

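After docker compose up -d, wait 2-3 minutes for the nodes to join and elect a master. Elasticsearch 8.x has no default password: the elastic user's bootstrap password comes from the ELASTIC_PASSWORD environment variable (changeme in this tutorial), so replace it before going to production. Green health means every primary and replica shard is allocated; yellow means some replicas are not. With three master-eligible nodes the cluster keeps its quorum when one node fails, whereas a single node is a single point of failure.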
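Rather than sleeping for a fixed time, you can ask the cluster itself to hold the request until it turns green, using the wait_for_status and timeout parameters of _cluster/health. A minimal bash sketch, assuming jq is not installed (the status is extracted with sed); ES_URL and ES_PASS are placeholder variables introduced here, not settings of the cluster:

```bash
# Extract the "status" value (green, yellow or red) from a cluster-level
# _cluster/health JSON body on stdin. Handles compact and ?pretty output.
health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Let Elasticsearch hold the request until the cluster is green or 180s pass,
# then print the status. ES_URL and ES_PASS are placeholders for this sketch.
wait_for_green() {
  curl -s -u "elastic:${ES_PASS:?set ES_PASS first}" \
    "${ES_URL:-http://localhost:9200}/_cluster/health?wait_for_status=green&timeout=180s" |
    health_status
}
```

For example, ES_PASS=changeme wait_for_green prints green once every shard copy is allocated; if the timeout expires first, it prints the current status instead (the raw response then carries "timed_out": true).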

Generate Secure Passwords

setup-passwords.sh
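#!/bin/bash
# elasticsearch-setup-passwords is deprecated in 8.x but still ships; it authenticates with the bootstrap password.
# HTTP TLS is disabled in this cluster, so the tool must target plain http.
docker exec es1 /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto \
  --batch \
  --url http://localhost:9200 \
  > /tmp/elasticsearch-passwords.txt
cat /tmp/elasticsearch-passwords.txt
# Keep only the "PASSWORD elastic = ..." line; a plain grep also matches "Changed password for user elastic"
PASS=$(awk '$1 == "PASSWORD" && $2 == "elastic" { print $4 }' /tmp/elasticsearch-passwords.txt)
curl -u "elastic:$PASS" -X GET "localhost:9200/_cluster/health?pretty"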

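This script generates random passwords for all built-in users (elastic, kibana_system, and the other system users), replacing the bootstrap password, then checks cluster health with the new elastic password. Run chmod +x setup-passwords.sh && ./setup-passwords.sh once: the tool relies on the bootstrap password, so it cannot run again after the elastic password has changed. It is also deprecated in 8.x; elasticsearch-reset-password -u elastic resets a single user instead. Note the credentials for subsequent queries; in production, store them in Docker secrets or Vault.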

Create Index with Advanced Mapping

create-index.json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2,
    "analysis": {
      "analyzer": {
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "french_stop", "french_stemmer", "asciifolding"]
        }
      },
      "filter": {
        "french_stop": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      }
    },
    "index.lifecycle.name": "warm_delete_policy"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "french_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "french_analyzer"
      },
      "tags": {
        "type": "keyword"
      },
      "nested_tags": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "score": { "type": "float" }
        }
      },
      "vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      },
      "timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}

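This mapping targets French-language content (light stemming, stop-word removal, accent folding), nested objects, and dense vectors for semantic search. dims must equal the length of every indexed vector: set it to your embedding model's output size (768 for many BERT-style models), keeping in mind that the sample documents in this tutorial use 3-dimensional toy vectors to stay readable. With 3 primary shards and 2 replicas, each of the three nodes can hold one copy of every shard. The index.lifecycle.name setting attaches the ILM policy defined later in this tutorial; create that policy first, since ILM cannot manage the index until the policy exists. Create the index with: curl -u elastic:PASS -X PUT "localhost:9200/learni_articles" -H 'Content-Type: application/json' -d @create-index.json.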
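Before indexing, it is worth checking which tokens french_analyzer actually emits. The _analyze API accepts an analyzer name and a text; the minimal bash helpers below only build that request body. json_escape and analyze_body are names introduced for this sketch, and the escaping handles quotes and backslashes only, not newlines:

```bash
# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Limitation: newlines and other control characters are not escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build an _analyze request body that runs the index's custom french_analyzer on $1.
analyze_body() {
  printf '{"analyzer":"french_analyzer","text":"%s"}\n' "$(json_escape "$1")"
}
```

Pipe the body to the index-level endpoint, for example: analyze_body "Les recherches optimisées" | curl -s -u "elastic:$PASS" -H 'Content-Type: application/json' -X GET "localhost:9200/learni_articles/_analyze?pretty" --data-binary @- . The response lists each token with its position, which shows at a glance whether stop words were dropped and how words were stemmed.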

Index Realistic Data

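With the index created, load test data through the _bulk API, which sends many documents per request and is much faster than one POST per document because it amortizes network and indexing overhead. Aim for 1000+ documents for meaningful query and aggregation tests; the sample below simulates an articles dataset with nested tags and embeddings.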
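The script below indexes only two hand-written documents. To reach a few thousand, a small generator can emit the same NDJSON shape. A minimal bash sketch, assuming the 3-dimensional toy vectors of the sample data; gen_bulk is a name introduced here and every field value is synthetic:

```bash
# gen_bulk N: print N synthetic learni_articles documents as _bulk NDJSON
# (one action line, then one source line, per document). Vectors have
# 3 dimensions, like the hand-written sample documents.
gen_bulk() {
  local n=$1 i
  for (( i = 1; i <= n; i++ )); do
    printf '{"index":{"_index":"learni_articles"}}\n'
    printf '{"title":"Article %d","content":"Contenu de test numero %d","tags":["elasticsearch","test"],"nested_tags":[{"name":"expert","score":%d.5}],"vector":[0.%d,0.2,0.3],"timestamp":"2026-01-%02dT10:00:00Z"}\n' \
      "$i" "$i" "$(( i % 10 ))" "$(( i % 9 + 1 ))" "$(( i % 28 + 1 ))"
  done
}
```

For example, gen_bulk 20000 > bulk-data.ndjson writes 40000 lines. To send it in several requests, split on an even line count so each action line stays with its source line: split -l 10000 bulk-data.ndjson chunk_ yields four files of 5000 documents, each of which can be posted with the curl command from the script.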

Bulk Indexing Documents

bulk-index.sh
#!/bin/bash
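# Keep only the "PASSWORD elastic = ..." line; a plain grep also matches "Changed password for user elastic"
PASS=$(awk '$1 == "PASSWORD" && $2 == "elastic" { print $4 }' /tmp/elasticsearch-passwords.txt)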

cat > bulk-data.ndjson << EOF
{"index":{"_index":"learni_articles"}}
{"title":"Tutoriel Elasticsearch avancé","content":"Optimisez vos recherches full-text avec mappings nested et vectors.","tags":["elasticsearch","search"],"nested_tags":[{"name":"expert","score":9.5},{"name":"2026","score":10}],"vector":[0.1,0.2,0.3],"timestamp":"2026-01-01T10:00:00Z"}
{"index":{"_index":"learni_articles"}}
{"title":"Scaling clusters ES","content":"Sharding et replicas pour petabytes.","tags":["cluster","scale"],"nested_tags":[{"name":"prod","score":8.8}],"vector":[0.4,0.5,0.6],"timestamp":"2026-02-01T12:00:00Z"}
EOF

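# refresh=wait_for returns only once the new documents are searchable, so the count below sees them
curl -u "elastic:$PASS" -X POST "localhost:9200/_bulk?refresh=wait_for" -H "Content-Type: application/x-ndjson" --data-binary @bulk-data.ndjson
curl -u "elastic:$PASS" "localhost:9200/learni_articles/_count?pretty"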

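This script writes an NDJSON _bulk body (an action line followed by a source line for each document; the file must end with a newline) containing 2 example documents that cover every mapped field, then indexes it. Run it after creating the index, and check that the count matches the number of documents you sent. For larger volumes, generate the file programmatically and send it in several requests; a few thousand documents per request is a common starting point.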
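_bulk answers HTTP 200 even when individual documents are rejected (for example, a vector whose length differs from the mapping's dims); failures show up only as a top-level "errors": true flag and per-item error objects. A minimal bash check on the response body, where bulk_failed and first_bulk_error are names introduced for this sketch:

```bash
# Succeeds (exit status 0) when a _bulk response on stdin reports at least one failed item.
bulk_failed() {
  grep -Eq '"errors" *: *true'
}

# Print the first error reason found in a _bulk response on stdin (nothing if none).
# Simplification: assumes the reason text contains no escaped double quotes.
first_bulk_error() {
  grep -o '"reason":"[^"]*"' | head -n 1 | sed -e 's/^"reason":"//' -e 's/"$//'
}
```

In the script, capture the response first, e.g. resp=$(curl -s -u "elastic:$PASS" -X POST "localhost:9200/_bulk?refresh=wait_for" -H "Content-Type: application/x-ndjson" --data-binary @bulk-data.ndjson), then run printf '%s' "$resp" | first_bulk_error whenever printf '%s' "$resp" | bulk_failed succeeds.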

Advanced Nested Boolean Query

advanced-query.json
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "optimise scaling",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        },
        {
          "nested": {
            "path": "nested_tags",
            "query": {
              "bool": {
                "must": [
                  { "match": { "nested_tags.name": "expert" } },
                  { "range": { "nested_tags.score": { "gte": 9 } } }
                ]
              }
            }
          }
        }
      ],
      "filter": [
        { "term": { "tags": "elasticsearch" } },
        { "range": { "timestamp": { "gte": "2026-01-01" } } }
      ],
      "should": [
        {
          "script_score": {
            "query": { "match_all": {} },
            "script": {
              "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
              "params": { "query_vector": [0.15, 0.25, 0.35] }
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [{ "_score": "desc" }]
}

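This query combines a fuzzy full-text must clause (title boosted x2, fuzziness: AUTO to absorb typos), a nested clause that requires the tag name and score to match within the same nested object, non-scoring filters (exact term and date range), and a should clause that adds vector similarity to the relevance score, giving hybrid text+vector ranking. script_score with cosineSimilarity is exact, brute-force scoring of each candidate document, not approximate kNN; the + 1.0 keeps the score non-negative, as Elasticsearch requires. For approximate nearest-neighbour search over the HNSW index, use the top-level knn search option instead. Run it with: curl -u elastic:PASS -X GET "localhost:9200/learni_articles/_search" -H 'Content-Type: application/json' -d @advanced-query.json.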
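To see what the script_score clause contributes, the same formula, cosineSimilarity(query_vector, vector) + 1.0, can be reproduced offline. A minimal bash/awk sketch for comma-separated vectors; cosine_plus_one is a name introduced here, and since Elasticsearch computes in float precision, trailing digits of real scores may differ slightly:

```bash
# cosine_plus_one A B: cosine similarity of two comma-separated vectors, plus 1.0,
# i.e. the value the script_score clause adds. Range: 0 (opposite) to 2 (same direction).
cosine_plus_one() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2 }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb)) + 1.0
  }'
}
```

For the query vector and the first sample document, cosine_plus_one 0.15,0.25,0.35 0.1,0.2,0.3 prints 1.9974: the two vectors point in almost the same direction, so the vector clause adds close to its maximum of 2 to that document's score.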

Aggregations and Analytics

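Aggregations turn the same index into an analytics backend for dashboards: terms buckets, nested sub-aggregations, and date_histogram time buckets are the building blocks Kibana visualizations rely on, with no SQL required.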

Hierarchical Aggregations

aggregations.json
{
  "aggs": {
    "tags_agg": {
      "terms": {
          "field": "tags",
        "size": 10,
        "order": { "doc_count": "desc" }
      },
      "aggs": {
        "avg_nested_score": {
          "nested": { "path": "nested_tags" },
          "aggs": {
            "avg_score": { "avg": { "field": "nested_tags.score" } },
            "top_tags": {
              "terms": {
                "field": "nested_tags.name",
                "size": 5
              }
            }
          }
        },
        "date_buckets": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          }
        }
      }
    }
  },
  "size": 0
}

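This request nests three levels of analysis: a terms aggregation buckets documents by tag; within each tag bucket, a nested aggregation steps into the nested_tags objects to compute their average score and most frequent names, and a date_histogram splits the documents into monthly buckets. "size": 0 skips returning hits, so the request only computes aggregations. Send it to learni_articles/_search with the same curl command as the query above; aggregations are computed over every matching document, so no pagination is involved.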

Performance Tuning and ILM Policy

ilm-policy.json
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 },
          "readonly": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

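This policy keeps new indices in the hot phase (recovery priority 100); after 7 days (warm) it shrinks each index to one primary shard and force-merges it to a single segment, after 30 days (cold) it lowers the priority and makes the index read-only, and after 90 days it deletes it. Avoid the older freeze action in the cold phase: it has been deprecated since 7.14. Create the policy before the index that references it: curl -u elastic:PASS -X PUT "localhost:9200/_ilm/policy/warm_delete_policy" -H 'Content-Type: application/json' -d @ilm-policy.json. Shrinking and force-merging cut per-shard overhead and speed up searches on data that no longer changes, while the delete phase caps retention at 90 days.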
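min_age is measured from index creation (or from rollover, for indices that were rolled over), so the policy's thresholds map an index's age directly onto a phase. A small bash sketch of that mapping with this policy's 7/30/90-day values; ilm_phase is a hypothetical helper, not an Elasticsearch command:

```bash
# ilm_phase AGE_DAYS: phase that warm_delete_policy makes an index of that age
# eligible for. Thresholds are copied from ilm-policy.json (min_age values).
ilm_phase() {
  local age=$1
  if   (( age >= 90 )); then echo delete
  elif (( age >= 30 )); then echo cold
  elif (( age >= 7 ));  then echo warm
  else echo hot
  fi
}
```

ilm_phase 45 prints cold. In a real cluster, ILM only re-evaluates policies every indices.lifecycle.poll_interval (10 minutes by default) and finishes each phase's actions before moving on, so transitions lag these thresholds; GET learni_articles/_ilm/explain reports where an index actually is.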

Best Practices

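  • Shard sizing: Keep shards roughly 10-50GB each. Pick the number of primaries as expected primary data / target shard size, and budget disk for primary data x (1 + number_of_replicas).
  • Custom analyzers: Always test them with the _analyze API before committing them to a mapping.
  • Monitoring: Poll _cluster/health, _cluster/stats, and _nodes/stats, and export the metrics to Prometheus with an exporter such as the community elasticsearch_exporter.
  • Backups: Register a snapshot repository (PUT /_snapshot/my_repo with type s3; MinIO works through its S3-compatible API) and schedule snapshots with snapshot lifecycle management (SLM).
  • Vectors: dense_vector fields with "index": true are searched approximately through an HNSW graph. Elastic recommends normalizing vectors to unit length and using dot_product, which requires unit-length vectors; keep cosine for vectors you cannot normalize in advance.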
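Shard planning reduces to simple arithmetic: primaries = ceil(primary data / target shard size) and total storage = primary data x (1 + replicas). A minimal bash sketch, where plan_shards is a hypothetical helper; it also sizes disk to stay under the default 85% low disk watermark (cluster.routing.allocation.disk.watermark.low), beyond which Elasticsearch stops allocating new shards to a node, and it ignores per-node balance and indexing overhead:

```bash
# plan_shards DATA_GB TARGET_SHARD_GB REPLICAS
# Whole gigabytes only (bash integer arithmetic); a rough cluster-wide estimate.
plan_shards() {
  local data=$1 target=$2 replicas=$3
  local primaries=$(( (data + target - 1) / target ))   # ceil(data / target)
  local total=$(( data * (1 + replicas) ))              # primaries plus every replica copy
  local min_disk=$(( (total * 100 + 84) / 85 ))         # ceil(total / 0.85)
  echo "primaries=$primaries total_gb=$total min_disk_gb=$min_disk"
}
```

For 120GB of primary data, a 40GB target, and 2 replicas, plan_shards 120 40 2 prints primaries=3 total_gb=360 min_disk_gb=424.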

Common Errors to Avoid

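  • Over-sharding: Every shard costs heap, file handles, and cluster-state overhead, so many small shards per node degrade performance. Start with few primaries and grow with the _split API or a reindex.
  • Mapping explosion: Avoid "dynamic": true on production indices; set "dynamic": "strict" so documents with unmapped fields are rejected instead of silently adding new fields.
  • Yellow health: Some replicas are unassigned, often because there are fewer nodes than shard copies (a replica never shares a node with its primary). Check GET /_cluster/allocation/explain, then add nodes or lower the count with PUT /index/_settings {"number_of_replicas":1}.
  • OOM kills: Keep the heap at no more than 50% of RAM and below the compressed-oops threshold (just under 32GB); beyond it the JVM switches to uncompressed 64-bit object pointers, so a larger heap can effectively hold less data. Monitor GC logs.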
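The heap rule of thumb can be written as min(RAM / 2, cap). A bash sketch, where recommended_heap_gb is a hypothetical helper; the default cap of 26GB follows Elastic's heap-sizing guidance that this value stays below the compressed-oops cutoff on most systems, so treat it as a conservative assumption and verify it on your JVM:

```bash
# recommended_heap_gb RAM_GB [CAP_GB]: half the RAM, capped below the
# compressed-oops cutoff (26GB by default, per Elastic's heap-sizing guidance).
recommended_heap_gb() {
  local half=$(( $1 / 2 )) cap=${2:-26}
  echo $(( half < cap ? half : cap ))
}
```

For a 16GB container, h=$(recommended_heap_gb 16) gives 8, i.e. ES_JAVA_OPTS=-Xms8g -Xmx8g. After startup, the node log reports the heap size together with compressed ordinary object pointers [true]; false there means the heap is above the cutoff.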

Next Steps

Dive into Elastic Cloud for managed clusters or integrate with LangChain for RAG. Check the official Elastic 8.15 docs. Explore our expert Learni trainings on Elasticsearch and OpenSearch for ECE certification.