ECK Elastic

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: September 2024

We had an issue where the cluster was in yellow state. To troubleshoot it, see Red Yellow Cluster Status.

Check cluster status:

elasticsearch@elasticsearch-eck-elasticsearch-es-default-0:~$ curl -X GET "localhost:9200/_cluster/health?filter_path=status,*_shards&pretty"
{
  "status" : "yellow",
  "active_primary_shards" : 406,
  "active_shards" : 477,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unassigned_shards" : 330,
  "delayed_unassigned_shards" : 0
}
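To script this check, for example to wait for the cluster to turn green after a fix, the status field can be pulled out of the health response. This is a minimal sketch: `es_health_status` is a hypothetical helper name, and it uses only grep and sed because jq is not assumed to be installed in the Elasticsearch pod.

```shell
# Hypothetical helper (not part of Elasticsearch): read _cluster/health JSON on stdin
# and print the value of its "status" field (green, yellow or red).
# Handles both compact and ?pretty output; only grep, head and sed are used.
es_health_status() {
  grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Example usage inside the pod: wait up to 60s for green, then print the resulting status.
#   curl -s "localhost:9200/_cluster/health?wait_for_status=green&timeout=60s" | es_health_status
```

`wait_for_status` and `timeout` are standard cluster health parameters; if the timeout expires, the response still reports the current status, with `timed_out` set to true.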

Re-check cluster health, then list unassigned shards:

elasticsearch@elasticsearch-eck-elasticsearch-es-default-2:~$ curl -XGET 'http://localhost:9200/_cluster/health'
{"cluster_name":"elasticsearch-eck-elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":3,"active_primary_shards":406,"active_shards":406,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":407,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":49.938499384993854}
elasticsearch@elasticsearch-eck-elasticsearch-es-default-2:~$ curl -XGET 'http://localhost:9200/_cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state'
index                                                         shard prirep state      node                                         unassigned.reason
fairbid-sdk-events-2999-2024.03.15                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2023.09.26                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2023.12.05                            0     r      UNASSIGNED                                              NODE_LEFT
.ds-.kibana-event-log-8.9.0-2024.07.01-000018                 0     r      UNASSIGNED                                              NODE_LEFT
.kibana_task_manager_8.9.0_001                                0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.02.14                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.03.20                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2023.10.02                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.02.04                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.07.15                            0     r      UNASSIGNED                                              NODE_LEFT
.fleet-file-data-agent-000001                                 0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.08.18                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2024.04.25                            0     r      UNASSIGNED                                              NODE_LEFT
fairbid-sdk-events-2999-2023.10.23                            0     r      UNASSIGNED                                              NODE_LEFT
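With hundreds of unassigned copies, a count per reason is easier to read than the full listing. The sketch below assumes the `_cat/shards` column order used above (index, shard, prirep, state, ...). Because the node column is empty for unassigned copies, the reason lands in the fifth whitespace-separated field. `count_unassigned_by_reason` is a hypothetical helper name. Grouping by prirep also shows at a glance whether only replicas (r, yellow) or also primaries (p, red) are missing.

```shell
# Hypothetical helper: count UNASSIGNED shard copies grouped by prirep (p/r) and reason.
# Reads _cat/shards text on stdin; a header row is skipped because its 4th field is "state".
count_unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { n[$3 " " $5]++ } END { for (k in n) print k, n[k] }'
}

# Example usage inside the pod:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node,unassigned.reason' | count_unassigned_by_reason
```

The order of the output lines is not guaranteed (awk iterates its array in an unspecified order); pipe through `sort` if a stable order matters.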

Ask the cluster allocation explain API why one of the unassigned replicas cannot be allocated:

elasticsearch@elasticsearch-eck-elasticsearch-es-default-2:~$ curl -X GET "localhost:9200/_cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*&pretty" -H 'Content-Type: application/json' -d'
{
  "index": "fairbid-sdk-events-2999-2024.03.15",
  "shard": 0,
  "primary": false
}
'
{
  "index" : "fairbid-sdk-events-2999-2024.03.15",
  "node_allocation_decisions" : [
    {
      "node_name" : "elasticsearch-eck-elasticsearch-es-default-0",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [44.2gb] free space, actual free: [38.4gb], actual used: [86.9%]"
        }
      ]
    },
    {
      "node_name" : "elasticsearch-eck-elasticsearch-es-default-2",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [44.2gb] free space, actual free: [29.3gb], actual used: [90%]"
        }
      ]
    },
    {
      "node_name" : "elasticsearch-eck-elasticsearch-es-default-1",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[fairbid-sdk-events-2999-2024.03.15][0], node[lxK3ut4IRpOdgwQ8kBpHBg], [P], s[STARTED], a[id=iRbaVLn1RWmOEVh2V6Qi3g], failed_attempts[0]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [44.2gb] free space, actual free: [26.7gb], actual used: [90.9%]"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are [2] copies of this shard and [3] values for attribute [k8s_node_name] ([gke-gke-core-fairbid-nap-e2-standard--3c098cc9-wkb8, gke-gke-core-fairbid-nap-e2-standard--4755c446-c7p7, gke-gke-core-fairbid-nap-e2-standard--a76eb336-bwma] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.k8s_node_name: gke-gke-core-fairbid-nap-e2-standard--a76eb336-bwma]"
        }
      ]
    }
  ]
}
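The `disk_threshold` explanations are enough to estimate how large the volumes need to be. A low watermark of 85% means at least 15% must stay free, so a 44.2gb minimum free space implies volumes of roughly 294gb; es-default-1 then uses about 268gb, and keeping that below 85% needs a volume larger than about 268 / 0.85 ≈ 315gb, before counting the replicas that will be allocated once space frees up. The helper below is a hypothetical sketch of that calculation, not an Elasticsearch tool; current per-node usage can be read with `_cat/allocation`.

```shell
# Hypothetical helper: smallest disk size (in the same unit as used_size) that keeps
# current usage below a percentage watermark. Arguments: used_size watermark_percent.
# Replicas allocated after the resize add more data, so leave headroom on top of this.
min_disk_for_watermark() {
  awk -v used="$1" -v wm="$2" 'BEGIN { printf "%.1f\n", used / (wm / 100) }'
}

# Example: es-default-1 uses about 268gb; stay below the 85% low watermark.
#   min_disk_for_watermark 268 85    # prints 315.3
# Current per-node usage (disk.used, disk.total, disk.percent), in gigabytes:
#   curl -s 'localhost:9200/_cat/allocation?v&bytes=gb'
```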

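I increased the Elasticsearch disk size to resolve the disk watermark issue. The explanation above shows that all three nodes were above the low watermark (85%), so the disk_threshold decider refused to allocate the replica anywhere; es-default-1 was also ruled out because it already holds the primary copy.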
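On ECK, the disk size comes from the `volumeClaimTemplates` storage request of each nodeSet. If the StorageClass has `allowVolumeExpansion: true`, ECK resizes the existing PersistentVolumeClaims when that request is increased (volumes can only grow, not shrink). The sketch below assumes the `default` nodeSet is at index 0 and declares an explicit volume claim template; `NAMESPACE`, `STORAGE_CLASS` and the 400Gi size are placeholders, and `volume_resize_patch` is a hypothetical helper. If the Elasticsearch resource is managed by Helm or GitOps, change the size there instead, otherwise the patch will be overwritten.

```shell
# Hypothetical helper: print a JSON patch that sets the storage request of the first
# volume claim template of nodeSet number $1 to size $2 (for example 400Gi).
volume_resize_patch() {
  printf '[{"op":"replace","path":"/spec/nodeSets/%s/volumeClaimTemplates/0/spec/resources/requests/storage","value":"%s"}]\n' "$1" "$2"
}

# Example usage from a machine with kubectl access (NAMESPACE and STORAGE_CLASS are placeholders):
#   kubectl get storageclass "$STORAGE_CLASS" -o jsonpath='{.allowVolumeExpansion}'   # expect: true
#   kubectl patch elasticsearch elasticsearch-eck-elasticsearch -n "$NAMESPACE" \
#     --type=json -p "$(volume_resize_patch 0 400Gi)"
```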