snapblocs uses Elastic Observability to provide observability of the running Data Platform. Observability ensures that DevOps can easily detect undesirable behaviors (service downtime, errors, slow responses, etc.) and have actionable information to pin down the root cause (detailed event logs, granular resource usage metrics, and application performance monitoring). snapblocs uses the ELK Stack, MetricBeat, FileBeat, and APM to bring logs, metrics, and APM traces together in an embedded Observability solution within the Data Platform, so DevOps can monitor and react to events happening inside a running Data Platform environment.
Choose one of the following options to integrate Elastic with the Data Platform Stack.
Create a new Elastic cluster within the Data Platform Stack.
Integrate an externally created Elastic cluster with the Data Platform Stack.
(Option 1) Create a new Elastic cluster within the Data Platform (Recommended option)
Choose this option to create a new Elastic cluster that is managed by the Data Platform Kubernetes infrastructure and integrates seamlessly with other Data Platform components, without affecting external applications outside the Data Platform.
"ELK" is the acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. Kibana lets users visualize Elasticsearch data with charts and graphs.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It can store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine that powers applications with complex search features and requirements.
Logstash is an open-source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash."
Kibana is an open-source data visualization dashboard for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line, scatter plots, or pie charts and maps on top of large volumes of data.
Motivation for CPU requests and limits
Configure the CPU requests and limits of the Containers that run in the cluster to make efficient use of the CPU resources available on the cluster nodes. Keeping a Pod's CPU request low gives the Pod a good chance of being scheduled. Setting a CPU limit that is greater than the CPU request accomplishes two things:
The Pod can have bursts of activity, making use of CPU resources that happen to be available.
The amount of CPU resources a Pod can use during a burst of activity is limited to a reasonable amount.
If a CPU limit is not specified for a Container, one of these situations applies:
The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the node where it is running.
The Container runs in a namespace with a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
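The request, limit, and default-limit rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the parser handles only plain core counts ("2", "0.5") and millicore values ("500m"), and nothing here calls the Kubernetes API or reflects how snapblocs applies these settings internally.

```python
from typing import Optional


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" into a number of cores."""
    if quantity.endswith("m"):
        # "m" means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def effective_cpu_limit(container_limit: Optional[str],
                        namespace_default: Optional[str]) -> Optional[float]:
    """Return the CPU limit (in cores) that applies, or None if unbounded.

    Hypothetical helper: an explicit container limit wins; otherwise the
    namespace default (for example, one set through a LimitRange) applies;
    with neither, the container has no upper bound.
    """
    if container_limit is not None:
        return parse_cpu(container_limit)
    if namespace_default is not None:
        return parse_cpu(namespace_default)
    return None


def burst_headroom(request: str, limit: Optional[float]) -> Optional[float]:
    """Cores available for bursts above the request, or None if unbounded."""
    if limit is None:
        return None
    return limit - parse_cpu(request)


# A low request with a higher limit leaves room for bursts.
limit = effective_cpu_limit("1", None)
print(burst_headroom("250m", limit))
```

With a request of 250m and a limit of 1 core, the container is easy to schedule and can burst by up to three quarters of a core when spare CPU is available.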
Secure access to the Kibana dashboard using User Name and Password. The original Kibana dashboard provided by Elastic is ordinarily accessible to anyone who can reach the Kibana URL, which creates security and maintenance issues when unauthorized personnel access the dashboard. snapblocs adds basic authentication to the Kibana dashboard login, using the User Name and Password defined in the stack configuration.
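For scripted access, a client would need to send the same credentials with each request. The sketch below assumes the basic authentication mentioned above is standard HTTP Basic authentication, which the text does not confirm; the function name and the sample credentials are hypothetical.

```python
import base64


def basic_auth_header(user_name: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header from a user name and password."""
    # HTTP Basic auth sends "user:password" encoded as Base64.
    token = base64.b64encode(f"{user_name}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; use the values defined in the stack configuration.
print(basic_auth_header("user", "pass"))
```

Base64 is an encoding, not encryption, so credentials sent this way are only protected when the Kibana URL is served over HTTPS.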
Use Timeout in Seconds to control how long the Kubernetes controller waits for the successful creation of Elastic and its sub-components. If the wait times out, the creation of the Elastic component is treated as failed. If timeouts occur frequently, increase this value.
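The effect of this setting can be pictured as a polling loop with a deadline. The following is a generic sketch of that pattern, not the controller's actual code; the function name, the readiness check, and the polling interval are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_seconds: float,
                     poll_interval: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready until it returns True or timeout_seconds elapse.

    Returns True on success and False on timeout. The clock and sleep
    parameters exist so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # timed out: the component is treated as failed
        sleep(poll_interval)


# A check that never succeeds times out and reports failure.
print(wait_until_ready(lambda: False, timeout_seconds=0.2, poll_interval=0.05))
```

A component that is merely slow to start looks the same as a failed one once the deadline passes, which is why raising the timeout helps when timeouts are frequent.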
Minimum Master (Eligible) Nodes. The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. Cluster health requires a stable master node. High availability (HA) clusters require at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster will be able to elect a master node even if one of the nodes fails.
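The three-node recommendation follows from majority voting. The sketch below assumes a simple majority rule over voting master-eligible nodes; it is an arithmetic illustration with hypothetical function names, and it ignores the finer details of how Elasticsearch manages its voting configuration.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest number of voting nodes that forms a majority."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """How many voting nodes can fail while a majority remains."""
    return voting_nodes - quorum_size(voting_nodes)


for n in (1, 2, 3):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, one or two voting nodes tolerate no failures, while three voting nodes tolerate the loss of one, which matches the statement that a three-node cluster can still elect a master after a single node fails.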
(Option 2) Integrate an externally created Elastic cluster with the Data Platform Stack
Choose this option to use an existing, externally created Elastic cluster with the Data Platform. Be aware that any changes and usage that occur while the Data Platform is running may impact external applications (systems) that depend on the same Elastic cluster.