Choose an option to integrate Dremio with the Data Platform Stack.
Create a new Dremio cluster within the Data Platform Stack
Integrate an externally created Dremio cluster with the Data Platform Stack
(Option 1) Create a new Dremio cluster within the Data Platform Stack (Recommended option)
Choose this option to create a new Dremio cluster that is managed by the Data Platform Kubernetes infrastructure. The cluster integrates with the other Data Platform components and does not affect applications outside the Data Platform.
There are several Dremio sub-components to be configured.
Master Coordinator
The master coordinator node has the unique function of managing metadata. The master coordinator node is also responsible for:
Query planning
Serving Dremio’s UI
Handling client connections, including the REST API
Master Coordinator and Executor Settings
See the memory and CPU requirements for the Coordinator and Executor here.
ZooKeeper
Dremio utilizes Apache ZooKeeper behind the scenes for cluster coordination.
By default, ZooKeeper is installed as a separate external cluster rather than as an embedded ZooKeeper on the Coordinator node, which provides High Availability.
See the configuration for running ZooKeeper in production here.
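As a rough illustration of why an external ZooKeeper cluster helps with High Availability: a ZooKeeper ensemble stays available only while a strict majority of its servers is running. The sketch below computes that majority for a few ensemble sizes. The function names are hypothetical and the sizes are examples only, not values prescribed by the Data Platform.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Example ensemble sizes (illustrative, not a recommendation).
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
```

A single server tolerates no failures, and a four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the usual choice.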
NOTE:
Motivation for CPU requests and limits
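Configuring the CPU requests and limits of the Containers that run in a cluster makes efficient use of the CPU resources available on the cluster Nodes. Keeping a Pod CPU request low gives the Pod a good chance of being scheduled. Having a CPU limit that is greater than the CPU request accomplishes two things:
The Pod can have bursts of activity in which it uses CPU resources that happen to be available.
The amount of CPU resources the Pod can use during a burst is limited to a reasonable amount.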
Not specifying a CPU limit for a Container can result in one of the following:
The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
The Container runs in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
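The relationship between a CPU request, a CPU limit, and a namespace default can be sketched in a few lines of Python. This is a minimal illustration, not the Kubernetes implementation: the function names are hypothetical, the quantities ("500m", "2", "1") are example values rather than Dremio sizing recommendations, and the namespace default stands in for what a LimitRange would supply.

```python
from decimal import Decimal
from typing import Optional


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "1.5" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def effective_cpu_limit(container_limit: Optional[str],
                        namespace_default: Optional[str]) -> Optional[int]:
    """Limit in millicores, or None when the Container has no upper bound."""
    # An explicit limit on the Container wins; otherwise fall back to the
    # namespace default (assumed here to come from a LimitRange).
    chosen = container_limit if container_limit is not None else namespace_default
    return None if chosen is None else cpu_to_millicores(chosen)


def burst_headroom(request: str,
                   container_limit: Optional[str],
                   namespace_default: Optional[str]) -> Optional[int]:
    """Millicores available above the request, or None when no limit applies."""
    limit = effective_cpu_limit(container_limit, namespace_default)
    if limit is None:
        return None  # unbounded: the Container may use all CPU on its Node
    requested = cpu_to_millicores(request)
    if limit < requested:
        raise ValueError("CPU limit must not be lower than the CPU request")
    return limit - requested


if __name__ == "__main__":
    print(burst_headroom("500m", "2", None))   # explicit limit -> 1500
    print(burst_headroom("500m", None, "1"))   # namespace default -> 500
    print(burst_headroom("500m", None, None))  # no limit at all -> None
```

The three calls correspond to the cases described above: a limit higher than the request leaves bounded room for bursts, a missing limit picks up the namespace default, and with neither in place the Container is unbounded.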
(Option 2) Integrate an externally created Dremio cluster with the Data Platform Stack
Choose this option to use an existing external Dremio cluster with the Data Platform. Be aware that any changes to the cluster, and any use of it by the Data Platform, may affect external applications or systems that depend on the same Dremio cluster. For example, the runtime load that the Data Platform adds to the Dremio cluster may degrade the performance of those external applications.
What's Next?