Choose one of two options to integrate Apache NiFi with the Data Platform Stack.
Create a new NiFi cluster within the Data Platform Stack
Integrate externally created NiFi cluster with Data Platform Stack
(Option 1) Create a new NiFi cluster within the Data Platform (recommended option)
Choose this option to create a new NiFi cluster that is managed by the Data Platform Kubernetes infrastructure and integrates seamlessly with other Data Platform components, without affecting applications outside the Data Platform.
NiFi automates and manages the flow of data between systems. It is mainly used for data acquisition, transport, and guaranteed delivery, and it can handle complicated and diverse data flows, including data-based event processing with buffering and prioritized queuing. NiFi is highly configurable, and flows can be modified at run time, so organizations can make immediate changes and tighten the feedback loop. NiFi also provides data provenance capabilities that allow a data flow to be tracked from end to end.
NiFi provides a user-friendly drag-and-drop interface that lets data administrators quickly build data flows and simplifies operations with real-time control and monitoring. NiFi is designed to be a Big Data analytics tool that works with structured, unstructured, or semi-structured data of any size and format, with or without a schema.
NiFi Settings
Replicas: Number of NiFi nodes.
Set the number of replicas higher than 1 for high availability (H/A) and failover (F/O).
NiFi Auth Configuration Storage
Storage capacity for the NiFi Auth Configuration.
Default: 100Mi
Data Directory Storage
Storage capacity for the 'data' directory, which holds files such as flow.xml.gz, configuration, and state.
Default: 1Gi
FlowFile Repository Storage
Storage capacity for the FlowFile repository.
Default: 10Gi
Content Repository Storage
Storage capacity for the Content repository.
Default: 10Gi
Provenance Repository Storage
Storage capacity for the Provenance repository.
Default: 10Gi
NiFi Log Storage
Storage capacity for NiFi logs.
Default: 5Gi
NiFi Java Heap Memory
Amount of memory to allocate to the NiFi Java heap.
Default: 2
Motivation for CPU requests and limits
Configure the CPU requests and limits of the Containers that run in the cluster to make efficient use of the CPU resources available on the cluster nodes. Keeping a Pod's CPU request low gives the Pod a good chance of being scheduled. Setting a CPU limit that is greater than the CPU request accomplishes two things:
The Pod can have bursts of activity, making use of CPU resources that happen to be available.
The amount of CPU resources a Pod can use during a burst of activity is limited to a reasonable amount.
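As an illustration of a CPU limit that is greater than the CPU request, the following Pod manifest is a minimal sketch. The Pod name, container name, image, and CPU values are assumptions chosen for the example; they are not Data Platform defaults, and the Data Platform may set these fields through its own configuration rather than through a hand-written manifest.

```yaml
# Minimal sketch: a Pod whose CPU limit is higher than its CPU request.
# All names and values below are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: nifi-cpu-example        # hypothetical Pod name
spec:
  containers:
    - name: nifi                # hypothetical container name
      image: apache/nifi        # assumed image, shown for illustration only
      resources:
        requests:
          cpu: 500m             # low request: easier for the scheduler to place the Pod
        limits:
          cpu: "2"              # higher limit: allows bursts, but caps them at 2 CPUs
```

With these values the scheduler only needs to find a node with 0.5 CPU unreserved, while the container may use up to 2 CPUs when spare capacity is available on the node.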
If a CPU limit is not specified for a Container, one of these situations applies:
The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the node where it is running.
The Container runs in a namespace with a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
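The namespace default described above can be sketched with a LimitRange object. The object name, namespace, and CPU values here are assumptions for illustration; check with the cluster administrator for the values actually in effect.

```yaml
# Minimal sketch: a LimitRange that supplies default CPU settings for
# Containers that do not declare their own. Names and values are assumptions.
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range         # hypothetical object name
  namespace: data-platform      # hypothetical namespace
spec:
  limits:
    - type: Container
      default:
        cpu: "1"                # default CPU limit applied when none is specified
      defaultRequest:
        cpu: 500m               # default CPU request applied when none is specified
```

A Container created in this namespace without a CPU limit would be assigned a limit of 1 CPU instead of running unbounded.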
(Option 2) Integrate externally created NiFi cluster with Data Platform Stack
Choose this option if the NiFi cluster was created externally and is to be used within the Data Platform. Be aware that any changes and usage that occur while running the Data Platform may impact external applications (systems) that depend on this NiFi cluster.