What is StreamSets DC+ Platform
StreamSets Data Collector+ Platform provides data ingestion pipelines integrated with ETL processing for streaming, batch, and change data capture (CDC) workloads. You use the snapblocs dashboard UI to create a data ingestion pipeline stack by configuring the data ingestion options, such as input source and target destination settings, and the operational configuration for the target runtime environment, such as Kubernetes on a cloud platform (AWS, etc.). As a result, you can focus on your business applications without managing StreamSets DC clusters.
The snapblocs dashboard lets you manage the lifecycle of a StreamSets Data Collector stack (start, update, terminate) and monitor its runtime behavior through built-in observability features that measure how well data is moving and how healthy the internal states of the components are. It also lets you perform overall administrative tasks, such as scaling up the cluster to increase its capacity.
StreamSets DC+ Platform includes the following software components:
AWS EKS is used to provision StreamSets Data Collector+ Platform stacks using the customer's AWS account.
Google GKE is used to provision StreamSets Data Collector+ Platform stacks using the customer's Google account.
snapblocs provisions a StreamSets Data Collector+ Platform stack following well-architected guides (e.g., AWS Well-Architected for AWS, Google Cloud Architecture Framework) for provisioning and configuring production-grade Kubernetes and deploying workloads into the clusters. This gives you the benefit of patterns that have worked for many customers who have gone to production. snapblocs also makes it easy to get started and easy to do the right thing.
StreamSets Data Collector is a low-latency ingest infrastructure tool that lets you create continuous data ingest pipelines using a drag-and-drop UI within an integrated development environment (IDE). It is used to ingest source data, in stream or in batch, into other data platforms such as a data lake or a different on-premises or cloud data center.
Elastic Stack provides observability (monitoring, alerting, APM) for answering questions about what is happening inside the system just by observing it from the outside.
Grafana is open-source visualization and analytics software. It lets you query, visualize, and explore your metrics. It also lets you set alerts to identify problems in your system moments after they occur, so that you can minimize disruption to your services.