What is Data Flow Platform
Data Flow Platform provides a SaaS-managed service that uses Kubernetes to move data from various input data sources to target data destinations in streaming or bulk mode. A user can use the snapblocs UI to create a Data Flow Stack by configuring the data flow options, such as the input source and target destination settings, and the operational configuration for the target runtime environment (Kubernetes on a cloud platform such as AWS).
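To make the shape of such a configuration concrete, here is a minimal sketch in Python. It is not the snapblocs schema: every class name, field name, and example value below is a hypothetical stand-in for the options described above (input source, target destination, mode, and runtime environment).

```python
from dataclasses import dataclass, field

# All names here are hypothetical; the real snapblocs configuration
# schema is not shown in this document.

@dataclass
class Endpoint:
    kind: str                                     # illustrative, e.g. "kafka" or "s3"
    settings: dict = field(default_factory=dict)  # free-form connection settings


@dataclass
class RuntimeEnvironment:
    cloud: str                # illustrative, e.g. "aws"
    kubernetes_service: str   # illustrative, e.g. "eks"


@dataclass
class DataFlowStackConfig:
    name: str
    mode: str                 # "stream" or "bulk"
    source: Endpoint
    destination: Endpoint
    runtime: RuntimeEnvironment

    def validate(self) -> list:
        """Return a list of human-readable problems (empty when valid)."""
        problems = []
        if not self.name:
            problems.append("stack name is empty")
        if self.mode not in ("stream", "bulk"):
            problems.append(f"unknown mode: {self.mode!r}")
        return problems


config = DataFlowStackConfig(
    name="poc-stack",
    mode="stream",
    source=Endpoint(kind="kafka", settings={"topic": "orders"}),
    destination=Endpoint(kind="s3", settings={"bucket": "example-bucket"}),
    runtime=RuntimeEnvironment(cloud="aws", kubernetes_service="eks"),
)
print(config.validate())
```

Keeping validation as a method that returns a list of problems, rather than raising on the first one, lets a UI report every misconfigured option at once.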
The snapblocs UI also lets a user manage the lifecycle of the Data Flow Stack (start, update, terminate) and monitor the Stack's runtime behavior through built-in observability features that measure how well data is moving and how healthy the internal states of the components are.
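The lifecycle actions named above (start, update, terminate) can be pictured as a small state machine. The sketch below is illustrative only: the state names and the set of allowed transitions are assumptions, since the source lists the actions but not the states a stack passes through.

```python
from enum import Enum


class StackState(Enum):
    # Hypothetical states; only the actions are named in the documentation.
    CONFIGURED = "configured"
    RUNNING = "running"
    TERMINATED = "terminated"


# (action, current state) -> next state. The allowed pairs are assumptions.
TRANSITIONS = {
    ("start", StackState.CONFIGURED): StackState.RUNNING,
    ("update", StackState.RUNNING): StackState.RUNNING,
    ("terminate", StackState.RUNNING): StackState.TERMINATED,
}


def apply_action(state: StackState, action: str) -> StackState:
    """Return the state that follows `action`, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(
            f"cannot {action} a stack in state {state.value}"
        ) from None


state = StackState.CONFIGURED
for action in ("start", "update", "terminate"):
    state = apply_action(state, action)
print(state.value)
```

Encoding the transitions as a lookup table keeps the permitted actions in one place, so adding a transition (for example, restarting a terminated stack, if the product supports it) is a one-line change.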
Data Flow Platform includes the following software components:
AWS EKS is used to provision stacks of Data Flow Platform using the customer's AWS account.
Google GKE is used to provision stacks of Data Flow Platform using the customer's Google account.
snapblocs provisions Data Flow Platforms following well-architected guidance (e.g., the AWS Well-Architected Framework and the Google Cloud Architecture Framework) for provisioning and configuring production-grade Kubernetes clusters and deploying workloads into them. Stacks therefore benefit from patterns that have been used successfully by many customers in production environments. snapblocs also makes it easy to get started and to configure a stack properly.
Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. It is used to deploy selected Components.
Kafka is used to build real-time data pipelines and streaming applications by integrating data from multiple sources and locations into a single, central Event Streaming Platform.
Elastic is used to provide observability (monitoring, alerting, APM), answering questions about what is happening inside the system by observing it from the outside.
Grafana is used to build visualizations and analytics for querying, visualizing, and exploring metrics, and to set alerts that quickly identify system problems and minimize disruption to services.
StreamSets Data Collector is a low-latency ingest infrastructure tool used to create continuous data ingest pipelines using a drag-and-drop UI within an integrated development environment (IDE). It is used to ingest source data, in streaming or batch mode, into other data platforms such as a Data Lake or an on-prem or cloud data center.
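The difference between streaming and batch ingestion can be sketched in a few lines of plain Python. This is a conceptual illustration, not the Kafka or StreamSets API: the function names, the `deliver` callback, and `batch_size` are all hypothetical.

```python
from typing import Callable, Iterable, List


def ingest_stream(records: Iterable, deliver: Callable[[List], None]) -> int:
    """Deliver each record as soon as it arrives; return the number delivered."""
    count = 0
    for record in records:
        deliver([record])   # one delivery per record keeps latency low
        count += 1
    return count


def ingest_batch(records: Iterable, deliver: Callable[[List], None],
                 batch_size: int) -> int:
    """Accumulate records and deliver them in groups; return the batch count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List = []
    batches = 0
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            deliver(batch)
            batches += 1
            batch = []      # start a fresh list; the delivered one is untouched
    if batch:               # flush the final partial batch
        deliver(batch)
        batches += 1
    return batches


streamed: List = []
batched: List = []
ingest_stream(range(3), streamed.append)
ingest_batch(range(5), batched.append, batch_size=2)
print(streamed)
print(batched)
```

Streaming trades throughput for latency by delivering every record immediately, while batching amortizes the per-delivery overhead across several records at the cost of waiting for a batch to fill.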
How to customize Data Flow Platform
After configuring a new stack of Data Flow Platform by following "How to configure a new stack for Data Flow Platform", you can customize the stack.
Test / Proof of Concept (POC) Stack
To create a simple test Data Flow stack on cloud providers, set the following parameters. AWS and K8S Component: ...
How to configure a new stack for Data Flow Platform
You can start configuring a new stack from several places:
On the Home page, the "Configure stack" button in the Stacks statistics block.
On the Stacks page, the "Configure new stack" button at the top of the page.
On the Projects page, select Project ...
What are common use cases for Data Flow Platform
Example use cases for Data Flow Platform
Data Ingestion
Stream Data Ingestion
Ingest data in real time as it arrives. Good for real-time, data-driven decision processing that improves customer experience, minimizes fraud, and optimizes operations ...
What is Data Lake Platform
Data Lake Platform provides a managed service using Kubernetes to provide integrated solutions (Data Flow, Data Transformation, Data As A Service) that ingest data from multiple data sources into a Data Lake. It provides the data workflow to ...
What is Data Transformation Platform
Data Transformation Platform provides a managed service using Kubernetes to convert data from one format or structure into another format or structure. This allows a user to use dpStudio UI to create a Data Transformation Stack by configuring the ...