What is Data Transformation Platform

Data Transformation Platform provides a managed service, running on Kubernetes, that converts data from one format or structure into another. Users create a Data Transformation Stack in the dpStudio UI by configuring the data transformation options, such as data source and target locations and transformation settings. Depending on the settings and use case, data can be transformed in real-time stream mode using a data pipeline or in bulk batch mode using ETL.
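The following minimal, generic Python sketch shows the difference between the two modes. It is not the dpStudio or snapblocs API. It applies the same transformation, turning CSV rows into renamed, typed JSON records, first in batch mode and then in stream mode. The field names and the `transform_record`, `batch_transform`, and `stream_transform` functions are hypothetical and exist only for illustration.

```python
import csv
import io
import json
from typing import Iterable, Iterator, List


def transform_record(row: dict) -> dict:
    """Convert one CSV row (all string values) into a renamed, typed record.

    The input and output field names are hypothetical examples.
    """
    return {
        "customer_id": int(row["id"]),
        "full_name": f'{row["first"]} {row["last"]}',
        "amount_usd": round(float(row["amount"]), 2),
    }


def batch_transform(csv_text: str) -> str:
    """Batch (ETL-style): read the whole input, transform it, write one output."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))  # first line is the header
    return json.dumps([transform_record(r) for r in rows])


def stream_transform(lines: Iterable[str], header: List[str]) -> Iterator[str]:
    """Stream (pipeline-style): emit each transformed record as soon as it arrives."""
    # fieldnames is given explicitly, so no header line is expected in the stream.
    for row in csv.DictReader(lines, fieldnames=header):
        yield json.dumps(transform_record(row))


if __name__ == "__main__":
    sample = "id,first,last,amount\n1,Ada,Lovelace,10.5\n2,Alan,Turing,3\n"
    print(batch_transform(sample))
    incoming = ["1,Ada,Lovelace,10.5\n", "2,Alan,Turing,3\n"]
    for record in stream_transform(incoming, ["id", "first", "last", "amount"]):
        print(record)
```

The batch function needs its entire input before it produces any output, which suits scheduled ETL jobs. The streaming generator emits each record as soon as its line arrives, which is the behavior a real-time pipeline relies on.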

dpStudio also lets users manage the lifecycle of a Data Transformation Stack (start, update, terminate) and monitor its runtime behavior through built-in Observability features that track data flow metrics and the internal health states of its components.



Data Transformation Platform includes the following software components:
  1. AWS EKS is used to provision Data Transformation Platform stacks in the customer's AWS account.
    Google GKE is used to provision Data Transformation Platform stacks in the customer's Google Cloud account.
    snapblocs provisions Data Transformation Platforms following well-architected guidelines (e.g., AWS Well-Architected for AWS, Google Cloud Architecture Framework for Google Cloud) for provisioning and configuring production-grade Kubernetes clusters and deploying workloads into them. These guidelines draw on patterns that have been used successfully by many customers in production environments. snapblocs also makes it easy to get started and to configure the clusters properly.
  2. Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. It is used to deploy selected Components.
  3. Elastic is used to provide observability (monitoring, alerting, APM) for answering questions about what's happening inside the system by observing the outside of the system.
  4. Grafana is used to query, visualize, and explore metrics, build analytics visualizations, and set alerts, so that system problems are identified quickly and disruption to services is minimized.
  5. Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
  6. Airflow is an open-source workflow management platform used to programmatically author and schedule workflows and to monitor them through the built-in Airflow user interface.
  7. Kafka is used to build real-time data pipelines and streaming applications by integrating data from multiple sources and locations into a single, central Event Streaming Platform.
  8. StreamSets Data Collector is a low-latency data ingestion tool used to create continuous ingest pipelines through a drag-and-drop UI within an integrated development environment (IDE). It is used to ingest source data in stream or batch mode into other data platforms, such as a Data Lake or an on-prem or cloud data center.
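Airflow's role (item 6) is to run the tasks of a workflow in dependency order. The sketch below illustrates only that ordering idea, using Python's standard `graphlib` module. It is not Airflow's API, and the task names and the `run_workflow` helper are hypothetical. In Airflow itself, the same dependencies would be declared between operators in a DAG, for example with the `>>` operator.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_workflow(
    tasks: Dict[str, Callable[[], str]],
    upstream: Dict[str, Iterable[str]],
) -> List[str]:
    """Run every task once, each only after all of its upstream tasks have run.

    Every name listed in `upstream` must also be a key in `tasks`.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for name in tasks:
        # add(node, *predecessors): `name` waits for everything in upstream[name].
        sorter.add(name, *upstream.get(name, ()))
    results = []
    for name in sorter.static_order():  # raises graphlib.CycleError on a cycle
        results.append(tasks[name]())
    return results


# Hypothetical ETL tasks; in Airflow these would be operators declared in a DAG.
etl_tasks = {
    "extract_orders": lambda: "extracted orders",
    "extract_customers": lambda: "extracted customers",
    "transform_join": lambda: "joined orders with customers",
    "load_warehouse": lambda: "loaded warehouse table",
}
etl_upstream = {
    "transform_join": ["extract_orders", "extract_customers"],
    "load_warehouse": ["transform_join"],
}

if __name__ == "__main__":
    for line in run_workflow(etl_tasks, etl_upstream):
        print(line)
```

`static_order()` raises `graphlib.CycleError` if the dependencies form a cycle. This mirrors the requirement that a workflow be a directed acyclic graph.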

