Troubleshooting for DevOps

Category: Stack Deploy & Teardown

An error occurred in processing the request. Provision ID should be provided.
Detail:
When deploying a stack, the Provisioning Log shows this error message at the top of the log.

Solution:
Close the Provisioning Log window. After a minute, click “Provisioning Log” again.

Error creating EKS Cluster: Subnets specified must be in at least two different AZs.

Detail:

Deployment fails when the selected VPC does not have subnets in at least two different Availability Zones (AZs).

Error messages in the Provisioning Log:

2020-12-24 00:45:31,185 pulumi:pulumi:Stack (snapblocs_dpstudio_eks-66d528ce-435b-11eb-95b4-8b47a2310260):
2020-12-24 00:45:31,185 error: update failed
2020-12-24 00:45:31,185
2020-12-24 00:45:31,185 aws:eks:Cluster (snapblocs-dpstudio-eks-eksCluster):
2020-12-24 00:45:31,185 error: 1 error occurred:
2020-12-24 00:45:31,185 * error creating EKS Cluster (snapblocs-dpstudio-eks-eksCluster-cc01f23): InvalidParameterException: Subnets specified must be in at least two different AZs
2020-12-24 00:45:31,185 {
2020-12-24 00:45:31,186 RespMetadata: {
2020-12-24 00:45:31,186 StatusCode: 400,
2020-12-24 00:45:31,186 RequestID: "ad3e3392-7c2b-4919-b3ca-e94c2effcfad"
2020-12-24 00:45:31,186 },
2020-12-24 00:45:31,186 ClusterName: "snapblocs-dpstudio-eks-eksCluster-cc01f23",
2020-12-24 00:45:31,186 Message_: "Subnets specified must be in at least two different AZs"
2020-12-24 00:45:31,186 }

Solution:
Select a VPC whose subnets span at least two different AZs (for example, the default VPC).
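
To check a VPC before selecting it, the AZs of its subnets can be inspected programmatically. The Python sketch below assumes the AWS SDK for Python (boto3) and an EC2 client passed in by the caller; `distinct_azs` and `vpc_is_eks_ready` are hypothetical helper names, not part of snapblocs. It only counts the subnets the VPC contains and reads a single page of results.

```python
def distinct_azs(describe_subnets_response):
    """Collect the Availability Zones of the subnets in an EC2 DescribeSubnets response."""
    return {s["AvailabilityZone"] for s in describe_subnets_response.get("Subnets", [])}


def vpc_is_eks_ready(ec2_client, vpc_id, minimum_azs=2):
    """Return True if the VPC's subnets cover at least `minimum_azs` distinct AZs.

    `ec2_client` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Only the first page of results is read; very large VPCs may need a paginator.
    """
    response = ec2_client.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return len(distinct_azs(response)) >= minimum_azs
```

With boto3 installed, pass `boto3.client("ec2")` as the client; a `False` result suggests the VPC should not be selected for an EKS stack.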


Deploy failed due to an unexpected error.

Detail:
Occasionally, the deployment fails internally, for example while creating a certificate.

Solution:
Re-deploy the stack once more. If it keeps failing, contact customer support.
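
The advice to re-deploy once and escalate if the failure repeats can be expressed as a small retry loop. This is a Python sketch only: `deploy` stands for a hypothetical callable that triggers a stack deployment (the snapblocs deployment API is not described here), and the one-minute delay between attempts is an assumption.

```python
import time


def deploy_with_retry(deploy, attempts=2, delay_seconds=60.0, sleep=time.sleep):
    """Call `deploy` up to `attempts` times and return its first successful result.

    `deploy` is a hypothetical zero-argument callable that raises on failure.
    If every attempt fails, the last error is re-raised so it can be
    reported to customer support.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return deploy()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the error for support
            sleep(delay_seconds)  # give transient problems time to clear
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.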

Kubernetes:core/v1:Namespace ingress-controller-namespace creating Retry #0; creation failed:
Get "https://BABAD86F8447AB0A69B2467C80CF7563.gr7.us-west-2.eks.amazonaws.com/api?timeout=32s": dial tcp 35.164.96.4:443: connect: connection refused.

Detail:
The API endpoint of the EKS cluster is not reachable (the connection was refused).

Solution:
Re-deploy the stack; this error should not show up again.
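
Before re-deploying, it can help to confirm that the cluster's API endpoint accepts TCP connections, which is what failed in the error above (`connect: connection refused`). This Python sketch uses only the standard library; `endpoint_reachable` is a hypothetical helper, and a successful TCP connection does not prove that the Kubernetes API itself is healthy.

```python
import socket


def endpoint_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, pass the host from the error message (`BABAD86F8447AB0A69B2467C80CF7563.gr7.us-west-2.eks.amazonaws.com`) with port 443.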

Stack has been deploying or tearing down for more than 45 minutes

Detail:
Occasionally, due to unexpected issues, a stack gets stuck in a hanging state.

Solution:
Once one hour has passed since deployment started, visiting the stack page automatically corrects the stack to its intended state. If the stack has still not been corrected after more than 90 minutes, contact customer service.
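
The timing guidance above can be summarized as a simple decision function. This Python sketch encodes only the thresholds stated in this article (45 minutes, 1 hour, and 90 minutes); the function name and the returned action strings are illustrative.

```python
def hanging_stack_action(minutes_since_deploy):
    """Return the recommended action for a stack stuck deploying or tearing down."""
    if minutes_since_deploy <= 45:
        return "wait"                      # still within the expected duration
    if minutes_since_deploy < 60:
        return "wait for auto-correction"  # likely hanging; auto-correction starts at 1 hour
    if minutes_since_deploy <= 90:
        return "visit the stack page"      # visiting the page triggers auto-correction
    return "contact customer service"      # not corrected after more than 90 minutes
```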

Teardown fails due to “DependencyViolation”
Detail:

Teardown fails with the following error:
“DependencyViolation: resource sg-06107075a60bbc188 has a dependent object status code: 400, request id: 1b8cd4d2-bc33-417d-b8ac-5eb787859f06”

Solution:
  1. Open the EC2 service in the AWS console.

  2. Go to Security Groups.

  3. The goal is to delete the network interface associated with the security group.
    3.1) Try to delete the security group (its ID appears in the snapblocs logs). The console reports that a network interface is associated with the security group. Click the network interface and delete it in the AWS console.

  4. Go to the snapblocs stack dashboard and click Teardown again.
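
The console steps above can also be scripted. This Python sketch assumes a boto3 EC2 client supplied by the caller and uses the `describe_network_interfaces` (with a `group-id` filter) and `delete_network_interface` calls; the helper names are hypothetical. It deletes only interfaces whose status is `available`, because an interface that is still attached must be detached (or its owning resource removed) before it can be deleted.

```python
import re

# Security group IDs look like "sg-" followed by hexadecimal digits.
SG_ID_PATTERN = re.compile(r"\bsg-[0-9a-f]+\b")


def security_group_from_error(message):
    """Extract the security group ID from a DependencyViolation message, or None."""
    match = SG_ID_PATTERN.search(message)
    return match.group(0) if match else None


def delete_available_interfaces(ec2_client, group_id, dry_run=True):
    """Delete detached network interfaces that still reference `group_id`.

    Returns (available, attached) lists of interface IDs. With dry_run=True
    (a local flag, not the EC2 DryRun parameter) nothing is deleted.
    `ec2_client` is assumed to be a boto3 EC2 client.
    """
    response = ec2_client.describe_network_interfaces(
        Filters=[{"Name": "group-id", "Values": [group_id]}]
    )
    available, attached = [], []
    for eni in response.get("NetworkInterfaces", []):
        if eni.get("Status") == "available":
            available.append(eni["NetworkInterfaceId"])
        else:
            attached.append(eni["NetworkInterfaceId"])
    if not dry_run:
        for eni_id in available:
            ec2_client.delete_network_interface(NetworkInterfaceId=eni_id)
    return available, attached
```

Running it with `dry_run=True` first shows which interfaces would be removed and which are still attached.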


Category: Component Configuration

Unable to edit a component configuration after a failed deployment.
Detail:
When a deployment fails, the React JSON form switches to view-only mode, so users cannot edit the configuration to fix the deployment error.

Solution:
Create a new stack configuration.

VPC doesn't show the Subnet list on AWS & K8S configuration
Detail:
When an invalid Access Key is selected, the VPC list should be empty. Instead, the VPC list is not refreshed: it still shows the VPCs of the previously selected Access Key, with an empty Subnet list.

Solution:
Select a different region and any VPC in it, then switch back to the original region and select the VPC again.

Category: Component Upgrade

Unable to upgrade the Kafka version
Detail:

While upgrading the Kafka version, one or more Kafka subcomponents fail to start.

Cause of failure:
Some Kafka components (Kafka brokers, Schema Registry, etc.) may not be instantiated on K8S in the correct order while upgrading or downgrading Kafka.

As a result, connections to those Kubernetes objects are not available and their readiness probes fail.

Solution:
  1. Increase the K8S node size slightly.

  2. Pause and then resume the stack.

Unsupported Kubernetes minor version upgrade from v.x1 to v.x2
Detail:
When upgrading the Kubernetes version for an existing Kubernetes cluster, observe the following requirements:
  1. You cannot decrease the version. For example, you cannot downgrade from Kubernetes v1.18 to v1.17.
  2. You can upgrade the minor version, but only incrementally. Skipping minor versions is not supported. For example, you can upgrade from Kubernetes v1.17 to v1.18, but you cannot upgrade from Kubernetes v1.16 to v1.18.
  3. You cannot change the major version, such as upgrading from v1.18 to v2.0.

Solution:
Before changing the Kubernetes version on the target system, clone the stack and test the change on the cloned stack first.
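
The three rules above can be checked before requesting an upgrade. This Python sketch is illustrative: `parse_version` and `check_upgrade` are hypothetical helpers, they assume versions written as `v<major>.<minor>` (any patch suffix is ignored), and an unchanged version is treated as allowed.

```python
def parse_version(version):
    """Parse 'v1.18', '1.18', or '1.18.9' into a (major, minor) tuple."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def check_upgrade(current, target):
    """Return None if the version change is allowed, otherwise the reason it is not."""
    cur_major, cur_minor = parse_version(current)
    tgt_major, tgt_minor = parse_version(target)
    if tgt_major != cur_major:
        return "major version changes are not supported"  # rule 3
    if tgt_minor < cur_minor:
        return "downgrades are not supported"             # rule 1
    if tgt_minor > cur_minor + 1:
        return "minor versions cannot be skipped"         # rule 2
    return None
```

For example, `check_upgrade("v1.16", "v1.18")` reports that minor versions cannot be skipped, so the upgrade has to go through v1.17 first.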

Category: Security

No one (including Root admin and Account admin) can access the Access Control page on dpstudio because of a DENY access rule in the Access Control policy.
Detail:
If a DENY access rule is accidentally added to the Access Control policy, no one (including Root admin and Account admin) can access the Access Control page on dpstudio to revert it. There is no way to fix this through the dpstudio UI.

Solution:
An authorized member of the snapblocs product support team must manually delete the DENY policy from the “policy” table: identify the policy_id of the policy that causes the problem, and delete that policy.
Note: Removing the policy should be safe. The delete cascades to the role-policy and role_actions records, so it cleans up everything that is needed.
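
The cascading delete described in the note relies on foreign keys declared with `ON DELETE CASCADE`. The Python sketch below demonstrates that behavior with an in-memory SQLite database. The schema is hypothetical: only the `policy` table, the `policy_id` column, and the role-policy and role_actions tables are named in this article (`role_policy` stands in for role-policy, and the `effect` and `resource` columns are invented), so this is not the actual dpstudio schema or support procedure.

```python
import sqlite3

# Hypothetical schema: both child tables reference policy(policy_id) with ON DELETE CASCADE.
SCHEMA = """
CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, effect TEXT NOT NULL, resource TEXT NOT NULL);
CREATE TABLE role_policy (role_id INTEGER NOT NULL,
    policy_id INTEGER NOT NULL REFERENCES policy(policy_id) ON DELETE CASCADE);
CREATE TABLE role_actions (policy_id INTEGER NOT NULL REFERENCES policy(policy_id) ON DELETE CASCADE,
    action TEXT NOT NULL);
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def find_deny_policies(conn, resource):
    """Return the policy_id values of DENY policies on the given resource."""
    rows = conn.execute(
        "SELECT policy_id FROM policy WHERE effect = 'DENY' AND resource = ?", (resource,)
    )
    return [row[0] for row in rows]


def delete_policy(conn, policy_id):
    """Delete one policy; the cascade removes its role_policy and role_actions rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM policy WHERE policy_id = ?", (policy_id,))
```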

Category: Network

The provider update token has expired.
Detail:
This error occurs when there is network congestion among the snapblocs backend servers.

Solution:
Re-deploy the stack to work around the issue.



