Puzzle ITC - Interruption of Kubernetes Cluster – Incident details

Interruption of Kubernetes Cluster

Resolved
Major outage
Started about 4 years ago
Lasted about 1 hour

Affected

Puzzle Services

Major outage from 1:08 PM to 2:13 PM

Puzzle SSO (Keycloak)

Major outage from 1:08 PM to 2:13 PM

CodiMD

Major outage from 1:08 PM to 2:13 PM

Updates
  • Resolved

    The cluster is stable again. Closing the incident.

  • Monitoring

    The master node remains stable, and we continue to monitor it. All applications on the k8s cluster are up and running.

  • Identified

    One of the master nodes was no longer ready. This resulted in a short outage of the Kubernetes control plane, which (for a yet unknown reason) also affected some of the applications. I'm currently investigating what exactly happened. I rebooted the VM and it seems stable again.

  • Investigating

    Our Kubernetes cluster is currently having trouble, which seems to affect some applications. So far we see outages on SSO and CodiMD, and possibly more applications.

    Investigation started.
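
The "Identified" update says that one master node was no longer ready. As a minimal sketch of how such a condition could be detected (not the check that was actually used during this incident), the function below scans a node list in the JSON shape returned by `kubectl get nodes -o json` and reports every node whose `Ready` condition is not `"True"`. The function name `not_ready_nodes` and the node names in the sample are hypothetical.

```python
import json


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated as not ready.
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


# Hypothetical sample, trimmed to the fields the function reads.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "master-1"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "master-2"},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    {"metadata": {"name": "worker-1"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
  ]
}
""")

print(not_ready_nodes(sample))
```

In practice the input could come from piping `kubectl get nodes -o json` into a script that calls `json.load(sys.stdin)`. A `Ready` status of `"Unknown"` usually means the control plane has lost contact with the node, so it is counted as not ready here.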