Pod Scheduling Fundamentals #
Stackbooster enhances pod scheduling within Kubernetes, offering a cost-effective solution for managing diverse node types and maximizing resource efficiency. This guide highlights how Stackbooster combines Kubernetes’ built-in capabilities with its own unique enhancements to optimize scheduling. By allowing Stackbooster to manage pod deployment, users can draw on the full spectrum of available cloud resources and fine-tune pod placement based on:
- Geographical Needs: Ensuring pods run in zones that host required applications or storage.
- Hardware Requirements: Allocating nodes with specific processors or other required hardware.
- High Availability Strategies: Enhancing overall service availability, even in the event of node or zone failures.
Stackbooster supports a variety of mechanisms to ensure that pods are scheduled effectively according to user-defined constraints and requirements.
Node Selection Techniques #
Node Selector:
This basic form of scheduling constrains pods to nodes carrying specified labels, ensuring they run on nodes
that meet specific criteria. See the Kubernetes docs for details.
nodeSelector:
  topology.kubernetes.io/zone: us-east-1a
Node Affinity:
Node affinity in Kubernetes allows you to specify rules for pod placement based on node attributes.
This is useful when you need certain pods to run on nodes that satisfy specific criteria, or when you merely prefer such placement without making it mandatory.
Here’s an expanded example with both required (hard affinity) and preferred (soft affinity) scheduling:
apiVersion: v1
kind: Pod
metadata:
  name: sample-affinity-pod
spec:
  containers:
    - name: main-container
      image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "kubernetes.io/arch"
                operator: "In"
                values:
                  - "arm64"
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: "topology.kubernetes.io/zone"
                operator: "In"
                values:
                  - "us-east-1a"
                  - "us-east-1b"
Pod Affinity and Anti-Affinity:
These settings manage pod placement in relation to other pods, either attracting pods to or repelling them from nodes that already run particular workloads.
The example below co-locates the web pods on the same node for performance benefits, while preferring to keep them
away from nodes running database pods for fault tolerance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx-container
          image: nginx:latest
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: "kubernetes.io/hostname"
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: "app"
                      operator: "In"
                      values:
                        - "db"
                topologyKey: "kubernetes.io/hostname"
Topology Spread:
This feature spreads pods across different physical locations, such as data centers or regions, to enhance service availability and resilience.
The topology spread scheduling mechanism is currently in the BETA phase within Stackbooster’s Autoscaler.
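As an illustration, topology spread is expressed through the standard Kubernetes topologySpreadConstraints field. The sketch below spreads pods labeled app: web evenly across zones; the pod name and labels are placeholder values, and behavior under the beta Autoscaler support may differ:
apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: web
spec:
  containers:
    - name: main-container
      image: nginx
  topologySpreadConstraints:
    - maxSkew: 1                                # maximum allowed difference in matching pod counts between zones
      topologyKey: topology.kubernetes.io/zone  # spread across availability zones
      whenUnsatisfiable: DoNotSchedule          # hard constraint; use ScheduleAnyway for a soft preference
      labelSelector:
        matchLabels:
          app: web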
Well-Known Labels #
Stackbooster recognizes a comprehensive set of labels that enhance the scheduling accuracy and node selection process:
- topology.kubernetes.io/zone (and its deprecated predecessor failure-domain.beta.kubernetes.io/zone): Specifies the geographic zone, and thus the failure domain, of the node, aligning pod placement with regional requirements and supporting disaster recovery and high availability setups.
- node.kubernetes.io/instance-type: Identifies the instance type of the node, allowing pods to be scheduled on nodes that match their performance and resource requirements.
- kubernetes.io/os: Specifies the operating system of the node, ensuring that pods are compatible with the node’s OS.
- kubernetes.io/arch: Indicates the architecture of the node (e.g., amd64, arm64), which is essential for deploying pods with architecture-specific dependencies.
- kubernetes.io/hostname: Defines the hostname of the node, useful for advanced scheduling decisions where specific nodes are targeted for their unique characteristics.
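For instance, these labels can be combined in a nodeSelector to pin a workload to a particular instance type and architecture (the values shown are illustrative):
nodeSelector:
  node.kubernetes.io/instance-type: m5.xlarge  # illustrative instance type
  kubernetes.io/arch: amd64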
User-Defined Labels #
In addition to the predefined labels, Stackbooster supports custom user-defined labels, allowing for even more tailored scheduling. For Stackbooster to recognize and schedule nodes using these user-defined labels, the labels must be explicitly included in the Node Template restrictions using the ‘Exists’ operator.
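On the workload side, a user-defined label is then referenced like any other node label. A minimal sketch, assuming a hypothetical example.com/team label that has also been added to the Node Template restrictions with the Exists operator:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "example.com/team"  # hypothetical user-defined label
              operator: "Exists"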
Resource Requests and Limits #
In Stackbooster, the required node capacity is calculated based on the resource requests specified by the pods. If resource requests are not specified, the system will use the defined resource limits as a fallback to determine the necessary capacity.
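For example, given the container spec below, the required capacity would be sized from the requests block; if requests were omitted, the limits values would be used instead (the resource figures are illustrative):
containers:
  - name: main-container
    image: nginx
    resources:
      requests:   # used first when calculating required node capacity
        cpu: "500m"
        memory: "256Mi"
      limits:     # fallback when requests are not specified
        cpu: "1"
        memory: "512Mi"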
Persistent Volume Management #
Stackbooster supports Persistent Volume Topology, which is crucial for ensuring that data persists in specific zones or regions to meet compliance requirements, optimize performance, or manage data locality in a distributed environment. This feature allows you to specify where persistent volumes are provisioned based on the pod’s location, ensuring that data resides close to where it is consumed for faster access and greater resilience.
Example: Deploying a Stateful Application with Topology-Aware Volume Provisioning
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  serviceName: "db-service"
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
  volumeClaimTemplates:  # each replica gets its own PVC; a single shared ReadWriteOnce claim cannot back multiple replicas
    - metadata:
        name: mysql-storage
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: regional-disk
        resources:
          requests:
            storage: 10Gi
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-disk
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:  # supersedes the deprecated zones parameter, which conflicts with allowedTopologies
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-west1-a
          - us-west1-b
In this deployment, the StatefulSet named database uses a volumeClaimTemplate so that each MySQL replica receives its own PersistentVolumeClaim backed by persistent storage. The associated StorageClass named regional-disk directs Kubernetes to provision regional disks that span the specified zones, in this case us-west1-a and us-west1-b. Together with WaitForFirstConsumer volume binding, this ensures that each volume is provisioned in a zone compatible with its pod, keeping data close to the consuming application, minimizing latency, and enhancing resilience.