
Guide for Node Template (Group) Creation/Editing #

A Node Template defines the configuration for worker nodes in your Kubernetes cluster. This guide walks you through creating and configuring node templates to meet your workload requirements.

Overview #

Node Templates allow you to standardize node configurations across your cluster, including:

  • EC2 instance specifications and lifecycle management
  • Networking and security settings
  • Kubernetes-specific configurations like taints and labels
  • Storage and access configurations

Getting Started #

Prerequisites #

Before creating a node template, ensure you have:

  • An active Kubernetes cluster
  • IAM instance profile with worker node permissions
  • Configured VPC with subnets and security groups
  • Appropriate permissions to create and manage node templates

Basic Workflow #

  1. Navigate to the Node Template Editor
  2. Configure basic information and scheduling
  3. Set core AWS infrastructure settings
  4. Configure instance specifications
  5. Apply advanced settings as needed
  6. Review the generated YAML
  7. Save your template

Configuration Sections #

Basic Information & Pod Scheduling #

(Screenshot: Basic Information & Pod Scheduling panel)

Template Name #

  • Required: Must be unique within your cluster namespace
  • Format: DNS-1123 compliant (^[a-z0-9]([-a-z0-9]*[a-z0-9])?$)
  • Length: 1-63 characters
  • Note: Template name becomes immutable after first save
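
For example (the names shown are illustrative):

gpu-workers-prod        valid
data-science-pool-2     valid
GPU_Workers             invalid (uppercase letters and underscores are not allowed)
-edge-pool              invalid (must start and end with an alphanumeric character)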

Node Taints #

Taints prevent pods from being scheduled on nodes unless they have matching tolerations.

Configuration:

  • Key: Required, must be a valid Kubernetes label key
  • Value: Optional
  • Effect: Required, choose from:
    • NoSchedule: Prevents new pods from being scheduled
    • PreferNoSchedule: Tries to avoid scheduling pods
    • NoExecute: Evicts existing pods and prevents new ones

Best Practices:

  • Use taints for dedicated node pools (GPU nodes, high-memory workloads)
  • Avoid duplicate (key, effect) pairs
  • Consider using descriptive keys like workload-type or node-role
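
For example, a taint on a dedicated GPU pool and the toleration a pod needs in order to land on those nodes look like this in standard Kubernetes YAML (the key and values are illustrative):

# Taint entry as it appears in the node or template spec
taints:
  - key: workload-type
    value: gpu
    effect: NoSchedule

# Matching toleration in the pod spec
tolerations:
  - key: workload-type
    operator: Equal
    value: gpu
    effect: NoSchedule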

Custom Node Labels #

Labels help identify and select nodes for pod scheduling.

Configuration:

  • Format: key=value pairs
  • Key Requirements: Valid Kubernetes label keys, up to 63 characters
  • Value Requirements: Up to 63 characters
  • Reserved Prefixes: Avoid kubernetes.io/* and node.kubernetes.io/*

Example Labels:

environment=production
team=data-science
gpu-type=v100
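
A pod can then target nodes carrying these labels with a standard nodeSelector (the values below reuse the illustrative labels above):

# Pod spec excerpt: schedule only onto nodes with these labels
nodeSelector:
  environment: production
  gpu-type: v100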

Annotations #

Annotations store additional metadata that doesn’t affect scheduling.

Configuration:

  • Format: key=value pairs
  • Key Requirements: Valid Kubernetes annotation keys
  • Value Requirements: Up to 256 characters
  • Use Cases: Monitoring configuration, deployment information, contact details
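
Example Annotations (keys and values are illustrative):

contact=platform-team@example.com
monitoring.example.com/scrape-interval=30s
deployment-ticket=OPS-1234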

Core Configuration #

(Screenshot: Core Configuration panel showing IAM profile, subnets, and security groups)

IAM Instance Profile ARN #

  • Required: Must exist in your integrated AWS account
  • Permissions: Must include worker node permissions for EKS
  • Format: Full ARN of the instance profile
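
For reference, an instance profile ARN has the following shape (the account ID and profile name are placeholders):

arn:aws:iam::123456789012:instance-profile/eks-worker-node-profile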

Subnet Selection #

Configure which subnets your nodes will be deployed to.

Requirements:

  • Minimum: 1 subnet required
  • Location: All subnets must be in the cluster’s VPC and region
  • Recommendations:
    • Use subnets across multiple Availability Zones for high availability
    • Consider private subnets for security

Validation:

  • Warning if all subnets are in a single AZ
  • Error if any subnet is outside the cluster VPC

Security Groups #

Define network access rules for your nodes.

Requirements:

  • Minimum: 1 security group required
  • Location: Must be from the same VPC as the cluster
  • Recommendations: Include the cluster’s recommended node security group

Instance Configuration #

(Screenshot: Instance Configuration panel showing AMI family, lifecycle, and allowed machine types)

AMI Family #

Choose the Amazon Machine Image family for your nodes.

Options:

  • AL2: Amazon Linux 2 (stable, widely tested)
  • AL2023: Amazon Linux 2023 (latest features, optimized performance)

Compatibility: Automatically validated against your cluster’s Kubernetes version

Instance Lifecycle #

Control how your instances are provisioned and managed.

Options:

  • Spot with On-Demand Fallback (spot-od): Default; tries Spot instances first and falls back to On-Demand when Spot capacity is unavailable
  • On-Demand Only (od): Guaranteed capacity, higher cost
  • Spot Only (spot): Cost-optimized; scale-up may fail when Spot capacity is unavailable

Capacity Planning:

  • Use spot-od for most workloads
  • Use od for critical, always-available services
  • Use spot for batch processing and cost-sensitive workloads

Allowed Machine Types #

Specify which EC2 instance types can be used.

Configuration:

  • Format: Comma-separated list (e.g., c6a.large,m5.xlarge,r5.2xlarge)
  • Validation: Types are checked against AMI architecture and region availability
  • Integration: Must align with any instance-type node restrictions

Selection Tips:

  • Consider your workload’s CPU, memory, and network requirements
  • Include multiple types for better Spot instance availability
  • Balance cost and performance needs

Advanced Configuration #

(Screenshot: Advanced Configuration panel showing SSH, block devices, node restrictions, and kubelet parameters)

SSH Access #

Configure SSH key access for node troubleshooting.

Configuration:

  • Optional: SSH key name from your AWS account/region
  • Security Warning: A production-hardening warning is displayed when SSH access is enabled
  • Use Cases: Debugging, maintenance, custom software installation

Block Device Mappings #

Configure storage for your nodes.

Default Configuration:

  • Root Device: /dev/xvda
  • Size: 20 GiB (configurable 20-1024 GiB)
  • Type: gp3 (or gp2 for legacy)
  • Encryption: Per organization policy
  • Delete on Termination: Enabled

Additional Devices:

  • Add extra storage as needed
  • Configure encryption and KMS keys
  • Consider workload storage requirements
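
As a sketch, a mapping matching the defaults above could be expressed roughly like this; the field names follow a common EC2/Karpenter-style layout and may not match the editor's generated YAML exactly:

# Illustrative only; field names are assumptions, values mirror the defaults above
blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeSize: 20Gi
      volumeType: gp3
      encrypted: true
      deleteOnTermination: true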

Node Restrictions #

Control pod placement using node selectors.

Standard Restriction Keys:

  • node.kubernetes.io/instance-type
  • kubernetes.io/arch
  • kubernetes.io/os
  • topology.kubernetes.io/zone

Extended Keys (platform-specific):

  • stackbooster.io/instance-family
  • stackbooster.io/instance-generation
  • stackbooster.io/instance-cpu
  • stackbooster.io/instance-memory
  • stackbooster.io/instance-gpu-name

Operators:

  • In: Pod must match one of the specified values
  • NotIn: Pod must not match any of the specified values

Validation Rules:

  • Zone values must match selected subnet AZs
  • Instance-type restrictions must be within allowed machine types
  • No empty value lists or contradictory duplicates

(Screenshot: Node restriction key options dropdown)
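
Expressed in Kubernetes-style requirement syntax, each restriction pairs a key and operator with a list of values; the exact field names in the generated YAML may differ, and the values below are illustrative:

# Instance types must stay within the allowed machine types,
# and zones must match the selected subnets
requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: [c6a.large, m5.xlarge]
  - key: topology.kubernetes.io/zone
    operator: NotIn
    values: [us-east-1d]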

Kubelet Parameters #

Fine-tune Kubernetes node behavior.

Eviction Policies: Configure when pods are evicted due to resource pressure.

Hard Eviction:

  • Format: key=value pairs
  • Example: memory.available=100Mi, nodefs.available=10%
  • Behavior: Immediate pod eviction when threshold reached

Soft Eviction:

  • Format: key=value pairs
  • Behavior: Gradual eviction with grace period
  • Grace Periods: Define with key=duration format (e.g., memory.available=90s)

Supported Signals:

  • memory.available: Available memory
  • nodefs.available: Available disk space on root filesystem
  • imagefs.available: Available disk space for container images
  • pid.available: Available process IDs

Value Formats:

  • Memory: Mi (mebibytes), Gi (gibibytes)
  • Disk: % (percentage), Mi, Gi
  • Duration: s (seconds), m (minutes), h (hours)

Precedence: Hard eviction settings override soft eviction for the same resource
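
In standard KubeletConfiguration terms, the thresholds described above map onto fields like these (values are examples only):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
evictionSoft:
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "90s"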

Generated YAML Preview #

The editor provides a real-time YAML preview that mirrors your configuration exactly. This includes all Kubernetes-standard fields and platform-specific extensions.

(Screenshot: Generated YAML preview with Copy button)

Key YAML Sections #

  • Metadata: Template name, namespace, labels
  • Instance Profile: AWS IAM configuration
  • Networking: Subnet and security group selectors
  • Compute: Instance types, lifecycle, AMI family
  • Storage: Block device mappings
  • Kubernetes: Labels, annotations, taints, kubelet settings
  • Restrictions: Node selector requirements
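
Pulled together, a minimal template might look roughly like the sketch below. The apiVersion, kind, and field names are placeholders for illustration only; the editor's generated preview is the authoritative schema.

# Hypothetical sketch; apiVersion, kind, and field names are placeholders
apiVersion: example.io/v1
kind: NodeTemplate
metadata:
  name: gpu-workers
  labels:
    team: data-science
spec:
  instanceProfile: arn:aws:iam::123456789012:instance-profile/eks-worker-node-profile
  subnetIds: [subnet-0aaa111, subnet-0bbb222]
  securityGroupIds: [sg-0123456789abcdef0]
  amiFamily: AL2023
  lifecycle: spot-od
  instanceTypes: [c6a.large, m5.xlarge]
  taints:
    - key: workload-type
      value: gpu
      effect: NoSchedule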

Copy Functionality #

Use the Copy button to copy the generated YAML to your clipboard for:

  • Version control tracking
  • Infrastructure as Code workflows
  • Manual application with kubectl
  • Template sharing and review
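
For example, the copied YAML can be saved to a file and applied manually (the filename is arbitrary):

kubectl apply -f node-template.yaml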

Validation and Error Handling #

Real-time Validation #

  • Inline Validation: Fields are validated as you type
  • Save Prevention: Save button remains disabled until all required fields are valid
  • Cross-field Validation: Related fields are checked for consistency

Common Validation Errors #

Infrastructure Mismatches:

  • Subnets or security groups from wrong VPC/region
  • AMI family incompatible with Kubernetes version
  • Instance types not available in the selected region

Configuration Conflicts:

  • Allowed machine types don’t match instance-type restrictions
  • Zone restrictions don’t match selected subnets
  • Contradictory node selector requirements

Capacity Warnings:

  • All subnets in single availability zone
  • Overly narrow constraints with Spot instances
  • Missing cluster-recommended security groups

Best Practices #

Security #

  • Use private subnets when possible
  • Include cluster-recommended security groups
  • Enable encryption for block devices
  • Avoid SSH access in production environments
  • Use IAM roles with minimal required permissions

Reliability #

  • Distribute nodes across multiple AZs
  • Use mixed instance lifecycle (spot-od) for resilience
  • Configure appropriate eviction thresholds
  • Monitor capacity warnings before saving

Cost Optimization #

  • Leverage Spot instances for non-critical workloads
  • Right-size instance types for workload requirements
  • Use appropriate storage types (gp3 vs gp2)
  • Implement node restrictions to optimize scheduling

Maintainability #

  • Use descriptive template names and labels
  • Document purpose with annotations
  • Standardize taints across similar node pools
  • Version control your template configurations

Troubleshooting #

Common Issues #

Template Save Failures:

  1. Check all required fields are completed
  2. Verify infrastructure resources belong to correct VPC/region
  3. Ensure AMI family is compatible with cluster version
  4. Review validation error summary

Node Scheduling Problems:

  1. Verify pod tolerations match node taints
  2. Check node selector requirements align with restrictions
  3. Confirm resource requests are within node capacity
  4. Review eviction policy settings

Capacity Issues:

  1. Broaden instance type selections
  2. Increase subnet/zone coverage
  3. Consider mixed lifecycle policies
  4. Monitor AWS service limits

Getting Help #

  • Review the generated YAML for configuration details
  • Check cluster compatibility matrices for version support
  • Consult AWS documentation for instance type specifications
  • Use platform monitoring tools to track node performance

Remember that node templates provide the foundation for your cluster’s compute capacity. Take time to understand your workload requirements and configure templates that balance performance, cost, and reliability for your specific use cases.