feat: allow more than one job per instance (#1184)

## Description

We want to turn `runners_capacity_per_instance` into a variable so that more
than one job can run on a single EC2 instance.
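
For illustration, a minimal sketch of how a consumer of the module could set the new input. The registry source, version constraint, and all omitted inputs (VPC, subnets, GitLab registration, and so on) are assumptions and not part of this change; only `runner_worker` and `runner_worker_docker_autoscaler.capacity_per_instance` come from the diff below.

```hcl
# Hypothetical usage sketch: only the inputs relevant to this change are shown.
# The source/version and any other required inputs are assumptions and must be
# completed for a real deployment.
module "gitlab_runner" {
  source  = "cattle-ops/gitlab-runner/aws" # assumed registry source
  version = "~> 7.0"                       # assumed version constraint

  runner_worker = {
    type     = "docker-autoscaler"
    max_jobs = 4 # total number of jobs the Runner Worker may process in parallel
  }

  runner_worker_docker_autoscaler = {
    capacity_per_instance = 4 # run up to 4 concurrent jobs on a single EC2 instance
  }
}
```

With `capacity_per_instance` equal to `max_jobs`, the four-job pipeline from the verification below should be served by a single EC2 instance.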

## Migrations required

No.

## Verification

Using the docker-autoscaler executor, set `capacity_per_instance` from 1 to 4 in
/etc/gitlab-runner/config.toml, run `gitlab-runner restart <runner>`, and start a
pipeline with 4 concurrent jobs. Verified that only one instance is created and
that all four jobs report the same instance.

---------

Co-authored-by: Depauw Natan <[email protected]>
N8ND and depauna authored Sep 9, 2024
1 parent 3335b81 commit f7f2ea2
Showing 3 changed files with 5 additions and 3 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -208,10 +208,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
| <a name="input_runner_worker"></a> [runner\_worker](#input\_runner\_worker) | For detailed information, check https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section.<br><br>environment\_variables = List of environment variables to add to the Runner Worker (environment).<br>max\_jobs = Number of jobs which can be processed in parallel by the Runner Worker.<br>output\_limit = Sets the maximum build log size in kilobytes. Default is 4MB (output\_limit).<br>request\_concurrency = Limit number of concurrent requests for new jobs from GitLab (default 1) (request\_concurrency).<br>ssm\_access = Allows to connect to the Runner Worker via SSM.<br>type = The Runner Worker type to use. Currently supports `docker+machine` or `docker` or `docker-autoscaler`. | <pre>object({<br> environment_variables = optional(list(string), [])<br> max_jobs = optional(number, 0)<br> output_limit = optional(number, 4096)<br> request_concurrency = optional(number, 1)<br> ssm_access = optional(bool, false)<br> type = optional(string, "docker+machine")<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_cache"></a> [runner\_worker\_cache](#input\_runner\_worker\_cache) | Configuration to control the creation of the cache bucket. By default the bucket will be created and used as shared<br>cache. To use the same cache across multiple Runner Worker disable the creation of the cache and provide a policy and<br>bucket name. See the public runner example for more details."<br><br>For detailed documentation check https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnerscaches3-section<br><br>access\_log\_bucker\_id = The ID of the bucket where the access logs are stored.<br>access\_log\_bucket\_prefix = The bucket prefix for the access logs.<br>authentication\_type = A string that declares the AuthenticationType for [runners.cache.s3]. Can either be 'iam' or 'credentials'<br>bucket = Name of the cache bucket. Requires `create = false`.<br>bucket\_prefix = Prefix for s3 cache bucket name. Requires `create = true`.<br>create = Boolean used to enable or disable the creation of the cache bucket.<br>create\_aws\_s3\_bucket\_public\_access\_block = Boolean used to enable or disable the creation of the public access block for the cache bucket. Useful when organizations do not allow the creation of public access blocks on individual buckets (e.g. public access is blocked on all buckets at the organization level).<br>expiration\_days = Number of days before cache objects expire. Requires `create = true`.<br>include\_account\_id = Boolean used to include the account id in the cache bucket name. Requires `create = true`.<br>policy = Policy to use for the cache bucket. Requires `create = false`.<br>random\_suffix = Boolean used to enable or disable the use of a random string suffix on the cache bucket name. Requires `create = true`.<br>shared = Boolean used to enable or disable the use of the cache bucket as shared cache.<br>versioning = Boolean used to enable versioning on the cache bucket. Requires `create = true`. | <pre>object({<br> access_log_bucket_id = optional(string, null)<br> access_log_bucket_prefix = optional(string, null)<br> authentication_type = optional(string, "iam")<br> bucket = optional(string, "")<br> bucket_prefix = optional(string, "")<br> create = optional(bool, true)<br> create_aws_s3_bucket_public_access_block = optional(bool, true)<br> expiration_days = optional(number, 1)<br> include_account_id = optional(bool, true)<br> policy = optional(string, "")<br> random_suffix = optional(bool, false)<br> shared = optional(bool, false)<br> versioning = optional(bool, false)<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_add_dind_volumes"></a> [runner\_worker\_docker\_add\_dind\_volumes](#input\_runner\_worker\_docker\_add\_dind\_volumes) | Add certificates and docker.sock to the volumes to support docker-in-docker (dind) | `bool` | `false` | no |
| <a name="input_runner_worker_docker_autoscaler"></a> [runner\_worker\_docker\_autoscaler](#input\_runner\_worker\_docker\_autoscaler) | fleeting\_plugin\_version = The version of aws fleeting plugin<br>connector\_config\_user = User to connect to worker machine<br>key\_pair\_name = The name of the key pair used by the Runner to connect to the docker-machine Runner Workers. This variable is only supported when `enables` is set to `true`.<br>max\_use\_count = Max job number that can run on a worker<br>update\_interval = The interval to check with the fleeting plugin for instance updates.<br>update\_interval\_when\_expecting = The interval to check with the fleeting plugin for instance updates when expecting a state change. | <pre>object({<br> fleeting_plugin_version = optional(string, "1.0.0")<br> connector_config_user = optional(string, "ec2-user")<br> key_pair_name = optional(string, "runner-worker-key")<br> max_use_count = optional(number, 100)<br> update_interval = optional(string, "1m")<br> update_interval_when_expecting = optional(string, "2s")<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_autoscaler"></a> [runner\_worker\_docker\_autoscaler](#input\_runner\_worker\_docker\_autoscaler) | fleeting\_plugin\_version = The version of aws fleeting plugin<br>connector\_config\_user = User to connect to worker machine<br>key\_pair\_name = The name of the key pair used by the Runner to connect to the docker-machine Runner Workers. This variable is only supported when `enables` is set to `true`.<br>capacity\_per\_instance = The number of jobs that can be executed concurrently by a single instance.<br>max\_use\_count = Max job number that can run on a worker<br>update\_interval = The interval to check with the fleeting plugin for instance updates.<br>update\_interval\_when\_expecting = The interval to check with the fleeting plugin for instance updates when expecting a state change. | <pre>object({<br> fleeting_plugin_version = optional(string, "1.0.0")<br> connector_config_user = optional(string, "ec2-user")<br> key_pair_name = optional(string, "runner-worker-key")<br> capacity_per_instance = optional(number, 1)<br> max_use_count = optional(number, 100)<br> update_interval = optional(string, "1m")<br> update_interval_when_expecting = optional(string, "2s")<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_autoscaler_ami_filter"></a> [runner\_worker\_docker\_autoscaler\_ami\_filter](#input\_runner\_worker\_docker\_autoscaler\_ami\_filter) | List of maps used to create the AMI filter for the Runner Worker. | `map(list(string))` | <pre>{<br> "name": [<br> "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"<br> ]<br>}</pre> | no |
| <a name="input_runner_worker_docker_autoscaler_ami_owners"></a> [runner\_worker\_docker\_autoscaler\_ami\_owners](#input\_runner\_worker\_docker\_autoscaler\_ami\_owners) | The list of owners used to select the AMI of the Runner Worker. | `list(string)` | <pre>[<br> "099720109477"<br>]</pre> | no |
| <a name="input_runner_worker_docker_autoscaler_asg"></a> [runner\_worker\_docker\_autoscaler\_asg](#input\_runner\_worker\_docker\_autoscaler\_asg) | enable\_mixed\_instances\_policy = Make use of autoscaling-group mixed\_instances\_policy capacities to leverage pools and spot instances.<br>health\_check\_grace\_period = Time (in seconds) after instance comes into service before checking health<br>health\_check\_type = Controls how health checking is done. Values are - EC2 and ELB<br>instance\_refresh\_min\_healthy\_percentage = The amount of capacity in the Auto Scaling group that must remain healthy during an instance refresh to allow the operation to continue, as a percentage of the desired capacity of the Auto Scaling group.<br>instance\_refresh\_triggers = Set of additional property names that will trigger an Instance Refresh. A refresh will always be triggered by a change in any of launch\_configuration, launch\_template, or mixed\_instances\_policy.<br>max\_growth\_rate = The maximum number of machines that can be added to the runner in parallel.<br>on\_demand\_base\_capacity = Absolute minimum amount of desired capacity that must be fulfilled by on-demand instances.<br>on\_demand\_percentage\_above\_base\_capacity = Percentage split between on-demand and Spot instances above the base on-demand capacity.<br>override\_instance\_types = List to override the instance type in the Launch Template. Allow to spread spot instances on several types, to reduce interruptions<br>profile\_name = profile\_name = Name of the IAM profile to attach to the Runner Workers.<br>sg\_ingresses = Extra security group rule for workers<br>spot\_allocation\_strategy = How to allocate capacity across the Spot pools. 'lowest-price' to optimize cost, 'capacity-optimized' to reduce interruptions<br>spot\_instance\_pools = Number of Spot pools per availability zone to allocate capacity. EC2 Auto Scaling selects the cheapest Spot pools and evenly allocates Spot capacity across the number of Spot pools that you specify.<br>subnet\_ids = The list of subnet IDs to use for the Runner Worker when the fleet mode is enabled.<br>types = The type of instance to use for the Runner Worker. In case of fleet mode, multiple instance types are supported.<br>upgrade\_strategy = Auto deploy new instances when launch template changes. Can be either 'bluegreen', 'rolling' or 'off' | <pre>object({<br> enable_mixed_instances_policy = optional(bool, false)<br> health_check_grace_period = optional(number, 300)<br> health_check_type = optional(string, "EC2")<br> instance_refresh_min_healthy_percentage = optional(number, 90)<br> instance_refresh_triggers = optional(list(string), [])<br> max_growth_rate = optional(number, 0)<br> on_demand_base_capacity = optional(number, 0)<br> on_demand_percentage_above_base_capacity = optional(number, 100)<br> profile_name = optional(string, "")<br> spot_allocation_strategy = optional(string, "lowest-price")<br> spot_instance_pools = optional(number, 2)<br> subnet_ids = optional(list(string), [])<br> types = optional(list(string), ["m5.large"])<br> upgrade_strategy = optional(string, "rolling")<br> sg_ingresses = optional(list(object({<br> description = string<br> from_port = number<br> to_port = number<br> protocol = string<br> cidr_blocks = list(string)<br> })), [])<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_autoscaler_asg"></a> [runner\_worker\_docker\_autoscaler\_asg](#input\_runner\_worker\_docker\_autoscaler\_asg) | enable\_mixed\_instances\_policy = Make use of autoscaling-group mixed\_instances\_policy capacities to leverage pools and spot instances.<br>health\_check\_grace\_period = Time (in seconds) after instance comes into service before checking health<br>health\_check\_type = Controls how health checking is done. Values are - EC2 and ELB<br>instance\_refresh\_min\_healthy\_percentage = The amount of capacity in the Auto Scaling group that must remain healthy during an instance refresh to allow the operation to continue, as a percentage of the desired capacity of the Auto Scaling group.<br>instance\_refresh\_triggers = Set of additional property names that will trigger an Instance Refresh. A refresh will always be triggered by a change in any of launch\_configuration, launch\_template, or mixed\_instances\_policy.<br>max\_growth\_rate = The maximum number of machines that can be added to the runner in parallel.<br>on\_demand\_base\_capacity = Absolute minimum amount of desired capacity that must be fulfilled by on-demand instances.<br>on\_demand\_percentage\_above\_base\_capacity = Percentage split between on-demand and Spot instances above the base on-demand capacity.<br>override\_instance\_types = List to override the instance type in the Launch Template. Allow to spread spot instances on several types, to reduce interruptions<br>profile\_name = profile\_name = Name of the IAM profile to attach to the Runner Workers.<br>sg\_ingresses = Extra security group rule for workers<br>spot\_allocation\_strategy = How to allocate capacity across the Spot pools. 'lowest-price' to optimize cost, 'capacity-optimized' to reduce interruptions<br>spot\_instance\_pools = Number of Spot pools per availability zone to allocate capacity. EC2 Auto Scaling selects the cheapest Spot pools and evenly allocates Spot capacity across the number of Spot pools that you specify.<br>subnet\_ids = The list of subnet IDs to use for the Runner Worker when the fleet mode is enabled.<br>types = The type of instance to use for the Runner Worker. In case of fleet mode, multiple instance types are supported.<br>upgrade\_strategy = Auto deploy new instances when launch template changes. Can be either 'bluegreen', 'rolling' or 'off'<br>enabled\_metrics = List of metrics to collect. | <pre>object({<br> enable_mixed_instances_policy = optional(bool, false)<br> health_check_grace_period = optional(number, 300)<br> health_check_type = optional(string, "EC2")<br> instance_refresh_min_healthy_percentage = optional(number, 90)<br> instance_refresh_triggers = optional(list(string), [])<br> max_growth_rate = optional(number, 0)<br> on_demand_base_capacity = optional(number, 0)<br> on_demand_percentage_above_base_capacity = optional(number, 100)<br> profile_name = optional(string, "")<br> spot_allocation_strategy = optional(string, "lowest-price")<br> spot_instance_pools = optional(number, 2)<br> subnet_ids = optional(list(string), [])<br> types = optional(list(string), ["m5.large"])<br> upgrade_strategy = optional(string, "rolling")<br> enabled_metrics = optional(list(string), [])<br> sg_ingresses = optional(list(object({<br> description = string<br> from_port = number<br> to_port = number<br> protocol = string<br> cidr_blocks = list(string)<br> })), [])<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_autoscaler_autoscaling_options"></a> [runner\_worker\_docker\_autoscaler\_autoscaling\_options](#input\_runner\_worker\_docker\_autoscaler\_autoscaling\_options) | Set autoscaling parameters based on periods, see https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnersautoscalerpolicy-sections | <pre>list(object({<br> periods = list(string)<br> timezone = optional(string, "UTC")<br> idle_count = optional(number)<br> idle_time = optional(string)<br> scale_factor = optional(number)<br> scale_factor_limit = optional(number, 0)<br> }))</pre> | `[]` | no |
| <a name="input_runner_worker_docker_autoscaler_instance"></a> [runner\_worker\_docker\_autoscaler\_instance](#input\_runner\_worker\_docker\_autoscaler\_instance) | ebs\_optimized = Enable EBS optimization for the Runner Worker.<br>http\_tokens = Whether or not the metadata service requires session tokens<br>http\_put\_response\_hop\_limit = The desired HTTP PUT response hop limit for instance metadata requests. The larger the number, the further instance metadata requests can travel.<br>monitoring = Enable detailed monitoring for the Runner Worker.<br>private\_address\_only = Restrict Runner Worker to the use of a private IP address. If `runner_instance.use_private_address_only` is set to `true` (default),<br>root\_device\_name = The name of the root volume for the Runner Worker.<br>root\_size = The size of the root volume for the Runner Worker.<br>start\_script = Cloud-init user data that will be passed to the Runner Worker. Should not be base64 encrypted.<br>volume\_type = The type of volume to use for the Runner Worker. `gp2`, `gp3`, `io1` or `io2` are supported<br>volume\_iops = Guaranteed IOPS for the volume. Only supported when using `gp3`, `io1` or `io2` as `volume_type`.<br>volume\_throughput = Throughput in MB/s for the volume. Only supported when using `gp3` as `volume_type`. | <pre>object({<br> ebs_optimized = optional(bool, true)<br> http_tokens = optional(string, "required")<br> http_put_response_hop_limit = optional(number, 2)<br> monitoring = optional(bool, false)<br> private_address_only = optional(bool, true)<br> root_device_name = optional(string, "/dev/sda1")<br> root_size = optional(number, 8)<br> start_script = optional(string, "")<br> volume_type = optional(string, "gp2")<br> volume_throughput = optional(number, 125)<br> volume_iops = optional(number, 3000)<br> })</pre> | `{}` | no |
| <a name="input_runner_worker_docker_autoscaler_role"></a> [runner\_worker\_docker\_autoscaler\_role](#input\_runner\_worker\_docker\_autoscaler\_role) | additional\_tags = Map of tags that will be added to the Runner Worker.<br>assume\_role\_policy\_json = Assume role policy for the Runner Worker.<br>policy\_arns = List of ARNs of IAM policies to attach to the Runner Workers.<br>profile\_name = Name of the IAM profile to attach to the Runner Workers. | <pre>object({<br> additional_tags = optional(map(string), {})<br> assume_role_policy_json = optional(string, "")<br> policy_arns = optional(list(string), [])<br> profile_name = optional(string, "")<br> })</pre> | `{}` | no |
2 changes: 1 addition & 1 deletion main.tf
@@ -124,7 +124,7 @@ locals {
{
docker_autoscaling_name = var.runner_worker.type == "docker-autoscaler" ? aws_autoscaling_group.autoscaler[0].name : ""
connector_config_user = var.runner_worker_docker_autoscaler.connector_config_user
runners_capacity_per_instance = 1
runners_capacity_per_instance = var.runner_worker_docker_autoscaler.capacity_per_instance
runners_max_use_count = var.runner_worker_docker_autoscaler.max_use_count
runners_max_instances = var.runner_worker.max_jobs

