diff --git a/README.md b/README.md index 607a690f..a217e8bc 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ You can also completely remove `ethereum_package` from your configuration in whi Kurtosis packages are parameterizable, meaning you can customize your network and its behavior to suit your needs by storing parameters in a file that you can pass in at runtime like so: -```bash +```shell kurtosis run github.com/ethpandaops/optimism-package --args-file https://raw.githubusercontent.com/ethpandaops/optimism-package/main/network_params.yaml ``` @@ -47,7 +47,7 @@ For `--args-file` parameters file, you can pass a local file path or a URL to a To clean up running enclaves and data, you can run: -```bash +```shell kurtosis clean -a ``` @@ -57,7 +57,7 @@ This will stop and remove all running enclaves and **delete all data**. If you are attempting to test any changes to the package code, you can point to the directory as the `run` argument -```bash +```shell cd ~/go/src/github.com/ethpandaops/optimism-package kurtosis run . --args-file ./network_params.yaml ``` @@ -77,8 +77,10 @@ The full YAML schema that can be passed in is as follows with the defaults provi optimism_package: # Observability configuration observability: - # Whether or not to configure observability (e.g. prometheus) + # Whether to provision an observability stack (prometheus, loki, promtail, grafana) enabled: true + # Whether to enable features exclusive to the K8s backend (i.e. log collection) + enable_k8s_features: false # Default prometheus configuration prometheus_params: storage_tsdb_retention_time: "1d" @@ -91,15 +93,34 @@ optimism_package: min_mem: 128 max_mem: 2048 # Prometheus docker image to use - # Defaults to the latest image - image: "prom/prometheus:latest" + image: "prom/prometheus:v3.1.0" + # Default loki configuration + loki_params: + # Loki docker image to use + image: "grafana/loki:3.3.2" + # Resource management for loki container + # CPU is milicores + # RAM is in MB + min_cpu: 10 + max_cpu: 1000 + min_mem: 128 + max_mem: 2048 + # Default promtail configuration + promtail_params: + # Promtail docker image to use + image: "grafana/promtail:3.3.2" + # Resource management for promtail container + # CPU is milicores + # RAM is in MB + min_cpu: 10 + max_cpu: 1000 + min_mem: 128 + max_mem: 2048 # Default grafana configuration grafana_params: - # A list of locators for grafana dashboards to be loaded by the grafana service. - # Each locator should be a URL to a directory containing a /folders and a /dashboards directory. - # Those will be uploaded to the grafana service by using grizzly. - # See https://github.com/ethereum-optimism/grafana-dashboards-public for more info. + # A list of locators for grafana dashboards to be loaded by the grafana service dashboard_sources: + # Default public Optimism dashboards - github.com/ethereum-optimism/grafana-dashboards-public/resources # Resource management for grafana container # CPU is milicores @@ -109,8 +130,7 @@ optimism_package: min_cpu: 10 max_cpu: 1000 min_mem: 128 max_mem: 2048 # Grafana docker image to use - # Defaults to the latest image - image: "grafana/grafana:latest" + image: "grafana/grafana:11.5.0" # Interop configuration interop: # Whether or not to enable interop mode @@ -623,7 +643,7 @@ optimism_package: Compile [tx-fuzz](https://github.com/MariusVanDerWijden/tx-fuzz) locally per instructions in the repo.
Run tx-fuzz against the l2 EL client's RPC URL and using the pre-funded wallet: -```bash +```shell ./livefuzzer spam --sk "0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80" --rpc http://127.0.0.1: --slot-time 2 ``` @@ -666,21 +686,21 @@ Please find examples of additional configurations in the [test folder](.github/t - List information about running containers and open ports -```bash +```shell kurtosis enclave ls kurtosis enclave inspect ``` - Inspect chain state. -```bash +```shell kurtosis files inspect op-deployer-configs ``` - Dump all files generated by kurtosis to disk (for inspecting chain state/deploy configs/contract addresses etc.). A file that contains an exhaustive set of information about the current deployment is `files/op-deployer-configs/state.json`. Deployed contract address, roles etc can all be found here. -```bash +```shell # dumps all files to a enclave-name prefixed directory under the current directory kurtosis enclave dump kurtosis files download op-deployer-configs @@ -688,17 +708,63 @@ kurtosis files download op-deployer-configs - Get logs for running services -```bash +```shell kurtosis service logs -f . # -f tails the log ``` - Stop/Start running service (restart sequencer/batcher/op-geth etc.) -```bash +```shell kurtosis service stop kurtosis service start ``` +## Observability + +This package optionally provisions an in-enclave observability stack consisting of Grafana, Prometheus, Promtail, and Loki, which collects logs and metrics from the enclave. + +This feature is enabled by default, but can be disabled like so: + +```yaml +optimism_package: + observability: + enabled: false +``` + +You can provide custom dashboard sources to have Grafana pre-populated with your preferred dashboards. Each source should be a URL to a GitHub repository directory containing at minimum a `dashboards` directory: + +```yaml +optimism_package: + observability: + grafana_params: + dashboard_sources: + - github.com/// +``` + +See [grafana-dashboards-public](https://github.com/ethereum-optimism/grafana-dashboards-public) for more info. + +To access the Grafana UI, you can use the following command after starting the enclave: + +```shell +just open-grafana +``` + +### Logs + +Note that due to `kurtosis` limitations, log collection is not enabled by default and is only supported for the Kubernetes backend. To enable log collection, you must set the following parameter: + +```yaml +optimism_package: + observability: + enable_k8s_features: true +``` + +Note that `kurtosis` runs pods using the namespace's default `ServiceAccount`, which typically cannot modify cluster-level resources such as the `ClusterRoles` required by the `promtail` Helm chart. You must therefore also install the `ns-authz` Helm chart into the Kubernetes cluster serving as the `kurtosis` backend using the following command: + +```shell +just install-ns-authz +``` + ## Development ### Development environment @@ -723,7 +789,7 @@ mise install If you have made changes and would like to submit a PR, test locally and make sure to run `lint` on your changes -```bash +```shell kurtosis lint --format . ``` @@ -733,7 +799,7 @@ kurtosis lint --format . We are using [`kurtosis-test`](https://github.com/ethereum-optimism/kurtosis-test) to run a set of unit tests against the starlark code: -```bash +```shell # To run all unit tests kurtosis-test .
``` diff --git a/justfile b/justfile new file mode 100644 index 00000000..6d71b870 --- /dev/null +++ b/justfile @@ -0,0 +1,16 @@ +install-ns-authz: + helm install ns-authz util/ns-authz --namespace kube-system + +uninstall-ns-authz: + helm uninstall ns-authz --namespace kube-system + +get-service-url enclaveName serviceName portId: + kurtosis service inspect {{enclaveName}} {{serviceName}} | tail -n +2 | yq e - -o=json |\ + jq -r --arg portId {{portId}} '.Ports[$portId]' | sed 's/.*-> //' + +open-service enclaveName serviceName: + open "$(just get-service-url {{enclaveName}} {{serviceName}} http)" + +open-grafana enclaveName: + just open-service {{enclaveName}} grafana + diff --git a/main.star b/main.star index c89d1a4c..1ce050e7 100644 --- a/main.star +++ b/main.star @@ -9,8 +9,6 @@ op_challenger_launcher = import_module( ) observability = import_module("./src/observability/observability.star") -prometheus = import_module("./src/observability/prometheus/prometheus_launcher.star") -grafana = import_module("./src/observability/grafana/grafana_launcher.star") wait_for_sync = import_module("./src/wait/wait_for_sync.star") input_parser = import_module("./src/package_io/input_parser.star") @@ -163,21 +161,9 @@ def run(plan, args): observability_helper, ) - if observability_helper.enabled and len(observability_helper.metrics_jobs) > 0: - plan.print("Launching prometheus...") - prometheus_private_url = prometheus.launch_prometheus( - plan, - observability_helper, - global_node_selectors, - ) - - plan.print("Launching grafana...") - grafana.launch_grafana( - plan, - prometheus_private_url, - global_node_selectors, - observability_params.grafana_params, - ) + observability.launch( + plan, observability_helper, global_node_selectors, observability_params + ) def get_l1_config(all_l1_participants, l1_network_params, l1_network_id): diff --git a/mise.toml b/mise.toml index 38a8bdca..3239ded0 100644 --- a/mise.toml +++ b/mise.toml @@ -3,3 +3,4 @@ # Core dependencies "ubi:ethereum-optimism/kurtosis-test" = "0.0.1" "ubi:kurtosis-tech/kurtosis-cli-release-artifacts[exe=kurtosis]" = "1.4.4" +"yq" = "v4.44.3" diff --git a/src/batcher/op-batcher/op_batcher_launcher.star b/src/batcher/op-batcher/op_batcher_launcher.star index feda7245..dece33e4 100644 --- a/src/batcher/op-batcher/op_batcher_launcher.star +++ b/src/batcher/op-batcher/op_batcher_launcher.star @@ -6,6 +6,9 @@ ethereum_package_constants = import_module( "github.com/ethpandaops/ethereum-package/src/package_io/constants.star" ) +constants = import_module("../../package_io/constants.star") +util = import_module("../../util.star") + observability = import_module("../../observability/observability.star") prometheus = import_module("../../observability/prometheus/prometheus_launcher.star") @@ -14,16 +17,13 @@ prometheus = import_module("../../observability/prometheus/prometheus_launcher.s # The Docker container runs as the "op-batcher" user so we can't write to root BATCHER_DATA_DIRPATH_ON_SERVICE_CONTAINER = "/data/op-batcher/op-batcher-data" -# Port IDs -BATCHER_HTTP_PORT_ID = "http" - # Port nums BATCHER_HTTP_PORT_NUM = 8548 def get_used_ports(): used_ports = { - BATCHER_HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( BATCHER_HTTP_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, @@ -47,8 +47,6 @@ def launch( observability_helper, da_server_context, ): - batcher_service_name = "{0}".format(service_name) - 
config = get_batcher_config( plan, image, @@ -62,16 +60,12 @@ def launch( da_server_context, ) - batcher_service = plan.add_service(service_name, config) - - batcher_http_port = batcher_service.ports[BATCHER_HTTP_PORT_ID] - batcher_http_url = "http://{0}:{1}".format( - batcher_service.ip_address, batcher_http_port.number - ) + service = plan.add_service(service_name, config) + service_url = util.make_service_http_url(service) - observability.register_op_service_metrics_job(observability_helper, batcher_service) + observability.register_op_service_metrics_job(observability_helper, service) - return "op_batcher" + return service_url def get_batcher_config( diff --git a/src/cl/hildr/hildr_launcher.star b/src/cl/hildr/hildr_launcher.star index 48f5aaa2..972891a4 100644 --- a/src/cl/hildr/hildr_launcher.star +++ b/src/cl/hildr/hildr_launcher.star @@ -26,7 +26,6 @@ BEACON_DATA_DIRPATH_ON_SERVICE_CONTAINER = "/data/hildr/hildr-beacon-data" # Port IDs BEACON_TCP_DISCOVERY_PORT_ID = "tcp-discovery" BEACON_UDP_DISCOVERY_PORT_ID = "udp-discovery" -BEACON_HTTP_PORT_ID = "http" # Port nums BEACON_DISCOVERY_PORT_NUM = 9003 @@ -43,7 +42,7 @@ def get_used_ports(discovery_port): BEACON_UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL, wait=None ), - BEACON_HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( BEACON_HTTP_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, @@ -84,7 +83,7 @@ def launch( # endpoint="/", # content_type="application/json", # body='{"jsonrpc":"2.0","method":"opp2p_self","params":[],"id":1}', - # port_id=BEACON_HTTP_PORT_ID, + # port_id=constants.HTTP_PORT_ID, # extract={ # "enr": ".result.ENR", # "multiaddr": ".result.addresses[0]", @@ -113,15 +112,11 @@ def launch( da_server_context, ) - beacon_service = plan.add_service(service_name, config) - - beacon_http_port = beacon_service.ports[BEACON_HTTP_PORT_ID] - beacon_http_url = "http://{0}:{1}".format( - beacon_service.ip_address, beacon_http_port.number - ) + service = plan.add_service(service_name, config) + service_url = util.make_service_http_url(service) metrics_info = observability.new_metrics_info( - observability_helper, beacon_service, METRICS_PATH + observability_helper, service, METRICS_PATH ) # response = plan.request( @@ -135,9 +130,9 @@ def launch( return ethereum_package_cl_context.new_cl_context( client_name="hildr", enr="", # beacon_node_enr, - ip_addr=beacon_service.ip_address, - http_port=beacon_http_port.number, - beacon_http_url=beacon_http_url, + ip_addr=service.ip_address, + http_port=util.get_service_http_port_num(service), + beacon_http_url=service_url, cl_nodes_metrics_info=[metrics_info], beacon_service_name=service_name, ) @@ -159,14 +154,8 @@ def get_beacon_config( observability_helper, da_server_context, ): - EXECUTION_ENGINE_ENDPOINT = "http://{0}:{1}".format( - el_context.ip_addr, - el_context.engine_rpc_port_num, - ) - EXECUTION_RPC_ENDPOINT = "http://{0}:{1}".format( - el_context.ip_addr, - el_context.rpc_port_num, - ) + EXECUTION_ENGINE_ENDPOINT = util.make_execution_engine_url(el_context) + EXECUTION_RPC_ENDPOINT = util.make_execution_rpc_url(el_context) ports = dict(get_used_ports(BEACON_DISCOVERY_PORT_NUM)) diff --git a/src/cl/op-node/op_node_launcher.star b/src/cl/op-node/op_node_launcher.star index 8ab6abf6..433f0472 100644 --- a/src/cl/op-node/op_node_launcher.star +++ 
b/src/cl/op-node/op_node_launcher.star @@ -15,7 +15,6 @@ ethereum_package_input_parser = import_module( ) constants = import_module("../../package_io/constants.star") - util = import_module("../../util.star") observability = import_module("../../observability/observability.star") interop_constants = import_module("../../interop/constants.star") @@ -24,10 +23,6 @@ interop_constants = import_module("../../interop/constants.star") # The Docker container runs as the "op-node" user so we can't write to root BEACON_DATA_DIRPATH_ON_SERVICE_CONTAINER = "/data/op-node/op-node-beacon-data" -# Port IDs -BEACON_TCP_DISCOVERY_PORT_ID = "tcp-discovery" -BEACON_UDP_DISCOVERY_PORT_ID = "udp-discovery" -BEACON_HTTP_PORT_ID = "http" # Port nums BEACON_DISCOVERY_PORT_NUM = 9003 @@ -36,13 +31,13 @@ BEACON_HTTP_PORT_NUM = 8547 def get_used_ports(discovery_port): used_ports = { - BEACON_TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL, wait=None ), - BEACON_UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL, wait=None ), - BEACON_HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( BEACON_HTTP_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, @@ -83,7 +78,7 @@ def launch( endpoint="/", content_type="application/json", body='{"jsonrpc":"2.0","method":"opp2p_self","params":[],"id":1}', - port_id=BEACON_HTTP_PORT_ID, + port_id=constants.HTTP_PORT_ID, extract={ "enr": ".result.ENR", "multiaddr": ".result.addresses[0]", @@ -114,14 +109,10 @@ def launch( da_server_context, ) - beacon_service = plan.add_service(service_name, config) - - beacon_http_port = beacon_service.ports[BEACON_HTTP_PORT_ID] - beacon_http_url = "http://{0}:{1}".format( - beacon_service.ip_address, beacon_http_port.number - ) + service = plan.add_service(service_name, config) + service_url = util.make_service_http_url(service) - metrics_info = observability.new_metrics_info(observability_helper, beacon_service) + metrics_info = observability.new_metrics_info(observability_helper, service) response = plan.request( recipe=beacon_node_identity_recipe, service_name=service_name @@ -134,9 +125,9 @@ def launch( return ethereum_package_cl_context.new_cl_context( client_name="op-node", enr=beacon_node_enr, - ip_addr=beacon_service.ip_address, - http_port=beacon_http_port.number, - beacon_http_url=beacon_http_url, + ip_addr=service.ip_address, + http_port=util.get_service_http_port_num(service), + beacon_http_url=service_url, cl_nodes_metrics_info=[metrics_info], beacon_service_name=service_name, multiaddr=beacon_multiaddr, @@ -164,10 +155,7 @@ def get_beacon_config( ): ports = dict(get_used_ports(BEACON_DISCOVERY_PORT_NUM)) - EXECUTION_ENGINE_ENDPOINT = "http://{0}:{1}".format( - el_context.ip_addr, - el_context.engine_rpc_port_num, - ) + EXECUTION_ENGINE_ENDPOINT = util.make_execution_engine_url(el_context) cmd = [ "op-node", diff --git a/src/el/op-besu/op_besu_launcher.star b/src/el/op-besu/op_besu_launcher.star index 16c914bb..5d1c74cb 100644 --- a/src/el/op-besu/op_besu_launcher.star +++ b/src/el/op-besu/op_besu_launcher.star @@ -23,8 +23,8 @@ ethereum_package_constants = import_module( ) constants = 
import_module("../../package_io/constants.star") -observability = import_module("../../observability/observability.star") util = import_module("../../util.star") +observability = import_module("../../observability/observability.star") RPC_PORT_NUM = 8545 WS_PORT_NUM = 8546 @@ -35,14 +35,6 @@ ENGINE_RPC_PORT_NUM = 8551 EXECUTION_MIN_CPU = 300 EXECUTION_MIN_MEMORY = 512 -# Port IDs -RPC_PORT_ID = "rpc" -WS_PORT_ID = "ws" -TCP_DISCOVERY_PORT_ID = "tcp-discovery" -UDP_DISCOVERY_PORT_ID = "udp-discovery" -ENGINE_RPC_PORT_ID = "engine-rpc" -ENGINE_WS_PORT_ID = "engineWs" - # TODO(old) Scale this dynamically based on CPUs available and Geth nodes mining NUM_MINING_THREADS = 1 @@ -52,21 +44,21 @@ EXECUTION_DATA_DIRPATH_ON_CLIENT_CONTAINER = "/data/besu/execution-data" def get_used_ports(discovery_port=DISCOVERY_PORT_NUM): used_ports = { - RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), - WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( WS_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), - TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL ), - UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL ), - ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( ENGINE_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ), @@ -126,13 +118,12 @@ def launch( ) service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service, constants.RPC_PORT_ID) enode = ethereum_package_el_admin_node_info.get_enode_for_node( - plan, service_name, RPC_PORT_ID + plan, service_name, constants.RPC_PORT_ID ) - http_url = "http://{0}:{1}".format(service.ip_address, RPC_PORT_NUM) - metrics_info = observability.new_metrics_info(observability_helper, service) return ethereum_package_el_context.new_el_context( diff --git a/src/el/op-erigon/op_erigon_launcher.star b/src/el/op-erigon/op_erigon_launcher.star index 26f8abc7..8b8295e2 100644 --- a/src/el/op-erigon/op_erigon_launcher.star +++ b/src/el/op-erigon/op_erigon_launcher.star @@ -21,8 +21,8 @@ ethereum_package_constants = import_module( ) constants = import_module("../../package_io/constants.star") -observability = import_module("../../observability/observability.star") util = import_module("../../util.star") +observability = import_module("../../observability/observability.star") RPC_PORT_NUM = 8545 WS_PORT_NUM = 8546 @@ -33,35 +33,27 @@ ENGINE_RPC_PORT_NUM = 8551 EXECUTION_MIN_CPU = 300 EXECUTION_MIN_MEMORY = 512 -# Port IDs -RPC_PORT_ID = "rpc" -WS_PORT_ID = "ws" -TCP_DISCOVERY_PORT_ID = "tcp-discovery" -UDP_DISCOVERY_PORT_ID = "udp-discovery" -ENGINE_RPC_PORT_ID = "engine-rpc" -ENGINE_WS_PORT_ID = "engineWs" - # The dirpath of the execution data directory on the client container EXECUTION_DATA_DIRPATH_ON_CLIENT_CONTAINER = "/data/op-erigon/execution-data" def get_used_ports(discovery_port=DISCOVERY_PORT_NUM): used_ports = { - RPC_PORT_ID: 
ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), - WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( WS_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), - TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL ), - UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL ), - ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( ENGINE_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ), @@ -118,13 +110,12 @@ def launch( ) service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service, constants.RPC_PORT_ID) enode, enr = ethereum_package_el_admin_node_info.get_enode_enr_for_node( - plan, service_name, RPC_PORT_ID + plan, service_name, constants.RPC_PORT_ID ) - http_url = "http://{0}:{1}".format(service.ip_address, RPC_PORT_NUM) - metrics_info = observability.new_metrics_info(observability_helper, service) return ethereum_package_el_context.new_el_context( diff --git a/src/el/op-geth/op_geth_launcher.star b/src/el/op-geth/op_geth_launcher.star index f45209cb..900f5e9c 100644 --- a/src/el/op-geth/op_geth_launcher.star +++ b/src/el/op-geth/op_geth_launcher.star @@ -22,9 +22,10 @@ ethereum_package_constants = import_module( ) constants = import_module("../../package_io/constants.star") +util = import_module("../../util.star") + observability = import_module("../../observability/observability.star") interop_constants = import_module("../../interop/constants.star") -util = import_module("../../util.star") RPC_PORT_NUM = 8545 WS_PORT_NUM = 8546 @@ -35,15 +36,6 @@ ENGINE_RPC_PORT_NUM = 8551 EXECUTION_MIN_CPU = 300 EXECUTION_MIN_MEMORY = 512 -# Port IDs -RPC_PORT_ID = "rpc" -WS_PORT_ID = "ws" -TCP_DISCOVERY_PORT_ID = "tcp-discovery" -UDP_DISCOVERY_PORT_ID = "udp-discovery" -ENGINE_RPC_PORT_ID = "engine-rpc" -ENGINE_WS_PORT_ID = "engineWs" - - # TODO(old) Scale this dynamically based on CPUs available and Geth nodes mining NUM_MINING_THREADS = 1 @@ -53,21 +45,21 @@ EXECUTION_DATA_DIRPATH_ON_CLIENT_CONTAINER = "/data/geth/execution-data" def get_used_ports(discovery_port=DISCOVERY_PORT_NUM): used_ports = { - RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), - WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( WS_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), - TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL ), - UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, 
ethereum_package_shared_utils.UDP_PROTOCOL ), - ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( ENGINE_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ), @@ -128,13 +120,12 @@ def launch( ) service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service, constants.RPC_PORT_ID) enode, enr = ethereum_package_el_admin_node_info.get_enode_enr_for_node( - plan, service_name, RPC_PORT_ID + plan, service_name, constants.RPC_PORT_ID ) - http_url = "http://{0}:{1}".format(service.ip_address, RPC_PORT_NUM) - metrics_info = observability.new_metrics_info(observability_helper, service) return ethereum_package_el_context.new_el_context( diff --git a/src/el/op-nethermind/op_nethermind_launcher.star b/src/el/op-nethermind/op_nethermind_launcher.star index d2c4bd07..d3f4587d 100644 --- a/src/el/op-nethermind/op_nethermind_launcher.star +++ b/src/el/op-nethermind/op_nethermind_launcher.star @@ -22,8 +22,8 @@ ethereum_package_constants = import_module( ) constants = import_module("../../package_io/constants.star") -observability = import_module("../../observability/observability.star") util = import_module("../../util.star") +observability = import_module("../../observability/observability.star") RPC_PORT_NUM = 8545 WS_PORT_NUM = 8546 @@ -34,14 +34,6 @@ ENGINE_RPC_PORT_NUM = 8551 EXECUTION_MIN_CPU = 300 EXECUTION_MIN_MEMORY = 512 -# Port IDs -RPC_PORT_ID = "rpc" -WS_PORT_ID = "ws" -TCP_DISCOVERY_PORT_ID = "tcp-discovery" -UDP_DISCOVERY_PORT_ID = "udp-discovery" -ENGINE_RPC_PORT_ID = "engine-rpc" -ENGINE_WS_PORT_ID = "engineWs" - # TODO(old) Scale this dynamically based on CPUs available and Nethermind nodes mining NUM_MINING_THREADS = 1 @@ -51,21 +43,21 @@ EXECUTION_DATA_DIRPATH_ON_CLIENT_CONTAINER = "/data/nethermind/execution-data" def get_used_ports(discovery_port=DISCOVERY_PORT_NUM): used_ports = { - RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), - WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( WS_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), - TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL ), - UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL ), - ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( ENGINE_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ), @@ -120,14 +112,13 @@ def launch( ) service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service, constants.RPC_PORT_ID) + ws_url = util.make_service_ws_url(service) enode = ethereum_package_el_admin_node_info.get_enode_for_node( - plan, service_name, RPC_PORT_ID + plan, service_name, constants.RPC_PORT_ID ) - http_url = "http://{0}:{1}".format(service.ip_address, RPC_PORT_NUM) - ws_url = "ws://{0}:{1}".format(service.ip_address, WS_PORT_NUM) - metrics_info = 
observability.new_metrics_info(observability_helper, service) return ethereum_package_el_context.new_el_context( diff --git a/src/el/op-reth/op_reth_launcher.star b/src/el/op-reth/op_reth_launcher.star index 9a716093..3a535596 100644 --- a/src/el/op-reth/op_reth_launcher.star +++ b/src/el/op-reth/op_reth_launcher.star @@ -21,8 +21,8 @@ ethereum_package_input_parser = import_module( ) constants = import_module("../../package_io/constants.star") -observability = import_module("../../observability/observability.star") util = import_module("../../util.star") +observability = import_module("../../observability/observability.star") RPC_PORT_NUM = 8545 WS_PORT_NUM = 8546 @@ -33,13 +33,6 @@ ENGINE_RPC_PORT_NUM = 9551 EXECUTION_MIN_CPU = 100 EXECUTION_MIN_MEMORY = 256 -# Port IDs -RPC_PORT_ID = "rpc" -WS_PORT_ID = "ws" -TCP_DISCOVERY_PORT_ID = "tcp-discovery" -UDP_DISCOVERY_PORT_ID = "udp-discovery" -ENGINE_RPC_PORT_ID = "engine-rpc" - # Paths METRICS_PATH = "/metrics" @@ -49,21 +42,21 @@ EXECUTION_DATA_DIRPATH_ON_CLIENT_CONTAINER = "/data/op-reth/execution-data" def get_used_ports(discovery_port=DISCOVERY_PORT_NUM): used_ports = { - RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), - WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.WS_PORT_ID: ethereum_package_shared_utils.new_port_spec( WS_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), - TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.TCP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.TCP_PROTOCOL ), - UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.UDP_DISCOVERY_PORT_ID: ethereum_package_shared_utils.new_port_spec( discovery_port, ethereum_package_shared_utils.UDP_PROTOCOL ), - ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.ENGINE_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( ENGINE_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL ), } @@ -117,13 +110,12 @@ def launch( ) service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service, constants.RPC_PORT_ID) enode = ethereum_package_el_admin_node_info.get_enode_for_node( - plan, service_name, RPC_PORT_ID + plan, service_name, constants.RPC_PORT_ID ) - http_url = "http://{0}:{1}".format(service.ip_address, RPC_PORT_NUM) - metrics_info = observability.new_metrics_info( observability_helper, service, METRICS_PATH ) diff --git a/src/interop/constants.star b/src/interop/constants.star index a48ac7c3..023b686b 100644 --- a/src/interop/constants.star +++ b/src/interop/constants.star @@ -1,10 +1,12 @@ +constants = import_module("../package_io/constants.star") +util = import_module("../util.star") + INTEROP_WS_PORT_ID = "interop-ws" INTEROP_WS_PORT_NUM = 9645 SUPERVISOR_SERVICE_NAME = "op-supervisor" -SUPERVISOR_RPC_PORT_ID = "rpc" SUPERVISOR_RPC_PORT_NUM = 8545 -SUPERVISOR_ENDPOINT = "http://{0}:{1}".format( +SUPERVISOR_ENDPOINT = util.make_http_url( SUPERVISOR_SERVICE_NAME, SUPERVISOR_RPC_PORT_NUM ) diff --git a/src/interop/op-supervisor/op_supervisor_launcher.star b/src/interop/op-supervisor/op_supervisor_launcher.star index 9c72160b..53fe155f 100644 --- a/src/interop/op-supervisor/op_supervisor_launcher.star +++ b/src/interop/op-supervisor/op_supervisor_launcher.star @@ -8,6 
+8,7 @@ ethereum_package_constants = import_module( "github.com/ethpandaops/ethereum-package/src/package_io/constants.star" ) +constants = import_module("../../package_io/constants.star") observability = import_module("../../observability/observability.star") prometheus = import_module("../../observability/prometheus/prometheus_launcher.star") @@ -16,7 +17,7 @@ interop_constants = import_module("../constants.star") def get_used_ports(): used_ports = { - interop_constants.SUPERVISOR_RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.RPC_PORT_ID: ethereum_package_shared_utils.new_port_spec( interop_constants.SUPERVISOR_RPC_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, diff --git a/src/observability/grafana/grafana_launcher.star b/src/observability/grafana/grafana_launcher.star index 01023576..07a3ca2d 100644 --- a/src/observability/grafana/grafana_launcher.star +++ b/src/observability/grafana/grafana_launcher.star @@ -1,3 +1,4 @@ +constants = import_module("../../package_io/constants.star") util = import_module("../../util.star") ethereum_package_shared_utils = import_module( @@ -5,20 +6,17 @@ ethereum_package_shared_utils = import_module( ) SERVICE_NAME = "grafana" - -HTTP_PORT_ID = "http" HTTP_PORT_NUMBER_UINT16 = 3000 TEMPLATES_FILEPATH = "./templates" -DATASOURCE_UID = "grafanacloud-prom" DATASOURCE_CONFIG_TEMPLATE_FILEPATH = TEMPLATES_FILEPATH + "/datasource.yml.tmpl" DATASOURCE_CONFIG_REL_FILEPATH = "datasources/datasource.yml" CONFIG_DIRPATH_ON_SERVICE = "/config" USED_PORTS = { - HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( HTTP_PORT_NUMBER_UINT16, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, @@ -28,41 +26,42 @@ USED_PORTS = { def launch_grafana( plan, - prometheus_private_url, + prometheus_url, + loki_url, global_node_selectors, grafana_params, ): datasource_config_template = read_file(DATASOURCE_CONFIG_TEMPLATE_FILEPATH) - grafana_config_artifact_name = upload_grafana_config( + config_artifact_name = create_config_artifact( plan, datasource_config_template, - prometheus_private_url, + prometheus_url, + loki_url, ) config = get_config( - grafana_config_artifact_name, + config_artifact_name, global_node_selectors, grafana_params, ) service = plan.add_service(SERVICE_NAME, config) - service_url = "http://{0}:{1}".format( - service.ip_address, service.ports[HTTP_PORT_ID].number - ) + service_url = util.make_service_http_url(service) provision_dashboards(plan, service_url, grafana_params.dashboard_sources) return service_url -def upload_grafana_config( +def create_config_artifact( plan, datasource_config_template, - prometheus_private_url, + prometheus_url, + loki_url, ): - datasource_data = new_datasource_config_template_data(prometheus_private_url) + datasource_data = new_datasource_config_template_data(prometheus_url, loki_url) datasource_template_and_data = ethereum_package_shared_utils.new_template_and_data( datasource_config_template, datasource_data ) @@ -71,19 +70,24 @@ def upload_grafana_config( DATASOURCE_CONFIG_REL_FILEPATH: datasource_template_and_data, } - grafana_config_artifact_name = plan.render_templates( + config_artifact_name = plan.render_templates( template_and_data_by_rel_dest_filepath, name="grafana-config" ) - return grafana_config_artifact_name + return config_artifact_name -def new_datasource_config_template_data(prometheus_url): - return 
{"PrometheusUID": DATASOURCE_UID, "PrometheusURL": prometheus_url} +def new_datasource_config_template_data(prometheus_url, loki_url): + return { + "PrometheusUID": "grafanacloud-prom", + "PrometheusURL": prometheus_url, + "LokiUID": "grafanacloud-logs", + "LokiURL": loki_url, + } def get_config( - grafana_config_artifact_name, + config_artifact_name, node_selectors, grafana_params, ): @@ -95,10 +99,9 @@ def get_config( "GF_AUTH_ANONYMOUS_ENABLED": "true", "GF_AUTH_ANONYMOUS_ORG_ROLE": "Admin", "GF_AUTH_ANONYMOUS_ORG_NAME": "Main Org.", - # "GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH": "/dashboards/default.json", }, files={ - CONFIG_DIRPATH_ON_SERVICE: grafana_config_artifact_name, + CONFIG_DIRPATH_ON_SERVICE: config_artifact_name, }, min_cpu=grafana_params.min_cpu, max_cpu=grafana_params.max_cpu, @@ -138,6 +141,7 @@ def provision_dashboards(plan, service_url, dashboard_sources): plan.run_sh( description="upload dashboards", + # latest version, no tagged release yet image="grafana/grizzly:main-0b88d01", env_vars={ "GRAFANA_URL": service_url, diff --git a/src/observability/grafana/templates/datasource.yml.tmpl b/src/observability/grafana/templates/datasource.yml.tmpl index a37438b0..600334a7 100644 --- a/src/observability/grafana/templates/datasource.yml.tmpl +++ b/src/observability/grafana/templates/datasource.yml.tmpl @@ -10,3 +10,13 @@ datasources: basicAuth: false isDefault: true editable: true + {{ if .LokiURL }} + - name: Loki + type: loki + access: proxy + orgId: 1 + uid: {{ .LokiUID }} + url: {{ .LokiURL }} + basicAuth: false + editable: true + {{ end }} diff --git a/src/observability/loki/loki_launcher.star b/src/observability/loki/loki_launcher.star new file mode 100644 index 00000000..ed6c47d8 --- /dev/null +++ b/src/observability/loki/loki_launcher.star @@ -0,0 +1,104 @@ +constants = import_module("../../package_io/constants.star") +util = import_module("../../util.star") + +ethereum_package_shared_utils = import_module( + "github.com/ethpandaops/ethereum-package/src/shared_utils/shared_utils.star" +) + +SERVICE_NAME = "loki" +HTTP_PORT_NUMBER = 3100 +GRPC_PORT_NUMBER = 9096 + +TEMPLATES_FILEPATH = "./templates" + +CONFIG_TEMPLATE_FILEPATH = TEMPLATES_FILEPATH + "/loki-config.yaml.tmpl" +CONFIG_REL_FILEPATH = "loki-config.yaml" + +CONFIG_DIRPATH_ON_SERVICE = "/config" + +USED_PORTS = { + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + HTTP_PORT_NUMBER, + ethereum_package_shared_utils.TCP_PROTOCOL, + ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, + ), + "grpc": ethereum_package_shared_utils.new_port_spec( + GRPC_PORT_NUMBER, + ethereum_package_shared_utils.TCP_PROTOCOL, + "grpc", + ), +} + + +def launch_loki( + plan, + global_node_selectors, + loki_params, +): + config_template = read_file(CONFIG_TEMPLATE_FILEPATH) + + config_artifact_name = create_config_artifact( + plan, + config_template, + ) + + service_config = get_service_config( + config_artifact_name, + global_node_selectors, + loki_params, + ) + + service = plan.add_service(SERVICE_NAME, service_config) + + service_url = util.make_service_http_url(service) + + return service_url + + +def create_config_artifact( + plan, + config_template, +): + config_data = { + "Ports": { + "http": HTTP_PORT_NUMBER, + "grpc": GRPC_PORT_NUMBER, + }, + } + config_template_and_data = ethereum_package_shared_utils.new_template_and_data( + config_template, config_data + ) + + template_and_data_by_rel_dest_filepath = { + CONFIG_REL_FILEPATH: config_template_and_data, + } + + config_artifact_name = 
plan.render_templates( + template_and_data_by_rel_dest_filepath, name="loki-config" + ) + + return config_artifact_name + + +def get_service_config( + config_artifact_name, + node_selectors, + loki_params, +): + return ServiceConfig( + image=loki_params.image, + ports=USED_PORTS, + cmd=[ + "-config.file={0}/{1}".format( + CONFIG_DIRPATH_ON_SERVICE, CONFIG_REL_FILEPATH + ), + ], + files={ + CONFIG_DIRPATH_ON_SERVICE: config_artifact_name, + }, + min_cpu=loki_params.min_cpu, + max_cpu=loki_params.max_cpu, + min_memory=loki_params.min_mem, + max_memory=loki_params.max_mem, + node_selectors=node_selectors, + ) diff --git a/src/observability/loki/templates/loki-config.yaml.tmpl b/src/observability/loki/templates/loki-config.yaml.tmpl new file mode 100644 index 00000000..d350008a --- /dev/null +++ b/src/observability/loki/templates/loki-config.yaml.tmpl @@ -0,0 +1,42 @@ +auth_enabled: false + +server: + http_listen_port: {{ .Ports.http }} + grpc_listen_port: {{ .Ports.grpc }} + log_level: debug + grpc_server_max_concurrent_streams: 1000 + +common: + instance_addr: 127.0.0.1 + path_prefix: /loki + storage: + filesystem: + chunks_directory: /loki/chunks + rules_directory: /loki/rules + replication_factor: 1 + ring: + kvstore: + store: inmemory + +query_range: + results_cache: + cache: + embedded_cache: + enabled: true + max_size_mb: 100 + +limits_config: + metric_aggregation_enabled: true + +schema_config: + configs: + - from: 2020-10-24 + store: tsdb + object_store: filesystem + schema: v13 + index: + prefix: index_ + period: 24h + +analytics: + reporting_enabled: false diff --git a/src/observability/observability.star b/src/observability/observability.star index b7e6a45c..8f5ea9ac 100644 --- a/src/observability/observability.star +++ b/src/observability/observability.star @@ -6,6 +6,14 @@ ethereum_package_node_metrics = import_module( "github.com/ethpandaops/ethereum-package/src/node_metrics_info.star" ) +util = import_module("../util.star") + +prometheus = import_module("./prometheus/prometheus_launcher.star") +loki = import_module("./loki/loki_launcher.star") +promtail = import_module("./promtail/promtail_launcher.star") +grafana = import_module("./grafana/grafana_launcher.star") + + DEFAULT_SCRAPE_INTERVAL = "15s" METRICS_PORT_ID = "metrics" @@ -18,15 +26,11 @@ METRICS_INFO_PATH_KEY = "path" METRICS_INFO_ADDITIONAL_CONFIG_KEY = "config" -def make_metrics_url(service, metrics_port_num=METRICS_PORT_NUM): - return "{0}:{1}".format(service.ip_address, metrics_port_num) - - def new_metrics_info(helper, service, metrics_path=METRICS_PATH): if not helper.enabled: return None - metrics_url = make_metrics_url(service) + metrics_url = util.make_service_url_authority(service, METRICS_PORT_ID) metrics_info = ethereum_package_node_metrics.new_node_metrics_info( service.name, metrics_path, metrics_url ) @@ -83,7 +87,7 @@ def register_op_service_metrics_job(helper, service): register_service_metrics_job( helper, service_name=service.name, - endpoint=make_metrics_url(service), + endpoint=util.make_service_url_authority(service, METRICS_PORT_ID), ) @@ -145,3 +149,41 @@ def register_node_metrics_job( additional_labels=labels, scrape_interval=scrape_interval, ) + + +def launch(plan, observability_helper, global_node_selectors, observability_params): + if not observability_helper.enabled or len(observability_helper.metrics_jobs) == 0: + return + + plan.print("Launching prometheus...") + prometheus_private_url = prometheus.launch_prometheus( + plan, + observability_helper, + global_node_selectors, + ) + + 
loki_url = None + if observability_params.enable_k8s_features: + plan.print("Launching loki...") + loki_url = loki.launch_loki( + plan, + global_node_selectors, + observability_params.loki_params, + ) + + plan.print("Launching promtail...") + promtail.launch_promtail( + plan, + global_node_selectors, + loki_url, + observability_params.promtail_params, + ) + + plan.print("Launching grafana...") + grafana.launch_grafana( + plan, + prometheus_private_url, + loki_url, + global_node_selectors, + observability_params.grafana_params, + ) diff --git a/src/observability/promtail/promtail_launcher.star b/src/observability/promtail/promtail_launcher.star new file mode 100644 index 00000000..3a777749 --- /dev/null +++ b/src/observability/promtail/promtail_launcher.star @@ -0,0 +1,104 @@ +constants = import_module("../../package_io/constants.star") +util = import_module("../../util.star") + +ethereum_package_shared_utils = import_module( + "github.com/ethpandaops/ethereum-package/src/shared_utils/shared_utils.star" +) + +HTTP_PORT_NUMBER = 9080 +GRPC_PORT_NUMBER = 0 + +TEMPLATES_FILEPATH = "./templates" + +VALUES_FILE_NAME = "values.yaml" +VALUES_TEMPLATE_FILEPATH = "{0}/{1}.tmpl".format(TEMPLATES_FILEPATH, VALUES_FILE_NAME) + +K8S_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace" + + +def launch_promtail( + plan, + global_node_selectors, + loki_url, + promtail_params, +): + values_template = read_file(VALUES_TEMPLATE_FILEPATH) + + values_artifact_name = create_values_artifact( + plan, + values_template, + global_node_selectors, + loki_url, + ) + + install_helm_chart( + plan, + values_artifact_name, + "promtail", + "grafana", + "https://grafana.github.io/helm-charts", + override_name=True, + ) + + +def create_values_artifact( + plan, + values_template, + node_selectors, + loki_url, +): + config_data = { + "Ports": { + "http": HTTP_PORT_NUMBER, + "grpc": GRPC_PORT_NUMBER, + }, + "LokiURL": loki_url, + "NodeSelectors": node_selectors, + } + + values_template_and_data = ethereum_package_shared_utils.new_template_and_data( + values_template, config_data + ) + + values_artifact_name = plan.render_templates( + { + "/{0}".format(VALUES_FILE_NAME): values_template_and_data, + }, + name="promtail-config", + ) + + return values_artifact_name + + +def install_helm_chart( + plan, + values_artifact_name, + chart_name, + repo_name=None, + repo_url=None, + override_name=False, +): + cmds = [] + + if repo_name != None and repo_url != None: + cmds += [ + "helm repo add {0} {1}".format(repo_name, repo_url), + "helm repo update", + ] + + install_cmd = "helm upgrade --values /helm/{2} --install {1} {0}/{1}".format( + repo_name, chart_name, VALUES_FILE_NAME + ) + + if override_name: + install_cmd += " --set nameOverride=$(cat {0})".format(K8S_NAMESPACE_FILE) + + cmds.append(install_cmd) + + plan.run_sh( + image="alpine/helm", + files={ + "/helm": values_artifact_name, + }, + run=util.join_cmds(cmds), + ) diff --git a/src/observability/promtail/templates/promtail-config.yaml.tmpl b/src/observability/promtail/templates/promtail-config.yaml.tmpl new file mode 100644 index 00000000..12f498fc --- /dev/null +++ b/src/observability/promtail/templates/promtail-config.yaml.tmpl @@ -0,0 +1,18 @@ +server: + http_listen_port: {{ .Ports.http }} + grpc_listen_port: {{ .Ports.grpc }} + +positions: + filename: /tmp/positions.yaml + +clients: + - url: {{ .LokiURL }}/loki/api/v1/push + +scrape_configs: +- job_name: system + static_configs: + - targets: + - localhost + labels: + job: varlogs + __path__: /var/log/*log 
diff --git a/src/observability/promtail/templates/values.yaml.tmpl b/src/observability/promtail/templates/values.yaml.tmpl new file mode 100644 index 00000000..ff7aea20 --- /dev/null +++ b/src/observability/promtail/templates/values.yaml.tmpl @@ -0,0 +1,9 @@ +config: + serverPort: {{ .Ports.http }} + clients: + - url: {{ .LokiURL }}/loki/api/v1/push + +nodeSelector: + {{ range .NodeSelectors }} + - {{ .Key }}: {{ .Value }} + {{ end }} diff --git a/src/package_io/constants.star b/src/package_io/constants.star index 08fe50a8..a49b006b 100644 --- a/src/package_io/constants.star +++ b/src/package_io/constants.star @@ -1,3 +1,11 @@ +HTTP_PORT_ID = "http" +RPC_PORT_ID = "rpc" +WS_PORT_ID = "ws" +TCP_DISCOVERY_PORT_ID = "tcp-discovery" +UDP_DISCOVERY_PORT_ID = "udp-discovery" +ENGINE_RPC_PORT_ID = "engine-rpc" +ENGINE_WS_PORT_ID = "engineWs" + EL_TYPE = struct( op_geth="op-geth", op_erigon="op-erigon", diff --git a/src/package_io/input_parser.star b/src/package_io/input_parser.star index 02fb8913..47af141b 100644 --- a/src/package_io/input_parser.star +++ b/src/package_io/input_parser.star @@ -75,6 +75,7 @@ def input_parser(plan, input_args): return struct( observability=struct( enabled=results["observability"]["enabled"], + enable_k8s_features=results["observability"]["enable_k8s_features"], prometheus_params=struct( image=results["observability"]["prometheus_params"]["image"], storage_tsdb_retention_time=results["observability"][ @@ -88,6 +89,20 @@ def input_parser(plan, input_args): min_mem=results["observability"]["prometheus_params"]["min_mem"], max_mem=results["observability"]["prometheus_params"]["max_mem"], ), + loki_params=struct( + image=results["observability"]["loki_params"]["image"], + min_cpu=results["observability"]["loki_params"]["min_cpu"], + max_cpu=results["observability"]["loki_params"]["max_cpu"], + min_mem=results["observability"]["loki_params"]["min_mem"], + max_mem=results["observability"]["loki_params"]["max_mem"], + ), + promtail_params=struct( + image=results["observability"]["promtail_params"]["image"], + min_cpu=results["observability"]["promtail_params"]["min_cpu"], + max_cpu=results["observability"]["promtail_params"]["max_cpu"], + min_mem=results["observability"]["promtail_params"]["min_mem"], + max_mem=results["observability"]["promtail_params"]["max_mem"], + ), grafana_params=struct( image=results["observability"]["grafana_params"]["image"], dashboard_sources=results["observability"]["grafana_params"][ @@ -265,6 +280,16 @@ def parse_network_params(plan, input_args): input_args.get("observability", {}).get("prometheus_params", {}) ) + results["observability"]["loki_params"] = default_loki_params() + results["observability"]["loki_params"].update( + input_args.get("observability", {}).get("loki_params", {}) + ) + + results["observability"]["promtail_params"] = default_promtail_params() + results["observability"]["promtail_params"].update( + input_args.get("observability", {}).get("promtail_params", {}) + ) + results["observability"]["grafana_params"] = default_grafana_params() results["observability"]["grafana_params"].update( input_args.get("observability", {}).get("grafana_params", {}) @@ -418,12 +443,13 @@ def parse_network_params(plan, input_args): def default_observability_params(): return { "enabled": True, + "enable_k8s_features": False, } def default_prometheus_params(): return { - "image": "prom/prometheus:latest", + "image": "prom/prometheus:v3.1.0", "storage_tsdb_retention_time": "1d", "storage_tsdb_retention_size": "512MB", "min_cpu": 10, @@ -435,9 
+461,9 @@ def default_prometheus_params(): def default_grafana_params(): return { - "image": "grafana/grafana:latest", + "image": "grafana/grafana:11.5.0", "dashboard_sources": [ - "github.com/ethereum-optimism/grafana-dashboards-public/resources@ee47a8ec0545a06ef487ed5ec03ca692e258e5ec" + "github.com/ethereum-optimism/grafana-dashboards-public/resources" ], "min_cpu": 10, "max_cpu": 1000, @@ -446,6 +472,26 @@ def default_grafana_params(): } +def default_loki_params(): + return { + "image": "grafana/loki:3.3.2", + "min_cpu": 10, + "max_cpu": 1000, + "min_mem": 128, + "max_mem": 2048, + } + + +def default_promtail_params(): + return { + "image": "grafana/promtail:3.3.2", + "min_cpu": 10, + "max_cpu": 1000, + "min_mem": 128, + "max_mem": 2048, + } + + def default_interop_params(): return { "enabled": False, diff --git a/src/package_io/sanity_check.star b/src/package_io/sanity_check.star index 1044f451..161dbff8 100644 --- a/src/package_io/sanity_check.star +++ b/src/package_io/sanity_check.star @@ -1,6 +1,21 @@ +ROOT_PARAMS = [ + "observability", + "interop", + "altda_deploy_config", + "chains", + "op_contract_deployer_params", + "global_log_level", + "global_node_selectors", + "global_tolerations", + "persistent", +] + OBSERVABILITY_PARAMS = [ "enabled", + "enable_k8s_features", "prometheus_params", + "loki_params", + "promtail_params", "grafana_params", ] @@ -14,6 +29,22 @@ PROMETHEUS_PARAMS = [ "max_mem", ] +LOKI_PARAMS = [ + "image", + "min_cpu", + "max_cpu", + "min_mem", + "max_mem", +] + +PROMTAIL_PARAMS = [ + "image", + "min_cpu", + "max_cpu", + "min_mem", + "max_mem", +] + GRAFANA_PARAMS = [ "image", "dashboard_sources", @@ -124,18 +155,6 @@ ADDITIONAL_SERVICES_PARAMS = [ "da_server", ] -ROOT_PARAMS = [ - "observability", - "interop", - "altda_deploy_config", - "chains", - "op_contract_deployer_params", - "global_log_level", - "global_node_selectors", - "global_tolerations", - "persistent", -] - EXTERNAL_L1_NETWORK_PARAMS = [ "network_id", "rpc_kind", @@ -193,6 +212,22 @@ def sanity_check(plan, optimism_config): PROMETHEUS_PARAMS, ) + if "loki_params" in optimism_config["observability"]: + validate_params( + plan, + optimism_config["observability"], + "loki_params", + LOKI_PARAMS, + ) + + if "promtail_params" in optimism_config["observability"]: + validate_params( + plan, + optimism_config["observability"], + "promtail_params", + PROMTAIL_PARAMS, + ) + if "grafana_params" in optimism_config["observability"]: validate_params( plan, diff --git a/src/proposer/op-proposer/op_proposer_launcher.star b/src/proposer/op-proposer/op_proposer_launcher.star index 947604b8..da753073 100644 --- a/src/proposer/op-proposer/op_proposer_launcher.star +++ b/src/proposer/op-proposer/op_proposer_launcher.star @@ -6,25 +6,25 @@ ethereum_package_constants = import_module( "github.com/ethpandaops/ethereum-package/src/package_io/constants.star" ) +constants = import_module("../../package_io/constants.star") +util = import_module("../../util.star") + observability = import_module("../../observability/observability.star") prometheus = import_module("../../observability/prometheus/prometheus_launcher.star") # # ---------------------------------- Batcher client ------------------------------------- # The Docker container runs as the "op-proposer" user so we can't write to root -PROPOSER_DATA_DIRPATH_ON_SERVICE_CONTAINER = "/data/op-proposer/op-proposer-data" - -# Port IDs -PROPOSER_HTTP_PORT_ID = "http" +DATA_DIRPATH_ON_SERVICE_CONTAINER = "/data/op-proposer/op-proposer-data" # Port nums -PROPOSER_HTTP_PORT_NUM 
= 8560 +HTTP_PORT_NUM = 8560 def get_used_ports(): used_ports = { - PROPOSER_HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( - PROPOSER_HTTP_PORT_NUM, + constants.HTTP_PORT_ID: ethereum_package_shared_utils.new_port_spec( + HTTP_PORT_NUM, ethereum_package_shared_utils.TCP_PROTOCOL, ethereum_package_shared_utils.HTTP_APPLICATION_PROTOCOL, ), @@ -60,18 +60,12 @@ def launch( observability_helper, ) - proposer_service = plan.add_service(service_name, config) + service = plan.add_service(service_name, config) + http_url = util.make_service_http_url(service) - proposer_http_port = proposer_service.ports[PROPOSER_HTTP_PORT_ID] - proposer_http_url = "http://{0}:{1}".format( - proposer_service.ip_address, proposer_http_port.number - ) - - observability.register_op_service_metrics_job( - observability_helper, proposer_service - ) + observability.register_op_service_metrics_job(observability_helper, service) - return "op_proposer" + return http_url def get_proposer_config( @@ -90,7 +84,7 @@ def get_proposer_config( cmd = [ "op-proposer", "--poll-interval=12s", - "--rpc.port=" + str(PROPOSER_HTTP_PORT_NUM), + "--rpc.port=" + str(HTTP_PORT_NUM), "--rollup-rpc=" + cl_context.beacon_http_url, "--game-factory-address=" + str(game_factory_address), "--private-key=" + gs_proposer_private_key, diff --git a/src/util.star b/src/util.star index ff350a97..78393c22 100644 --- a/src/util.star +++ b/src/util.star @@ -1,3 +1,5 @@ +constants = import_module("./package_io/constants.star") + DEPLOYMENT_UTILS_IMAGE = "mslipper/deployment-utils:latest" @@ -80,3 +82,63 @@ def label_from_image(image): def join_cmds(commands): return " && ".join(commands) + + +def get_service_port_num(service, port_id): + return service.ports[port_id].number + + +def get_service_http_port_num(service): + return get_service_port_num(service, constants.HTTP_PORT_ID) + + +def make_url_authority(host, port_num): + return "{0}:{1}".format(host, port_num) + + +def prefix_url_scheme(scheme, authority): + return "{0}://{1}".format(scheme, authority) + + +def prefix_url_scheme_http(authority): + return prefix_url_scheme("http", authority) + + +def prefix_url_scheme_ws(authority): + return prefix_url_scheme("ws", authority) + + +def make_http_url(host, port_num): + return prefix_url_scheme_http(make_url_authority(host, port_num)) + + +def make_ws_url(host, port_num): + return prefix_url_scheme_ws(make_url_authority(host, port_num)) + + +def make_service_url_authority(service, port_id): + return make_url_authority( + service.ip_address, get_service_port_num(service, port_id) + ) + + +def make_service_http_url(service, port_id=constants.HTTP_PORT_ID): + return prefix_url_scheme_http(make_service_url_authority(service, port_id)) + + +def make_service_ws_url(service, port_id=constants.WS_PORT_ID): + return prefix_url_scheme_ws(make_service_url_authority(service, port_id)) + + +def make_execution_engine_url(el_context): + return make_http_url( + el_context.ip_addr, + el_context.engine_rpc_port_num, + ) + + +def make_execution_rpc_url(el_context): + return make_http_url( + el_context.ip_addr, + el_context.rpc_port_num, + ) diff --git a/util/ns-authz/Chart.yaml b/util/ns-authz/Chart.yaml new file mode 100644 index 00000000..1e5abb61 --- /dev/null +++ b/util/ns-authz/Chart.yaml @@ -0,0 +1,6 @@ +apiVersion: v2 +name: ns-authz +description: A Helm chart to automatically grant cluster-admin to the default ServiceAccount in every new namespace +type: application +version: 0.1.0 +appVersion: "1.0.0" diff --git a/util/ns-authz/README.md 
b/util/ns-authz/README.md new file mode 100644 index 00000000..24cf4318 --- /dev/null +++ b/util/ns-authz/README.md @@ -0,0 +1,22 @@ +# ns-authz + +This chart deploys a lightweight namespace watcher that automatically grants the `cluster-admin` role to the default `ServiceAccount` in every namespace. It is meant to enable [kurtosis](http://kurtosis.com/) packages to run `helm` commands in Kubernetes enclaves, and is necessary as `kurtosis` runs pods using the namespace's default `ServiceAccount`, which is not typically able to modify cluster-level resources, such as `ClusterRoles`, as some Helm charts require. + +> Note: this chart is not meant to be used in production environments and is strictly a stopgap measure until `kurtosis` supports running pods with configurable `ServiceAccounts`. + +## Installation + +```bash +helm install ns-authz ./ns-authz --namespace kube-system +``` + +## Usage + +1. Create a new namespace: + ```bash + kubectl create namespace test-ns + ``` +2. Check the watcher pod logs to ensure the new namespace's `default` `ServiceAccount` was granted `cluster-admin` access: + ```bash + kubectl logs -l app=ns-authz -n kube-system + ``` diff --git a/util/ns-authz/templates/clusterrolebinding.yaml b/util/ns-authz/templates/clusterrolebinding.yaml new file mode 100644 index 00000000..cf220042 --- /dev/null +++ b/util/ns-authz/templates/clusterrolebinding.yaml @@ -0,0 +1,12 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: ns-authz-crb +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: cluster-admin +subjects: +- kind: ServiceAccount + name: {{ .Values.serviceAccount.name }} + namespace: {{ .Release.Namespace }} diff --git a/util/ns-authz/templates/configmap.yaml b/util/ns-authz/templates/configmap.yaml new file mode 100644 index 00000000..de69d342 --- /dev/null +++ b/util/ns-authz/templates/configmap.yaml @@ -0,0 +1,7 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: ns-authz-script +data: + watcher.sh: |- +{{ (.Files.Get "watcher.sh") | indent 4 }} diff --git a/util/ns-authz/templates/deployment.yaml b/util/ns-authz/templates/deployment.yaml new file mode 100644 index 00000000..0ab1e88d --- /dev/null +++ b/util/ns-authz/templates/deployment.yaml @@ -0,0 +1,32 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ns-authz + labels: + app: ns-authz +spec: + replicas: 1 + selector: + matchLabels: + app: ns-authz + template: + metadata: + labels: + app: ns-authz + spec: + serviceAccountName: {{ .Values.serviceAccount.name }} + containers: + - name: ns-authz + image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + command: ["/bin/sh", "/scripts/watcher.sh"] + volumeMounts: + - name: script-volume + mountPath: /scripts + volumes: + - name: script-volume + configMap: + name: ns-authz-script + nodeSelector: {{ toYaml .Values.nodeSelector | nindent 8 }} + tolerations: {{ toYaml .Values.tolerations | nindent 8 }} + affinity: {{ toYaml .Values.affinity | nindent 8 }} diff --git a/util/ns-authz/templates/serviceaccount.yaml b/util/ns-authz/templates/serviceaccount.yaml new file mode 100644 index 00000000..755e5acb --- /dev/null +++ b/util/ns-authz/templates/serviceaccount.yaml @@ -0,0 +1,7 @@ +{{- if .Values.serviceAccount.create }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ .Values.serviceAccount.name }} + namespace: {{ .Release.Namespace }} +{{- end }} diff --git a/util/ns-authz/values.yaml b/util/ns-authz/values.yaml new file 
mode 100644 index 00000000..54b41f70 --- /dev/null +++ b/util/ns-authz/values.yaml @@ -0,0 +1,16 @@ +image: + repository: bitnami/kubectl + tag: latest + pullPolicy: IfNotPresent + +serviceAccount: + name: ns-authz-sa + create: true + +resources: {} + +nodeSelector: {} + +tolerations: [] + +affinity: {} diff --git a/util/ns-authz/watcher.sh b/util/ns-authz/watcher.sh new file mode 100755 index 00000000..9437a454 --- /dev/null +++ b/util/ns-authz/watcher.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +ROLEBINDING_NAME="cluster-admin-binding" + +echo "starting namespace watcher..." + +ensureClusterRoleBinding() { + local ns="$1" + local sa="$2" + + # Strip off the leading 'namespace/' + local nsName="${1#namespace/}" + local clusterRoleBindingName="${ROLEBINDING_NAME}-${nsName}-${sa}" + + echo "ensuring CRB '$clusterRoleBindingName'..." + + if kubectl get clusterrolebinding "$clusterRoleBindingName" -o name >/dev/null 2>&1; then + echo "CRB already exists, skipping" + return + fi + + echo "creating CRB '$clusterRoleBindingName'..." + + kubectl create clusterrolebinding "$clusterRoleBindingName" \ + --clusterrole=cluster-admin \ + --serviceaccount="${nsName}:${sa}" +} + +kubectl get namespaces --watch -o name | while read ns; do + ensureClusterRoleBinding $ns "default" + ensureClusterRoleBinding $ns "kurtosis-api" +done
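For reference, a minimal `network_params.yaml` that exercises the new observability options documented in the README changes above might look like the following sketch. It is illustrative only: the image tags and the dashboard locator are simply the defaults introduced by this change, and `enable_k8s_features` only takes effect on the Kubernetes backend (with the `ns-authz` chart installed as described above).

```yaml
optimism_package:
  observability:
    # Provision the in-enclave observability stack
    # (prometheus, grafana, and, on Kubernetes, loki/promtail)
    enabled: true
    # Only meaningful on the Kubernetes backend; required for log collection
    enable_k8s_features: true
    loki_params:
      image: "grafana/loki:3.3.2"
    promtail_params:
      image: "grafana/promtail:3.3.2"
    grafana_params:
      dashboard_sources:
        # Default public Optimism dashboards
        - github.com/ethereum-optimism/grafana-dashboards-public/resources
```

Such a file can then be passed to `kurtosis run . --args-file ./network_params.yaml` from the package root, as shown earlier in the README.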