跳转至

Configuration

This is a list of configurable values of Kepler System. The configuration can be also applied by defining the following CR spec if Kepler Operator is installed.

Point of Configuration Spec Description Default
Kepler CR (single item: default)
Kepler DaemonSet Deployment daemon.exporter.image Kepler main image quay.io/sustainable_computing_io/kepler:latest
Kepler DaemonSet Deployment daemon.exporter.port Metric exporter port 9102
Kepler DaemonSet Deployment daemon.estimator-sidecar.enabled Kepler Estimator Sidecar patch false
Kepler DaemonSet Deployment daemon.estimator-sidecar.image Kepler estimator sidecar image -
Kepler DaemonSet Deployment daemon.estimator-sidecar.mnt-path Mount path between main container and the sidecar for unix domain socket /tmp
Kepler DaemonSet Environment (METRIC_PATH) daemon.exporter.path Path to export metrics /metrics
Kepler DaemonSet Environment (MODEL_SERVER_ENABLE) model-server.enabled Kepler Model Server Pod connection false
model-server.enabled
Model Server Pod Pod Environment (MODEL_SERVER_PORT) model-server.port Model serving port of model server 8100
Model Server Pod Pod Environment (PROM_SERVER) model-server.prom Endpoint to Prometheus metric server http://prometheus-k8s.monitoring.svc.cluster.local:9090
Model Server Pod Pod Environment (MODEL_PATH) model-server.model-path Path to keep models models
Kepler DaemonSet Environment (MODEL_SERVER_ENDPOINT) daemon.model-server Endpoint to server container of model server http://kepler-model-server.monitoring.cluster.local:[model-server.port]/model
Model Server Pod Deployment model-server.trainer Model online trainer patch false
model-server.trainer
Model Server Pod Environment (PROM_QUERY_INTERVAL) model-server.prom_interval Interval to execute training pipelines in seconds 20
Model Server Pod Environment (PROM_QUERY_STEP) model-server.prom-step Step of query data point in training pipelines in seconds 3
Model Server Pod Environment (PROM_HEADERS) model-server.prom-header For specify required header (such as authentication) -
Model Server Pod Environment (PROM_SSL_DISABLE) model-server.prom-ssl Disable ssl in Prometheus connection true
Model Server Pod Environment (INITIAL_MODELS_LOC) model-server.init-loc Root URL of offline models to use as initial models https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-server/main/tests/test_models
Model Server Pod Environment (INITIAL_MODEL_NAMES.[MODEL_TYPE]) model-server.[MODEL_TYPE] Name of default pipeline for each model type -
CollectMetric CR (single item: default)
Kepler DaemonSet Environment (COUNTER_METRICS) counter List of performance metrics to enable from counter source * (enable all available metrics from counter source)
Kepler DaemonSet Environment (BPF_METRICS) bpf List of performance metrics to enable from bpf (aka. eBPF) source * (enable all available metrics from bpf source)
Kepler DaemonSet Environment (GPU_METRICS) gpu List of performance metrics to enable from gpu source * (enable all available metrics from gpu source)
ExportMetric CR (single item: default)
Kepler DaemonSet Environment (PERF_METRICS) perf List of performance metrics to export * (enable all collected performance metrics)
Kepler DaemonSet Environment (EXPORT_NODE_TOTAL_POWER) node_total_power Toggle whether to export node total power true
Kepler DaemonSet Environment (EXPORT_NODE_COMPONENT_POWERS) node_component_powers Toggle whether to export node powers by components true
Kepler DaemonSet Environment (EXPORT_POD_TOTAL_POWER) pod_total_power Toggle whether to export pod total power true
Kepler DaemonSet Environment (EXPORT_POD_COMPONENT_POWERS) pod_component_powers Toggle whether to export pod powers by components true
EstimatorConfig CR (multiple items: node-total-power, node-component-powers, pod-total-power, pod-component-powers)
Kepler DaemonSet Environment (MODEL_CONFIG.[MODEL_ITEM]_ESTIMATOR) use-sidecar Toggle whether to use estimator sidecar for power estimation false
Kepler DaemonSet Environment (MODEL_CONFIG.[MODEL_ITEM]_MODEL) fixed-model Specify model name (auto-selected)
Kepler DaemonSet Environment (MODEL_CONFIG.[MODEL_ITEM]_FILTERS) filters Specify model filter conditions in string (auto-selected)
Kepler DaemonSet Environment (MODEL_CONFIG.[MODEL_ITEM]_INIT_URL) init-url URL to initial model location -
RatioConfig CR (single items: default)
Kepler DaemonSet Environment (CORE_USAGE_METRIC) core_metric Specify metric for compute core (mostly cpu) component of pod power by ratio modeling cpu_instr
Kepler DaemonSet Environment (DRAM_USAGE_METRIC) dram_metric Specify metric for computing dram component of pod power by ratio modeling cache_miss
Kepler DaemonSet Environment (UNCORE_USAGE_METRIC) uncore_metric Specify metric for computing uncore component of pod power by ratio modeling (evenly divided)
Kepler DaemonSet Environment (GENERAL_USAGE_METRIC) general_metric Specify metric for computing uncategorized component (pkg-core-uncore) of pod power by ratio modeling cpu_instr
Kepler DaemonSet Environment (GPU_USAGE_METRIC) core_metric Specify metric for computing gpu component of pod power by ratio modeling -

Remarks:

  • [MODEL_ITEM] can be either of the following values corresponding to item names: NODE_TOTAL, NODE_COMPONENT, POD_TOTAL, POD_COMPONENTS.
  • [MODEL_TYPE] is a concatenation of [MODEL_ITEM] and [OUTPUT_FORMAT] which can be either POWER for archived model or MODEL_WEIGHT for weights in json.

For example:

  • NODE_TOTAL_POWER: archived model to estimate node total power used by estimator sidecar
  • POD_COMPONENTS_MODEL_WEIGHT: model weight to estimate pod component powers used by linear regressor embedded in Kepler main component.

Copyright Contributors to the Kepler's project.

The Linux Foundation® (TLF) has registered trademarks and uses trademarks. For a list of TLF trademarks, see Trademark Usage.