How do idempotent runs work?
Background
To learn about what idempotent runs are in Kurtosis and the motivation behind this feature, go here.
When running the kurtosis run command, you may notice the following message being printed:
SKIPPED - This instruction has already been run in this enclave
This happens because Kurtosis optimizes each run of a Starlark package based on what has already been run in a given enclave, which reduces execution time and resource usage.
This means when you try to run the exact same package twice in a row, Kurtosis will skip all the instructions for the second run because they were already executed in the first run.
This feature is still experimental and can be deactivated by adding the --experimental NO_INSTRUCTIONS_CACHING parameter to the kurtosis run command.
How it works
Definitions
The enclave plan is defined as the sequence of Starlark instructions that were previously executed inside a given enclave. Meanwhile, the submitted plan is defined as the set of instructions generated by interpreting the package before it gets executed.
When running a Starlark package in a world without idempotent runs, all the instructions are naively executed inside the enclave and the new post-execution enclave plan is set to the concatenation of the previous enclave plan and the submitted plan.
To avoid re-running instructions that have already been run inside the enclave, Kurtosis will try to maximize the overlap between the submitted plan and the tail-end portion of the enclave plan. In the overlapping portion, if any, Kurtosis will re-run only the instructions that were updated. Then, if there are new instructions at the end of the submitted plan that were not in the enclave plan, they are executed as new instructions and added to the enclave plan.
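The overlap search described above can be sketched in Python. This is a simplified illustration under stated assumptions, not Kurtosis's actual implementation: instructions are modeled as plain strings, only strict equality is considered (instruction updates are ignored), and the function names find_overlap_start and resolve are hypothetical.

```python
def find_overlap_start(enclave_plan, submitted_plan):
    """Return the index in enclave_plan where the longest overlap with the
    head of submitted_plan begins, or len(enclave_plan) if there is none."""
    # Smaller start index means a longer tail, so the first match is the longest.
    for start in range(len(enclave_plan)):
        tail = enclave_plan[start:]
        if len(tail) <= len(submitted_plan) and submitted_plan[:len(tail)] == tail:
            return start
    return len(enclave_plan)


def resolve(enclave_plan, submitted_plan):
    """Return (skipped, executed, new_enclave_plan) for one run."""
    start = find_overlap_start(enclave_plan, submitted_plan)
    overlap_len = len(enclave_plan) - start
    skipped = submitted_plan[:overlap_len]    # already in the enclave plan
    executed = submitted_plan[overlap_len:]   # new instructions, run and appended
    return skipped, executed, enclave_plan + executed
```

With no overlap, overlap_len is 0 and the result matches the naive strategy: everything is executed and the new enclave plan is the concatenation of both plans.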
Instruction equality
To spot overlap between the enclave plan and the submitted plan, Kurtosis needs to compare instructions one by one. There are different levels of equality:
- The submitted plan instruction is equal to the enclave plan instruction - the instructions are of the same type (e.g. two exec, two wait, or two upload_file instructions) and their sets of arguments are strictly identical.
- The submitted plan instruction is an update of the enclave plan instruction - the instructions are of the same type but only a subset of pre-defined arguments are identical. This only exists for certain instructions:
  - add_service instruction adding a service with the same name but a different ServiceConfig object will be considered as an update to the enclave plan instruction. The service will be restarted inside the enclave with the new service configuration.
  - upload_file instruction uploading a files artifact with the same name but different file contents will be considered as an update to the enclave plan instruction. The files artifact will be updated with the new contents inside the enclave.
  - render_template instruction creating a files artifact with the same name but different contents will be considered as an update to the enclave plan instruction. Similarly to upload_file, the content of the files artifact will be updated inside the enclave.
  - store_service_file instruction creating a files artifact with the same name but either a different source path or a different service name will be considered as an update to the enclave plan instruction. Similarly to upload_file, the content of the files artifact will be updated inside the enclave.
Two instructions that don't fit into either of the two categories above are considered different (i.e. independent from each other).
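The three outcomes (equal, update, different) can be illustrated with a small Python sketch. The Instruction class, its fields, and the compare function are hypothetical names introduced for illustration. The sketch assumes each instruction carries a single identifying name (the service name or the files artifact name) plus its remaining arguments; how Kurtosis represents instructions internally is not described here.

```python
from dataclasses import dataclass

# Instruction types for which the text above describes an "update" relationship.
UPDATABLE_TYPES = {"add_service", "upload_file", "render_template", "store_service_file"}


@dataclass(frozen=True)
class Instruction:
    kind: str         # instruction type, e.g. "add_service" or "exec"
    name: str         # assumed identifier: service name or files artifact name
    args: tuple = ()  # assumed: remaining arguments as (key, value) pairs


def compare(submitted, enclave):
    """Classify a submitted instruction against an enclave plan instruction."""
    if submitted.kind != enclave.kind:
        return "different"
    if submitted.name == enclave.name and submitted.args == enclave.args:
        return "equal"
    # Same type and same name but other arguments differ: only some types allow this.
    if submitted.kind in UPDATABLE_TYPES and submitted.name == enclave.name:
        return "update"
    return "different"
```

For example, two add_service instructions for service_1 with different image arguments compare as "update", while two exec instructions on the same service with different commands compare as "different".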
It's worth calling out here that a few Kurtosis instructions are fundamentally incompatible with the concept of idempotency. Using any of those instructions in the package makes the plans not resolvable, and Kurtosis will default to the "naive" execution strategy of running the submitted plan on top of the current plan, without even trying to overlap them. Those instructions currently are:
- remove_service
- start_service
- stop_service
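The fallback rule can be expressed as a short check. This is a sketch only: the function name choose_strategy and the strategy labels are hypothetical, and inspecting only the submitted plan is an assumption based on the phrase "in the package" above.

```python
# Instruction types listed above as incompatible with idempotent runs.
NON_IDEMPOTENT_TYPES = {"remove_service", "start_service", "stop_service"}


def choose_strategy(submitted_kinds):
    """Return "naive" if the submitted plan uses a non-idempotent
    instruction type, otherwise "overlap"."""
    if any(kind in NON_IDEMPOTENT_TYPES for kind in submitted_kinds):
        return "naive"    # run the whole submitted plan on top of the current plan
    return "overlap"      # try to overlap the submitted plan with the enclave plan
```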
Instruction dependencies
Certain instructions depend on other ones, and with the concept of instruction update explained above comes the concept of dependency between instructions. It's easier to understand the concept with an example.
Let's consider a submitted plan with 2 instructions: an add_service adding service_1 and an exec on service_1.
If the first add_service instruction is considered an update when running the package, service_1 will be updated and therefore restarted. In that case, even if the exec is equal to the matching instruction in the enclave plan, it will be re-run because it runs on a component (service_1) that has been updated. It is said that the exec instruction depends on the add_service instruction.
Dependency relationships can be the following:
- add_service instruction depends on the files artifacts mounted onto the service. If one of the files artifacts is updated, the add_service will be re-run.
- exec instruction depends on the service on which it runs. If the service is updated, the exec will be re-run.
- request instruction depends on the service on which it runs, similarly to exec.
- store_service_file instruction depends on the service on which it runs, similarly to exec.
- wait instruction depends on the service on which it runs, similarly to exec.
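The dependency rules above can be sketched as a single pass over the overlapping instructions. The dictionary layout and the function name plan_reruns are hypothetical. The sketch also assumes that any re-run instruction counts as updating the component it produces, so that re-runs propagate to later dependent instructions; the text above only states this for direct dependencies.

```python
def plan_reruns(instructions):
    """Decide, in plan order, which instructions are skipped or re-run.

    Each instruction is a dict with:
      "kind":     instruction type, e.g. "add_service"
      "produces": name of the service or files artifact it creates, or None
      "uses":     names of the services or files artifacts it depends on
      "status":   "equal" or "update", from comparing with the enclave plan
    """
    updated = set()  # components (services, files artifacts) updated so far
    decisions = []
    for ins in instructions:
        depends_on_updated = any(name in updated for name in ins["uses"])
        rerun = ins["status"] == "update" or depends_on_updated
        if rerun and ins["produces"] is not None:
            # Assumption: re-running an instruction updates what it produces.
            updated.add(ins["produces"])
        decisions.append((ins["kind"], "re-run" if rerun else "skipped"))
    return decisions


# The two-instruction example from the text: add_service is an update,
# exec is equal but runs on the updated service.
plan = [
    {"kind": "add_service", "produces": "service_1", "uses": [], "status": "update"},
    {"kind": "exec", "produces": None, "uses": ["service_1"], "status": "equal"},
]
print(plan_reruns(plan))
# [('add_service', 're-run'), ('exec', 're-run')]
```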
Examples
Case of a submitted plan being disjoint from the enclave plan
No instruction gets skipped: all instructions from the submitted plan are executed and appended to the enclave plan.
Case of a submitted plan partially overlapping the enclave plan
The first two add_service instructions from the submitted plan are equal to the last two instructions of the enclave plan. They are therefore skipped, and only the exec and store_service_files from the submitted plan are executed.
Case of a submitted plan partially overlapping the enclave plan with instruction updates
The upload_file instruction is equal, so it will be skipped, as in the case explained above.
The add_service instruction from the submitted plan adding service service_1 is an update of the add_service instruction from the enclave plan adding service_1 (notice the *** on the schema - the ServiceConfig object has been updated, for example to update the container image version). It will therefore be re-run and service_1 will be updated inside the enclave.
The second add_service instruction from the submitted plan adding service service_2 is equal to the one from the enclave plan. It will be skipped.
The exec instruction from the submitted plan is equal to the one from the enclave plan. However, since it operates on service_1 and service_1 was updated in the submitted plan, this instruction will also be re-run.
The store_service_file from the submitted plan is equal to the one from the enclave plan, and the service on which it runs (service_2) was left intact in the submitted plan. It will therefore be skipped.