instance: The instance to run on
feature_group: The optional feature group to run the extractor for.
output_file: Optional file to write the output to.
runsolver_args: The arguments for runsolver. If not present,
will run the extractor without runsolver.
cutoff_time: The maximum runtime.
log_dir: Directory path for logs.
extractor_path: Path to the executable
instance: Path to the instance to run on
feature_group: The feature group to compute. Must be supported by the
extractor.
output_file: Target output file. If None, output is piped to the RunRunner job.
cutoff_time: CPU cutoff time in seconds
log_dir: Directory to write logs. Defaults to CWD.
Returns:
The features, or None if an output file is used or the features cannot be found.
Run the Extractor CLI and write result to the FeatureDataFrame.
Args:
instance_set: The instance set to run the Extractor on.
feature_dataframe: The feature dataframe to write to.
cutoff_time: CPU cutoff time in seconds
feature_group: The feature group to compute. If left empty,
will run on all feature groups.
run_on: The runner to use.
sbatch_options: Additional options to pass to sbatch.
srun_options: Additional options to pass to srun.
parallel_jobs: Number of parallel jobs to run.
slurm_prepend: Slurm script to prepend to the sbatch
dependencies: List of dependencies to add to the job.
log_dir: The directory to write logs to.
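A hypothetical invocation is sketched below; the method name run_cli and the Runner enum are illustrative assumptions, only the argument names come from the docstring above.

>>> from pathlib import Path
>>> extractor.run_cli(                        # hypothetical method name
...     instance_set=instance_set,            # instances to compute features for
...     feature_dataframe=feature_dataframe,  # FeatureDataFrame receiving the results
...     cutoff_time=60,                       # CPU cutoff time in seconds
...     feature_group=None,                   # None -> run all feature groups
...     run_on=Runner.SLURM,                  # assumed runner enum
...     sbatch_options=["--partition=short"],
...     log_dir=Path("Output/Logs"))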
Even when the index of other is the same as the index of the DataFrame,
the Series will not be reoriented. If index-wise alignment is desired,
DataFrame.add() should be used with axis='index'.
>>> s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])
>>> df[['height', 'weight']] + s2
       elk  height  moose  weight
elk    NaN     NaN    NaN     NaN
moose  NaN     NaN    NaN     NaN
Export the pandas DataFrame as an Arrow C stream PyCapsule.
This relies on pyarrow to convert the pandas DataFrame to the Arrow
format (and follows the default behaviour of pyarrow.Table.from_pandas
in its handling of the index, i.e. store the index as a column except
for RangeIndex).
This conversion is not necessarily zero-copy.
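As a rough illustration of what the export relies on, pyarrow's default pandas conversion can be called directly; the capsule interface itself is normally consumed by libraries rather than user code. A minimal sketch, assuming pyarrow is installed:

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({"a": [1, 2, 3]})   # default RangeIndex
>>> table = pa.Table.from_pandas(df)      # RangeIndex is not stored as a column
>>> table.column_names
['a']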
t : str, the type of setting error
force : bool, default False
If True, then force showing an error.
Validate that we are doing a setitem on a chained copy.
It is technically possible to figure out that we are setting on
a copy even WITH a multi-dtyped pandas object. In other words, some
blocks may be views while other are not. Currently _is_view will ALWAYS
return False for multi-blocks to avoid having to handle this case.
# This technically need not raise SettingWithCopy if both are views
# (which is not generally guaranteed but is usually True). However,
# this is in general not a good practice and we recommend using .loc.
df.iloc[0:5]['group'] = 'a'
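A minimal sketch of the recommended pattern: a single .loc call replaces the chained assignment above (the column name and values are illustrative).

>>> import pandas as pd
>>> df = pd.DataFrame({'group': list('xxyyz'), 'value': range(5)})
>>> # the chained form df.iloc[0:5]['group'] = 'a' may write to a temporary copy;
>>> # one .loc call both selects and assigns on the original object
>>> df.loc[df.index[0:5], 'group'] = 'a'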
Ensures new columns (which go into the BlockManager as new blocks) are
always copied (or a reference is being tracked to them under CoW)
and converted into an array.
Internal version of the take method that sets the _is_copy
attribute to keep track of the parent dataframe (used in indexing
for the SettingWithCopyWarning).
For Series this does the same as the public take (it never sets _is_copy).
See the docstring of take for full explanation of the parameters.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Add an extractor and its feature names to the dataframe.
Arguments:
extractor: Name of the extractor
extractor_features: Tuples of [FeatureGroup, FeatureName]
values: Initial values of the Extractor per instance in the dataframe.
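A hypothetical call is shown below; the extractor name, feature group and feature names are illustrative, and the behaviour of values=None (leaving the new cells unset) is an assumption.

>>> feature_dataframe.add_extractor(
...     extractor='ExampleExtractor',              # hypothetical extractor name
...     extractor_features=[('base', 'n_vars'),    # [FeatureGroup, FeatureName] tuples
...                         ('base', 'n_clauses')],
...     values=None)                               # assumed: leave cells at the missing-value marker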
DataFrame.apply : Perform any type of operations.
DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby : Perform operations over groups.
pandas.DataFrame.resample : Perform operations over resampled bins.
pandas.DataFrame.rolling : Perform operations over rolling window.
pandas.DataFrame.expanding : Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow : Perform operation over exponential weighted window.
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
numpy aggregation functions (mean, median, prod, sum, std,
var), where the default is to compute the aggregation of the flattened
array, e.g., numpy.mean(arr_2d) as opposed to
numpy.mean(arr_2d,axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See gotchas.udf-mutation
for more details.
A passed user-defined-function will be passed a Series for evaluation.
DataFrame.apply : Perform any type of operations.
DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby : Perform operations over groups.
pandas.DataFrame.resample : Perform operations over resampled bins.
pandas.DataFrame.rolling : Perform operations over rolling window.
pandas.DataFrame.expanding : Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow : Perform operation over exponential weighted window.
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
numpy aggregation functions (mean, median, prod, sum, std,
var), where the default is to compute the aggregation of the flattened
array, e.g., numpy.mean(arr_2d) as opposed to
numpy.mean(arr_2d,axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See gotchas.udf-mutation
for more details.
A passed user-defined-function will be passed a Series for evaluation.
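A small sketch of aggregation over the default (index) axis, using the standard pandas API:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])
>>> df.agg(['sum', 'min'])          # one result row per aggregating function
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
>>> df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})   # per-column functions
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0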
other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
Type of alignment to be performed.
left: use only keys from left frame, preserve key order.
right: use only keys from right frame, preserve key order.
outer: use union of keys from both frames, sort keys lexicographically.
inner: use intersection of keys from both frames,
preserve the order of the left keys.
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).
level : int or level name, default None
Broadcast across a level, matching Index values on the
passed MultiIndex level.
copy : bool, default True
Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy-on-write: pd.options.mode.copy_on_write = True
fill_value : scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series:
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use NEXT valid observation to fill gap.
Deprecated since version 2.1.
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
Deprecated since version 2.1.
fill_axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default 0
Filling axis, method and limit.
Deprecated since version 2.1.
broadcast_axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default None
Broadcast values along this axis, if aligning two objects of
different dimensions.
>>> df = pd.DataFrame(
...     [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
...     [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
...     columns=["A", "B", "C", "D"],
...     index=[2, 3, 4],
... )
>>> df
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
>>> other
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900
Align on columns:
>>> left, right = df.align(other, join="outer", axis=1)
>>> left
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
>>> right
     A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN
We can also align on the index:
>>> left, right = df.align(other, join="outer", axis=0)
>>> left
     D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
>>> right
       A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0
Finally, the default axis=None will align on both index and columns:
>>> left, right = df.align(other, join="outer", axis=None)
>>> left
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN
>>> right
       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_only : bool, default False
Include only boolean columns. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be True, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_only : bool, default False
Include only boolean columns. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be False, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
numpy.any : Numpy version of this method.
Series.any : Return whether any element is True.
Series.all : Return whether all elements are True.
DataFrame.any : Return whether any element is True over requested axis.
DataFrame.all : Return whether all elements are True over requested axis.
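For illustration, a small any/all sketch showing the three axis choices described above:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df.all()              # reduce the index; one value per column
col1     True
col2    False
dtype: bool
>>> df.all(axis=1)        # reduce the columns; one value per row
0     True
1    False
dtype: bool
>>> df.any(axis=None)     # reduce all axes to a scalar
True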
Objects passed to the function are Series objects whose index is
either the DataFrame’s index (axis=0) or the DataFrame’s columns
(axis=1). By default (result_type=None), the final return type
is inferred from the return type of the applied function. Otherwise,
it depends on the result_type argument.
Determines if row or column is passed as a Series or ndarray object:
False : passes each row or column as a Series to the
function.
True : the passed function will receive ndarray objects
instead.
If you are just applying a NumPy reduction function this will
achieve much better performance.
‘expand’ : list-like results will be turned into columns.
‘reduce’ : returns a Series if possible rather than expanding
list-like results. This is the opposite of ‘expand’.
‘broadcast’ : results will be broadcast to the original shape
of the DataFrame, the original index and columns will be
retained.
The default behaviour (None) depends on the return value of the
applied function: list-like results will be returned as a Series
of those. However if the apply function returns a Series these
are expanded to columns.
args : tuple
Positional arguments to pass to func in addition to the
array/series.
by_row : False or "compat", default "compat"
Only has an effect when func is a listlike or dictlike of funcs
and the func isn’t a string.
If "compat", will if possible first translate the func into pandas
methods (e.g. Series().apply(np.sum) will be translated to
Series().sum()). If that doesn't work, will try to call apply again with
by_row=True and, if that fails, will call apply again with
by_row=False (backward compatible).
If False, the funcs will be passed the whole Series at once.
Added in version 2.1.0.
engine : {'python', 'numba'}, default 'python'
Choose between the python (default) engine or the numba engine in apply.
The numba engine will attempt to JIT compile the passed function,
which may result in speedups for large DataFrames.
It also supports the following engine_kwargs :
nopython (compile the function in nopython mode)
nogil (release the GIL inside the JIT compiled function)
parallel (try to apply the function in parallel over the DataFrame)
Note: Due to limitations within numba/how pandas interfaces with numba,
you should only use this if raw=True
Note: The numba compiler only supports a subset of
valid Python/numpy operations.
Pass keyword arguments to the engine.
This is currently only used by the numba engine,
see the documentation for the engine argument for more information.
DataFrame.map: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.
Passing result_type='broadcast' will ensure the same shape
result, whether list-like or scalar is returned by the function,
and broadcast it along the axis. The resulting column names will
be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.map : Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.
Returns the original data conformed to a new index with the specified
frequency.
If the index of this Series/DataFrame is a PeriodIndex, the new index
is the result of transforming the original index with
PeriodIndex.asfreq (so the original index
will map one-to-one to the new index).
Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and
last entries in the original index (see pandas.date_range()). The
values corresponding to any timesteps in the new index which were not present
in the original index will be null (NaN), unless a method for filling
such unknowns is provided (see the method parameter below).
The resample() method is more appropriate if an operation on each group of
timesteps (such as an aggregate) is necessary to represent the data at the new
frequency.
Return the last row(s) without any NaNs before where.
The last row (for each element in where, if list) without any
NaN is taken.
In case of a DataFrame, the last row without NaN is taken,
considering only the subset of columns (if not None).
If there is no good value, NaN is returned for a Series, or
a Series of NaN values for a DataFrame.
The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn’t check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Assigning multiple columns within the same assign is possible.
Later items in ‘**kwargs’ may refer to newly created or modified
columns in ‘df’; items are computed and assigned into ‘df’ in order.
dtype : str, data type, Series or Mapping of column name -> data type
Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to
cast entire pandas object to the same type. Alternatively, use a
mapping, e.g. {col: dtype, …}, where col is a column label and dtype is
a numpy.dtype or Python type to cast one or more of the DataFrame’s
columns to column-specific types.
copy : bool, default True
Return a copy when copy=True (be very careful setting
copy=False as changes to values then may propagate to other
pandas objects).
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy-on-write: pd.options.mode.copy_on_write = True
errors : {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Changed in version 2.0.0: Using astype to convert from timezone-naive dtype to
timezone-aware dtype will raise an exception.
Use Series.dt.tz_localize() instead.
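A brief astype sketch, casting the whole frame or a single column via a mapping:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})   # both int64
>>> df.astype('int32').dtypes                             # cast everything
col1    int32
col2    int32
dtype: object
>>> df.astype({'col1': 'int32'}).dtypes                   # cast one column only
col1    int32
col2    int64
dtype: object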
between_time : Select values between particular times of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_at_time : Get just the index locations for values at a particular time of the day.
at_time : Select values at a particular time of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time : Get just the index locations for values between particular times of the day.
axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace : bool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
None: No fill restriction.
'inside': Only fill NaNs surrounded by valid values
(interpolate).
'outside': Only fill NaNs outside valid values (extrapolate).
Added in version 2.2.0.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Return the bool of a single element Series or DataFrame.
Deprecated since version 2.1.0: bool is deprecated and will be removed in a future version of pandas.
For Series use pandas.Series.item.
This must be a boolean scalar value, either True or False. It will raise a
ValueError if the Series or DataFrame does not have exactly 1 element, or that
element is not boolean (integer values 0 and 1 will also raise an exception).
Series.astype : Change the data type of a Series, including to boolean.
DataFrame.astype : Change the data type of a DataFrame, including to boolean.
numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.
Make a box-and-whisker plot from DataFrame columns, optionally grouped
by some other columns. A box plot is a method for graphically depicting
groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of box to show the range of the data. By default, they extend no more than
1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest
data point within that interval. Outliers are plotted as separate dots.
For further details see
Wikipedia’s entry for boxplot.
Column name or list of names, or vector.
Can be any valid input to pandas.DataFrame.groupby().
by : str or array-like, optional
Column in the DataFrame to pandas.DataFrame.groupby().
One box-plot will be done per value of columns in by.
ax : object of class matplotlib.axes.Axes, optional
The matplotlib axes to be used by boxplot.
fontsize : float or str
Tick label font size in points or as a string (e.g., large).
rot : float, default 0
The rotation angle of labels (in degrees)
with respect to the screen coordinate system.
grid : bool, default True
Setting this to True will show the grid.
figsize : A tuple (width, height) in inches
The size of the figure to create in matplotlib.
layout : tuple (rows, columns), optional
For example, (3, 5) will display the subplots
using 3 rows and 5 columns, starting from the top-left.
return_type : {'axes', 'dict', 'both'} or None, default 'axes'
The kind of object to return. The default is axes.
‘axes’ returns the matplotlib axes the boxplot is drawn on.
‘dict’ returns a dictionary whose values are the matplotlib
Lines of the boxplot.
‘both’ returns a namedtuple with the axes and dict.
when grouping with by, a Series mapping columns to
return_type is returned.
If return_type is None, a NumPy array
of axes with the same shape as layout is returned.
backend : str, default None
Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
The return type depends on the return_type parameter:
‘axes’ : object of class matplotlib.axes.Axes
‘dict’ : dict of matplotlib.lines.Line2D objects
‘both’ : a namedtuple with structure (ax, lines)
For data grouped with by, return a Series of the above or a numpy
array:
Series
array (for return_type=None)
Use return_type='dict' when you want to tweak the appearance
of the lines after plotting. In this case a dict containing the Lines
making up the boxes, caps, fliers, medians, and whiskers is returned.
Boxplots can be created for every column in the dataframe
by df.boxplot() or indicating the columns to be used:
Boxplots of variables distributions grouped by the values of a third
variable can be created using the option by. For instance:
A list of strings (i.e. ['X','Y']) can be passed to boxplot
in order to group the data by combination of the variables in the x-axis:
The layout of boxplot can be adjusted giving a tuple to layout:
Additional formatting can be done to the boxplot, like suppressing the grid
(grid=False), rotating the labels in the x-axis (i.e. rot=45)
or changing the fontsize (i.e. fontsize=15):
The parameter return_type can be used to select the type of element
returned by boxplot. When return_type='axes' is selected,
the matplotlib axes on which the boxplot is drawn are returned:
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
Series.clip : Trim values at input threshold in series.
DataFrame.clip : Trim values at input threshold in dataframe.
numpy.clip : Clip (limit) the values in an array.
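A small clip sketch with scalar and per-row thresholds (data taken from the standard pandas example):

>>> import pandas as pd
>>> df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
>>> df.clip(-4, 6)                 # every element limited to the interval [-4, 6]
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
>>> t = pd.Series([2, -4, -1, 6, 3])
>>> bounded = df.clip(t, t + 4, axis=0)   # element-wise bounds aligned along the rows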
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
Example using a true element-wise combine function.
>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
Using fill_value fills Nones prior to passing the column to the
merge function.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0
However, if the same element in both dataframes is None, that None
is preserved
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  3.0
Example that demonstrates the use of overwrite and behavior when
the axis differ between the dataframes.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1]}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
    A    B     C
0 NaN  NaN   NaN
1 NaN  3.0 -10.0
2 NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0
Demonstrating the preference of the passed in dataframe.
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two. The resulting
dataframe contains the ‘first’ dataframe values and overrides the
second one values where both first.loc[index, col] and
second.loc[index, col] are not missing values, upon calling
first.combine_first(second).
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
Null values still persist if the location of that null value
does not exist in other
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0
Align the differences on columns
>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
Assign result_names
>>> df.compare(df2, result_names=("left", "right"))
  col1        col3
  left right  left right
0    a     c   NaN   NaN
2  NaN   NaN   3.0   4.0
Stack the differences on rows
>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0
Keep the equal values
>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0
Keep all original rows and columns
>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN
Keep all original rows and columns and also all original values
>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.
convert_string : bool, default True
Whether object dtypes should be converted to StringDtype().
convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.
convert_boolean : bool, default True
Whether object dtypes should be converted to BooleanDtypes().
convert_floating : bool, default True
Whether, if possible, conversion can be done to floating extension types.
If convert_integer is also True, preference will be given to integer
dtypes if the floats can be faithfully casted to integers.
By default, convert_dtypes will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support pd.NA. By using the options
convert_string, convert_integer, convert_boolean and
convert_floating, it is possible to turn off individual conversions
to StringDtype, the integer extension types, BooleanDtype
or floating extension types, respectively.
For object-dtyped columns, if infer_objects is True, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to StringDtype, BooleanDtype or an appropriate integer
or floating extension type, otherwise leave as object.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
In the future, as new dtypes are added that support pd.NA, the results
of this method will change to support those new dtypes.
When deep=True (default), a new object will be created with a
copy of the calling object’s data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
When deep=False, a new object will be created without copying
the calling object’s data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
Note
The deep=False behaviour as described above will change
in pandas 3.0. Copy-on-Write
will be enabled by default, which means that the “shallow” copy
that is returned with deep=False will still avoid making
an eager copy, but changes to the data of the original will no
longer be reflected in the shallow copy (or vice versa). Instead,
it makes use of a lazy (deferred) copy mechanism that will copy
the data only when any changes to the original or shallow copy are
made.
You can already get the future behavior and improvements through
enabling copy-on-write: pd.options.mode.copy_on_write = True
When deep=True, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to copy.deepcopy in the Standard Library,
which recursively copies object data (see examples below).
While Index objects are copied when deep=True, the underlying
numpy array is not copied for performance reasons. Since Index is
immutable, the underlying data can be safely shared and a copy
is not needed.
Since pandas is not thread safe, see the
gotchas when copying in a threading
environment.
When copy_on_write in pandas config is set to True, the
copy_on_write config takes effect even when deep=False.
This means that any changes to the copied data would make a new copy
of the data upon write (and vice versa). Changes made to either the
original or copied variable would not be reflected in the counterpart.
See Copy_on_Write for more information.
Updates to the data shared by shallow copy and original are reflected
in both (NOTE: this will no longer be true for pandas >= 3.0);
deep copy remains unchanged.
Note that when copying an object containing Python objects, a deep copy
will copy the data, but will not do so recursively. Updating a nested
data object will be reflected in the deep copy.
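A compact copy sketch; the exact behaviour of the shallow copy depends on the Copy-on-Write setting discussed above (output shown for the pre-3.0 default, CoW disabled):

>>> import pandas as pd
>>> s = pd.Series([1, 2], index=['a', 'b'])
>>> deep = s.copy()                 # deep=True is the default
>>> shallow = s.copy(deep=False)    # shares data and index with s
>>> s.iloc[0] = 99
>>> int(shallow['a']), int(deep['a'])   # shallow sees the change, deep does not
(99, 1)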
method : {'pearson', 'kendall', 'spearman'} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable: callable with input two 1d ndarrays
and returning a float. Note that the returned matrix from corr
will have 1 along the diagonals and will be symmetric
regardless of the callable’s behavior.
min_periods : int, optional
Minimum number of observations required per pair of columns
to have a valid result. Currently only available for Pearson
and Spearman correlation.
numeric_only : bool, default False
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
Pairwise correlation is computed between rows or columns of
DataFrame with rows or columns of Series or DataFrame. DataFrames
are first aligned along both axes before computing the
correlations.
Series.count: Number of non-NA elements in a Series.
DataFrame.value_counts: Count unique combinations of columns.
DataFrame.shape: Number of DataFrame rows and columns (including NA
elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA elements.
>>> df=pd.DataFrame({"Person":... ["John","Myla","Lewis","John","Myla"],... "Age":[24.,np.nan,21.,33,26],... "Single":[False,True,True,True,False]})>>> df Person Age Single0 John 24.0 False1 Myla NaN True2 Lewis 21.0 True3 John 33.0 True4 Myla 26.0 False
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame.
The returned data frame is the covariance matrix of the columns
of the DataFrame.
Both NA and null values are automatically excluded from the
calculation. (See the note below about bias from missing values.)
A threshold can be set for the minimum number of
observations for each value created. Comparisons with observations
below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to
understand the relationship between different measures
across time.
Minimum number of observations required per pair of columns
to have a valid result.
ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations
is N-ddof, where N represents the number of elements.
This argument is applicable only when no NaN is in the dataframe.
numeric_only : bool, default False
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
Returns the covariance matrix of the DataFrame’s time series.
The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that
data is missing at random)
the returned covariance matrix will be an unbiased estimate
of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable
because the estimate covariance matrix is not guaranteed to be positive
semi-definite. This could lead to estimate correlations having
absolute values which are greater than one, and/or a non-invertible
covariance matrix. See Estimation of covariance matrices for more details.
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795
Minimum number of periods
This method also supports an optional min_periods keyword
that specifies the required minimum number of non-NA observations for
each column pair in order to have a valid result:
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset’s distribution, excluding NaN values.
Analyzes both numeric and object series, as well
as DataFrame column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
The percentiles to include in the output. All should
fall between 0 and 1. The default is
[.25,.5,.75], which returns the 25th, 50th, and
75th percentiles.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for Series. Here are the options:
‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
numpy.number. To limit it instead to object columns submit
the numpy.object data type. Strings
can also be used in the style of
select_dtypes (e.g. df.describe(include=['O'])). To
select pandas categorical columns, use 'category'
None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional
A black list of data types to omit from the result. Ignored
for Series. Here are the options:
A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
numpy.number. To exclude object columns submit the data
type numpy.object. Strings can also be used in the style of
select_dtypes (e.g. df.describe(exclude=['O'])). To
exclude pandas categorical columns, use 'category'
DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding columns based on their dtype.
For numeric data, the result’s index will include count,
mean, std, min, max as well as lower, 50 and
upper percentiles. By default the lower percentile is 25 and the
upper percentile is 75. The 50 percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result’s index
will include count, unique, top, and freq. The top
is the most common value. The freq is the most common value’s
frequency. Timestamps also include the first and last items.
If multiple object values have the highest count, then the
count and top results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a DataFrame, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If include='all' is provided as an option, the result
will include a union of attributes of each type.
The include and exclude parameters can be used to limit
which columns in a DataFrame are analyzed for the output.
The parameters are ignored when analyzing a Series.
Describing all columns of a DataFrame regardless of data type.
>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
Describing a column from a DataFrame by accessing it as
an attribute.
Excluding object columns from a DataFrame description.
>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
For boolean dtypes, this uses operator.xor() rather than
operator.sub().
The result is calculated according to current dtype in DataFrame,
however dtype of the result is always float64.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
If other is a Series, return the matrix product between self and
other as a Series. If other is a DataFrame or a numpy.array, return
the matrix product of self and other as a DataFrame or a numpy array.
The dimensions of DataFrame and other must be compatible in order to
compute the matrix multiplication. In addition, the column names of
DataFrame and the index of other must contain the same values, as they
will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the
matrix product here.
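A short dot sketch (the standard pandas example): with a Series the result is the row-wise inner product, and with a DataFrame the column labels of the left operand must match the index of the right.

>>> import pandas as pd
>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0   -4
1    5
dtype: int64
>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)                # df's column labels align with other's index
   0  1
0  1  4
1  2  2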
Remove rows or columns by specifying label names and corresponding
axis, or by directly specifying index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level. See the user guide
for more information about the now unused levels.
Drop a specific index combination from the MultiIndex
DataFrame, i.e., drop the combination 'falcon' and
'weight', which deletes only the corresponding row
>>> df=pd.DataFrame({"name":['Alfred','Batman','Catwoman'],... "toy":[np.nan,'Batmobile','Bullwhip'],... "born":[pd.NaT,pd.Timestamp("1940-04-25"),... pd.NaT]})>>> df name toy born0 Alfred NaN NaT1 Batman Batmobile 1940-04-252 Catwoman Bullwhip NaT
Drop the rows where at least one element is missing.
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
Drop the columns where at least one element is missing.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against
each other to see if they have the same shape and elements. NaNs in
the same location are considered equal.
The row/column index do not need to have the same type, as long
as the values are considered equal. Corresponding columns and
index must be of the same dtype.
DataFrames df and different_column_type have the same element
types and values, but have different types for the column labels,
which will still return True.
DataFrames df and different_data_type have different types for the
same values for their elements, and will return False even though
their column labels are the same values and types.
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
eval to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
If the expression contains an assignment, whether to perform the
operation inplace and mutate the existing DataFrame. Otherwise,
a new DataFrame is returned.
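A minimal eval sketch, showing a plain column expression and an assignment (with and without inplace):

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df.eval('A + B')                      # element-wise expression over the columns
0    11
1    10
2     9
3     8
4     7
dtype: int64
>>> df2 = df.eval('C = A + B')            # assignment returns a new frame with column C
>>> df.eval('C = A + B', inplace=True)    # ... or mutates df itself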
Exactly one of com, span, halflife, or alpha must be
provided if times is not provided. If times is provided,
halflife and one of com, span or alpha may be provided.
halflife : float, str, timedelta, optional
If times is specified, a timedelta convertible unit over which an
observation decays to half its value. Only applicable to mean(),
and halflife value will not apply to the other functions.
alpha : float, optional
Specify smoothing factor \(\alpha\) directly
\(0 < \alpha \leq 1\).
min_periods : int, default 0
Minimum number of observations in window required to have a value;
otherwise, result is np.nan.
adjust : bool, default True
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings (viewing EWMA as a moving average).
When adjust=True (default), the EW function is calculated using weights
\(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series
[\(x_0, x_1, ..., x_t\)] would be:
\(y_t = \frac{x_t + (1 - \alpha) x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\)
When ignore_na=False (default), weights are based on absolute positions.
For example, the weights of \(x_0\) and \(x_2\) used in calculating
the final weighted average of [\(x_0\), None, \(x_2\)] are
\((1-\alpha)^2\) and \(1\) if adjust=True, and
\((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_na=True, weights are based
on relative positions. For example, the weights of \(x_0\) and \(x_2\)
used in calculating the final weighted average of
[\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if
adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
axis : {0, 1}, default 0
If 0 or 'index', calculate across the rows.
If 1 or 'columns', calculate across the columns.
For Series this parameter is unused and defaults to 0.
times : np.ndarray, Series, default None
Only applicable to mean().
Times corresponding to the observations. Must be monotonically increasing and
datetime64[ns] dtype.
If 1-D array like, a sequence with the same shape as the observations.
method : str {'single', 'table'}, default 'single'
Added in version 1.4.0.
Execute the rolling operation per single column or row ('single')
or over the entire object ('table').
This argument is only implemented when specifying engine='numba'
in the method call.
Column(s) to explode.
For multiple columns, specify a non-empty list in which each element
is a str or tuple; the list-like data in all specified columns must
have matching lengths on the same row of the frame.
Added in version 1.3.0: Multi-column explode
ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of rows in the
output will be non-deterministic when exploding sets.
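An explode sketch (the standard pandas example), showing list-likes expanded to rows, scalars left unchanged, and an empty list becoming NaN:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]], 'B': 1})
>>> df.explode('A')
     A  B
0    0  1
0    1  1
0    2  1
1  foo  1
2  NaN  1
3    3  1
3    4  1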
axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace : bool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
None: No fill restriction.
'inside': Only fill NaNs surrounded by valid values
(interpolate).
'outside': Only fill NaNs outside valid values (extrapolate).
Added in version 2.2.0.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
>>> df.ffill()
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
Method to use for filling holes in reindexed Series:
ffill: propagate last valid observation forward to next valid.
backfill / bfill: use next valid observation to fill gap.
Deprecated since version 2.1.0: Use ffill or bfill instead.
axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace : bool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
ffill : Fill values by propagating the last valid observation to next valid.
bfill : Fill values by using the next valid observation to fill the gap.
interpolate : Fill NaN values using interpolation.
reindex : Conform object to new index.
asfreq : Convert TimeSeries to specified frequency.
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1,
2, and 3 respectively.
>>> values={"A":0,"B":1,"C":2,"D":3}>>> df.fillna(value=values) A B C D0 0.0 2.0 2.0 0.01 3.0 4.0 2.0 1.02 0.0 1.0 2.0 3.03 0.0 3.0 2.0 4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0
When filling using a DataFrame, replacement happens along
the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0
Note that column D is not affected since it is not present in df2.
Keep labels from axis for which “like in label == True”.
regex : str (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
axis : {0 or 'index', 1 or 'columns', None}, default None
The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis, ‘columns’ for
DataFrame. For Series this parameter is unused and defaults to None.
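A small filter sketch showing items, regex and like selection (data from the standard pandas example):

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df.filter(items=['one', 'three'])      # select columns by exact name
        one  three
mouse     1      3
rabbit    4      6
>>> df.filter(regex='e$', axis=1)          # columns whose name ends in 'e'
        one  three
mouse     1      3
rabbit    4      6
>>> df.filter(like='bbi', axis=0)          # rows whose label contains 'bbi'
        one  two  three
rabbit    4    5      6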
last : Select final periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Notice that the data for the first 3 calendar days were returned, not the
first 3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
by : mapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby.
If by is a function, it’s called on each value of the object’s
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series’ values are first
aligned; see .align() method). If a list or ndarray of length
equal to the selected axis is passed (see the groupby user guide),
the values are used as-is to determine the groups. A label or list
of labels may be passed to group by the columns in self.
Notice that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
Split along rows (0) or columns (1). For Series this parameter
is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version.
For axis=1, do frame.T.groupby(...) instead.
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels. Do not specify both by and level.
as_index : bool, default True
Return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively “SQL-style” grouped output. This argument has no effect
on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
sort : bool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group. If False,
the groups will appear in the same order as they did in the original DataFrame.
This argument has no effect on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no
longer sort the values.
group_keys : bool, default True
When calling apply and the by argument produces a like-indexed
(i.e. a transform) result, add group keys to
index to identify pieces. By default group keys are not included
when the result’s index (and column) labels match the inputs, and
are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the
result from apply is a like-indexed Series or DataFrame.
Specify group_keys explicitly to include the group keys or
not.
Changed in version 2.0.0: group_keys now defaults to True.
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna : bool, default True
If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups.
See the user guide for more
detailed usage and examples, including splitting an object into groups,
iterating through groups, selecting a group, aggregation, and more.
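A minimal groupby sketch (the standard pandas example), grouping by a column label and aggregating:

>>> import pandas as pd
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby('Animal').mean()            # group labels become the index
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
>>> df.groupby('Animal', as_index=False).mean()   # "SQL-style" flat output
   Animal  Max Speed
0  Falcon      375.0
1  Parrot       25.0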
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns all rows except
the last |n| rows, equivalent to df[:n].
If n is larger than the number of rows, this function returns all rows.
A histogram is a representation of the distribution of data.
This function calls matplotlib.pyplot.hist(), on each series in
the DataFrame, resulting in one histogram per column.
If passed, will be used to limit data to a subset of columns.
by : object, optional
If passed, then used to form histograms for separate groups.
grid : bool, default True
Whether to show axis grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size.
xrot : float, default None
Rotation of x axis labels. For example, a value of 90 displays the
x labels rotated 90 degrees clockwise.
ylabelsize : int, default None
If specified changes the y-axis label size.
yrot : float, default None
Rotation of y axis labels. For example, a value of 90 displays the
y labels rotated 90 degrees clockwise.
ax : Matplotlib axes object, default None
The axes to plot the histogram on.
sharex : bool, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to
invisible; defaults to True if ax is None otherwise False if an ax
is passed in.
Note that passing in both an ax and sharex=True will alter all x axis
labels for all subplots in a figure.
sharey : bool, default False
In case subplots=True, share y axis and set some y axis labels to
invisible.
figsize : tuple, optional
The size in inches of the figure to create. Uses the value in
matplotlib.rcParams by default.
layout : tuple, optional
Tuple of (rows, columns) for the layout of the histograms.
bins : int or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
backend : str, default None
Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
Whether to make a copy for non-object or non-inferable columns
or Series.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to numeric type.
convert_dtypes : Convert argument to best possible dtype.
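A minimal sketch of the soft conversion described above (not from the original docstring), assuming pandas is imported as pd:
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]          # the remaining values are integers stored as object
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object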
Whether to print the full summary. By default, the setting in
pandas.options.display.max_info_columns is followed.
buf : writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to
sys.stdout. Pass a writable buffer if you need to further process
the output.
max_cols : int, optional
When to switch from the verbose to the truncated output. If the
DataFrame has more than max_cols columns, the truncated output
is used. By default, the setting in
pandas.options.display.max_info_columns is used.
memory_usage : bool, str, optional
Specifies whether total memory usage of the DataFrame
elements (including the index) should be displayed. By default,
this follows the pandas.options.display.memory_usage setting.
True always shows memory usage. False never shows memory usage.
A value of ‘deep’ is equivalent to “True with deep introspection”.
Memory usage is shown in human-readable units (base-2
representation). Without deep introspection a memory estimation is
made based on column dtype and number of rows assuming values
consume the same memory amount for corresponding dtypes. With deep
memory introspection, a real memory usage calculation is performed
at the cost of computational resources. See the
Frequently Asked Questions for more
details.
show_counts : bool, optional
Whether to show the non-null counts. By default, this is shown
only if the DataFrame is smaller than
pandas.options.display.max_info_rows and
pandas.options.display.max_info_columns. A value of True always
shows the counts, and False never shows the counts.
‘linear’: Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
‘time’: Works on daily and higher resolution data to interpolate
given length of interval.
‘index’, ‘values’: use the actual numerical values of the index.
‘pad’: Fill in NaNs using existing values.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’,
‘barycentric’, ‘polynomial’: Passed to
scipy.interpolate.interp1d, whereas ‘spline’ is passed to
scipy.interpolate.UnivariateSpline. These methods use the numerical
values of the index. Both ‘polynomial’ and ‘spline’ require that
you also specify an order (int), e.g.
df.interpolate(method='polynomial', order=5). Note that the
‘slinear’ method in pandas refers to the SciPy first-order spline
rather than the pandas first-order spline.
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’,
‘cubicspline’: Wrappers around the SciPy interpolation methods of
similar names. See Notes.
‘from_derivatives’: Refers to
scipy.interpolate.BPoly.from_derivatives.
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to interpolate along. For Series this parameter is unused
and defaults to 0.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
SciPy documentation.
Filling in NaN in a Series via polynomial interpolation or splines:
Both ‘polynomial’ and ‘spline’ methods require that you also specify
an order (int).
Fill the DataFrame forward (that is, going down) along each column
using linear interpolation.
Note how the last entry in column ‘a’ is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column ‘b’ remains NaN, because there
is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
frame.isetitem(loc,value) is an in-place method as it will
modify the DataFrame in place (not returning a new object). In contrast to
frame.iloc[:,i]=value which will try to update the existing values in
place, frame.isetitem(loc,value) will not update the values of the column
itself in place, it will instead insert a new array.
In cases where frame.columns is unique, this is equivalent to
frame[frame.columns[i]]=value.
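A small illustrative sketch (not from the original docstring): isetitem inserts a new array at the given column position rather than updating the existing values in place.
>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
>>> df.isetitem(1, [30, 40])   # replace the column at position 1; returns None
>>> df
   A   B
0  1  30
1  2  40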
The result will only be true at a location if all the
labels match. If values is a Series, that’s the index. If
values is a dict, the keys must be the column names,
which must match. If values is a DataFrame,
then both the index and column labels must match.
DataFrame.eq: Equality test for DataFrame.
Series.isin: Equivalent method on Series.
Series.str.contains : Test if pattern or regex is contained within a string of a Series or Index.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or numpy.NaN, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Because iterrows returns a Series for each row,
it does not preserve dtypes across the rows (dtypes are
preserved across columns for DataFrames).
To preserve dtypes while iterating over the rows, it is better
to use itertuples() which returns namedtuples of the values
and which is generally faster than iterrows.
You should never modify something you are iterating over.
This is not guaranteed to work in all cases. Depending on the
data types, the iterator returns a copy and not a view, and writing
to it will have no effect.
An object to iterate over namedtuples for each row in the
DataFrame with the first field possibly being the index and
following fields being the column values.
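For illustration (not from the original docstring), iterating with itertuples, which preserves dtypes and is generally faster than iterrows:
>>> df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]},
...                   index=["dog", "hawk"])
>>> for row in df.itertuples():
...     print(row)
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)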
other : DataFrame, Series, or a list containing any combination of them
Index should be similar to one of the columns in this one. If a
Series is passed, its name attribute must be set, and that will be
used as the column name in the resulting joined DataFrame.
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index
in other, otherwise joins index-on-index. If multiple
values given, the other DataFrame must have a MultiIndex. Can
pass an array as the join key if it is not already contained in
the calling DataFrame. Like an Excel VLOOKUP operation.
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN
If we want to join using the key columns, we need to set key to be
the index in both df and other. The joined DataFrame will have
key as its index.
>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN
Another option to join using the key columns is to use the on
parameter. DataFrame.join always uses other’s index but we can use
any column in df. This method preserves the original DataFrame’s
index in the result.
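A brief sketch of the on-parameter behaviour described above (not from the original docstring); the caller's RangeIndex is preserved in the result:
>>> df = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
>>> other = pd.DataFrame({"key": ["K0", "K1"], "B": ["B0", "B1"]})
>>> df.join(other.set_index("key"), on="key")
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2  NaN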
first : Select initial periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Notice the data for 3 last calendar days were returned, not the last
3 observed days in the dataset, and therefore data for 2018-04-11 was
not returned.
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.
Series.map : Apply a function elementwise on a Series.
cond : bool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where
True, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
other : scalar, Series/DataFrame, or callable
Entries where cond is True are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplace : bool, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is False the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
True.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m,df2) is equivalent to
np.where(m,df1,df2).
For further details and examples see the mask documentation in
indexing.
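An illustrative contrast of where and mask (not from the original docstring):
>>> s = pd.Series(range(5))
>>> s.where(s > 1, 10)   # keep values where the condition is True
0    10
1    10
2     2
3     3
4     4
dtype: int64
>>> s.mask(s > 1, 10)    # replace values where the condition is True
0     0
1     1
2    10
3    10
4    10
dtype: int64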
The dtype of the object takes precedence. The fill value is cast to
the object’s dtype, if this can be done losslessly.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted” to
the row axis, leaving just two non-identifier columns, ‘variable’ and
‘value’.
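A minimal unpivot sketch (not from the original docstring):
>>> df = pd.DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4]})
>>> df.melt(id_vars=["A"], value_vars=["B", "C"])
   A variable  value
0  a        B      1
1  b        B      2
2  a        C      3
3  b        C      4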
Specifies whether to include the memory usage of the DataFrame’s
index in returned Series. If index=True, the memory usage of
the index is the first item in the output.
deep : bool, default False
If True, introspect the data deeply by interrogating
object dtypes for system-level memory consumption, and include
it in the returned values.
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes will be ignored. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
Warning
If both key columns contain rows where the key is a null value, those
rows will be matched against each other. This is different from usual SQL
join behaviour and can lead to unexpected results.
left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order
of the left keys.
on : label or list
Column or index level names to join on. These must be found in both
DataFrames. If on is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index : bool, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
right_index : bool, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index.
sort : bool, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixes : list-like, default is (“_x”, “_y”)
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
left and right respectively. Pass a value of None instead
of a string to indicate that the column name from left or
right should be left as-is, with no suffix. At least one of the
values must not be None.
copy : bool, default True
If False, avoid copy if possible.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
indicator : bool or str, default False
If True, adds a column to the output DataFrame called “_merge” with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of “left_only” for observations whose merge key only
appears in the left DataFrame, “right_only” for observations
whose merge key only appears in the right DataFrame, and “both”
if the observation’s merge key is found in both DataFrames.
validate : str, optional
If specified, checks if merge is of specified type.
“one_to_one” or “1:1”: check if merge keys are unique in both
left and right datasets.
“one_to_many” or “1:m”: check if merge keys are unique in left
dataset.
“many_to_one” or “m:1”: check if merge keys are unique in right
dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
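For illustration (not from the original docstring), an outer merge with the indicator column; with how='outer' the keys are sorted lexicographically as described above:
>>> left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
>>> right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [4, 5, 6]})
>>> left.merge(right, on="key", how="outer", indicator=True)
  key  lval  rval      _merge
0   a   1.0   4.0        both
1   b   2.0   5.0        both
2   c   3.0   NaN   left_only
3   d   NaN   6.0  right_only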
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN
By default, missing values are not considered, and the mode of wings
are both 0 and 2. Because the resulting DataFrame has two rows,
the second row of species and legs contains NaN.
>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0
Setting dropna=False, NaN values are considered and they can be
the mode (like for wings).
>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN
Setting numeric_only=True, only the mode of numeric columns is
computed, and columns of other types are ignored.
>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0
To compute the mode over columns and not rows, use the axis parameter:
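A sketch of the example this sentence introduces, using the df defined above (the output shown here is reconstructed, not copied from the source):
>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN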
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=False).head(n), but more
performant.
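A small illustrative sketch (not from the original docstring):
>>> df = pd.DataFrame({"population": [59000000, 65000000, 434000],
...                    "GDP": [1937894, 2583560, 12011]},
...                   index=["Italy", "France", "Malta"])
>>> df.nlargest(2, "population")
        population      GDP
France    65000000  2583560
Italy     59000000  1937894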
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
NA values, such as None or numpy.NaN, get mapped to False
values.
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born   name    toy
0   True  False   True  False
1   True   True   True   True
2  False   True   True   True
DataFrame.notnull is an alias for DataFrame.notna.
Detect existing (non-missing) values.
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=True).head(n), but more
performant.
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by
default. This is useful in comparing the fraction of change in a time
series of elements.
Note
Despite the name of this method, it calculates fractional change
(also known as per unit change or relative change) and not
percentage change. If you need the percentage change, multiply
these values by 100.
Series.diff : Compute the difference of two elements in a Series.
DataFrame.diff : Compute the difference of two elements in a DataFrame.
Series.shift : Shift the index by some number of periods.
DataFrame.shift : Shift the index by some number of periods.
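An illustrative sketch of the fractional change (not from the original docstring):
>>> s = pd.Series([90, 91, 85])
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64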
Function to apply to the Series/DataFrame.
args, and kwargs are passed into func.
Alternatively a (callable,data_keyword) tuple where
data_keyword is a string indicating the keyword of
callable that expects the Series/DataFrame.
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.map : Apply a function elementwise on a whole DataFrame.
Series.map : Apply a mapping correspondence on a Series.
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose national_insurance takes its data as df
in the second argument:
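A sketch of the example this sentence introduces; the body of national_insurance is assumed here purely for illustration:
>>> def national_insurance(rate, df):     # hypothetical helper that takes the data second
...     return df * rate
>>> df = pd.DataFrame({"salary": [3000, 4400]})
>>> df.pipe((national_insurance, "df"), rate=0.12)
   salary
0   360.0
1   528.0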
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses
unique values from specified index / columns to form axes of the
resulting DataFrame. This function does not support data
aggregation, multiple values will result in a MultiIndex in the
columns. See the User Guide for more on reshaping.
Column to use to make new frame’s index. If not given, uses existing index.
values : str, object or a list of the previous, optional
Column(s) to use for populating new frame’s values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
     baz       zoo
bar    A  B  C   A  B  C
foo
one    1  2  3   x  y  z
two    4  5  6   q  w  t
You could also assign a list of column names or a list of index names.
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
lev3         1    2
lev1 lev2
1    1     0.0  1.0
     2     2.0  NaN
2    1     4.0  3.0
     2     NaN  5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4
Notice that the first two rows are the same for our index
and columns arguments.
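A sketch of the error this paragraph introduces (the exception text may vary between pandas versions):
>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape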
index : column, Grouper, array, or list of the previous
Keys to group by on the pivot table index. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
columns : column, Grouper, array, or list of the previous
Keys to group by on the pivot table column. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
aggfunc : function, list of functions, dict, default “mean”
If a list of functions is passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves).
If a dict is passed, the key is column to aggregate and the value is
function or list of functions. If margins=True, aggfunc will be
used to calculate the partial aggregates.
fill_value : scalar, default None
Value to replace missing values with (in the resulting pivot table,
after aggregation).
margins : bool, default False
If margins=True, special All columns and rows
will be added with partial group aggregates across the categories
on the rows and columns.
dropna : bool, default True
Do not include columns whose entries are all NaN. If True,
rows with a NaN value in any column will be omitted before
computing margins.
margins_name : str, default ‘All’
Name of the row / column that will contain the totals
when margins is True.
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.2.0: The default value of False is deprecated and will change to
True in a future version of pandas.
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum")
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0
We can also fill missing values using the fill_value parameter.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum", fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean", 'E': "mean"})
>>> table
                  D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
We can also calculate multiple types of aggregations for any given
value column.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean",
...                                 'E': ["min", "max", "mean"]})
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.prod with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipna : bool, default True
Exclude NA/null values when computing the result.
numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the
fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
method : {‘single’, ‘table’}, default ‘single’
Whether to compute quantiles per-column (‘single’) or over all columns
(‘table’). When ‘table’, the only allowed interpolation methods are
‘nearest’, ‘lower’, and ‘higher’.
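An illustrative sketch (not from the original docstring):
>>> df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 100, 100, 100]})
>>> df.quantile(0.5)
a      2.5
b    100.0
Name: 0.5, dtype: float64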
You can refer to variables
in the environment by prefixing them with an ‘@’ character like
@a+b.
You can refer to column names that are not valid Python variable names
by surrounding them in backticks. Thus, column names containing spaces
or punctuations (besides underscores) or starting with digits must be
surrounded by backticks. (For example, a column named “Area (cm^2)” would
be referenced as `Area (cm^2)`). Column names which are Python keywords
(like “list”, “for”, “import”, etc) cannot be used.
For example, if one of your columns is called aa and you want
to sum it with b, your query should be `aa`+b.
inplace : bool
Whether to modify the DataFrame rather than creating a new one.
The result of the evaluation of this expression is first passed to
DataFrame.loc and if that fails because of a
multidimensional key (e.g., a DataFrame) then the result will be passed
to DataFrame.__getitem__().
This method uses the top-level eval() function to
evaluate the passed query.
The query() method uses a slightly
modified Python syntax by default. For example, the & and |
(bitwise) operators have the precedence of their boolean cousins,
and and or. This is syntactically valid Python,
however the semantics are different.
You can change the semantics of the expression by passing the keyword
argument parser='python'. This enforces the same semantics as
evaluation in Python space. Likewise, you can pass engine='python'
to evaluate an expression using Python itself as a backend. This is not
recommended as it is inefficient compared to using numexpr as the
engine.
The DataFrame.index and
DataFrame.columns attributes of the
DataFrame instance are placed in the query namespace
by default, which allows you to treat both the index and columns of the
frame as a column in the frame.
The identifier index is used for the frame index; you can also
use the name of the index to identify it in a query. Please note that
Python keywords may not be used as identifiers.
For further details and examples see the query documentation in
indexing.
Backtick quoted variables
Backtick quoted variables are parsed as literal Python code and
are converted internally to a Python valid identifier.
This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick
quoted string are replaced by strings that are allowed as a Python identifier.
These characters include all operators in Python, the space character, the
question mark, the exclamation mark, the dollar sign, and the euro sign.
For other characters that fall outside the ASCII range (U+0001..U+007F)
and those that are not further specified in PEP 3131,
the query parser will raise an error.
This excludes whitespace different than the space character,
but also the hashtag (as it is used for comments) and the backtick
itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can
confuse the parser.
For example, `it's`>`that's` will raise an error,
as it forms a quoted string ('s>`that') with a backtick inside.
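A brief sketch of backtick-quoted column names in query (not from the original docstring):
>>> df = pd.DataFrame({"A": range(1, 6),
...                    "B": range(10, 0, -2),
...                    "C C": range(10, 5, -1)})
>>> df.query("A > B")
   A  B  C C
4  5  2    6
>>> df.query("B == `C C`")
   A   B  C C
0  1  10   10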
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
The following example shows how the method behaves with the above
parameters:
default_rank: this is the default behaviour obtained without using
any parameter.
max_rank: setting method='max' the records that have the
same values are ranked using the highest rank (e.g.: since ‘cat’
and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
NA_bottom: choosing na_option='bottom', if there are records
with NaN values they are placed at the bottom of the ranking.
pct_rank: when setting pct=True, the ranking is expressed as
percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Conform DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False.
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: Propagate last valid observation forward to next
valid.
backfill / bfill: Use next valid observation to fill gap.
nearest: Use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
limit : int, default None
Maximum number of consecutive elements to forward or backward fill.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer]-target)<=tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned NaN.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02
We can fill in the missing values by passing a value to
the keyword fill_value. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
method to fill the NaN values.
To further illustrate the filling functionality in
reindex, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
The index entries that did not have a value in the original data frame
(for example, ‘2009-12-29’) are by default filled with NaN.
If desired, we can fill in the missing values using one of several
options.
For example, to back-propagate the last valid value to fill the NaN
values, pass bfill as an argument to the method keyword.
Please note that the NaN value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the NaN values present
in the original dataframe, use the fillna() method.
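An illustrative sketch of filling while reindexing (not from the original docstring); note the pre-existing NaN is left untouched, as explained above:
>>> date_index = pd.date_range("2010-01-01", periods=4, freq="D")
>>> s = pd.Series([1.0, np.nan, 3.0, 4.0], index=date_index)
>>> s.reindex(pd.date_range("2009-12-30", periods=6, freq="D"), method="bfill")
2009-12-30    1.0
2009-12-31    1.0
2010-01-01    1.0
2010-01-02    NaN
2010-01-03    3.0
2010-01-04    4.0
Freq: D, dtype: float64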
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: propagate last valid observation forward to next
valid
backfill / bfill: use next valid observation to fill gap
nearest: use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
limit : int, default None
Maximum number of consecutive labels to fill for inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer]-target)<=tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex : Change to new indices or expand indices.
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium
>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
Dict-like or function transformations to apply to
that axis’ values. Use either mapper and axis to
specify the axis to target with mapper, or index and
columns.
index : dict-like or function
Alternative to specifying axis (mapper,axis=0
is equivalent to index=mapper).
columns : dict-like or function
Alternative to specifying axis (mapper,axis=1
is equivalent to columns=mapper).
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to target with mapper. Can be either the axis name
(‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
copy : bool, default True
Also copy underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
inplace : bool, default False
Whether to modify the DataFrame rather than creating a new one.
If True then value of copy is ignored.
level : int or level name, default None
In case of a MultiIndex, only rename labels in the specified
level.
errors : {‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise a KeyError when a dict-like mapper, index,
or columns contains labels that are not present in the Index
being transformed.
If ‘ignore’, existing keys will be renamed and extra keys will be
ignored.
index, columns : scalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or functions transformations to
apply to that axis’ values.
Note that the columns parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
type objects.
Use either mapper and axis to
specify the axis to target with mapper, or index
and/or columns.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to rename. For Series this parameter is unused and defaults to 0.
copy : bool, default None
Also copy underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
inplace : bool, default False
Modifies the object directly, instead of creating a new Series
or DataFrame.
DataFrame.rename_axis supports two calling conventions
(index=index_mapper,columns=columns_mapper,...)
(mapper,axis={'index','columns'},...)
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter copy is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis labels.
We highly recommend using keyword arguments to clarify your
intent.
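A small illustrative sketch using the first calling convention (not from the original docstring):
>>> df = pd.DataFrame({"num_legs": [4, 4, 2]},
...                   index=["dog", "cat", "monkey"])
>>> df.rename_axis("animal")
        num_legs
animal
dog            4
cat            4
monkey         2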
Values of the Series/DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc, which require
you to specify a location to update with some value.
to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
numeric, str or regex:
numeric: numeric values equal to to_replace will be
replaced with value
str: string exactly matching to_replace will be replaced
with value
regex: regexs matching to_replace will be replaced with
value
list of str, regex, or numeric:
First, if to_replace and value are both lists, they
must be the same length.
Second, if regex=True then all of the strings in both
lists will be interpreted as regexs otherwise they will match
directly. This doesn’t matter much for value since there
are only a few possible substitution regexes you can use.
str, regex and numeric rules apply as above.
dict:
Dicts can be used to specify different replacement values
for different existing values. For example,
{'a':'b','y':'z'} replaces the value ‘a’ with ‘b’ and
‘y’ with ‘z’. To use a dict in this way, the optional value
parameter should not be given.
For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
{'a':1,'b':'z'} looks for the value 1 in column ‘a’
and the value ‘z’ in column ‘b’ and replaces these values
with whatever is specified in value. The value parameter
should not be None in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
For a DataFrame nested dictionaries, e.g.,
{'a':{'b':np.nan}}, are read as follows: look in column
‘a’ for the value ‘b’ and replace it with NaN. The optional value
parameter should not be specified to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) cannot be regular expressions.
None:
This means that the regex argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If value is also None then
this must be a nested dictionary or Series.
See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching to_replace with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
objects are also allowed.
inplace : bool, default False
If True, performs operation inplace and returns None.
limit : int, default None
Maximum size gap to forward or backward fill.
Deprecated since version 2.1.0.
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular
expressions. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
to_replace must be None.
method : {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a
scalar, list or tuple and value is None.
Series.fillna : Fill NA values.
DataFrame.fillna : Fill NA values.
Series.where : Replace values based on boolean condition.
DataFrame.where : Replace values based on boolean condition.
DataFrame.map: Apply a function to a Dataframe elementwise.
Series.map: Map values of Series according to an input mapping or function.
Series.str.replace : Simple string replacement.
Regex substitution is performed under the hood with re.sub. The
rules for substitution for re.sub are the same.
Regular expressions will only substitute on strings, meaning you
cannot provide, for example, a regular expression matching floating
point numbers and expect the columns in your frame that have a
numeric dtype to be matched. However, if those floating point
numbers are strings, then you can do this.
This method has a lot of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
When dict is used as the to_replace value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
Compare the behavior of s.replace({'a':None}) and
s.replace('a',None) to understand the peculiarities
of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the
value(s) in the dict are equal to the value parameter.
s.replace({'a':None}) is equivalent to
s.replace(to_replace={'a':None},value=None,method=None):
When value is not explicitly passed and to_replace is a scalar, list
or tuple, replace uses the method parameter (default ‘pad’) to do the
replacement. So this is why the ‘a’ values are being replaced by 10
in rows 1 and 2 and ‘b’ in row 4 in this case.
>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object
Deprecated since version 2.1.0: The ‘method’ parameter and padding behavior are deprecated.
On the other hand, if None is explicitly passed for value, it will
be respected:
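A sketch of the example this sentence introduces, continuing the s defined above:
>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object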
Convenience method for frequency conversion and resampling of time series.
The object must have a datetime-like index (DatetimeIndex, PeriodIndex,
or TimedeltaIndex), or the caller must pass the label of a datetime-like
series/index to the on/level keyword parameter.
The offset string or object representing target conversion.
axis : {0 or 'index', 1 or 'columns'}, default 0
Which axis to use for up- or down-sampling. For Series this parameter
is unused and defaults to 0. Must be
DatetimeIndex, TimedeltaIndex or PeriodIndex.
Deprecated since version 2.0.0: Use frame.T.resample(…) instead.
closed : {'right', 'left'}, default None
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'ME', 'YE', 'QE', 'BME',
'BA', 'BQE', and 'W' which all have a default of 'right'.
label : {'right', 'left'}, default None
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'ME', 'YE', 'QE', 'BME',
'BA', 'BQE', and 'W' which all have a default of 'right'.
kind : {'timestamp', 'period'}, optional, default None
Pass 'timestamp' to convert the resulting index to a
DatetimeIndex or 'period' to convert it to a PeriodIndex.
By default the input representation is retained.
Deprecated since version 2.2.0: Convert index to desired type explicitly instead.
on : str, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. level must be datetime-like.
origin : Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If string, must be one of the following:
‘epoch’: origin is 1970-01-01
‘start’: origin is the first value of the timeseries
‘start_day’: origin is the first day at midnight of the timeseries
‘end’: origin is the last value of the timeseries
‘end_day’: origin is the ceiling midnight of the last day
Added in version 1.3.0.
Note
Only takes effect for Tick-frequencies (i.e. fixed frequencies like
days, hours, and minutes, rather than months or quarters).
offset : Timedelta or str, default None
An offset timedelta added to the origin.
group_keys : bool, default False
Whether to include the group keys in the result index when using
.apply() on the resampled object.
Added in version 1.5.0: Not specifying group_keys will retain values-dependent behavior
from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).
Changed in version 2.0.0: group_keys now defaults to False.
Series.resample : Resample a Series.
DataFrame.resample : Resample a DataFrame.
groupby : Group Series/DataFrame by mapping, function, label, or list of labels.
asfreq : Reindex a Series/DataFrame with the given frequency without grouping.
Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket
it labels. For example, in the original series the
bucket 2000-01-01 00:03:00 contains the value 3, but the summed
value in the resampled bucket with the label 2000-01-01 00:03:00
does not include 3 (if it did, the summed value would be 6, not 3).
In contrast with the start_day, you can use end_day to take the ceiling
midnight of the largest Timestamp as the end of the bins and drop the bins
not containing data:
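A minimal sketch of both origin options, assuming a small Series ts with a datetime index (the values and frequency below are illustrative only):
>>> idx = pd.date_range('2000-10-01 23:30:00', periods=5, freq='17min')
>>> ts = pd.Series([1, 2, 3, 4, 5], index=idx)
>>> ts.resample('17min', origin='start_day').sum()   # bins anchored at midnight of the first day
>>> ts.resample('17min', origin='end_day').sum()     # bins anchored at the ceiling midnight of the last day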
Using the given string, rename the DataFrame column which contains the
index data. If the DataFrame has a MultiIndex, this has to be a list or
tuple with length equal to the number of levels.
DataFrame.set_index : Opposite of reset_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN
When we reset the index, the old index is added as a column, and a
new sequential index is used:
>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
We can use the drop parameter to avoid the old index being added as
a column:
>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
You can also use reset_index with MultiIndex.
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    (24.0, 'fly'),
...                    (80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump
Using the names parameter, choose a name for the index column:
>>> df.reset_index(names=['classes', 'names'])
  classes   names  speed species
                     max    type
0    bird  falcon  389.0     fly
1    bird  parrot   24.0     fly
2  mammal    lion   80.5     run
3  mammal  monkey    NaN    jump
If the index has multiple levels, we can reset a subset of them:
>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
If we are not dropping the index, by default, it is placed in the top
level. We can place it in another level:
>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
When the index is inserted under another level, we can specify under
which one with the parameter col_fill:
>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
If we specify a nonexistent level for col_fill, it is created:
>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
window : int, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If a timedelta, str, or offset, the time period of each window. Each
window will be of variable size based on the observations included in
the time-period. This is only valid for datetimelike indexes.
To learn more about the offsets & frequency strings, please see this link.
If a BaseIndexer subclass, the window boundaries
based on the defined get_window_bounds method. Additional rolling
keyword arguments, namely min_periods, center, closed and
step will be passed to get_window_bounds.
min_periods : int, default None
Minimum number of observations in window required to have a value;
otherwise, result is np.nan.
For a window that is specified by an offset, min_periods will default to 1.
For a window that is specified by an integer, min_periods will default
to the size of the window.
center : bool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
win_type : str, default None
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
on : str, optional
For a DataFrame, a column label or Index level on which
to calculate the rolling window, rather than the DataFrame's index.
Provided integer column is ignored and excluded from result since
an integer index is not used to calculate the rolling window.
axis : int or str, default 0
If 0 or 'index', roll across the rows.
If 1 or 'columns', roll across the columns.
For Series this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1,
transpose the DataFrame first instead.
closed : str, default None
If 'right', the first point in the window is excluded from calculations.
If 'left', the last point in the window is excluded from calculations.
If 'both', no points in the window are excluded from calculations.
If 'neither', the first and last points in the window are excluded
from calculations.
Default None ('right').
step : int, default None
Added in version 1.5.0.
Evaluate the window at every step result, equivalent to slicing as
[::step]. window must be an integer. Using a step argument other
than None or 1 will produce a result with a different shape than the input.
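A short sketch of these window options on a toy frame (illustrative only; outputs omitted):
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.rolling(window=2).sum()                    # fixed window of two observations
>>> df.rolling(window=2, min_periods=1).sum()     # allow windows with a single valid value
>>> df.rolling(window=2, step=2).sum()            # evaluate only every second window (pandas >= 1.5)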
decimals : int, dict, Series
Number of decimal places to round each column to. If an int is
given, round each column to the same number of places.
Otherwise dict and Series round to variable numbers of places.
Column names should be in the keys if decimals is a
dict-like, or in the index if decimals is a Series. Any
columns not included in decimals will be left as is. Elements
of decimals which are not columns of the input will be
ignored.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
n : int, optional
Number of items from axis to return. Cannot be used with frac.
Default = 1 if frac = None.
frac : float, optional
Fraction of axis items to return. Cannot be used with n.
replace : bool, default False
Allow or disallow sampling of the same row more than once.
weights : str or ndarray-like, optional
Default ‘None’ results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
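A small illustrative sketch of n, frac, replace and weights (the frame below is hypothetical):
>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0], 'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df.sample(n=2, random_state=1)                        # two rows without replacement
>>> df.sample(frac=0.5, replace=True, random_state=1)     # half the rows, sampled with replacement
>>> df.sample(n=1, weights='num_legs', random_state=1)    # weight rows by a column when axis=0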
To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that
this will return all object dtype columns. With
pd.options.future.infer_string enabled, using "str" will
work to select all string columns.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sem with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.
The axis to update. The value 0 identifies the rows. For Series
this parameter is unused and defaults to 0.
copy : bool, default True
Whether to make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
allows_duplicate_labels : bool, optional
Whether the returned object allows duplicate labels.
This method returns a new object that’s a view on the same data
as the input. Mutating the input or the output values will be reflected
in the other.
This method is intended to be used in method chains.
“Flags” differ from “metadata”. Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in DataFrame.attrs.
Set the DataFrame index (row labels) using one or more existing
columns or arrays (of the correct length). The index can replace the
existing index or expand on it.
This parameter can be either a single column key, a single array of
the same length as the calling DataFrame, or a list containing an
arbitrary combination of column keys and arrays. Here, “array”
encompasses Series, Index, np.ndarray, and
instances of Iterator.
drop : bool, default True
Delete columns to be used as the new index.
append : bool, default False
Whether to append columns to existing index.
inplace : bool, default False
Whether to modify the DataFrame rather than creating a new one.
verify_integrity : bool, default False
Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method.
DataFrame.reset_index : Opposite of set_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
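A brief sketch of the accepted key types (column labels, arrays, or a mix), using a hypothetical frame:
>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df.set_index('month')                           # a single column becomes the index
>>> df.set_index(['year', 'month'])                 # a MultiIndex built from two columns
>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])  # mix of an array and a column key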
Shift index by desired number of periods with an optional time freq.
When freq is not passed, shift the index without realigning the data.
If freq is passed (in this case, the index must be date or datetime,
or it will raise a NotImplementedError), the index will be
increased using the periods and the freq. freq can be inferred
when specified as “infer” as long as either freq or inferred_freq
attribute is set in the index.
Number of periods to shift. Can be positive or negative.
If an iterable of ints, the data will be shifted once by each int.
This is equivalent to shifting by one value at a time and
concatenating all resulting frames. The resulting columns will have
the shift suffixed to their column names. For multiple periods,
axis must not be 1.
freq : DateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. 'EOM').
If freq is specified then the index values are shifted but the
data is not realigned. That is, use freq if you would like to
extend the index when shifting and preserve the original data.
If freq is specified as "infer" then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis : {0 or 'index', 1 or 'columns', None}, default None
Shift direction. For Series this parameter is unused and defaults to 0.
fill_value : object, optional
The scalar value to use for newly introduced missing values.
The default depends on the dtype of self.
For numeric data, np.nan is used.
For datetime, timedelta, or period data, etc. NaT is used.
For extension dtypes, self.dtype.na_value is used.
suffix : str, optional
If str and periods is an iterable, this is added after the column
name and before the shift value for each shifted column name.
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
Puts NaNs at the beginning if first; last puts NaNs at the end.
Not implemented for MultiIndex.
sort_remaining : bool, default True
If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect an
Index and return an Index of the same shape. For MultiIndex
inputs, the key is applied per level.
Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
Puts NaNs at the beginning if first; last puts NaNs at the
end.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
key : callable, optional
Apply the key function to the values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect a
Series and return a Series with the same shape as the input.
It will be applied to each column in by independently.
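A minimal sketch of the vectorized key argument (the frame is illustrative only):
>>> df = pd.DataFrame({'a': ['B', 'a', 'C', 'b']})
>>> df.sort_values(by='a')                                      # uppercase sorts before lowercase
>>> df.sort_values(by='a', key=lambda col: col.str.lower())     # case-insensitive ordering
>>> df.sort_values(by='a', na_position='first', ignore_index=True)  # relabel the result 0, 1, ..., n - 1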
Series or DataFrames with a single element are squeezed to a scalar.
DataFrames with a single column or a single row are squeezed to a
Series. Otherwise the object is unchanged.
This method is most useful when you don’t know if your
object is a Series or DataFrame, but you do know it has just a single
column. In that case you can safely call squeeze to ensure you have a
Series.
Series.iloc : Integer-location based indexing for selecting scalars.
DataFrame.iloc : Integer-location based indexing for selecting Series.
Series.to_frame : Inverse of DataFrame.squeeze for a single-column DataFrame.
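A short sketch of the three squeeze outcomes (hypothetical frame, outputs omitted):
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> df[['a']].squeeze('columns')    # single-column DataFrame -> Series
>>> df.loc[[0], ['a']].squeeze()    # single element -> scalar
>>> df.squeeze()                    # several rows and columns -> returned unchanged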
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:
if the columns have a single level, the output is a Series;
if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.
Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropna : bool, default True
Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.
sort : bool, default True
Whether to sort the levels of the resulting MultiIndex.
future_stack : bool, default False
Whether to use the new implementation that will replace the current
implementation in pandas 3.0. When True, dropna and sort have no impact
on the result and must remain unspecified. See pandas 2.1.0 Release
notes for more details.
The function is named by analogy with a collection of books
being reorganized from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).
It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:
>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack(future_stack=True)
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0, future_stack=True)
             kg    m
cat weight  1.0  NaN
    height  NaN  2.0
dog weight  3.0  NaN
    height  NaN  4.0
>>> df_multi_level_cols2.stack([0, 1], future_stack=True)
cat  weight  kg    1.0
     height  m     2.0
dog  weight  kg    3.0
     height  m     4.0
dtype: float64
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.std with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sum with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipna : bool, default True
Exclude NA/null values when computing the result.
numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
>>> df = pd.DataFrame(
...     {"Grade": ["A", "B", "A", "C"]},
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> df
                                  Grade
Final exam  History    January        A
            Geography  February       B
Coursework  History    March          A
            Geography  April          C
In the following example, we will swap the levels of the indices.
Here, we will swap the levels column-wise, but levels can be swapped row-wise
in a similar manner. Note that column-wise is the default behaviour.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
>>> df.swaplevel()
                                  Grade
Final exam  January    History        A
            February   Geography      B
Coursework  March      History        A
            April      Geography      C
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
>>> df.swaplevel(0)
                                  Grade
January   History    Final exam      A
February  Geography  Final exam      B
March     History    Coursework      A
April     Geography  Coursework      C
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
>>> df.swaplevel(0, 1)
                                  Grade
History    Final exam  January        A
Geography  Final exam  February       B
History    Coursework  March          A
Geography  Coursework  April          C
This function returns last n rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of n, this function returns all rows except
the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
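A minimal sketch of positive and negative n (hypothetical frame):
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey',
...                               'parrot', 'shark', 'whale', 'zebra']})
>>> df.tail()       # last 5 rows by default
>>> df.tail(3)      # last 3 rows
>>> df.tail(-3)     # all rows except the first 3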
Return the elements in the given positional indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
indices : array-like
An array of ints indicating which positions to take.
axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. 0 means that we are
selecting rows, 1 means that we are selecting columns.
For Series this parameter is unused and defaults to 0.
DataFrame.loc : Select a subset of a DataFrame by labels.
DataFrame.iloc : Select a subset of a DataFrame by positions.
numpy.take : Take elements from an array along an axis.
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That’s because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN
Take elements at indices 1 and 2 along the axis 1 (column selection).
>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN
We may take elements using negative integers for positive indices,
starting from the end of the object, just like with Python lists.
>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
path_or_buf : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string. If a non-binary file object is passed, it should
be opened with newline='', disabling universal newlines. If a binary
file object is passed, mode might need to contain a 'b'.
sep : str, default ','
String of length 1. Field delimiter for the output file.
na_rep : str, default ''
Missing data representation.
float_format : str, Callable, default None
Format string for floating point numbers. If a Callable is given, it takes
precedence over other numeric formatting parameters, like decimal.
columns : sequence, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
header and index are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode : {'w', 'x', 'a'}, default 'w'
Forwarded to either open(mode=) or fsspec.open(mode=) to control
the file opening. Typical values include:
‘w’, truncate the file first.
‘x’, exclusive creation, failing if the file already exists.
‘a’, append to the end of file if it exists.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'. encoding is not supported if path_or_buf
is a non-binary file object.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode
and other entries as additional compression options if
compression mode is ‘zip’.
Passing compression options as keys in dict is
supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar : str, default '"'
String of length 1. Character used to quote fields.
lineterminator : str, optional
The newline character or character sequence to use in the output
file. Defaults to os.linesep, which depends on the OS in which
this method is called (e.g. '\n' for Linux, '\r\n' for Windows).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with
read_csv and the standard library 'csv' module.
chunksize : int or None
Rows to write at a time.
date_format : str, default None
Format string for datetime objects.
doublequote : bool, default True
Control quoting of quotechar inside a field.
escapechar : str, default None
String of length 1. Character used to escape sep and quotechar
when appropriate.
decimal : str, default '.'
Character recognized as decimal separator. E.g. use ',' for
European data.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
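A brief sketch of common to_csv calls (file names below are placeholders):
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'], 'mask': ['red', 'purple']})
>>> csv_text = df.to_csv(index=False)                # no path: the CSV text is returned as a string
>>> df.to_csv('out.csv', sep=';', na_rep='NA')       # write to disk with a custom delimiter
>>> df.to_csv('out.csv.gz', compression='infer')     # compression inferred from the .gz suffix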
‘records’ : list like
[{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument
into : class, default dict
The collections.abc.MutableMapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
index : bool, default True
Whether to include the index item (and index_names item if orient
is ‘tight’) in the returned dictionary. Can only be False
when orient is ‘split’ or ‘tight’.
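A small sketch of the orient and into parameters (illustrative frame):
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
>>> df.to_dict()                                      # default 'dict' orient: {column -> {index -> value}}
>>> df.to_dict(orient='records')                      # list of {column -> value} mappings, one per row
>>> from collections import defaultdict
>>> df.to_dict(orient='index', into=defaultdict(list))  # pass an initialized mapping instance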
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an ExcelWriter object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
excel_writer : path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_name : str, default 'Sheet1'
Name of sheet which will contain DataFrame.
na_rep : str, default ''
Missing data representation.
float_format : str, optional
Format string for floating point numbers. For example
float_format="%.2f" will format 0.1234 to 0.12.
columns : sequence or list of str, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, optional
Column label for index column(s) if desired. If not specified, and
header and index are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrow : int, default 0
Upper left cell row to dump data frame.
startcol : int, default 0
Upper left cell column to dump data frame.
engine : str, optional
Write engine to use, 'openpyxl' or 'xlsxwriter'. You can also set this
via the options io.excel.xlsx.writer or
io.excel.xlsm.writer.
merge_cells : bool, default True
Write MultiIndex and Hierarchical Rows as merged cells.
inf_rep : str, default 'inf'
Representation for infinity (there is no native representation for
infinity in Excel).
freeze_panes : tuple of int (length 2), optional
Specifies the one-based bottommost row and rightmost column that
is to be frozen.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Added in version 1.2.0.
engine_kwargs : dict, optional
Arbitrary keyword arguments passed to excel engine.
to_csv : Write DataFrame to a comma-separated values (csv) file.
ExcelWriter : Class for writing DataFrame objects into excel sheets.
read_excel : Read an Excel file into a pandas DataFrame.
read_csv : Read a comma-separated values (csv) file into DataFrame.
io.formats.style.Styler.to_excel : Add styles to Excel sheet.
To set the library that is used to write the Excel file,
you can pass the engine keyword (the default engine is
automatically chosen depending on the file extension):
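A sketch of the calls described above; the file name and the frames df1 and df2 are placeholders, and the chosen engine must be installed:
>>> df1.to_excel('output.xlsx', engine='xlsxwriter')   # pick the writer library explicitly
>>> with pd.ExcelWriter('output.xlsx') as writer:      # several sheets in one workbook
...     df1.to_excel(writer, sheet_name='Sheet_name_1')
...     df2.to_excel(writer, sheet_name='Sheet_name_2')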
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. If a string or a path,
it will be used as Root Directory path when writing a partitioned dataset.
This function writes the dataframe as a feather file. Requires a default
index. For saving the DataFrame with your custom index use a method that
supports custom indices e.g. to_parquet.
Changed in version 1.5.0: Default value is changed to True. Google has deprecated the
auth_local_webserver=False "out of band" (copy-paste) flow.
table_schema : list of dicts, optional
List of BigQuery table fields to which according DataFrame
columns conform to, e.g. [{'name': 'col1', 'type': 'STRING'}, ...]. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.
New in version 0.3.1 of pandas-gbq.
location : str, optional
Location where the load job should run. See the BigQuery locations
documentation for a
list of available locations. The location must match that of the
target dataset.
New in version 0.5.0 of pandas-gbq.
progress_bar : bool, default True
Use the library tqdm to show the progress bar for the upload,
chunk by chunk.
Credentials for accessing Google APIs. Use this parameter to
override default credentials, such as to use Compute Engine
google.auth.compute_engine.Credentials or Service
Account google.oauth2.service_account.Credentials
directly.
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different key.
Warning
One can store a subclass of DataFrame or Series to HDF5,
but the type of the subclass is lost upon storing.
Specifies the compression library to be used.
These additional compressors for Blosc are supported
(default if no compressor specified: ‘blosc:blosclz’):
{‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’,
‘blosc:zlib’, ‘blosc:zstd’}.
Specifying a compression library which is not available issues
a ValueError.
append : bool, default False
For Table formats, append the input data to the existing.
format : {'fixed', 'table', None}, default 'fixed'
Possible values:
'fixed': Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
'table': Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
If None, pd.get_option('io.hdf.default_format') is checked,
followed by fallback to "fixed".
index : bool, default True
Write DataFrame index as a column.
min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
nan_rep : Any, optional
How to represent null values as str.
Not allowed with append=True.
dropna : bool, default False, optional
Remove missing values.
data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See
Query via data columns. for
more information.
Applicable only to format=’table’.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
read_hdf : Read from HDF file.
DataFrame.to_orc : Write a DataFrame to the binary orc format.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
DataFrame.to_sql : Write to a SQL table.
DataFrame.to_feather : Write out feather-format for DataFrames.
DataFrame.to_csv : Write out to a csv file.
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : array-like, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : str or int, list or dict of int or str, optional
The minimum width of each column in CSS length units. An int is assumed to be px units.
header : bool, optional
Whether to print column labels, default True.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of NaN to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are
left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.
max_rows : int, optional
Maximum number of rows to display in the console.
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
bold_rows : bool, default True
Make the row labels bold in the output.
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table.
escape : bool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook : {True, False}, default False
Whether the generated HTML is for IPython Notebook.
border : int
A border=border attribute is included in the opening
<table> tag. Default pd.options.display.html.border.
table_id : str, optional
A css id is included in the opening <table> tag if specified.
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
‘columns’ : dict like {column -> {index -> value}}
‘values’ : just the values array
‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
Describing the data, where data component is like orient='records'.
date_format : {None, 'epoch', 'iso'}
Type of date conversion. 'epoch' = epoch milliseconds,
'iso' = ISO8601. The default depends on the orient. For
orient='table', the default is 'iso'. For all other orients,
the default is 'epoch'.
double_precision : int, default 10
The number of decimal places to use when encoding
floating point values. The possible maximal value is 15.
Passing double_precision greater than 15 will raise a ValueError.
force_ascii : bool, default True
Force encoded string to be ASCII.
date_unit : str, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
lines : bool, default False
If 'orient' is 'records' write out line-delimited json format. Will
throw ValueError if incorrect 'orient' since others are not
list-like.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
index : bool or None, default None
The index is only used when 'orient' is 'split', 'index', 'column',
or 'table'. Of these, 'index' and 'column' do not support
index=False.
indent : int, optional
Length of whitespace used to indent each record.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
mode : str, default 'w' (writing)
Specify the IO mode for output when supplying a path_or_buf.
Accepted args are ‘w’ (writing) and ‘a’ (append) only.
mode=’a’ is only supported when lines is True and orient is ‘records’.
The behavior of indent=0 varies from the stdlib, which does not
indent the output but does insert newlines. Currently, indent=0
and the default indent=None are equivalent in pandas, though this
may change in a future release.
orient='table' contains a ‘pandas_version’ field under ‘schema’.
This stores the version of pandas used in the latest revision of the
schema.
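A minimal sketch of a few orient and compression combinations (the file name is a placeholder):
>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                   index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
>>> df.to_json(orient='split')                    # returned as a string when no path_or_buf is given
>>> df.to_json(orient='records', lines=True)      # line-delimited JSON, one record per line
>>> df.to_json('out.json.gz', orient='table', indent=4)   # gzip inferred from the file extension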
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : list of label, optional
The subset of columns to write. Writes all columns by default.
header : bool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
na_rep : str, default 'NaN'
Missing data representation.
formatters : list of functions or dict of {str: function}, optional
Formatter functions to apply to columns' elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format : one-parameter function or str, optional, default None
Formatter for floating point numbers. For example
float_format="%.2f" and float_format="{:0.2f}".format will
both result in 0.1234 being formatted as 0.12.
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_names : bool, default True
Prints the names of the indexes.
bold_rows : bool, default False
Make the row labels bold in the output.
column_format : str, optional
The columns format as specified in LaTeX table format e.g. 'rcl' for 3
columns. By default, 'l' will be used for all columns except
columns of numbers, which default to 'r'.
longtable : bool, optional
Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
By default, the value will be read from the pandas config
module, and set to True if the option styler.latex.environment is
“longtable”.
Changed in version 2.0.0: The pandas option affecting this argument has changed.
escape : bool, optional
By default, the value will be read from the pandas config
module and set to True if the option styler.format.escape is
“latex”. When set to False prevents from escaping latex special
characters in column names.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to False.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
multicolumn : bool, default True
Use multicolumn to enhance MultiIndex columns.
The default will be read from the config module, and is set
as the option styler.sparse.columns.
Changed in version 2.0.0: The pandas option affecting this argument has changed.
multicolumn_format : str, default 'r'
The alignment for multicolumns, similar to column_format
The default will be read from the config module, and is set as the option
styler.latex.multicol_align.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to "r".
multirow : bool, default True
Use multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module, and is set as the option
styler.sparse.index.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to True.
caption : str or tuple, optional
Tuple (full_caption, short_caption),
which results in \caption[short_caption]{full_caption};
if a single string is passed, no short caption will be set.
label : str, optional
The LaTeX label to be placed inside \label{} in the output.
This is used with \ref{} in the main .tex file.
position : str, optional
The LaTeX positional argument for tables, to be placed after
\begin{} in the output.
As of v2.0.0 this method has changed to use the Styler implementation as
part of Styler.to_latex() via jinja2 templating. This means
that jinja2 is a requirement, and needs to be installed, for this method
to function. It is advised that users switch to using Styler, since that
implementation is more frequently updated and contains much more
flexibility with the output.
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
mode : str, optional
Mode in which file is opened, "wt" by default.
index : bool, optional, default True
Add index (row) labels.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
float16 and float32, the results dtype will be float32.
This may require copying data and coercing values, which may be
expensive.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that copy=False does not ensure that
to_numpy() is no-copy. Rather, copy=True ensures that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on dtype and the dtypes of the DataFrame columns.
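A short sketch of dtype, copy and na_value (illustrative frames, outputs omitted):
>>> df = pd.DataFrame({'A': [1, 2], 'B': [3.0, 4.5]})
>>> df.to_numpy()                              # common dtype float64 for this mixed int/float frame
>>> df.to_numpy(dtype='float32', copy=True)    # force a dtype and an explicit copy
>>> pd.DataFrame({'A': [1.0, np.nan]}).to_numpy(na_value=0.0)   # substitute missing values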
If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function). If path is None,
a bytes object is returned.
engine : {'pyarrow'}, default 'pyarrow'
ORC library to use.
index : bool, optional
If True, include the dataframe's index(es) in the file output.
If False, they will not be written to the file.
If None, similar to infer, the dataframe's index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn't require much space and is faster. Other indexes will
be included as columns in the file output.
engine_kwargs : dict[str, Any] or None, default None
Additional keyword arguments passed to pyarrow.orc.write_table().
read_orc : Read an ORC file.
DataFrame.to_parquet : Write a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
This function writes the dataframe as a parquet file. You can choose different parquet
backends, and have the option of compression. See
the user guide for more details.
path : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. If None, the result is
returned as bytes. If a string or path, it will be used as Root Directory
path when writing a partitioned dataset.
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
io.parquet.engine is used. The default io.parquet.engine
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : str or None, default 'snappy'
Name of the compression to use. Use None for no compression.
Supported options: 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'.
index : bool, default None
If True, include the dataframe's index(es) in the file output.
If False, they will not be written to the file.
If None, similar to True, the dataframe's index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn't require much space and is faster. Other indexes will
be included as columns in the file output.
partition_cols : list, optional, default None
Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
read_parquet : Read a parquet file.
DataFrame.to_orc : Write an orc file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
If you want to get a buffer to the parquet content you can use a io.BytesIO
object, as long as you don’t use partition_cols, which creates multiple files.
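A brief sketch of writing to a path and to an in-memory buffer (requires pyarrow or fastparquet; the file name is a placeholder):
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')
>>> import io
>>> buf = io.BytesIO()                 # a buffer works as long as partition_cols is not used
>>> df.to_parquet(buf)
>>> pd.read_parquet(buf)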
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. File path where
the pickled object will be stored.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
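A hedged round-trip sketch of the compression options described above (the file name is arbitrary):
>>> import pandas as pd
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> # gzip is inferred from the ".gz" suffix; compresslevel and mtime are forwarded to gzip.GzipFile
>>> original_df.to_pickle(
...     "./dummy.pkl.gz",
...     compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
... )
>>> unpickled_df = pd.read_pickle("./dummy.pkl.gz")
>>> unpickled_df.equals(original_df)
True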
Include index in resulting record array, stored in ‘index’
field or using the index label, if set.
column_dtypes : str, type, dict, default None
If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
index_dtypes : str, type, dict, default None
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable. See here.
If passing a sqlalchemy.engine.Connection which is already in a transaction,
the transaction will not be committed. If passing a sqlite3.Connection,
it will not be possible to roll back the record insertion.
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
index : bool, default True
Write DataFrame index as a column. Uses index_label as the column
name in the table. Creates a table index for this column.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
index is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, ‘multi’, callable}, optional
Controls the SQL insertion clause used:
None : Uses standard SQL INSERT clause (one per row).
‘multi’: Pass multiple values in a single INSERT clause.
callable with signature (pd_table,conn,keys,data_iter).
Details and a sample callable implementation can be found in the
section insert method.
Number of rows affected by to_sql. None is returned if the callable
passed into method does not return an integer number of rows.
The number of returned rows affected is the sum of the rowcount
attribute of sqlite3.Cursor or the SQLAlchemy connectable, which may not
reflect the exact number of written rows as stipulated in the
sqlite3 or SQLAlchemy documentation.
Timezone aware datetime columns will be written as
Timestamp with timezone type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
Not all datastores support method="multi". Oracle, for example,
does not support multi-value insert.
Use method to define a callable insertion method to do nothing
if there’s a primary key conflict on a table in a PostgreSQL database.
>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql(name="conflict_table", con=conn, if_exists="append",
...                    method=insert_on_conflict_nothing)
0
For MySQL, a callable to update columns b and c if there’s a conflict
on a primary key.
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
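A small sketch of the basic flow above, using an in-memory SQLite connection (the table and column names are illustrative; the integer return value assumes the default row insertion method):
>>> import sqlite3
>>> import pandas as pd
>>> conn = sqlite3.connect(":memory:")
>>> df = pd.DataFrame({"name": ["User 1", "User 2"], "score": [1.5, 2.5]})
>>> df.to_sql(name="users", con=conn, if_exists="replace", index=False)  # rows written
2
>>> pd.read_sql("SELECT * FROM users", conn)
     name  score
0  User 1    1.5
1  User 2    2.5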
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function.
convert_dates : dict
Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are ‘tc’,
‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to ‘tc’. Raises NotImplementedError if
a datetime column has timezone information.
write_index : bool
Write the index to Stata dataset.
byteorder : str
Can be “>”, “<”, “little”, or “big”. default is sys.byteorder.
time_stamp : datetime
A datetime to use as file creation date. Default is the current
time.
data_label : str, optional
A label for the data set. Must be 80 characters or smaller.
variable_labels : dict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version : {114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.
Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.
convert_strl : list, optional
List of column names to convert to string columns to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
compression : str or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
value_labels : dict of dicts
Dictionary containing columns as keys and dictionaries of column value
to labels as values. Labels for a single variable must be 32,000
characters or smaller.
read_stata : Import Stata data files.
io.stata.StataWriter : Low-level writer for Stata data files.
io.stata.StataWriter117 : Low-level writer for version 117 files.
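A brief sketch of writing a Stata file with a date conversion (the file name and column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "animal": ["falcon", "parrot"],
...     "observed": pd.to_datetime(["2024-01-01", "2024-01-02"]),
... })
>>> # 'td' stores the datetime column at daily resolution
>>> df.to_stata("animals.dta", convert_dates={"observed": "td"}, write_index=False)
>>> pd.read_stata("animals.dta")["animal"].tolist()
['falcon', 'parrot']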
bufstr, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columnsarray-like, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : int, list or dict of int, optional
The minimum width of each column. If a list of ints is given, each integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.
headerbool or list of str, optional
Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.
indexbool, optional, default True
Whether to print index (row) labels.
na_repstr, optional, default ‘NaN’
String representation of NaN to use.
formatterslist, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns’ elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsifybool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_namesbool, optional, default True
Prints the names of the indexes.
justifystr, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are
left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.
max_rowsint, optional
Maximum number of rows to display in the console.
max_colsint, optional
Maximum number of columns to display in the console.
show_dimensionsbool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimalstr, default ‘.’
Character recognized as decimal separator, e.g. ‘,’ in Europe.
line_widthint, optional
Width to wrap a line in characters.
min_rowsint, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above max_rows).
max_colwidthint, optional
Max width to truncate each column in characters. By default, no limit.
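A short sketch combining a few of these formatting parameters (the column names are illustrative; only a substring check is shown to keep the output version-independent):
>>> import pandas as pd
>>> df = pd.DataFrame({"price": [1.5, None, 3.25], "qty": [10, 20, 30]})
>>> text = df.to_string(na_rep="-", float_format="{:.2f}".format, index=False)
>>> "1.50" in text and "-" in text  # floats use two decimals, NaN is rendered as '-'
True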
Convention for converting period to timestamp; start of period
vs. end.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert (the index by default).
copybool, default True
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
path_or_bufferstr, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is returned
as a string.
indexbool, default True
Whether to include index in XML document.
root_namestr, default ‘data’
The name of root element in XML document.
row_namestr, default ‘row’
The name of row element in XML document.
na_repstr, optional
Missing data representation.
attr_colslist-like, optional
List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_colslist-like, optional
List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
namespacesdict, optional
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example,
namespaces={"":"https://example.com"}
prefixstr, optional
Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in namespaces
dict.
encodingstr, default ‘utf-8’
Encoding of the resulting document.
xml_declarationbool, default True
Whether to include the XML declaration at start of document.
pretty_printbool, default True
Whether output should be pretty printed with indentation and
line breaks.
parser{‘lxml’,’etree’}, default ‘lxml’
Parser module to use for building of tree. Only ‘lxml’ and
‘etree’ are supported. With ‘lxml’, the ability to use XSLT
stylesheet is supported.
stylesheetstr, path object or file-like object, optional
A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires lxml to be installed. Only XSLT 1.0
scripts are currently supported; later versions are not.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buffer’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
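A minimal sketch of these XML options (root_name, row_name and the frame contents are illustrative; parser='etree' avoids the lxml dependency, and only a substring check is shown to stay version-independent):
>>> import pandas as pd
>>> df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, None]})
>>> xml = df.to_xml(index=False, root_name="shapes", row_name="shape_row",
...                 na_rep="missing", parser="etree")
>>> xml.startswith("<?xml") and "<shapes>" in xml and "missing" in xml
True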
Function to use for transforming the data. If a function, must either
work when passed a DataFrame or when passed to DataFrame.apply. If func
is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
function
string function name
list-like of functions and/or function names, e.g. [np.exp,'sqrt']
dict-like of axis labels -> functions, function names or list-like of such.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column.
If 1 or ‘columns’: apply function to each row.
>>> df = pd.DataFrame({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
Whether to copy the data after transposing, even for DataFrames
with a single dtype.
Note that a copy is always required for mixed dtype DataFrames,
or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
Transposing a DataFrame with mixed dtypes will result in a homogeneous
DataFrame with the object dtype. In such a case, a copy of the data
is always made.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis to truncate. Truncates the index (rows) by default.
For Series this parameter is unused and defaults to 0.
copy : bool, default True
Return a copy of the truncated section.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n
The columns of a DataFrame can be truncated.
>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j
For Series, only rows can be truncated.
>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object
The index values in truncate can be datetimes or string
dates.
Because the index is a DatetimeIndex containing only dates, we can
specify before and after as strings. They will be coerced to
Timestamps before truncation.
Note that truncate assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
Target time zone. Passing None will convert to
UTC and remove the timezone information.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert
levelint, str, default None
If axis is a MultiIndex, convert a specific level. Otherwise
must be None.
copybool, default True
Also make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
Time zone to localize. Passing None will remove the
time zone information and preserve local time.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to localize
level : int, str, default None
If axis is a MultiIndex, localize a specific level. Otherwise
must be None.
copybool, default True
Also make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
ambiguous parameter dictates how ambiguous times should be
handled.
‘infer’ will attempt to infer fall dst-transition hours based on
order
bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
‘NaT’ will return NaT where there are ambiguous times
‘raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistentstr, default ‘raise’
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST. Valid values are:
‘shift_forward’ will shift the nonexistent time forward to the
closest existing time
‘shift_backward’ will shift the nonexistent time backward to the
closest existing time
‘NaT’ will return NaT where there are nonexistent times
timedelta objects will shift nonexistent times by the timedelta
‘raise’ will raise a NonExistentTimeError if there are
nonexistent times.
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object or ‘shift_forward’
or ‘shift_backward’.
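As an illustration of the nonexistent options above (this mirrors the standard spring-forward example for Europe/Warsaw):
>>> import pandas as pd
>>> s = pd.Series(range(2),
...               index=pd.DatetimeIndex(['2015-03-29 02:30:00',
...                                       '2015-03-29 03:30:00']))
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
2015-03-29 03:00:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1h'))
2015-03-29 03:30:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64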
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
other : DataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join{‘left’}, default ‘left’
Only left join is implemented, keeping the index and columns of the
original object.
overwritebool, default True
How to handle non-NA values for overlapping keys:
True: overwrite original DataFrame’s values
with values from other.
False: only update values that are NA in
the original DataFrame.
The DataFrame’s length does not increase as a result of the update,
only values at matching index/column labels are updated.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f
For Series, its name attribute must be set.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
If other contains NaNs the corresponding values are not updated
in the original dataframe.
The returned Series will have a MultiIndex with one level per input
column but an Index (non-multi) for a single label. By default, rows
that contain any NA values are omitted from the result. By default,
the resulting Series will be in descending order so that the first
element is the most frequently-occurring row.
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.var with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
condbool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where
False, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
otherscalar, Series/DataFrame, or callable
Entries where cond is False are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplacebool, default False
Whether to perform the operation in place on the data.
axisint, default None
Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is True the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
False.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m,df2) is equivalent to
np.where(m,df1,df2).
For further details and examples see the where documentation in
indexing.
The dtype of the object takes precedence. The fill value is casted to
the object’s dtype, if this can be done losslessly.
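A minimal sketch of the if-then idiom described above, shown on a Series:
>>> import pandas as pd
>>> s = pd.Series(range(5))
>>> s.where(s > 1, 10)  # keep values where the condition holds, otherwise use 10
0    10
1    10
2     2
3     3
4     4
dtype: int64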
Even when the index of other is the same as the index of the DataFrame,
the Series will not be reoriented. If index-wise alignment is desired,
DataFrame.add() should be used with axis=’index’.
>>> s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])
>>> df[['height', 'weight']] + s2
       elk  height  moose  weight
elk    NaN     NaN    NaN     NaN
moose  NaN     NaN    NaN     NaN
Export the pandas DataFrame as an Arrow C stream PyCapsule.
This relies on pyarrow to convert the pandas DataFrame to the Arrow
format (and follows the default behaviour of pyarrow.Table.from_pandas
in its handling of the index, i.e. store the index as a column except
for RangeIndex).
This conversion is not necessarily zero-copy.
t : str, the type of setting error
force : bool, default False
If True, then force showing an error.
validate if we are doing a setitem on a chained copy.
It is technically possible to figure out that we are setting on
a copy even WITH a multi-dtyped pandas object. In other words, some
blocks may be views while other are not. Currently _is_view will ALWAYS
return False for multi-blocks to avoid having to handle this case.
# This technically need not raise SettingWithCopy if both are view
# (which is not generally guaranteed but is usually True. However,
# this is in general not a good practice and we recommend using .loc.
df.iloc[0:5]['group'] = 'a'
Ensures new columns (which go into the BlockManager as new blocks) are
always copied (or a reference is being tracked to them under CoW)
and converted into an array.
Internal version of the take method that sets the _is_copy
attribute to keep track of the parent dataframe (using in indexing
for the SettingWithCopyWarning).
For Series this does the same as the public take (it never sets _is_copy).
See the docstring of take for full explanation of the parameters.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Add a new solver to the dataframe. Initializes value to None by default.
Args:
solver_name: The name of the solver to be added.
configurations: A list of configuration keys for the solver.
initial_value: The value assigned for each index of the new solver.
If not None, must match the index dimension (n_obj * n_inst * n_runs).
DataFrame.apply : Perform any type of operations.
DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby : Perform operations over groups.
pandas.DataFrame.resample : Perform operations over resampled bins.
pandas.DataFrame.rolling : Perform operations over rolling window.
pandas.DataFrame.expanding : Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow : Perform operation over exponential
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
numpy aggregation functions (mean, median, prod, sum, std,
var), where the default is to compute the aggregation of the flattened
array, e.g., numpy.mean(arr_2d) as opposed to
numpy.mean(arr_2d,axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See gotchas.udf-mutation
for more details.
A passed user-defined-function will be passed a Series for evaluation.
DataFrame.apply : Perform any type of operations.
DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby : Perform operations over groups.
pandas.DataFrame.resample : Perform operations over resampled bins.
pandas.DataFrame.rolling : Perform operations over rolling window.
pandas.DataFrame.expanding : Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow : Perform operation over exponential
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
numpy aggregation functions (mean, median, prod, sum, std,
var), where the default is to compute the aggregation of the flattened
array, e.g., numpy.mean(arr_2d) as opposed to
numpy.mean(arr_2d,axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See gotchas.udf-mutation
for more details.
A passed user-defined-function will be passed a Series for evaluation.
other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
Type of alignment to be performed.
left: use only keys from left frame, preserve key order.
right: use only keys from right frame, preserve key order.
outer: use union of keys from both frames, sort keys lexicographically.
inner: use intersection of keys from both frames,
preserve the order of the left keys.
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).
levelint or level name, default None
Broadcast across a level, matching Index values on the
passed MultiIndex level.
copybool, default True
Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
fill_valuescalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
Method to use for filling holes in reindexed Series:
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use NEXT valid observation to fill gap.
Deprecated since version 2.1.
limitint, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
Deprecated since version 2.1.
fill_axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default 0
Filling axis, method and limit.
Deprecated since version 2.1.
broadcast_axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default None
Broadcast values along this axis, if aligning two objects of
different dimensions.
>>> df = pd.DataFrame(
...     [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
...     [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
...     columns=["A", "B", "C", "D"],
...     index=[2, 3, 4],
... )
>>> df
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
>>> other
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900
Align on columns:
>>> left, right = df.align(other, join="outer", axis=1)
>>> left
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
>>> right
     A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN
We can also align on the index:
>>> left, right = df.align(other, join="outer", axis=0)
>>> left
     D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
>>> right
       A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0
Finally, the default axis=None will align on both index and columns:
>>> left, right = df.align(other, join="outer", axis=None)
>>> left
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN
>>> right
       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN
axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_onlybool, default False
Include only boolean columns. Not implemented for Series.
skipnabool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be True, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_onlybool, default False
Include only boolean columns. Not implemented for Series.
skipnabool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be False, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
numpy.any : Numpy version of this method.
Series.any : Return whether any element is True.
Series.all : Return whether all elements are True.
DataFrame.any : Return whether any element is True over requested axis.
DataFrame.all : Return whether all elements are True over requested axis.
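A short sketch of how the axis argument changes the reduction (column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [True, True], "col2": [True, False]})
>>> df.all()            # reduce the index: one value per column
col1     True
col2    False
dtype: bool
>>> df.all(axis=1)      # reduce the columns: one value per row
0     True
1    False
dtype: bool
>>> df.all(axis=None)   # reduce both axes to a scalar
False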
Objects passed to the function are Series objects whose index is
either the DataFrame’s index (axis=0) or the DataFrame’s columns
(axis=1). By default (result_type=None), the final return type
is inferred from the return type of the applied function. Otherwise,
it depends on the result_type argument.
Determines if row or column is passed as a Series or ndarray object:
False : passes each row or column as a Series to the
function.
True : the passed function will receive ndarray objects
instead.
If you are just applying a NumPy reduction function this will
achieve much better performance.
‘expand’ : list-like results will be turned into columns.
‘reduce’ : returns a Series if possible rather than expanding
list-like results. This is the opposite of ‘expand’.
‘broadcast’ : results will be broadcast to the original shape
of the DataFrame, the original index and columns will be
retained.
The default behaviour (None) depends on the return value of the
applied function: list-like results will be returned as a Series
of those. However if the apply function returns a Series these
are expanded to columns.
argstuple
Positional arguments to pass to func in addition to the
array/series.
by_row : False or “compat”, default “compat”
Only has an effect when func is a listlike or dictlike of funcs
and the func isn’t a string.
If “compat”, will if possible first translate the func into pandas
methods (e.g. Series().apply(np.sum) will be translated to
Series().sum()). If that doesn’t work, it will try to call apply again with
by_row=True and if that fails, will call apply again with
by_row=False (backward compatible).
If False, the funcs will be passed the whole Series at once.
Added in version 2.1.0.
engine{‘python’, ‘numba’}, default ‘python’
Choose between the python (default) engine or the numba engine in apply.
The numba engine will attempt to JIT compile the passed function,
which may result in speedups for large DataFrames.
It also supports the following engine_kwargs :
nopython (compile the function in nopython mode)
nogil (release the GIL inside the JIT compiled function)
parallel (try to apply the function in parallel over the DataFrame)
Note: Due to limitations within numba/how pandas interfaces with numba,
you should only use this if raw=True
Note: The numba compiler only supports a subset of
valid Python/numpy operations.
Pass keyword arguments to the engine.
This is currently only used by the numba engine,
see the documentation for the engine argument for more information.
DataFrame.map: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.
Passing result_type='broadcast' will ensure the same shape
result, whether list-like or scalar is returned by the function,
and broadcast it along the axis. The resulting column names will
be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.map : Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.
Returns the original data conformed to a new index with the specified
frequency.
If the index of this Series/DataFrame is a PeriodIndex, the new index
is the result of transforming the original index with
PeriodIndex.asfreq (so the original index
will map one-to-one to the new index).
Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq), where start and end are, respectively, the first and
last entries in the original index (see pandas.date_range()). The
values corresponding to any timesteps in the new index which were not present
in the original index will be null (NaN), unless a method for filling
such unknowns is provided (see the method parameter below).
The resample() method is more appropriate if an operation on each group of
timesteps (such as an aggregate) is necessary to represent the data at the new
frequency.
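A brief sketch of upsampling to a higher frequency, where new timesteps are filled with NaN unless a fill method or fill_value is given:
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=4, freq='min')
>>> df = pd.DataFrame({'s': [0.0, None, 2.0, 3.0]}, index=index)
>>> df.asfreq(freq='30s')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0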
Return the last row(s) without any NaNs before where.
The last row (for each element in where, if list) without any
NaN is taken.
In case of a DataFrame, the last row without NaN
considering only the subset of columns (if not None)
If there is no good value, NaN is returned for a Series or
a Series of NaN values for a DataFrame
The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn’t check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Assigning multiple columns within the same assign is possible.
Later items in ‘**kwargs’ may refer to newly created or modified
columns in ‘df’; items are computed and assigned into ‘df’ in order.
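A minimal sketch of the callable form described above (the column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]}, index=['Portland', 'Berkeley'])
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0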
dtypestr, data type, Series or Mapping of column name -> data type
Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to
cast entire pandas object to the same type. Alternatively, use a
mapping, e.g. {col: dtype, …}, where col is a column label and dtype is
a numpy.dtype or Python type to cast one or more of the DataFrame’s
columns to column-specific types.
copybool, default True
Return a copy when copy=True (be very careful setting
copy=False as changes to values then may propagate to other
pandas objects).
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
errors{‘raise’, ‘ignore’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Changed in version 2.0.0: Using astype to convert from timezone-naive dtype to
timezone-aware dtype will raise an exception.
Use Series.dt.tz_localize() instead.
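A short sketch of casting a single column with the mapping form of dtype:
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object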
between_time : Select values between particular times of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_at_time : Get just the index locations for
Return the best configuration for the given objective over the instances.
Args:
solver: The solver for which we determine the best configuration
objective: The objective for which we calculate the best configuration
instances: The instances which should be selected for the evaluation
Returns:
The best configuration id and its aggregated performance.
Return the best performance for each instance in the portfolio.
Args:
objective: The objective for which we calculate the best performance
instances: The instances which should be selected for the evaluation
run_id: The run for which we calculate the best performance. If None,
we consider all runs.
exclude_solvers: List of (solver, config_id) to exclude in the calculation.
Returns:
The best performance for each instance in the portfolio.
at_time : Select values at a particular time of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time : Get just the index locations for
axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplacebool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limitint, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
If limit is specified, consecutive NaNs will be filled with this
restriction.
None: No fill restriction.
‘inside’: Only fill NaNs surrounded by valid values
(interpolate).
‘outside’: Only fill NaNs outside valid values (extrapolate).
Added in version 2.2.0.
downcastdict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
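These parameters belong to the forward/backward fill family; assuming ffill, a minimal sketch of the limit behaviour described above:
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
>>> s.ffill(limit=1)  # only the first NaN of the gap is filled
0    1.0
1    1.0
2    NaN
3    NaN
4    5.0
dtype: float64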
Return the bool of a single element Series or DataFrame.
Deprecated since version 2.1.0: bool is deprecated and will be removed in future version of pandas.
For Series use pandas.Series.item.
This must be a boolean scalar value, either True or False. It will raise a
ValueError if the Series or DataFrame does not have exactly 1 element, or that
element is not boolean (integer values 0 and 1 will also raise an exception).
Series.astype : Change the data type of a Series, including to boolean.
DataFrame.astype : Change the data type of a DataFrame, including to boolean.
numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.
Make a box-and-whisker plot from DataFrame columns, optionally grouped
by some other columns. A box plot is a method for graphically depicting
groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of box to show the range of the data. By default, they extend no more than
1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest
data point within that interval. Outliers are plotted as separate dots.
For further details see
Wikipedia’s entry for boxplot.
Column name or list of names, or vector.
Can be any valid input to pandas.DataFrame.groupby().
bystr or array-like, optional
Column in the DataFrame to pandas.DataFrame.groupby().
One box-plot will be done per value of columns in by.
ax : object of class matplotlib.axes.Axes, optional
The matplotlib axes to be used by boxplot.
fontsizefloat or str
Tick label font size in points or as a string (e.g., large).
rotfloat, default 0
The rotation angle of labels (in degrees)
with respect to the screen coordinate system.
gridbool, default True
Setting this to True will show the grid.
figsize : a tuple (width, height) in inches
The size of the figure to create in matplotlib.
layouttuple (rows, columns), optional
For example, (3, 5) will display the subplots
using 3 rows and 5 columns, starting from the top-left.
return_type{‘axes’, ‘dict’, ‘both’} or None, default ‘axes’
The kind of object to return. The default is axes.
‘axes’ returns the matplotlib axes the boxplot is drawn on.
‘dict’ returns a dictionary whose values are the matplotlib
Lines of the boxplot.
‘both’ returns a namedtuple with the axes and dict.
when grouping with by, a Series mapping columns to
return_type is returned.
If return_type is None, a NumPy array
of axes with the same shape as layout is returned.
backendstr, default None
Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
The return type depends on the return_type parameter:
‘axes’ : object of class matplotlib.axes.Axes
‘dict’ : dict of matplotlib.lines.Line2D objects
‘both’ : a namedtuple with structure (ax, lines)
For data grouped with by, return a Series of the above or a numpy
array:
Series
array (for return_type=None)
Use return_type='dict' when you want to tweak the appearance
of the lines after plotting. In this case a dict containing the Lines
making up the boxes, caps, fliers, medians, and whiskers is returned.
Boxplots can be created for every column in the dataframe
by df.boxplot() or indicating the columns to be used:
Boxplots of variables distributions grouped by the values of a third
variable can be created using the option by. For instance:
A list of strings (i.e. ['X','Y']) can be passed to boxplot
in order to group the data by combination of the variables in the x-axis:
The layout of boxplot can be adjusted giving a tuple to layout:
Additional formatting can be done to the boxplot, like suppressing the grid
(grid=False), rotating the labels in the x-axis (i.e. rot=45)
or changing the fontsize (i.e. fontsize=15):
The parameter return_type can be used to select the type of element
returned by boxplot. When return_type='axes' is selected,
the matplotlib axes on which the boxplot is drawn are returned:
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
Series.clip : Trim values at input threshold in series.
DataFrame.clip : Trim values at input threshold in dataframe.
numpy.clip : Clip (limit) the values in an array.
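A minimal sketch of clipping with scalar thresholds (the column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
>>> df.clip(-4, 6)
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4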
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
Example using a true element-wise combine function.
>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
Using fill_value fills Nones prior to passing the column to the
merge function.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0
However, if the same element in both dataframes is None, that None
is preserved
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  3.0
Example that demonstrates the use of overwrite and behavior when
the axis differ between the dataframes.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1]}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
    A    B     C
0 NaN  NaN   NaN
1 NaN  3.0 -10.0
2 NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0
Demonstrating the preference of the passed in dataframe.
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two. The resulting
dataframe contains the ‘first’ dataframe values and overrides the
second one values where both first.loc[index, col] and
second.loc[index, col] are not missing values, upon calling
first.combine_first(second).
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
Null values still persist if the location of that null value
does not exist in other
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0
Align the differences on columns
>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
Assign result_names
>>> df.compare(df2, result_names=("left", "right"))
  col1        col3
  left right left right
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
Stack the differences on rows
>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0
Keep the equal values
>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0
Keep all original rows and columns
>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN
Keep all original rows and columns and also all original values
>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
Return the (best) configuration performance for objective over the instances.
Args:
solver: The solver for which we evaluate the configuration
configuration: The configuration (id) to evaluate
objective: The objective for which we find the best value
instances: The instances which should be selected for the evaluation
per_instance: Whether to return the performance per instance,
or aggregated.
Returns:
The (best) configuration id and its aggregated performance.
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.
convert_string : bool, default True
Whether object dtypes should be converted to StringDtype().
convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.
convert_boolean : bool, default True
Whether object dtypes should be converted to BooleanDtypes().
convert_floating : bool, default True
Whether, if possible, conversion can be done to floating extension types.
If convert_integer is also True, preference will be given to integer
dtypes if the floats can be faithfully cast to integers.
By default, convert_dtypes will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support pd.NA. By using the options
convert_string, convert_integer, convert_boolean and
convert_floating, it is possible to turn off individual conversions
to StringDtype, the integer extension types, BooleanDtype
or floating extension types, respectively.
For object-dtyped columns, if infer_objects is True, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to StringDtype, BooleanDtype or an appropriate integer
or floating extension type, otherwise leave as object.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
In the future, as new dtypes are added that support pd.NA, the results
of this method will change to support those new dtypes.
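A hedged sketch of the conversion described above (exact dtype reprs vary slightly across pandas versions, so only simple checks are shown):
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", None]})
>>> converted = df.convert_dtypes()
>>> # "a" becomes the nullable Int64 extension type, "b" becomes StringDtype,
>>> # and missing entries become pd.NA rather than NaN/None
>>> converted["a"].dtype
Int64Dtype()
>>> converted["a"].isna().sum()
1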
When deep=True (default), a new object will be created with a
copy of the calling object’s data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
When deep=False, a new object will be created without copying
the calling object’s data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
Note
The deep=False behaviour as described above will change
in pandas 3.0. Copy-on-Write
will be enabled by default, which means that the “shallow” copy
that is returned with deep=False will still avoid making
an eager copy, but changes to the data of the original will no
longer be reflected in the shallow copy (or vice versa). Instead,
it makes use of a lazy (deferred) copy mechanism that will copy
the data only when any changes to the original or shallow copy is
made.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
When deep=True, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to copy.deepcopy in the Standard Library,
which recursively copies object data (see examples below).
While Index objects are copied when deep=True, the underlying
numpy array is not copied for performance reasons. Since Index is
immutable, the underlying data can be safely shared and a copy
is not needed.
Since pandas is not thread safe, see the
gotchas when copying in a threading
environment.
When copy_on_write in pandas config is set to True, the
copy_on_write config takes effect even when deep=False.
This means that any changes to the copied data would make a new copy
of the data upon write (and vice versa). Changes made to either the
original or copied variable would not be reflected in the counterpart.
See Copy_on_Write for more information.
Updates to the data shared by shallow copy and original are reflected
in both (NOTE: this will no longer be true for pandas >= 3.0);
deep copy remains unchanged.
Note that when copying an object containing Python objects, a deep copy
will copy the data, but will not do so recursively. Updating a nested
data object will be reflected in the deep copy.
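A minimal sketch contrasting deep and shallow copies under the pre-3.0 (non Copy-on-Write) behaviour described above; the data is illustrative:
>>> import pandas as pd
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> shallow = s.copy(deep=False)   # shares data and index with s
>>> deep = s.copy(deep=True)       # independent data
>>> s.iloc[0] = 99
>>> # Without Copy-on-Write, shallow reflects the change while deep still holds 1.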
method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable: callable with input two 1d ndarrays
and returning a float. Note that the returned matrix from corr
will have 1 along the diagonals and will be symmetric
regardless of the callable’s behavior.
min_periodsint, optional
Minimum number of observations required per pair of columns
to have a valid result. Currently only available for Pearson
and Spearman correlation.
numeric_onlybool, default False
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
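A minimal sketch of DataFrame.corr with a non-default method; the data is illustrative:
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})
>>> df.corr(method="spearman")   # rank-based: x and y are perfectly monotone, so all entries are 1.0
>>> df.corr(min_periods=3)       # require at least 3 paired observations per column pair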
Pairwise correlation is computed between rows or columns of
DataFrame with rows or columns of Series or DataFrame. DataFrames
are first aligned along both axes before computing the
correlations.
Series.count: Number of non-NA elements in a Series.
DataFrame.value_counts: Count unique combinations of columns.
DataFrame.shape: Number of DataFrame rows and columns (including NA
elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA elements.
>>> df=pd.DataFrame({"Person":... ["John","Myla","Lewis","John","Myla"],... "Age":[24.,np.nan,21.,33,26],... "Single":[False,True,True,True,False]})>>> df Person Age Single0 John 24.0 False1 Myla NaN True2 Lewis 21.0 True3 John 33.0 True4 Myla 26.0 False
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame.
The returned data frame is the covariance matrix of the columns
of the DataFrame.
Both NA and null values are automatically excluded from the
calculation. (See the note below about bias from missing values.)
A threshold can be set for the minimum number of
observations for each value created. Comparisons with observations
below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to
understand the relationship between different measures
across time.
Minimum number of observations required per pair of columns
to have a valid result.
ddofint, default 1
Delta degrees of freedom. The divisor used in calculations
is N-ddof, where N represents the number of elements.
This argument is applicable only when no NaN is in the DataFrame.
numeric_onlybool, default False
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
Returns the covariance matrix of the DataFrame’s time series.
The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that
data is missing at random)
the returned covariance matrix will be an unbiased estimate
of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable
because the estimated covariance matrix is not guaranteed to be positive
semi-definite. This could lead to estimate correlations having
absolute values which are greater than one, and/or a non-invertible
covariance matrix. See Estimation of covariance matrices for more details.
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795
Minimum number of periods
This method also supports an optional min_periods keyword
that specifies the required minimum number of non-NA observations for
each column pair in order to have a valid result:
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset’s distribution, excluding NaN values.
Analyzes both numeric and object series, as well
as DataFrame column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
The percentiles to include in the output. All should
fall between 0 and 1. The default is
[.25,.5,.75], which returns the 25th, 50th, and
75th percentiles.
include‘all’, list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for Series. Here are the options:
‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
numpy.number. To limit it instead to object columns submit
the numpy.object data type. Strings
can also be used in the style of
select_dtypes (e.g. df.describe(include=['O'])). To
select pandas categorical columns, use 'category'
None (default) : The result will include all numeric columns.
excludelist-like of dtypes or None (default), optional,
A black list of data types to omit from the result. Ignored
for Series. Here are the options:
A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
numpy.number. To exclude object columns submit the data
type numpy.object. Strings can also be used in the style of
select_dtypes (e.g. df.describe(exclude=['O'])). To
exclude pandas categorical columns, use 'category'
DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding
columns based on their dtype.
For numeric data, the result’s index will include count,
mean, std, min, max as well as lower, 50 and
upper percentiles. By default the lower percentile is 25 and the
upper percentile is 75. The 50 percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result’s index
will include count, unique, top, and freq. The top
is the most common value. The freq is the most common value’s
frequency. Timestamps also include the first and last items.
If multiple object values have the highest count, then the
count and top results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a DataFrame, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If include='all' is provided as an option, the result
will include a union of attributes of each type.
The include and exclude parameters can be used to limit
which columns in a DataFrame are analyzed for the output.
The parameters are ignored when analyzing a Series.
Describing all columns of a DataFrame regardless of data type.
>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
Describing a column from a DataFrame by accessing it as
an attribute.
Excluding object columns from a DataFrame description.
>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
For boolean dtypes, this uses operator.xor() rather than
operator.sub().
The result is calculated according to current dtype in DataFrame,
however dtype of the result is always float64.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
If other is a Series, return the matrix product between self and
other as a Series. If other is a DataFrame or a numpy.array, return
the matrix product of self and other in a DataFrame of a np.array.
The dimensions of DataFrame and other must be compatible in order to
compute the matrix multiplication. In addition, the column names of
DataFrame and the index of other must contain the same values, as they
will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the
matrix product here.
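A minimal sketch of DataFrame.dot, where the columns of the left frame align with the index of the right (labels are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
>>> other = pd.DataFrame([[5, 6], [7, 8]], index=["a", "b"])
>>> df.dot(other)   # 2x2 matrix product; raises if the labels do not align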
Remove rows or columns by specifying label names and corresponding
axis, or by directly specifying index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level. See the user guide
for more information about the now unused levels.
Drop a specific index combination from the MultiIndex
DataFrame, i.e., drop the combination 'falcon' and
'weight', which deletes only the corresponding row
>>> df=pd.DataFrame({"name":['Alfred','Batman','Catwoman'],... "toy":[np.nan,'Batmobile','Bullwhip'],... "born":[pd.NaT,pd.Timestamp("1940-04-25"),... pd.NaT]})>>> df name toy born0 Alfred NaN NaT1 Batman Batmobile 1940-04-252 Catwoman Bullwhip NaT
Drop the rows where at least one element is missing.
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
Drop the columns where at least one element is missing.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against
each other to see if they have the same shape and elements. NaNs in
the same location are considered equal.
The row/column index do not need to have the same type, as long
as the values are considered equal. Corresponding columns and
index must be of the same dtype.
DataFrames df and different_column_type have the same element
types and values, but have different types for the column labels,
which will still return True.
DataFrames df and different_data_type have different types for the
same values for their elements, and will return False even though
their column labels are the same values and types.
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
eval to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
If the expression contains an assignment, whether to perform the
operation inplace and mutate the existing DataFrame. Otherwise,
a new DataFrame is returned.
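A minimal sketch of DataFrame.eval with an assignment expression (column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
>>> df.eval("C = A + B")                  # returns a new frame with the extra column C
>>> df.eval("C = A + B", inplace=True)    # mutates df instead of returning a new frame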
Exactly one of com, span, halflife, or alpha must be
provided if times is not provided. If times is provided,
halflife and one of com, span or alpha may be provided.
If times is specified, a timedelta convertible unit over which an
observation decays to half its value. Only applicable to mean(),
and halflife value will not apply to the other functions.
alphafloat, optional
Specify smoothing factor \(\alpha\) directly
\(0 < \alpha \leq 1\).
min_periodsint, default 0
Minimum number of observations in window required to have a value;
otherwise, result is np.nan.
adjustbool, default True
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings (viewing EWMA as a moving average).
When adjust=True (default), the EW function is calculated using weights
\(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series
[\(x_0, x_1, ..., x_t\)] would be:
When ignore_na=False (default), weights are based on absolute positions.
For example, the weights of \(x_0\) and \(x_2\) used in calculating
the final weighted average of [\(x_0\), None, \(x_2\)] are
\((1-\alpha)^2\) and \(1\) if adjust=True, and
\((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_na=True, weights are based
on relative positions. For example, the weights of \(x_0\) and \(x_2\)
used in calculating the final weighted average of
[\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if
adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
axis{0, 1}, default 0
If 0 or 'index', calculate across the rows.
If 1 or 'columns', calculate across the columns.
For Series this parameter is unused and defaults to 0.
times : np.ndarray, Series, default None
Only applicable to mean().
Times corresponding to the observations. Must be monotonically increasing and
datetime64[ns] dtype.
If 1-D array like, a sequence with the same shape as the observations.
methodstr {‘single’, ‘table’}, default ‘single’
Added in version 1.4.0.
Execute the rolling operation per single column or row ('single')
or over the entire object ('table').
This argument is only implemented when specifying engine='numba'
in the method call.
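A minimal sketch of an exponentially weighted mean using the alpha parameterisation described above (values are illustrative):
>>> import pandas as pd
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> s.ewm(alpha=0.5, adjust=True).mean()                   # weights (1 - alpha)**i, normalised in early periods
>>> s.ewm(alpha=0.5, adjust=False, min_periods=2).mean()   # recursive form; the first value becomes NaN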
Column(s) to explode.
For multiple columns, specify a non-empty list in which each element
is a str or tuple, and the list-like data in all specified columns
must have matching lengths on the same row of the frame.
Added in version 1.3.0: Multi-column explode
ignore_indexbool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of rows in the
output will be non-deterministic when exploding sets.
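A minimal sketch of DataFrame.explode (data is illustrative); note how the empty list becomes NaN:
>>> import pandas as pd
>>> df = pd.DataFrame({"id": [1, 2], "vals": [[1, 2, 3], []]})
>>> df.explode("vals", ignore_index=True)   # one row per list element; the empty list yields NaN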
axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplacebool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limitint, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
If limit is specified, consecutive NaNs will be filled with this
restriction.
None: No fill restriction.
‘inside’: Only fill NaNs surrounded by valid values
(interpolate).
‘outside’: Only fill NaNs outside valid values (extrapolate).
Added in version 2.2.0.
downcastdict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
>>> df.ffill()
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
Method to use for filling holes in reindexed Series:
ffill: propagate last valid observation forward to next valid.
backfill / bfill: use next valid observation to fill gap.
Deprecated since version 2.1.0: Use ffill or bfill instead.
axis{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame
Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplacebool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limitint, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
downcastdict, default is None
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
ffill : Fill values by propagating the last valid observation to next valid.
bfill : Fill values by using the next valid observation to fill the gap.
interpolate : Fill NaN values using interpolation.
reindex : Conform object to new index.
asfreq : Convert TimeSeries to specified frequency.
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1,
2, and 3 respectively.
>>> values={"A":0,"B":1,"C":2,"D":3}>>> df.fillna(value=values) A B C D0 0.0 2.0 2.0 0.01 3.0 4.0 2.0 1.02 0.0 1.0 2.0 3.03 0.0 3.0 2.0 4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0
When filling using a DataFrame, replacement happens along
the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0
Note that column D is not affected since it is not present in df2.
Keep labels from axis for which “like in label == True”.
regexstr (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
axis{0 or ‘index’, 1 or ‘columns’, None}, default None
The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis, ‘columns’ for
DataFrame. For Series this parameter is unused and defaults to None.
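A minimal sketch of DataFrame.filter using the items, like and regex selectors (column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"one": [1], "two": [2], "three": [3]})
>>> df.filter(items=["one"])        # keep exactly these labels
>>> df.filter(like="t", axis=1)     # keep labels containing "t": 'two' and 'three'
>>> df.filter(regex="^o")           # keep labels matching the regular expression: 'one'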
last : Select final periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Notice that data for the first 3 calendar days were returned, not the
first 3 days observed in the dataset, and therefore data for 2018-04-13
was not returned.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Return a list of performance computation jobs that are still to be done.
Get a list of tuple[instance, solver] to run from the performance data.
If rerun is False (default), get only the tuples that don’t have a
value, else (True) get all the tuples.
Args:
rerun: Boolean indicating if we want to rerun all jobs
Returns:
A tuple of (solver, config, instance, run) combinations
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
bymapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby.
If by is a function, it’s called on each value of the object’s
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series’ values are first
aligned; see .align() method). If a list or ndarray of length
equal to the selected axis is passed (see the groupby user guide),
the values are used as-is to determine the groups. A label or list
of labels may be passed to group by the columns in self.
Notice that a tuple is interpreted as a (single) key.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For Series this parameter
is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version.
For axis=1, do frame.T.groupby(...) instead.
levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels. Do not specify both by and level.
as_indexbool, default True
Return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively “SQL-style” grouped output. This argument has no effect
on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
sortbool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group. If False,
the groups will appear in the same order as they did in the original DataFrame.
This argument has no effect on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no
longer sort the values.
group_keysbool, default True
When calling apply and the by argument produces a like-indexed
(i.e. a transform) result, add group keys to
index to identify pieces. By default group keys are not included
when the result’s index (and column) labels match the inputs, and
are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the
result from apply is a like-indexed Series or DataFrame.
Specify group_keys explicitly to include the group keys or
not.
Changed in version 2.0.0: group_keys now defaults to True.
observedbool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropnabool, default True
If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups.
See the user guide for more
detailed usage and examples, including splitting an object into groups,
iterating through groups, selecting a group, aggregation, and more.
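A minimal sketch of a groupby aggregation using some of the keywords described above (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"key": ["a", "b", "a", None], "val": [1, 2, 3, 4]})
>>> df.groupby("key", as_index=False, sort=True)["val"].sum()   # the NA key is dropped by default
>>> df.groupby("key", dropna=False)["val"].sum()                # NA is treated as its own group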
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns all rows except
the last |n| rows, equivalent to df[:n].
If n is larger than the number of rows, this function returns all rows.
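A minimal sketch of head with positive and negative n (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"n": range(6)})
>>> df.head(3)    # first three rows
>>> df.head(-2)   # all rows except the last two, i.e. df[:-2]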
A `histogram`_ is a representation of the distribution of data.
This function calls matplotlib.pyplot.hist(), on each series in
the DataFrame, resulting in one histogram per column.
If passed, will be used to limit data to a subset of columns.
byobject, optional
If passed, then used to form histograms for separate groups.
gridbool, default True
Whether to show axis grid lines.
xlabelsizeint, default None
If specified changes the x-axis label size.
xrotfloat, default None
Rotation of x axis labels. For example, a value of 90 displays the
x labels rotated 90 degrees clockwise.
ylabelsizeint, default None
If specified changes the y-axis label size.
yrotfloat, default None
Rotation of y axis labels. For example, a value of 90 displays the
y labels rotated 90 degrees clockwise.
axMatplotlib axes object, default None
The axes to plot the histogram on.
sharexbool, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to
invisible; defaults to True if ax is None otherwise False if an ax
is passed in.
Note that passing in both an ax and sharex=True will alter all x axis
labels for all subplots in a figure.
shareybool, default False
In case subplots=True, share y axis and set some y axis labels to
invisible.
figsizetuple, optional
The size in inches of the figure to create. Uses the value in
matplotlib.rcParams by default.
layouttuple, optional
Tuple of (rows, columns) for the layout of the histograms.
binsint or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
backendstr, default None
Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
Whether to make a copy for non-object or non-inferable columns
or Series.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to numeric type.
convert_dtypes : Convert argument to best possible dtype.
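A minimal sketch of infer_objects on a frame stored with object dtype (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, dtype="object")
>>> df.infer_objects().dtypes   # 'a' is inferred as int64; 'b' stays object (strings are not converted)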
Whether to print the full summary. By default, the setting in
pandas.options.display.max_info_columns is followed.
bufwritable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to
sys.stdout. Pass a writable buffer if you need to further process
the output.
max_colsint, optional
When to switch from the verbose to the truncated output. If the
DataFrame has more than max_cols columns, the truncated output
is used. By default, the setting in
pandas.options.display.max_info_columns is used.
memory_usagebool, str, optional
Specifies whether total memory usage of the DataFrame
elements (including the index) should be displayed. By default,
this follows the pandas.options.display.memory_usage setting.
True always show memory usage. False never shows memory usage.
A value of ‘deep’ is equivalent to “True with deep introspection”.
Memory usage is shown in human-readable units (base-2
representation). Without deep introspection a memory estimation is
made based on column dtype and number of rows assuming values
consume the same memory amount for corresponding dtypes. With deep
memory introspection, a real memory usage calculation is performed
at the cost of computational resources. See the
Frequently Asked Questions for more
details.
show_countsbool, optional
Whether to show the non-null counts. By default, this is shown
only if the DataFrame is smaller than
pandas.options.display.max_info_rows and
pandas.options.display.max_info_columns. A value of True always
shows the counts, and False never shows the counts.
‘linear’: Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
‘time’: Works on daily and higher resolution data to interpolate
given length of interval.
‘index’, ‘values’: use the actual numerical values of the index.
‘pad’: Fill in NaNs using existing values.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’,
‘barycentric’, ‘polynomial’: Passed to
scipy.interpolate.interp1d, whereas ‘spline’ is passed to
scipy.interpolate.UnivariateSpline. These methods use the numerical
values of the index. Both ‘polynomial’ and ‘spline’ require that
you also specify an order (int), e.g.
df.interpolate(method='polynomial',order=5). Note that the
'slinear' method in pandas refers to the SciPy first-order spline
rather than the pandas first-order spline.
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’,
‘cubicspline’: Wrappers around the SciPy interpolation methods of
similar names. See Notes.
‘from_derivatives’: Refers to
scipy.interpolate.BPoly.from_derivatives.
axis{{0 or ‘index’, 1 or ‘columns’, None}}, default None
Axis to interpolate along. For Series this parameter is unused
and defaults to 0.
limitint, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
SciPy documentation.
Filling in NaN in a Series via polynomial interpolation or splines:
Both ‘polynomial’ and ‘spline’ methods require that you also specify
an order (int).
Fill the DataFrame forward (that is, going down) along each column
using linear interpolation.
Note how the last entry in column ‘a’ is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column ‘b’ remains NaN, because there
is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
frame.isetitem(loc,value) is an in-place method as it will
modify the DataFrame in place (not returning a new object). In contrast to
frame.iloc[:,i]=value which will try to update the existing values in
place, frame.isetitem(loc,value) will not update the values of the column
itself in place, it will instead insert a new array.
In cases where frame.columns is unique, this is equivalent to
frame[frame.columns[i]]=value.
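A minimal sketch of isetitem versus positional iloc assignment (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
>>> df.isetitem(0, [10, 20])      # inserts a new array at column position 0
>>> df.iloc[:, 1] = [5.0, 6.0]    # in contrast, tries to update column 1's values in place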
The result will only be true at a location if all the
labels match. If values is a Series, that’s the index. If
values is a dict, the keys must be the column names,
which must match. If values is a DataFrame,
then both the index and column labels must match.
DataFrame.eq: Equality test for DataFrame.
Series.isin: Equivalent method on Series.
Series.str.contains: Test if pattern or regex is contained within a
string of a Series or Index.
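A minimal sketch of isin with a list, a dict and a Series (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"num": [1, 2], "name": ["falcon", "dog"]})
>>> df.isin([1, "falcon"])                       # elementwise membership test
>>> df.isin({"num": [2]})                        # dict keys restrict the check to those columns
>>> df.isin(pd.Series(["falcon"], index=[0]))    # a Series is matched on the index as well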
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or numpy.NaN, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or numpy.NaN, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Because iterrows returns a Series for each row,
it does not preserve dtypes across the rows (dtypes are
preserved across columns for DataFrames).
To preserve dtypes while iterating over the rows, it is better
to use itertuples() which returns namedtuples of the values
and which is generally faster than iterrows.
You should never modify something you are iterating over.
This is not guaranteed to work in all cases. Depending on the
data types, the iterator returns a copy and not a view, and writing
to it will have no effect.
An object to iterate over namedtuples for each row in the
DataFrame with the first field possibly being the index and
following fields being the column values.
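A minimal sketch of itertuples, which preserves dtypes better than iterrows (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2], "b": [0.5, 0.75]}, index=["x", "y"])
>>> for row in df.itertuples(name="Row"):
...     print(row.Index, row.a, row.b)   # the first field is the index label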
otherDataFrame, Series, or a list containing any combination of them
Index should be similar to one of the columns in this one. If a
Series is passed, its name attribute must be set, and that will be
used as the column name in the resulting joined DataFrame.
onstr, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index
in other, otherwise joins index-on-index. If multiple
values given, the other DataFrame must have a MultiIndex. Can
pass an array as the join key if it is not already contained in
the calling DataFrame. Like an Excel VLOOKUP operation.
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN
If we want to join using the key columns, we need to set key to be
the index in both df and other. The joined DataFrame will have
key as its index.
>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN
Another option to join using the key columns is to use the on
parameter. DataFrame.join always uses other’s index but we can use
any column in df. This method preserves the original DataFrame’s
index in the result.
first : Select initial periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Notice the data for 3 last calendar days were returned, not the last
3 observed days in the dataset, and therefore data for 2018-04-11 was
not returned.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.
Series.map : Apply a function elementwise on a Series.
Return the marginal contribution of the solver configuration on the instances.
Args:
objective: The objective for which we calculate the marginal contribution.
instances: The instances which should be selected for the evaluation
sort: Whether to sort the results afterwards
Returns:
The marginal contribution of each solver (configuration) as:
[(solver, config_id, marginal_contribution, portfolio_best_performance_without_solver)]
condbool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where
True, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
otherscalar, Series/DataFrame, or callable
Entries where cond is True are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplacebool, default False
Whether to perform the operation in place on the data.
axisint, default None
Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is False the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
True.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m,df2) is equivalent to
np.where(m,df1,df2).
For further details and examples see the mask documentation in
indexing.
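A minimal sketch of mask, the inverse of where (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3, 4]})
>>> df.mask(df > 2, other=0)   # entries where the condition is True are replaced by 0
>>> df.mask(df > 2)            # with no other, masked entries become the NULL value (NaN here)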
The dtype of the object takes precedence. The fill value is casted to
the object’s dtype, if this can be done losslessly.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted” to
the row axis, leaving just two non-identifier columns, ‘variable’ and
‘value’.
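A minimal sketch of melt, unpivoting measured columns into 'variable'/'value' pairs (column names are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"id": [1, 2], "height": [1.5, 1.7], "weight": [60, 70]})
>>> df.melt(id_vars="id", value_vars=["height", "weight"])   # long format: id, variable, value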
Specifies whether to include the memory usage of the DataFrame’s
index in returned Series. If index=True, the memory usage of
the index is the first item in the output.
deepbool, default False
If True, introspect the data deeply by interrogating
object dtypes for system-level memory consumption, and include
it in the returned values.
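A minimal sketch of memory_usage with and without deep introspection (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"obj": ["a", "bb"], "num": [1, 2]})
>>> df.memory_usage(index=True)   # shallow estimate per column; the index is the first entry
>>> df.memory_usage(deep=True)    # also counts the Python strings behind the object column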
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes will be ignored. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
Warning
If both key columns contain rows where the key is a null value, those
rows will be matched against each other. This is different from usual SQL
join behaviour and can lead to unexpected results.
left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order
of the left keys.
onlabel or list
Column or index level names to join on. These must be found in both
DataFrames. If on is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_onlabel or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_onlabel or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_indexbool, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
right_indexbool, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index.
sortbool, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixeslist-like, default is (“_x”, “_y”)
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
left and right respectively. Pass a value of None instead
of a string to indicate that the column name from left or
right should be left as-is, with no suffix. At least one of the
values must not be None.
copybool, default True
If False, avoid copy if possible.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
indicatorbool or str, default False
If True, adds a column to the output DataFrame called “_merge” with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of “left_only” for observations whose merge key only
appears in the left DataFrame, “right_only” for observations
whose merge key only appears in the right DataFrame, and “both”
if the observation’s merge key is found in both DataFrames.
validatestr, optional
If specified, checks if merge is of specified type.
“one_to_one” or “1:1”: check if merge keys are unique in both
left and right datasets.
“one_to_many” or “1:m”: check if merge keys are unique in left
dataset.
“many_to_one” or “m:1”: check if merge keys are unique in right
dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
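A minimal sketch of merge combining how, indicator and validate (data is illustrative):
>>> import pandas as pd
>>> left = pd.DataFrame({"key": ["a", "b"], "lval": [1, 2]})
>>> right = pd.DataFrame({"key": ["b", "c"], "rval": [3, 4]})
>>> left.merge(right, on="key", how="outer", indicator=True, validate="1:1")
>>> # the _merge column reports left_only / right_only / both for each row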
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN
By default, missing values are not considered, and the modes of wings
are both 0.0 and 2.0. Because the resulting DataFrame has two rows,
the second row of species and legs contains NaN.
>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0
Setting dropna=False, NaN values are considered and they can be
the mode (like for wings).
>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN
Setting numeric_only=True, only the mode of numeric columns is
computed, and columns of other types are ignored.
>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0
To compute the mode over columns and not rows, use the axis parameter:
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=False).head(n), but more
performant.
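A minimal sketch of nlargest (data is illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({"population": [30, 10, 20], "gdp": [3, 1, 2]})
>>> df.nlargest(2, "population")   # the two rows with the largest population; other columns kept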
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
NA values, such as None or numpy.NaN, get mapped to False
values.
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
DataFrame.notnull is an alias for DataFrame.notna.
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
NA values, such as None or numpy.NaN, get mapped to False
values.
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=True).head(n), but more
performant.
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by
default. This is useful in comparing the fraction of change in a time
series of elements.
Note
Despite the name of this method, it calculates fractional change
(also known as per unit change or relative change) and not
percentage change. If you need the percentage change, multiply
these values by 100.
Series.diff : Compute the difference of two elements in a Series.
DataFrame.diff : Compute the difference of two elements in a DataFrame.
Series.shift : Shift the index by some number of periods.
DataFrame.shift : Shift the index by some number of periods.
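A minimal sketch of pct_change (values are illustrative); the result is fractional, not a percentage:
>>> import pandas as pd
>>> s = pd.Series([10.0, 11.0, 12.1])
>>> s.pct_change()         # NaN, 0.10, 0.10
>>> s.pct_change() * 100   # multiply by 100 to express the change in percent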
Function to apply to the Series/DataFrame.
args, and kwargs are passed into func.
Alternatively a (callable,data_keyword) tuple where
data_keyword is a string indicating the keyword of
callable that expects the Series/DataFrame.
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.map : Apply a function elementwise on a whole DataFrame.
Series.map : Apply a mapping correspondence on a Series.
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose national_insurance takes its data as df
in the second argument:
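A minimal sketch of pipe with the (callable, data_keyword) form; national_insurance is the hypothetical helper mentioned above, taking the data as its second argument:
>>> import pandas as pd
>>> def national_insurance(rate, data):   # hypothetical helper: the data arrives second
...     return data["salary"] * rate
>>> df = pd.DataFrame({"salary": [1000, 2000]})
>>> df.pipe((national_insurance, "data"), rate=0.12)   # df is passed as the 'data' keyword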
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses
unique values from specified index / columns to form axes of the
resulting DataFrame. This function does not support data
aggregation, multiple values will result in a MultiIndex in the
columns. See the User Guide for more on reshaping.
Column to use to make new frame’s index. If not given, uses existing index.
valuesstr, object or a list of the previous, optional
Column(s) to use for populating new frame’s values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
      baz       zoo
bar     A  B  C   A  B  C
foo
one     1  2  3   x  y  z
two     4  5  6   q  w  t
You could also assign a list of column names or a list of index names.
>>> df.pivot(index=["lev1","lev2"],columns=["lev3"],values="values") lev3 1 2lev1 lev2 1 1 0.0 1.0 2 2.0 NaN 2 1 4.0 3.0 2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df=pd.DataFrame({"foo":['one','one','two','two'],... "bar":['A','A','B','C'],... "baz":[1,2,3,4]})>>> df foo bar baz0 one A 11 one A 22 two B 33 two C 4
Notice that the first two rows are the same for our index
and columns arguments.
indexcolumn, Grouper, array, or list of the previous
Keys to group by on the pivot table index. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
columnscolumn, Grouper, array, or list of the previous
Keys to group by on the pivot table column. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
aggfuncfunction, list of functions, dict, default “mean”
If a list of functions is passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves).
If a dict is passed, the key is column to aggregate and the value is
function or list of functions. If margins=True, aggfunc will be
used to calculate the partial aggregates.
fill_valuescalar, default None
Value to replace missing values with (in the resulting pivot table,
after aggregation).
marginsbool, default False
If margins=True, special All columns and rows
will be added with partial group aggregates across the categories
on the rows and columns.
dropnabool, default True
Do not include columns whose entries are all NaN. If True,
rows with a NaN value in any column will be omitted before
computing margins.
margins_namestr, default ‘All’
Name of the row / column that will contain the totals
when margins is True.
observedbool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.2.0: The default value of False is deprecated and will change to
True in a future version of pandas.
>>> df=pd.DataFrame({"A":["foo","foo","foo","foo","foo",... "bar","bar","bar","bar"],... "B":["one","one","one","two","two",... "one","one","two","two"],... "C":["small","large","large","small",... "small","large","small","small",... "large"],... "D":[1,2,2,3,3,4,5,6,7],... "E":[2,4,5,5,6,6,8,9,9]})>>> df A B C D E0 foo one small 1 21 foo one large 2 42 foo one large 2 53 foo two small 3 54 foo two small 3 65 bar one large 4 66 bar one small 5 87 bar two small 6 98 bar two large 7 9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum")
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0
We can also fill missing values using the fill_value parameter.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum", fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean", 'E': "mean"})
>>> table
                  D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
We can also calculate multiple types of aggregations for any given
value column.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean",
...                                 'E': ["min", "max", "mean"]})
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.prod with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipnabool, default True
Exclude NA/null values when computing the result.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
min_countint, default 0
The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.prod with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipnabool, default True
Exclude NA/null values when computing the result.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
min_countint, default 0
The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the
fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
method{‘single’, ‘table’}, default ‘single’
Whether to compute quantiles per-column (‘single’) or over all columns
(‘table’). When ‘table’, the only allowed interpolation methods are
‘nearest’, ‘lower’, and ‘higher’.
You can refer to variables
in the environment by prefixing them with an ‘@’ character like
@a+b.
You can refer to column names that are not valid Python variable names
by surrounding them in backticks. Thus, column names containing spaces
or punctuations (besides underscores) or starting with digits must be
surrounded by backticks. (For example, a column named “Area (cm^2)” would
be referenced as `Area (cm^2)`). Column names which are Python keywords
(like “list”, “for”, “import”, etc) cannot be used.
For example, if one of your columns is called aa and you want
to sum it with b, your query should be `aa`+b.
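For instance, assuming a small hypothetical frame with a column named C C (the space forces backticks):

>>> df = pd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df.query('A > B')
   A  B  C C
4  5  2    6
>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10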
inplacebool
Whether to modify the DataFrame rather than creating a new one.
The result of the evaluation of this expression is first passed to
DataFrame.loc and if that fails because of a
multidimensional key (e.g., a DataFrame) then the result will be passed
to DataFrame.__getitem__().
This method uses the top-level eval() function to
evaluate the passed query.
The query() method uses a slightly
modified Python syntax by default. For example, the & and |
(bitwise) operators have the precedence of their boolean cousins,
and and or. This is syntactically valid Python,
however the semantics are different.
You can change the semantics of the expression by passing the keyword
argument parser='python'. This enforces the same semantics as
evaluation in Python space. Likewise, you can pass engine='python'
to evaluate an expression using Python itself as a backend. This is not
recommended as it is inefficient compared to using numexpr as the
engine.
The DataFrame.index and
DataFrame.columns attributes of the
DataFrame instance are placed in the query namespace
by default, which allows you to treat both the index and columns of the
frame as a column in the frame.
The identifier index is used for the frame index; you can also
use the name of the index to identify it in a query. Please note that
Python keywords may not be used as identifiers.
For further details and examples see the query documentation in
indexing.
Backtick quoted variables
Backtick quoted variables are parsed as literal Python code and
are converted internally to a Python valid identifier.
This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick
quoted string are replaced by strings that are allowed as a Python identifier.
These characters include all operators in Python, the space character, the
question mark, the exclamation mark, the dollar sign, and the euro sign.
For other characters that fall outside the ASCII range (U+0001..U+007F)
and those that are not further specified in PEP 3131,
the query parser will raise an error.
This excludes whitespace different than the space character,
but also the hashtag (as it is used for comments) and the backtick
itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can
confuse the parser.
For example, `it's`>`that's` will raise an error,
as it forms a quoted string ('s>`that') with a backtick inside.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
The following example shows how the method behaves with the above
parameters:
default_rank: this is the default behaviour obtained without using
any parameter.
max_rank: setting method='max' the records that have the
same values are ranked using the highest rank (e.g.: since ‘cat’
and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
NA_bottom: choosing na_option='bottom', if there are records
with NaN values they are placed at the bottom of the ranking.
pct_rank: when setting pct=True, the ranking is expressed as
percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Conform DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False.
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: Propagate last valid observation forward to next
valid.
backfill / bfill: Use next valid observation to fill gap.
nearest: Use nearest valid observations to fill gap.
copybool, default True
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
levelint or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuescalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
limitint, default None
Maximum number of consecutive elements to forward or backward fill.
toleranceoptional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer]-target)<=tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned NaN.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02
We can fill in the missing values by passing a value to
the keyword fill_value. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
method to fill the NaN values.
To further illustrate the filling functionality in
reindex, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
The index entries that did not have a value in the original data frame
(for example, ‘2009-12-29’) are by default filled with NaN.
If desired, we can fill in the missing values using one of several
options.
For example, to back-propagate the last valid value to fill the NaN
values, pass bfill as an argument to the method keyword.
Please note that the NaN value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the NaN values present
in the original dataframe, use the fillna() method.
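A sketch of that behaviour on a small hypothetical date-indexed frame; note the pre-existing NaN is left untouched while the new label is backfilled:

>>> date_index = pd.date_range('2010-01-01', periods=3, freq='D')
>>> df = pd.DataFrame({'prices': [100.0, 101.0, np.nan]}, index=date_index)
>>> new_dates = pd.date_range('2009-12-31', periods=4, freq='D')
>>> df.reindex(new_dates, method='bfill')
            prices
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN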
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: propagate last valid observation forward to next
valid
backfill / bfill: use next valid observation to fill gap
nearest: use nearest valid observations to fill gap.
copybool, default True
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
limitint, default None
Maximum number of consecutive labels to fill for inexact matches.
toleranceoptional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer]-target)<=tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex : Change to new indices or expand indices.
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium

>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
Dict-like or function transformations to apply to
that axis’ values. Use either mapper and axis to
specify the axis to target with mapper, or index and
columns.
indexdict-like or function
Alternative to specifying axis (mapper,axis=0
is equivalent to index=mapper).
columnsdict-like or function
Alternative to specifying axis (mapper,axis=1
is equivalent to columns=mapper).
axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis to target with mapper. Can be either the axis name
(‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
copybool, default True
Also copy underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
inplacebool, default False
Whether to modify the DataFrame rather than creating a new one.
If True then value of copy is ignored.
levelint or level name, default None
In case of a MultiIndex, only rename labels in the specified
level.
errors{‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise a KeyError when a dict-like mapper, index,
or columns contains labels that are not present in the Index
being transformed.
If ‘ignore’, existing keys will be renamed and extra keys will be
ignored.
index, columnsscalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or functions transformations to
apply to that axis’ values.
Note that the columns parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
objects.
Use either mapper and axis to
specify the axis to target with mapper, or index
and/or columns.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to rename. For Series this parameter is unused and defaults to 0.
copybool, default None
Also copy underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
inplacebool, default False
Modifies the object directly, instead of creating a new Series
or DataFrame.
DataFrame.rename_axis supports two calling conventions
(index=index_mapper,columns=columns_mapper,...)
(mapper,axis={'index','columns'},...)
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter copy is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis labels.
We highly recommend using keyword arguments to clarify your
intent.
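A sketch of the first (keyword) convention on a hypothetical frame, naming both axes:

>>> df = pd.DataFrame({'num_legs': [4, 4, 2]},
...                   index=['dog', 'cat', 'monkey'])
>>> df.rename_axis(index='animal', columns='attribute')
attribute  num_legs
animal
dog               4
cat               4
monkey            2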
Values of the Series/DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc, which require
you to specify a location to update with some value.
to_replacestr, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
numeric, str or regex:
numeric: numeric values equal to to_replace will be
replaced with value
str: string exactly matching to_replace will be replaced
with value
regex: regexs matching to_replace will be replaced with
value
list of str, regex, or numeric:
First, if to_replace and value are both lists, they
must be the same length.
Second, if regex=True then all of the strings in both
lists will be interpreted as regexs otherwise they will match
directly. This doesn’t matter much for value since there
are only a few possible substitution regexes you can use.
str, regex and numeric rules apply as above.
dict:
Dicts can be used to specify different replacement values
for different existing values. For example,
{'a':'b','y':'z'} replaces the value ‘a’ with ‘b’ and
‘y’ with ‘z’. To use a dict in this way, the optional value
parameter should not be given.
For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
{'a':1,'b':'z'} looks for the value 1 in column ‘a’
and the value ‘z’ in column ‘b’ and replaces these values
with whatever is specified in value. The value parameter
should not be None in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
For a DataFrame nested dictionaries, e.g.,
{'a':{'b':np.nan}}, are read as follows: look in column
‘a’ for the value ‘b’ and replace it with NaN. The optional value
parameter should not be specified to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) cannot be regular expressions.
None:
This means that the regex argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If value is also None then
this must be a nested dictionary or Series.
See the examples section for examples of each of these.
valuescalar, dict, list, str, regex, default None
Value to replace any values matching to_replace with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
objects are also allowed.
inplacebool, default False
If True, performs operation inplace and returns None.
limitint, default None
Maximum size gap to forward or backward fill.
Deprecated since version 2.1.0.
regexbool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular
expressions. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
to_replace must be None.
method{‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a
scalar, list or tuple and value is None.
Series.fillna : Fill NA values.
DataFrame.fillna : Fill NA values.
Series.where : Replace values based on boolean condition.
DataFrame.where : Replace values based on boolean condition.
DataFrame.map: Apply a function to a Dataframe elementwise.
Series.map: Map values of Series according to an input mapping or function.
Series.str.replace : Simple string replacement.
Regex substitution is performed under the hood with re.sub. The
rules for substitution for re.sub are the same.
Regular expressions will only substitute on strings, meaning you
cannot provide, for example, a regular expression matching floating
point numbers and expect the columns in your frame that have a
numeric dtype to be matched. However, if those floating point
numbers are strings, then you can do this.
This method has a lot of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
When dict is used as the to_replace value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e

>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e

>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz

>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
Compare the behavior of s.replace({'a':None}) and
s.replace('a',None) to understand the peculiarities
of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the
value(s) in the dict are equal to the value parameter.
s.replace({'a':None}) is equivalent to
s.replace(to_replace={'a':None},value=None,method=None):
When value is not explicitly passed and to_replace is a scalar, list
or tuple, replace uses the method parameter (default ‘pad’) to do the
replacement. So this is why the ‘a’ values are being replaced by 10
in rows 1 and 2 and ‘b’ in row 4 in this case.
>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object
Deprecated since version 2.1.0: The ‘method’ parameter and padding behavior are deprecated.
On the other hand, if None is explicitly passed for value, it will
be respected:
Convenience method for frequency conversion and resampling of time series.
The object must have a datetime-like index (DatetimeIndex, PeriodIndex,
or TimedeltaIndex), or the caller must pass the label of a datetime-like
series/index to the on/level keyword parameter.
The offset string or object representing target conversion.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
Which axis to use for up- or down-sampling. For Series this parameter
is unused and defaults to 0. Must be
DatetimeIndex, TimedeltaIndex or PeriodIndex.
Deprecated since version 2.0.0: Use frame.T.resample(…) instead.
closed{‘right’, ‘left’}, default None
Which side of bin interval is closed. The default is ‘left’
for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’,
‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.
label{‘right’, ‘left’}, default None
Which bin edge label to label bucket with. The default is ‘left’
for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’,
‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.
Pass ‘timestamp’ to convert the resulting index to a
DateTimeIndex or ‘period’ to convert it to a PeriodIndex.
By default the input representation is retained.
Deprecated since version 2.2.0: Convert index to desired type explicitly instead.
onstr, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
levelstr or int, optional
For a MultiIndex, level (name or number) to use for
resampling. level must be datetime-like.
originTimestamp or str, default ‘start_day’
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If string, must be one of the following:
‘epoch’: origin is 1970-01-01
‘start’: origin is the first value of the timeseries
‘start_day’: origin is the first day at midnight of the timeseries
‘end’: origin is the last value of the timeseries
‘end_day’: origin is the ceiling midnight of the last day
Added in version 1.3.0.
Note
Only takes effect for Tick-frequencies (i.e. fixed frequencies like
days, hours, and minutes, rather than months or quarters).
offsetTimedelta or str, default is None
An offset timedelta added to the origin.
group_keysbool, default False
Whether to include the group keys in the result index when using
.apply() on the resampled object.
Added in version 1.5.0: Not specifying group_keys will retain values-dependent behavior
from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).
Changed in version 2.0.0: group_keys now defaults to False.
Series.resample : Resample a Series.
DataFrame.resample : Resample a DataFrame.
groupby : Group Series/DataFrame by mapping, function, label, or list of labels.
asfreq : Reindex a Series/DataFrame with the given frequency without grouping.
Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket,
which it labels. For example, in the original series the
bucket 2000-01-01 00:03:00 contains the value 3, but the summed
value in the resampled bucket with the label 2000-01-01 00:03:00
does not include 3 (if it did, the summed value would be 6, not 3).
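A sketch of that downsampling, assuming a nine-point minute series:

>>> index = pd.date_range('1/1/2000', periods=9, freq='min')
>>> series = pd.Series(range(9), index=index)
>>> series.resample('3min', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3min, dtype: int64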
In contrast with the start_day, you can use end_day to take the ceiling
midnight of the largest Timestamp as the end of the bins and drop the bins
not containing data:
Using the given string, rename the DataFrame column which contains the
index data. If the DataFrame has a MultiIndex, this has to be a list or
tuple with length equal to the number of levels.
DataFrame.set_index : Opposite of reset_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN
When we reset the index, the old index is added as a column, and a
new sequential index is used:
>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
We can use the drop parameter to avoid the old index being added as
a column:
>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
You can also use reset_index with MultiIndex.
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    (24.0, 'fly'),
...                    (80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump
Using the names parameter, choose a name for the index column:
>>> df.reset_index(names=['classes', 'names'])
  classes   names  speed species
                     max    type
0    bird  falcon  389.0     fly
1    bird  parrot   24.0     fly
2  mammal    lion   80.5     run
3  mammal  monkey    NaN    jump
If the index has multiple levels, we can reset a subset of them:
>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
If we are not dropping the index, by default, it is placed in the top
level. We can place it in another level:
>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
When the index is inserted under another level, we can specify under
which one with the parameter col_fill:
>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
If we specify a nonexistent level for col_fill, it is created:
>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
windowint, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If a timedelta, str, or offset, the time period of each window. Each
window will be a variable sized based on the observations included in
the time-period. This is only valid for datetimelike indexes.
To learn more about the offsets & frequency strings, please see this link.
If a BaseIndexer subclass, the window boundaries
based on the defined get_window_bounds method. Additional rolling
keyword arguments, namely min_periods, center, closed and
step will be passed to get_window_bounds.
min_periodsint, default None
Minimum number of observations in window required to have a value;
otherwise, result is np.nan.
For a window that is specified by an offset, min_periods will default to 1.
For a window that is specified by an integer, min_periods will default
to the size of the window.
centerbool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
onstr, optional
For a DataFrame, a column label or Index level on which
to calculate the rolling window, rather than the DataFrame’s index.
Provided integer column is ignored and excluded from result since
an integer index is not used to calculate the rolling window.
axisint or str, default 0
If 0 or 'index', roll across the rows.
If 1 or 'columns', roll across the columns.
For Series this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1,
transpose the DataFrame first instead.
closedstr, default None
If 'right', the first point in the window is excluded from calculations.
If 'left', the last point in the window is excluded from calculations.
If 'both', no points in the window are excluded from calculations.
If 'neither', the first and last points in the window are excluded
from calculations.
Default None ('right').
step : int, default None
Added in version 1.5.0.
Evaluate the window at every step result, equivalent to slicing as
[::step]. window must be an integer. Using a step argument other
than None or 1 will produce a result with a different shape than the input.
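A brief sketch of window and min_periods on hypothetical data:

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.rolling(window=2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN
>>> df.rolling(window=2, min_periods=1).sum()
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0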
Number of decimal places to round each column to. If an int is
given, round each column to the same number of places.
Otherwise dict and Series round to variable numbers of places.
Column names should be in the keys if decimals is a
dict-like, or in the index if decimals is a Series. Any
columns not included in decimals will be left as is. Elements
of decimals which are not columns of the input will be
ignored.
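For example, passing a per-column dict on a hypothetical frame:

>>> df = pd.DataFrame({'dogs': [0.21, 0.01], 'cats': [0.32, 0.45]})
>>> df.round({'dogs': 1, 'cats': 0})
   dogs  cats
0   0.2   0.0
1   0.0   0.0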
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Number of items from axis to return. Cannot be used with frac.
Default = 1 if frac = None.
fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
replacebool, default False
Allow or disallow sampling of the same row more than once.
weightsstr or ndarray-like, optional
Default ‘None’ results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
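For example, weighting rows by an existing column of a hypothetical frame (which rows are drawn depends on random_state):

>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_specimen_seen
falcon         2                 10
fish           0                  8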
To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that
this will return all object dtype columns. With
pd.options.future.infer_string enabled, using "str" will
work to select all string columns.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sem with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
The axis to update. The value 0 identifies the rows. For Series
this parameter is unused and defaults to 0.
copybool, default True
Whether to make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
allows_duplicate_labelsbool, optional
Whether the returned object allows duplicate labels.
This method returns a new object that’s a view on the same data
as the input. Mutating the input or the output values will be reflected
in the other.
This method is intended to be used in method chains.
“Flags” differ from “metadata”. Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in DataFrame.attrs.
Set the DataFrame index (row labels) using one or more existing
columns or arrays (of the correct length). The index can replace the
existing index or expand on it.
This parameter can be either a single column key, a single array of
the same length as the calling DataFrame, or a list containing an
arbitrary combination of column keys and arrays. Here, “array”
encompasses Series, Index, np.ndarray, and
instances of Iterator.
dropbool, default True
Delete columns to be used as the new index.
appendbool, default False
Whether to append columns to existing index.
inplacebool, default False
Whether to modify the DataFrame rather than creating a new one.
verify_integritybool, default False
Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method.
DataFrame.reset_index : Opposite of set_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
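For example, building a MultiIndex from two existing columns of a hypothetical frame:

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df.set_index(['year', 'month'])
            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31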
Shift index by desired number of periods with an optional time freq.
When freq is not passed, shift the index without realigning the data.
If freq is passed (in this case, the index must be date or datetime,
or it will raise a NotImplementedError), the index will be
increased using the periods and the freq. freq can be inferred
when specified as “infer” as long as either freq or inferred_freq
attribute is set in the index.
Number of periods to shift. Can be positive or negative.
If an iterable of ints, the data will be shifted once by each int.
This is equivalent to shifting by one value at a time and
concatenating all resulting frames. The resulting columns will have
the shift suffixed to their column names. For multiple periods,
axis must not be 1.
freqDateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. ‘EOM’).
If freq is specified then the index values are shifted but the
data is not realigned. That is, use freq if you would like to
extend the index when shifting and preserve the original data.
If freq is specified as “infer” then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Shift direction. For Series this parameter is unused and defaults to 0.
fill_valueobject, optional
The scalar value to use for newly introduced missing values.
The default depends on the dtype of self.
For numeric data, np.nan is used.
For datetime, timedelta, or period data, etc. NaT is used.
For extension dtypes, self.dtype.na_value is used.
suffixstr, optional
If str and periods is an iterable, this is added after the column
name and before the shift value for each shifted column name.
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
Not implemented for MultiIndex.
sort_remainingbool, default True
If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
keycallable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect an
Index and return an Index of the same shape. For MultiIndex
inputs, the key is applied per level.
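For example, a vectorized key that sorts a string index case-insensitively (hypothetical data):

>>> df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4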
Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the
end.
ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
keycallable, optional
Apply the key function to the values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect a
Series and return a Series with the same shape as the input.
It will be applied independently to each column listed in by.
Series or DataFrames with a single element are squeezed to a scalar.
DataFrames with a single column or a single row are squeezed to a
Series. Otherwise the object is unchanged.
This method is most useful when you don’t know if your
object is a Series or DataFrame, but you do know it has just a single
column. In that case you can safely call squeeze to ensure you have a
Series.
Series.iloc : Integer-location based indexing for selecting scalars.
DataFrame.iloc : Integer-location based indexing for selecting Series.
Series.to_frame : Inverse of DataFrame.squeeze for a single pandas object.
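A sketch on a hypothetical frame, squeezing a single-column selection to a Series and a single-element selection to a scalar:

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df[['a']].squeeze('columns')
0    1
1    2
Name: a, dtype: int64
>>> df.loc[df['a'] == 1, 'b'].squeeze()
3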
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:
if the columns have a single level, the output is a Series;
if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.
Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropnabool, default True
Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.
sortbool, default True
Whether to sort the levels of the resulting MultiIndex.
future_stackbool, default False
Whether to use the new implementation that will replace the current
implementation in pandas 3.0. When True, dropna and sort have no impact
on the result and must remain unspecified. See pandas 2.1.0 Release
notes for more details.
The function is named by analogy with a collection of books
being reorganized from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).
It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:
>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack(future_stack=True)
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0, future_stack=True)
             kg    m
cat weight  1.0  NaN
    height  NaN  2.0
dog weight  3.0  NaN
    height  NaN  4.0
>>> df_multi_level_cols2.stack([0, 1], future_stack=True)
cat  weight  kg    1.0
     height  m     2.0
dog  weight  kg    3.0
     height  m     4.0
dtype: float64
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.std with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sum with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipnabool, default True
Exclude NA/null values when computing the result.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
min_countint, default 0
The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
>>> df = pd.DataFrame(
...     {"Grade": ["A", "B", "A", "C"]},
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> df
                                    Grade
Final exam  History     January         A
            Geography   February        B
Coursework  History     March           A
            Geography   April           C
In the following example, we will swap the levels of the indices.
Here, we will swap the levels column-wise, but levels can be swapped row-wise
in a similar manner. Note that column-wise is the default behaviour.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
>>> df.swaplevel()
                                    Grade
Final exam  January     History         A
            February    Geography       B
Coursework  March       History         A
            April       Geography       C
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
>>> df.swaplevel(0)
                                    Grade
January    History    Final exam        A
February   Geography  Final exam        B
March      History    Coursework        A
April      Geography  Coursework        C
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
>>> df.swaplevel(0, 1)
                                    Grade
History    Final exam  January          A
Geography  Final exam  February         B
History    Coursework  March            A
Geography  Coursework  April            C
This function returns last n rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of n, this function returns all rows except
the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
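For example, on a hypothetical five-row frame:

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey']})
>>> df.tail(3)
   animal
2  falcon
3    lion
4  monkey
>>> df.tail(-3)
   animal
3    lion
4  monkey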
Return the elements in the given positional indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
An array of ints indicating which positions to take.
axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
The axis on which to select elements. 0 means that we are
selecting rows, 1 means that we are selecting columns.
For Series this parameter is unused and defaults to 0.
DataFrame.loc : Select a subset of a DataFrame by labels.
DataFrame.iloc : Select a subset of a DataFrame by positions.
numpy.take : Take elements from an array along an axis.
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That’s because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN
Take elements at indices 1 and 2 along the axis 1 (column selection).
>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN
We may take elements using negative integers for positive indices,
starting from the end of the object, just like with Python lists.
>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
path_or_bufstr, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string. If a non-binary file object is passed, it should
be opened with newline=’’, disabling universal newlines. If a binary
file object is passed, mode might need to contain a ‘b’.
sepstr, default ‘,’
String of length 1. Field delimiter for the output file.
na_repstr, default ‘’
Missing data representation.
float_formatstr, Callable, default None
Format string for floating point numbers. If a Callable is given, it takes
precedence over other numeric formatting parameters, like decimal.
columnssequence, optional
Columns to write.
headerbool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
indexbool, default True
Write row names (index).
index_labelstr or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
header and index are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode{‘w’, ‘x’, ‘a’}, default ‘w’
Forwarded to either open(mode=) or fsspec.open(mode=) to control
the file opening. Typical values include:
‘w’, truncate the file first.
‘x’, exclusive creation, failing if the file already exists.
‘a’, append to the end of file if it exists.
encodingstr, optional
A string representing the encoding to use in the output file,
defaults to ‘utf-8’. encoding is not supported if path_or_buf
is a non-binary file object.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode
and other entries as additional compression options if
compression mode is ‘zip’.
Passing compression options as keys in dict is
supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
quotingoptional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotecharstr, default ‘"’
String of length 1. Character used to quote fields.
lineterminatorstr, optional
The newline character or character sequence to use in the output
file. Defaults to os.linesep, which depends on the OS in which
this method is called (e.g. ‘\n’ for Linux, ‘\r\n’ for Windows).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with
read_csv and the standard library ‘csv’ module.
chunksizeint or None
Rows to write at a time.
date_formatstr, default None
Format string for datetime objects.
doublequotebool, default True
Control quoting of quotechar inside a field.
escapecharstr, default None
String of length 1. Character used to escape sep and quotechar
when appropriate.
decimalstr, default ‘.’
Character recognized as decimal separator. E.g. use ‘,’ for
European data.
errorsstr, default ‘strict’
Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
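To tie the path, index, and compression options above together, the following is a minimal, non-authoritative sketch; the file name and DataFrame contents are invented for illustration.

import pandas as pd

df = pd.DataFrame({"name": ["Raphael", "Donatello"],
                   "mask": ["red", "purple"]})

# Write a reproducible gzip-compressed CSV; compresslevel and mtime are
# forwarded to gzip.GzipFile, as described for the compression dict above.
df.to_csv(
    "out.csv.gz",
    index=False,
    compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
)

# With path_or_buf=None the CSV text is returned as a string instead.
csv_text = df.to_csv(index=False)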
‘records’ : list like
[{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument
intoclass, default dict
The collections.abc.MutableMapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
indexbool, default True
Whether to include the index item (and index_names item if orient
is ‘tight’) in the returned dictionary. Can only be False
when orient is ‘split’ or ‘tight’.
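As a small sketch of how orient and into interact (the values here are arbitrary):

import pandas as pd
from collections import OrderedDict, defaultdict

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 0.75]}, index=["row1", "row2"])

# 'records' orient: one mapping per row.
df.to_dict(orient="records")

# Use a different MutableMapping subclass for the result.
df.to_dict(into=OrderedDict)

# A defaultdict must be passed initialized.
df.to_dict(into=defaultdict(list))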
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an ExcelWriter object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
excel_writerpath-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_namestr, default ‘Sheet1’
Name of sheet which will contain DataFrame.
na_repstr, default ‘’
Missing data representation.
float_formatstr, optional
Format string for floating point numbers. For example
float_format="%.2f" will format 0.1234 to 0.12.
columnssequence or list of str, optional
Columns to write.
headerbool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
indexbool, default True
Write row names (index).
index_labelstr or sequence, optional
Column label for index column(s) if desired. If not specified, and
header and index are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrowint, default 0
Upper left cell row to dump data frame.
startcolint, default 0
Upper left cell column to dump data frame.
enginestr, optional
Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this
via the options io.excel.xlsx.writer or
io.excel.xlsm.writer.
merge_cellsbool, default True
Write MultiIndex and Hierarchical Rows as merged cells.
inf_repstr, default ‘inf’
Representation for infinity (there is no native representation for
infinity in Excel).
freeze_panestuple of int (length 2), optional
Specifies the one-based bottommost row and rightmost column that
is to be frozen.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Added in version 1.2.0.
engine_kwargsdict, optional
Arbitrary keyword arguments passed to excel engine.
to_csv : Write DataFrame to a comma-separated values (csv) file.
ExcelWriter : Class for writing DataFrame objects into excel sheets.
read_excel : Read an Excel file into a pandas DataFrame.
read_csv : Read a comma-separated values (csv) file into DataFrame.
io.formats.style.Styler.to_excel : Add styles to Excel sheet.
To set the library that is used to write the Excel file,
you can pass the engine keyword (the default engine is
automatically chosen depending on the file extension):
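For instance, in a minimal sketch (file and sheet names are arbitrary, and the xlsxwriter engine must be installed for the explicit-engine call):

import pandas as pd

df1 = pd.DataFrame([["a", "b"], ["c", "d"]],
                   index=["row 1", "row 2"],
                   columns=["col 1", "col 2"])

df1.to_excel("output.xlsx")  # default engine chosen from the extension
df1.to_excel("output1.xlsx", engine="xlsxwriter")  # explicit engine

# Multiple sheets: reuse one ExcelWriter and give each sheet a unique name.
# Note that writing to an existing file name erases its prior contents.
df2 = df1.copy()
with pd.ExcelWriter("output.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Sheet_name_1")
    df2.to_excel(writer, sheet_name="Sheet_name_2")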
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. If a string or a path,
it will be used as Root Directory path when writing a partitioned dataset.
This function writes the dataframe as a feather file. Requires a default
index. For saving the DataFrame with your custom index use a method that
supports custom indices e.g. to_parquet.
Changed in version 1.5.0: Default value is changed to True. Google has deprecated the
auth_local_webserver = False “out of band” (copy-paste) flow.
table_schemalist of dicts, optional
List of BigQuery table fields to which the corresponding DataFrame
columns conform, e.g. [{'name':'col1','type':'STRING'},...]. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.
New in version 0.3.1 of pandas-gbq.
locationstr, optional
Location where the load job should run. See the BigQuery locations
documentation for a
list of available locations. The location must match that of the
target dataset.
New in version 0.5.0 of pandas-gbq.
progress_barbool, default True
Use the library tqdm to show the progress bar for the upload,
chunk by chunk.
Credentials for accessing Google APIs. Use this parameter to
override default credentials, such as to use Compute Engine
google.auth.compute_engine.Credentials or Service
Account google.oauth2.service_account.Credentials
directly.
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different key.
Warning
One can store a subclass of DataFrame or Series to HDF5,
but the type of the subclass is lost upon storing.
Specifies the compression library to be used.
These additional compressors for Blosc are supported
(default if no compressor specified: ‘blosc:blosclz’):
{‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’,
‘blosc:zlib’, ‘blosc:zstd’}.
Specifying a compression library which is not available issues
a ValueError.
appendbool, default False
For Table formats, append the input data to the existing.
format{‘fixed’, ‘table’, None}, default ‘fixed’
Possible values:
‘fixed’: Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
‘table’: Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
If None, pd.get_option(‘io.hdf.default_format’) is checked,
followed by fallback to “fixed”.
indexbool, default True
Write DataFrame index as a column.
min_itemsizedict or int, optional
Map column names to minimum string sizes for columns.
nan_repAny, optional
How to represent null values as str.
Not allowed with append=True.
dropnabool, default False, optional
Remove missing values.
data_columnslist of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See
Query via data columns for more information.
Applicable only to format=’table’.
errorsstr, default ‘strict’
Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
read_hdf : Read from HDF file.
DataFrame.to_orc : Write a DataFrame to the binary orc format.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
DataFrame.to_sql : Write to a SQL table.
DataFrame.to_feather : Write out feather-format for DataFrames.
DataFrame.to_csv : Write out to a csv file.
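A minimal sketch of the key/append workflow described above (requires the optional PyTables dependency; the file name and keys are arbitrary):

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])

# 'table' format is slower to write but appendable and queryable on disk.
df.to_hdf("data.h5", key="df", mode="w", format="table")

# Add a second object to the same file under a different key.
s = pd.Series([1, 2, 3, 4])
s.to_hdf("data.h5", key="s", mode="a")

pd.read_hdf("data.h5", "df")
pd.read_hdf("data.h5", "s")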
bufstr, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columnsarray-like, optional, default None
The subset of columns to write. Writes all columns by default.
col_spacestr or int, list or dict of int or str, optional
The minimum width of each column in CSS length units. An int is assumed to be px units.
headerbool, optional
Whether to print column labels, default True.
indexbool, optional, default True
Whether to print index (row) labels.
na_repstr, optional, default ‘NaN’
String representation of NaN to use.
formatterslist, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns’ elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsifybool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_namesbool, optional, default True
Prints the names of the indexes.
justifystr, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are
left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.
max_rowsint, optional
Maximum number of rows to display in the console.
max_colsint, optional
Maximum number of columns to display in the console.
show_dimensionsbool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimalstr, default ‘.’
Character recognized as decimal separator, e.g. ‘,’ in Europe.
bold_rowsbool, default True
Make the row labels bold in the output.
classesstr or list or tuple, default None
CSS class(es) to apply to the resulting html table.
escapebool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook{True, False}, default False
Whether the generated HTML is for IPython Notebook.
borderint
A border=border attribute is included in the opening
<table> tag. Default pd.options.display.html.border.
table_idstr, optional
A css id is included in the opening <table> tag if specified.
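A brief sketch of how a few of these rendering options combine (the class names and id are invented for illustration):

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [4, 3]})

# classes and table_id end up on the opening <table> tag; border=0 overrides
# the default border attribute, and na_rep controls how missing values render.
html = df.to_html(classes=["table", "table-striped"], table_id="my-table",
                  border=0, na_rep="-")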
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
‘columns’ : dict like {column -> {index -> value}}
‘values’ : just the values array
‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
Describing the data, where data component is like orient='records'.
date_format{None, ‘epoch’, ‘iso’}
Type of date conversion. ‘epoch’ = epoch milliseconds,
‘iso’ = ISO8601. The default depends on the orient. For
orient='table', the default is ‘iso’. For all other orients,
the default is ‘epoch’.
double_precisionint, default 10
The number of decimal places to use when encoding
floating point values. The possible maximal value is 15.
Passing double_precision greater than 15 will raise a ValueError.
force_asciibool, default True
Force encoded string to be ASCII.
date_unitstr, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond,
microsecond, and nanosecond respectively.
default_handlercallable, default None
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
linesbool, default False
If ‘orient’ is ‘records’ write out line-delimited json format. Will
throw ValueError if incorrect ‘orient’ since others are not
list-like.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
indexbool or None, default None
The index is only used when ‘orient’ is ‘split’, ‘index’, ‘column’,
or ‘table’. Of these, ‘index’ and ‘column’ do not support
index=False.
indentint, optional
Length of whitespace used to indent each record.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
modestr, default ‘w’ (writing)
Specify the IO mode for output when supplying a path_or_buf.
Accepted args are ‘w’ (writing) and ‘a’ (append) only.
mode=’a’ is only supported when lines is True and orient is ‘records’.
The behavior of indent=0 varies from the stdlib, which does not
indent the output but does insert newlines. Currently, indent=0
and the default indent=None are equivalent in pandas, though this
may change in a future release.
orient='table' contains a ‘pandas_version’ field under ‘schema’.
This stores the version of pandas used in the latest revision of the
schema.
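A small sketch of the lines and orient behaviour noted above (the data is arbitrary):

import pandas as pd

df = pd.DataFrame({"col 1": ["a", "c"], "col 2": ["b", "d"]})

# Line-delimited JSON: one record per line; only valid with orient='records'.
print(df.to_json(orient="records", lines=True))

# orient='table' embeds a JSON Table Schema (including the 'pandas_version' field).
print(df.to_json(orient="table", indent=2))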
bufstr, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columnslist of label, optional
The subset of columns to write. Writes all columns by default.
headerbool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
indexbool, default True
Write row names (index).
na_repstr, default ‘NaN’
Missing data representation.
formatterslist of functions or dict of {str: function}, optional
Formatter functions to apply to columns’ elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_formatone-parameter function or str, optional, default None
Formatter for floating point numbers. For example
float_format="%.2f" and float_format="{{:0.2f}}".format will
both result in 0.1234 being formatted as 0.12.
sparsifybool, optional
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_namesbool, default True
Prints the names of the indexes.
bold_rowsbool, default False
Make the row labels bold in the output.
column_formatstr, optional
The columns format as specified in LaTeX table format e.g. ‘rcl’ for 3
columns. By default, ‘l’ will be used for all columns except
columns of numbers, which default to ‘r’.
longtablebool, optional
Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
By default, the value will be read from the pandas config
module, and set to True if the option styler.latex.environment is
“longtable”.
Changed in version 2.0.0: The pandas option affecting this argument has changed.
escapebool, optional
By default, the value will be read from the pandas config
module and set to True if the option styler.format.escape is
“latex”. When set to False prevents from escaping latex special
characters in column names.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to False.
encodingstr, optional
A string representing the encoding to use in the output file,
defaults to ‘utf-8’.
decimalstr, default ‘.’
Character recognized as decimal separator, e.g. ‘,’ in Europe.
multicolumnbool, default True
Use multicolumn to enhance MultiIndex columns.
The default will be read from the config module, and is set
as the option styler.sparse.columns.
Changed in version 2.0.0: The pandas option affecting this argument has changed.
multicolumn_formatstr, default ‘r’
The alignment for multicolumns, similar to column_format
The default will be read from the config module, and is set as the option
styler.latex.multicol_align.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to “r”.
multirowbool, default True
Use multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module, and is set as the option
styler.sparse.index.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to True.
captionstr or tuple, optional
Tuple (full_caption, short_caption),
which results in \caption[short_caption]{full_caption};
if a single string is passed, no short caption will be set.
labelstr, optional
The LaTeX label to be placed inside \label{} in the output.
This is used with \ref{} in the main .tex file.
positionstr, optional
The LaTeX positional argument for tables, to be placed after
\begin{} in the output.
As of v2.0.0 this method has changed to use the Styler implementation as
part of Styler.to_latex() via jinja2 templating. This means
that jinja2 is a requirement, and needs to be installed, for this method
to function. It is advised that users switch to using Styler, since that
implementation is more frequently updated and contains much more
flexibility with the output.
bufstr, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
modestr, optional
Mode in which file is opened, “wt” by default.
indexbool, optional, default True
Add index (row) labels.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
float16 and float32, the results dtype will be float32.
This may require copying data and coercing values, which may be
expensive.
Whether to ensure that the returned value is not a view on
another array. Note that copy=False does not ensure that
to_numpy() is no-copy. Rather, copy=True ensures that
a copy is made, even if not strictly necessary.
na_valueAny, optional
The value to use for missing values. The default value depends
on dtype and the dtypes of the DataFrame columns.
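A minimal sketch of how dtype, copy, and na_value interact (the frame is made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, None]}, dtype="Int64")

# The nullable integer column holds a missing value; na_value supplies the
# representation to use so a plain float array can be produced.
arr = df.to_numpy(dtype="float64", na_value=np.nan)

# copy=True guarantees a fresh array even when no conversion is needed.
arr2 = df.to_numpy(copy=True)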
If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function). If path is None,
a bytes object is returned.
engine{‘pyarrow’}, default ‘pyarrow’
ORC library to use.
indexbool, optional
If True, include the dataframe’s index(es) in the file output.
If False, they will not be written to the file.
If None, similar to infer, the dataframe’s index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn’t require much space and is faster. Other indexes will
be included as columns in the file output.
engine_kwargsdict[str, Any] or None, default None
Additional keyword arguments passed to pyarrow.orc.write_table().
read_orc : Read a ORC file.
DataFrame.to_parquet : Write a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
This function writes the dataframe as a parquet file. You can choose different parquet
backends, and have the option of compression. See
the user guide for more details.
pathstr, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. If None, the result is
returned as bytes. If a string or path, it will be used as Root Directory
path when writing a partitioned dataset.
Parquet library to use. If ‘auto’, then the option
io.parquet.engine is used. The default io.parquet.engine
behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if
‘pyarrow’ is unavailable.
compressionstr or None, default ‘snappy’
Name of the compression to use. Use None for no compression.
Supported options: ‘snappy’, ‘gzip’, ‘brotli’, ‘lz4’, ‘zstd’.
indexbool, default None
If True, include the dataframe’s index(es) in the file output.
If False, they will not be written to the file.
If None, similar to True, the dataframe’s index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn’t require much space and is faster. Other indexes will
be included as columns in the file output.
partition_colslist, optional, default None
Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
read_parquet : Read a parquet file.
DataFrame.to_orc : Write an orc file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
If you want to get a buffer to the parquet content you can use a io.BytesIO
object, as long as you don’t use partition_cols, which creates multiple files.
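A small sketch of that in-memory round trip (pyarrow or fastparquet must be installed; the data is arbitrary):

import io
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Keep the parquet bytes in a buffer; this only works without partition_cols,
# which would create multiple files.
buf = io.BytesIO()
df.to_parquet(buf)

buf.seek(0)
restored = pd.read_parquet(buf)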
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. File path where
the pickled object will be stored.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
protocolint
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
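A minimal round-trip sketch (the file name is arbitrary; compression is inferred from the extension):

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
original_df.to_pickle("./dummy.pkl")

# Reading it back restores the object unchanged.
unpickled_df = pd.read_pickle("./dummy.pkl")
assert original_df.equals(unpickled_df)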
Include index in resulting record array, stored in ‘index’
field or using the index label, if set.
column_dtypesstr, type, dict, default None
If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
index_dtypesstr, type, dict, default None
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.
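A short sketch of the dtype overrides described above (the values are arbitrary):

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [0.5, 0.75]}, index=["a", "b"])

# The index becomes the 'index' field of the record array (or its label, if set).
rec = df.to_records()

# Override storage dtypes per column and for the index level.
rec32 = df.to_records(column_dtypes={"A": "int32"}, index_dtypes="<S2")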
consqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable. See here.
If passing a sqlalchemy.engine.Connection which is already in a transaction,
the transaction will not be committed. If passing a sqlite3.Connection,
it will not be possible to roll back the record insertion.
schemastr, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
indexbool, default True
Write DataFrame index as a column. Uses index_label as the column
name in the table. Creates a table index for this column.
index_labelstr or sequence, default None
Column label for index column(s). If None is given (default) and
index is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksizeint, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtypedict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method{None, ‘multi’, callable}, optional
Controls the SQL insertion clause used:
None : Uses standard SQL INSERT clause (one per row).
‘multi’: Pass multiple values in a single INSERT clause.
callable with signature (pd_table, conn, keys, data_iter).
Details and a sample callable implementation can be found in the
section insert method.
Number of rows affected by to_sql. None is returned if the callable
passed into method does not return an integer number of rows.
The number of returned rows affected is the sum of the rowcount
attribute of sqlite3.Cursor or SQLAlchemy connectable which may not
reflect the exact number of written rows as stipulated in the
sqlite3 or SQLAlchemy documentation.
Timezone aware datetime columns will be written as
Timestamp with timezone type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
Not all datastores support method="multi". Oracle, for example,
does not support multi-value insert.
Use method to define a callable insertion method to do nothing
if there’s a primary key conflict on a table in a PostgreSQL database.
>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql(name="conflict_table", con=conn, if_exists="append", method=insert_on_conflict_nothing)
0
For MySQL, a callable to update columns b and c if there’s a conflict
on a primary key.
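A sketch of such a MySQL upsert callable, assuming SQLAlchemy’s MySQL dialect and reusing the hypothetical df_conflict, conn, and table/column names from the PostgreSQL example above; this is illustrative, not the only way to write it.

from sqlalchemy.dialects.mysql import insert

def insert_on_duplicate_update(table, conn, keys, data_iter):
    # Build one dict per incoming row, then update columns "b" and "c" with the
    # incoming values whenever the primary key already exists.
    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    stmt = stmt.on_duplicate_key_update(b=stmt.inserted.b, c=stmt.inserted.c)
    result = conn.execute(stmt)
    return result.rowcount

df_conflict.to_sql(name="conflict_table", con=conn,
                   if_exists="append", method=insert_on_duplicate_update)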
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function.
convert_datesdict
Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are ‘tc’,
‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to ‘tc’. Raises NotImplementedError if
a datetime column has timezone information.
write_indexbool
Write the index to Stata dataset.
byteorderstr
Can be “>”, “<”, “little”, or “big”. Default is sys.byteorder.
time_stampdatetime
A datetime to use as file creation date. Default is the current
time.
data_labelstr, optional
A label for the data set. Must be 80 characters or smaller.
variable_labelsdict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version{114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.
Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.
convert_strllist, optional
List of column names to convert to string columns to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
value_labelsdict of dicts
Dictionary containing columns as keys and dictionaries of column value
to labels as values. Labels for a single variable must be 32,000
characters or smaller.
read_stata : Import Stata data files.
io.stata.StataWriter : Low-level writer for Stata data files.
io.stata.StataWriter117 : Low-level writer for version 117 files.
bufstr, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columnsarray-like, optional, default None
The subset of columns to write. Writes all columns by default.
col_spaceint, list or dict of int, optional
The minimum width of each column. If a list of ints is given, every integer corresponds with one column. If a dict is given, the key references the column, while the value defines the space to use.
headerbool or list of str, optional
Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.
indexbool, optional, default True
Whether to print index (row) labels.
na_repstr, optional, default ‘NaN’
String representation of NaN to use.
formatterslist, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns’ elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsifybool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_namesbool, optional, default True
Prints the names of the indexes.
justifystr, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are
left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.
max_rowsint, optional
Maximum number of rows to display in the console.
max_colsint, optional
Maximum number of columns to display in the console.
show_dimensionsbool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimalstr, default ‘.’
Character recognized as decimal separator, e.g. ‘,’ in Europe.
line_widthint, optional
Width to wrap a line in characters.
min_rowsint, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above max_rows).
max_colwidthint, optional
Max width to truncate each column in characters. By default, no limit.
Convention for converting period to timestamp; start of period
vs. end.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert (the index by default).
copybool, default True
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
path_or_bufferstr, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is returned
as a string.
indexbool, default True
Whether to include index in XML document.
root_namestr, default ‘data’
The name of root element in XML document.
row_namestr, default ‘row’
The name of row element in XML document.
na_repstr, optional
Missing data representation.
attr_colslist-like, optional
List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_colslist-like, optional
List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
namespacesdict, optional
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example,
namespaces={"":"https://example.com"}
prefixstr, optional
Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in namespaces
dict.
encodingstr, default ‘utf-8’
Encoding of the resulting document.
xml_declarationbool, default True
Whether to include the XML declaration at start of document.
pretty_printbool, default True
Whether output should be pretty printed with indentation and
line breaks.
parser{‘lxml’,’etree’}, default ‘lxml’
Parser module to use for building of tree. Only ‘lxml’ and
‘etree’ are supported. With ‘lxml’, the ability to use XSLT
stylesheet is supported.
stylesheetstr, path object or file-like object, optional
A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires lxml to be installed. Only XSLT 1.0
scripts, and not later versions, are currently supported.
compressionstr or dict, default ‘infer’
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buffer’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
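A compact sketch combining a few of the options above (the namespace URI and column layout are illustrative):

import pandas as pd

df = pd.DataFrame({"shape": ["square", "circle"],
                   "degrees": [360, 360],
                   "sides": [4.0, None]})

# Write 'shape' as an attribute of each row element and the remaining columns
# as child elements, under a default namespace.
xml = df.to_xml(attr_cols=["shape"],
                elem_cols=["degrees", "sides"],
                namespaces={"": "https://example.com"})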
Function to use for transforming the data. If a function, must either
work when passed a DataFrame or when passed to DataFrame.apply. If func
is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
function
string function name
list-like of functions and/or function names, e.g. [np.exp,'sqrt']
dict-like of axis labels -> functions, function names or list-like of such.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column.
If 1 or ‘columns’: apply function to each row.
>>> df = pd.DataFrame({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
Whether to copy the data after transposing, even for DataFrames
with a single dtype.
Note that a copy is always required for mixed dtype DataFrames,
or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
Transposing a DataFrame with mixed dtypes will result in a homogeneous
DataFrame with the object dtype. In such a case, a copy of the data
is always made.
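A short sketch of the mixed-dtype behaviour just described (the column names are made up):

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [9.5, 8.0]})

# Mixed dtypes: the transposed frame is homogeneous with object dtype,
# so a copy is made regardless of the copy keyword.
df.T.dtypes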
Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Axis to truncate. Truncates the index (rows) by default.
For Series this parameter is unused and defaults to 0.
copybool, default is True,
Return a copy of the truncated section.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n
The columns of a DataFrame can be truncated.
>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j
For Series, only rows can be truncated.
>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object
The index values in truncate can be datetimes or string
dates.
Because the index is a DatetimeIndex containing only dates, we can
specify before and after as strings. They will be coerced to
Timestamps before truncation.
Note that truncate assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
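The datetime behaviour described above can be sketched as follows (the index and values are invented):

import pandas as pd

dates = pd.date_range("2016-01-01", "2016-02-01", freq="s")
df = pd.DataFrame(index=dates, data={"A": 1})

# before/after strings are coerced to Timestamps; the unspecified time
# component defaults to midnight, unlike partial string slicing.
df.truncate(before="2016-01-05", after="2016-01-10").tail()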
Target time zone. Passing None will convert to
UTC and remove the timezone information.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to convert
levelint, str, default None
If axis is a MultiIndex, convert a specific level. Otherwise
must be None.
copybool, default True
Also make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
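A minimal sketch of converting a tz-aware index (the timestamp is arbitrary):

import pandas as pd

s = pd.Series([1],
              index=pd.DatetimeIndex(["2018-09-15 01:30:00+02:00"]))

s.tz_convert("Asia/Shanghai")  # convert to another time zone
s.tz_convert(None)             # convert to UTC and drop the tz info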
Time zone to localize. Passing None will remove the
time zone information and preserve local time.
axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to localize
levelint, str, default None
If axis is a MultiIndex, localize a specific level. Otherwise
must be None.
copybool, default True
Also make a copy of the underlying data.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
ambiguous parameter dictates how ambiguous times should be
handled.
‘infer’ will attempt to infer fall dst-transition hours based on
order
bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
‘NaT’ will return NaT where there are ambiguous times
‘raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistentstr, default ‘raise’
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST. Valid values are:
‘shift_forward’ will shift the nonexistent time forward to the
closest existing time
‘shift_backward’ will shift the nonexistent time backward to the
closest existing time
‘NaT’ will return NaT where there are nonexistent times
timedelta objects will shift nonexistent times by the timedelta
‘raise’ will raise an NonExistentTimeError if there are
nonexistent times.
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object or ‘shift_forward’
or ‘shift_backward’.
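A minimal sketch of the nonexistent options described above, using a spring-forward date (the time zone and data are illustrative):

import pandas as pd

# Clocks jump from 02:00 to 03:00 on this date in Warsaw, so 02:30 does not exist.
s = pd.Series(range(2),
              index=pd.DatetimeIndex(["2015-03-29 02:30:00",
                                      "2015-03-29 03:30:00"]))

s.tz_localize("Europe/Warsaw", nonexistent="shift_forward")
s.tz_localize("Europe/Warsaw", nonexistent="shift_backward")
s.tz_localize("Europe/Warsaw", nonexistent=pd.Timedelta("1h"))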
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1.0
     b   2.0
two  a   3.0
     b   4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.0
     b  2.0
two  a  3.0
     b  4.0
dtype: float64
otherDataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join{‘left’}, default ‘left’
Only left join is implemented, keeping the index and columns of the
original object.
overwritebool, default True
How to handle non-NA values for overlapping keys:
True: overwrite original DataFrame’s values
with values from other.
False: only update values that are NA in
the original DataFrame.
The DataFrame’s length does not increase as a result of the update,
only values at matching index/column labels are updated.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f
For Series, its name attribute must be set.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
If other contains NaNs the corresponding values are not updated
in the original dataframe.
The returned Series will have a MultiIndex with one level per input
column but an Index (non-multi) for a single label. By default, rows
that contain any NA values are omitted from the result. By default,
the resulting Series will be in descending order so that the first
element is the most frequently-occurring row.
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.var with axis=None is deprecated,
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for Series.
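A short sketch contrasting the default sample variance with ddof=0 (the numbers are arbitrary):

import pandas as pd

df = pd.DataFrame({"age": [21, 25, 62, 43],
                   "height": [1.61, 1.87, 1.49, 2.01]})

df.var()        # divides by N - 1 (ddof=1, the default)
df.var(ddof=0)  # population variance, divides by N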
Users are allowed to use the Performance DataFrame without the second and
fourth dimension (Objective and Run respectively) in case they only
have one objective or only do one run. This method adjusts the indexing for
those cases accordingly.
Args:
objective: The given objective name
run_id: The given run index
Returns:
A tuple representing the (possibly adjusted) Objective and Run index.
Method to check whether the specified objective is valid.
Users are allowed to index the dataframe without specifying all dimensions.
However, when dealing with multiple objectives this is not allowed and this
is verified here. If we have only one objective this is returned. Otherwise,
if an objective is specified by the user this is returned.
condbool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where
False, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
otherscalar, Series/DataFrame, or callable
Entries where cond is False are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplacebool, default False
Whether to perform the operation in place on the data.
axisint, default None
Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is True the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
False.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m,df2) is equivalent to
np.where(m,df1,df2).
For further details and examples see the where documentation in
indexing.
The dtype of the object takes precedence. The fill value is casted to
the object’s dtype, if this can be done losslessly.
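A minimal sketch of the if-then idiom described above (the data is arbitrary):

import numpy as np
import pandas as pd

s = pd.Series(range(5))

s.where(s > 1)       # keep values where the condition holds, NaN elsewhere
s.where(s > 1, 10)   # replace with a scalar instead of NaN

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
df.where(df % 3 == 0, -df)   # roughly np.where(df % 3 == 0, df, -df)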
selection_scenario: The scenario to construct the Selector for.
run_on: Which runner to use. Defaults to slurm.
job_name: Name to give the construction job when submitting.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
base_dir: The base directory to run the Selector in.
Run the Selector CLI and write result to the Scenario PerformanceDataFrame.
Args:
scenario_path: The path to the scenario with the Selector to run.
instance_set: The instance set to run the Selector on.
feature_data: The instance feature data to use.
run_on: Which runner to use. Defaults to slurm.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
job_name: Name to give the Slurm job when submitting.
dependencies: List of dependencies to add to the job.
log_dir: The directory to write logs to.
Build the solver call on an instance with a configuration.
Args:
instance: Path to the instance.
objectives: List of sparkle objectives.
seed: Seed of the solver.
cutoff_time: Cutoff time for the solver.
configuration: Configuration of the solver.
log_dir: Directory path for logs.
Returns:
List of commands and arguments to execute the solver.
solver_output: The output of the solver run which needs to be parsed
solver_call: The solver call used to run the solver
objectives: The objectives to apply to the solver output
verifier: The verifier to check the solver output
Run the solver on an instance with a certain configuration.
Args:
instances: The instance(s) to run the solver on, list in case of multi-file.
In case of an instance set, will run on all instances in the set.
objectives: List of sparkle objectives.
seed: Seed to run the solver with. Fill with an arbitrary int in case of a
deterministic solver.
cutoff_time: The cutoff time for the solver, measured through RunSolver.
If None, will be executed without RunSolver.
configuration: The solver configuration to use. Can be empty.
run_on: Whether to run on slurm or locally.
sbatch_options: The sbatch options to use.
slurm_prepend: The script to prepend to a slurm script.
log_dir: The log directory to use.
Returns:
Solver output dict possibly with runsolver values.
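For the local case, the execution step could be sketched with the standard library as below; note this uses a wall-clock timeout rather than RunSolver's CPU-time measurement, and the status labels are illustrative:

import subprocess

def run_locally_sketch(cmd: list[str], cutoff_time: float | None) -> dict:
    # Illustrative only: a wall-clock timeout stands in for RunSolver's CPU accounting.
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=cutoff_time)
        return {"status": "completed", "returncode": proc.returncode, "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "stdout": ""}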
Run the solver and place the results in the performance dataframe.
In practice this runs Solver.run, with a small script before and after
to read from and write to the performance dataframe.
Args:
instances: The instance(s) to run the solver on. In case of an instance set,
or list, will create a job for all instances in the set/list.
config_ids: The config indices to use in the performance dataframe.
performance_dataframe: The performance dataframe to use.
run_ids: List of run ids to use. If list of list, a list of runs is given
per instance. Otherwise, all runs are used for each instance.
cutoff_time: The cutoff time for the solver, measured through RunSolver.
objective: The objective to use, only relevant when determining the best
configuration.
train_set: The training set to use. If present, will determine the best
configuration of the solver using these instances and run with it on
all instances in the instance argument.
sbatch_options: List of slurm batch options to use
slurm_prepend: Slurm script to prepend to the sbatch
dependencies: List of slurm runs to use as dependencies
log_dir: Path where to place output files. Defaults to CWD.
base_dir: Path where to place output files.
job_name: Name of the job
If None, will generate a name based on Solver and Instances
run_on: On which platform to run the jobs. Default: Slurm.
log_dir: Directory to store job logs
sbatch_options: Options to pass to sbatch
slurm_prepend: Script to prepend to sbatch script
run_on: Determines to which RunRunner queue the job is added
Returns:
A list of Run objects. Empty when running locally.
This method is shared by the configurators and should be called by the
implementation/subclass of the configurator.
Args:
configuration_commands: List of configurator commands to execute
data_target: Performance data to store the results.
output: Output directory.
scenario: ConfigurationScenario to execute.
configuration_ids: List of configuration ids that are to be created
validate_after: Whether the configurations should be validated
sbatch_options: List of slurm batch options to use
slurm_prepend: Slurm script to prepend to the sbatch
num_parallel_jobs: The maximum number of jobs to run in parallel
base_dir: The base_dir of RunRunner where the sbatch scripts will be placed
run_on: On which platform to run the jobs. Default: Slurm.
Method to restructure and clean up after a single configurator call.
Args:
output_source: Path to the output file of the configurator run.
output_target: Path to the Performance DataFrame to store result.
scenario: ConfigurationScenario of the configuration.
configuration_id: ID (of the run) of the configuration.
If the output_target is None, return the configuration.
Args:
scenario: ConfigurationScenario of the configuration. Should be removed.
configuration_id: ID (of the run) of the configuration.
configuration: Configuration to save.
output_target: Path to the Performance DataFrame to store result.
selection_scenario: The scenario to construct the Selector for.
run_on: Which runner to use. Defaults to slurm.
job_name: Name to give the construction job when submitting.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
base_dir: The base directory to run the Selector in.
Run the Selector CLI and write result to the Scenario PerformanceDataFrame.
Args:
scenario_path: The path to the scenario with the Selector to run.
instance_set: The instance set to run the Selector on.
feature_data: The instance feature data to use.
run_on: Which runner to use. Defaults to slurm.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
job_name: Name to give the Slurm job when submitting.
dependencies: List of dependencies to add to the job.
log_dir: The directory to write logs to.
instance: The instance to run on
feature_group: The optional feature group to run the extractor for.
output_file: Optional file to write the output to.
runsolver_args: The arguments for runsolver. If not present,
will run the extractor without runsolver.
cutoff_time: The maximum runtime.
log_dir: Directory path for logs.
extractor_path: Path to the executable
instance: Path to the instance to run on
feature_group: The feature group to compute. Must be supported by the
extractor to use.
output_file: Target output. If None, piped to the RunRunner job.
cutoff_time: CPU cutoff time in seconds
log_dir: Directory to write logs. Defaults to CWD.
Returns:
The features, or None if an output file is used or the features cannot be found.
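As a sketch of how such an invocation can be wrapped in runsolver to enforce the CPU cutoff: the runsolver flags below follow its documented interface, while the extractor's positional arguments are an assumption for illustration:

from pathlib import Path

def extractor_command_sketch(extractor_path: Path, instance: Path, output_file: Path,
                             cutoff_time: float, log_dir: Path) -> list[str]:
    # Hypothetical extractor argument order; only the runsolver flags are standard.
    return ["runsolver",
            "-C", str(cutoff_time),                    # CPU time limit in seconds
            "-w", str(log_dir / "extractor.watcher"),  # runsolver watcher/log output
            "-v", str(log_dir / "extractor.values"),   # measured resource values
            str(extractor_path), str(instance), str(output_file)]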
Run the Extractor CLI and write result to the FeatureDataFrame.
Args:
instance_set: The instance set to run the Extractor on.
feature_dataframe: The feature dataframe to write to.
cutoff_time: CPU cutoff time in seconds
feature_group: The feature group to compute. If left empty,
will run on all feature groups.
run_on: The runner to use.
sbatch_options: Additional options to pass to sbatch.
srun_options: Additional options to pass to srun.
parallel_jobs: Number of parallel jobs to run.
slurm_prepend: Slurm script to prepend to the sbatch
dependencies: List of dependencies to add to the job.
log_dir: The directory to write logs to.
selection_scenario: The scenario to construct the Selector for.
run_on: Which runner to use. Defaults to slurm.
job_name: Name to give the construction job when submitting.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
base_dir: The base directory to run the Selector in.
Run the Selector CLI and write result to the Scenario PerformanceDataFrame.
Args:
scenario_path: The path to the scenario with the Selector to run.
instance_set: The instance set to run the Selector on.
feature_data: The instance feature data to use.
run_on: Which runner to use. Defaults to slurm.
sbatch_options: Additional options to pass to sbatch.
slurm_prepend: Slurm script to prepend to the sbatch
job_name: Name to give the Slurm job when submitting.
dependencies: List of dependencies to add to the job.
log_dir: The directory to write logs to.
Build the solver call on an instance with a configuration.
Args:
instance: Path to the instance.
objectives: List of sparkle objectives.
seed: Seed of the solver.
cutoff_time: Cutoff time for the solver.
configuration: Configuration of the solver.
log_dir: Directory path for logs.
Returns:
List of commands and arguments to execute the solver.
solver_output: The output of the solver run which needs to be parsed
solver_call: The solver call used to run the solver
objectives: The objectives to apply to the solver output
verifier: The verifier to check the solver output
Run the solver on an instance with a certain configuration.
Args:
instances: The instance(s) to run the solver on, list in case of multi-file.
In case of an instance set, will run on all instances in the set.
objectives: List of sparkle objectives.
seed: Seed to run the solver with. Fill with an arbitrary int in case of
a deterministic solver.
cutoff_time: The cutoff time for the solver, measured through RunSolver.
If None, will be executed without RunSolver.
configuration: The solver configuration to use. Can be empty.
run_on: Whether to run on slurm or locally.
sbatch_options: The sbatch options to use.
slurm_prepend: The script to prepend to a slurm script.
log_dir: The log directory to use.
Returns:
Solver output dict possibly with runsolver values.
Run the solver and place the results in the performance dataframe.
In practice this runs Solver.run, with a small script before and after
to read from and write to the performance dataframe.
Args:
instances: The instance(s) to run the solver on. In case of an instance set,
or list, will create a job for all instances in the set/list.
config_ids: The config indices to use in the performance dataframe.
performance_dataframe: The performance dataframe to use.
run_ids: List of run ids to use. If list of list, a list of runs is given
per instance. Otherwise, all runs are used for each instance.
cutoff_time: The cutoff time for the solver, measured through RunSolver.
objective: The objective to use, only relevant when determining the best
configuration.
train_set: The training set to use. If present, will determine the best
configuration of the solver using these instances and run with it on
all instances in the instance argument.
sbatch_options: List of slurm batch options to use
slurm_prepend: Slurm script to prepend to the sbatch
dependencies: List of slurm runs to use as dependencies
log_dir: Path where to place output files. Defaults to CWD.
base_dir: Path where to place output files.
job_name: Name of the job
If None, will generate a name based on Solver and Instances
run_on: On which platform to run the jobs. Default: Slurm.
Add an extractor and its feature names to the dataframe.
Arguments:
extractor: Name of the extractor
extractor_features: Tuples of [FeatureGroup, FeatureName]
values: Initial values of the Extractor per instance in the dataframe.
Add a new solver to the dataframe. Initializes value to None by default.
Args:
solver_name: The name of the solver to be added.
configurations: A list of configuration keys for the solver.
initial_value: The value assigned for each index of the new solver.
If not None, must match the index dimension (n_obj * n_inst * n_runs).
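The dimension requirement can be pictured with plain pandas. This is only a sketch of the layout, assuming rows indexed by (objective, instance, run) and columns by (solver, configuration); the real PerformanceDataFrame internals may differ:

import itertools
import pandas as pd

objectives, instances, runs = ["PAR10"], ["i1", "i2"], [1, 2]
rows = pd.MultiIndex.from_tuples(list(itertools.product(objectives, instances, runs)),
                                 names=["Objective", "Instance", "Run"])
cols = pd.MultiIndex.from_tuples([("solver_a", "conf_1")], names=["Solver", "Configuration"])
pdf = pd.DataFrame(None, index=rows, columns=cols)

# Adding a solver: initial values must cover n_obj * n_inst * n_runs entries.
pdf[("solver_b", "conf_1")] = [10.0, 12.0, 9.5, 11.0]  # 1 * 2 * 2 = 4 values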
Return the best configuration for the given objective over the instances.
Args:
solver: The solver for which we determine the best configuration
objective: The objective for which we calculate the best configuration
instances: The instances which should be selected for the evaluation
Returns:
The best configuration id and its aggregated performance.
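The aggregation behind such a query might look like the following sketch; mean aggregation and minimisation are assumptions here, as the objective itself determines both in the real method:

import pandas as pd

# Rows: instances, columns: configuration ids of one solver (illustrative values).
perf = pd.DataFrame({"conf_1": [10.0, 30.0], "conf_2": [12.0, 20.0]}, index=["i1", "i2"])

aggregated = perf.mean(axis=0)     # aggregate per configuration over instances
best_config = aggregated.idxmin()  # assume the objective is minimised
best_value = aggregated.min()
print(best_config, best_value)     # conf_2 16.0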
Return the best performance for each instance in the portfolio.
Args:
objective: The objective for which we calculate the best performance
instances: The instances which should be selected for the evaluation
run_id: The run for which we calculate the best performance. If None,
we consider all runs.
exclude_solvers: List of (solver, config_id) to exclude in the calculation.
Returns:
The best performance for each instance in the portfolio.
Return the (best) configuration performance for objective over the instances.
Args:
solver: The solver for which we evaluate the configuration
configuration: The configuration (id) to evaluate
objective: The objective for which we find the best value
instances: The instances which should be selected for the evaluation
per_instance: Whether to return the performance per instance,
or aggregated.
Returns:
The (best) configuration id and its aggregated performance.
Return a list of performance computation jobs that are to be done.
Get a list of tuple[instance, solver] to run from the performance data.
If rerun is False (default), get only the tuples that do not yet have a
value; if True, get all the tuples.
Args:
rerun: Boolean indicating if we want to rerun all jobs
Returns:
A tuple of (solver, config, instance, run) combinations
Return the marginal contribution of the solver configuration on the instances.
Args:
objective: The objective for which we calculate the marginal contribution.
instances: The instances which should be selected for the evaluation
sort: Whether to sort the results afterwards
Returns:
The marginal contribution of each solver (configuration) as:
[(solver, config_id, marginal_contribution, portfolio_best_performance_without_solver)]
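One way to picture the computation is the virtual-best portfolio with and without each solver configuration; the difference used below is only one possible notion of marginal contribution and may not match the exact formula used here:

import pandas as pd

# Rows: instances, columns: (solver, config) pairs, values: a minimised objective.
perf = pd.DataFrame({("s1", "c1"): [10.0, 50.0], ("s2", "c1"): [20.0, 30.0]},
                    index=["i1", "i2"])

vbs_all = perf.min(axis=1).mean()  # portfolio (virtual best) performance with all solvers

for col in perf.columns:
    vbs_without = perf.drop(columns=[col]).min(axis=1).mean()
    print(col, vbs_without - vbs_all, vbs_without)  # degradation when col is removed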
Users are allowed to use the Performance Dataframe without the second and
fourth dimension (Objective and Run respectively) in the case they only
have one objective or only do one run. This method adjusts the indexing for
those cases accordingly.
Args:
objective: The given objective name
run_id: The given run index
Returns:
A tuple representing the (possibly adjusted) Objective and Run index.
Method to check whether the specified objective is valid.
Users are allowed to index the dataframe without specifying all dimensions.
However, when dealing with multiple objectives this is not allowed and this
is verified here. If we have only one objective this is returned. Otherwise,
if an objective is specified by the user this is returned.
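A compact sketch of the described rule (argument and variable names chosen for illustration, not taken from the actual method):

def resolve_indices_sketch(objective: str | None, run_id: int | None,
                           objectives: list[str], runs: list[int]) -> tuple:
    # Single-objective / single-run shortcut as described above.
    if objective is None:
        if len(objectives) > 1:
            raise ValueError("Multiple objectives defined; an objective must be specified.")
        objective = objectives[0]
    if run_id is None and len(runs) == 1:
        run_id = runs[0]
    return objective, run_id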
Exports a config space object to a specific PCS convention.
Args:
configspace: ConfigurationSpace, the space to convert
pcs_format: PCSConvention, the convention to convert to
file: Path, the file to write to. If None, will return string.
Returns:
String in case of no file path given, otherwise None.
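For the conversion itself, the ConfigSpace package ships PCS readers and writers; the module layout varies between ConfigSpace versions, so the snippet below is a sketch rather than the wrapper's exact code path:

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from ConfigSpace.read_and_write import pcs_new  # writer for the "new" PCS convention

cs = ConfigurationSpace()
cs.add_hyperparameter(UniformFloatHyperparameter("alpha", 0.0, 1.0, default_value=0.5))

pcs_string = pcs_new.write(cs)  # returns the PCS text as a string
print(pcs_string)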
Gather the additional parameters for the solver call.
Args:
args_dict: Dictionary mapping argument names to their currently held values
prefix: Prefix of the command line options
postfix: Postfix of the command line options
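A minimal sketch of the idea, with the formatting rule (prefix + name + postfix + value) assumed for illustration:

def gather_cli_args_sketch(args_dict: dict, prefix: str = "--", postfix: str = "=") -> list[str]:
    # E.g. {"seed": 42} with prefix "--" and postfix "=" yields ["--seed=42"].
    return [f"{prefix}{name}{postfix}{value}"
            for name, value in args_dict.items() if value is not None]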
Try to resolve the objective class by (case-sensitive) name.
convention: objective_name(variable-k)?(:[min|max])?(:[metric|objective])?
Here, min|max refers to the minimisation or maximisation of the objective
and metric|objective refers to whether the objective should be optimized
or just recorded.
Order of resolving:
class_name of user defined SparkleObjectives
class_name of sparkle defined SparkleObjectives
default SparkleObjective with minimization unless specified as max
Args:
objective_name: The name of the objective class. Can include parameter value k.
Returns:
Instance of the Objective class or None if not found.
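A sketch of parsing the naming convention (the regex and field names are illustrative; the real resolver additionally looks the name up among the known SparkleObjective classes):

import re

PATTERN = re.compile(r"^(?P<name>[A-Za-z_]+?)(?P<k>\d+)?"
                     r"(?::(?P<direction>min|max))?(?::(?P<kind>metric|objective))?$")

def parse_objective_name_sketch(objective_name: str) -> dict:
    match = PATTERN.match(objective_name)
    if match is None:
        return {}
    parts = match.groupdict()
    return {"name": parts["name"],
            "k": int(parts["k"]) if parts["k"] else None,
            "minimise": parts["direction"] != "max",  # minimise unless specified as max
            "is_metric": parts["kind"] == "metric"}

print(parse_objective_name_sketch("PAR10:min:metric"))
# {'name': 'PAR', 'k': 10, 'minimise': True, 'is_metric': True}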