# Platform

## File structure

The platform automatically generates a file structure for both input and output upon initialisation.

(dir-instances)=
### Instance directory

The instance directory has the following structure:

```
Instances/
  Example_Instance_Set/
    instance_a.cnf
    instance_b.cnf
    ...
    instance_z.cnf
```

Each directory under the `Instances` directory represents an `Instance Set`, and each file is considered an instance. Note that if your dataset is a single file, it will be considered a single instance in the set.

For instances consisting of multiple files, one additional file called `instances.csv` should be included in the `Example_Instance_Set` directory, describing which files together form an instance. The format is a single instance per line, with the files separated by spaces, as shown below.

```
instance_name_a instance_a_part_one.abc ... instance_a_part_n.xyz
instance_name_b instance_b_part_one.abc ... instance_b_part_n.xyz
...
instance_name_z instance_z_part_one.abc ... instance_z_part_n.xyz
```

(dir-solvers)=
### Solver Directory

The solver directory has the following structure:

```
Solver/
  Example_Solver/
    sparkle_solver_wrapper.py
    parameters.pcs
    ...
```

The `sparkle_solver_wrapper.py` is a wrapper that Sparkle calls to run the solver with specific settings and that returns a result for the configurator; a minimal sketch of such a wrapper is given at the end of this section. In `parameters.pcs` the configurable parameters are described in the PCS format. Finally, when importing your Solver into Sparkle, a binary executable of the `runsolver` tool is added. This allows Sparkle to make fair time and computational cost measurements for all configuration experiments. The same structure holds for all other executables we refer to as `SparkleCallable` in the Sparkle package, such as Feature Extractors, which are placed in the `Extractor` directory.
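To illustrate the role of the wrapper, below is a minimal, hypothetical sketch of a `sparkle_solver_wrapper.py`. The argument and output keys used here (`instance`, `seed`, `cutoff_time`, `status`, `quality`), as well as the solver binary `my_solver`, are assumptions made for this example only; consult the wrapper examples shipped with Sparkle for the exact protocol expected by your version.

```
#!/usr/bin/env python3
"""Illustrative solver wrapper sketch; NOT the exact Sparkle protocol."""
import ast
import subprocess
import sys

# Assumption: the wrapper receives a single dict-like string holding the
# instance path, random seed, cutoff time and the configured parameter values.
args = ast.literal_eval(sys.argv[1])
instance = args.pop("instance")
seed = args.pop("seed")
cutoff_time = float(args.pop("cutoff_time"))

# Any remaining entries are treated as solver parameters set by the configurator.
params = [f"--{name}={value}" for name, value in args.items()]

try:
    run = subprocess.run(
        ["./my_solver", instance, f"--seed={seed}", *params],  # hypothetical binary
        capture_output=True, text=True, timeout=cutoff_time)
    status = "SUCCESS" if run.returncode == 0 else "CRASHED"
    quality = run.stdout.strip()
except subprocess.TimeoutExpired:
    status, quality = "TIMEOUT", None

# Assumption: the result is reported as a dict printed to stdout.
print({"status": status, "quality": quality})
```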
### The output directory

The output directory is located at the root of the Sparkle directory. Its structure is as follows:

```
Output/
  Logs/
    commandname_timestamp/
      log files
  Configuration/
    configurator/
      Raw_Data/
        configuration_scenario/
          related files
      Analysis/
  Parallel_Portfolio/
    Raw_Data/
      related files
    Analysis/
  Selection/
    selector/
      solver_scenario/
        related files
      Analysis/
```

The `Logs` directory contains the history of commands and their output, such that one can easily see what has been done in which order and find enough pointers to debug unwanted behaviour. The other directories are split into two subdirectories: `Raw_Data` contains the data produced by the main command, which is often time consuming to generate and should be handled with care; `Analysis` contains information extracted from the raw data, such as plots and reports, which is easy to regenerate.

For each type of task run by Sparkle the `related files` differ, but the aim is always to have all files required for reproducibility: a copy of the Sparkle settings file at the time of the run and of all files relevant to the run, a copy of (or link to) any log or error file that could help with debugging, and the output of the executed task.

- *For configuration*: the configuration trajectory if available, the training and testing sets, the default configuration and the final found configuration. Their performance is placed in the `Analysis` folder.
- *For parallel portfolio*: the resulting portfolio and its components. The performance of the portfolio is placed in the `Analysis` folder.
- *For selection*: the algorithms and their performance on the training set, the model(s) generated if available and the resulting selector. The performance evaluation of the selector is placed in the `Analysis` folder.
- *For analysis*: a link to the folder on which the analysis was performed (configuration, portfolio or selection), the performance evaluation from it and the report, if it was generated.

### Other directories

There are a few other special directories automatically generated by Sparkle.

- __Reference_Lists__: Here Sparkle keeps track of user-defined aliases.
- __Snapshots__: Here Sparkle places your saved snapshots.
- __Tmp__: Here temporary files generated during commands are placed; they should also be removed again by the command that created them.
- Output/__Feature_Data__: Here Sparkle unifies all known/added Feature Extractors, the Instances and their features, if calculated. When an extractor or instance is removed, it is also removed here.
- Output/__Performance_Data__: Here Sparkle unifies all known/added Solvers, the Instances and their recorded objectives, if known. When a solver or instance is removed, it is also removed here.

(settings)=
## Platform Settings

Most settings can be controlled through the `Settings` directory, specifically the `Settings/sparkle_settings.ini` file. Possible settings are summarised per category in {ref}`settings-details`. For any settings that are not provided, the defaults will be used. This means, in the extreme case, that if the settings file is empty (and nothing is set through the command line) everything will run with default values.

For convenience, after every command `Settings/latest.ini` is written with the settings that were used, including any overrides through command line arguments. This can, for instance, provide the same settings to the next command in a chain, e.g. to `generate_report` after `configure_solver`. The used settings are also recorded in the relevant `Output/` subdirectory. Note that when writing settings, Sparkle always uses the name of a setting and not an alias.

```{note}
When overriding settings in `sparkle_settings.ini` with command line arguments, this is considered 'temporary': it is only recorded in the latest settings file and does not affect the values in `sparkle_settings.ini`.
```

### Example `sparkle_settings.ini`

This is a short example to show the format.

```
[general]
objective = PAR10
target_cutoff_time = 60

[configuration]
number_of_runs = 25

[slurm]
number_of_runs_in_parallel = 25
```

When initialising a new platform, the user is provided with a default settings file, which can be viewed [here](https://raw.githubusercontent.com/ADA-research/Sparkle/main/sparkle/Components/sparkle_settings.ini).

(sparkle-objective)=
### Sparkle Objectives

To define objectives for your algorithms, you can set them in the `[general]` section of your settings file like the following:

```
[general]
objective = PAR10,loss,accuracy:max,train_loss:metric
```

In the above example we define three objectives: the Penalised Average Runtime, the loss function value of our algorithm on the task, and the accuracy of our algorithm on the task. Note that objectives are by default assumed to be _minimised_, so we must specify `accuracy`_`:max`_ to clarify that accuracy should be maximised. Furthermore, you may have certain values that you wish to record, but that configurators and algorithms should not use as an objective. For this we can specify `train_loss`_`:metric`_, letting the platform know that this value will be present but must not be passed as an optimisable objective.

The platform predefines three objectives for the user: cpu time, wallclock time and memory. These objectives will always be recorded next to whatever the user may choose.

```{note}
Although the platform supports registering multiple objectives for any Solver, not all components used, such as SMAC and Ablation Analysis, support multi-objective optimisation. In any such case, the first defined objective is considered the most important and is used in these situations.
```

Moreover, when aggregating an objective over various dimensions, Sparkle assumes the following:

- When aggregating over multiple Solvers (algorithms), the minimum/maximum value is taken.
- When aggregating over multiple runs on the same instance, the mean is taken.
- When aggregating over multiple instances, the mean is taken.
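As a concrete illustration of these defaults, the sketch below aggregates some made-up PAR10 values for two hypothetical solvers, each with two runs on each of two instances; all names and values are purely illustrative.

```
# Made-up PAR10 values: two solvers, two runs per instance, two instances.
runs = {
    "solver_a": {"instance_1": [12.0, 18.0], "instance_2": [30.0, 10.0]},
    "solver_b": {"instance_1": [50.0, 40.0], "instance_2": [20.0, 20.0]},
}


def mean(values: list) -> float:
    return sum(values) / len(values)


# Aggregation order as described above: mean over runs, then mean over
# instances, then minimum over solvers (PAR10 is minimised).
per_solver = {
    solver: mean([mean(run_values) for run_values in instances.values()])
    for solver, instances in runs.items()
}
print(per_solver)                # {'solver_a': 17.5, 'solver_b': 32.5}
print(min(per_solver.values()))  # 17.5 -> value of the best solver
```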
It is possible to redefine these aggregation attributes for your specific objective. The platform looks for a file called `objective.py` in the `Settings` directory of the platform and reads your own objective definitions. These definitions can add new objectives to the platform, but can also overwrite existing definitions in the library: when creating an objective definition with the same name as one that already exists in the library, the user definition simply overrules the library definition. Note that there are a few constraints and details:

- The objective must inherit from the `SparkleObjective` class.
- Class names are constrained to the format of alphabetical letters followed by numerals.
- The objective can be parametrised by an integer; e.g. `PAR` followed by `10` is interpreted as instantiating the `PAR` class with argument `10`.
- If your objective is defined over time, you can indicate this using the `UseTime` enum, see the {ref}`types module`.
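As an illustration, a user-defined objective in `Settings/objective.py` could look roughly like the sketch below. The import path and the constructor arguments of `SparkleObjective` used here (`name`, `minimise`) are assumptions; check the types module of your Sparkle version for the actual interface.

```
# Settings/objective.py -- illustrative sketch only; the SparkleObjective
# constructor arguments (name, minimise) and import path are assumptions.
from sparkle.types import SparkleObjective  # assumed import path


class ACC(SparkleObjective):
    """A maximised accuracy objective: `ACC` in the settings maps to this class."""

    def __init__(self, argument: int = None):
        # Assumption: an integer suffix such as `ACC5` is passed as `argument`,
        # analogous to `PAR10` instantiating `PAR(10)`.
        self.argument = argument
        super().__init__(name=f"ACC{argument if argument is not None else ''}",
                         minimise=False)
```

Under these assumptions, adding `ACC` (or a parametrised variant such as `ACC5`) to the `objective` setting would resolve to this user definition and overrule a library objective of the same name.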
### Slurm

Slurm settings can be specified in the `Settings/sparkle_settings.ini` file. Any setting in the Slurm section that is not internally recognised by Sparkle will be added to the `sbatch` or `srun` calls. It is advised to overwrite the default settings specific to your cluster, such as the option `--partition`, with a valid value for your cluster. You might also have to adapt the default `--mem-per-cpu` value to your system. For example, your Slurm section in the `sparkle_settings.ini` could look like:

```
[slurm]
partition = CPU
mem-per-cpu = 6000
...
time = 25:00
```

**Discouraged options**

Currently these settings are inserted *as is* into any Slurm calls done by Sparkle. This means that options exclusive to either `sbatch` or `srun` should currently not be used. The options below are exclusive to `sbatch` and are thus discouraged:

- `--array`
- `--clusters`
- `--wrap`

The options below are exclusive to `srun` and are thus discouraged:

- `--label`

#### Prepending to Slurm Jobs

In case you have specific scripts that need to be executed before running your job, such as the activation of environments, you can specify this in the `[slurm]` section like:

```
[slurm]
...
job_prepend = echo $JOB_ID
```

In case you have a multi-line script, write it down as a file in the `Settings` directory, for example `slurm_prepend.sh`, and reference it like:

```
[slurm]
...
job_prepend = Settings/slurm_prepend.sh
```

(settings-details)=
### Options and possible values

#### \[general\]

`objective`
> aliases: `objective`
>
> values: `str`, comma separated for multiple
>
> description: The type of objectives Sparkle considers, see the {ref}`Sparkle Objective section <sparkle-objective>` for more.

---

`configurator`
> aliases: `configurator`
>
> values: `SMAC2`
>
> description: The name of the Configurator class implementation to use. Currently only SMAC2 is supported.

---

`selector_class`
> aliases: `selector_class`
>
> values: Class.
>
> description: The ASF algorithm selector class to use.

---

`selector_model`
> aliases: `selector_model`
>
> values: Model.
>
> description: The sklearn model to use for algorithm selection.

---

`solution_verifier`
> aliases: N/A
>
> values: `{NONE, SAT}`
>
> note: Only available for SAT solving.

---

`target_cutoff_time`
> aliases: `cutoff_time_each_solver_call`
>
> values: integer
>
> description: The time a solver is allowed to run before it is terminated.

---

`extractor_cutoff_time`
> aliases: `cutoff_time_each_feature_computation`
>
> values: integer
>
> description: The time a feature extractor is allowed to run before it is terminated. In case of multiple feature extractors, this budget is divided equally.

---

`run_on`
> aliases: `run_on`
>
> values: `LOCAL`, `SLURM`
>
> description: On which compute platform to run the jobs.

---

`verbosity`
> aliases: `verbosity`
>
> values: `QUIET`, `STANDARD`
>
> description: The verbosity level of Sparkle when running the CLI.

---

`check_interval`
> aliases: `check_interval`
>
> values: int
>
> description: Specifically for the wait command: the number of seconds to wait between refreshes of the wait information.

---

#### \[configuration\]

`wallclock_time`
> aliases: `wallclock_time`
>
> values: integer
>
> description: The wallclock time one configuration run is allowed to use for finding configurations.

---

`cpu_time`
> aliases: `cpu_time`
>
> values: integer
>
> description: The cpu time one configuration run is allowed to use for finding configurations.

---

`solver_calls`
> aliases: `solver_calls`
>
> values: integer
>
> description: The number of solver calls one configuration run is allowed to use for finding configurations.

---

`number_of_runs`
> aliases: `number_of_runs`
>
> values: integer
>
> description: The number of separate configuration runs.

---

`target_cutoff_length`
> aliases: `smac_each_run_cutoff_length`
>
> values: `{max}` (other values: whatever is allowed by SMAC)

---

#### \[slurm\]

`number_of_jobs_in_parallel`
> aliases: `num_job_in_parallel`
>
> values: integer
>
> description: The number of jobs that can run in parallel.

---

`max_parallel_runs_per_node`
> aliases: `clis_per_node`
>
> values: integer
>
> description: The number of parallel processes that can be run on one compute node. For example, if a node has 32 cores and each solver uses 2 cores, `max_parallel_runs_per_node` is at most 16.

---

#### \[ablation\]

`racing`
> aliases: `ablation_racing`
>
> values: boolean
>
> description: Use racing when performing the ablation analysis between the default and configured parameters.

---

#### \[parallel_portfolio\]

`check_interval`
> aliases: `check_interval`
>
> values: int
>
> description: How many seconds the parallel portfolio waits before checking whether jobs have completed. Decreasing this value increases the accuracy of the report, but also significantly increases the computational load.

---

`num_seeds_per_solver`
> aliases: `num_seeds_per_solver`
>
> values: int
>
> description: Only relevant for non-deterministic solvers. The number of different random seeds each solver will be started with.

---

### Priorities

Sparkle offers a lot of flexibility in how settings are passed along.
Settings provided through different channels have different priorities, as follows:

- Default: default values will be overwritten if a value is given through any other mechanism;
- File: settings from `Settings/sparkle_settings.ini` overwrite default values, but are overwritten by settings given through the command line;
- Command line settings file: settings files provided through the command line overwrite default values and other settings files;
- Command line: settings given directly through the command line overwrite all other settings, including settings files provided through the command line;
- Configurators: each configurator has its own option section, and these values take precedence over any value set in the general configurator section.

## Reporting packages

The platform depends on the following user-supplied packages to generate its reports:

- `pdflatex`
- `latex`
- `bibtex`