Using mkwind to manage jobs#
Following the installation tutorial from mkwind, we can use the client commands to manage jobs on the local computing resource. The mkwind daemon offers three main functionalities:
- build, which parses information for jobs to be executed from the engine and builds them locally on the filesystem, according to the configuration file.
- run, which takes jobs that have been built and submits them to the queue.
- postprocess, which takes the completed jobs, archives the raw files, and submits the results to the queue.
The integration between mkwind, the engine, and the mkite database is shown in the following figure:
In this tutorial, we will use each of the commands to manage jobs in a client worker.
Job structure#
As mentioned in previous sections, the mkwind
daemons do not have access to the production database.
Instead, they monitor an engine implemented by the mkite_engines
package.
Therefore, jobs come from an external source to be executed locally.
However, creating a job for local execution also requires a local engine.
In this case, the mkite_engines.LocalEngine
allows the use of filesystem directories as an additional engine.
This creates a few folders in a selected path that will be monitored for job building, running, and post-processing.
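As a rough sketch only (the exact folder names are created by the engine and may differ), a local engine rooted at $HOME/jobs would organize work into per-status folders such as the ones referenced later in this tutorial:

$HOME/jobs/
    ready/      jobs that have been built and are waiting to be submitted
    done/       jobs that finished and are waiting to be postprocessed
    parsing/    serialized JobResults waiting to be parsed into the database
    archive/    archived raw files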
Building jobs#
As explained in the quickstart guide, information to build jobs can be created using a JobInfo object.
This information can be passed around as a simple JSON file, and is often what is found in an instance of mkite_engines
.
Instead of setting up an engine, in this example we will create a JobInfo very similar to the one in the quickstart, and use it to build a job to be submitted to an HPC scheduler or similar.
First, create a JSON file named jobinfo.json
that contains the same information as the one from the tutorial:
{
"job": {"uuid": "75879e52-4bc6-4623-9f91-2721db0aa7c9"},
"recipe": {"name": "conformer.generation"},
"inputs": [
{"smiles": "Cn1c(=O)c2c(ncn2C)n(C)c1=O"}
],
"options": {"force_field": "mmff"}
}
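Before building anything, you can optionally check that the file is well-formed JSON using only the Python standard library:

python -m json.tool jobinfo.json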
Then, make sure your config file for mkwind can build a job of recipe conformer.generation
.
One example of a configuration file named cluster1.yaml
would be:
default:
  nodes: 1
  tasks_per_node: 36
  walltime: 24:00:00
  partition: pbatch
  account: acct
  pre_cmd: |
    source $HOME/.bashrc
    source $HOME/envs/mkite/bin/activate
  cmd: kite run
  post_cmd: |
    touch mkwind-complete

conformer.generation:
  nodes: 1
  tasks_per_node: 1
  walltime: 30:00
  partition: pdebug
  cmd: kite run
Furthermore, if you do not have a settings file for your system, you can create one following the instructions in the configuring guide. An example settings.yaml would be:
MAX_PENDING: 1
MAX_RUNNING: 1
MAX_READY: 1
SCHEDULER: slurm
LOG_PATH: ${_self_}/jobs
BUILD_CONFIG: ${_self_}/../clusters/cluster1.yaml
ENGINE_EXTERNAL: ${_self_}/../engines/global.yaml
ENGINE_LOCAL: ${_self_}/../engines/local.yaml
ENGINE_ARCHIVE: ${_self_}/../engines/archive.yaml
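Assuming that ${_self_} resolves to the directory containing the settings file (an assumption based on the relative paths above), this example implies a layout roughly like the one below; the settings folder name is hypothetical, and the remaining names come from the paths above:

settings/
    settings.yaml
    jobs/              LOG_PATH
clusters/
    cluster1.yaml      BUILD_CONFIG
engines/
    global.yaml        ENGINE_EXTERNAL
    local.yaml         ENGINE_LOCAL
    archive.yaml       ENGINE_ARCHIVE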
As an example, the settings above will build a job using the template for SLURM and the cluster1.yaml configuration file.
This means the conformer.generation job will be created using one node, one task per node, a walltime of 30 minutes, and the pdebug partition.
Although the job could be built directly from an engine, we can use the build_one
command to perform the same action for a given JobInfo file.
First, create the jobs
folder. Then, execute the build_one
command:
mkdir jobs
wind build_one -i jobinfo.json -s settings.yaml -d jobs
The command will build the job and create a folder called jobs/ready, where the job and its scripts will be placed.
If a local engine had been set up (see the section above), the job would have been placed directly in the folder that contains all jobs to be executed (thus, “ready”).
Inside this folder, another folder will be created whose name contains the recipe name (conformer.generation), the UUID of the job, and a timestamp.
The folder contains a job.sh
script that should look like the following:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=30:00
#SBATCH --partition=pdebug
#SBATCH --account=acct
source $HOME/.bashrc
source $HOME/envs/mkite/bin/activate
kite run
touch mkwind-complete
The job has now been created and is ready for submission.
If a single job is required, the job.sh
file can be submitted with sbatch
.
Otherwise, the format is compatible with mkwind’s run
command, which we will use in the following section.
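For reference, submitting a single job by hand would look like the following, where the folder name is a placeholder for the recipe/UUID/timestamp folder described above:

cd jobs/ready/<job-folder>
sbatch job.sh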
Note
The only difference between the build_one
and the build
command is that the latter runs a loop that monitors the engine and the folder where jobs will be run.
The build
command also ensures that the folder containing ready
jobs is never overfilled with jobs, which would make job distribution uneven.
Tip
The number of jobs that can be waiting to be submitted in the ready
state is given in the mkwind settings with the MAX_READY
entry.
Running jobs#
After a job has been built, it can be submitted for running.
The wind run
command monitors the ready
folder and the local job scheduler.
If deployed in a login node of an HPC cluster, the run
daemon will monitor the pending jobs in the queue, and how many can be submitted for execution.
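How many jobs the daemon keeps in each state is controlled by the MAX_PENDING, MAX_RUNNING, and MAX_READY entries of the mkwind settings shown earlier. The values of 1 are convenient for this tutorial; on a production cluster you would presumably raise them, for example (illustrative values only):

MAX_PENDING: 5
MAX_RUNNING: 20
MAX_READY: 10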
Using the example configuration files above, we can create a runner daemon that uses the local folder specified in engines/local.yaml
for job management.
This configuration file could be similar to:
_module: mkite_engines.local
root_path: $HOME/jobs
move: True
Now, you can execute the run
daemon as:
wind run -s settings.yaml -l 60
This will create a folder $HOME/jobs
where your jobs will be placed and monitored.
Furthermore, the daemon will wake up every 60 seconds to check whether more slots are available in the queue.
If you want the daemon to run only once, execute it with a --sleep
or -l
argument equal to or smaller than zero:
wind run -s settings.yaml -l 0
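If you want the daemon to keep running after you log out of the login node, one generic option (not specific to mkwind) is to launch it in the background with nohup:

nohup wind run -s settings.yaml -l 60 > wind-run.log 2>&1 &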
Note
In principle, you can add any job to the $HOME/jobs/ready
folder and mkwind will take care of executing it for you.
The only requirements are:
- The job is inside a folder
- The job folder contains a file called job.sh
Although the job will be executed, it may not be able to be postprocessed by mkwind if it does not follow the jobresults.json and jobinfo.json schemas.
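As a purely hypothetical illustration of these two requirements, a hand-written job could be dropped into the ready folder as follows (the folder name and script contents are arbitrary):

mkdir -p $HOME/jobs/ready/my-manual-job
cat > $HOME/jobs/ready/my-manual-job/job.sh << 'EOF'
#!/bin/bash -l
echo "hello from a manually created job"
EOF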
Running locally#
As local workstations often do not have a scheduler, the daemon has no way to check which resources are available when running jobs locally.
To work around this, one can install a local scheduler such as pueue, which offers excellent support for local job management.
Then, using mkwind to run jobs locally is just a matter of changing the settings.yaml
to use the pueue
daemon:
SCHEDULER: pueue
Warning
Do not forget to run your pueued
daemon to enable mkwind to interact with it.
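If you use pueue, the daemon can be started and checked with its own command-line tools:

pueued -d        # start the pueue daemon in the background
pueue status     # confirm that the daemon is reachable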
Postprocessing jobs#
Once jobs are done, one can postprocess jobs executed with mkite’s JobInfo schema and send the JobResults to the engine. From there, the results can be parsed into mkite without further intervention from the client worker.
Postprocessing jobs requires parsing the jobs from a queue with status done
, archiving the raw files, and sending the results to an engine.
Using the configuration files above, you can run the example job of conformer generation using mkwind
.
Then, with the settings in place, run the postprocess
command:
wind postprocess -s settings.yaml -l 0
This command will:
1. Retrieve the job from the done folder (created by the local engine of mkwind’s run daemon)
2. Process the results
3. Archive the result into an archive folder (also in the same directory, given the configuration file above)
4. Send the results to a parsing folder
You can check the contents of all folders and verify that the parsing
folder has one JSON file corresponding to a serialized JobResults that can be parsed into the mkite database.
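For example, assuming the local engine rooted at $HOME/jobs configured above, the result can be inspected with standard tools (the file name is a placeholder):

ls $HOME/jobs/parsing
python -m json.tool $HOME/jobs/parsing/<jobresults-file>.json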
Note
If the engine is different from the local one, the JobResults will be pushed to that engine. For example, if the selected engine was the Redis engine, we would see the result in the Redis database.