Example 01 - Simple

The simplest example is one job type, where each task has a single thread.

#Filename - preprocess.sh
#HPC jobname=preprocess
#HPC commands_per_node=1

Here is a bird's-eye view of a simple example.

Submission and output directory structure

Example 01-1

Job execution

Example 01-2

Examine the output

You can examine the full JupyterHub notebook here, and it is included below.

HPC Runner Submission

Submission file

%%bash

cat example_001/preprocess.sh
#HPC jobname=preprocess
#HPC commands_per_node=1
#HPC walltime=00:00:30
#HPC module=gencore/1 gencore_dev
echo "preprocess sample1" && sleep 30 
echo "preprocess sample2" && sleep 30 
echo "preprocess sample3" && sleep 30
echo "preprocess sample4" && sleep 30
echo "preprocess sample5" && sleep 30
echo "preprocess sample6" && sleep 30
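A submission file like the one above is a few `#HPC` header lines followed by one command per sample, so it can be generated rather than typed by hand. The following is a minimal sketch: `write_preprocess_script` and the output name `preprocess_generated.sh` are hypothetical, and the site-specific `module` line is left out on purpose.

```shell
# Sketch: write a submission file with one command line per sample.
# write_preprocess_script is a hypothetical helper, not part of HPC Runner.
write_preprocess_script() {
    out_file=$1
    shift
    {
        # Header directives copied from the example above.
        printf '%s\n' '#HPC jobname=preprocess'
        printf '%s\n' '#HPC commands_per_node=1'
        printf '%s\n' '#HPC walltime=00:00:30'
        # One command line per sample name passed as an argument.
        for sample in "$@"; do
            printf 'echo "preprocess %s" && sleep 30\n' "$sample"
        done
    } > "$out_file"
}

write_preprocess_script preprocess_generated.sh sample1 sample2 sample3
cat preprocess_generated.sh
```

Passing six sample names would reproduce the six command lines shown in the example.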

Submit to Slurm

hpcrunner.pl submit_jobs --infile preprocess.sh
[2016/11/03 08:50:03] Beginning to submit jobs to the scheduler 
[2016/11/03 08:50:03] Schedule is preprocess 
[2016/11/03 08:50:03] Submitting all preprocess job types 
[2016/11/03 08:50:04] Submitted batch job 23162

[2016/11/03 08:50:04] Submited job /scratch/gencore/nov_dalma_training/example_001/hpc-runner/scratch/001_preprocess.sh
        With Slurm jobid 23162 
[2016/11/03 08:50:04] There are 6 batches for job type preprocess
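The job id reported in the output above is handy for follow-up queries to the scheduler. As a sketch, a small filter can pull it out of saved submission output; `extract_jobid` is a hypothetical helper, and it assumes the `Submitted batch job <id>` wording shown above.

```shell
# extract_jobid is a hypothetical helper: it reads submission output on stdin
# and prints the id from each "Submitted batch job <id>" line.
extract_jobid() {
    sed -n 's/.*Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

jobid=$(printf '%s\n' '[2016/11/03 08:50:04] Submitted batch job 23162' | extract_jobid)
echo "$jobid"

# With the id in hand, Slurm's own tools can report on the job, for example:
#   squeue -j "$jobid"
```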

Directory Structure

%%bash

tree example_001
example_001
├── hpc-runner
│   ├── logs
│   │   ├── 2016-11-03-001_preprocess
│   │   │   ├── 2016-11-03-CMD_001-PID_13774.md
│   │   │   ├── 2016-11-03-CMD_002-PID_13778.md
│   │   │   ├── 2016-11-03-CMD_003-PID_13776.md
│   │   │   ├── 2016-11-03-CMD_004-PID_16555.md
│   │   │   ├── 2016-11-03-CMD_005-PID_16553.md
│   │   │   ├── 2016-11-03-CMD_006-PID_16551.md
│   │   │   └── MAIN_2016-11-03.log
│   │   └── 2016-11-03-hpcrunner_logs
│   │       ├── 001_preprocess.log
│   │       └── 001-process_table.md
│   └── scratch
│       ├── 001_preprocess_001.in
│       ├── 001_preprocess_002.in
│       ├── 001_preprocess_003.in
│       ├── 001_preprocess_004.in
│       ├── 001_preprocess_005.in
│       ├── 001_preprocess_006.in
│       └── 001_preprocess.sh
└── preprocess.sh

5 directories, 17 files
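In the tree above, each `.in` file under `scratch` has a matching `CMD_*` task log under `logs`. A quick way to check that this holds after a run is to count both sets of files. This is only a sketch: `count_matches` is a hypothetical helper, and the glob patterns are inferred from the file names in this example.

```shell
# count_matches is a hypothetical helper: count the files under a directory
# whose names match a glob pattern.
count_matches() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

run_dir=example_001/hpc-runner
if [ -d "$run_dir" ]; then
    batches=$(count_matches "$run_dir/scratch" '*_preprocess_*.in')
    task_logs=$(count_matches "$run_dir/logs" '*-CMD_*-PID_*.md')
    echo "batch files: $batches, task logs: $task_logs"
fi
```

If the two counts differ, some batches may still be queued or running.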

Task Log Output

Each individual task gets its own output file. The path is built from the date of submission, the job type, the date of execution, the task count, and the process ID.

%%bash

cat example_001/hpc-runner/logs/2016-11-03-001_preprocess/2016-11-03-CMD_001-PID_13774.md
2016/11/03 08:50:11: INFO Starting Job: 1 
Cmd is echo "preprocess sample1" && sleep 30 
 2016/11/03 08:50:11: INFO preprocess sample1
 2016/11/03 08:50:41: INFO Finishing job 1 with ExitCode 0
 2016/11/03 08:50:41: INFO Total execution time 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds
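The naming scheme makes it possible to recover those fields from a log path alone. The sketch below splits the path used above; `describe_task_log` is a hypothetical helper, and it assumes dates are always written as `YYYY-MM-DD` and that the `CMD_` and `PID_` markers appear as in this example.

```shell
# describe_task_log is a hypothetical helper that splits a task log path
# into submission date, job type, execution date, task count and process ID.
describe_task_log() {
    dir=$(basename "$(dirname "$1")")    # e.g. 2016-11-03-001_preprocess
    file=$(basename "$1" .md)            # e.g. 2016-11-03-CMD_001-PID_13774
    submitted=$(printf '%s' "$dir" | cut -c1-10)
    jobtype=$(printf '%s' "$dir" | cut -c12-)
    executed=$(printf '%s' "$file" | cut -c1-10)
    task=${file#*-CMD_}                  # drop everything up to "-CMD_"
    task=${task%%-PID_*}                 # drop "-PID_" and what follows
    pid=${file##*-PID_}                  # keep what follows "-PID_"
    echo "submitted=$submitted jobtype=$jobtype executed=$executed task=$task pid=$pid"
}

describe_task_log example_001/hpc-runner/logs/2016-11-03-001_preprocess/2016-11-03-CMD_001-PID_13774.md
```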

Slurm Log Output

Additionally, all output from the scheduler is logged. This is useful when debugging submissions. If, for instance, we had mistyped a module name, submitted to the wrong queue, or requested impossible resources, this would be recorded here.

%%bash

cat example_001/hpc-runner/logs/2016-11-03-hpcrunner_logs/001_preprocess.log
Module 'gencore/1' is already loaded
Module 'gencore/1' is already loaded
Module 'gencore/1' is already loaded
Module 'gencore/1' is already loaded
Module 'gencore/1' is already loaded
Module 'gencore/1' is already loaded
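When a submission misbehaves, scanning the scheduler logs for failure keywords is a reasonable first step. The following is a sketch only: `scan_scheduler_logs` is a hypothetical helper, and the keyword list is a generic guess, not a set of messages documented by HPC Runner or Slurm.

```shell
# scan_scheduler_logs is a hypothetical helper. The patterns are generic
# guesses at failure keywords; adjust them to what your cluster prints.
scan_scheduler_logs() {
    grep -n -i -E 'error|invalid|not found|denied|unable' "$@"
}

log=example_001/hpc-runner/logs/2016-11-03-hpcrunner_logs/001_preprocess.log
if [ -f "$log" ]; then
    # grep exits non-zero when nothing matches, so report that case explicitly.
    scan_scheduler_logs "$log" || echo "no suspicious lines in $log"
fi
```

A clean log such as the one shown above would produce no matches.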

Process Table Output

The process table covers the whole job. It records the version (more on this later), the Slurm scheduler ID, the Slurm job name, any task tags, the process ID, the exit code, and the duration.

It is highly recommended to record this table in a project management tool.

%%bash

ls example_001/hpc-runner/logs/2016-11-03-hpcrunner_logs/001-process_table.md
example_001/hpc-runner/logs/2016-11-03-hpcrunner_logs/001-process_table.md
| Version | Scheduler Id | Jobname | Task Tags | ProcessID | ExitCode | Duration |
|---|---|---|---|---|---|---|
| 0.0 | 23162 | 001_preprocess | | 16551 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
| 0.0 | 23167 | 001_preprocess | | 16553 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
| 0.0 | 23163 | 001_preprocess | | 13774 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
| 0.0 | 23166 | 001_preprocess | | 16555 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
| 0.0 | 23165 | 001_preprocess | | 13776 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
| 0.0 | 23164 | 001_preprocess | | 13778 | 0 | 0 years, 00 months, 0 days, 00 hours, 00 minutes, 30 seconds |
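Because the process table records an exit code per task, it can be filtered for failures. The sketch below assumes the file is stored as a pipe-delimited Markdown table with the seven columns listed above; that on-disk layout is an assumption, and `failed_tasks` is a hypothetical helper.

```shell
# failed_tasks is a hypothetical helper. It assumes a pipe-delimited table with
# the columns: Version | Scheduler Id | Jobname | Task Tags | ProcessID | ExitCode | Duration
failed_tasks() {
    awk -F'|' '
        # With a leading "|", field 1 is empty, so ExitCode is field 7.
        # Header and separator rows are skipped because field 7 is not numeric.
        $7 ~ /^[ \t]*[0-9]+[ \t]*$/ && ($7 + 0) != 0 {
            sid = $3; gsub(/[ \t]/, "", sid)
            pid = $6; gsub(/[ \t]/, "", pid)
            print "scheduler_id=" sid " pid=" pid " exit=" ($7 + 0)
        }
    ' "$1"
}

table=example_001/hpc-runner/logs/2016-11-03-hpcrunner_logs/001-process_table.md
if [ -f "$table" ]; then
    failed_tasks "$table"
fi
```

For the run shown above, every exit code is 0, so the function would print nothing.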
