First Grid job
This section summarises all the steps to submit your first job on the Grid, check its status and retrieve the output.
Warning
You can continue with this guide only after you have completed the preparations for Grid. If you skipped that, go back to the Prerequisites section. Still need help with obtaining or installing your certificate? We can help! Contact us at helpdesk@surfsara.nl.
Once you finish with the First Grid job, you can continue with the more advanced topics and with Best practices, the section that contains guidelines for porting real, complex simulations to the Grid.
Grid job lifecycle
To run your application on the Grid you need to describe its requirements in a specific language called job description language (JDL). This is similar to the information that we need to specify when we run jobs using a batch scheduling system like PBS local jobs, although it is slightly more complex as we are now scheduling jobs across multiple sites.
Besides the application requirements, you also need to specify the content of the input/output sandboxes in the JDL. These sandboxes allow you to transfer data to or from the Grid. The input sandbox contains all the files that you want to send with your job to the worker node, for example a script that you want executed. The output sandbox contains all the files that you want to have transferred back to the UI.
Note
The amount of data that you can transfer using the sandboxes is very limited, on the order of a few megabytes (less than 100 MB). This means that you should normally limit the input sandbox to a few script files and the output sandbox to the stderr and stdout files.
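Because the sandbox limit is easy to exceed by accident, it can help to add up the size of the files you intend to ship before you submit. The following is a minimal sketch: the helper names `sandbox_bytes` and `check_sandbox`, the `SANDBOX_LIMIT_BYTES` variable and its 10 MB default are all assumptions made for illustration, not part of the Grid tools.

```shell
# Sum the sizes (in bytes) of the files you plan to list in the input sandbox.
# Prints the total on stdout; fails if one of the files does not exist.
sandbox_bytes() (
    total=0
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; exit 1; }
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
)

# Fail when the total exceeds a limit. The 10 MB default is an assumed,
# conservative threshold chosen for this sketch, not an official value.
check_sandbox() (
    limit=${SANDBOX_LIMIT_BYTES:-10485760}
    total=$(sandbox_bytes "$@") || exit 1
    if [ "$total" -gt "$limit" ]; then
        echo "sandbox too large: $total bytes (limit $limit)" >&2
        exit 1
    fi
    echo "sandbox OK: $total bytes"
)
```

For example, `check_sandbox myscript.sh params.txt` (hypothetical file names) prints the total size, or fails when the files together exceed the chosen threshold.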
Once you have the JDL file ready, you can submit it to multiple clusters with glite-* commands. The Workload Management System (WMS) will schedule your job on a Grid worker node. The purpose of WMS is to distribute and manage tasks across computing resources. More specifically, the WMS will accept your job, assign it to the most appropriate Computing Element (CE), record the job status and retrieve the output.
The following animations illustrate the Grid lifecycle as described above:
StartGridSession
Before submitting your first Grid job, you need to create a proxy from your certificate. The proxy has a short lifetime and saves you from passing your personal certificate along to the Grid. The job will keep a copy of your proxy and pass it along to the Worker Node.
This section will show you how to create a valid proxy:
Log in to your UI account:
$ssh homer@ui.grid.sara.nl # replace "homer" with your username
Create a proxy with the following command and provide your Grid certificate password when prompted:
$startGridSession lsgrid #replace lsgrid with your VO
Alternatively, you might have to log in to a VO group. In that case, the syntax is as follows:
$startGridSession lsgrid:/lsgrid/vo_group #replace both the 'lsgrid' words with your VO and 'vo_group' with the name of your VO group
You should see output similar to the following in your terminal:
Now starting...
Please enter your GRID password:
voms-proxy-init -voms lsgrid --valid 168:00 -pwstdin
Contacting voms.grid.sara.nl:30018 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "lsgrid"...
Remote VOMS server contacted successfully.

Created proxy in /tmp/x509up_u39111.

Your proxy is valid until Tue Jan 11 09:31:56 CET 2016
Your identity: /O=dutchgrid/O=users/O=sara/CN=Homer Simpson
Creating proxy ..................................................... Done
Proxy Verify OK
Your proxy is valid until: Tue Jan 11 09:31:56 2016
A proxy valid for 168 hours (7.0 days) for user /O=dutchgrid/O=users/O=sara/CN=Homer Simpson now exists on px.grid.sara.nl.
Your delegation ID is: homer
Note
What does the startGridSession script actually do?
- It generates a local proxy x509up_uXXX in the UI /tmp/ directory
- It uploads this proxy to the MyProxy server
- It delegates the proxy to the WMS with your user name as the delegation ID (DID)
If you want to know more, see the advanced section about Grid authentication.
And now you are ready to submit jobs to the Grid, or to copy data to and from the Grid!
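Since the proxy expires, it can be convenient to check how much lifetime is left before a long submission session. The sketch below relies on `voms-proxy-info --timeleft`, which prints the remaining lifetime of the local proxy in seconds. The helper names and the 24-hour default margin are assumptions chosen for illustration.

```shell
# Succeed if at least a minimum number of hours of proxy lifetime remain.
# $1: seconds left, $2: minimum hours (default 24, an assumed safety margin).
proxy_has_time() {
    [ "$1" -ge $(( ${2:-24} * 3600 )) ]
}

# Print the remaining lifetime of the local proxy in seconds.
# Prints 0 when voms-proxy-info is not installed or no proxy is found.
proxy_seconds_left() {
    if command -v voms-proxy-info >/dev/null 2>&1; then
        left=$(voms-proxy-info --timeleft 2>/dev/null) || left=0
        echo "${left:-0}"
    else
        echo 0
    fi
}
```

A possible use on the UI is `proxy_has_time "$(proxy_seconds_left)" 24 || startGridSession lsgrid`, which starts a new session only when less than a day is left.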
Describe your job in a JDL file
To submit a Grid job you must describe it in a plain-text file, called a JDL file. Optionally, you can check the Computing Elements (CEs) that this job may run on. The JDL file will pass the details of your job to the WMS.
Warning
Make sure you have started your session and already created a valid proxy.
Log in to your User Interface.
Create a file with the following content describing the job requirements. Save it as simple.jdl:
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
Arguments = "-f";
StdOutput = "simple.out";
StdError = "simple.err";
OutputSandbox = {"simple.out","simple.err"};
This job involves no large input or output files. It will return to the user the hostname of the Worker Node that the job lands on. The hostname is written to the StdOutput file simple.out, which is declared in the OutputSandbox statement.
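If you prefer to create the JDL file from the command line instead of an editor, a here-document works well. This is only a sketch: `write_jdl` and its four positional parameters are hypothetical helpers, and the attributes written are exactly the seven used in simple.jdl.

```shell
# Write a minimal JDL file that runs one command and returns stdout/stderr.
# $1: path of the JDL file to create
# $2: executable, $3: its arguments
# $4: base name used for the .out and .err files
write_jdl() {
    cat > "$1" <<EOF
Type = "Job";
JobType = "Normal";
Executable = "$2";
Arguments = "$3";
StdOutput = "$4.out";
StdError = "$4.err";
OutputSandbox = {"$4.out","$4.err"};
EOF
}
```

Calling `write_jdl simple.jdl /bin/hostname -f simple` reproduces the file described in this section.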
Job list match
Before actually submitting the job, you can optionally check which Computing Elements satisfy your job description. This check does not guarantee anything about the CE load; it only matches your JDL criteria against the available VO resources:
$glite-wms-job-list-match -a simple.jdl # replace simple.jdl with your JDL file
Alternatively, use your delegation ID:
$glite-wms-job-list-match -d homer simple.jdl # replace homer with your delegation id, in this case your login name
Note
The -a option should not be used frequently. It creates a proxy of your certificate ‘on-the-fly’ when the job is submitted; therefore -a is quite inefficient when submitting hundreds of jobs.
Your job is now ready. Continue to the next step to submit it to the Grid!
Submit the job to the Grid
At this point you should have your simple.jdl file ready on your UI. When you submit this simple Grid job to the WMS, a job will be created and sent to a remote Worker Node. There it will execute the command /bin/hostname -f and write its standard output and standard error to simple.out and simple.err respectively.
Submit the simple job by typing this command in your UI terminal:
$glite-wms-job-submit -d $USER -o jobIds simple.jdl

Connecting to the service https://wms2.grid.sara.nl:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA

The job identifier has been saved in the following file:
/home/homer/jobIds

==========================================================================
Note the use of -d $USER to tell your job that it should use your delegated proxy certificate.
The option -o allows you to specify a file (in this case jobIds) to store the unique job identifier:
- You can use this URL identifier to monitor your job from the command line or your browser and to get the job output.
- Note that omitting the -o option means that the jobID is not saved in a file. If you do not save this ID, you will effectively lose the output of your job!
The jobID string looks like this:
$cat jobIds
###Submitted Job Ids###
https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA
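When you script around the jobIds file, you usually want only the identifier lines and not the header. The following sketch assumes the file layout shown above (a `###Submitted Job Ids###` header followed by one `https://` URL per line); the function names are hypothetical.

```shell
# Print the job identifiers stored in a jobIds file, one per line,
# skipping the "###Submitted Job Ids###" header and any blank lines.
job_ids() {
    grep '^https://' "$1"
}

# Print only the last identifier listed in the file.
last_job_id() {
    job_ids "$1" | tail -n 1
}
```

For instance, `glite-wms-job-status "$(last_job_id jobIds)"` queries the status of the last job listed in the file.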
Track the job status
To check the current status of a submitted job from the command line, query the WMS with the following command:
$glite-wms-job-status https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
Alternatively, if you have saved your jobIds into a file you can use the -i option and the filename as argument:
$glite-wms-job-status -i jobIds
Finally, a third (optional) way to check the job status is with the web browser in which you installed your certificate. In this browser open the jobID link:
https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
Note that the URL can only be accessed by you as you are authenticated to the server with the certificate installed in this browser. If your certificate is not installed in this browser, you will get an authentication error.
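Rather than re-running the status command by hand, you could poll it from a small loop. The sketch below makes two assumptions that you should verify against the output on your UI: that glite-wms-job-status prints a line of the form `Current Status:     Done (Success)`, and that the states Done, Aborted, Cancelled and Cleared are final. The function names and the 60-second default interval are hypothetical.

```shell
# Extract the status text from glite-wms-job-status output read on stdin.
# Assumes a line that starts with "Current Status:".
current_status() {
    sed -n 's/^Current Status:[[:space:]]*//p' | head -n 1
}

# Succeed if the given status is one of the (assumed) final states.
is_final_status() {
    case "$1" in
        Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll one job until it reaches a final state.
# $1: job identifier URL, $2: seconds between polls (default 60).
wait_for_job() {
    while :; do
        status=$(glite-wms-job-status "$1" | current_status)
        [ -n "$status" ] || { echo "could not read job status" >&2; return 1; }
        echo "status: $status"
        is_final_status "$status" && return 0
        sleep "${2:-60}"
    done
}
```

Keep the polling interval generous; each call is a query to the WMS.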
Cancel job
If you realise that you need to cancel a submitted job, use the following command:
$glite-wms-job-cancel https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
Alternatively, you can use the jobIds file:
$glite-wms-job-cancel -i jobIds
Retrieve the output
The output consists of the files included in the OutputSandbox statement. You can retrieve the job output once the job has completed successfully, in other words once the job status has changed from RUNNING to DONE. The files in the output sandbox can be downloaded for approximately one week after the job finishes.
Note
You can choose the output directory with the --dir option. If you do not use this option then the output will be copied under the UI /scratch directory with a name based on the ID of the job.
To get the output, type:
$glite-wms-job-output https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
Alternatively, you can use the jobIds file:
$glite-wms-job-output --dir . -i jobIds
where you should substitute jobIds with the file that you used to store the job IDs.
If you omitted the --dir option, your output is stored in the /scratch directory on the UI. Please remove your files from the /scratch directory when they are no longer necessary. Also keep in mind that if the /scratch directory becomes too full, the administrators remove the older files until enough space is available again.
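To find out which of your output directories are candidates for removal, you can list the entries that you own and that have not been modified for a while. This is a sketch only: `old_outputs` is a hypothetical helper, the 7-day default is an arbitrary choice, and the function only lists entries and never deletes anything.

```shell
# List top-level entries in a directory that belong to the current user
# and were last modified more than a given number of days ago.
# $1: directory (default /scratch), $2: age in days (default 7).
old_outputs() {
    find "${1:-/scratch}" -mindepth 1 -maxdepth 1 \
        -user "$(id -u)" -mtime "+${2:-7}"
}
```

Review the listing before removing anything, for example with `old_outputs /scratch 14`.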
Check job output
To check your job output, browse into the downloaded output directory. This includes the simple.out and simple.err files specified in the OutputSandbox statement:
$ls -l /home/homer/homer_JIVYfkMxtnRFWweGsx0XAA/
-rw-rw-r-- 1 homer homer  0 Jan  5 18:06 simple.err
-rw-rw-r-- 1 homer homer 20 Jan  5 18:06 simple.out
$cat /home/homer/homer_JIVYfkMxtnRFWweGsx0XAA/simple.out # displays the hostname of the Grid worker node where the job landed
wn01.lsg.bcbr.uu.nl
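In the example above the output directory is named after the user and the last part of the job identifier. Assuming this pattern holds for your jobs as well (it is inferred from the example, not from a specification), the directory name can be derived from the jobID. `job_output_dir` is a hypothetical helper.

```shell
# Derive the output directory name from a user name and a job identifier,
# following the <user>_<last URL component> pattern seen in the example.
# $1: user name, $2: job identifier URL.
job_output_dir() {
    echo "${1}_${2##*/}"
}
```

With the job from this section, `job_output_dir homer https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA` prints `homer_JIVYfkMxtnRFWweGsx0XAA`.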
Recap & Next Steps
Congratulations! You have just executed your first job on the Grid!
Let’s summarise what we’ve seen so far.
You interact with the Grid via the UI machine ui.grid.sara.nl. You describe each job in a JDL (Job Description Language) file where you list which program should be executed and what the worker node requirements are. From the UI, you first create a proxy of your Grid certificate and then submit your job with glite-* commands. The resource broker, called WMS (short for Workload Management System), accepts your jobs, assigns them to the most appropriate CE (Computing Element), records the job statuses and retrieves the output.
This is a short overview of the commands needed to handle simple jobs:
startGridSession | startGridSession lsgrid |
submit job | glite-wms-job-submit -d $USER -o jobIds simple.jdl |
job status | glite-wms-job-status -i jobIds |
cancel job | glite-wms-job-cancel -i jobIds |
retrieve job output | glite-wms-job-output --dir . -i jobIds |
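The commands in this overview can be wrapped in one small dispatcher so that the options stay consistent between steps. This is a sketch under stated assumptions: `run`, `grid_job` and the `DRY_RUN` variable are hypothetical names, and the defaults `jobIds` and `simple.jdl` are simply the file names used in this section.

```shell
# Run a command, or only print it when DRY_RUN=1 (useful for checking
# what would be executed without contacting the WMS).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Map a short action name onto the matching glite-wms-job-* command.
# $1: action, $2: job ID file (default jobIds), $3: JDL file (default simple.jdl).
grid_job() {
    action=$1
    ids=${2:-jobIds}
    jdl=${3:-simple.jdl}
    case "$action" in
        submit) run glite-wms-job-submit -d "$USER" -o "$ids" "$jdl" ;;
        status) run glite-wms-job-status -i "$ids" ;;
        cancel) run glite-wms-job-cancel -i "$ids" ;;
        output) run glite-wms-job-output --dir . -i "$ids" ;;
        *) echo "usage: grid_job submit|status|cancel|output [ids_file] [jdl_file]" >&2
           return 2 ;;
    esac
}
```

Running `DRY_RUN=1 grid_job submit` prints the submit command without executing it, which is a cheap way to check the arguments first.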
See also
Now try to port your own application to the Grid. Check out the Best practices section and run the example that suits your use case. The Advanced topics section will help you understand several of the Grid modules used in the Best practices.
Done with the General, but not sure how to proceed? We can help! Contact us at helpdesk@surfsara.nl.