First Grid job

This section summarises all the steps to submit your first job on the Grid, check its status and retrieve the output:

Warning

You can continue with this guide only after you have completed the preparations for Grid. If you skipped that, go back to the Prerequisites section. Still need help with obtaining or installing your certificate? We can help! Contact us at helpdesk@surfsara.nl.

Once you finish with the First Grid job, you can continue with more advanced topics and also Best practices, the section that contains guidelines for porting real complex simulations on the Grid.

Grid job lifecycle

To run your application on the Grid you need to describe its requirements in a specific language called job description language (JDL). This is similar to the information that we need to specify when we run jobs using a batch scheduling system like PBS local jobs, although it is slightly more complex as we are now scheduling jobs across multiple sites.

Except for the application requirements, you also need to specify in the JDL the content of the input/output sandboxes. These sandboxes allow you to transfer data to or from the Grid. The input sandbox contains all the files that you want to send with your job to the worker node, like e.g. a script that you want executed. The output sandbox contains all the files that you want to have transferred back to the UI.

Note

The amount of data that you can transfer using the sandboxes is very limited, in the order of a few megabytes (less than 100MB). This means that you should normally limit the input sandbox to a few script files and the output sandbox to the stderr and stdout files.

Once you have the JDL file ready, you can submit it to multiple clusters with glite-* commands. The Workload Management System (WMS) will schedule your job on a Grid worker node. The purpose of WMS is to distribute and manage tasks across computing resources. More specifically, the WMS will accept your job, assign it to the most appropriate Computing Element (CE), record the job status and retrieve the output.

The following animations illustrate the Grid lifecycle as described above:

StartGridSession

Before submitting your first Grid job, you need to create a proxy from your certificate. This has a short lifetime and prevents you from passing along your personal certificate to the Grid. The job will keep a copy of your proxy and pass it along to the Worker Node.

This section will show you how to create a valid proxy:

  • Log in to your UI account:

    $ssh homer@ui.grid.sara.nl # replace "homer" with your username
    
  • Create a proxy with the following command and provide your Grid certificate password when prompted:

    $startGridSession lsgrid  #replace lsgrid with your VO
    

    Alternatively, you might have to login to a VO group. In that case, the syntax is as follows:

    $startGridSession lsgrid:/lsgrid/vo_group #replace both the 'lsgrid' words with your VO and 'vo_group' with the name of your VO group
    

    You should see a similar output displayed in your terminal:

    Now starting...
    Please enter your GRID password:
    voms-proxy-init -voms lsgrid --valid 168:00 -pwstdin
    Contacting voms.grid.sara.nl:30018 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "lsgrid"...
    Remote VOMS server contacted successfully.
    
    Created proxy in /tmp/x509up_u39111.
    
    Your proxy is valid until Tue Jan 11 09:31:56 CET 2016
    Your identity: /O=dutchgrid/O=users/O=sara/CN=Homer Simpson
    Creating proxy ..................................................... Done
    Proxy Verify OK
    Your proxy is valid until: Tue Jan 11 09:31:56 2016
    A proxy valid for 168 hours (7.0 days) for user /O=dutchgrid/O=users/O=sara/CN=Homer Simpson now exists on px.grid.sara.nl.
    Your delegation ID is: homer
    

Note

What does the startGridSession script actually do?

  • It generates a local proxy x509up_uXXX in the UI /tmp/ directory
  • It uploads this proxy to Myproxy server
  • It delegates the proxy to the WMS with your user name as the delegation ID (DID)

If you want to know more, see the advanced section about Grid authentication.

And now you are ready to submit jobs to the Grid! Or copy data from and to the Grid.

Describe your job in a JDL file

To submit a Grid job you must describe this in a plain text file, called JDL. Optionally, you can check the Computing Elements (CEs) that this job may run on. The JDL file will pass the details of your job to the WMS.

Warning

Make sure you have started your session and created already a valid proxy.

  • Log in to your User Interface.

  • Create a file with the following content describing the job requirements. Save it as simple.jdl:

    1
    2
    3
    4
    5
    6
    7
    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/hostname";
    Arguments = "-f";
    StdOutput = "simple.out";
    StdError = "simple.err";
    OutputSandbox = {"simple.out","simple.err"};
    

This job involves no large input or output files. It will return to the user the hostname of the Worker Node that the job will land on. This is specified as the StdOutput file simple.out declared in the OutputSandbox statement.

Job list match

Before actually submitting the job, you can optionally check the matching Computing Elements that satisfy your job description. It does not guarantee anything about the CE load, just matches your JDL criteria with the available VO resources:

$glite-wms-job-list-match -a simple.jdl # replace simple.jdl with your JDL file

Alternatively, use your delegation ID:

$glite-wms-job-list-match -d homer simple.jdl # replace homer with your delegation id, in this case your login name

Note

The -a option should not be used frequently. It creates a proxy of your certificate ‘on-the-fly’ when the job is submitted; therefore -a is quite inefficient when submitting hundreds of jobs.

Your job is now ready. Continue to the next step to submit it to the Grid!

To submit your first Grid job and get an understanding of the job lifecycle, we will perform these steps:

Submit the job to the Grid

You should have your simple.jdl file ready in your UI up to this point. When you submit this simple Grid job to the WMS, a job will be created and sent to a remote Worker Node. There it will execute the command /bin/hostname -f and write its standard output and its standard error in the simple.out and simple.err respectively.

  • Submit the simple job by typing in your UI terminal this command:

    $glite-wms-job-submit -d $USER -o jobIds simple.jdl
    
    Connecting to the service https://wms2.grid.sara.nl:7443/glite_wms_wmproxy_server
    ====================== glite-wms-job-submit Success ======================
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    
    https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA
    
    The job identifier has been saved in the following file:
    /home/homer/jobIds
    ==========================================================================
    

Note the use of -d $USER to tell your job that it should use your delegated proxy certificate.

The option -o allows you to specify a file (in this case jobIDs) to store the unique job identifier:

  • You can use this URL identifier to monitor your job from the command line or your browser and to get the job output.
  • Note that omitting the -o option means that the jobID is not saved in a file. When you do not save this id you will effectively loose the output of your job!

The jobID string looks like this:

$cat jobIds

 ###Submitted Job Ids###
 https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA

Track the job status

To check the current job status from the command line, apply the following command that queries the WMS for the status of the job.

  • After submitting the job, type:

    $glite-wms-job-status https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
    
  • Alternatively, if you have saved your jobIds into a file you can use the -i option and the filename as argument:

    $glite-wms-job-status -i jobIds
    
  • Finally, a third (optional) way to check the job status is with the web browser in which you installed your certificate. In this browser open the jobID link:

Note that the URL can only be accessed by you as you are authenticated to the server with the certificate installed in this browser. If your certificate is not installed in this browser, you will get an authentication error.

Cancel job

  • If you realize that you need to cancel a submitted job, use the following command:

    $glite-wms-job-cancel https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
    
  • Alternatively, you can use the jobIds file:

    $glite-wms-job-cancel -i jobIds
    

Retrieve the output

The output consists of the files included in the OutputSandbox statement. You can retrieve the job output once it is successfully completed, in other words the job status has changed from RUNNING to DONE. The files in the output sandbox can be downloaded for approximately one week after the job finishes.

Note

You can choose the output directory with the --dir option. If you do not use this option then the output will be copied under the UI /scratch directory with a name based on the ID of the job.

  • To get the output, type:

    $glite-wms-job-output https://wms2.grid.sara.nl:9000/JIVYfkMxtnRFWweGsx0XAA #replace with your jobID
    
  • Alternatively, you can use the jobIDs file:

    $glite-wms-job-output --dir . -i jobIds
    

where you should substitute jobIds with the file that you used to store the job ids.

If you omitted the --dir option, your output stored on the /scratch directory on the UI. Please remove your files from the /scratch directory when they are no longer necessary. Also keep in mind that if the /scratch directory becomes too full, the administrators remove the older files until enough space is available again.

Check job output

  • To check your job output, browse into the downloaded output directory. This includes the simple.out, simple.err files specified in the OutputSandbox statement:

    $ls -l /home/homer/homer_JIVYfkMxtnRFWweGsx0XAA/
    
    -rw-rw-r-- 1 homer homer  0 Jan  5 18:06 simple.err
    -rw-rw-r-- 1 homer homer 20 Jan  5 18:06 simple.out
    
    $cat /home/homer/homer_JIVYfkMxtnRFWweGsx0XAA/simple.out # displays the hostname of the Grid worker node where the job landed
    wn01.lsg.bcbr.uu.nl
    

Recap & Next Steps

Congratulations! You have just executed your first job to the Grid!

Let’s summarise what we’ve seen so far.

You interact with the Grid via the UI machine ui.grid.sara.nl. You describe each job in a JDL (Job Description Language) file where you list which program should be executed and what are the worker node requirements. From the UI, you create first a proxy of your Grid certificate and submit your job with glite-* commands. The resource broker, called WMS (short for Workload Management System), accepts your jobs, assigns them to the most appropriate CE (Computing Element), records the jobs statuses and retrieves the output.

This is a short overview of the commands needed to handle simple jobs:

startGridSession startGridSession lsgrid
submit job glite-wms-job-submit -d $USER -o jobIds simple.jdl
job status glite-wms-job-status -i jobIds
cancel job glite-wms-job-cancel -i jobIds
retrieve job output glite-wms-job-output --dir . -i jobIds

See also

Try now to port your own application to the Grid. Check out the Best practices section and run the example that suits your use case. The section Advanced topics will help your understanding for several Grid modules used in the Best practices.

Done with the General, but not sure how to proceed? We can help! Contact us at helpdesk@surfsara.nl.