now for HLRN III

This is a new version of my now Python script for monitoring HLRN III.
I am usually on the Hannover partition, but it should work perfectly in Berlin too.

In this version, I kept all the features of the cluster version except the “gossip” section, which showed who is calculating what and how many resources they are using. A supercomputer simply has too many users for that: we would need a terminal of a couple of square meters.

Screenshot: now version 0.5, specially modified to work on the HLRN III supercomputer in Hannover and Berlin.

As can be seen in the picture, it shows:

  • Job ID (the one you need for cancelling/holding the job)
  • Number of nodes
  • Estimated time until the job starts calculating (red) or until it reaches its walltime (green)
  • Path to the output
  • Notification of finished jobs
  • Job history for the last 14 days (only the last day is displayed by default)
  • Input/output performance check* for detecting cases where, e.g., our MPI code falls into a “livelock” and stops producing results but keeps wasting CPU time

(* The check contains a little hack for the time: apparently the clocks of the front-ends and the compute nodes are not synchronized via NTP, so the output files of running calculations are marked as modified 18 seconds in the future.)
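A minimal Python sketch of two of the features above: the red/green time column and the I/O check with the 18-second correction. It is not taken from the actual now script. The `Job` record, its fields, and the helper names (`time_column`, `output_age`, `is_stalled`) are hypothetical, and the sketch assumes the scheduler's start estimate and the job's start time have already been parsed into `datetime` objects.

```python
import os
import time
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# ANSI escape codes for the red/green time column
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"

# Offset reported in the post: output files written on the nodes look
# modified about 18 s in the future when seen from the front-ends.
CLOCK_SKEW = 18.0


@dataclass
class Job:
    """Hypothetical job record; the field names are illustrative only."""
    job_id: str
    nodes: int
    running: bool
    est_start: Optional[datetime]  # scheduler's start estimate (queued jobs)
    started: Optional[datetime]    # actual start time (running jobs)
    walltime: timedelta            # requested walltime limit
    output: str                    # path to the job's output file


def format_delta(delta: timedelta) -> str:
    """Format a timedelta as HH:MM:SS, clamping negative values to zero."""
    hours, rest = divmod(max(0, int(delta.total_seconds())), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def time_column(job: Job, now: datetime, color: bool = True) -> str:
    """Time until start (red) if queued, time until walltime (green) if running."""
    if job.running and job.started is not None:
        text, code = format_delta(job.started + job.walltime - now), GREEN
    elif job.est_start is not None:
        text, code = format_delta(job.est_start - now), RED
    else:
        text, code = "--:--:--", RED  # no start estimate available yet
    return f"{code}{text}{RESET}" if color else text


def output_age(path: str, now: Optional[float] = None) -> float:
    """Seconds since the output file was last written, corrected for clock skew."""
    if now is None:
        now = time.time()
    # The node clock is ahead, so the real write time is mtime - CLOCK_SKEW.
    age = now - (os.path.getmtime(path) - CLOCK_SKEW)
    return max(0.0, age)  # guard against a skew larger than assumed


def is_stalled(path: str, max_idle: float, now: Optional[float] = None) -> bool:
    """Flag a job whose output has not changed for more than max_idle seconds."""
    return output_age(path, now) > max_idle
```

A queued job whose estimated start is 90 minutes away shows `01:30:00` in red, and `is_stalled(job.output, max_idle=1800)` would flag a running job whose output file has not changed for half an hour (the threshold is an arbitrary example). Subtracting a fixed offset is only one way to absorb the skew; for files written on the front-ends themselves it overstates the age by 18 seconds, which is harmless for a check measured in minutes.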

You can download the script from my GitHub repository: Source code for the HLRN III version of now.

If you are using HLRN III, you can simply make a symbolic link to my scripts directory, located at:

/home/h/hbcjulen/Scripts
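On the command line this is just `ln -s /home/h/hbcjulen/Scripts ~/now-scripts`. The same step as a small Python sketch is shown below; the function name `link_scripts` and the link location `~/now-scripts` are hypothetical choices, and only the target directory comes from this post.

```python
import os
from pathlib import Path

# Shared scripts directory given in the post (HLRN III only).
SHARED_SCRIPTS = "/home/h/hbcjulen/Scripts"


def link_scripts(target: str = SHARED_SCRIPTS,
                 link: str = "~/now-scripts") -> Path:
    """Create a symlink `link` -> `target`; safe to run more than once.

    The default link name "~/now-scripts" is a hypothetical choice.
    """
    link_path = Path(link).expanduser()
    if link_path.is_symlink():
        if os.readlink(link_path) == target:
            return link_path  # already points at the shared directory
        raise FileExistsError(f"{link_path} already links elsewhere")
    if link_path.exists():
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(target, target_is_directory=True)
    return link_path
```

Calling `link_scripts()` once creates the link; calling it again is a no-op, and it refuses to overwrite anything that is already at the link location.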

Happy calculating

2 thoughts on “now for HLRN III”

  1. Sebastian

    Hey Julen,

    the now script on GitHub has some minor formatting errors, and the code is duplicated after line 200. Besides that, it works fine. Thx

    1. larrucea (post author)

      Hey Sebastian,
      Arrh… you know… copying and pasting on slow computers… 😛
      It is fixed now. Thanks 🙂
