This is a new version of my now Python script, adapted for monitoring jobs on the HLRN III.
I am usually on the Hannover partition, but it should work perfectly in Berlin too.
In this version, I keep all the features of the cluster version except the “gossip” section, which shows who is calculating what and how many resources they are using. A supercomputer has too many users for that, and we would need a terminal of a couple of square meters.
now version 0.5, specially modified to work on the HLRN III supercomputer in Hannover and Berlin.
I already shared my now script intended for a small “in home” computing cluster. This time, I am sharing two other variations of that script, designed for the queuing systems of two different supercomputers: Mare Nostrum at BSC, Juropa at JSC and HLRN.
Both of them are predecessors of the newer now script and have only the very basic features, but if you need a script that works right out of the box on any of the supercomputers mentioned above, this is the quick solution.
Job monitoring while running jobs on a cluster or supercomputer can be done with several tools (such as qstat, showstart, …). Unfortunately, most of these standard tools are made by computer scientists for computer scientists. But now there is an alternative: now.
At some point in my life, I started to get sick of qstats, greps, awks, … because it takes a couple of seconds every time one has to type them. If we multiply these seconds by the number of times I need them and by the number of computational members in my group, we get enough time to prepare a couple of new inputs, or to write some posts or comments on my blog.
So I wrote my own job monitoring/visualization script, based on some ideas of my friend Iñaki from my time in the theoretical chemistry group in Donostia. Initially, that script did nothing but execute these programs and display ONLY the information I need, and ALL the information I need. This information includes:
In small computational groups especially, people often use their workstations for running calculations. In a “trusted” network, where all colleagues can log in to each other’s computers, it is sometimes also the case that everybody calculates everywhere. But when it is time to submit a calculation and your own PC is already busy, how do you find a computer in the network that is not running anything at the moment?
You could ssh to each host and run top or something like that, but that is really slow when you have a lot of computers and you have to repeat the task very often.
The solution: “checkpc”
checkpc is a Python script that can scan a whole network of computers and return the load of every PC in it in less than 3 seconds.
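To give a rough idea of how a scan like this can finish in a few seconds, here is a minimal sketch. It is not the actual checkpc code, just one possible approach under some assumptions: the hosts run Linux (so /proc/loadavg exists), key-based ssh login works without a password prompt, and all hosts are queried at the same time instead of one after another. The function names (parse_loadavg, ssh_loadavg, scan_hosts, idle_hosts) and the 0.5 load threshold are made up for this illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg-style text."""
    return float(text.split()[0])


def ssh_loadavg(host, timeout=3):
    """Read /proc/loadavg on a remote host over ssh.

    Assumption: key-based login is set up, so BatchMode never prompts.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
         host, "cat", "/proc/loadavg"],
        capture_output=True, text=True, timeout=timeout + 1,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "ssh failed")
    return result.stdout


def scan_hosts(hosts, fetch=ssh_loadavg, max_workers=32):
    """Query all hosts concurrently; unreachable hosts map to None."""
    def probe(host):
        try:
            return host, parse_loadavg(fetch(host))
        except Exception:
            # Host down, ssh refused, timeout, or unparsable output.
            return host, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(probe, hosts))


def idle_hosts(loads, threshold=0.5):
    """Return reachable hosts below the load threshold, least loaded first."""
    ok = [(load, host) for host, load in loads.items()
          if load is not None and load < threshold]
    return [host for load, host in sorted(ok)]
```

Calling scan_hosts with a list of host names returns a dictionary of loads, and idle_hosts picks the free machines from it. Because the ssh calls run in parallel threads, the total time is roughly that of the slowest host (bounded by the timeout) rather than the sum over all hosts.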