Monday 23 March 2009

Installed Capacity at CHEP

This week is CHEP 09 proceeded by WLCG workshop. I presented some updates on the roll out of the installed capacity document. It included examples of a few sites that would have zero capacity if considered under the new metrics.
Sites should consider taking the following actions.

  • Check gridmap. In particular the view obtained by clicking on the more label and selecting the "size by SI00 and LogicalCPUs".
  • Adjust your published #LogicalCPUS in your SubCluster. It should correspond to the number of computing cores that you have.
  • Adjust your #Specint2000 settings in the SubCluster. The aim is to make your gridmap box the correct size to represent your total site power.
The followup questions were the following. Now a chance for a a more reflected response.
  1. Will there be any opportunity to run a benchmark within the gcm framework?
    I answered that this was not possible since unless it could be executed in under 2 seconds then there was no room for it. Technically there would not be a problem with running something for longer, it could be ran rarely. We should check how the first deployment of GCM goes, longer tests are in no way planned though.
  2. What is GCM collecting and who can see its results?
    Currently no one can see on the wire since messages are encrypted. There should be a display at https://gridops.cern.ch/gcm however currently it is down but once there it will be accessible to IGTF CA members. For now there are some test details available.
  3. When should sites start publishing the HEPSpecInt2006 benchmark?
    The management contributed "Now" which is of course correct, the procedure is well established. Sites should be in the process of measuring their clusters with the HEPSpec06 bench mark. With the next YAIM release they will be able to publish the value also.
  4. If sites are measuring these benchmarks can they the values be made available on the worker nodes to jobs?
    Recently the new glite-wn-info made it as far as the PPS service. This allows the job to find on the WN to which GlueSubCluster it belongs. In principal this should be enough, the Spec benchmarks can be retrieved from the GlueSubClusters. The reality of course is that until some future date when all the WNWG recommendations are deployed along with CREAM also then this is not possible. So for now I will extend glite-
    wn-info to also return a HepSpec2006 value as configured by the site administrators.
  5. Do you know how many sites are currently publishing incorrect data?
    I did not know the answer nor is an answer easy other than collecting the ones of zero size. Checking now of 498 (468 unique?) SubClusters some 170 of them have zero LogicalCPUs.
On a more random note a member of CMS approached me afterwards to thank me for the support I gave him 3 or so years ago while working at RAL. At the time we both had an interest in making grid work. He got extra queues, reserved resources, process dumps and general job watching from me. It was the fist grid jobs we had approaching something similar to the analysis we now face. Quoting the gentleman from his grid experience and results using RAL he obtained his doctorate and CMS chose to use the grid.