Interesting talk about the hardware testing that CERN does, both at burn-in and routinely afterwards. Many of these checks, such as fsprobe, SMART, inventory and memory tests, are run routinely on the boxes I look after, so it is good to hear a description of what they are doing.
Thursday, 26 April 2007
A summary of the WLCG system admin group was given. One of the things they are up to is starting a wiki to collect together the tips and scripts people are working on, e.g. Torque, Maui and cfengine recipes. It is a general cookbook of ideas. There is also a Subversion repository which, like the wiki, uses GridSite. There are currently 9 scripts in the repository and more are needed. They need more volunteers, which is exactly why they are publicising it at events like HEPiX.
Monday, 23 April 2007
Brookhaven reported that they are running batch jobs in Xen virtual machines on their compute farm. The aim is that the same nodes also serve storage space within dCache or xrootd, and they want to protect this storage from batch jobs that crash the machine.
One interesting idea from PSI: rather than having a single SSH gateway that you hop through, they have a gateway which, once you have logged into it, opens the firewall so that you can log in to any machine directly. Much better; going through gateways is always a pain.
Friday, 20 April 2007
Two new roles have now been added to the dteam VOMS service: ftsadmin and ftsmaster. They are associated with regions; for example, /dteam/uki/Role=ftsadmin would contain the list of people able to edit FTS channels anywhere in the world that have RAL as an endpoint. We still need to check that this really works on the FTS, but it is meant to.
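Role names like /dteam/uki/Role=ftsadmin are VOMS FQANs. As a rough illustration of how a service might decide whether a user holds ftsadmin for a given region, here is a minimal Python sketch. It assumes the usual FQAN layout (/vo/group.../Role=.../Capability=..., with the literal NULL meaning "not set"). The helpers parse_fqan and has_role are hypothetical, not part of the FTS or VOMS APIs, and the exact-group matching rule is an assumption rather than how the FTS actually authorises channel edits.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Fqan(NamedTuple):
    """One parsed VOMS FQAN, e.g. /dteam/uki/Role=ftsadmin/Capability=NULL."""
    vo: str
    groups: Tuple[str, ...]    # full group path, VO first, e.g. ("dteam", "uki")
    role: Optional[str]        # None when absent or "NULL"
    capability: Optional[str]  # None when absent or "NULL"


def parse_fqan(fqan: str) -> Fqan:
    """Split an FQAN into VO, group path, role and capability (hypothetical helper)."""
    groups = []
    role = None
    capability = None
    for part in fqan.split("/"):
        if not part:
            continue  # the leading slash yields an empty first element
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    if not groups:
        raise ValueError(f"FQAN has no VO/group path: {fqan!r}")
    # VOMS writes the literal string NULL when no role/capability is set.
    role = None if role == "NULL" else role
    capability = None if capability == "NULL" else capability
    return Fqan(groups[0], tuple(groups), role, capability)


def has_role(fqans: Iterable[str], group_path: str, role: str) -> bool:
    """True if any FQAN grants `role` in exactly `group_path` (assumed policy)."""
    wanted = tuple(p for p in group_path.split("/") if p)
    return any(
        f.groups == wanted and f.role == role
        for f in map(parse_fqan, fqans)
    )
```

For example, has_role(["/dteam/uki/Role=ftsadmin"], "/dteam/uki", "ftsadmin") returns True, while the same role held only at /dteam does not count for the uki region under this assumed exact-match rule.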
A long meeting this morning about improving the reporting the FTS can provide: for instance, adding summary tables populated by triggers, but also keeping historical information on the state of the system and of individual services. The follow-up for me is to check what information is going to be recorded and then consider what is useful, or missing, for admins in particular.
Thursday, 19 April 2007
Wednesday, 18 April 2007
I'm happier about how the FTS prod and pilot services are now represented in CDB. The pilot is no longer a modified prod service but is defined in its own right. This should remove some problems that would not arise in a general FTS installation anyway.
Tuesday, 17 April 2007
Tried, and failed repeatedly, today to install a "package" on the FTS Oracle servers to do some statistics collection. It is now done, but only thanks to the intervention of others. I need to learn some Oracle, and fast, if I don't want to feel as helpless a few times a week as I did today.
Big discussion today about some gLite middleware that started, albeit two years ago, opening up services in user space on the WN so that refreshed proxies can be injected from the CE. I've convinced myself this is a bad idea and something we have to stop. Lots of discussion and some reluctance to actually do it, but I think it will be sorted out soon.
Monday, 16 April 2007
Looks like I will now be attending the EMT meeting twice a week. This is good, since it will give me a good handle on what is coming up in each release. It is of course two extra meetings a week, but the information should be useful.
On the FTS I've set up the reporting package. I have to wait until 08:00 tomorrow to check whether it works, which is a little boring. If it does, I'll look into running three instances of it for the different FTSes; it looks easy to do, but only time will tell.
Friday, 13 April 2007
Not the most productive of tasks, but it is Friday afternoon after all. I have moved all my bookmarks to del.icio.us. Hopefully it pays off; it seems like a good idea. The big advantage is that bookmarks have tags rather than living in a directory structure.
Thursday, 12 April 2007
Spent an age trying to mark pages as secret so that only a small group can see them before we publish. It is easy to do, but one odd consequence is that you then can't find those pages with a formatted search, since the search runs anonymously. Of course it makes sense, but...
Wednesday, 11 April 2007
Wrote a new page, FtsVomsRoles, detailing how VOMS roles might be used with the FTS. The FTS already supports them, so we just need a scheme for actually using them. It needs a bit of central management, but it is still a lot better than what we had before.
Wrote a small Mickey Mouse script to parse the FTS call.logs. Interestingly, it showed that over the last day some users submitted up to 3000 times as many status requests as transfer submissions, which seems a little excessive. To be followed up.
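The original script is not reproduced here; what follows is a minimal Python sketch of the same kind of tally. It rests on loudly stated assumptions: each call.log line is taken to be `timestamp|client DN|method`, a hypothetical format that may well not match the real FTS call.log layout, and the method-name prefixes used to classify submissions versus status queries are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical method-name prefixes; adjust to whatever the real log contains.
SUBMIT_PREFIXES = ("transferSubmit",)
STATUS_PREFIXES = ("getTransferJobStatus", "getTransferJobSummary", "getFileStatus")


def count_calls(lines):
    """Tally submit and status calls per user from 'timestamp|DN|method' lines."""
    counts = defaultdict(lambda: {"submit": 0, "status": 0})
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 3:
            continue  # skip lines that are not in the assumed format
        _timestamp, user, method = (f.strip() for f in fields)
        if method.startswith(SUBMIT_PREFIXES):
            counts[user]["submit"] += 1
        elif method.startswith(STATUS_PREFIXES):
            counts[user]["status"] += 1
    return dict(counts)


def status_to_submit_ratios(counts):
    """Status requests per submission; inf if a user only ever polls."""
    ratios = {}
    for user, c in counts.items():
        if c["submit"] == 0:
            ratios[user] = float("inf") if c["status"] else 0.0
        else:
            ratios[user] = c["status"] / c["submit"]
    return ratios
```

To use it, pass an open log file to count_calls (a file object iterates line by line) and sort the result of status_to_submit_ratios in descending order to see the heaviest pollers first.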
Tuesday, 10 April 2007
Reports of high load on the FTS web servers. They are clearly busy, but no script has been written to analyse the logs and give a quick picture of what the queries are, so it is time to start one. The log format changes completely in release 2.0, though, so there is no point spending a lot of time on it.
Created the empty documentation pages for what will be the FTS 2.0 release. There is a lot of information in the old release pages; the challenge is making use of it without including things that are plain wrong.
Had a first go at connecting to Oracle with the cx_Oracle Python module. Like everything, it is easy once you find a web page pitched at the right level. It is needed for some of the FTS monitoring that is being done.
I was GMOD last week over the Easter period; fairly quiet other than a couple of announcements. The GMOD phone did ring but I missed the call. Checking the CERN status within SAM, it was obvious that something was up, since all the queues were in the "queuing" state. I raised a ticket with ce.support and, although it was not them who had called, there was an obvious problem: a cron job that had been wanted the previous week to drain the CERN farm had been left in place.
Thursday, 5 April 2007
Have been working hard on FtsWlcg, the TWiki homepage of the FTS for the WLCG service. A lot of the pages had fallen well out of date. I learnt a lot of TWiki in the process, including how to do templates, how to search for pages related to a given page, and many other things. It is looking a lot better, but there are certainly still things to be done.
As usual after being away for a week (for a wedding party), a very busy day of catch-up. Highlights included:
- Attended 9.00 sysadmin meeting.
- Attended 10.00 WLCG meeting.
- Attended 1.30 meeting with Remi to go over CDB, quattor and things.
- Attended 2.30 meeting with LHCb people looking into using MAUI with Dirac.
- Attended 3.00 meeting of sysadmins.
- Attended 4.30 meeting of EMT.