Thursday 26 April 2007

Increasing Reliability by Testing

Interesting talk about the hardware testing that CERN does, both at burn-in and routinely afterwards. A lot of these things, such as fsprobe, SMART, inventory and memory tests, are run routinely on the boxes I run, so it is good to hear a description of what they are doing.

SL3 Security Updates

Looks like SL3 will be maintained with the vendor's security updates until 2010. This is good news for EGEE, whose release of the SL4 service nodes is running behind the previously published end of life.

WLCG System Admin Group

A summary of the WLCG system admin group was given. Some of the things they are up to include starting a wiki to collect together tips and scripts that people are working on, e.g. torque, maui and cfengine recipes. It is a general cookbook of ideas. There is also a subversion repository, which again uses gridsite like the wiki does. There are currently 9 scripts in the repository and more are needed. They need more volunteers, which is exactly why they are publicising it at events like HEPiX.

Monday 23 April 2007

Sharepoint and TWiki

The CERN status report mentioned the use of Microsoft's SharePoint instead of the CERN TWiki. Not something that I would ever like to see.

HEPiX Goes Green

Seems there has been quite a theme of building new computer centers around the globe at HEP sites. What is interesting is that many of them use the waste heat from the machine rooms to heat the office space.

Another XEN advantage for Batch Workers

Brookhaven presented that they are running batch jobs in XEN machines on their compute farm. The same machines also serve storage space within dCache or xrootd, and the aim is to protect this storage from batch jobs that crash the machine.

PSI Site Report - Gateway Machine.

One interesting idea from PSI. Rather than having a single SSH gateway to hop through, they have a gateway that, once you have logged into it, opens the firewall for you so that you can log in to any machine. Much better, going through gateways is always a pain.

HEPiX at DESY

This week I'm in Hamburg at a HEPiX meeting, the meeting where the sysadmins from the large labs get together. I'll be writing notes on many of the presentations here.

Friday 20 April 2007

VOMS Groups for the FTS

Two new roles have now been added to the dteam VOMS service: ftsadmin and ftsmaster. They are associated with regions; the first, say /dteam/uki/Role=ftsadmin, would contain a list of people able to edit FTS channels around the globe with RAL as an endpoint. We still need to check that it really works on the FTS, but it is meant to.
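
As a rough illustration of how such a role string breaks down, here is a minimal Python sketch that splits a VOMS FQAN like the one above into VO, group and role. The helper names (parse_fqan, may_edit_channels) and the rule that the ftsadmin role in a region's group grants channel editing are my own assumptions for the example, not how the FTS itself does the check.

```python
from typing import NamedTuple, Optional


class Fqan(NamedTuple):
    vo: str               # first path component, e.g. "dteam"
    group: str            # full group path, e.g. "/dteam/uki"
    role: Optional[str]   # e.g. "ftsadmin", or None when no role is set


def parse_fqan(fqan: str) -> Fqan:
    """Split a VOMS FQAN such as /dteam/uki/Role=ftsadmin into its parts."""
    if not fqan.startswith("/"):
        raise ValueError("FQAN must start with '/': %r" % fqan)
    group_parts = []
    role = None
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            # VOMS writes Role=NULL when no role is attached
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            group_parts.append(part)
    if not group_parts:
        raise ValueError("FQAN has no VO/group component: %r" % fqan)
    return Fqan(vo=group_parts[0], group="/" + "/".join(group_parts), role=role)


def may_edit_channels(fqan: str, region_group: str) -> bool:
    """Hypothetical policy: ftsadmin in the region's group may edit channels."""
    parsed = parse_fqan(fqan)
    return parsed.group == region_group and parsed.role == "ftsadmin"


if __name__ == "__main__":
    print(parse_fqan("/dteam/uki/Role=ftsadmin"))
    print(may_edit_channels("/dteam/uki/Role=ftsadmin", "/dteam/uki"))
```

Keeping the group and the role separate means the same check can be reused for each region just by changing the group path.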

FTS Status Monitoring

A long meeting this morning about improving the reporting that the FTS is able to give, for instance by adding summary tables filled from triggers, but also by looking at historical information on the states of the system and of individual services. The follow-up for me is to check what information is going to be recorded and then consider what is useful or missing, for admins in particular.

Thursday 19 April 2007

FTSreport Yaim Component

Made some commits today to a new yaim component for the FTS report server.

In particular this will be the first update I've submitted to an EGEE release for a long time. A good feeling even if it is a trivial change.

IT-GD-OPS Group Meeting

Attended the grid operations meeting with the new format of groups rather than individuals reporting. It made it a lot better and I was not desperate to get out as has been the case before.

Wednesday 18 April 2007

Another FTS CDB Cleanup

I'm happier about how the FTS prod and pilot services are now represented in CDB. The pilot is no longer a modified prod service and is defined in its own right. This should remove some problems which are not really there for any general FTS installation.

CDB Access Controls

After a request for the voms admins I looked into how CDB enforces access controls on templates. It is a simple system with its own commands within CDB to edit the ACLs. Very easy once you know how.

Tuesday 17 April 2007

FTS and Oracle

Tried and failed a lot today to install a "package" on the FTS Oracle servers to do some statistics collection. It is now done, but only due to the intervention of others. I need to learn some Oracle, and fast, if I don't want to feel helpless a few times a week as was the case today.

Ports on WNs

Big discussion today about some gLite middleware that started, albeit two years ago, opening up services in user space on the WN so that refreshed proxies can be injected from the CE. I'm convinced this is a bad idea and it is something we have to stop. Lots of discussion and some reluctance to actually do it, but I think soon it will be okay.

Scribefire and Blogger

For some reason Scribefire stopped working with Blogger today. I don't know why, just some odd DOM error.

Monday 16 April 2007

Sindes and SSH

Showed another GD admin how to do the upload/download SSH key trick in quattor. It's a shame the rest of the world does not do this.

EMT Meetings

Looks like I will now be attending the EMT meeting twice a week. This is good since I will be able to get a good handle on what is coming up in the release. Of course it is two extra meetings a week but it should be useful information that I listen to.

FTS Reports Setup

On the FTS I've set up the reporting package. I really have to wait until 08:00 tomorrow to check whether it works, which is a little boring of course. If it works then I'll look into running three instances of it for the different FTSes. It looks easy to do, but only time will tell.

Friday 13 April 2007

Reorganise my Bookmarks

Not the most productive of tasks, but it is Friday afternoon after all. Have moved all my bookmarks to del.icio.us. Hopefully it pays off, seems like a good idea though. The big advantage is that bookmarks have tags rather than being in a directory structure.

Twiki Pages for FTS 2.0

I've now made a complete copy of the FTS1.5 to become the FTS2.0 pages for the upcoming release. Needs the wise men to look over them all now.

Thursday 12 April 2007

Twiki Experiments

Spent an age trying to mark pages as secret so that only a small group can see them before we publish. It is easy to do, but one odd thing is that you then can't search the pages with a formatted search, since the search is anonymous. Of course it makes sense but....

IT-GD Section Meeting

Much discussion about how to improve the current round table; it will now be reported just verbally as group reports. Some discussions about GMOD as well, which were inconclusive.

Wednesday 11 April 2007

LSF LCG CEs

Had to work out how to drain an LSF CE today at CERN, a trivial thing for me in PBS. In fact the LSF info provider looks to be a lot better written. I guess it was written by a sysadmin rather than some random developer.

ce101 Reported as Faulty

Had a report in that ce101 was faulty here at CERN. Started the draining process so someone else can sort it out. It is great not actually having to do the hard disk bit myself :-)

VOMS Roles and the FTS

Wrote a new page, FtsVomsRoles, detailing how VOMS roles might be used with the FTS. The FTS already supports this so we just need a schema for actually using them. Needs a bit of central management but still a lot better than what we had before.

Who is Hitting the FTS

Wrote a small Mickey Mouse script to parse the FTS call.logs. Interestingly it showed that in the last day some users submitted up to 3000 times as many status requests as transfers, which seems a little excessive. To be followed up.
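
For the record, the idea is roughly the following. This is only a sketch and not the actual script: the real call.log format is not reproduced here, so it assumes a made-up tab-separated layout of timestamp, user DN and operation, and the operation names "submit" and "getStatus" are placeholders as well.

```python
from collections import Counter, defaultdict


def count_calls(lines):
    """Count operations per user from hypothetical 'time<TAB>user<TAB>op' lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip anything that does not match the assumed layout
        _timestamp, user, operation = fields
        counts[user][operation] += 1
    return counts


def status_to_submit_ratio(counts, status_op="getStatus", submit_op="submit"):
    """Return status requests per submitted transfer for each user."""
    ratios = {}
    for user, ops in counts.items():
        submits = ops[submit_op]
        if submits:  # users who never submitted are left out
            ratios[user] = ops[status_op] / submits
    return ratios


if __name__ == "__main__":
    sample = [
        "2007-04-11T09:00:00\t/DC=example/CN=alice\tsubmit",
        "2007-04-11T09:00:05\t/DC=example/CN=alice\tgetStatus",
        "2007-04-11T09:00:10\t/DC=example/CN=alice\tgetStatus",
        "2007-04-11T09:00:15\t/DC=example/CN=alice\tgetStatus",
        "2007-04-11T09:01:00\t/DC=example/CN=bob\tsubmit",
    ]
    ratios = status_to_submit_ratio(count_calls(sample))
    # Print the heaviest pollers first
    for user, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print("%s: %.1f status requests per submit" % (user, ratio))
```

Sorting by the ratio puts the users polling hardest at the top, which is the list to follow up on.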

Tuesday 10 April 2007

High Load on FTS Webserver

Report of high load on the FTS webservers. It is clear they are busy, but there is no script or log file written to do a quick analysis of what the queries are. So it is time to start one. The log format completely changes for release 2.0, so there is no point spending any time on this.

FTS Documentation for the 2.0 Release

Started the empty documentation pages for what will be the FTS 2.0 release. There is a lot of information in the old release pages. The challenge is making use of it while not including things that are plain wrong.

cx_Oracle Module

Had a first go at connecting to Oracle with the cx_Oracle Python module. It is easy of course, like everything, once you find the correct level of web page to help you do it. It is needed for some of the FTS monitoring that is being done.

End of GMOD

I was GMOD last week over the Easter period. It was fairly quiet other than a couple of announcements. The GMOD phone did ring but I missed the call. Checking the CERN status within SAM, it was obvious that something was up since all the queues were in state queuing. I raised a ticket with ce.support, and although it was not them who had called there was an obvious problem: a cron job to drain the CERN farm, which had been wanted the previous week, had been left in place.

Thursday 5 April 2007

FTS TWiki Pages Review

Have been working hard on FtsWlcg, the homepage of the FTS for the WLCG service. A lot of the pages had fallen way out of date. Learnt a lot of TWiki in the process, including how to do templates, how to search for pages related to a given page and many other things. It is looking a lot better but there are still things that need to be done for sure.

Green Plates Progress

Seven months after starting at CERN there looks to be some progress with my green plates. They should be ready next week, or rather I will be able to go to a garage and get them made.

Adding the VOMS manager to Quattor

Going through the many steps at the moment with Remi so that he can install boxes and configure them with CDB, swrep, etc. It seems too hard to grant just a bit of this and not give Remi access to do everything.

Small FTS Update

Updated the FTS T2 service to use a different Oracle connection string: same database, different way of accessing it. I need to learn some more Oracle.

Trying out Scribefire.

This is a blogging tool integrated into Firefox.

First day Back with a Meeting bang.

As per usual after having been away a week for a wedding party, a very busy day of catch-up. Highlights included:
  • Attended 9.00 sysadmin meeting.
  • Attended 10.00 WLCG meeting.
  • Attended 1.30 meeting with Remi to go over CDB, quattor and things.
  • Attended 2.30 meeting with LHCb people looking into using MAUI with Dirac.
  • Attended 3.00 meeting of sysadmins.
  • Attended 4.30 meeting of EMT.
In between cleared up my inbox and processed 900 new mails.

Crank It Up

After some time, and following my trying to remember for an appraisal what I have done since starting at CERN, I must keep this blog up to date.