12 March 2012

102. GNU Debugger (gdb) on CentOS/ROCKS 5.4.3

For a distro dedicated to HPC, ROCKS seems to lack every single debugging tool that I'm familiar with. Here's another one: gdb

 --START HERE --


First compile texinfo:
cd ~/tmp
wget http://ftp.gnu.org/gnu/texinfo/texinfo-4.12.tar.gz
tar -xvf texinfo-4.12.tar.gz
cd texinfo-4.12/
./configure
make
sudo make install


sudo ln -s /usr/local/bin/makeinfo /usr/bin/makeinfo

Then gdb:
cd ~/tmp
wget http://ftp.gnu.org/gnu/gdb/gdb-7.4.tar.gz
tar -xvf gdb-7.4.tar.gz
cd gdb-7.4/
./configure
make
sudo make install

If you haven't symlinked makeinfo above you'll get errors.
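
Before starting the gdb build it might be worth checking that makeinfo can actually be found. This is only a minimal sketch; need_tool is a helper name I made up for the example, not something that ships with texinfo or gdb:

```shell
# need_tool is a hypothetical helper, not part of texinfo or gdb.
# It succeeds if the named command is found on PATH and complains otherwise.
need_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found $1 at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Example: run this before ./configure in the gdb source directory.
need_tool makeinfo || echo "install texinfo and symlink makeinfo first"
```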

Usage:
gdb programme
(gdb) run arg1 arg2
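
If all you want is a backtrace from a crashing program, gdb can also be run non-interactively. A rough sketch, assuming the binary was built with -g so that the backtrace has symbols; gdb_bt is a made-up wrapper name, and programme, arg1 and arg2 are the same placeholders as above:

```shell
# gdb_bt is a hypothetical wrapper name. It runs a program under gdb
# non-interactively and asks for a backtrace once the program stops.
# -batch, -ex and --args are standard gdb command line options.
gdb_bt() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Usage (needs gdb on PATH, ideally with a binary built with -g):
#   gdb_bt ./programme arg1 arg2
```

If the program exits normally there is no stack left to show, so this is mostly useful for segfaults and aborts.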



101. First adventures in ROCKS 5.4.3

I've recently been given access to a 40 core cluster running ROCKS 5.4.3, which is a customised version of CentOS 5.6. The notes below are older than the build instructions that I've recently posted.

They say that if you want to learn Debian, use Debian; if you want to learn CentOS, use CentOS; and if you want to learn Linux, build LFS. I can't vouch for the last item (yet...), but I'd definitely agree with the first two statements. CentOS and Debian are just different enough that it takes a while before you find your way around CentOS if you're used to Debian.

Anyway, with the hope that this might be useful to someone in a similar situation:

Installation
The ROCKS installer is crap. There's no way around it.
Anyway, the first time you boot up from the CD or DVD you get this splash screen (this is from an earlier vbox installation):


You'd better type
build
quickly or you'll end up in a dead window.

There's an annoying question about the fully qualified domain name -- and it won't accept invalid FQDNs  -- which will screw things up later if you want to change it. I'll leave that one as a challenge.

Assuming you typed build, and everything worked OK up to this point (how about a 'back' button?), you get to choose whether to partition manually or automatically -- with Debian I always do it manually, because why not?

Well, with ROCKS it took me a number of tries before I got it right -- and if you get it wrong it crashes and YOU HAVE TO START OVER AGAIN. How about having a 'back' button and clearly displaying the minimum requirements in the gparted screen? To be fair, it's mentioned if you read the instructions on the rocksclusters.org website, but who'd do that?

Anyway, it seems that you need, at a minimum:
16 GB: /
3.6 GB: /var
The rest of the disk (> 4 GB): /state/partition1 OR /export/home
Either seems to work.

Keep that in mind if you're making a virtual machine image -- you'll need a pretty darn big one.
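
Adding up the minimums above gives a rough lower bound for the size of a virtual disk. This is just the arithmetic from the list; swap isn't included since I haven't given a size for it:

```shell
# Minimum sizes in GB, taken from the list above.
root_gb=16
var_gb=3.6
state_gb=4    # "the rest of the disk", which has to be more than 4 GB

# sh has no floating point arithmetic, so let awk do the sum.
min_gb=$(awk -v a="$root_gb" -v b="$var_gb" -v c="$state_gb" 'BEGIN { print a + b + c }')
echo "virtual disk should be larger than ${min_gb} GB, plus swap"
```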

Anyway, presuming that everything works out you'll finish the installation and you'll get to your first boot.

First boot:
There are a few things that I don't like about the default setup:

Create your locate database
As root:
updatedb

Create a user
By default there's only root -- apart from preferring to gain superuser powers via sudo, we most definitely need to have normal users present too.
adduser verahill
passwd verahill
To log in immediately
su verahill

The first time you log in it will create an RSA keypair -- you're asked to set a password for the keys -- don't confuse that password with your user password (although it can be the same).

Oh, and change those ugly b/w terminal colours to e.g. fg #FCF2F2 and bg #0E0C56 (this is more a hint for my future self)

Give your user superuser powers:
As root
visudo
and add
verahill ALL=(ALL:ALL) ALL

That'll do the trick
 
/etc/fstab
fstab uses labels by default to keep track of partitions. I don't like labels, and I don't like relative paths, when you can use UUID.
LABEL=/                 /                       ext3    defaults        1 1
LABEL=statepartition1       /state/partition1       ext3    defaults        1 2
LABEL=var       /var                    ext3    defaults        1 2

LABEL=SWAP-sda3         swap                    swap    defaults        0 0
and change to
UUID=779c8a5f-db6a-4433-a3e0-eaf4519e14b1                 /                       ext3    defaults        1 1
UUID=82835cfc-8b86-40b3-9412-f908908714be       /state/partition1       ext3    defaults        1 2
UUID=e286acd2-49cd-437b-bb1d-682faacb0628       /var                    ext3    defaults        1 2



To find out the UUIDs, do
ls /dev/disk/by-uuid/ -lah
and to map the relative paths to the labels do
ls /dev/disk/by-label/ -lah
I couldn't find the swap uuid, but I'm not too bothered by that. The example in the screenshot is more complex because I'm dualbooting using two physical harddrives.
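
Swapping a label for a UUID can also be scripted. The sketch below is only an illustration: fstab_label_to_uuid is a made-up helper name, and it prints the rewritten file to stdout rather than touching /etc/fstab, so you can look over the result before copying it into place.

```shell
# fstab_label_to_uuid is a hypothetical helper.
# Arguments: label, UUID, path to an fstab file.
# It prints the file with "LABEL=<label>" in the first column replaced by
# "UUID=<uuid>"; the file itself is left untouched.
fstab_label_to_uuid() {
    awk -v label="$1" -v uuid="$2" '
        $1 == ("LABEL=" label) { sub(/^LABEL=[^ \t]+/, "UUID=" uuid) }
        { print }
    ' "$3"
}

# Usage, with the root UUID from the example above:
#   fstab_label_to_uuid / 779c8a5f-db6a-4433-a3e0-eaf4519e14b1 /etc/fstab
```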




/boot/grub/menu.lst
Again, a label tells grub where to find the root partition. No good. Change to UUID instead. Also, comment out hiddenmenu and change quiet to splash. It's grub '1', so you don't need to do update-grub or anything like that to make the changes take effect.

screen
sudo yum install screen
Just do the usual -- add the following to /etc/screenrc
multiuser on
acladd verahill
and
sudo chmod +s /usr/bin/screen
sudo chmod 755 /var/run/screen

/etc/network/interfaces etc.
Well, they don't exist. Instead, you should go to /etc/sysconfig/network-scripts/
Each interface is configured by creating a file called ifcfg-ethX
You can set device specific routing using a file called route-ethX -- the route in the screen grab was to make sure that all traffic went via my gateway server.

Just look at the screen grab:

Oh, and it's not sudo service networking restart, it's sudo service network restart.
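
As a rough text version of what such files can look like, here's a sketch that writes an example ifcfg-eth0 and route-eth0 to a scratch directory. Everything in it is a placeholder: the address is borrowed from the hosts example further down, and the netmask and the gateway 192.168.1.1 are pure assumptions, so adjust them to your own network before copying anything to /etc/sysconfig/network-scripts/.

```shell
# Write the examples to a scratch directory instead of the real
# /etc/sysconfig/network-scripts/ so nothing on the system is touched.
outdir=$(mktemp -d)

# Static address for eth0 (all values are placeholders).
cat > "$outdir/ifcfg-eth0" <<'EOF'
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.111
NETMASK=255.255.255.0
ONBOOT=yes
EOF

# Send all traffic via a gateway (assumed here to be 192.168.1.1).
cat > "$outdir/route-eth0" <<'EOF'
default via 192.168.1.1 dev eth0
EOF

echo "example files written to $outdir"
```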

There's no /etc/hostname, instead it seems that you edit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=roxy

Also, you edit /etc/hosts.local, not /etc/hosts
192.168.1.111   roxy
192.168.2.111   foxy
(not easy coming up with names when you have 9 wired ifs in the same office)


I edited /etc/resolv.conf and added my DNS hosts directly -- so far, so good.

Enable ipv6
At the moment it seems that sinfo/d requires ipv6 to be enabled, and by default it isn't -- change your modprobe configuration to this (i.e. comment out the ipv6 lines as shown)

alias eth0 r8169
alias scsi_hostadapter sata_nv
alias net-pf-10 off
#alias ipv6 off
#options ipv6 disable=1
alias eth1 forcedeth

Yum
Compared to apt it's more yuck than yum, but each to their own.
It's pretty straightforward:
yum check-update
yum install screen
yum erase screen
yum provides "*/screen"
etc.

The repos seem to be defined in /etc/yum.conf (on stock CentOS you'd normally also find them in /etc/yum.repos.d/)


chkconfig
There's no rcconf or sysv-rc-conf, but there's chkconfig.
Be aware that run levels are not the same in CentOS as in Debian: http://www.centos.org/docs/5/html/Installation_Guide-en-US/s1-boot-init-shutdown-sysv.html

Typically you'd be in 3 or 5.
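
To see what actually starts at a given run level you can filter the output of chkconfig --list. A small sketch; services_on_at is a made-up helper name, and it assumes the usual output format with one service per line followed by entries like 3:on:

```shell
# services_on_at is a hypothetical helper: it reads `chkconfig --list`
# output on stdin and prints the services that are on at the given run level.
services_on_at() {
    awk -v lvl="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == (lvl ":on")) { print $1; break }
    }'
}

# Usage on the cluster:
#   chkconfig --list | services_on_at 3
# Switching a service on or off at boot:
#   sudo chkconfig sshd on
#   sudo chkconfig sshd off
```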

/opt
A lot of what you'll need for scientific endeavours is found in /opt IF YOU INSTALLED EVERYTHING FROM THE BEGINNING.
If you didn't, and e.g. installed openmpi by yourself, then it'll be in a completely different place. You'll be using locate a lot...

For some reason nothing's symlinked from /usr/lib and /usr/lib64, so be prepared to be doing a lot of that by hand (see my posts on building nwchem, sinfo and gromacs on centos/rocks)
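
The symlinking can be done in a loop rather than one file at a time. A minimal sketch: link_libs is a made-up helper name, and /opt/openmpi/lib in the usage line is only an example location, so check with locate where your libraries really are.

```shell
# link_libs is a hypothetical helper: symlink every shared library found in
# directory $1 into directory $2, leaving anything that already exists alone.
# Give $1 as an absolute path so that the links resolve from anywhere.
link_libs() {
    src=$1
    dst=$2
    for lib in "$src"/*.so*; do
        [ -e "$lib" ] || continue            # the glob matched nothing
        target="$dst/$(basename "$lib")"
        if [ -e "$target" ] || [ -L "$target" ]; then
            echo "skipping $target (already exists)"
        else
            ln -s "$lib" "$target" && echo "linked $target"
        fi
    done
}

# Usage (as root), assuming the libraries live in /opt/openmpi/lib:
#   link_libs /opt/openmpi/lib /usr/lib64
```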

/etc/profile
You might want to add
export PATH=$PATH:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin



A bit of a restart and you might have a usable system.

100. Compile strace on ROCKS 5.4.3

Maybe I've set things up wrong, but I can't find any strace package in the yum repos on my ROCKS 5.4.3 installation.

The compilation is very easy, but I'll show it here for those who feel nervous about compiling their own programmes:

mkdir ~/tmp
cd ~/tmp

The wget takes a while to figure out where to download from -- be patient:
wget "http://sourceforge.net/projects/strace/files/latest/download?source=files"
unxz strace-4.6.tar.xz
tar -xvf strace-4.6.tar
cd strace-4.6/
./configure
make
sudo make install


How to use:
While I've spent a couple of years with Debian I'm a CentOS newbie, and I keep being confused about the location of the libs -- for my compiles I need to put libs in /usr/lib, but to execute I seem to need to put symlinks in /usr/lib64. strace can help you track where a program is looking for its libs.

E.g. to see what the program sinfo is up to:
strace -o sinfo.log sinfo

Here is a snippet from sinfo.log:

open("/lib64/libc.so.6", O_RDONLY)      = 3
open("/usr/local/lib/sinfo/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/opt/openmpi/lib/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/librt.so.1", O_RDONLY)     = 3
open("/usr/local/lib/sinfo/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/opt/openmpi/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3

You can see that it e.g. looks for libdl.so.2 first in /usr/local/lib/sinfo, then in /opt/openmpi/lib/, and finally finds it in /lib64.
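
With a longer log it gets tedious to pick out the hits by eye. A small sketch that lists only the shared libraries that were opened successfully; found_libs is a made-up helper name, and it assumes the log lines look like the snippet above (open or openat calls with the path in double quotes):

```shell
# found_libs is a hypothetical helper: print the path of every shared
# library that was opened successfully according to an strace log.
# Failed calls (those returning -1) are skipped.
found_libs() {
    awk -F'"' '/^open(at)?\(/ && !/= -1 / && $2 ~ /\.so/ { print $2 }' "$1"
}

# Usage:
#   strace -o sinfo.log sinfo
#   found_libs sinfo.log
```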