Solution to VM missing from Virtual Machine Manager

I restarted a standalone KVM host running RHEL6 and when I opened Virtual Machine Manager, my guest virtual machine wasn't listed at all.

Eek!

Turns out I had been messing around with the XML definition in /etc/libvirt/qemu/machinename.xml (I know, I know...consider my hand slapped) and broke it. Fortunately /var/log/libvirt/libvirtd.log was kind enough to let me know what went wrong:

error : catchXMLError:653 : at line 33: Opening and ending tag mismatch: source line 32 and disk

I freaked out again after I asked virsh to list my virtual machines and the domain was not listed -- it was an empty list:

# virsh list
Id Name State
----------------------------------

Turns out the domain was merely inactive: it couldn't start because I had broken its configuration file.

# virsh list --inactive
Id Name State
----------------------------------
- machinename.example.com shut off
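
A running-only listing hides guests that are defined but stopped, so after a reboot it can be worth asking for anything that failed to come up. A minimal sketch, assuming the table layout shown above (two header lines, a dash in the Id column for inactive domains); the function name inactive_domains is made up for illustration and is not part of virsh:

```shell
# Print the names of domains that are defined but not running.
# Reads `virsh list --all` style output on stdin.
inactive_domains() {
    # Skip the two header lines; inactive rows carry "-" in the Id column.
    awk 'NR > 2 && $1 == "-" { print $2 }'
}
```

Typical use would be: virsh list --all | inactive_domains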

I repaired the XML configuration and made sure the VM was set to autostart on reboot:

# virsh autostart machinename.example.com
Domain machinename.example.com marked as autostarted

Adrenaline level dropping fast.

[ Submitted by John on Wed, 2012-04-18 12:57. ]

Using RHEL6 to share RAID volume via iSCSI: the Mystery of the Missing LUN

My use case was pretty simple. I wanted to share out a raw device via iSCSI to a nearby host on the 172.16.2.x network.

In addition to a minimal Red Hat Enterprise Linux 6 (or equivalent) install, a few packages are needed:

# yum install -y iscsi-initiator-utils scsi-target-utils sg3_utils lsof

I knew the device I wanted to share was /dev/sdb by looking at the output from dmesg:

# dmesg | grep sd
...
sd 0:0:1:0: [sdb] 7812456448 512-byte logical blocks: (3.99 TB/3.63 TiB)
...

The target definition in /etc/tgt/targets.conf was simple as well. Isn't everything simple after you've beaten your head against a wall for hours trying to get it working? The following configuration defines one target with one LUN shared as a raw device which may only be connected to by IP address 172.16.2.2:
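
A minimal sketch of such a definition, assuming the direct-store directive was used. The real serial number that tgtadm reports for LUN 1 further down hints at that, but backing-store would fit the description as well. The target name, device and initiator address are the ones used throughout this post:

```text
<target iqn.2012-04.com.example:sharename>
    # Export the whole disk as a raw device (assumed directive)
    direct-store /dev/sdb
    # Only this initiator may connect
    initiator-address 172.16.2.2
</target>
```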


Starting up the tgtd service and turning it on permanently led to wonderment and success:

# service tgtd start
# chkconfig tgtd on

Depending on your configuration, you may need to open port 3260 in your firewall.
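
As a sketch of what that rule could look like, assuming the stock INPUT chain and that only the initiator at 172.16.2.2 should get through. The helper name iscsi_rule is hypothetical; it just prints a line in the same format as /etc/sysconfig/iptables:

```shell
# Print a rule line for /etc/sysconfig/iptables that lets one initiator
# reach the iSCSI target port.
# $1 = chain name (assumed to be INPUT on a stock RHEL 6 install)
# $2 = initiator address
iscsi_rule() {
    printf '%s\n' "-A $1 -m state --state NEW -p tcp -s $2 --dport 3260 -j ACCEPT"
}

iscsi_rule INPUT 172.16.2.2
```

The printed line would need to sit above any final REJECT rule in the chain, followed by a restart of the iptables service.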

However, after a reboot, only the controller showed up (as LUN 0). LUN 1 had disappeared!

# tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2012-04.com.example:sharename
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET    00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
    Account information:
    ACL information:
        172.16.2.2

Why is LUN 1 not showing up? Telling tgt-admin to reparse targets.conf in verbose mode leads to the reason:

# tgt-admin --update ALL -v
# Removing target: iqn.2012-04.com.example:sharename
tgtadm -C 0 --mode target --op delete --tid=1
# Adding target: iqn.2012-04.com.example:sharename
tgtadm -C 0 --lld iscsi --op new --mode target --tid 1 -T iqn.2012-04.com.example:sharename
# Device /dev/sdb is used by the system (mounted, used by swap?).
# Skipping device /dev/sdb - it is in use.
# You can override it with --force or 'allow-in-use yes' config option.
# Note - do so only if you know what you're doing, you may damage your data.

What? I assure you that /dev/sdb is NOT in use by the system. Show me mounts:

# mount

Nope, /dev/sdb does not appear anywhere in the output. Show me a list of open files on /dev/sdb:

# lsof | grep sdb

Nothing. Show me active swap:

# swapon -s

Nothing! Finally, choirboy in the #rhel IRC channel pointed me to the answer: you must create a filter in /etc/lvm/lvm.conf so that LVM leaves the device alone. Here is the relevant section of lvm.conf, showing the new filter:

    # By default we accept every block device:
    #filter = [ "a/.*/" ]
    # Every block device except /dev/sdb, that is.
    filter = [ "r|/dev/sdb|" ]
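
The kernel offers one way to see this kind of claim, which mount, lsof and swapon all miss: device-mapper devices stacked on a disk are listed under /sys/block/<disk>/holders. A sketch, assuming LVM had activated something on the disk (I did not verify that this is what tgt-admin was reacting to). SYSFS_ROOT is a hypothetical override, there only so the function can be tried against a scratch directory:

```shell
# List kernel devices (e.g. dm-0) stacked on top of a disk.
# $1 = disk name without /dev/, e.g. sdb
holders_of() {
    dir="${SYSFS_ROOT:-/sys}/block/$1/holders"
    # No such directory means no such disk; print nothing.
    [ -d "$dir" ] || return 0
    ls "$dir"
}
```

Running holders_of sdb and getting no output would mean nothing is stacked on the whole disk. A physical volume on a partition would show up under the partition's own holders directory instead.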

After a restart, LUN 1 persists and the world is once again a happy place to be:

# tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2012-04.com.example:sharename
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET    00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET    00010001
            SCSI SN: 9206CBBG71194900CF07
            Size: 3999978 MB, Block size: 512
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/sdb
            Backing store flags:
    Account information:
    ACL information:
        172.16.2.2

[ Submitted by John on Mon, 2012-04-16 15:23. ]

RHEV 3.0 Firewall Annotated iptables Configuration for Netfilter

When Red Hat Enterprise Virtualization Manager for Servers is installed, it offers to configure iptables for you:

...
Firewall ports need to be opened.
You can let the installer configure iptables automatically overriding the current configuration. The old configuration will be backed up.
Alternately you can configure the firewall later using an example iptables file found under /usr/share/rhevm/conf/iptables.example
...

Here's an annotated version of what the RHEVM installer will give you:

# ssh
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT

# XBAP clients for Administration Portal
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8006 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8007 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8008 -j ACCEPT

# Web interface to Administration Portal
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 8080 -j ACCEPT
# Web interface to Administration Portal (SSL)
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 8443 -j ACCEPT

# Portmapper (rpcbind on RHEL6)
-A RH-Firewall-1-INPUT -m state --state NEW -p udp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 111 -j ACCEPT

# mountd; NFS MOUNTD_PORT (defined in /etc/sysconfig/nfs)
-A RH-Firewall-1-INPUT -m state --state NEW -p udp --dport 892 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 892 -j ACCEPT

# rquotad; NFS RQUOTAD_PORT (defined in /etc/sysconfig/nfs)
-A RH-Firewall-1-INPUT -m state --state NEW -p udp --dport 875 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 875 -j ACCEPT

# NFS STATD_PORT (defined in /etc/sysconfig/nfs)
-A RH-Firewall-1-INPUT -m state --state NEW -p udp --dport 662 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 662 -j ACCEPT

# nfsd for nfs and nfs_acl
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 2049 -j ACCEPT

# nlockmgr; NFS LOCKD_TCPPORT (defined in /etc/sysconfig/nfs)
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp --dport 32803 -j ACCEPT

# NFS LOCKD_UDPPORT (defined in /etc/sysconfig/nfs)
-A RH-Firewall-1-INPUT -m state --state NEW -p udp --dport 32769 -j ACCEPT
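
The NFS rules only help if the helper daemons are actually pinned to those ports. A sketch of the matching /etc/sysconfig/nfs settings, using the variable names and port numbers from the comments above (the file is read as shell, so it is shown that way):

```shell
# /etc/sysconfig/nfs (fragment): pin the NFS helper daemons to fixed ports
MOUNTD_PORT=892
RQUOTAD_PORT=875
STATD_PORT=662
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
```

After restarting the nfs and nfslock services, rpcinfo -p should show whether the daemons registered on the expected ports.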

[ Submitted by John on Fri, 2012-03-02 11:40. ]

Why Your KVM Network Bridge Isn't Working

You're trying to get libvirt and KVM working on Red Hat Enterprise Linux 6 or CentOS 6, or maybe even Scientific Linux 6. But it's not going well.

You wanted your VMs to have full access to the network and you've discovered that virbr0 doesn't do that. Finally you stumbled upon the way to do it by Creating a Network Bridge using your primary interface. And yet, something just ain't right.

You've verified that bridge-utils is in fact present:

# rpm -q bridge-utils
bridge-utils-1.2-9.el6.x86_64

Telltale signs are this message during network startup:

Device bridge0 does not seem to be present, delaying initialization.

And the following entries in /var/log/messages:

/sys/devices/virtual/net/bridge0: couldn't determine device driver; ignoring...

and

ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-bridge0 ...
NetworkManager[1802]: ifcfg-rh: error: Bridge connections are not yet supported

You've stared at your /etc/sysconfig/network-scripts/ifcfg-bridge0 file until your eyes hurt, yet nothing seems to be wrong:

DEVICE="bridge0"
TYPE="bridge"
BOOTPROTO="none"
ONBOOT="yes"
IPADDR="1.2.3.4.5"
NETMASK="255.255.255.0"
DELAY=0
GATEWAY="1.2.3.4.254"
DNS1="x.x.x.x"

I'm here to tell you: you forgot to capitalize the word Bridge in your TYPE entry.

DEVICE="bridge0"
TYPE="Bridge"
BOOTPROTO="none"
ONBOOT="yes"
IPADDR="1.2.3.4.5"
NETMASK="255.255.255.0"
DELAY=0
GATEWAY="1.2.3.4.254"
DNS1="x.x.x.x"
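
For anyone who prefers a quick check to a staring contest, a small sketch that looks only at the TYPE line. The function name bridge_type_ok is hypothetical, and the comparison is deliberately case-sensitive on the assumption that the network scripts compare the value exactly:

```shell
# Succeed only if the ifcfg file sets TYPE to exactly "Bridge".
# $1 = path to an ifcfg-* file
bridge_type_ok() {
    # Take the value of the TYPE line and drop any double quotes.
    value=$(sed -n 's/^TYPE=//p' "$1" | tr -d '"')
    case "$value" in
        Bridge) return 0 ;;
        *) echo "TYPE is '$value', expected 'Bridge'" >&2
           return 1 ;;
    esac
}
```

Typical use would be: bridge_type_ok /etc/sysconfig/network-scripts/ifcfg-bridge0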

Have a nice day!

[ Submitted by John on Wed, 2012-02-29 15:54. ]

Solved: Renaming em1 to eth0 on Red Hat Enterprise Linux 6

We had a software package with a braindead licensing scheme. To generate the license, it uses the MAC address of the network interface card of the machine you are running it on. OK, that's one way of identifying a unique machine. But here's the kicker: it just assumes that your Ethernet interface is named eth0.

For a while now, NICs that are embedded on the motherboard are identified by udev as em1, em2, etc. This is part of an attempt to make interface naming more predictable and meaningful.

So, how to get em1 renamed to eth0? Here's what worked for me. I should emphasize that I had access to the console, so when ethernet was down I could still access the box.

0. I've been burned enough times to do this out of habit: make a backup of /etc/grub.conf, retaining SELinux info:

# cp --preserve=context /etc/grub.conf /etc/grub.bak

1. Add biosdevname=0 to the kernel boot arguments in /etc/grub.conf.

2. Rename /etc/sysconfig/network-scripts/ifcfg-em1 to /etc/sysconfig/network-scripts/ifcfg-eth0, changing the line

DEVICE="em1"

to

DEVICE="eth0"

3. Delete /etc/udev/rules.d/70-persistent-net.rules

4. Reboot.
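
Steps 1 and 2 can be sketched as two small functions. This is an illustration under assumptions rather than a transcript of what was typed: the function names are made up, GNU sed's -i option is assumed, and the paths are passed in so the functions can be tried on copies first.

```shell
# Step 1: append biosdevname=0 to every kernel line that lacks it.
# $1 = path to grub.conf
add_biosdevname() {
    sed -i '/^[[:space:]]*kernel /{/biosdevname=/!s/$/ biosdevname=0/;}' "$1"
}

# Step 2: rename ifcfg-OLD to ifcfg-NEW and update its DEVICE line.
# $1 = network-scripts directory, $2 = old name, $3 = new name
rename_ifcfg() {
    mv "$1/ifcfg-$2" "$1/ifcfg-$3" &&
    sed -i "s/^DEVICE=.*/DEVICE=\"$3\"/" "$1/ifcfg-$3"
}
```

Typical use would be add_biosdevname /boot/grub/grub.conf followed by rename_ifcfg /etc/sysconfig/network-scripts em1 eth0. Pointing sed -i at the real file rather than the /etc/grub.conf symlink avoids replacing the link with a regular file.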

Presto, I get eth0 for the former em1 and the rest of the NICs on this Dell R715 are still em2, em3, em4.

Reference:

How to Still Use ethX on Fedora 15
Consistent Network Device Naming
Nicnaming - Solving it with Biosdevname

[ Submitted by John on Thu, 2012-02-23 13:56. ]

Solved: Mystery DHCP Requests from Dell R815 Broadcom NetXtreme II NICs

Listen, my friends, and I will tell you a tale of mystery MAC addresses, DHCP, and Broadcom woes.

It all started when we got a new server. A Dell R815, to be exact. The specs looked great, it could hold a ton of RAM for some of our bioinformatics projects, and the cost was low. Everyone was happy.

Then one day someone called and said, hey, there are a ton of DHCP requests and we've traced them back to [room that server is in]. Here's the MAC address: 002xxxxxxxd1.

This seemed weird to me because we had set up static IPs on the server's two live NICs, both on the primary NIC and on the iDRAC enterprise NIC. To my knowledge, there were no NICs set up with DHCP.

Even more weird was the fact that the MAC address 002xxxxxxxd1 did not actually exist. The R815 had five NICs, numbered like this:

002xxxxxxxd0 (NIC 1)
002xxxxxxxd2 (NIC 2)
002xxxxxxxd4 (NIC 3)
002xxxxxxxd6 (NIC 4)
002xxxxxxxd8 (iDRAC)

We pored over the NIC configurations in Windows 2008 R2 server and its virtual switch and virtual machines (since it was running the Hyper-V role). Nothing. Meanwhile, our sniffer showed that we were not dreaming...indeed, DHCP requests were emanating from that MAC address.

We pulled the network cable and the DHCP requests stopped. We pulled out our hair and ceremoniously waved a Windows 98 boot disk over the server -- we keep one around for just these sorts of occasions.

Finally we turned up something...in the iDRAC remote management screen we found a list of MAC addresses which included the mysterious one we were looking for.

I'll cut to the chase. The Broadcom NetXtreme II 5906 NICs have iSCSI capability built in. The embedded iSCSI HBA has an IPv4 configuration setting that defaults to DHCP, so the HBA was sending DHCP requests from its own MAC address (the mysterious one) and receiving a 10.11.x.x address.

The DHCP requests cannot be disabled without installing the Broadcom Managed Applications Control Suite and then following the instructions here.

It is very probable that I'm not very bright and that other people expect DHCP requests to come from the iSCSI hardware on their NICs by default. Just thought I'd share. I'll also note that this only happens on Windows, not on Linux on the same hardware.

tl;dr DHCP requests are issued by default on Broadcom NetXtreme II network adapters on Windows servers unless you download a special tweaker to turn them off, and it took me a while to figure this out.

Reference: Broadcom iSCSI offload engine defaults to DHCP enabled (IBM, 2011-03-03)

[ Submitted by John on Thu, 2011-12-15 16:00. ]

Slides from DrupalCamp Iowa 2011

Here are the slides from my morning presentation at yesterday's "DrupalCorn" DrupalCamp in Des Moines, Iowa. The presentation was designed to be a brief high-level overview of Drupal.

Introduction to Drupal 7 Architecture (PDF, 1.8MB)

[ Submitted by John on Sun, 2011-09-18 20:46. ]

Solution to Invalid command 'PubcookieAppID'

When setting up Pubcookie, you may encounter an "Internal Server Error" in your browser and the following error in /var/log/httpd/ssl_error_log (or wherever you're keeping your SSL error log):

[alert] [client] /var/www/html/foo/bar/baz/.htaccess: Invalid command 'PubcookieAppID', perhaps misspelled or defined by a module not included in the server configuration

This might be a headscratcher for a while, since you've probably been working hard to make sure that your Pubcookie configuration is nice and tidy and your .htaccess file is set up with PubcookieAppID, like this:

AuthType NetID
PubcookieAppID fribble
require valid-user

In fact, Apache is telling you exactly what you need to know: it can't make sense of the PubcookieAppID directive because the module that interprets that directive is not loading and thus is "not included in the server configuration."

To solve this, make sure that a line like this is actually somewhere in your configuration, normally somewhere like /etc/httpd/conf.d/pubcookie.conf:

LoadModule pubcookie_module modules/mod_pubcookie.so
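
A quick way to spot a directive that has been commented out, as a sketch. The function name is hypothetical, and which files to scan depends on your layout:

```shell
# Show commented-out LoadModule lines for a given module.
# $1 = module name (e.g. pubcookie_module), remaining args = files to scan
commented_loadmodule() {
    mod=$1
    shift
    grep -nE "^[[:space:]]*#[[:space:]]*LoadModule[[:space:]]+${mod}[[:space:]]" "$@"
}
```

Typical use would be: commented_loadmodule pubcookie_module /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf. Running httpd -M lists the modules Apache actually loaded, which is another way to confirm the fix.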

In my case, I had commented it out while getting SSL to work. Duh.

[ Submitted by John on Tue, 2011-08-30 14:50. ]

Installing rpy and rpy2 on RHEL6

I wanted to install rpy and rpy2 on Red Hat Enterprise Linux 6. Here's how I did it. There have been several fixes in the rpy SVN repository that have not shown up in the version downloadable from SourceForge. Hopefully that's been fixed by now, but here's how I installed it by retrieving rpy directly from the repository.

First I made sure that the python-devel package was installed to avoid the error src/RPy.h:63:20: error: Python.h: No such file or directory:

yum install python-devel

Then I installed R and R-devel from EPEL:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm
yum install R R-devel

Then I downloaded rpy and extracted it:

curl 'http://rpy.svn.sourceforge.net/viewvc/rpy/trunk/rpy/?view=tar' > rpy.tar.gz
tar xvzf rpy.tar.gz
cd rpy
python setup.py install

Lo and behold, it imports without error:

python -c "import rpy"

Yay!

Besides R and R-devel, rpy2 requires that the readline-devel package be installed, otherwise installation fails with ./rpy/rinterface/_rinterface.c:79:31: error: readline/readline.h: No such file or directory.

yum install readline-devel

After that, rpy2 installs easily:

curl http://pypi.python.org/packages/source/r/rpy2/rpy2-2.2.0.tar.gz#md5=a42a7f1e6ddb10dc3a1886c2f4309fab > rpy2-2.2.0.tar.gz
tar xvzf rpy2-2.2.0.tar.gz
cd rpy2-2.2.0
python setup.py install
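
The #md5= fragment on that download URL looks like the checksum published for the tarball, so it can be used to verify the file before running setup.py. A sketch, assuming md5sum from coreutils is available; verify_md5 is a made-up helper name:

```shell
# Succeed only if the file's MD5 digest matches the expected hex string.
# $1 = file, $2 = expected digest
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}
```

Typical use would be: verify_md5 rpy2-2.2.0.tar.gz a42a7f1e6ddb10dc3a1886c2f4309fab && echo "checksum OK"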

[ Submitted by John on Wed, 2011-06-08 14:38. ]

Solution to nodereference_autocomplete_access error

We recently rolled out a site to testing and got a mysterious error:

warning: call_user_func_array() [function.call-user-func-array]: First argument is expected to be a valid callback, 'nodereference_autocomplete_access' was given in /var/www/html/drupal/includes/menu.inc on line 453.

We were using Acquia Drupal, so the nodereference module was right where one would expect. What was Drupal's problem?

Turned out to be totally unrelated. The developer had neglected to check Drupal's .htaccess file into the repository, so this copy of Drupal was missing it, and all sorts of havoc (including the message above) resulted. The tip-off came when clicking on the login form produced a 404 Page Not Found, which led to the realization that clean URLs weren't working, which pointed the big fat Drupal finger directly at .htaccess problems.

[ Submitted by John on Mon, 2011-06-06 13:27. ]