Lazy Linux Admins Going to Server Rooms Less: Forced Reboot, Auto Reboot after Kernel Panic and Email Notification after Reboot

Having to go to the server room to reset servers is one of the biggest headaches for admins managing a cluster of Linux servers at a remote site. Either you can ping the server but cannot SSH to it, or you cannot even ping it. There are various reasons why a Linux server may crash or become unreachable over SSH. The two most common in my experience are a badly behaved process that uses up almost all physical memory and swap, and a kernel panic. In this post, I describe several techniques I have learned that let me go to the server room less often by dealing with these kinds of failures.


Force Linux to reboot even if you cannot start a shell via SSH

If the server is too busy, starting an interactive shell via SSH may fail even though sshd is alive. Sometimes you get lucky and can still execute commands remotely by passing them directly to ssh. In that case, you can try the magic SysRq key to force Linux to restart. Note that the b trigger reboots immediately, without syncing or unmounting the disks.

ssh root@server_home \
'echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger'

Reference: Force Linux to reboot.

After this command, if your server disappears from the network, it is probably rebooting. Wait a while and it may come back.
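Because a bare b skips the disk sync, a slightly gentler variant asks the kernel to sync and remount the filesystems read-only first. The following is a minimal sketch, not a tested recipe for every system: the helper name force_reboot and the 2-second pauses are my own assumptions, and sleep is an external command that may itself fail on a badly overloaded server (the sequence still continues in that case).

```shell
#!/bin/sh
# Sketch: a gentler SysRq sequence than a bare "b".
#   s = emergency sync, u = remount filesystems read-only, b = reboot immediately
# The 2-second sleeps are a guess to give the kernel time to finish each step.
SYSRQ_REBOOT='echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger; sleep 2
echo u > /proc/sysrq-trigger; sleep 2
echo b > /proc/sysrq-trigger'

# force_reboot HOST: run the sequence on HOST as root.
# (Hypothetical helper name; requires that root can log in via SSH.)
force_reboot() {
    ssh "root@$1" "$SYSRQ_REBOOT"
}
```

With this loaded, `force_reboot server_home` would send the whole sequence in one ssh invocation, just like the one-liner above.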

Make Linux reboot automatically after a kernel panic

Sometimes you are unlucky and there is a kernel panic. Almost everything, including the network, stops working and you cannot connect to the server any more. That is not good, but it need not be too bad if we did our homework beforehand by configuring Linux to reboot itself after a kernel panic.

Linux has a nice feature that reboots the machine after a timeout when a kernel panic happens. Usually it is disabled. As lazy system admins, we can turn it on by setting the kernel.panic kernel parameter.

For a running system:

# echo 20 >/proc/sys/kernel/panic

Here, 20 is the number of seconds before the kernel reboots. 0 means this feature is disabled.

To make the configuration persistent, you have at least 2 choices:

  • add the kernel parameter panic=20 to your bootloader (grub or grub2).
  • add kernel.panic = 20 to /etc/sysctl.conf .

I prefer the second method, which writes the configuration to /etc/sysctl.conf.
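Appending the line by hand works, but repeated runs leave duplicate or conflicting entries. As a rough sketch of the second method, the helper below (set_sysctl is a name I made up for illustration) replaces any existing setting for a key and appends the new one. The key is used as a regular expression, so the dot in kernel.panic matches any character, which is harmless here.

```shell
#!/bin/sh
# set_sysctl KEY VALUE [FILE]
# Ensure FILE (default /etc/sysctl.conf) contains exactly one "KEY = VALUE" line.
# Hypothetical helper for illustration; run as root for the real file.
set_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line that is not already a setting for KEY.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    # Append the new setting.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

As root, `set_sysctl kernel.panic 20 && sysctl -p` would write the setting and load it immediately; `sysctl kernel.panic` then shows the value currently in effect.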

For more details, please check How to make Linux automatically reboot after a kernel panic.

Email notifications after Linux reboot

Auto reboot is good. It is even better if the server also notifies the admins after a reboot. The technique discussed in How to email admins automatically after Linux server starts makes the server send an email notification after each reboot.

It makes use of @reboot cron jobs and mailx by adding a crontab entry like

@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com
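When many servers need this entry, editing each crontab interactively gets tedious. The sketch below shows one possible way to add the entry non-interactively and only once. The function name add_reboot_mail is hypothetical, and the SMTP host and addresses are the same placeholders as in the entry above.

```shell
#!/bin/sh
# The crontab entry from above, kept verbatim (placeholder host and addresses).
ENTRY='@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com'

# add_reboot_mail: read a crontab on stdin and print it on stdout,
# appending ENTRY unless an identical line is already present.
# (Hypothetical helper name, for illustration only.)
add_reboot_mail() {
    current=$(cat)
    # Echo the existing crontab unchanged, if there is one.
    [ -n "$current" ] && printf '%s\n' "$current"
    # -F: fixed string, -x: whole-line match, -q: quiet.
    printf '%s\n' "$current" | grep -Fxq -- "$ENTRY" || printf '%s\n' "$ENTRY"
}
```

A possible use is `crontab -l 2>/dev/null | add_reboot_mail | crontab -`, which rewrites the current user's crontab and can be run repeatedly without creating duplicates.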

For sending emails, you may follow either Sending Email Using mailx in Linux Through Internal SMTP or Sending Email from mailx Command in Linux Using Gmail’s SMTP.

Eric Zhiqiang Ma

Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.

4 comments:

  1. What if remote root SSH is disabled for security purposes? How do you reboot remotely?
    Shouldn’t these critical servers have remote power management tools like iLO by HP or DRAC by Dell?

    I think remote power management tools are the best option in such conditions.
