
Lazy Linux Admins Going to Server Rooms Less: Forced Reboot, Auto Reboot after Kernel Panic and Email Notification after Reboot

Having to go to the server room to reset servers is one of the biggest headaches for admins managing a cluster of Linux servers at a remote site. Either you can ping the server but cannot SSH to it, or you cannot even ping it. There are various reasons a Linux server may crash or become unreachable over SSH. The two most common in my experience are a misbehaving process that uses up almost all physical memory and swap, and a kernel panic. In this post, I describe several techniques I have learned that let me visit the server room less often by dealing with these kinds of failures.

Force Linux to reboot even if you cannot start a shell via SSH

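If the server is too busy, creating a shell via SSH may fail even though sshd is alive. Sometimes you get lucky and can still execute a command remotely by passing it directly to ssh. In that case, you can use the magic SysRq feature to force Linux to restart. The command below first enables SysRq and then triggers b, which reboots the machine immediately without syncing or unmounting disks.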

ssh root@server_home \
'echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger'

Reference: Force Linux to reboot.
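If the machine still responds well enough to run a few more commands, a slightly gentler variant asks the kernel to sync disks (s) and remount filesystems read-only (u) before the immediate reboot (b). The sketch below only builds the remote command string. The helper name sysrq_reboot_cmd and the 2-second pauses are assumptions for illustration, not part of the referenced post. Note that sleep is an external program, so on a server that can no longer start new processes the plain b command above remains the fallback.

```shell
#!/bin/sh
# Build a remote command that enables SysRq, then triggers
# s (sync), u (remount read-only) and b (reboot) in that order.
# sysrq_reboot_cmd is a hypothetical helper name.
sysrq_reboot_cmd() {
    cmd='echo 1 > /proc/sys/kernel/sysrq'
    for key in s u b; do
        cmd="$cmd; echo $key > /proc/sysrq-trigger"
        # Assumed pause so sync and remount get a moment to finish.
        if [ "$key" != b ]; then
            cmd="$cmd; sleep 2"
        fi
    done
    printf '%s\n' "$cmd"
}

# Usage (server_home is the host name used in the command above):
#   ssh root@server_home "$(sysrq_reboot_cmd)"
```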

After running the command, if the server disappears from the network, it is probably rebooting. Wait a while and it should come back.
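Rather than retrying by hand, the waiting can be scripted. The following is a minimal sketch, assuming the server answers ICMP ping once it is back. The function name wait_for_host, the PROBE override and the default retry counts are hypothetical choices, not something from the referenced post.

```shell
#!/bin/sh
# wait_for_host HOST [TRIES] [DELAY_SECONDS]
# Probes HOST until it answers or TRIES probes have failed.
# The probe defaults to a single Linux ping with a 2-second timeout;
# set PROBE to another command to change it (assumption for this sketch).
wait_for_host() {
    host=$1
    tries=${2:-30}
    delay=${3:-10}
    failed=0
    while [ "$failed" -lt "$tries" ]; do
        if ${PROBE:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
            echo "$host is reachable again after $failed failed probes"
            return 0
        fi
        failed=$((failed + 1))
        sleep "$delay"
    done
    echo "$host did not come back after $tries probes" >&2
    return 1
}
```

For example, wait_for_host server_home 30 10 would probe every 10 seconds for about five minutes before giving up.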

Make Linux reboot automatically after a kernel panic

Sometimes you are unlucky and there is a kernel panic. Almost everything, including the network, stops working and you cannot connect to the server any more. That is not good, but it need not be too bad if we did some homework beforehand by configuring Linux to reboot itself after a kernel panic.

Linux has a nice feature that reboots the machine after a timeout when a kernel panic happens. It is usually disabled by default. As lazy system admins, we can turn it on by setting the kernel.panic kernel parameter.

For a running system:

# echo 20 >/proc/sys/kernel/panic

Here, 20 is the number of seconds before the kernel reboots. 0 means this feature is disabled.
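To check what a server is currently configured to do, the value can be read back from the same file. This is a small sketch. The function name panic_status is hypothetical, and the optional file argument exists only so the function can be tried against a test file. The negative case reflects the kernel's documented behavior for the panic parameter (reboot immediately), which the text above does not cover.

```shell
#!/bin/sh
# panic_status [FILE]
# Reports the auto-reboot setting stored in FILE
# (default: /proc/sys/kernel/panic).
panic_status() {
    file=${1:-/proc/sys/kernel/panic}
    secs=$(cat "$file") || return 1
    if [ "$secs" -eq 0 ]; then
        echo "auto reboot on panic is disabled"
    elif [ "$secs" -lt 0 ]; then
        echo "reboot immediately after a panic"
    else
        echo "reboot $secs seconds after a panic"
    fi
}
```

Running panic_status with no argument on the server reports the live setting. sysctl kernel.panic shows the same raw value.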

To make the configuration persistent, you have at least 2 choices:

  • add the kernel parameter panic=20 to the kernel command line in your bootloader (GRUB or GRUB2).
  • add kernel.panic = 20 to /etc/sysctl.conf.

I prefer the second method, which writes the configuration to /etc/sysctl.conf.
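The sysctl.conf edit can be made repeatable so that running it twice does not leave duplicate lines. Below is a minimal sketch, assuming GNU sed (for sed -i) and a simple "key = value" file layout. The function name persist_sysctl is hypothetical.

```shell
#!/bin/sh
# persist_sysctl KEY VALUE [FILE]
# Sets "KEY = VALUE" in FILE (default: /etc/sysctl.conf), replacing an
# existing line for KEY or appending a new one.
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape dots so "kernel.panic" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    touch "$file"
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        sed -i -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

As root, persist_sysctl kernel.panic 20 followed by sysctl -p writes the setting and loads /etc/sysctl.conf into the running kernel.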

For more details, please check How to make Linux automatically reboot after a kernel panic.

Email notifications after Linux reboot

Auto reboot is good. It is even better if the server also notifies the admins after a reboot. The technique discussed in How to email admins automatically after Linux server starts makes the server send an email notification after each reboot.

It makes use of the @reboot cron jobs and mailx by adding an entry like

@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com
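The message body does not have to be just the date. The sketch below shows one way to produce a slightly richer report to pipe into mailx. The function name boot_report, the chosen fields and the script path mentioned afterwards are assumptions for illustration, not part of the referenced post.

```shell
#!/bin/sh
# boot_report: print a short summary of the freshly booted system.
boot_report() {
    echo "Host:   $(uname -n)"
    echo "Time:   $(date)"
    echo "Kernel: $(uname -r)"
    # On Linux, the first field of /proc/uptime is seconds since boot.
    if [ -r /proc/uptime ]; then
        read -r up rest < /proc/uptime
        echo "Uptime: ${up} seconds"
    fi
}

# When saved as a standalone script, end the file with a call:
#   boot_report
```

Saved as, say, /usr/local/bin/boot-report.sh (a hypothetical path) with the final call added, it can replace date in the entry above:

@reboot /usr/local/bin/boot-report.sh | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com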

For sending emails, you may refer to either https://www.systutorials.com/sending-email-using-mailx-in-linux-through-internal-smtp/ or https://www.systutorials.com/sending-email-from-mailx-command-in-linux-using-gmails-smtp/.


4 Comments

  1. What if remote root SSH is disabled for security purposes? How do you reboot remotely?
    Shouldn’t these critical servers have remote power management tools such as iLO by HP or DRAC by Dell?

    I think remote power management tools are the best options in such conditions.
