Matillion: Automated Weekly Restart of Matillion Services
Non-production (non-PRD) Matillion servers are used heavily by developers every day, so many processes and connections accumulate at the server level. Over time this can prevent other users from logging in or degrade server performance. To address this, the admin team developed an automated script that recycles the Matillion non-production services without manual intervention and sends an email alert on success or failure. Recycling the services kills hung and unused processes and frees temporary memory on the Linux servers.
- The script is triggered on each scheduled run by an external scheduler or crontab.
- The script first checks whether the tomcat service is running.
- If the tomcat service is not running, the restart is not performed and a failure mail is sent.
- If the tomcat service is running, the script executes the stop command.
- After about a minute, the script checks whether the tomcat service has stopped.
- If the tomcat service has stopped, the script executes the start command, confirms that the service has started, and sends a success notification email to the configured recipients.
- If the tomcat service has not stopped, the script executes the stop command again and repeats the check.
- If the tomcat service still has not stopped after 3 consecutive attempts, a failure notification mail is sent.
- All activities are recorded in the restart_log.txt file.
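Because the first step depends on cron or an external scheduler, a weekly schedule could be expressed as a crontab entry along the following lines. This is a minimal sketch: the day and time (Sunday at 02:00) and the helper name weekly_cron_line are assumptions for illustration, while the script path and name are the ones listed under Script Details.

```shell
#!/bin/bash
# Sketch only: builds a weekly crontab entry for the restart script.
# The schedule used below (Sunday at 02:00) is an assumed example,
# not the team's actual setting.

SCRIPT_PATH="/home/centos/restart_sh.sh"   # path and name as listed under Script Details

# Print a crontab line in the standard field order:
# minute hour day-of-month month day-of-week command
weekly_cron_line() {
    local minute="$1" hour="$2" day_of_week="$3"
    printf '%s %s * * %s %s\n' "$minute" "$hour" "$day_of_week" "$SCRIPT_PATH"
}

# Sunday (0) at 02:00. Paste the printed line into `crontab -e`
# for the user that owns the script.
weekly_cron_line 0 2 0
```

Keeping the schedule in one place makes it easy to move the restart window without touching the script itself.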
Benefits:
- Kills unwanted processes and frees more server CPU for Matillion.
- Prevents jobs from getting into a hung state.
- Keeps web services always alive.
- Kills existing long-running jobs.
- Requires no manual intervention.
- Alerts the admin team with notifications on success or failure.
Process Highlight:
- If the webserver is already stopped because of an ad hoc request, the script never executes the stop command; it sends a failure notification mail instead.
- The script makes up to three attempts to stop the webserver and sends a failure mail for each failed attempt.
- The script logs its actions for easy analysis.
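The three-attempt behavior can be sketched as a small retry function. This is an illustration under stated assumptions, not the production script: stop_with_retries, MAX_ATTEMPTS, and the two callback arguments are hypothetical names. On the server the callbacks would wrap `systemctl stop tomcat8.service` and the status check.

```shell
#!/bin/bash
# Sketch of the three-attempt stop logic. The stop and status commands are
# passed in as function names so the loop can be exercised without systemd.

MAX_ATTEMPTS=3

# Usage: stop_with_retries <stop_cmd> <is_running_cmd> [wait_seconds]
# Returns 0 once the service is confirmed stopped,
# or 1 after MAX_ATTEMPTS failed attempts.
stop_with_retries() {
    local stop_cmd="$1" is_running_cmd="$2" wait_seconds="${3:-50}"
    local attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        "$stop_cmd"
        sleep "$wait_seconds"            # give the service time to shut down
        if ! "$is_running_cmd"; then
            echo "attempt $attempt: stopped"
            return 0
        fi
        # The real script also sends a failure mail at this point.
        echo "attempt $attempt: still running"
        attempt=$((attempt + 1))
    done
    return 1
}
```

Passing a wait of 0 seconds allows a dry run with stub functions before the loop is pointed at a real service.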
Log Files: Every action executed by the script is written to the restart_log.txt log file. A timestamp is written with each entry for tracing. The time format used in the log is UST. The file is updated on every execution of the script.
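The timestamped log entries could be produced by a small helper like the one below. This is a sketch: the log function and the LOG_FILE variable are hypothetical names, while the `date "+%D %T"` format (MM/DD/YY HH:MM:SS) and the file name are taken from the script.

```shell
#!/bin/bash
# Sketch of a timestamped logging helper.
# LOG_FILE defaults to the log name used in this document.

LOG_FILE="${LOG_FILE:-restart_log.txt}"

# Append one line of the form "MM/DD/YY HH:MM:SS :message" to the log.
log() {
    echo "$(date "+%D %T") :$*" >> "$LOG_FILE"
}

# Example calls:
# log "webserver is active"
# log "stop webserver attempt: 1"
```

A single helper keeps the timestamp format identical across every entry, which makes the log easier to search.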
Email Notifications: The script sends three types of mail, depending on the scenario:
• Restart Success Mail
• Restart Failure Mail
• Already in stop state Mail
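The three mail types could be composed by one function that selects the subject for each scenario. This is a minimal sketch: build_mail and the scenario keywords are hypothetical names, the subject lines are modeled on those in the script, and the delivery line assumes a local sendmail-compatible mail agent.

```shell
#!/bin/bash
# Sketch: compose one of the three notification mails as a complete message
# (headers, a blank line, then the body).

build_mail() {
    local scenario="$1" from="$2" to="$3" subject
    case "$scenario" in
        success)         subject="QA Matillion Services Restarted Successfully" ;;
        failure)         subject="QA Matillion Services Restart Attempt Failed" ;;
        already_stopped) subject="QA Matillion Services Already In Stop State" ;;
        *) echo "unknown scenario: $scenario" >&2; return 1 ;;
    esac
    printf 'From: %s\nTo: %s\nSubject: %s\n\n%s\n' "$from" "$to" "$subject" "$subject"
}

# Example delivery. With -t, sendmail reads the recipients from the To: header.
# build_mail success ambarish@abcd.com ambarish@abcd.com | /usr/sbin/sendmail -t
```

Putting the recipient in a To: header keeps the message self-describing, so the same output can be inspected in a terminal before it is piped to the mail agent.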
Script Details
· Script name: restart_sh.sh
· Script Path: /home/centos
· Script Owner: root
· Script Permission: Read, Write, Execute
· Supported files: The script generates the files below.
o status.txt – This file is used within the script to check the status of the tomcat service. It is recreated in the same path on every execution.
o restart_log.txt – The log file. It contains the actions performed by the script as text with date and time. The time format used in the log is UST. The file is updated on every execution of the script.
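The way status.txt is used can be sketched as a function that turns the saved `systemctl status` output into a 1 or 0 flag. The function name run_flag_from_status is hypothetical; the 'active (running)' pattern and the tomcat8.service name come from the script.

```shell
#!/bin/bash
# Sketch of the status.txt check.

# Print 1 if the status file reports the service as running, otherwise 0.
# A missing or unreadable file is treated as "not running".
run_flag_from_status() {
    local status_file="$1"
    if grep -q 'active (running)' "$status_file" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}

# Typical use on the server (requires systemd):
# systemctl status tomcat8.service > status.txt
# run_flag=$(run_flag_from_status status.txt)
```

Where the intermediate file is not needed, `systemctl is-active --quiet tomcat8.service` reports the same state through its exit code, though that would be a change from the approach described here.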
Script:
#!/bin/bash
# Recycles the tomcat8 service and mails the outcome to the recipients.
recipients="ambarish@abcd.com"
cd /home/admin_usr1/ || exit 1

# Record the current service state; run_flag is 1 when tomcat is running.
systemctl status tomcat8.service > /home/admin_usr1/status.txt
run_flag=$(grep -c 'active (running)' /home/admin_usr1/status.txt)
if [ "$run_flag" -eq 1 ]
then
    echo "$(date "+%D %T") :webserver is active">> restart_log.txt
else
    echo "$(date "+%D %T") :webserver is inactive">> restart_log.txt
fi

# Try to stop the service up to three times, then start it again.
attempt=1
while [ "$attempt" -le 3 ]
do
    echo "$(date "+%D %T") :stop webserver attempt: $attempt">> restart_log.txt
    if [ "$run_flag" -eq 1 ]
    then
        sudo systemctl stop tomcat8.service
        echo "$(date "+%D %T") :stop command executed"
        echo "$(date "+%D %T") :stop webserver cmd executed">> restart_log.txt
        echo "$(date "+%D %T") :waiting for webserver to stop">> restart_log.txt
        #echo "$(date "+%D %T") :waiting for webserver to stop"
        sleep 50s
        # Re-check the state after the wait.
        systemctl status tomcat8.service > /home/admin_usr1/status.txt
        run_flag=$(grep -c 'active (running)' /home/admin_usr1/status.txt)
        if [ "$run_flag" -eq 0 ]
        then
            # Stop confirmed: start the service and verify that it came up.
            sudo systemctl start tomcat8.service
            echo "$(date "+%D %T") :start command executed"
            echo "$(date "+%D %T") :start tomcat cmd executed, attempt $attempt successful">> restart_log.txt
            sleep 50s
            systemctl status tomcat8.service > /home/admin_usr1/status.txt
            run_flag=$(grep -c 'active (running)' /home/admin_usr1/status.txt)
            if [ "$run_flag" -eq 1 ]
            then
                #echo "$(date "+%D %T") :start tomcat cmd executed, attempt $attempt successful"
                echo "$(date "+%D %T") :service is in running state">> restart_log.txt
                echo "subject: QA Matillion Services Restarted Successfully"| /usr/sbin/sendmail -f ambarish@abcd.com -t "$recipients"
                echo "$(date "+%D %T") :success mail sent">> restart_log.txt
            fi
            break
        else
            # Service is still running: report this attempt as failed and retry.
            echo "$(date "+%D %T") :webserver not responding, attempt $attempt failed">> restart_log.txt
            echo "subject: QA Matillion Services Restart Attempt $attempt failed"| /usr/sbin/sendmail -f ambarish@abcd.com -t "$recipients"
            echo "$(date "+%D %T") :failure mail sent">> restart_log.txt
        fi
    else
        # Service was already stopped before the script ran: do not restart.
        echo "$(date "+%D %T") :existing webserver was already in stopped state, restart aborted">> restart_log.txt
        #echo "$(date "+%D %T") :existing webserver was already in stopped state, restart aborted"
        echo "subject: QA Matillion Services Already In Stop State, Restart Attempt $attempt failed"| /usr/sbin/sendmail -f ambarish@abcd.com -t "$recipients"
        echo "$(date "+%D %T") :failure mail sent">> restart_log.txt
        break
    fi
    attempt=$((attempt + 1))
done
echo "$(date "+%D %T") ------------------end of execution------------------">> restart_log.txt
#to edit crontab
#crontab -e
#To check the running logs
#tail -f /home/centos/restart_log.txt