bash scripting

I have to admit I am not a scripting expert, and I am not a sysadmin either. But I think every developer should possess broader knowledge of the tools and platforms they use.

At least for me, some basic-to-intermediate knowledge makes me more productive, has saved the day more than once, and generally makes my life easier.

For example, the power of Linux is that you can write scripts and schedule them with cron to do a lot of the things that are considered “dull programming stuff”. Let’s start with backups.

Or scheduling automatic backups for things that matter and are trickier to perform than simple file backups.

Or monitoring website up-times and logging the results.

Most sites use MySQL databases, so the most popular tool out there for dumping database data is mysqldump. Many people also mention mysqlhotcopy for MyISAM tables (typically Joomla, WordPress, etc. sites) as it is faster, but because I use InnoDB databases for transaction and foreign key support in my applications, I just go the generic way and use mysqldump.

Because I schedule it at really off-peak times on the server (04:00, 05:00), I cannot really tell the difference in speed, and I don’t really care about it.
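The scheduling itself is just a crontab entry; a minimal sketch, where the script path is a placeholder for whatever performs the dump:

#m h dom mon dow  command
0  4 *   *   *    /home/user/scripts/db_backup.sh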

Now, mysqldump has this really useful command line option, --all-databases, which backs up all database schemas on the database server.
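In its simplest form it looks something like this (the username is a placeholder, and -p makes the client prompt for the password):

#dump every schema on the server into a single file
mysqldump -u aUsername -p --all-databases > full_backup.sql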

There is a drawback, of course, as always.

This option was working fine for me, until my archived-data database schemas grew to more than 1 GB in size. It was clearly stupid to waste all this bandwidth dumping, daily, an archive database that sometimes doesn’t change even once a week.

It would be a really nice option for mysqldump to have an exclude option for databases too, but with a quick search in the “world wide web” I didn’t find such a hack. So I was a little bit frustrated about it.

For no actual reason. No reason to be frustrated, because

“When you cannot find a tool or a feature for doing something, you can always build your own – Anonymous”

And so I decided to do so.

MySQL (and every serious DBMS) has one database schema that contains all the database names, metadata, etc. In MySQL this schema is called information_schema. So, instead of writing separate dump statements for 45 of the 50 databases…

…you can use the following script I wrote to do the trick and exclude specific databases:

#!/bin/bash

# This script performs a full backup of a MySQL database server,
# explicitly excluding some databases. It's extremely helpful if you have 50 databases
# and you want to exclude 5 or 6 of them. It dumps the contents into a .sql file, as always.
# That's why I wrote it in the first place. I kept it simple to use and really hope you find it helpful.
# I am open to comments, suggestions etc...

# Alex Hughes, London 2012.

echo
echo "Database Backup Initialized"

# selecting database schema names that are going to be excluded
# it is a good idea to exclude the performance schema and the information schema as well.
query="SELECT schema_name "
query+="FROM information_schema.schemata " 
query+="WHERE schema_name "
query+="NOT IN ('anExcludedDB1', 'anExcludedDB2', 'information_schema', 'performance_schema') "

#grep is used to get rid of the column name, which is not a db
databases=$(mysql -u aUsername -paPassword -B -n -e "$query" | grep -v "schema_name")

#performing the dump and placing it in a file.
#I prefer to have it in my dropbox so it is automatically impossible to lose any data.
#note: $databases is intentionally left unquoted, so that each schema name
#becomes a separate argument to --databases
mysqldump -u aUsername -paPassword --databases $databases > /home/user/Dropbox/server_backup/db.sql

#also a mini log is being kept. Just to know that cron is working.
log="DB Backup: "
log+=$(date)
log+=" completed succesfully"

#Every backup instance is appended at the end of the file
echo "$log" >> /home/user/Dropbox/server_backup/log

echo "Database Backup Completed"
echo
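Save it somewhere sensible and remember to make it executable, so cron can run it; the path here is just the placeholder from the crontab sketch above:

chmod +x /home/user/scripts/db_backup.sh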

You can follow updates to this script and to the repository here.

As you can see and understand from the comments, I query the database and exclude any databases I don’t want to back up. I am also explicitly leaving out information_schema and performance_schema. This is mainly done for portability reasons.

So, voilà, I am done. If file size matters, you can also compress the resulting db.sql file using gzip and archive it using tar. Just like that:

#archive the dump and compress it in one step
tar -zcf db.tar.gz db.sql

That should do the trick if you have really big active databases to back up. Personally, I keep my active databases as lightweight and scalable as possible, so size is not a problem and I keep the file as it is, to be able to restore in seconds.
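Restoring is the mirror image; a minimal sketch, assuming the same placeholder credentials and a dump produced with --databases (so it already contains the CREATE DATABASE and USE statements):

#feed the dump straight back into the server
mysql -u aUsername -paPassword < /home/user/Dropbox/server_backup/db.sql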

I also developed the following simple script to monitor website status:

#!/bin/bash
#Simple Script to check whether all of my clients' websites are up and running.

#This function takes the site's url as an argument and checks the curl command's
#exit code. If it is non-zero, it assumes that the site is down.

#A future addition would be a mailing or SMS function that notifies the
#administrator, as well as more detailed info about the errors encountered.

#Alex Hughes, London 2012

check_site () {
	log="site: "
	log+=$1
	log+=" || "
	#-f makes curl return a non-zero exit code on HTTP errors (4xx/5xx) too
	curl -sf -o /dev/null "$1"
	if [ $? -ne 0 ]; then
		log+="DOWN || "
		log+=$(date)
		echo "*******DOWN*******"
		#TODO mailing function
	else
		log+="UP || "
		log+=$(date)
		echo "*******UP*******"
	fi
	echo "$log" >> /home/user/web_monitor_log
}

sites[0]='site1.com'
sites[1]='site2.com'

for i in "${sites[@]}"
do
	check_site "$i"
done

cp /home/user/web_monitor_log /home/user/Dropbox/

Again, you can follow updates to this script and to the repository here.

The only thing it does is keep a log of whether a site was up or down, based on curl’s exit code (which, with -f, also reflects HTTP error statuses). It could do a lot more if I wanted it to, but I chose to keep it that way: simple.

As I have multiple domains hosted “somewhere”, I can count the up and down instances once a month and gather some statistics that show whether my hosting provider is promising false up-times.
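Counting is then just a pair of one-liners against the log (the path is the one used in the script above):

#count how many checks found a site up, and how many found it down
grep -c "UP ||" /home/user/web_monitor_log
grep -c "DOWN ||" /home/user/web_monitor_log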

Just to share the numbers: I don’t know exactly how this is measured, but for months now, up-times have sat right at the threshold between lies and truth, at 1%-2% downtime. (So my hosting provider tells me the raw truth, and I like it.)

In order to have correct data in my hands, I use two Linux servers to run the script every hour, and then I compare the results.
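The hourly run is, again, a single crontab entry on each server (the script path is a placeholder):

#run the monitor at the top of every hour
0 * * * * /home/user/scripts/web_monitor.sh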

So far, it seems these servers were either never down, or they were down at exactly the same times. Or they were down together (so romantic). You can calculate those odds for me if you have the time; for me it is not even worth trying, I trust my data so far.

If I ever have to sue the hosting provider, what I would have to do and prove is another story. I don’t think there would be a winner in that case anyway.

A nice addition to this script would be to send an email to the administrator of the website.

The reason I did not implement it straight away is that I haven’t found (or haven’t searched enough to find) an answer to:

“How to connect to an SMTP server (like smtp.gmail.com) to send mail using bash.”

I don’t have any mailing application like sendmail installed and running on these servers, and I am not planning to install one.

So if you want to help, I would be really grateful if you could mail me or comment on the question:

“Is there a simple way to connect to another SMTP server using the bash scripting language?”
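One lead that might be worth testing, purely as an untested sketch: newer curl builds can speak SMTP themselves, so something along these lines might work without any local mail daemon (the server, credentials, addresses and message file are all placeholders):

#send mail.txt (a complete message, headers included) through an external SMTP server
curl --url 'smtps://smtp.example.com:465' \
	--user 'user@example.com:aPassword' \
	--mail-from 'user@example.com' \
	--mail-rcpt 'admin@example.com' \
	--upload-file mail.txt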
