The holy grail of I/T is to be up all the time with no user impact. As most companies mature in the I/T area, there is a shift from reactive tactics to proactive measures. There is a strong desire to be an industry leader and adhere to a “Five 9’s” mentality. “Five 9’s” means 99.999% uptime for any given system or application.
So, how is this done? From a DBA’s perspective this means redundancy: the ability to pick up where we left off before an outage or performance impact occurred. There are many tools to address this issue: Power HA, DB2’s HADR, Veritas, TSA, pureScale, and Data Propagator, just to name a few. For the purposes of this article we are going to focus on IBM’s Power HA (HACMP) and do a very brief comparison to DB2’s HADR, as these are the more common methods of redundancy a DBA may face.
What’s the difference?
To borrow from Ember’s previous post on HADR (Reference Article), HADR is essentially “DB2’s implementation of log shipping. Which means it’s a shared-nothing kind of product. But it is log shipping at the Log Buffer level instead of the Log File level, so it can be extremely up to date. It even has a Synchronous mode that would guarantee that committed transactions on one server would also be on another server”.
Power HA (or HACMP) is IBM’s solution for high availability clusters at an OS and hardware level. Power HA can run on up to 32 nodes, “each of which is either actively running an application (active) or waiting to take over when another node fails (passive)” (Source: Wikipedia). Essentially, specific disks are detached, swung over to a waiting passive node, and brought online.
Which solution is for you? There are arguments for and against both, and your individual environment and corporate policies will dictate which you choose. This article addresses the ins and outs of Power HA from a DBA’s perspective.
What this article will and will not cover.
Power HA is an unusual tool. You need a firm grasp of server level engineering as well as DB2. Unless you have a solid understanding of both worlds you will most likely be paired up with a System Administrator as you swap ideas, teach each other, and collaborate.
We will not be covering the nitty-gritty of Power HA configuration. What I will relay is how DB2 comes into play with failover – from setting up DB2 for failover, to laying out filesystems, to using scripts to gracefully bring DB2 up and down.
I will be covering how Power HA is set up in my shop for a High Availability OLTP banking application where transaction times are measured in milliseconds and uptime is critical. We are running DB2 v9.7 on an IBM 570, model 9117-MMA, frame with POWER version 6 processors (P6) running AIX v6.1 TL6. This is a dual node cluster with each logical partition (LPAR) defined in a separate frame. The LPAR profiles are configured with a desired setting of 4.0 physical processors and 30 GB of physical memory.
As we discuss how DB2 and Power HA interact, I will reference our methodology and plan of attack for our specific environment. I will also provide and reference a script to control DB2 during failover. This script was developed hand in hand with a system administrator, tested in multiple environments, and deployed into production. The information provided here, including the script, is to be used at your own risk. Neither I nor my co-author guarantee the work or are responsible for its behavior in your environment.
Ok, now that legal is happy, let’s begin.
It’s all about the disk.
To over-simplify the process, Power HA listens to a heartbeat from the primary active server. When trouble arises and a set number of heartbeats are missed, Power HA yells “HANG ON”, initiates a failover, and swings the defined volume groups from the primary server to the failover server. If you planned ahead, your application servers are already pointing to a floating IP address, so the application continues to communicate during failover and is none-the-wiser when you come online on your standby server (which is now primary).
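For example, a remote application server could catalog the database against that floating (service) address rather than either node’s own hostname, so its connection information never needs to change during a failover. Here is a minimal sketch from the DB2 command line; the node name, address, port, and database name are placeholders for illustration:

db2 catalog tcpip node hanode remote 192.0.2.50 server 50000
db2 catalog database bankdb at node hanode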
Now, it’s time to learn to speak geek of a different dialect. Want to impress your system administrator at a party? Drop big words like “Enhanced Concurrent Capable Volume Group”. You get geek points for that. In short, an “Enhanced Concurrent Capable Volume Group” is a very long name for what is “shared”, meaning which disks fail over to the secondary server.
My counterpart and system administrator, Scott Robertson, described it to me this way:
To build a volume group, you need physical disks.
To build a logical volume, you need a volume group.*
To build a file system, you need a logical volume.
*This is the level at which “sharing” between servers is defined.
In spending time with Scott, I learned that database administrators and system administrators look at this from opposite points of view. An SA looks at things from a bottom-up perspective, which starts at physical disks. A DBA looks at it from the top down, which starts at a filesystem. For example, from a DBA’s perspective we see:
Filesystem => Logical Volume => Volume Group (Enhanced Concurrent Capable) => Physical Disk
This means that when engaging the SA to figure out what “swings” between the servers you need to get your point across at the Volume Group level, and that may take a little one on one work to make sure you both understand each other clearly.
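If you want to trace that chain yourself on AIX before you ever get the SA on the phone, a few standard LVM commands will walk it from the top down. The filesystem, logical volume, and volume group names below are placeholders; substitute your own:

lsfs /db2data         # shows the logical volume (device) behind the filesystem
lslv db2datalv        # shows the volume group that owns the logical volume
lsvg -p db2datavg     # lists the physical disks (hdisks) in that volume group
lsvg db2datavg        # shows whether the volume group is concurrent capable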
So how do we map this out? What swings over in failover and what doesn’t? Let’s take a closer look.
Red rover red rover, send the data disks on over.
In our shop, we designed our DB2 filesystems to a 1:1 match from filesystem to disk level (1 Isolated Filesystem to 1 Isolated Logical Volume to … you get the picture). This follows best practice for OLTP, gives us better speed and reliability, and reduces single points of failure. Essentially, we look something like this (filesystem names are changed to protect the innocent):
Example of DB2 setup in a HACMP Cluster.
- /opt/IBM/db2 – DB2 Binaries [Local]
- /db2 – Where DB2 and Instances are held [Shared]
- /db2data – Data is held here [Shared]
- /db2log – Active Logs [Shared]
- /db2logarchive – Archive Logs [Shared]
- /db2backup – Database Backups [Shared]
Notice that /opt/IBM/db2 is NOT shared between active and passive. That is important because it means you have to be cognizant of a few things:
- DB2 has to be installed individually on each server.
- Licensing has to be handled individually on each server.
- The database needs to be swung from active to passive server to upgrade DB2.
(Binaries, instance, and DB are upgraded on active; Binaries are the only upgrade on passive)
If you have come this far, you have won half the battle. Defining what swings over and managing DB2 logistics on two different servers is significant. Once done though, it’s easy to keep up with.
UNIX SA Note:
If this is a new cluster, or even if you’re upgrading, take time at this point to test your Power HA configuration and ensure the “bones” of your failover are behaving as expected. Be diligent with your verification: try things such as bringing down the primary Ethernet adapter with ifconfig from a console login, issuing an immediate halt of the LPAR from the Hardware Management Console (HMC), or other actions that simulate a failure of the primary node. You get the idea – just never assume it will work; test while you have “maintenance/build” time available.
What do you mean my scripts stopped running (and other “gotchas”)?
There are a few other “gotchas” to watch for when preparing for a Power HA setup. If you aren’t watching for these during the setup and maintenance phases, you could run into performance issues upon failover. Worse yet, 30 days after a failover you suddenly need to recover from a backup and realize that your backup script hasn’t been running.
First, the servers must be an exact match. This doesn’t mean kind of the same, or mostly the same, but an exact match. OS and DB2 versions down to the Tech Level and Fixpack should match. Local filesystems that don’t swing over should match in size and configuration. CPU, memory, and paging space should also mirror each other.
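One quick way to verify the match is to run the same handful of commands on both nodes and compare the output. These are standard AIX and DB2 commands, though exactly what you compare may vary in your environment:

oslevel -s     # AIX release, Technology Level, and Service Pack
db2level       # DB2 version and Fix Pack
lsps -a        # paging space size and usage
lparstat -i    # LPAR profile details, including entitled CPU and memory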
Learn from my fail (DBA):
I had a production server swing during an outage, and then chased my own tail for two weeks on new performance problems. It wasn’t until the SA and I did a deep dive that we noticed OS and DB2 were not on matching Fixpacks, memory was slightly less on the failover box, etc. We wasted a lot of time and effort stabilizing a box that should have been an exact match.
UNIX SA Note:
If your SA needs convincing about this, bribe them with doughnuts, bacon, and coffee; anything to help get the point across. Just as Mike stated, each AIX LPAR in the HACMP cluster should be identical in OS software version/revision/patches, filesystem and paging space sizing, and LPAR profile configuration for CPU/memory, or you’ll find yourself chasing performance ghosts for days.
There are exceptions to this. For example, a company could make a strategic decision that the failover server is only there to limp along and “keep us afloat” while the other server is fixed. The failover server is not meant to hold a full load for long and may have half the CPU and memory. However, this is not common and I wouldn’t recommend it.
Your second concern is the crontab. Your crontab is local to each machine and does not swing back and forth upon failover. This means you must maintain two sets of crontab entries and keep them in sync. Unfortunately, this is a necessary evil and a chore. However, my SA taught me a neat trick: if I add one line at the beginning of each of my maintenance scripts, I don’t have to worry about commenting out the crontab on the failover server; I just make sure it is a duplicate.
In each script I was encouraged to add a line similar to this:
test -f /path/to/active/only/file || exit
If you test for something that exists only on the active server at the time – “db2profile” for example – the maintenance script will run on the active server but exit on the inactive server. With this method you won’t have to worry about what is commented out and what is active. You can say with confidence that your backups and maintenance routines are kicking off when you recover to the new server.
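For example, since our /db2 filesystem (and therefore the instance home) is only mounted on whichever node currently owns the shared volume group, a guard like the following at the top of each maintenance script does the trick. The instance name is a placeholder; adjust the path to your own instance home:

test -f /db2/db2inst1/sqllib/db2profile || exit 0
. /db2/db2inst1/sqllib/db2profile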
At this point, we have the initial setup done. Now we need to tell Power HA how to handle DB2 in a crisis. If you follow IBM’s recommendation, you essentially force the database to a hard stop. So let’s look at how I approached this scenario within my own shop.
Now, now … you two play nice.
Scott, my SA, can make Power HA do some pretty cool stuff. The failover capability and the speed with which things can move are impressive. However, at some point, Power HA has to hook into DB2 to bring databases offline on the old server and online on the new server. If you ask your SA how Power HA is stopping DB2, they, like Scott, will probably give you a confused look followed by, “Well, uh … DB2 Off? No, Kill -9? Something stop?”
It’s not their fault – they are handling a complex setup with Power HA but rely on IBM to provide guidance on this. But IBM isn’t much better. Even if you look at the example HACMP scripts provided with DB2, you see that in the end all that is issued is a “db2stop force”.
[Side note: Did you know IBM provides a ton of example scripts? Everything from useful administrative scripts to HACMP and HADR scripts. Check out “/opt/IBM/db2/V9.7/samples/” on your own server.]
This makes sense. From an SA’s point of view (and even IBM’s view), the server is “dorked” and probably not responding well. Because of this we need to bring down everything – we don’t really care how – and get over to the new server.
DB2STOP FORCE works fine and is not as detrimental as a db2_kill. It actually tries to force everyone off and roll back in-flight work before stopping DB2. However, if your database was activated explicitly, it will never actually be deactivated by a DB2STOP FORCE, and resources are not released naturally. This may – or may not – be the reason I have seen a few occasions of databases going into crash recovery on the new primary server even during a controlled failover.
I’ve also found that Power HA only fails over automatically a small fraction of the time. Often we see a failure coming and initiate failover ourselves (while the active server is still functional). Other times we decide we want to fix/upgrade one server so we manually fail to the other to buy us time.
If the server is functional most of the time, and the failover is controlled, why not try to shut down gracefully? Let’s edit the DB2 shutdown scripts to force applications off, deactivate the databases, and shut down DB2 cleanly first, and only issue a “db2stop force” if all else fails.
Everyone in the lifeboat – the ship is going down!
Remember that HACMP “hook” into DB2? This is where you work with your SA to bring DB2 down gracefully. Essentially, Power HA operates under root, which can be defined to “su – “ to the proper instance ID, and execute a “Stop DB2” script and a “Start DB2” script. You can edit these DB2 scripts to include logic beyond the sample scripts which use only a “db2stop force”.
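The root-level wrapper that Power HA calls can stay tiny. A minimal sketch, with a placeholder instance name and script path, might be nothing more than:

#!/bin/ksh
# Called by Power HA as root; hand control to the instance owner
su - db2inst1 -c "/usr/local/bin/stop_db2.ksh"
exit $?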
Here is how we manipulated our scripts. Ours have been made more robust to handle multiple instances and multiple databases, as well as some error checking, but this is the gist of our methodology.
Stop DB2 – stop_db2.ksh
- Force Applications All
- Deactivate each database
- Force a DB2 Stop -db2stop force
- Kill Straggling db2bp processes
- Issue ipclean to clear leftover IPC resources (shared memory and semaphores)
(Essentially, we try a clean shutdown; if we can’t, then force DB2 down).
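A stripped-down sketch of that stop logic, for a single hypothetical instance (db2inst1) and database (BANKDB), might look like the following. Our real stop_db2.ksh loops over instances and databases and adds error checking:

#!/bin/ksh
# Load the DB2 environment from the shared instance home
. /db2/db2inst1/sqllib/db2profile

db2 force applications all          # kick off connected applications
sleep 10                            # give the force a moment to take effect
db2 deactivate database BANKDB      # release the explicitly activated database
db2 terminate                       # end the CLP back-end process
db2stop force                       # force the instance down if anything is still attached

# Kill any straggling db2bp back-end processes owned by the instance
for pid in $(ps -ef | grep db2bp | grep -v grep | awk '$1 == "db2inst1" {print $2}'); do
    kill -9 $pid 2>/dev/null
done

ipclean                             # clear leftover DB2 IPC resources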
Start DB2 – start_db2.ksh
- Update sqllib/db2nodes.cfg to reflect new server name
- Start DB2 – db2start
- Activate Database – activate database
Updating the db2nodes.cfg file can be done a few ways. Because of limitations placed on us in our environments, we have to physically change the hostname (see code below). However, I have seen virtual IPs and HACMP labels referenced in the db2nodes.cfg file so that no alteration has to take place at failover time. I have not done this explicitly in our environment but am aware of others that have.
Example Code to extract Hostname:
local_HOST=$(hostname | cut -d"." -f1)
echo "0 ${local_HOST} 0" > db2nodes.cfg
Explicit activation of the database may seem pointless when the application or user can connect and bring the database online. However, this step is important for two reasons. First, some DB2 features require explicit activation of the database to function; an implicit activation doesn’t cut it. Second, explicit activation initializes the database ahead of time, so the first connections don’t pay the startup cost. The need for explicit activation depends on multiple factors, but in short, explicit activation can’t hurt.
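Putting the start steps together, a minimal sketch of start_db2.ksh for the same hypothetical instance and database might look like this; the instance home path is a placeholder:

#!/bin/ksh
# Load the DB2 environment from the shared instance home
. /db2/db2inst1/sqllib/db2profile

# Rewrite db2nodes.cfg so partition 0 points at the server we just landed on
local_HOST=$(hostname | cut -d"." -f1)
echo "0 ${local_HOST} 0" > /db2/db2inst1/sqllib/db2nodes.cfg

db2start                            # bring the instance up
db2 activate database BANKDB        # activate explicitly so the first connect is fast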
When developing your own shutdown/startup scripts you don’t have to reinvent the wheel. As I suggested, check out DB2’s sample directory which provides free example scripts to use for various scenarios ranging from single node to multi-node databases.
Sample Script HACMP/Power HA Directory:
/opt/IBM/db2/V9.7/samples/hacmp
You can also download a copy of the script developed by Scott Robertson and me, which we use in our own environments. For the most part, this script should be “plug and play” for any single-node distributed server. Our scripts contain basic error checking and can work with multiple instances, databases, and versions of DB2 residing on one server or LPAR.
Example HACMP Scripts can be downloaded here – Downloadable Zip
As stated before, use this script at your own risk. We do not guarantee the script nor are we responsible for any damage caused. This is a starting place for you to develop your own script and methodology.
Before I go on, I would like to take a second to thank Scott Robertson. This script is the result of locking ourselves in a small meeting room, putting thoughts to paper, ultimately coding these scripts, and days of testing. His scripting ability is much stronger than mine and this script could not have been developed without a collaborative effort with him. Well … and coffee – lots of it. I don’t know about you, but I bribe my SA with coffee to help me.
Final UNIX SA Note:
If coffee is your preferred “incentive” for requests like this, insist on quality flavors from one of the nearby name brand coffee houses, not the break room.
Attaining the “Five 9’s”
In the end, truly uninterrupted service is almost unattainable. However, many times it is about progress, not perfection, and proactive planning will move you in the right direction. Power HA (HACMP) is only one of many tools in your arsenal to help keep your system online. Hopefully, this article gets you a little closer to that goal.
Good luck with your own setup. I encourage you to write feedback below (or send e-mail) letting us know your successes and failures with your Power HA setup. As you work in your own environments, please feel free to contact Scott or me with any follow-up questions you may have. You have resources available to help you here – take advantage of them.
Finally, I would like to thank my good friend Ember for allowing me to share her little section of the internet. Without her blog we would be unable to pass on lessons learned and help others through our writing.
About the Authors:
Scott Robertson has been a Unix Systems Administrator for over 24 years, providing production support for everything from small applications to very large database warehouses. His experience over the years covers many Unix areas, such as systems installation, routine maintenance, day-to-day user support, filesystem management, network management, and security management, and he is always trying to replace repetitive tasks with a shell script. He can be reached at “unxscorob @ gmail.com”. LinkedIn profile: http://www.linkedin.com/in/unxscorob