Friday, July 1, 2011

vCenter database growth problem and fix

Symptom: Disk space low after a scheduled job at night in dev environment. DBA & I were alerted, DBA looked into it initially.

Short Term Cause: Short term cause (figured out by DBA) was when the index for the VPX_HIST_STAT2 table is re-organized (when fragmentation > 20%). The table was huge (42,000,000+ rows, while prod holds 7,000,000+) and holds the weekly history stats. This table, when running correctly, will hold the last week's worth of stats based on the stat level associated..and nothing else. I changed the stat level in dev to reflect the same settings as prod, restarted services and did some general poking with no success in a reduction of the table size.

Long term cause: When looking at the "VPXV_HIST_STAT_WEEKLY" view in dev, there were dates listed all the way back to 2009, although nothing very synchronous. Further digging in SQL Server brought me to the sql agent job that is supposed to do the rollup sequence and purge old records from that table ("Past Week stats rollup"). At some point (unknown date), the third step that purges the old record was set to never run (the second step would quit the job upon success). I shut down vCenter services and ran the third step in the job to start purging the records.

Hiccup: The purge job grew the transaction log to the point that it filled up the partition, grinding SQL Server to a halt. I was trying to cancel the job/shut down the sql server agent service, when the gears stopped. I added a new 25G disk to the VM and moved the log file (then at 11G) to this new partition. I started SQL Server back up and banged on it a bit to start the virtualCenterDev database up again*. Once up, I re-started the rollup/purge job and had to run to an appointment.

Finale: Watched by DBA, the job completed successfully then ran again successfully & quickly on the next schedule (probably because the vcenter service was still off). DBA ran a full backup then shrank the data & log files on both the vcenter db & tempdb. Ran another full backup & t-log backup. The log file is still on the new partition and can stay there through the lifespan of this VM (not long, as 4.1 is imminent).
vCenter services are now on and the database is at it's [probably correct] size of 1.2G.

Moral of the Story: Perhaps I should pay better attention to the database size. There is an estimator built into the vCenter client on the statistic level settings page for the db size based on the stat levels you choose. vcenterdev is now at that estimated size (rather than the 4.2G with one table running at 2.2G).

* : database offline, sp_detach_db, sp_attach_DB...

No comments:

Post a Comment