Wednesday, October 30, 2013

Top 6 features of Hyper-V 2012 R2...

...that the vSphere Admin thought were already there.

I'm not getting into a feature debate. Too much marketing on either side. Yes, VMware's licensing is complicated. AGREED.
What I'm talking about here is when I go through the "New Features" articles of Hyper-V 2012 R2 and see things that I ASSUMED WERE THERE FROM DAY ONE. Day one being ~2008. ESXi 3.5 came out in ~2007.
However, day one of Hyper-V didn't have "live migration", just "quick <cough power off cough> migration", so maybe I was expecting too much from VMware's competition out of the gate.
Anyway, here's my list:

6) PXE Boot. Your Hyper-V hypervisor is now going to get out of the way of your virtual network adapter. Move out of the way. Available in ESX since probably the beginning or close to it as the generic virtual ethernet is Intel e1000 in ESX(i) and the VM traffic stack is separated from the management stack. Restricted to Gen2 VMs.

5) Online virtual HDD resizing. This feature in ESX(i) has been saving my butt since ESX(i) 3.5 (or 4.0, but I'm pretty sure 3.5). Windows Server introduced live resizing of the boot partition in Server 2008; in 2003 you could only live resize non-boot partitions (for the boot partition: shutdown, gparted, start). This was a function that the Windows Server team got right. Awesome feature. Because Hyper-V Gen1 VMs are IDE, you couldn't use this OS feature. Unbelievable. Oh, the fine print on the Hyper-V shrink HDD is that it has to be unpartitioned. That's reasonable. Restricted to Gen2 VMs.

5b) Clarification in that Gen1 VM boot partitions were restricted to being IDE. Other added disks could be SCSI (although I'm not sure if live resizing was available. Looks like no).

4) Live Migrate from Hyper-V 2012 server to 2012 R2. AKA, non-disruptive vMotion onto latest version. This is one-way onto 2012 R2, no clusters of different versions for any period of time. This has been available since at least ESX 3.5 circa 2007, probably before.

3) Clone a running VM. Available since at least ESX 3.5 with VMware. Again.

2) Virtual SCSI HDD boot disk. What? You mean Hyper-V guests have been running on virtual IDE? Yuck. Restricted to Gen2 VMs.

1) "VM Direct Connect" aka Virtual Console! You know that thing that happens when you screw up the network mask or IP and lock yourself out. Or maybe you had a security breach and want to keep the machine running without it being on the network. Well now you can! Introduced in the first version of VMware's hypervisor circa 2001 now coming to Hyper-V 2012 R2.

0) Restricted to Gen2 VMs. This is Windows Server 2012 and Windows 8 only. ONLY. Though if you're using the Hyper-V equivalent of vCenter (VMM) it can't see Gen2 VMs. Doh!

Other bits:

- Dynamic memory for Linux. If you set this too low you will break your Linux VM (see FYI).
- SMB over RDMA (RoCE). Shared memory between physical servers. Like HPC InfiniBand solutions. While a very cool idea, I think most Hyper-V deployments won't be shelling out the dollars for this architecture.
- NVGRE is cool. At least Ivan thinks so.

vCSA 5.1u1b Login Timeouts

Had an interesting thing happen in an environment. Not sure which knob I tweaked did the actual fix so I thought I would point out the problem and all three modifications.
  • vCSA pointing directly to a Windows 2008 R2 AD DC/GC.
  • SysAdmins add a Windows 2012 AD DC/GC into the domain.
  • Timeouts on vSphere logins suddenly start: "The command has timed out as the remote server is taking too long to respond."
Frequently with the C# client but also with the web client. There are so many chatty logs that damned if I was going to be able to figure out what was going on by log checking. Trial and error was the quickest way to a solution.

Fix 1) We have a main * and a * Turns out that while my primary DNS servers and hostname for the VM at were all well and good, some part of ldap (/var/log/ldapmessages) was trying to contact which did not have an A record. I created the A record in the subdomain.

Fix 2) Swapped URL from port 636 (LDAPS) to 3269 (Secure Global Catalog). Same 2008 AD DC.

Fix 3) Bumped client timeout from default 30 seconds to 60 seconds.

Now, if you look at the readme for 5.1u1b it states pretty close to the top of the page that they've fixed timeout issues. Har. It was probably the LDAPS to SGC port change that fixed the issue, so if you are reading this I would start there.
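If I had to do the trial and error again, I'd script the two boring checks first: does the name resolve, and do ports 636 and 3269 even answer. Below is a minimal Python sketch of that, not anything VMware ships. The function names are my own, the resolver check only proves that the name resolves to some address (A or AAAA, and a hosts file entry would also pass), and a successful TCP connect says nothing about certificates or LDAP binds.

```python
import socket

LDAPS_PORT = 636        # LDAP over SSL
SECURE_GC_PORT = 3269   # Global Catalog over SSL


def has_address_record(hostname, resolver=socket.getaddrinfo):
    """Return True if the name resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False


def port_is_open(hostname, port, timeout=5.0):
    """Return True if a TCP connection to hostname:port succeeds in time."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_directory_server(hostname, timeout=5.0, resolver=socket.getaddrinfo):
    """Summarize name resolution and LDAPS / secure GC reachability."""
    resolves = has_address_record(hostname, resolver=resolver)
    report = {"resolves": resolves}
    for port in (LDAPS_PORT, SECURE_GC_PORT):
        # Skip the connect attempt when the name does not resolve at all.
        report[port] = bool(resolves and port_is_open(hostname, port, timeout))
    return report
```

Run check_directory_server() against every DC/GC name that shows up in the LDAP log, not just the one you typed into the identity source, since the name that bit me was one I never configured.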

Also, while you're looking through the readme files, look at 5.1u1c. How many items are fixed? 1.
How many known issues are there? 99. How many of those known issues are listed as new? 2/99. Make your own choice whether you want to update to 5.1u1c or just wait for an upgrade that actually does something.

Wednesday, October 2, 2013

Update to latest 5.1 build on vCSA

Thanks to Virtual Aspects for a smooth second attempt at updating to the latest build of the 5.1 vCSA. My search term of vami-sfcb was surprisingly good. After watching vami-sfcb fumble in the logs repeatedly I was just trying to find out what the service did.. but there was the fix.

Monday, September 16, 2013

Distributed vswitch security policies PowerCLI

Whatever your chosen acronym for distributed vSwitches, the default security policy is to accept MAC address changes and forged transmits. Even the hardening guides think this is a bad idea (and we know how checking those security checkboxes from the hardening guide spreadsheet makes us feel, amirite?).

GeekAfterFive has the code from his vCloud deployment that also comes in handy for 5.0 VDSes (and probably 5.1 and 5.5).

Even the 5.0 and 5.1 hardening guides don't have PowerCLI code to change the setting from the default insecure to reject this traffic. Pretty awesome resource there, GeekAfterFive. Good and bad that you're not blogging anymore from "the mothership".
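This is not GeekAfterFive's code and it changes nothing on a switch. It's just a small Python sketch of the audit side: given an export of port group security settings, list the ones still accepting MAC changes or forged transmits. The dictionary layout, the setting names and the function name are all made up for illustration, and a missing setting is treated as "accept" because that is the default described above.

```python
# Settings that default to "accept" on a distributed vSwitch, per the post.
RISKY_SETTINGS = ("mac_changes", "forged_transmits")


def find_permissive_portgroups(portgroups):
    """Return {port group name: [settings still set to accept]}.

    `portgroups` maps a name to a dict of booleans where True means
    "accept". This layout is a hypothetical stand-in for whatever your
    inventory export actually provides.
    """
    findings = {}
    for name, policy in sorted(portgroups.items()):
        # A missing key is assumed to be the insecure default (accept).
        accepted = [s for s in RISKY_SETTINGS if policy.get(s, True)]
        if accepted:
            findings[name] = accepted
    return findings


if __name__ == "__main__":
    sample = {
        "dv-prod": {"mac_changes": False, "forged_transmits": False},
        "dv-test": {"mac_changes": True, "forged_transmits": False},
    }
    for name, settings in find_permissive_portgroups(sample).items():
        print(name, "still accepts:", ", ".join(settings))
```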

Wednesday, September 4, 2013

5.0 -> 5.5

Per the VMworld session on vCenter, the recommendation is to stick with 5.0 and wait for 5.5 in order to not deal with the crap SSO in 5.1. Still no GA release date. Crossing my fingers that 5.5 is that much better...


Tooting my own horn here. VCAP5-DCA passed, even while giving Auto Deploy the finger. Thank you VMware for the super fast turnaround on results (rather than the 3 weeks of dread last time). VCP5-Desktop next up.

Sunday, August 25, 2013

vCSA 5.1u1b saves the day

Windows shop install of a "simple install" vCenter. Piece of cake and the local IT staff should be able to manage it in the long term. Except they don't put the new vCenter server into DNS and I don't realize it. >_<

"Error 29155.Identity source discovery error" but install of SSO continues so I think it's something we can work with. Turns out, not so much but it was a pair of bad decisions by me that turned into a 5 hour headache and wanting to do bodily harm to the manager who decided that SSO should be required for all vCenter deployments.

Hindsight #1: It's a VM. Use snapshots before mucking with things or installing. "Nah, it'll be fine, simple install...right?"

Hindsight #2 and a bit: Do not uninstall SSO. Do not try to re-install everything from the "simple vCenter install" and expect it to work. Clean system or nothing.

vCSA 5.1u1b (which the local IT staff won't know what to do with) was downloaded, installed, configured in about an hour. Setting the appliance for AD auth automatically set the AD domain in SSO (although with insecure settings) but it worked and I didn't have to muck with it. Saved the day.

So thank you VMware for investing time into the vCSA and screw you for bad SSO error messages (or just SSO in general) and uninstall procedures that don't clean the system correctly. You cost me an entire Saturday afternoon and into the evening. Should I manually check DNS for the new vC server that I was handed? Yes. Should the SSO installer have continued upon error? NO.
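Since the manual DNS check is the part I skipped, here is roughly what it could look like as a Python sketch to run before the installer. The function name and the result layout are my own invention. It does a forward lookup, then a reverse lookup on the answer, and compares names. The comparison is strict, so a PTR record that returns a short name or an alias will be flagged even if that's harmless in your environment.

```python
import socket


def dns_preflight(hostname,
                  forward=socket.gethostbyname,
                  reverse=socket.gethostbyaddr):
    """Check that hostname resolves and that the address maps back to it."""
    result = {"hostname": hostname, "address": None,
              "reverse_name": None, "ok": False}
    try:
        result["address"] = forward(hostname)
    except OSError:
        # No forward record: the situation described in this post.
        return result
    try:
        # gethostbyaddr returns (primary name, aliases, addresses).
        result["reverse_name"] = reverse(result["address"])[0]
    except OSError:
        return result
    # Compare case-insensitively and ignore a trailing dot.
    wanted = hostname.lower().rstrip(".")
    found = result["reverse_name"].lower().rstrip(".")
    result["ok"] = wanted == found
    return result
```

If "address" comes back as None, stop and get the record created before touching the SSO installer.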

Friday, August 16, 2013

New Cluster Name

While I've seen a whole host of boring cluster names in my travels, I like to open it up to voting. The ESXi cluster name is just a fairly arbitrary text value in the database that can be changed anytime so why not have some fun with it. :)

Please vote once, you can choose as many names as you like. Obvious ballot stuffing will disqualify the entry.

Wednesday, July 3, 2013

Dell R805 SD install of ESXi 5.1

I have a stack of oldish Dell PowerEdge R805 servers that have been running ESX and then ESXi 5.0 on a standard hard drive configuration. Now that they'll not be in production use I was giving them a once over and saw they had an internal 1G SD card that was off in the BIOS.


I decided to turn it on and install ESXi 5.1 on the SD card. I'm pretty sure that the version of ESXi preinstalled was the OEM embedded Dell 3.0 version of ESXi. The 5.1 installer throws quite the fit when checking what's on the card (mostly python and libparted errors, see screenshot) but it looks like it installs and runs just fine.

"The selected storage device contains ESX(i). The version could not be determined. Only installation is possible." Fine by me.

Friday, June 14, 2013

root cause analysis log hunting

I was asked to unravel the all-too-common filled datastore and the response from the local IT staff, whose actions ultimately made the company pull a critical server from backup.

My problem? "Task: Delete file" from the datastore browser.
What file?

Looks like the verbose and trivia[l] vpxa.log would have this information ...but... the logs I have don't go quite far enough back. vpxa is CHATTY. Search the log for nfc.

Now, I was almost lucky in this case because the datastore was only used by two hosts. In a larger environment, which ESXi log would have this info? Hopefully whichever host the guest was associated with, but again.. unraveling what host a guest was on at a particular point could be time consuming.
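For the larger-environment case, I'd rather let a script do the grepping across every host's logs. A minimal Python sketch, assuming the logs have already been copied off the hosts into one directory per host. The function name, the "vpxa*" file pattern and the directory layout are my assumptions, and it only does a case-insensitive substring match on "nfc"; it does not parse the vpxa log format.

```python
import gzip
from pathlib import Path


def find_lines(log_dir, needle="nfc", pattern="vpxa*"):
    """Yield (file name, line number, line) for lines containing needle."""
    needle = needle.lower()
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Rotated logs are often gzip-compressed; read both kinds as text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical layout: ./logs/<host name>/vpxa*.log[.gz]
    for host_dir in sorted(Path("logs").glob("*")):
        for name, number, line in find_lines(host_dir):
            print(f"{host_dir.name} {name}:{number}: {line}")
```

It won't help if the logs have already rotated out, which was my actual problem, but it makes the "which host?" question a lot cheaper to answer.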

Hopefully I have better luck next time and the client asks for this analysis closer to when it happened!

Saturday, April 27, 2013

5.1u1 rant. Too much fail in readme

5.1U1 might not work if you're attached to lots of AD groups, so use Windows passthrough 
Oh, sorry. I'll do that now... Oh wait.
I can't login with Windows passthrough because of SSO config issues. NetBIOS? REALLY?
Or maybe my login issue is that SSO seems to be only hanging onto the UUID and not the username so when my account was deleted I get locked out.
Or I use the vCSA and it doesn't allow Windows passthrough (<--see comment for KB fix).
Or I (some would say correctly) set up an SSO load balanced HA configuration, and it doesn't allow Windows passthrough either. ARG.

I've been reading the upgrade readmes from VMware for years now and none of them seem to have as much fail as this one. No Windows Auth anymore for SQL Server? What? 
That's ok I guess, since I couldn't install the schema in the first place.

So many of the issues, workarounds or otherwise identified issues have to do with special characters and non-English strings. Why is there no standard lib for those crazy double quotes in passwords? Really!?!

Who the hell is coding this? I've been to VMware HQ...which building basement houses those monkeys?

Enjoy. It would make for a good drinking game whenever SSO, special characters or database issues are mentioned. Please, not all three at once, you'll get alcohol poisoning.

Monday, January 28, 2013

Can't even install vSphere DP? This is not a good sign

Downloaded the current 5.1.1 build of the vDP 0.5T OVA.

From the C# Windows client, I'm able to get through the OVF wizard, but it fails on creating disk 1 of 4:
Unable to access file:
Unable to connect to the remote server

On the web client, after RDP'ing to a Windows host that had access on port 9443 and installing Flash and the client integration plugin (arg), the wizard wouldn't let me past Select Storage. I hit Next, it went back to Select Storage. I hit Setup Network, it went back to Select Storage.

I think the issue here is that the 0.5T appliance is actually more like 900G when thick and the largest datastore I have in this environment is 700G (and yes, it's empty for this purpose).

I am choosing thin provision during the wizard but obviously this isn't good enough for vDP.

From the horror stories in the community forums I'm just going to say screw it.

This is a big issue when I move my production vCenter to 5.1. I'm only backing up 5 VMs with vDR in prod (why? because it's unreliable) but at least I can install (and re-install and re-install) it.

Thursday, January 3, 2013

vCSA update 5.1.0a to 5.1.0b

Just a few hiccups when upgrading to 5.1.0b... Nothing like the koolaid I had to mix up (here and here) to get from 5.0U1a to 5.0U1b to 5.1a.
I read the release notes and rebooted the VM at the end of the update.

Then nothing worked.

I logged into the console of the VM and saw that it had reverted to the temporary IP that I gave the appliance when I initially made the upgrade from 5.0 to 5.1. This may be due to the vApp options -> Properties still having the temp IP... I just changed it to the correct IP in /etc/sysconfig/network/ifcfg-eth0 (then ifdown eth0 / ifup eth0).

This worked for the main services, but not, it seems, for SSO. I can't login to the web client, and the error message clearly shows that it's trying to talk on the temp IP. I expect another reboot (or actually a shutdown/startup so that I can change the vApp settings) needs to happen for vmware-sso to start listening on the right address (restarting vmware-sso did nothing).

/etc/vmware-sso/ls_url.txt shows the correct IP (lookup service)
/etc/vmware-vpx/vpxd.cfg did not. I restarted vmware-vpxd and still couldn't login.. a restart of the vmware-client service did the trick (and while it's called vmware-client, it did not quit active sessions with the C# client).

So I'm not sure if my editing of the vpxd file and the restart of vmware-client was needed or just the restart of vmware-client... at least this post will point someone in the right direction if they have the same issue.
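What would have saved me some poking around is a quick scan for the temp IP in the files mentioned above. Here's a minimal Python sketch of that idea. The function name is mine, the default file list is just the three paths from this post (there may well be other places the address hides), and unreadable or missing files are silently skipped.

```python
import re
from pathlib import Path

# The three files mentioned in this post; extend as needed.
CANDIDATE_FILES = (
    "/etc/sysconfig/network/ifcfg-eth0",
    "/etc/vmware-sso/ls_url.txt",
    "/etc/vmware-vpx/vpxd.cfg",
)


def files_mentioning_ip(old_ip, paths=CANDIDATE_FILES):
    """Return {path: [line numbers]} for files that still contain old_ip."""
    # Guard both ends so 10.0.0.5 does not match 10.0.0.50 or 110.0.0.5.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    hits = {}
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # missing or unreadable file: nothing to report
        lines = [number
                 for number, line in enumerate(text.splitlines(), start=1)
                 if pattern.search(line)]
        if lines:
            hits[str(path)] = lines
    return hits
```

Call it with the temporary address, for example files_mentioning_ip("192.0.2.50"), and fix whatever it lists before restarting services.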

The other fun thing was that the ESXi hosts disconnected from vCenter and needed to be manually reconnected. Not sure if this was because of the IP address change or something else.