Verified SQL 2005 Cluster bug…

In an earlier post, I mentioned a bug that I think we found in SQL Server 2005 Clustering. I have now confirmed the bug. Here it is again:

You have two Windows Server 2003 machines, and one or both have been named with at least one lower-case letter in the name. You may wonder how this can happen, since when you installed the servers, the machine name in the dialog box was shown in all upper-case letters. What you don’t know is that if you don’t turn on CAPS LOCK or hold down the Shift key, the name is recorded in mixed case, even though the dialog shows upper case. This will be important later.
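Since the trigger appears to be any lower-case letter in a node name, a quick pre-flight check before clustering can flag the machines that need renaming. The following is a minimal Python sketch, assuming NetBIOS-style ASCII machine names; the function names and example node names are hypothetical.

```python
import string


def has_lowercase(name: str) -> bool:
    """Return True if the machine name contains any lower-case ASCII letter."""
    return any(ch in string.ascii_lowercase for ch in name)


def nodes_needing_rename(names):
    """Return the node names that would trip the suspected case-sensitive check."""
    return [n for n in names if has_lowercase(n)]


if __name__ == "__main__":
    # Hypothetical node names, mirroring the example in this post.
    print(nodes_needing_rename(["UPPER123", "lower393"]))  # ['lower393']
```

Running this against the planned node names before creating the cluster tells you which machines to rename in upper case first.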

These are my findings (with help from another student in the Clustering class from Ameriteach in Denver):

Now you have two servers, and you are ready to cluster them together. You launch Cluster Administrator in Windows Server 2003, and it allows you to create the cluster. When you see the dialog box for adding a server to the cluster, it shows you a name in the box. This name is retrieved from the Hostname value under the HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry key. However, once the server has been added, Cluster Admin lists the cluster machines using the name retrieved from the ComputerName value under HKLM\SYSTEM\CurrentControlSet\Control\ComputerName\ActiveComputerName. If the two values differ (I have not found a reason why they would, but this is what our testing revealed), you will see weird behavior.
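To check whether a node is affected, you can compare those two registry values directly. The sketch below is a hedged Python example: `read_names` uses the standard `winreg` module and only runs on Windows, while `compare_names` is a hypothetical helper that separates a case-only mismatch (the situation described here) from a genuinely different name.

```python
import sys

# Registry locations discussed above (both under HKEY_LOCAL_MACHINE).
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
ACTIVE_NAME = r"SYSTEM\CurrentControlSet\Control\ComputerName\ActiveComputerName"


def compare_names(hostname: str, active_name: str) -> str:
    """Classify how the Tcpip Hostname and the ActiveComputerName relate."""
    if hostname == active_name:
        return "match"
    if hostname.upper() == active_name.upper():
        return "case-only mismatch"
    return "different names"


def read_names():
    """Read both values from the local registry (Windows only)."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
        hostname, _ = winreg.QueryValueEx(key, "Hostname")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ACTIVE_NAME) as key:
        active, _ = winreg.QueryValueEx(key, "ComputerName")
    return hostname, active


if __name__ == "__main__":
    if sys.platform == "win32":
        host, active = read_names()
        print(f"Hostname={host!r} ActiveComputerName={active!r}: "
              f"{compare_names(host, active)}")
    else:
        # Off Windows, just demonstrate the classification with sample names.
        print(compare_names("ben1", "BEN1"))  # case-only mismatch
```

Run on each node before adding it to the cluster; anything other than "match" is worth fixing first.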

Once the machines are in the cluster, you will attempt to install SQL Server. If there are any lower-case letters in the machine name, setup will let you run it, but when it gets to the point of installing SQL Server on the clustered machine, it fails to find the node. My hunch is that it does a binary (case-sensitive) comparison on the machine name and does not find the node, since “ben1” is not equal to “BEN1” unless the comparison is case-insensitive. You will get a “Parameter is incorrect” error, and setup will fail and roll back the install.
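The hunch above comes down to the difference between a binary comparison and a case-insensitive one. This Python sketch is purely illustrative of that hypothesis; it is not SQL Setup's actual code, and the lookup functions and node names are hypothetical.

```python
from typing import Optional, Sequence


def find_node_exact(nodes: Sequence[str], name: str) -> Optional[str]:
    """Binary (ordinal) comparison: "ben1" and "BEN1" are different strings."""
    for node in nodes:
        if node == name:
            return node
    return None


def find_node_ci(nodes: Sequence[str], name: str) -> Optional[str]:
    """Case-insensitive comparison: "ben1" and "BEN1" are the same name."""
    target = name.casefold()
    for node in nodes:
        if node.casefold() == target:
            return node
    return None


if __name__ == "__main__":
    cluster_nodes = ["BEN1", "BEN2"]  # names as listed by the cluster
    print(find_node_exact(cluster_nodes, "ben1"))  # None: node "not found"
    print(find_node_ci(cluster_nodes, "ben1"))     # BEN1
```

If setup behaves like `find_node_exact`, a node recorded as "ben1" in one place and "BEN1" in another would never be found, which would match the failure we saw.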

Also note that when SQL Setup rolls back the install, it does not remove the cluster resource that is based on the DLL, and it does not delete the SQL resources that it created; you will have to remove those manually. Not a very good cleanup routine for a setup program.

I will be filing a bug on Connect so that the SQL team can have a look. I will update this post, or create a new one, once the bug is filed and has a URL.

Happy landing and see you out there.


SQL Server and Clustering

SQL Server

I will be setting up an environment to repro an issue that we found while installing a SQL Server cluster. We had two Windows Server 2003 R2 Enterprise servers that we were going to cluster, and then we were going to install SQL 2k5 on the cluster as well.

We got to the point of installing SQL Server on the cluster, and just as the install was completing and removing the temporary files, a message came up indicating failure. No earth-shattering message, just that it failed because a parameter was incorrect. That was it.

After we spent some time trying to reinstall and force it through, and after a support call with Microsoft, our observations and the support engineer's were put to the test. The first thing the support engineer noticed was that the server names in Cluster Admin were not in the same case (one was UPPER123 and the other was lower393). We could not imagine that this was the cause, but we were willing to try anything. So, with a little registry editing, we got both names in Cluster Admin to appear in upper case (UPPER123 and LOWER393). Then we attempted the install of SQL Server again.

To our amazement, the install completed. We finished our setup and then, for good measure, rebooted the machines. Again to our amazement, the cluster would not start.

To make a long story short, we unclustered the servers, ripped SQL Server out by hand, removed the lower-case machine from the domain, renamed it in UPPER case, rejoined it to the domain, re-created the cluster, and reinstalled SQL Server. After that, all was well.

I am going to repro that today in a test environment to make sure that it reproduces, and then I am going to file a bug.

If you have run into this yourself and have any ideas, please feel free to comment.
