The above error may occasionally appear in the Alert Log for a cluster, accompanied by sporadic connection timeouts and the
HEALTH ALERT : DB server not reachable error. Example messages:
CID: 1, Client IP: , User: Unknown, Debug_Code: 525, Message: HEALTH ALERT : DB server not reachable (TDBSQL12:1433) outbound(172.xx.xx.xx) Attempts:(0/3): connect() failed: Connection timed out, State: -1, SSID: 0, DB: , DB IP: , Type: 37
CID: 1, Client IP: , User: Unknown, Debug_Code: 527, Message: DB INSTANCE RESOLUTION ALERT: The port against database instance 'TDBMSSQL20' on db '172.xx.xx.xx' changed. (Old:1433, New:0), State: -1, SSID: 0, DB: , DB IP: , Type: 37
These alerts appear when the SQL Browser service running on the database server does not respond to ScaleArc in time, so ScaleArc closes the port it had opened to receive the response. The SQL Browser service resolves database instance names to ports. Its performance can degrade when the load on the database server is high, with response times on the order of 1-2 seconds, whereas the service typically responds within 10 ms.
The issue can be addressed by reconfiguring the cluster to use port numbers instead of instance names, which bypasses the name resolution step. To do this, follow these steps:
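To check whether slow instance resolution is the cause, the SQL Browser response time can be measured directly from a machine on the same network path. The following is a minimal sketch, not a ScaleArc tool: it assumes the SQL Browser service listens on its standard UDP port 1434 and follows the SQL Server Resolution Protocol (a `0x04` single-instance request answered by a `0x05` response carrying a semicolon-separated key/value string). The function names `build_instance_request`, `parse_tcp_port`, and `probe` are hypothetical.

```python
import socket
import time

SQL_BROWSER_PORT = 1434  # standard UDP port of the SQL Server Browser service


def build_instance_request(instance: str) -> bytes:
    """Build a single-instance request: 0x04, instance name, NUL terminator."""
    return b"\x04" + instance.encode("ascii") + b"\x00"


def parse_tcp_port(response: bytes):
    """Return the TCP port advertised in a Browser response, or None."""
    # Response layout: 0x05, 2-byte little-endian length, then a
    # semicolon-separated "key;value;key;value;..." string.
    if len(response) < 3 or response[0] != 0x05:
        return None
    fields = response[3:].decode("ascii", errors="replace").split(";")
    pairs = dict(zip(fields[0::2], fields[1::2]))
    port = pairs.get("tcp")
    return int(port) if port and port.isdigit() else None


def probe(host: str, instance: str, timeout: float = 2.0):
    """Ask SQL Browser for an instance's port.

    Returns (port, elapsed_seconds); port is None if no answer arrived
    within `timeout` or the answer carried no TCP port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_instance_request(instance), (host, SQL_BROWSER_PORT))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None, time.monotonic() - start
        return parse_tcp_port(data), time.monotonic() - start
```

Calling `probe("TDBSQL12", "TDBMSSQL20")` repeatedly while the database server is under load would show whether the elapsed time drifts from the millisecond range toward the 1-2 seconds described above.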
- Take a backup of the configuration database:
cp /system/lb_1.sqlite /system/lb_1.sqlite.bck
- Find out the port configured for the cluster. It can be taken from a cluster in a parallel environment (Production, Pre-production, QA) or from the error message, which contains a block such as (Old:1433, New:0). The Old value is the port the cluster is trying to connect on.
- Using sqlite3, execute the following update query on the configuration database:
update lb_servers set port = <port number found in previous step>, instancename = <port number found in previous step>;
- Restart the cluster after making these changes.
Once the cluster restarts, the port number should be visible in the UI as shown here (port updated to 1433), and the errors should stop appearing in the Alert Log.
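The backup, port lookup, and update steps above can also be scripted. The sketch below is one possible way to do it, under these assumptions: the table `lb_servers` and its `port` and `instancename` columns are as named in the update query above, the helper names `old_port_from_alert` and `switch_to_port` are hypothetical, and, like the original query, the update has no WHERE clause and therefore changes every row in the table. The cluster still needs to be restarted afterwards.

```python
import re
import shutil
import sqlite3

# Matches the "(Old:1433, New:0)" block of the DB INSTANCE RESOLUTION ALERT.
OLD_PORT_RE = re.compile(r"\(Old:(\d+),\s*New:\d+\)")


def old_port_from_alert(line: str):
    """Return the old port from an alert line, or None if not present."""
    match = OLD_PORT_RE.search(line)
    return int(match.group(1)) if match else None


def switch_to_port(db_path: str, port: int) -> int:
    """Back up db_path, then set port and instancename on every lb_servers row.

    Returns the number of rows updated.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    # Same backup file name as the cp step in the procedure.
    shutil.copy2(db_path, db_path + ".bck")
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE lb_servers SET port = ?, instancename = ?",
                (port, port),
            )
            return cur.rowcount
    finally:
        conn.close()
```

For example, passing the alert line shown earlier to `old_port_from_alert` and the result to `switch_to_port("/system/lb_1.sqlite", port)` performs the same change as the manual steps. Using a parameterized query inside a transaction means a failed update leaves the configuration database unchanged, with the `.bck` copy available in either case.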