We have a customer that is using SQL Server 2000, and experiencing the following intermittent problem -- occasionally templog.ldf will grow and grow until it fills the entire disk.
This is a rare problem, it has happened less than 10 times in the past two years, for a job that runs once a minute. But it has happened on 3 different SQL Servers, (two production servers, and one test server).
Our suspicion is that the root cause is an OPENQUERY() to pull data from a linked server (Oracle database on Unix). We have seen that these OPENQUERY() statements occasionally hang and cannot be killed. The OPENQUERY() is used to populate a local table, and when we originally populated permanent tables with OPENQUERY(), then there would be a lock on this permanent table and nothing could be done until SQL Server was stopped and restarted.
To workaround the immediate problem, last year we modified all OPENQUERY() statements so that the local table was a #temp table. In that case, cancelling the job left the hung process, but the hung process had a lock on a #temp table and so that didn't keep the next execution of the job from creating a new #temp table and populating it.
However... it appears that the hung process, because it leaves an SPID with an open transaction, keeps the tempdb log file from being truncated. And eventually you run out of disk space, no matter how much disk space is on the server.
We have tried using "kill" on the SPID with the open transaction, but that doesn't work - the process never dies (we waited several days) and the transactions are never rolled back.
We would like to resolve this issue, because it has started coming back with some regularity, and of course it usually happens on nights or weekends. Plus there is the problem of scheduling a stop/restart of SQL Server on a production server, which we would like in all cases to avoid.
Here is some data from the most recent instant of this problem (test server):
Product version: 8.00.760
Product level: SP3
Edition: Standard Edition
Version: Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition
on Windows NT 5.0 (Build 2195: Service Pack 4)
SELECT * FROM master.dbo.sysprocesses returns this suspicious SPID:
waitresource: _name of Oracle linked server_
login_time: 2006-08-07 18:53:04
last_batch: 2006-08-07 18:53:04
program_name: SQLAgent - TSQL JobStep ...
nt_domain: NT AUTHORITY
loginame: NT AUTHORITY\SYSTEM
Note 1: This data is from 2006-08-09, so the SPID has been hanging around for almost 2 days.
Note 2: We did not attempt to KILL this SPID. In the past when we have done so that changes the status of the SPID but even after waiting several days does not get rid of the SPID & open transaction.
Note 3: These OPENQUERY() statements normally run in 1-5 seconds, so it does not seem likely that it takes more than 2 days to roll this transaction back when we attempt to kill it.
Note 4: From a web search, I have found other people with similar problems but no resolution.
Note 5: We have consciously not installed SQL Server 2000 SP4, because it is our understanding that there is a problem in SP4 with online defragmentation holding locks.
Any suggestions on how to:
1) Kill an SPID like this, when it is waiting for OLEDB
2) Keep templog.ldf from growing without limit when we get a hung transaction like this