Martin Smith gave a great answer on StackOverflow - as he often does - when someone was looking to get a table name from the information in sys.dm_tran_locks.
Here's my take on an answer - trying to cut down on the rows returned to keep the list short in busy databases.
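A minimal sketch along the same lines (not my exact query): for OBJECT resources, resource_associated_entity_id is the object_id; for PAGE/KEY/RID resources it's the hobt_id, which resolves through sys.partitions. Filtering out DATABASE locks keeps the list shorter.

```sql
-- Sketch only: resolve object names for locks in the current database.
SELECT tl.resource_type,
       tl.request_mode,
       tl.request_status,
       tl.request_session_id,
       COALESCE(OBJECT_NAME(p.object_id),
                OBJECT_NAME(tl.resource_associated_entity_id)) AS object_name
FROM sys.dm_tran_locks AS tl
LEFT JOIN sys.partitions AS p
       ON p.hobt_id = tl.resource_associated_entity_id
      AND tl.resource_type IN (N'PAGE', N'KEY', N'RID', N'HOBT')
WHERE tl.resource_database_id = DB_ID()
  AND tl.resource_type <> N'DATABASE';
```

Note that OBJECT_NAME() resolves in the current database context, hence the DB_ID() filter.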
Let's give it a test.
Good enough for now.
So let's commit the transaction in order to release the locks. And I'll have more for you another day...
Wednesday, May 10, 2017
Thursday, April 27, 2017
The system is a 6 vCPU, 48 GB RAM instance of SQL Server 2016 SP1.
I've been running through a lot of checkdb/checktable scenarios lately, and I've been trying to keep up with blogging whenever I find something interesting.
This particular case is interesting because it defies expectations.
Here's my test, including instrumentation. The test subject is that lonely checktable command nestled in there :-) I use tablock because without tablock (or an explicit snapshot), each time checktable is run a new internal snapshot is created, and I wouldn't be able to compare warm vs cold database cache.
So... it's pretty verbose, sorry about that. But I'm stuffing @@io_busy, virtual file stats, wait stats, spinlock stats and latch stats into temp tables before and after so they can be compared. I've got the dbcc dropcleanbuffers in there because I ran this scenario many, many, many times to verify I wasn't being thrown off by something. But my results were stable and reproducible.
So I ran the test below with cold database cache. (My database was quiet; you might have to issue a checkpoint before the dropcleanbuffers to ensure the cache really is cold if issuing checktable against a table that's recently been modified.)
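The instrumentation pattern looks roughly like this - a hedged sketch, with dbo.test_table as a placeholder name and only the wait-stats portion of the before/after temp tables shown:

```sql
-- Cold cache run only: flush clean buffers first.
DBCC DROPCLEANBUFFERS;

-- Snapshot stats before the test.
SELECT * INTO #waits_before FROM sys.dm_os_wait_stats;
SELECT * INTO #vfs_before
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL);

SET STATISTICS TIME ON;
DBCC CHECKTABLE (N'dbo.test_table') WITH TABLOCK;
SET STATISTICS TIME OFF;

-- Snapshot stats after the test.
SELECT * INTO #waits_after FROM sys.dm_os_wait_stats;
SELECT * INTO #vfs_after
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL);

-- Diff the wait stats accumulated during the checktable.
SELECT a.wait_type,
       a.waiting_tasks_count - b.waiting_tasks_count AS waits,
       a.wait_time_ms        - b.wait_time_ms        AS wait_ms,
       a.signal_wait_time_ms - b.signal_wait_time_ms AS signal_ms
FROM #waits_after  AS a
JOIN #waits_before AS b ON b.wait_type = a.wait_type
WHERE a.waiting_tasks_count > b.waiting_tasks_count;
```

The same before/after-and-diff idea applies to the virtual file stats, spinlock stats, and latch stats tables.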
OK. Well - here's the output from the checktable and statistics time.
I've got all those temp tables for instrumentation - let's cash the check 😀
All right. Let's test again - this time without dropping clean buffers, so we can compare behavior with warm cache.
Here's the output from checktable and statistics time.
Huh. Almost 11.5 seconds of elapsed time with warm cache, vs just over 6.2 seconds of elapsed time with cold cache. That's unexpected.
Let's see what the temp tables caught - maybe they'll shed some light.
Well. No reads according to virtual file stats - that's what we expected. No latch waits. (That varied a little among test runs, but it was never a significant amount of latch waits.) The spinlocks are uninteresting... I was kind of hoping to find a spinlock-related reason for the warm cache checktable taking longer. No dice.
So what's going on? Well - first of all, take a look at CPU ms between the cold cache and warm cache runs: 34534 for cold cache, 33907 for warm cache. Even though the cold cache run finished substantially faster, it took 627 ms more CPU time to complete. That's less than 2% variance. But it's stable in the context of this test. I ran it over and over, and although the times shifted around a bit, the cold cache run was always faster but required around 500 more CPU ms to complete.
On the one hand that makes sense: the cold cache run issued 8103 read IOs according to virtual file stats, and @@io_busy reports 125 milliseconds of CPU time just to issue those IO commands.
But it still finished faster!
As far as I can tell, the secret is in the wait stats - cxpacket in particular. Below are the cold cache wait stats again.
And below are the warm cache wait stats again. The warm cache checktable - which I would have expected to finish faster - somehow incurred almost 3 times as much cxpacket wait time as the cold cache checktable - and over 3 times as much signal wait time for the cxpacket wait type.
The warm cache checktable had less work to do than the cold cache checktable. And, indeed, it did less work from an accumulated CPU time standpoint. But somehow, it didn't divvy up its work as well among the parallel workers. Due to a significant increase in cxpacket wait time and cxpacket signal wait time over the cold cache operation, it finished in almost twice the time!
This is one reason it's extremely important to look not just at work, not just at work + waits, and not even just at work + waits + management. Appropriate scheduling and distribution of work among resources is also important to keep everyone happy when they want extreme performance.
Wednesday, April 26, 2017
Here's a blog post from May of 2010 describing the longstanding strategy of using the scripting functionality from SSMS to create a statistics clone database in SQL Server.
How to create a SQL Server Clone Statistics Only Database
I wanted to put this scripting option in SQL Server/SSMS 2016 head-to-head against the clonedatabase command to see how everything turns out. Why not do that in front of a crowd of 250 people? (Well - not really. I did run the clonedatabase in front of everybody though.)
Here's what was in the source database. The scripting option won't let me script out the Query Store, of course. But the clonedatabase command will include it unless the option to exclude it is specified.
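For reference, the shape of the command (database names here are placeholders, not the ones I used):

```sql
-- Clone including Query Store (the SQL Server 2016 SP1 default).
DBCC CLONEDATABASE (source_db, source_db_clone);

-- Clone excluding Query Store.
DBCC CLONEDATABASE (source_db, source_db_clone_nqs) WITH NO_QUERYSTORE;
```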
So, to make sure it was an apples-to-apples comparison between the 'clone' methods, I toggled a few of the advanced scripting options.
Then I kicked it off. Thank goodness it was several hours before my presentation. I bet a dollar with a colleague over how long it would take to complete. He guessed 56 minutes. I thought 1 hour 25 minutes. Thank goodness it was "Price is Right" rules, cuz we were both way off. It took 2 hours and 2 minutes. Now I'm $1.00 richer.😄
The end result of the scripting exercise was this file clocking in at over 500 mb. I haven't loaded it into a target database yet - I'll report how long it took and the resulting database size once I have.
Then... during my presentation late in the day, in front of about 250 people, I launched the dbcc clonedatabase command. It took 37 seconds. Yayy!! 😁
The end result? 1096 mb of transaction log, and 896 mb of row files. (Remember that 666 mb of the data in the row files is Query Store.)
Let's take a look at the inventory of the Query Store. There's a lot of stuff in there.
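A quick inventory sketch against the Query Store catalog views in the clone (just row counts, not my full query):

```sql
-- Rough Query Store inventory: how much is in there?
SELECT (SELECT COUNT(*) FROM sys.query_store_query)         AS queries,
       (SELECT COUNT(*) FROM sys.query_store_query_text)    AS query_texts,
       (SELECT COUNT(*) FROM sys.query_store_plan)          AS plans,
       (SELECT COUNT(*) FROM sys.query_store_runtime_stats) AS runtime_stats_rows;
```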
I will point out that the only new concern (that I'm aware of) with the dbcc clonedatabase command, versus the scripting option, is the possibility of sensitive data being included in Query Store query plans and query text.
There's a very good post below for security/sensitivity considerations.
Understanding data security in cloned databases created using DBCC CLONEDATABASE
In spite of the security/sensitivity considerations, I'm really excited about the clonedatabase command. Not just because of the incredible speedup compared to scripting. Also because including Query Store in the clone is a natural fit.
Even if security/sensitivity issues complicate sharing a clone database with Microsoft or other off-site vendors, the value of using a clone database for investigating query shape, cardinality estimates, or even query store investigation decoupled from a production database should be high.
Monday, April 24, 2017
In Oracle, there are 4 histogram types:
Frequency histogram: for low numbers of distinct values. If ndv < bucket count limit, then each distinct value gets a histogram bucket. If ndv > limit, the frequency histogram is disqualified.
Height-balanced histogram: histogram buckets with approximately the same number of rows in each bucket
Top-n Frequency: lowest value, highest value, and the n-2 most popular values in between as bucket end-points.
Hybrid: used when a Top-n Frequency histogram doesn't qualify because the top N end-point values don't cross the required threshold.
In SQL Server 2016 (as has been the case for several major versions), one histogram type is available. Of the histogram types available in Oracle 12cR2, it is most similar to the hybrid histogram. For a nullable column, a NULL histogram bucket will be available. For non-null values, up to 200 additional buckets are available. The desired number of buckets cannot be specified. For values of particular significance, or in order to accommodate skew in data, filtered statistics can be created. For filtered statistics to reliably be consulted by the optimizer, a query filter predicate must match the predicate of the filtered statistics.
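A hypothetical example of the filtered-statistics approach (table and column names are illustrative, not from any real schema):

```sql
-- Filtered statistics covering a skewed slice of the data.
CREATE STATISTICS stat_orders_open
ON dbo.orders (order_date)
WHERE order_status = N'OPEN';

-- The optimizer can consult the filtered statistics when the query
-- filter predicate matches the statistics filter predicate:
SELECT COUNT(*)
FROM dbo.orders
WHERE order_status = N'OPEN'
  AND order_date >= '20170101';
```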
March 2017 Whitepaper (1.1 mb pdf)
Understanding Optimizer Statistics with Oracle Database 12c Release 2
Three part Oracle Database Histogram series from Jonathan Lewis
Another three part Oracle Database Histogram series from Jonathan Lewis
Two part Oracle Database Histogram series from Mohamed Houri
Friday, April 21, 2017
Requesting the actual plan for the CREATE, I find this in the plan XML: 1197064 as SerialDesiredMemory, RequiredMemory, and RequestedMemory.
That's a big grant for something that operates on 0 rows. When creating many CCIs concurrently, that's a problem - queries can time out.
Might have to use Resource Governor creatively to allow high concurrency CCI creation.
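Something along these lines, maybe - a sketch only, with a placeholder group name, and the classifier function (which routes the CCI-creating sessions into the group) omitted:

```sql
-- Cap the per-request memory grant for sessions classified into this group.
CREATE WORKLOAD GROUP cci_create_group
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 5);
GO
-- A classifier function would be needed to route sessions here.
ALTER RESOURCE GOVERNOR RECONFIGURE;
```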
The grant request is smaller if there's only one column in the CCI. So I guess I could create a smaller CCI and add columns. That's also kinda painful. I'll ask on #sqlhelp, see if anyone has something clever...
So far the only things to come up on Twitter were the min_grant_percent and max_grant_percent query hints introduced in SQL Server 2012 SP3. But...
"Note These two new query memory grant options aren't available for index creation or rebuild."
Later I found this while looking around.
"Starting with SQL Server 2016, you can create the table as a clustered columnstore index. It is no longer necessary to first create a rowstore table and then convert it to a clustered columnstore index."
Hmmm. Here's a great post from Pedro Lopes (Twitter @sqlpto), which includes some details for using Extended Events in memory grant investigation.
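Along those lines, something like this Extended Events session could watch memory grants - a sketch assuming the query_memory_grant_usage event available in SQL Server 2016; the session name is mine:

```sql
-- Capture per-query memory grant usage to an event file.
CREATE EVENT SESSION [memory_grant_watch] ON SERVER
ADD EVENT sqlserver.query_memory_grant_usage (
    ACTION (sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'memory_grant_watch');
GO
ALTER EVENT SESSION [memory_grant_watch] ON SERVER STATE = START;
```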
So far, it looks like creating the clustered columnstore inline with table creation, as below, avoids a memory grant. (That makes sense.) Whereas creating a clustered columnstore index on an empty rowstore table requires a memory grant - sized as if the table held up to 1 million rows! (That's a surprise.)
If you'll be creating CCIs at fairly high concurrency, you may also want to consider creating the CCI inline with table create, as below, in order to avoid a potential bottleneck in memory grants.
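The inline form looks like this - table and column names are illustrative only:

```sql
-- SQL Server 2016+: create the clustered columnstore index inline with the
-- table, avoiding the separate CREATE INDEX and its memory grant.
CREATE TABLE dbo.cci_example
(
    col1 INT NOT NULL,
    col2 BIGINT NOT NULL,
    col3 DATETIME2(3) NOT NULL,
    INDEX cci_cci_example CLUSTERED COLUMNSTORE
);
```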