SQL Server 2014 の新しい基数推定について自分で一から勉強するには少し大変なので、2014年4月に公開されたホワイトペーパーである『 Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator 』に頼りつつ勉強してみた - 都内で働くSEの技術的なひとりごと / Technical soliloquy of System Engineer working in Tokyo

　SQL Server 2014 で基数推定が新しくなりました。2014 がリリースされてから、一か月ほど経ちますが、まったく勉強していませんでした。ゴールデンウイーク中に時間が取れたので、『 Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator 』のホワイトペーパー ( Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.docx ) を参考にしつつ勉強してみました。※このホワイトペーパーは、新しい基数推定についてかなり参考になります。興味のある方は、是非じっくり読んでみてください。

　今回は、偉大なこのホワイトペーパーの力をお借りして ( 手抜きともいいますwww)、色々実験してみます。サンプルに使用するのは、いつもの『 AdventureWorks2012 』データベースです。ホワイトペーパーの P12 くらいからがわかりやすいですかね。下記のクエリを実行してみましょう。( 互換性レベルをSQL Server 2012 とした場合です。 )

USE [master];
GO

ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 110;
GO

USE [AdventureWorks2012];
GO

SET STATISTICS XML ON;

-- Legacy
SELECT	[AddressID],
	[AddressLine1],
	[AddressLine2]
FROM Person.[Address]
WHERE [StateProvinceID] = 9 AND
	[City] = 'Burbank';

-- New CE
SELECT	[AddressID],
	[AddressLine1],
	[AddressLine2]
FROM Person.[Address]
WHERE [StateProvinceID] = 9 AND
	[City] = 'Burbank'
OPTION (QUERYTRACEON 2312);
GO

SET STATISTICS XML OFF;

f:id:koogucc11:20140503233652p:plain

　実行プラン上だけでは、何の変化もありません。推定行数の差を見ることで違いが判断できます。新しい基数推定のほうが、実際の行数に近い値になっており、より正確に推定していることが判断できます。

※SQL Server 2012 以前

f:id:koogucc11:20140503234411p:plain

※SQL Server 2014

f:id:koogucc11:20140503234427p:plain

　ただし、SQL Server 2014 の基数推定が万能ではないので、現行システムで動作しているクエリはすべてテストする必要があります。下記のような感じですね。

No change in performance

Improved

Performance

Degraded

Performance

No changes to estimates or the query execution plan

No action necessary

No action necessary

Not related to the new CE, but general performance tuning may be required

No changes to estimates but the query execution plan changed

No action necessary

No action necessary

Unlikely to be related to the new CE, but general performance tuning may be required

Changes to the estimates but not the query execution plan shape

No action necessary

No action necessary

Action related to the new CE may be necessary if the degradation exceeds workload performance service level agreements

Changes to the estimates and the query execution plan shape

No action necessary

No action necessary

Action related to the new CE may be necessary if the degradation exceeds workload performance service level agreements

　基数推定の何が SQL Server 2014 で変更されたのかは、P19 ページ以降に記載があります。

In this section, we discuss the primary changes made to the CE component for SQL Server 2014. The legacy CE’s default behavior has remained relatively static since SQL Server 7.0. The SQL Server 2014 changes address performance issues for key scenarios based on benchmark and customer workload testing. The new CE design also lays the groundwork for future improvements.

While changes to the CE can improve overall workload performance, on average, over a range of applications, some queries may still regress. If this occurs, see the troubleshooting method section of this paper for potential solutions.

　そうなんですね。Legacy な Cardinality Estimator は 7.0 から大幅に変化はないんですね。新しくなっても、まだまだ課題はあるようです。きっちり検証する必要がありますね。

　次に、SQL Server 2014 における行数の推定方法を見てみましょう。下記のクエリを実行してみます。

USE [AdventureWorks2012];
GO

SELECT	[AddressID],
	[AddressLine1],
	[AddressLine2]
FROM Person.[Address]
WHERE [StateProvinceID] = 9 AND
      [City] = N'Burbank' AND
      [PostalCode] = N'91502'
OPTION (QUERYTRACEON 9481); -- CardinalityEstimationModelVersion 70
GO

f:id:koogucc11:20140504003955p:plain

　実際の行数が 194 行、推定行数が 1 行となっています。かなりの差がありますね。

f:id:koogucc11:20140504004214p:plain

　この原因は、レガシーな基数推定にあります。レガシーな推定行数の公式は下記の通りです。Pnは密度を表し、P0は一番選択性の高いものになります。

推定行数 = 総行数 / ( 1 / ( P0 * P1 * P2 ... * Pn ) )

　それぞれの列密度は下記の通りです。密度はそれぞれの列統計を作成することで判断できます。

StateProvinceID：0.009891
City：0.009993
PostalCode：0.232691

　上記の値から推定行数を計算すると、

19614 / ( 1 / ( 0.009891 * 0.009993 * 0.232691 ) ) = 0.451109368

となります。これにより、従来の SQL Server は選択性が非常に高くなってしまい、推定行数を 1 と判断してしまいます。これが、新しい基数推定になると、下記の通り結合選択性の高い算出方法に変更されます。P0 が選択性が一番高く、P1、P2、Pn段々と選択性が低くなっていきます。

推定行数 = 総行数 / ( 1 / ( P0 * P1^(1/2) * P2^(1/4) ) ... Pn^(1/2^n) ) )

　上記の公式に当てはめて計算を行うと、

19614 / ( 1 / ( 0.009891 * 0.009993 ^ ( 1/2 ) * 0.232691 ^ ( 1/4 ) ) ) = 13.46942986

となります。実際に実行した結果は下図の通りです。

f:id:koogucc11:20140504010505p:plain

f:id:koogucc11:20140504010516p:plain

　そのほかに、Join、Distinct のアルゴリズムも変更されています。拡張イベントと連携した診断機能なども新しく追加されています。基数推定周りは色々とおもしろそうなので、今後も取り上げていきたいと思います。

※今回の記事では統計情報の見方・考え方はかなり省略した説明になっています。このあたりを理解するには、

Statistics - 都内で働くSEの技術的なひとりごとを参照するとより理解が深まります。( 深まるはずですww )

※統計まわりこれで勉強するとよいです。私も Kindle 版もってますー。

完全独習統計学入門

作者: 小島寛之
出版社/メーカー: ダイヤモンド社
発売日: 2006/09/29
メディア: 単行本（ソフトカバー）
購入: 215人クリック: 3,105回
この商品を含むブログ (112件) を見る

※これ、本当にほしい。

Microsoft SQL Server 2014 Query Tuning & Optimization

作者: Benjamin Nevarez
出版社/メーカー: McGraw-Hill Osborne Media
発売日: 2014/09/26
メディア: ペーパーバック
この商品を含むブログを見る

	No change in performance	Improved Performance	Degraded Performance
No changes to estimates or the query execution plan	No action necessary	No action necessary	Not related to the new CE, but general performance tuning may be required
No changes to estimates but the query execution plan changed	No action necessary	No action necessary	Unlikely to be related to the new CE, but general performance tuning may be required
Changes to the estimates but not the query execution plan shape	No action necessary	No action necessary	Action related to the new CE may be necessary if the degradation exceeds workload performance service level agreements
Changes to the estimates and the query execution plan shape	No action necessary	No action necessary	Action related to the new CE may be necessary if the degradation exceeds workload performance service level agreements