ColdFusion Summit Notes: SQL, I Learned Enough to Break Everything - Dave Ferguson

October 07, 2019

Execution Plans

  • How to get to your data faster
  • Query creates an execution plan
  • Used by database to find an optimized path to the data
  • Can view the execution plans
  • In MS SQL, Management Studio ships with a built-in graphical plan viewer
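
For example, in SQL Server you can ask for the text form of a plan before running a query - a quick sketch (table and column names here are made up). In Management Studio itself, "Include Actual Execution Plan" toggles the graphical view.

SET SHOWPLAN_TEXT ON;
GO
SELECT id, email FROM users WHERE email = 'someone@example.com';
GO
SET SHOWPLAN_TEXT OFF;
GO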

What does the query plan actually tell you?

  • The path it took to get to the data
  • Think of it like a Java stack trace for your query
  • Index usage - how the indexes are being used (or not)
  • Cost of each section of the plan
  • Possible suggestions on how to make it better
  • Other statistical info

MS SQL uses a shared cache (the plan cache) for execution plans

What causes plans to get flushed from the cache?

  • Forced via code
  • Memory pressure - the server ran out of memory
  • ALTER statements
  • Statistics updates
  • “Auto_update statistics on” on a table will do it too
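
If you ever need to force a flush by hand (say, while testing plan behavior), SQL Server's documented command for it is DBCC FREEPROCCACHE - use with care, since it empties the whole cache:

DBCC FREEPROCCACHE;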

Utilizing the plan cache is crucial in some large systems

Ways to utilize the cache better

  • Get more RAM (and configure the database to use it)
  • Use cfqueryparam
  • Use stored procedures
  • Use fewer queries if possible

Execution plans are great
However, there is a downside:

SELECT * FROM TABLE WHERE ID = 2
SELECT * FROM TABLE WHERE ID = 3

Two execution plans get created because the query text is not identical
The only thing that varies is the “id”
So the cache ends up holding two plans for what is really one query

Use query params - they solve this

SELECT * FROM TABLE WHERE ID = ?

? is a placeholder for the ID, so the same query plan can be used for both queries
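
In CFML, that parameterization is what cfqueryparam gives you - a minimal sketch, assuming a datasource named myDSN and a users table (both made up here):

<cfquery name="getRecord" datasource="myDSN">
    SELECT id, name
    FROM users
    <!--- cfqueryparam binds the value, so the same plan is reused for any id --->
    WHERE id = <cfqueryparam value="#url.id#" cfsqltype="cf_sql_integer">
</cfquery>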

Execution plans can cause less-than-expected results:

  • When your data structure changes (ALTER statements)
  • When your data volume grows quickly - odds are the statistics get outdated quickly and are updated often
  • When your data has a high degree of cardinality

Cardinality - the uniqueness of your data
High cardinality - columns with values that are very uncommon/unique: ID numbers, email addresses, user names

Managing large amounts of data

  • Only return what you need (don’t do “SELECT *”)
  • Try to page the data in some fashion (see the paging sketch after this list)
  • Page either in the database, or pull all the data once and cache it in the browser somehow
  • Optimize indexes to speed up “where” clauses
  • Avoid using triggers on large volume inserts
  • Reduce any post query processing as much as possible
  • Let the database do as much as it can before you get the data handed to the web server
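
A minimal database-side paging sketch for SQL Server 2012+ (table and column names are made up):

-- page 1; bump OFFSET by 50 for each subsequent page
SELECT id, name
FROM users
ORDER BY id
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;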

Inserting / updating large data sets

  • Reduce calls to the database by combining queries
  • Use bulk loading features of your db
  • Use XML / JSON to load data into the database
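
As one sketch of the JSON route, SQL Server 2016+ can unpack a JSON document straight into a table with OPENJSON (table and column names are illustrative):

DECLARE @json NVARCHAR(MAX) = N'[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]';

-- a single INSERT loads every row in the document
INSERT INTO users (id, name)
SELECT id, name
FROM OPENJSON(@json)
WITH (id INT '$.id', name NVARCHAR(100) '$.name');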

Combining Queries

  • Don’t do queries in a cfloop - duh
  • Speed comes with some inherent dangers
  • Errors could cause the whole batch to fail
  • Overflowing allowed query string size
  • Database locking can be problematic
  • Difficult to get any usable result from the query
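
The simplest form of combining is a multi-row INSERT instead of one INSERT per loop iteration - a sketch with made-up names (note SQL Server caps a single VALUES list at 1000 rows):

-- one round trip instead of three
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com'),
       ('Bob',   'bob@example.com'),
       ('Carol', 'carol@example.com');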

Faster processing

  • Reduced network calls to the db
  • Processed as a single batch in the db
  • The processing time is generally faster
  • Databases like “1 big thing” more than “thousands of little things”

Indexes

  • Unique index - enforces uniqueness; your table’s primary key is backed by one
  • Clustered - the way the data is physically stored; a table can only have one of these in MS SQL (other databases may differ)
  • Non-clustered - contains pointers back to the actual data, a little slower than a clustered index

What should you index?

Do index:

  • Large datasets where 10 - 15% of the data is usually returned
  • Columns used in WHERE clauses with high cardinality (username, email, etc)
  • Think “user name” style column where values are unique

Don’t index:

  • Small tables (under 100 rows)
  • Columns with low cardinality, lots of dupes (status columns, etc)
  • Any column with only a couple of values
  • It takes more time to use an index on these than to just access the table directly
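
A typical “do index” case in T-SQL - a high-cardinality column used in WHERE clauses (names are made up):

CREATE NONCLUSTERED INDEX IX_users_email
    ON users (email);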

Seeking and Scanning

  • Index scan / table scan
  • Touches all rows
  • Generally bad
  • Use only if the table contains a small amount of data

Index seek

  • What you really want
  • Only touches rows that qualify
  • Useful for large data sets or highly selective queries

Even with an index, the optimizer may still opt to perform a scan

Do I need an index?
It depends - on the shape of your data, the queries you run against it, etc.

Performance Impacts

Processor

  • Give the SQL Server process CPU priority
  • With priority, the database wins out when the CPU starts overloading
  • Normal server operations won’t impact the database
  • Watch for other processes on the server using excessive CPU cycles
  • Have enough cores to handle your database activity / load
  • Keep average processor load below 50% so the system can handle spikes gracefully
  • You don’t want it spiking to 100%

Memory / RAM

  • Get a ton of RAM, RAM is cheap
  • Make sure you have enough RAM to keep the server from doing excess paging
  • You fix paging by adding RAM
  • Make sure your DB is USING the RAM on the server
  • Allow the DB to use RAM for cache (instead of the hard drive)
  • Look for other processes using excessive RAM

Drive I/O

  • Drive I/O is usually the only moving part on a server - slowest piece in the chain
  • If you can, use solid state drives
  • They used to be bad for databases, nowadays they’re better
  • Make sure you don’t run out of space for the DB
  • Purge log files as needed
  • By default SQL log files are sent to “Unlimited growth”, don’t do that - cap their size
  • Don’t store the log files on the same physical drives as the database files
  • On Windows, don’t put your DB on the C: drive

Log files are in a constant “write” state
Database is in a constant “read” state usually
Hard drives can only do 1 at a time
So split these up

Log drives should be in “write priority mode” - write ops take precedence, so writes run faster
Data drives should be in “read priority mode”
(Only useful on spinning disk drives, SSD makes it less important)

Network Stuff

  • Only matters if DB and app aren’t on the same server
  • They shouldn’t be on the same server
  • Do your best to minimize the network hops between the servers
  • The more hops, the slower things run
  • Watch for network spikes that hurt data retrieval

If you can, put dual NICs in your servers, and have SQL traffic isolated to 1 NIC, so general network traffic doesn’t impact DB traffic
The network can also affect performance when a client loads large amounts of data
Separate all of that traffic as best you can

Offloading / Caching

  • Use a secondary database for reporting
  • Most of the time, reports don’t need “real time” data
  • Reports are generally pretty heavy on a database

Store static lookup data (things that never change - countries, states) on the local server
CouchDB, MongoDB, Redis, or SQLite are good fits for this
There’s no benefit to storing this in the main SQL database

Cache as much app data as you can
But make sure you have a way to flush it out when necessary
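
In CFML, query caching is an easy win for that lookup data - cachedwithin holds the result in memory for the timespan you give it (datasource and query are illustrative):

<cfquery name="getCountries" datasource="myDSN"
         cachedwithin="#createTimespan(1, 0, 0, 0)#">
    <!--- static lookup data, cached for one day --->
    SELECT id, name FROM countries ORDER BY name
</cfquery>

For the flush side, the cfobjectcache tag (action="clear") empties ColdFusion’s query cache when you need to force fresh data.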

Important Statistics

Recompiles of a stored procedure are generally caused by memory issues
You should never see a “recompile”

Latch Waits

  • Low level lock inside DB
  • Should be < 10ms
  • If latch wait is over 100ms, the database is probably dying

Lock Waits

  • Amount of time your database is waiting for locks to clear
  • Want this as low as possible
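
Both latch and lock waits show up in SQL Server’s wait statistics; a quick way to peek at them is the standard DMV:

-- top waits since the server last restarted
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;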

Full scans

  • Select queries not using indexes

Cache Hit Ratio

  • How much you're actually USING the cache vs going to get the data again
  • Want this number as high as possible
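
In SQL Server you can read the buffer cache hit ratio from the performance-counter DMV - note it comes as a counter/base pair that you divide to get the actual percentage:

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Buffer cache hit ratio', 'Buffer cache hit ratio base');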

Other stats worth watching:

  • Disk read / write times
  • SQL proc time
  • SQL memory