ColdFusion Summit Notes: SQL, I Learned Enough to Break Everything - Dave Ferguson
October 07, 2019
Execution Plans
- How to get to your data faster
- Query creates an execution plan
- Used by database to find an optimized path to the data
- You can view the execution plans
- In MS SQL, Management Studio ships with a built-in graphical plan viewer (or ask for the plan in T-SQL, as sketched below)
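If you'd rather not click through the GUI, you can ask for the plan straight from T-SQL. A minimal sketch (the users table is illustrative):

    -- Return the estimated plan as XML instead of executing the query
    SET SHOWPLAN_XML ON;
    GO
    SELECT id, username FROM users WHERE id = 42;
    GO
    SET SHOWPLAN_XML OFF;
    GO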
What does the query plan actually tell you?
- The path of how it got to the data
- A la a Java stack trace
- Index usage - how the indexes are being used (or not)
- Cost of each section of the plan
- Possible suggestions on how to make it better
- Other statistical info
MS SQL uses a shared cache for execution plans
What causes plans to get flushed from the cache?
- Forced via code
- Memory pressure - the server ran out of memory
- Alter statements
- Statistics updates
- “Auto_update statistics on” on a table will do it too
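A few concrete examples of statements that invalidate or flush plans (table and column names are illustrative):

    -- Schema changes and statistics updates invalidate cached plans that touch the table
    ALTER TABLE users ADD last_login DATETIME NULL;
    UPDATE STATISTICS users;

    -- The "forced via code" option: flush the entire plan cache (use sparingly)
    DBCC FREEPROCCACHE;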
Utilizing the plan cache is crucial in some large systems
Ways to utilize the cache better
- Get more RAM (and configure the database to use it)
- Use cfqueryparams
- Use stored procedures
- Use fewer queries if possible
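To see whether plans are actually being reused, SQL Server exposes the cache through DMVs. A quick look, assuming you have VIEW SERVER STATE permission:

    -- Most-reused plans in the cache; lots of near-identical ad hoc queries
    -- with usecounts = 1 is a sign you're missing query params
    SELECT TOP (20)
        cp.usecounts,      -- how many times this plan has been reused
        cp.objtype,        -- Adhoc, Prepared, Proc, ...
        st.text AS query_text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER BY cp.usecounts DESC;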
Execution plans are great
However there is a downside
SELECT * FROM TABLE WHERE ID = 2
SELECT * FROM TABLE WHERE ID = 3
2 execution plans will get created because the queries are not the same
The variant is the “id”
So we need 2 query plans
Use QueryParams, they solve that
SELECT * FROM TABLE WHERE ID = ?
? = placeholder for the ID, same query plan can get used for both queries
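This is roughly what cfqueryparam produces under the hood - the database sees a parameterized call, so one plan serves every ID. A sketch against an illustrative users table:

    -- One plan is compiled for the parameterized text and reused for any @id
    EXEC sp_executesql N'SELECT * FROM users WHERE id = @id', N'@id INT', @id = 2;
    EXEC sp_executesql N'SELECT * FROM users WHERE id = @id', N'@id INT', @id = 3;  -- same cached plan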
Execution plans can produce worse results than expected
When your data struct changes (“ALTER” statements)
When your data volume grows quickly
Odds are statistics get outdated quickly and are updated often
When your data has a high degree of cardinality
Cardinality - the uniqueness of your data
High cardinality - columns with values that are very uncommon/unique - id numbers, email addresses, user names
Managing large amounts of data
- Only return what you need (don’t do “SELECT *”)
- Try to page the data in some fashion (sketched after this list)
- Either in the database, or pull all the data, then cache it in the browser somehow
- Optimize indexes to speed up “where” clauses
- Avoid using triggers on large volume inserts
- Reduce any post query processing as much as possible
- Let the database do as much as it can before you get the data handed to the web server
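One way to page in the database, assuming SQL Server 2012+ and an illustrative users table:

    -- Page 3 at 25 rows per page; ORDER BY is required for OFFSET/FETCH
    SELECT id, username, created_at
    FROM users
    ORDER BY created_at DESC
    OFFSET 50 ROWS
    FETCH NEXT 25 ROWS ONLY;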
Inserting / updating large data sets
- Reduce calls to the database by combining queries
- Use bulk loading features of your db
- Use XML / JSON to load data into the database
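A sketch of the JSON route, assuming SQL Server 2016+ (table and columns are illustrative):

    -- Load a whole JSON array in one database call
    DECLARE @payload NVARCHAR(MAX) = N'[
        {"id": 1, "username": "alice"},
        {"id": 2, "username": "bob"}
    ]';

    INSERT INTO users (id, username)
    SELECT id, username
    FROM OPENJSON(@payload)
    WITH (
        id       INT           '$.id',
        username NVARCHAR(100) '$.username'
    );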
Combining Queries
- Don’t do queries in a cfloop - duh
- Speed comes with some inherent dangers
- Errors could cause the whole batch to fail
- Overflowing allowed query string size
- Database locking can be problematic
- Difficult to get any usable result from the query
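The simplest way to combine inserts is a multi-row VALUES list - one round trip instead of one per row (table is illustrative):

    -- One statement, one network call, one batch for the database to process
    INSERT INTO audit_log (user_id, action)
    VALUES
        (1, 'login'),
        (2, 'logout'),
        (3, 'login');

Note that SQL Server caps a single VALUES list at 1,000 rows, which is one flavor of the "overflowing allowed query string size" danger above.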
Faster processing
- Reduced network calls to the db
- Processed as a single batch in the db
- The processing time is generally faster
- Databases like “1 big thing” more than “thousands of little things”
Indexes
- Unique Index - the primary key on your table
- Clustered - the way the data is physically stored; a table can only have one of these (in MS SQL - others can have more than one)
- Non-clustered - contains pointers back to the actual data; a little slower than a clustered index
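For reference, the DDL for each flavor, assuming a users table with no existing clustered index:

    -- Clustered: defines the physical order of the rows; only one per table
    CREATE CLUSTERED INDEX IX_users_id ON users (id);

    -- Non-clustered: a separate structure with pointers back to the rows
    CREATE NONCLUSTERED INDEX IX_users_email ON users (email);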
What should you index?
Do index:
- Large datasets where 10 - 15% of the data is usually returned
- Columns used in WHERE clauses with high cardinality (username, email, etc)
- Think “user name” style column where values are unique
Don’t index:
- Small tables (under 100 rows)
- Columns with low cardinality, lots of dupes (status columns, etc)
- Any column with only a couple of values
- It takes more time to use an index on these than to just access the table directly
Seeking and Scanning
- Index scan / table scan
- Touches all rows
- Generally bad
- Use only if the table contains a small amount of data
Index seek
- What you really want
- Only touches rows that qualify
- Useful for large data sets or highly selective queries
Even with an index, the optimizer may still opt to perform a scan
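Two queries against the same indexed column can behave very differently. A sketch assuming a non-clustered index on users.email:

    -- Index seek: equality on a high-cardinality indexed column
    SELECT id FROM users WHERE email = 'alice@example.com';

    -- Index/table scan: the leading wildcard defeats the index
    SELECT id FROM users WHERE email LIKE '%@example.com';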
Do I need an index?
It depends - on the shape of your data, how it's queried, and so on.
Performance Impacts
Processor
- Give SQL Server process CPU priority
- It will be the #1 thing when the CPU starts overloading
- Normal server operations won't impact the database
- Watch for other procs on the server using excessive CPU cycles
- Have enough cores to handle your database activity / load
- Keep average proc load below 50% so the system can handle spikes gracefully
- Don’t want it to spike to 100%
Memory / RAM
- Get a ton of RAM, RAM is cheap
- Make sure you have enough RAM to keep the server from doing excess paging
- You fix paging by adding RAM
- Make sure your DB is USING the RAM on the server
- Allow the DB to use RAM for cache (instead of the hard drive)
- Look for other procs using excessive RAM
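In SQL Server the knob for this is max server memory. A sketch that caps the instance at 16 GB:

    -- "show advanced options" must be on before max server memory is visible
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Value is in MB; leave headroom for the OS and other processes
    EXEC sp_configure 'max server memory (MB)', 16384;
    RECONFIGURE;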
Drive I/O
- Drive I/O is usually the only moving part on a server - slowest piece in the chain
- If you can, use solid state drives
- They used to be bad for databases, nowadays they’re better
- Make sure you don’t run out of space for the DB
- Purge log files as needed
- By default SQL log files are set to "unlimited growth" - don't do that; cap their size (sketched after this list)
- Don't store the log files on the same physical drive as the database files
- On Windows, don't put your DB on the C: drive
Log files are in a constant "write" state
The database is usually in a constant "read" state
Spinning hard drives can only do one at a time, so split these up
Log drives should be in "write priority mode" - write ops take precedence, so they run faster
Data drives should be in "read priority mode"
(This only matters on spinning disk drives; SSDs make it less important)
Network Stuff
- Only matters if DB and app aren’t on the same server
- They shouldn’t be on the same server
- Do your best to minimize the network hops between the servers
- The more hops, the slower things run
- Watch for network spikes that hurt data retrieval
If you can, put dual NICs in your servers, and have SQL traffic isolated to 1 NIC, so general network traffic doesn’t impact DB traffic
Network can affect performance when a client is loading large amounts of data
Separate that traffic as best you can
Offloading / Caching
- Use a secondary database for reporting
- Most of the time, reports don’t need “real time” data
- Reports are generally pretty heavy on a database
Store static lookup data (things that never change - countries, states) on the local server
CouchDB, MongoDB, Redis, SQLite
No benefit to storing this in the main SQL database
Cache as much app data as you can
But make sure you have a way to flush it out when necessary
Important Statistics
Recompiles of a stored procedure are generally caused by memory issues
You should never see a “recompile”
Latch Waits
- Low level lock inside DB
- Should be < 10ms
- If latch wait is over 100ms, the database is probably dying
Lock Waits
- Amount of time your database is waiting for locks to clear
- Want this as low as possible
Full scans
- Select queries not using indexes
Cache Hit Ratio
- How much you're actually USING the cache vs going back to disk for the data
- Want this number as high as possible (see the query at the end of these notes)
Disk read / write times
SQL proc time
SQL memory
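The cache hit ratio mentioned above can be pulled from the performance-counter DMV. A quick check (requires VIEW SERVER STATE):

    -- Buffer cache hit ratio as a percentage; higher is better
    SELECT (a.cntr_value * 100.0) / b.cntr_value AS buffer_cache_hit_ratio
    FROM sys.dm_os_performance_counters AS a
    JOIN sys.dm_os_performance_counters AS b
        ON a.object_name = b.object_name
    WHERE a.counter_name = 'Buffer cache hit ratio'
      AND b.counter_name = 'Buffer cache hit ratio base';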