CF Objective Notes - The Top 5 Things You are Doing Today to Hinder Scalability

May 14, 2014

I'll be posting the various notes I take during CF Objective this week. A lot of it is probably a copy/paste from the different slide decks, which will be available via the conference itself. For my own "I learn better this way" reasons, I tend to type a lot of it out in-line with my other notes anyway. Enjoy.

The Top 5 Things You are Doing Today to Hinder Scalability, Dan Wilson

(Dan is always a great speaker. He's got a good combination of interesting info, clear slides and sense of humour.)

scaling = is your system available to the users that want it?
Your brother's construction business website doesn't need to "scale"
the next social media site needs to scale

the top 5 list
1. monolithic design
2. misidentification / misuse of shared resources
3. over-reliance on shared scopes
4. stupid developer tricks
5. failure to design for cacheability

Monolithic Design
"Walls aren't square" for building a house, easy to tell the problem
with software, not so easy
usually the database is part of what makes it monolithic
not easy to add more databases to a system
1 way is to split into a "read database" and "write database"
propagates writes over to the read DB later
but can't do this with ORM because it's not supported
OR
have different databases per client
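A minimal sketch of the read/write split, in Python rather than CFML so it runs standalone. Two in-memory SQLite databases stand in for the real servers, and the `SplitDatabase` class and its `replicate()` method are hypothetical names; a real setup would rely on the database's own replication rather than replaying statements by hand.

```python
import sqlite3


class SplitDatabase:
    """Sends writes to one database and reads to another.

    Assumption: replicate() is a stand-in for real replication,
    which would normally be handled by the database servers.
    """

    def __init__(self):
        self.write_db = sqlite3.connect(":memory:")
        self.read_db = sqlite3.connect(":memory:")
        for db in (self.write_db, self.read_db):
            db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
        self._pending = []  # writes not yet copied to the read database

    def write(self, sql, params=()):
        self.write_db.execute(sql, params)
        self.write_db.commit()
        self._pending.append((sql, params))

    def read(self, sql, params=()):
        return self.read_db.execute(sql, params).fetchall()

    def replicate(self):
        # Propagate queued writes to the read database "later".
        for sql, params in self._pending:
            self.read_db.execute(sql, params)
        self.read_db.commit()
        self._pending.clear()


db = SplitDatabase()
db.write("INSERT INTO posts (title) VALUES (?)", ("hello",))
stale = db.read("SELECT title FROM posts")  # read DB has not caught up yet
db.replicate()
fresh = db.read("SELECT title FROM posts")
```

The gap between `stale` and `fresh` is the trade-off: reads scale out, but they can lag behind writes until propagation happens.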

have to think "what kind of data am I storing? Can I group that data together and wall it off?"
if you have all your data in 1 DB, breaking it apart can be a mess

Can use a Load Balancer w/ a "service oriented model" to have scalability
but unwinding an app after the fact to add this can be difficult

Food trucks
separate people into queues based on what they need, rather than putting everybody in 1 line

Shared resources
planes on a tarmac

Cost of WebServer File
Web Server Thread (assuming it's available)
+
network to file system
+
disk wait time
+
disk seek time
+
transfer time to web server
+
transfer time to complete HTTP request
= total time
None of this is FREE (though often the bits are very tiny, but they add up)
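To make the "they add up" point concrete, here is the same sum as a tiny Python sketch. Every figure below is made up for illustration (none came from the talk), and the 40 files per page is likewise an assumption.

```python
# Hypothetical per-file costs in microseconds; the numbers are illustrative only.
file_cost_us = {
    "web server thread": 200,
    "network to file system": 500,
    "disk wait time": 2000,
    "disk seek time": 4000,
    "transfer time to web server": 300,
    "transfer time to complete HTTP request": 3000,
}

total_us = sum(file_cost_us.values())

# Assumption: a page pulls in 40 such files and they are served one after another.
files_per_page = 40
page_total_us = total_us * files_per_page

print(f"one file: {total_us / 1000} ms, whole page: {page_total_us / 1000} ms")
```

With these invented numbers a single file costs 10 ms, which looks tiny, but 40 of them in sequence is 400 ms for one page view.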

Types of Shared Resources
Memory, CPU, Disk space
if you move 1 bottleneck guess what's behind it...another bottleneck
disk Transfer, etc

Shared Scopes
Session scope is biggest culprit
Sometimes you have to deoptimize a few operations in order to provide a scalable solution (like stop caching a state query in application)

if you want to use a shared scope and want to keep an eye on scalability use a "lookup factory"
init()
destroy()
load()
set() – sets the current user
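The notes only list the four method names, so the following Python sketch is a guess at what such a lookup factory might look like. The class name `UserLookupFactory`, the dict-like backing store and the behaviour of each method are all assumptions; the point is just that callers go through one object instead of touching a shared scope directly, so the storage behind it can be swapped later.

```python
class UserLookupFactory:
    """Single doorway to per-user state (hypothetical design).

    backing_store is any dict-like object: a plain dict here, but it
    could wrap a session scope or a distributed cache client.
    """

    def __init__(self, backing_store):
        self._store = backing_store
        self._current_user_id = None

    def init(self):
        """Reset the factory so no user is selected."""
        self._current_user_id = None

    def set(self, user_id):
        """Set the current user."""
        self._current_user_id = user_id

    def load(self):
        """Return the current user's state, creating an empty record on first use."""
        if self._current_user_id is None:
            raise RuntimeError("call set() before load()")
        return self._store.setdefault(self._current_user_id, {})

    def destroy(self):
        """Drop the current user's state, for example at logout."""
        self._store.pop(self._current_user_id, None)
        self._current_user_id = None


shared = {}  # stand-in for the shared scope
factory = UserLookupFactory(shared)
factory.init()
factory.set("dan")
factory.load()["theme"] = "dark"
theme = factory.load()["theme"]
factory.destroy()
```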

Avoiding Shared Scopes--

Distributed Cache = session scope clustering that WORKS.

Stupid Developer Tricks –
Looped Queries
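A small Python sketch of what a looped query looks like next to a single joined query. The `authors` and `posts` tables are invented for the example, and SQLite in memory stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO posts VALUES (1, 1, 'First'), (2, 1, 'Second'), (3, 2, 'Third');
""")


def titles_looped(conn):
    """One query for the authors, then one more query per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_joined(conn):
    """Same data in a single round trip to the database."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

Both functions return the same dictionary here, but the looped version's query count grows with the number of authors while the joined version stays at one.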

EntityToQuery()

LazyMan Pagination
<cfoutput query startrow maxrows>
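As I understood it, the problem with `<cfoutput query startrow maxrows>` is that the full result set is fetched and only a slice is displayed. A Python sketch of the difference, using an invented `items` table in SQLite; `LIMIT`/`OFFSET` is SQLite syntax and other databases spell it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item {i}",) for i in range(1, 101)],
)


def page_lazy(conn, start_row, max_rows):
    """Fetch every row, then throw most of them away (the lazy approach)."""
    all_rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    return all_rows[start_row - 1 : start_row - 1 + max_rows]


def page_in_database(conn, start_row, max_rows):
    """Ask the database for just the rows on the requested page."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (max_rows, start_row - 1),
    ).fetchall()
```

The two functions return the same page, but the first one moves all 100 rows out of the database to show 10.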

No encapsulation –
first thing to encapsulate: the database

very difficult to keep database indexing organized
performance can suffer

Caching Layers
lots of styles and products out there

CDN – Akamai, CloudFront etc
pools of servers geographically distributed that cache entire HTTP requests (full page, piece of a page, image, JS file, chunk of JSON, anything you can use to get an HTTP request)

Reverse Proxy Cache –
AOL used to cache everything on their internal network this way (which is why you might see stale data from time to time)

Caching HTTP Responses
request example.com/index.html

Response
Last-Modified – cheap way to see if the data has changed
if we don't set this, it's useless.

ETag – an opaque version identifier for the content, often a hash (kind of a hashed version of Last-Modified)
another way to verify "freshness" of content
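A rough Python sketch of how a server can use these two headers to answer "has it changed?" without resending the body. The `respond` function and its dict-based headers are invented for illustration, and the checks are simplified: an exact ETag match only, with no handling of weak validators or lists of ETags.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_validators(body: bytes, modified: datetime):
    """Build the two freshness validators for a response."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'  # hash of the content
    last_modified = format_datetime(modified, usegmt=True)  # HTTP date format
    return etag, last_modified


def respond(body: bytes, modified: datetime, request_headers: dict):
    """Return (status, headers, body); 304 means the cached copy is still fresh."""
    etag, last_modified = make_validators(body, modified)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # ETag check takes precedence when the client sends both headers.
        not_modified = if_none_match == etag
    elif if_modified_since is not None:
        # HTTP dates only have one-second resolution.
        since = parsedate_to_datetime(if_modified_since)
        not_modified = modified.replace(microsecond=0) <= since
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""
    return 200, headers, body


modified = datetime(2014, 5, 14, 12, 0, 0, tzinfo=timezone.utc)
status, headers, _ = respond(b"<h1>hi</h1>", modified, {})
revalidated, _, _ = respond(b"<h1>hi</h1>", modified, {"If-None-Match": headers["ETag"]})
```

The first request gets a full 200 response; the second sends back the ETag it was given and gets a bodiless 304.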

Designing for a CDN or Reverse Proxy Cache
(if you use the correct HTTP verbs, caching is easier – use POST for sending data and GET for getting data)
-use HTTP GET for cacheable content
-use Last-Modified and ETag headers properly
-encapsulate code that changes content to centralize cache purge activity
-Personalize with Cookies/JavaScript (i.e. "hi Dan" on the pages)
-Set up shadow DNS to bypass caching
-use JavaScript for interactions
-design data for high cache hit ratios – no point caching content no one's going to use
Better to load the whole magazine at once, rather than loading the pages 1 by 1

NoSQL as a cache
Can "unwind" data out of SQL server and store it in NoSQL and use that as the cache

Encapsulation – the choices you make today will be wrong at some point in the future. 30 years ago we were all COBOL programmers.

Designing for a NoSQL cache
Encapsulate calls to NoSQL engine
Encapsulate data publish methods
Design content purge algorithms well
ensure the NoSQL engine is the authoritative source for the data being stored

key / value stores

encapsulate for key / value store cache
encapsulate data access calls so you can purge / reload
avoid cffunction output = true
choose correct cache expire strategy (time, LRU)
easy to integrate with CF ORM
use to remove load hotspotting
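A minimal in-process Python sketch of the points above: data access goes through one `get()` call so entries can be purged or reloaded, and both expiry strategies (time and LRU) are applied. The `KeyValueCache` class and `load_user` loader are hypothetical; a real deployment would more likely put a distributed store behind the same interface.

```python
import time
from collections import OrderedDict


class KeyValueCache:
    """Small key/value cache with time-based expiry and LRU eviction."""

    def __init__(self, max_items=1000, ttl_seconds=60.0, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            self._items.move_to_end(key)  # mark as most recently used
            return entry[1]
        value = loader(key)
        self._items[key] = (now + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return value

    def purge(self, key):
        """Drop one entry so the next get() reloads it."""
        self._items.pop(key, None)


calls = []


def load_user(key):
    # Stand-in for the real (expensive) data access call.
    calls.append(key)
    return {"id": key}


cache = KeyValueCache(max_items=2, ttl_seconds=30)
cache.get("a", load_user)  # miss: loads
cache.get("a", load_user)  # hit: no load
cache.purge("a")
cache.get("a", load_user)  # miss again after purge
```

Because callers only ever see `get()` and `purge()`, the code that changes the underlying data has one obvious place to invalidate the cache.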

Facebook and Twitter are big key/value store users
they store data in MySQL, then put key/value stores in front of it to cache it all and avoid disk writing, synchronization, etc.

Increasing load time by 2 seconds increases abandonment by 8%
if you make people wait, they will go away

Squid for testing
JMeter
FusionReactor

for testing data caches just write your own CF pages and hit the cache
cfquery cache is ok but can only fit 70 queries in the cache...is that a hard-coded number? Can I change it in CF Admin somehow?

Memcached – distributed cache that Facebook and Twitter like a lot
just a key/value store
C program
can give it an immense amount of memory

Ehcache – the 1 in CF, very sophisticated
runs on the JVM, so it's subject to garbage collection: once you get above 4 – 6 GB of RAM, the garbage collection starts to affect it and it isn't quite as useful

Dan's contact info:
nodans.com
twitter @danwilson