SQL Saturday Exeter was great fun and a good learning experience. If you are a data professional working with SQL Server, you should get yourself along to the next one in 2016 and get involved!
This is not polished enough to be considered a full review of SQL Saturday Exeter. It’s just my mental notes committed to page before I forget. I might tidy this up later if I get time :-)
Friday was a full day’s workshop on using Red Gate’s new solution for Database Lifecycle Management. Part 3 of a trilogy (part 1 is source control, part 2 is continuous integration).
This session was the compelling business reason to come to Exeter. At work we are planning the next phase of our automatic database deployment strategy. We already trust Red Gate’s database development tools, and we want to build on that investment.
It’s early days, but to me it looks like SQL Release is a winning product, and the missing piece of our database deployment puzzle.
I hope to write more about this soon :-)
Yarr! Plastic dreads, gutrot rum, comical accents, gold (chocolate) coins, and FOOT JENGA!
“What has spinach got to do with database development?”
As it turns out, they both have data quality issues in common. For generations, people believed the bad data about the iron content of spinach.
Your most important job as a database professional is to defend against bad data. If you don’t do this, anything else you do will just be helping to deliver the wrong answer more quickly.
The keynote was delivered as a video and published online, so you can watch it again from the comfort of your own bed! At just six minutes long, you could even watch it over a coffee break.
What can you do to tune SQL Server when you don’t have control over the database code? Actually quite a few things.
Richard reminded us that the server’s main job is to respond to requests for data. That data can be retrieved from three physical places: CPU cache, memory, and disk. Each subsequent location is roughly an order of magnitude slower to access than the one before. (Read Jeff Atwood’s post on relative latencies for a great general overview of the fundamental problem).
We had an overview of the server-side lifecycle of a database query, from initial receipt of the SQL query to the final submission of a TDS (tabular data stream) response.
A key stage of this lifecycle is when SQL Server looks in the plan cache.
Richard showed us how to reduce cache bloat with a server configuration option for storing a “stub” plan the first time around.
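If I remember right, the option in question was “optimize for ad hoc workloads”. A minimal sketch of turning it on (it is an advanced option, so it has to be exposed first):

```sql
-- Store only a small compiled-plan stub the first time an ad hoc query runs;
-- the full plan is cached only if the same query text is seen again.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```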
He pointed out that the query text must exactly match the text stored in the cache for the cached plan to be usable. Coding standards might seem onerous, but in this case they may relieve memory pressure!
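To illustrate the point (dbo.Orders here is a made-up table): these two batches return the same rows, but because the text differs in case and spacing, each one compiles and caches its own plan.

```sql
-- Two "identical" ad hoc queries that each get their own plan cache entry:
SELECT OrderId FROM dbo.Orders WHERE CustomerId = 42;
select OrderId from dbo.Orders where CustomerId = 42;
```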
You can monitor the plan cache with an odd but effective query from Jonathan Kehayias. Odd because it uses the WITH XMLNAMESPACES clause (new to me!) and XQuery; effective because you can pull missing index information out of each XML plan into a single database table.
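A simplified sketch in the spirit of that query (see Jonathan’s original for the real thing): the showplan namespace declaration is what makes the XQuery paths match.

```sql
-- Pull missing-index details out of the XML plans sitting in the cache.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    qp.query_plan.value('(//MissingIndex/@Database)[1]', 'nvarchar(128)') AS database_name,
    qp.query_plan.value('(//MissingIndex/@Table)[1]', 'nvarchar(128)')    AS table_name,
    qp.query_plan.value('(//MissingIndexGroup/@Impact)[1]', 'float')      AS impact,
    cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//MissingIndex') = 1;
```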
A related trick is that we can append OPTION (RECOMPILE) to the end of our diagnostic queries to stop them being stored in the plan cache. This defends against bloating the cache and against pushing out more important plans.
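For example, a typical diagnostic query with the hint tacked on the end:

```sql
-- With OPTION (RECOMPILE), the plan is compiled, used once, and not cached,
-- so our diagnostics can't evict anyone else's plans.
SELECT wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC
OPTION (RECOMPILE);
```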
SQL Server has a cost threshold above which a query may be run in parallel. In theory parallelism can increase performance, but in practice the server’s default threshold is “not fit for purpose” for today’s workloads.
Before you change it, you can review the estimated costs of the cached plans to find the queries whose plans may change.
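A rough sketch of how that review might look (the new threshold of 50 is just an illustrative value; the shipped default is 5):

```sql
-- Find cached plans whose estimated cost is above a proposed new threshold;
-- these are the queries whose parallelism decision could change.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') AS est_cost,
    cp.usecounts,
    cp.objtype
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') > 50
ORDER BY est_cost DESC;

-- Then raise the threshold from its default of 5:
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```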
A one-hour version of parts one and two of the DLM sessions. Alex discussed some of the reasons that database lifecycle management is so hard, and demoed some Red Gate tools that remove some of the major pain points.
Some of the issues in DLM are inherent to the fact that databases contain state that you must keep. You can’t upgrade a database by throwing the old one away and replacing it with the new version.
SQL is a strange language (maybe the only one?) in that the person doing the deployment often knows more about the language than the person who wrote the code!
Some of the issues are caused by working culture and poor communication. It boils down to “if it’s not in version control, it doesn’t exist.”
The audience was already comfortable with version control, so we went straight ahead with a demo of versioning, packaging and publishing a database project.
Demo setup: Windows, SVN, TeamCity, SQL Source Control, SQL Test, SQLCI plugin for TeamCity
- Use SQL Source Control to script out all the database objects into a new folder in SVN
- Add a new VCS root in TeamCity for the new folder
- Add a new build step to the existing build configuration that connects to the VCS root, compiles the database code, and packages the database
- Add a second build step that runs the unit tests in the package
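SQL Test sits on top of the open-source tSQLt framework, so the unit tests in that second build step look roughly like this (the class, table, and data here are all hypothetical, and dbo.Customer would need to exist in the database under test before it can be faked):

```sql
-- A minimal tSQLt-style unit test of the kind the CI step runs.
EXEC tSQLt.NewTestClass 'CustomerTests';
GO
CREATE PROCEDURE CustomerTests.[test faked Customer table returns inserted row]
AS
BEGIN
    -- FakeTable swaps dbo.Customer for an empty, constraint-free copy
    EXEC tSQLt.FakeTable 'dbo.Customer';
    INSERT dbo.Customer (CustomerId, Name) VALUES (1, 'Alice');

    DECLARE @name nvarchar(100);
    SELECT @name = Name FROM dbo.Customer WHERE CustomerId = 1;

    EXEC tSQLt.AssertEqualsString 'Alice', @name;
END;
GO
EXEC tSQLt.Run 'CustomerTests';
```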
If you have been in IT for long enough you will recognise a pattern. The shift from mainframe to client-server was a lot like today’s shift from on-prem to cloud.
For many organizations, the reality today is a hybrid model. Some data and applications must stay on-prem for various reasons, but it makes sense for others to go into the cloud simply because it is more cost-effective.
Machine Learning is one of the services offered on Azure (Andrew Fryer’s Machine Learning session was on at the same time). To get the most from machine learning, you need to have a solid understanding of statistics. There are many good books now that approach statistics not from a dry mathematical perspective but from a practical, data-professional one. Chris recommended “Statistics for Dummies” as a good starting point.
Chris demoed how to provision SQL Server in the cloud. Once you have an Azure account, it’s as easy as logging in, filling in a few forms with your system requirements and credentials, and waiting a few minutes for the machine to be provisioned. It looks awesome, and I can’t wait to get the chance to play with this.
Someone asked about the DBA’s role in the future. The blunt answer is “Cloud is no excuse for bad code. With all the time we save on provisioning, now we can focus on adding value through performance tuning”.
I’m halfway through a domain migration project at work, so I came to this session to pick up any tips about upgrades that might also apply in this situation.
The session was an overview of the high-availability and disaster recovery features of SQL Server and how you can use them to implement an upgrade strategy. Backup/restore, log shipping, clustering, replication, and mirroring all have their place.
Unfortunately the demo machine was down (!), but we had some interesting discussion nonetheless.
The best part of this was how to define availability. Yeah, people like to talk about “five-nines availability”, but in what context? Not everyone is a 52 x 24 x 7 business (where five nines allows about five minutes of downtime per year).
If you are more like a 48 x 12 x 5 business, you can measure your five nines within that window, which sounds tighter (you get mere seconds of allowed downtime), except that you now have all that out-of-hours time to do any maintenance work!
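A quick back-of-an-envelope check of those numbers (the window sizes come straight from the session):

```sql
-- Seconds of unplanned downtime per year that 99.999% availability allows,
-- measured against each business's own service window.
SELECT
    52 * 7 * 24 * 3600 * (1 - 0.99999) AS always_on_seconds,      -- ~314 s, about 5 minutes
    48 * 5 * 12 * 3600 * (1 - 0.99999) AS business_hours_seconds; -- ~104 s
```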
And of course, planned, discussed downtime that happens in business hours does not count towards the SLA. The SLA measures unplanned outage time.
To complete the domain migration, we’ll need to do load testing to do a before-and-after comparison of performance.
By the end of the day my brain was getting pretty fried from all the new stuff (and sleep loss!) but I managed to hang on long enough to get some good stuff from this one too.
Neil demoed how the Distributed Replay features in SQL Server 2012 allow you to replay captured trace files so that you can recreate the conditions for a particular performance testing scenario.
There is quite a lot of setup and preprocessing needed to make distributed replay work with the trace files, and he demoed that as well. There are some command-line tools for the distributed replay client you can use to do this.
Post-Session Chat and Chillout
Prize draws, curry, beer, bed!
Also had an interesting chat with people from Micron, an SSD manufacturer. It was their first SQL Saturday. I hope they felt welcome!
We were wondering if there is ever a use case nowadays for spinning magnetic disks over SSDs. All we could think of was: if you just want to store the data and don’t really care about access times, magnetic storage will surely be cheaper per gigabyte.
Maybe we should give up on magnetic altogether and just store slow data on Blu-ray like Facebook started doing!