Deadlocks … Part 2

So, how do we prevent deadlocks?  Well, there are a number of different ways to go about doing it, so we’ll only concentrate on a couple.

Small.  This may seem obvious, but it is one of the most overlooked items in the book.  The smaller the transaction (the fewer rows that have been updated, inserted or deleted) the less likely you are to trample on someone else.  The fewer locks you hold the better off you’ll be.  One of the biggest problems in this regard is applications that update a row, even when it doesn’t need to be updated.  The application follows a specific path and part of that path says that Row X in Table Y needs to have certain values.  The row is not checked to see if it has those values, it is just automatically updated.  Well, that placed a lock on that row even if the values remained the same.

Fast.  This may also seem obvious, but the shorter the amount of time you hold those locks the less likely you are to have a problem with deadlocks.  One of the classic problems here is that the application does something to the database right away, say updating a table indicating where the user is at in a process, and then does a lot of calculations or a lot of work and then does more updating at the end, interspersed with a smattering of updates / inserts / deletes.  The locks are held from the first update to the final commit and as that length of time increases, the more likely it is that deadlocks will occur.

Application Developer Guide.  OK, this one is a little esoteric, so let me explain.  One of the ways that deadlocks can happen is if one part of the application updates tables in this order – Table A, Table B, Table C – while another part of the application updates tables in the reverse order.  By explaining the order in which tables should be updated in the application’s developer guide everyone on the project will know the order that they should be doing things.  Changing the order should be something that is discussed with the DBA prior to being implemented as changing the order may impact more than just deadlocks.

As usual, there are dozens of other things to take into account, but these are some of the items with the biggest payback.

Hot Fixes

What is a hot fix?  This question seems to be coming up  more often and I think it needs a bit of discussion in this arena.  Definitions of hot fix that I have seen include:

  • A hotfix is code (sometimes called a patch) that fixes a bug in a product. (Source)
  • Microsoft’s term for a bug fix, which is accomplished by replacing one or more existing files (typically DLLs) in the operating system or application with revised versions. (Source)

I think we can all agree that a hot fix is something that fixes a bug.  The question now arises as to the size of the patch.  The second definition is important in this aspect as it talks about replacing one or more DLLs.  So, a hot fix will fix a bug by replacing an indeterminate number of DLLs.  Darn it, I’ve used that word again: replacing.  That happens to be the crux of the problem that we are experiencing.

Replacing DLLs does not mean the uninstaling of the entire application and the installation of a new version of the application which has the bug fix inside.  This is simply an install of the application.  A hot fix would take the DLLs that were changed, package those up and install those on affected machines.  This is standard practice used by Microsoft, IBM, Sun, Oracle, Hewlett Packard, PeopleSoft, SAP, Symantec, Trend Micro, Adobe, Electronic Arts, Intuit, AutoDesk Check Point, and, quite literally, millions of other companies.  You don’t re-install Windows every time there is a hot fix for Windows.  You don’t re-install your anti-virus software every time there is an update to the software.  You don’t re-install your entire application because there is a spelling mistake on a page.

If you are asking for a migration to a Shared Environment, and you are essentially asking us to install a new version of the application, don’t call it a hot fix, as you are disagreeing with the vast majority of the IT world and the definition that the Deployment Team uses for a hot fix.  A hot fix replaces DLLs.  By packing everything up into a new install for the application you are potentially including other changes in your fix that are not related to the bug you are trying to install.

Deadlocks … Part 1

Deadlocks are an interesting condition within the database.  In very simplistic terms, it occurs when two people are both trying to change data that the other person has already changed.  For instance, Person A has already updated Row 1 in Table 1 and now wants to update Row 2 in Table 2.  However, Person B has already updated Row 2 in Table 2 and is trying to update Row 1 in Table 1.  As you can see, unless someone gives in they could sit there all day.  The database is the arbiter in this case and makes a decision as to which person is going to get the error message when their transaction is canceled.

Now, you may have heard the phrase “Deadlocks are a natural part of application processing and are not a large concern.”  While I do not advocate violence, the person who says this should be slapped in order to knock some sense into them.  Deadlocks are not a natural par tof an application.  I previously worked on a web-based system that had, as regularly peaked at over 650 simultaneous users, with over 250 of those being “hard core” users. This is 10 times the size of any application we have currently running.  If we got a single deadlock during the day we had to investigate why the deadlock occurred and determine if the deadlock could be prevented.  We would go for weeks, or even months without a deadlock, but when one occurred we dropped everything and investigated the problem.

Why do we get deadlocks in an application?  Sometimes it is because in one part of an application we update Table 1 and then Table 2, but in other parts we update Table 2 and then Table 1.  This is a disaster waiting to happen.  In other circumstances we have background tasks running that are not properly tuned and they try to update too much at once before committing their data.  This is also another disaster.

There are a number of surprisingly simple and, to most people, obvious strategies to use that will eliminate deadlocks to the “rare” occurrence that they should be.  We’ll discuss those tomorrow.

A Common Phone Call

Occasionally we receive calls at the Deployment team that follow this sort of pattern:

“My application isn’t working any more. It was fine yesterday, but today it’s broken.”

“Well, that seems a little strange as there were no changes to the environment last night.”

“Oh, I forgot to mention that it’s broken in UAT as well.”

“When did this happen?”

“Well, this particular functionality has always been broken in UAT. We just thought that it would work when we moved to Production.”

I know that some of you are laughing out there, but I also know that some of you are saying “Is he talking about me? Is he talking about ME?!?” If something doesn’t work in our UAT environment, there is no guarantee that it is going to work in Production. Indeed, my bet would be that it doesn’t work in Production either.

So why doesn’t it work in UAT when it worked in Development? Well, unfortunately, I only have about 500 words to write an answer and in this case the answer is more like a 500 page novel. One of the biggest reasons is something we’ve talked about before: running something as an admin in development but with minimal rights in UAT. Just because you can run your application as an administrator on the box does not mean that you should.

I can’t stress this often enough, and based on current evidence I definitely haven’t, you need to run your application under the least amount of rights possible. If something doesn’t work, fix it. Don’t assume that there is something wrong with the environment and that it is magically going to get better when it is migrated to the next environment, as it may even get worse.