Trust

One of the biggest things we need to deal with in applications is trust in other parts of the system.  How much trust do you have in the calling application?  How much trust do you have in the common routines that you call?

For instance, you’ve created a method that will take an XML stream and use it to update a row in a table.  How much faith do you have that the calling application has sent you the proper types?  Should you check everything to ensure that strings are strings and numbers are numbers?  Do you double check to ensure that dates are actually valid dates and that the time listed actually exists?

I used to work for a manager who insisted that every time your method was invoked it should double-check all of the data being passed in before doing any work: verification was the first thing you did.  Being young and full of myself, I didn’t follow that rule because, well, to be honest, I was writing both sides and I knew what I was passing myself!  Fast forward a couple of years and someone else is maintaining the code.  Well, they made some changes that didn’t follow the rules and, in production, it blew up horribly because the method did not verify that the correct type of data was being passed.  Being on the support side, I was called in to troubleshoot and instantly recognized the problem and the solution that was required.  A quick compile and test and the application no longer died horribly, but instead gave a nice, easy-to-understand error message.

With today’s modern languages much of this work is taken care of for you by the development tool at design time, as you need to call with the correct types or the compiler won’t even compile your application for you.  However, there is a problem when you are using XML or when you are taking in a string and attempting to use it as a numeric value.  This is of particular concern for user interfaces, as pretty much everything you retrieve from the UI is a string that you need to convert before you can use it.

A user interface should place no trust in the end user entering the correct type of data into the text box.  But how much trust do you place in one piece of code calling another piece of code?  That depends on whether or not you are going to be the person maintaining the code for the lifetime of the application.  If you are, then perhaps you can trust the code.  If you aren’t, then being paranoid may be beneficial.
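The “verify before you work” rule can be sketched in a few lines. This is a minimal illustration, not any particular system’s API; the function names, field formats and error messages are all made up for the example.

```python
# Defensive parsing of untrusted string input (e.g. from a UI or XML
# payload): verify types and values first, and fail with a clear
# message instead of dying later inside the business logic.
from datetime import datetime


def parse_quantity(raw: str) -> int:
    """Convert UI input to an int, failing with a readable message."""
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"Quantity must be a whole number, got {raw!r}")
    if value < 0:
        raise ValueError(f"Quantity cannot be negative, got {value}")
    return value


def parse_date(raw: str):
    """Verify a date string really is a valid calendar date."""
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"{raw!r} is not a valid date (expected YYYY-MM-DD)")


print(parse_quantity("42"))      # 42
print(parse_date("2007-07-12"))  # 2007-07-12
```

The point is not the specific checks, but that the method itself verifies its inputs rather than trusting the caller, so a bad call produces a nice, easy-to-understand error instead of a production blow-up.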


n-Tier environment … design

One of the biggest differences, at least in terms of perspective, between a Win32 application (aka Fat Client) and a web browser application is that the Win32 application is used by a single person whereas the web application may be used by hundreds, or thousands of people at one time.  OK, quiet down out there, let me explain.

A Win32 application is run on a desktop and is run by a single person at a time.  The application may be installed on hundreds or thousands of machines, but each machine has a single person running the application.  In this manner the workload is distributed between each workstation and the database server.  In a web based application all of that processing needs to occur somewhere.  While some of it happens on the browser, a good portion of it runs on the web server.  Therein lies the problem.

Many web applications are built in such a manner that they consume large amounts of CPU and memory in order to serve a single client.  Multiply that by the number of people that are going to be accessing the application and you have a nightmare on your hands.  Many years ago servers used to be much more powerful than desktops.  This was due to the advanced architectures used in the servers and the very expensive chips they contained.  As prices have dropped, however, this distinction has almost completely disappeared.  Indeed, a new desktop may, in some circumstances, be faster than the server with which it is communicating.  Because the average developer has a fairly powerful machine, what seems to run quickly on their desktop completely bogs down on the server when multiple people try to access it.

Our current Production servers are quite large and contain a lot of raw horsepower.  We do not have a large number of concurrent users.  I would personally be surprised if we hit 40 on any one server.  This is in comparison to a server at Google that is designed to support hundreds of concurrent users.  On-line games, such as World of Warcraft, support thousands of concurrent users.  While we don’t need to write our applications so that we can support thousands of concurrent users, we should always be cognizant of the fact that our application does not operate alone.  It needs to co-exist with itself and others.

Batch Processing … Part Two

I must be honest, I am sometimes quite surprised by the reactions my notes provoke.  For instance, the note about batch processing generated a number of “high fives” in the hallway and a couple of 500-word responses.  The response was, as I expected, all over the place, with some people telling me I was not in my right mind while other people were saying that this is what they have always believed.  Never one to let the flames of disagreement burn out, I thought I would list my personal rules as to whether or not something should be done in batch.  This is not something you need to follow, but it may help illuminate my comments from the other day.

Interfaces with other systems.  I originally had the word “external” but I replaced it with “other”, as different systems internal to an overall application may not be able to handle a continuous stream of data.  For instance, our interface to IMAGIS is done through a file transfer which is done once a day.  To ensure that we get as much as possible into the system we do the transfer close to the cut-off time that we have arranged with the IMAGIS team.  In this case the target system is just not designed to handle us sending them information multiple times per day.  Now, it may be the case that there are other ways to communicate with IMAGIS that we have not utilized, but with the current setup, we need to do it on a scheduled, batch basis.

Reports. This seems like a logical item to do in a batch process.  Whether this batch process is done via another tool, such as ReportNet scheduling the report, or whether it is scheduled via Windows Scheduler, reports are good batch residents. However, in my mind a report does not process and create data, it merely reports on the data.  If your “report” creates and stores data then that portion should be separated out and done in an asynchronous manner.  A report, any report, should be able to be generated quickly from data that is already stored in the database.  In addition, in many cases, reports do not need to be run on a scheduled basis, as long as the report can be generated on demand and will contain the identical content as if it had been generated on a previous day.  For instance, if reporting on the applications that were approved on July 12th, it would list all approvals, even if one of the applications was subsequently denied.

And that’s pretty much it.  It’s a very short list of things that need to be done in a batch window.  For me it all comes down to this:

It is our job as IT Professionals, however, to not just do what our clients say, but to educate them as to what can be done, to show them new opportunities, and to give them something better than what they had, not just something newer.

The n-Tier world … hardware

Does an n-tier hardware environment actually make things better?  I was asked this question recently and, to be honest, it made me pause and reflect on the promises of the n-tier world versus the reality of an n-tier world.

The n-tier push really started gaining hold, again, in the 90’s when the Internet was young and foolish and so was Al Gore.  As a result, most people associate the idea of an n-tier environment as one that is web browser based.  While this is not always the case, the majority of applications currently being developed are web based, so we will run with that and assume, for the purposes of this discussion, that our n-tier environment consists of a web browser communicating with a web server that in turn communicates with a database server.

With regard to the web browser, the idea was that with a “light” front end that was downloaded every time you requested it, you did not need as much processing power on the desktop (Internet Computer anyone?) and you could make changes to the application without having to distribute an application to hundreds, thousands or even millions of customers.  This has proven to be a valuable and worthwhile objective of browser based deployments and allows for quicker changes with less client impact.

Separating the main logic onto a web server, or cluster of web servers (see previous notes about this), allows the developer to change the application in only a limited number of locations.  While this has allowed developers to deploy applications quickly, the problem lies in the fact that the developer(s) build the application as if it were the only thing running on the server, when in reality it is usually one of many applications.  Resource contention (memory, CPU) usually means that going to a cluster of servers is a requirement.  It is also a common misconception that adding more servers will make the application run faster.  Adding more servers allows you to support more simultaneous users, but does not necessarily make the application run faster.  As a result, a poorly performing application will perform just as poorly on one machine as on a cluster of 20, although you can annoy more people with the cluster.

By placing all of the database access on a machine, or cluster of machines, there are fewer connections to the database that need to be monitored and managed on the database server.  This reduces memory usage and CPU usage and allows the database to concentrate on serving data to clients.  Unfortunately, this is where one of the biggest problems in the n-tier world lies.  Developers need to optimize their code when accessing the database.  Whether it is reducing locks and deadlocks, reducing transaction length or simply designing better applications and databases, the database back end, regardless of whether it is an Intel box, a Unix server or an OS/390 behemoth, can only handle so much work.  Web servers can be clustered, but in order to get more than one database server to service requests against the same data you need a much more sophisticated and much more complicated environment.  Just adding another server won’t cut it, as the database server is constrained by the memory and CPU it can use.

So, has n-tier lived up to its promise?  Sort of.  The web browser side:  yes.  The web/application server side:  mostly.  The database side:  not as much as expected.  The problem is not the technology, but rather the people.  We have created an infrastructure that can do great things.  What we need to do now is teach people how to create great things with that infrastructure.

Clusters

Clustered servers.  One of the most important reasons for having clustered servers is so that, in the event that one server fails, the other servers will pick up the load and take over for the failed server.  However, there are some things you need to know before everyone goes rushing off to say “Clusters will solve all of our problems.”

Applications must be built to function in a cluster.  So, what does this mean?  It means that if you are going to store a file being exchanged with the client, the file needs to be stored in a location that all servers can access.  It means that if session state is stored it needs to use SQL Server as the storage mechanism, as this is shared across all servers.  Essentially, nothing should be stored on the local machine.  Nothing.

Connections are “sticky”, but don’t depend on it.  By default the load balancer will try to send the user back to the same server that they were on previously.  However, in the event that the server is out of the cluster or there is a particularly heavy peak load on the server, the load balancer may send the user to another server.  This is expected and desired from a load balancing perspective, so your application had better not rely on the user always going back to the same server.

Clusters do not protect against soft failures.  If a server dies, blows up, is incinerated by aliens from a distant galaxy, is cryogenically frozen in a block of nitrogen, or has its network connection severed, the load balancing software will automatically move users to one of the other servers in the cluster.  However, in the event that the error is a soft error, an error in which the application responds, but does so incorrectly, the load balancer does not know that there is a problem and will continue to send users to that server.  Indeed, if the error is pervasive and actually causes all pages to fail, and fail quickly, the load balancer may be confused to the point where it thinks that the server is not under a large load and can handle more connections.
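One common mitigation for soft failures is a “deep” health check: a probe that verifies the application can actually do its work, rather than merely answering requests, so the load balancer can pull a soft-failed server out of rotation. The sketch below is a hypothetical example using SQLite as a stand-in for the real database; the function name and status codes are illustrative, not from any particular load balancer product.

```python
# Hypothetical deep health check: prove a database round trip works,
# and return an error status when it does not, so a load balancer
# probing this endpoint stops routing users to a soft-failed server.
import sqlite3


def health_check(db_path=":memory:"):
    """Return (http_status, message) for a load balancer probe."""
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("SELECT 1")  # verify we can actually reach the database
        conn.close()
        return 200, "OK"
    except sqlite3.Error as exc:
        # A failing probe tells the load balancer this server is
        # responding but not working correctly.
        return 503, f"DEGRADED: {exc}"


print(health_check())  # (200, 'OK') when the database is reachable
```

A shallow “is the port open?” check would still pass during exactly the soft-failure scenario described above, which is why the probe exercises a real dependency.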

If you keep these things in mind you’ll understand that while clusters do help us out a lot, there is some work involved in ensuring that they can help and that the help is what we are expecting.  Overall, however, the benefits of a clustered environment far outweigh the disadvantages.

Flow Control

Computer languages contain a number of things generically called “flow control”.  These constructs (keywords) are what allow a programmer to change the course of the application.  They include things such as:

  • if … then
  • do … while
  • while … wend
  • plenty of other examples

The one thing they have in common is that they have the potential to do something based on a condition.  But you know what?  Every time you execute one of these flow control statements you have the potential of slowing your application down.  You also have something else you need to check.  In addition, if you’ve chained a number of these statements in a row you must ensure that you are checking them in the proper order.

Essentially, what it comes down to is trying to avoid these statements in the first place.  If you want to get carried away, there are some object oriented purists who believe that you should be able to write your entire application without the use of an if … then statement.  (My personal belief is that while this is an interesting thought, if … then can save a lot of extra coding.)
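The purists’ idea of writing code without if … then can be sketched with polymorphism: instead of a chain of conditionals asking “what kind of thing is this?”, each object carries its own behaviour. The class names below are purely illustrative.

```python
# Replacing an if/elif chain with polymorphism: the type system does
# the dispatching, so no conditional statement is needed at the call site.
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return 3.14159 * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# Instead of: if kind == "circle": ... elif kind == "square": ...
# each object already knows how to compute its own area.
shapes = [Circle(1.0), Square(2.0)]
print(sum(shape.area() for shape in shapes))
```

Whether eliminating every if … then is worth it is debatable, as the text says, but the technique does remove the ordering and checking burden that chained conditionals impose.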

This is also true of any business processes that may be implemented by the project.  Avoiding the use of flow controls makes the process easier to use, easier to follow and easier to understand.  So, whether it is an application or a business process, minimize the use of conditional statements and you’ll find that things go faster, smoother and with fewer problems.

Hot Fixes

What is a hot fix?  This question seems to be coming up more often and I think it needs a bit of discussion in this arena.  Definitions of hot fix that I have seen include:

  • A hotfix is code (sometimes called a patch) that fixes a bug in a product. (Source)
  • Microsoft’s term for a bug fix, which is accomplished by replacing one or more existing files (typically DLLs) in the operating system or application with revised versions. (Source)

I think we can all agree that a hot fix is something that fixes a bug.  The question now arises as to the size of the patch.  The second definition is important in this aspect as it talks about replacing one or more DLLs.  So, a hot fix will fix a bug by replacing an indeterminate number of DLLs.  Darn it, I’ve used that word again: replacing.  That happens to be the crux of the problem that we are experiencing.

Replacing DLLs does not mean uninstalling the entire application and installing a new version of the application with the bug fix inside.  That is simply an install of the application.  A hot fix takes the DLLs that were changed, packages them up and installs them on the affected machines.  This is standard practice used by Microsoft, IBM, Sun, Oracle, Hewlett Packard, PeopleSoft, SAP, Symantec, Trend Micro, Adobe, Electronic Arts, Intuit, AutoDesk, Check Point, and, quite literally, millions of other companies.  You don’t re-install Windows every time there is a hot fix for Windows.  You don’t re-install your anti-virus software every time there is an update to the software.  You don’t re-install your entire application because there is a spelling mistake on a page.

If you are asking for a migration to a Shared Environment and you are essentially asking us to install a new version of the application, don’t call it a hot fix, as you are disagreeing with the vast majority of the IT world and with the definition that the Deployment Team uses for a hot fix.  A hot fix replaces DLLs.  By packaging everything up into a new install of the application you are potentially including other changes in your fix that are not related to the bug you are trying to fix.

Deadlocks … Part 2

So, how do we prevent deadlocks?  Well, there are a number of different ways to go about doing it, so we’ll only concentrate on a couple.

Small.  This may seem obvious, but it is one of the most overlooked items in the book.  The smaller the transaction (the fewer rows that have been updated, inserted or deleted) the less likely you are to trample on someone else.  The fewer locks you hold the better off you’ll be.  One of the biggest problems in this regard is applications that update a row even when it doesn’t need to be updated.  The application follows a specific path, and part of that path says that Row X in Table Y needs to have certain values.  The row is not checked to see if it already has those values; it is just automatically updated.  That places a lock on the row even though the values remained the same.
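The “don’t update a row that already has the right values” point can be shown concretely. This sketch uses SQLite and an invented table purely for illustration; the idea is that a WHERE clause turns the unnecessary update into a no-op, so no row is rewritten and no write lock is taken for unchanged data.

```python
# Check-before-update: only touch the row when the value would
# actually change. Table and column names are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'APPROVED')")


def set_status(conn, order_id, new_status):
    # The extra condition makes the UPDATE a no-op when the row
    # already holds the desired value, so nothing is (re)written.
    cur = conn.execute(
        "UPDATE orders SET status = ? WHERE id = ? AND status <> ?",
        (new_status, order_id, new_status))
    return cur.rowcount  # 0 means no row was touched


print(set_status(conn, 1, "APPROVED"))  # 0: value unchanged, no write
print(set_status(conn, 1, "DENIED"))    # 1: row actually updated
```

In a server database the same pattern keeps the transaction smaller: fewer rows written means fewer locks held and less chance of trampling on someone else.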

Fast.  This may also seem obvious, but the shorter the amount of time you hold those locks the less likely you are to have a problem with deadlocks.  One of the classic problems here is that the application does something to the database right away, say updating a table indicating where the user is in a process, then does a lot of calculations or a lot of work, and then does more updating at the end, interspersed with a smattering of updates / inserts / deletes.  The locks are held from the first update to the final commit, and as that length of time increases, the more likely it is that deadlocks will occur.

Application Developer Guide.  OK, this one is a little esoteric, so let me explain.  One of the ways that deadlocks can happen is if one part of the application updates tables in this order – Table A, Table B, Table C – while another part of the application updates tables in the reverse order.  By documenting the order in which tables should be updated in the application’s developer guide, everyone on the project will know the order in which they should be doing things.  Changing the order should be something that is discussed with the DBA prior to being implemented, as changing the order may impact more than just deadlocks.
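The ordering rule can be demonstrated with ordinary locks standing in for table locks. This is a sketch, not database code: the “tables” are threading locks, and the point is that if every code path acquires them in the same documented order, the A-waits-for-B / B-waits-for-A cycle can never form.

```python
# Consistent lock ordering: every worker acquires "Table A" before
# "Table B", so no circular wait (deadlock) is possible.
import threading

table_a = threading.Lock()
table_b = threading.Lock()

# The developer guide's rule: always Table A, then Table B.
LOCK_ORDER = [table_a, table_b]
completed = []


def update_both(worker):
    for lock in LOCK_ORDER:          # same order on every code path
        lock.acquire()
    try:
        completed.append(worker)     # stand-in for the two table updates
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()


threads = [threading.Thread(target=update_both, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # 4: every worker finished; no deadlock
```

Reverse the order in just one code path and two workers can each grab one lock and wait forever for the other, which is exactly the Table A / Table B scenario described above.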

As usual, there are dozens of other things to take into account, but these are some of the items with the biggest payback.

Deadlocks … Part 1

A deadlock is an interesting condition within the database.  In very simplistic terms, it occurs when two people are each waiting to change data that the other has already changed.  For instance, Person A has already updated Row 1 in Table 1 and now wants to update Row 2 in Table 2.  However, Person B has already updated Row 2 in Table 2 and is trying to update Row 1 in Table 1.  As you can see, unless someone gives in they could sit there all day.  The database is the arbiter in this case and decides which person is going to get the error message when their transaction is canceled.

Now, you may have heard the phrase “Deadlocks are a natural part of application processing and are not a large concern.”  While I do not advocate violence, the person who says this should be slapped in order to knock some sense into them.  Deadlocks are not a natural part of an application.  I previously worked on a web-based system that regularly peaked at over 650 simultaneous users, with over 250 of those being “hard core” users.  This is 10 times the size of any application we currently have running.  If we got a single deadlock during the day we had to investigate why the deadlock occurred and determine if it could have been prevented.  We would go for weeks, or even months, without a deadlock, but when one occurred we dropped everything and investigated the problem.

Why do we get deadlocks in an application?  Sometimes it is because in one part of an application we update Table 1 and then Table 2, but in other parts we update Table 2 and then Table 1.  This is a disaster waiting to happen.  In other circumstances we have background tasks running that are not properly tuned and they try to update too much at once before committing their data.  This is also another disaster.

There are a number of surprisingly simple and, to most people, obvious strategies to use that will eliminate deadlocks to the “rare” occurrence that they should be.  We’ll discuss those tomorrow.

Interim

Interim is an interesting word.  When people look at the word and use it, they might actually have different opinions as to what it means.  For instance, they may say that something is “an interim solution”.  Based upon how most people in the IT world use the word, they would assume that the solution is a short-term solution and that something is coming to replace it.  They would be wrong, in more ways than one.

Strictly speaking, the definition of interim is something like this:

… the period of time between one event and another …

Not very specific is it?  According to the definition, an interim solution would be a solution that is in place prior to the final solution being installed.  IT people, however, haven’t really been using the word in quite this manner, or rather, they have been hoping the word is not used in this manner.  They honestly believe that the phrase “short term” is in the definition.  Sorry, but it isn’t there.

What this means is that if you specify an “interim solution”, you had better be comfortable with that solution because there is no time limit on how long that solution will be in place.  If you are not comfortable with the solution as a long-term solution, do not promote it, because many interim solutions have become long-term solutions, regardless of how much we dream and pray.