Traditional load estimation
Traditionally, servers are provisioned to support a maximum number of concurrent users, and load is often defined in those terms.
There is a problem with using this approach to run a queue for a website - a problem we solve with rates.
The problem arises because visitors are only actually using the webserver while their browser is loading pages.
Here's an example
Let’s say a site is selling tickets. The visitor journey runs through six pages from start to finish, as the visitor
- arrives at the site,
- goes to the events page,
- sees a page for a specific event,
- chooses seats and tickets,
- enters payment details and
- sees an order confirmation page.
The entire transaction process takes five minutes, on average.
Now let's say 100 visitors arrive to buy tickets every minute. With an average transaction time of five minutes, there are 500 people in the transaction process at any given moment, and the server must be provisioned to support 500 concurrent users.
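The arithmetic here is an instance of Little's Law: the average number of people in a system equals the arrival rate multiplied by the average time each person spends in it. A minimal Python sketch, with a function name and parameter names chosen purely for illustration:

```python
def concurrent_users(arrivals_per_minute, transaction_minutes):
    """Average number of visitors in the transaction flow at any moment."""
    return arrivals_per_minute * transaction_minutes


# The example site: 100 arrivals per minute, five minutes per transaction.
print(concurrent_users(100, 5))  # 500
```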
Counting concurrent users
One might think of creating a Concurrent User Counter that is incremented every time a new visitor arrives at the home page and decremented every time a visitor completes the transaction. In theory this would tell you how many concurrent users your web server has at any one time, but it won’t work.
It won’t work because not every visitor completes the transaction. People change their minds, or get distracted, or go away and come back later.
Because the webserver is only interacting with the visitors when their browsers are loading pages, the webserver has no way of knowing when this has happened, or to whom.
Instead, to determine that a visitor is no longer part of the transaction process, the webserver has to wait until no page requests have arrived from that visitor for some time, and then time out that visitor’s session. Timeouts must always be set much longer than the average transaction time (say 20 minutes in this example), as some visitors take much longer than others to complete their transaction.
One could add the facility to also decrement one’s notional Concurrent User Counter every time a visitor session times out in this way, but this gives very poor results.
Suppose 10% of the 100 visitors that arrive every minute do not go on to complete their transaction. The 90 that do complete will be active for five minutes on average, and the 10 that don’t will always be counted as active for the full 20 minutes.
That’s 90 * 5 + 10 * 20 = 650, and the server will report 650 concurrent users, even though it is actually less busy!
What about the timing out users?
Furthermore, as many as 10 * 20 = 200 of those concurrent users are not actually using the site and are in the process of timing out, which is over 30% of the reported concurrent users, even though it’s only 10% of the visitors that fail to complete the transaction.
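To see how the counter inflates, here is a small sketch of the same sums in Python. The function and its parameter names are illustrative assumptions, not part of any real counter implementation:

```python
def counter_reading(completing_per_minute, abandoning_per_minute,
                    transaction_minutes, timeout_minutes):
    """Return (reported concurrent users, users who are only timing out)."""
    # Visitors who complete are counted for the average transaction time.
    active = completing_per_minute * transaction_minutes
    # Visitors who abandon are counted until their session times out.
    timing_out = abandoning_per_minute * timeout_minutes
    return active + timing_out, timing_out


reported, stale = counter_reading(90, 10, 5, 20)
print(reported, stale, f"{stale / reported:.0%}")  # 650 200 31%
```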
Now let’s say one wishes to add a queue to this website, controlled by our Concurrent User Counter. Once the site is at capacity, the person at the front of the queue is only passed to the site when the counter decrements. This is called a one-out-one-in queue. The result is that about 30% of the time, the person at the front of the queue will be waiting for someone who isn’t going to complete their transaction to finish not completing their transaction.
That is very clearly broken.
There is also the additional technical integration work and load on the website of creating and running the counter, and sharing that information with the queue system. If the sharing or counting mechanism goes down or breaks, the queue gets stuck too. Even worse, if the web server gets so busy it can't update the queue server to tell the queue server to let the next person in, the queue gets stuck too.
The solution: Use Rates
All these problems are easily and simply solved by instead sending the visitors from the front of the queue at a constant, fixed rate – in this case 100 visitors per minute.
That way there’s no need to measure the actual number of concurrent users, and no need for complex integration with the ticketing website, and no risk of the queue getting stuck.
That’s why we invented and patented the rate-based Virtual Waiting Room for busy websites in 2004.
If you know your web server’s maximum number of concurrent users and the average transaction time or visit duration, from the first page in your transaction flow to the order confirmation page, you can convert this into a Queue Rate by dividing the number of users by the duration, like this:
Queue rate = Concurrent users / Transaction time
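As a quick sketch, the formula can be written as a Python function (the name and arguments are our own choice for illustration):

```python
def queue_rate(max_concurrent_users, transaction_minutes):
    """Visitors per minute to release from the front of the queue."""
    if transaction_minutes <= 0:
        raise ValueError("transaction time must be positive")
    return max_concurrent_users / transaction_minutes


# 500 concurrent users and a five minute transaction time.
print(queue_rate(500, 5))  # 100.0
```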
An Easy Way To Calculate the Queue Rate
What if you don't know your Concurrent Users or Transaction Time? You can look at the page that is likely to be your bottleneck - usually the one that is the result of clicking a "Buy Now" button. Use Google Analytics to find the monthly unique visitors to that page, or count your monthly orders. Divide this by 30 * 24 * 60 = 43,200 which is the number of minutes in a month (approximately). That's your average visitors per minute over the whole month. Multiply this by three. That's your average visitors per minute during business hours (approximately). Double this. That's probably a safe figure for the Queue Rate to use.
For example, let's say you process 100,000 orders per month - that's 100,000 clicks of the "Buy Now" button. That's 100,000 / 43,200 = 2.31 orders per minute. You would expect most of these orders to arrive during the day, with your servers quieter at night, so multiply this by 3 to get about 7 orders per minute as a rough estimate of how busy your server is during business hours. If the resulting figure is less than 50, there will be peaks and troughs in demand from minute to minute, so provided your server is not noticeably slow in peak hours, multiply by 2 to get 14 users per minute. If the figure is more than 50, minute-to-minute peaks and troughs will be smaller in comparison, and it's not safe to double it. The number you end up with is probably a safe figure for the Queue Rate to start with - and you can always increase it if you find your systems are still responsive at that rate.
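The rule of thumb above could be sketched in Python like this. The function name and the slow_at_peak flag are assumptions made for this example, and treating a figure of exactly 50 as "don't double" is our cautious reading, since the text only covers less than and more than 50:

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, approximately


def estimate_queue_rate(monthly_orders, slow_at_peak=False):
    """Rough starting Queue Rate (visitors per minute) from monthly orders."""
    average_per_minute = monthly_orders / MINUTES_PER_MONTH
    # Most orders arrive in business hours, so scale the average up by 3.
    business_hours_per_minute = average_per_minute * 3
    # Only double small figures, and only if the server copes at peak times.
    if business_hours_per_minute < 50 and not slow_at_peak:
        return business_hours_per_minute * 2
    return business_hours_per_minute


print(round(estimate_queue_rate(100_000)))  # 14
```

This is only a starting figure, to be checked against how your site actually behaves.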
If your orders are timestamped, you can also look at the maximum number of orders you took in a single minute in the last month. Use this figure with caution, though: you won't know how many orders you may have dropped during that minute because your servers were slowing down, so reduce it by 20%.
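If you can export your order timestamps, a sketch along these lines finds the busiest minute and applies the 20% reduction. It assumes you already have the timestamps as Python datetime objects. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def peak_minute_queue_rate(order_times, reduction=0.20):
    """Busiest single minute of orders, reduced by a safety margin."""
    # Truncate each timestamp to the minute and count orders per minute.
    per_minute = Counter(t.replace(second=0, microsecond=0) for t in order_times)
    if not per_minute:
        return 0.0
    return max(per_minute.values()) * (1 - reduction)


# Made-up sample data: five orders at 10:00 and two at 10:01.
orders = [datetime(2024, 5, 1, 10, 0, s) for s in (1, 9, 20, 41, 59)]
orders += [datetime(2024, 5, 1, 10, 1, s) for s in (3, 30)]
print(peak_minute_queue_rate(orders))
```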
The rest of this article discusses some other ways to work out the Queue Rate.
Gotcha #1: Concurrent Users vs Concurrent Requests
It's worth pointing out that there are at least two definitions of "Concurrent Users" in common usage.
We use the definition that Concurrent Users is the number of people engaged in a transaction flow at any one time. That's the key number you need to know to set the Queue Rate. That's how many people are viewing your site right now.
Contrast this with Concurrent Requests, which is the number of HTTP requests being processed by your web server at any one time. Very confusingly, a lot of tech people will mean Concurrent Requests when they say Concurrent Users.
Then there's Concurrent Connections, which is the number of TCP/IP Sockets open on your web server at any one time. When making page requests, browsers will by default leave the connection open in case any further requests are made by the page, or the user goes to a different page. Timeouts for these connections vary by browser, from 60 seconds to never-close. Your web server may automatically close connections after a period of no activity too. Again, some people call this "Concurrent Users" too.
Indeed if you ask your hosting provider to tell you the maximum number of Concurrent Users that your web server will support, they will probably actually give you a figure for Concurrent Requests or Concurrent Connections, for the simple reason that they don't know your average transaction time, number of pages in your transaction flow, or any of the other information that would allow them to tell you your maximum Concurrent Users.
So, if you are asking your hosting provider or tech team for Concurrent Users information, it's super-important that you clarify whether they really mean Concurrent Requests or Concurrent Connections.
Getting this wrong can crash your web site!
Here's why. Each page is a single HTTP request, but all the images, scripts and other files that come from your web server that the browser uses to display the page are also HTTP requests.
Let's imagine you've been told by your tech team that the server supports 500 Concurrent Users, but they actually mean 500 Concurrent Requests. With your five-minute transaction time, you use the above formula and assume that your site can support 100 visitors per minute.
Can it? No.
As people go through the transaction flow, they are only actually making requests from your servers while each page loads. Out of the five minute transaction time, that's only a few seconds. You might therefore think that 500 Concurrent Requests means you can handle a lot more Concurrent Users, but you may well be wrong.
Converting Concurrent Requests to Concurrent Users
To work out your maximum Concurrent Users from your maximum Concurrent Requests, you also need to know:
1) The number of pages in your transaction flow
2) The average visitor transaction time from first page to last page in your flow
3) The average number of HTTP requests that make up each page
4) The average time your server takes to process a single HTTP request
You probably know 1) and 2) already - in our example it's 6 pages and 5 minutes. You can easily count the pages you see while making a transaction. If you don't know the average transaction time, Google Analytics may tell you, or you can check your web server logs.
For 3) and 4), the Firefox browser can help. Right-click on a page on your site, choose Inspect Element, then open the Network tab. Then hit CTRL-SHIFT-R to completely refresh the page. You'll see network load times for every element of the page in the list. Make sure that you can see transfer sizes in the Transferred column, as otherwise files might be served from a cache, which can mess up your calculations. You might see some scripts and other resources come from servers other than your site, so you can type the domain name for your site in the filter box at the top left. To see the Duration column, right-click any column header and select Timings -> Duration from the pop-up menu. Your screen should look like this:
The Firefox Network tab for this page, showing Duration and number of Requests from queue-fair.com
Files used in the display of your pages can come from a number of different sites, so you want to also use the filter in the top left to just show those from your site - but only if you are sure that those files from other sites are not the reason for slow page loads, or part of your bottleneck.
Firefox counts the requests for you in the bottom left of the display, and shows 36 HTTP requests for just this one page.
You need to do this for every page in your transaction flow - count the total and divide by the number of pages to find the average number of HTTP requests for each page, number 3) in our list.
For number 4), you need to look at the Duration column and find the average for all the HTTP requests for all your pages. If you're not sure, assume half a second - there's a lot of uncertainty in this anyway (see below).
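Once you have noted down the durations for each page, the two averages can be worked out with a few lines of Python. This sketch assumes you have copied the durations into a dictionary by hand, in seconds. The page names and figures below are invented for illustration.

```python
def page_averages(durations_by_page):
    """Return (average requests per page, average seconds per request)."""
    all_durations = [d for page in durations_by_page.values() for d in page]
    requests_per_page = len(all_durations) / len(durations_by_page)
    seconds_per_request = sum(all_durations) / len(all_durations)
    return requests_per_page, seconds_per_request


# Made-up durations for two pages of a flow.
sample = {
    "home": [0.4, 0.6],
    "events": [0.5, 0.5, 0.5, 0.5],
}
requests_per_page, seconds_per_request = page_averages(sample)
print(requests_per_page, seconds_per_request)
```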
Doing the math
Let's give some example numbers. We've already said there are six pages in the example flow, which is 1), and that the average transaction time is five minutes, which is 2). Let's assume 36 HTTP requests per page for 3), and half a second for the server processing time for each HTTP request, which is 4).
With those numbers, a server that can handle 500 Concurrent Requests can handle 500 / (0.5 seconds) = 1000 HTTP requests per second, which is 60,000 HTTP requests per minute, when it's completely maxed out.
Over the five minute transaction time, it can handle 5 * 60,000 = 300,000 HTTP requests. Seems like a lot, right?
But, for each visitor, there are six pages with an average of 36 HTTP requests each, so that's 6 * 36 = 216 requests.
So, the 300,000 HTTP request capacity can in theory handle 300,000 / 216 = 1,389 Concurrent Users.
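The whole conversion can be sketched as one Python function. The name and parameters are illustrative, and the result is the theoretical figure only:

```python
def max_concurrent_users(max_concurrent_requests, seconds_per_request,
                         transaction_minutes, pages, requests_per_page):
    """Theoretical Concurrent Users for a given Concurrent Requests limit."""
    # Requests the server can process per minute when completely maxed out.
    requests_per_minute = max_concurrent_requests / seconds_per_request * 60
    # Requests it can process over one average transaction time.
    capacity = requests_per_minute * transaction_minutes
    # Requests each visitor makes across the whole flow.
    requests_per_visitor = pages * requests_per_page
    return capacity / requests_per_visitor


users = max_concurrent_users(500, 0.5, 5, 6, 36)
print(round(users))  # 1389
```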
Gotcha #2: Web Servers Get Slower With Load
Hey that's great! We thought we could only have a queue rate of 100, but 1,389 / 5 minutes = 278 visitors per minute, so we can have a higher queue rate!
Well, probably not. For one, your visitors won't neatly send requests at intervals of exactly half a second, as the above calculation assumes. More importantly, you'll have been measuring your input data when the site isn't busy. Garbage in, garbage out.
When the site is busy, the server takes longer to process requests. You'll have noticed this on other sites at busy times, when you wait longer for pages. This increases the average time your server takes to process a single HTTP request (number 4 in our list), which decreases the maximum throughput. So take the 278 visitors per minute and halve it. Then halve it again. You're probably realistically looking at about 70 new visitors per minute at maximum load.
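As a sketch, the "halve it, then halve it again" adjustment looks like this in Python. The number of halvings is a rule of thumb from this article rather than a measured figure, and the function name is our own:

```python
def derated_rate(theoretical_per_minute, halvings=2):
    """Cut a theoretical visitors-per-minute figure to allow for slowdown under load."""
    return theoretical_per_minute / (2 ** halvings)


print(derated_rate(278))  # 69.5
```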
Other confounding factors include caching, which means your visitors' browsers may not need to make every single request for every single page - this tends to increase the number of new visitors per minute your server can handle.
You'll also find that not all the pages take the same time to complete. Database searches and updates take the longest, so you will have a bottleneck somewhere in your process where people pile up, waiting for credit card details to be processed and orders stored, or waiting for availability to be checked. Every transaction flow has a slowest step so there is always a bottleneck somewhere. In that case, you want to set your Queue Rate low enough to ensure that your server has capacity to process enough people concurrently for the slowest step in your process so that people don't pile up there. Otherwise your webserver can literally grind to a halt.
So what do I do?
Our experience is that, going into their first sale, everybody overestimates the ability of their servers to cope with high volumes of traffic.
The best thing to do is run a proper load test, with 'fake' customers actually going through the order process exactly as they would in real life: making the same HTTP requests in the same order, with the same waits between pages. Keep an eye on your processor load, IO throughput and response times as you ramp up the number of virtual visitors. You can use Apache JMeter for this, but whatever tool you use, it's time-consuming and tricky to get exactly right (especially with the complexities of caching). Even then, take your numbers and halve them.
In the absence of that, err on the side of caution.
You can easily change the Queue Rate for any Queue-Fair queue at any time using the Queue-Fair portal. Start at 10 visitors per minute, or your transaction rate on a more normal day, see how that goes for a little while after your tickets go on sale, and if all looks good, your processor load is low, your database is fine and (above all) your pages are responsive when you hit CTRL-SHIFT-R, double it, wait a bit, and repeat. You'll soon find the actual rate you need, and remember, from a customer experience point of view, it's fine to raise the Queue Rate as this causes the estimated waits that your customers in the queue are seeing to reduce, and everyone is happy to see a shorter estimated wait.
What you want to avoid is setting the Queue Rate too high and then having to lower it, as this a) means people using the site experience slow page loads, and b) causes the estimated waits to increase. All the people in your queue will sigh!
Gotcha #3: Increasing the rate too quickly after a queue opens
Remember, you will have a bottleneck somewhere in your order process - every transaction has a slowest step. What you don't want to do is get a minute into your ticket sale, see that your server processor load is fine, and raise the rate. Your visitors probably haven't got as far as the "Buy Now" button yet. Wait until your database is reporting new orders at the same or similar rate as your Queue Rate, and make your measurements and responsiveness tests then. Remember that every time you increase the rate, the extra visitors will take just as long to reach your bottleneck as the first visitors did, so you won't be able to accurately assess how your server performs at the new rate until that time has elapsed.
In this article we've explained why a rate-based queue is always the way forward, and given two methods to calculate the rate you need, but unless you've done full and accurate virtual visitor load testing on your entire transaction flow, and are really super extra mega certain about that, our advice is always the same:
1) Start with a Queue Rate set to 10, or your transaction rate on a more normal day.
2) Watch your processor load and other performance indicators.
3) Wait until new orders are being recorded in your database at the same or similar rate as your Queue Rate.
4) Hit CTRL-SHIFT-R on your pages to check responsiveness.
5) Increase the Queue Rate.
6) Go back to Step 2, and wait again.
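If it helps to see the procedure as logic, here is a rough Python sketch of the loop. Everything passed in is a hypothetical placeholder: set_rate stands for changing the rate (which you would normally do by hand in the portal), orders_per_minute for a query against your own database, and is_healthy for your own load and responsiveness checks. The doubling step and the 90% "similar rate" tolerance are assumptions for illustration.

```python
import time


def ramp_queue_rate(start_rate, max_rate, set_rate, orders_per_minute,
                    is_healthy, poll_seconds=60, tolerance=0.9, max_polls=60):
    """Raise the rate step by step; return the last rate that was set."""
    rate = start_rate
    set_rate(rate)
    while True:
        # Wait until orders are arriving at a similar rate to the Queue Rate.
        for _ in range(max_polls):
            if orders_per_minute() >= tolerance * rate:
                break
            time.sleep(poll_seconds)
        else:
            return rate  # Orders never caught up: hold here and investigate.
        # Check load and page responsiveness before raising the rate again.
        if not is_healthy():
            return rate  # Stop raising and let a human decide what to do.
        if rate * 2 > max_rate:
            return rate
        rate *= 2
        set_rate(rate)
```

In practice a person makes each of these checks, so treat this as a description of the order of the steps rather than something to automate blindly.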
That's for your first queue, when you don't know the actual maximum Queue Rate your system can support. For subsequent queues, once you've measured the Queue Rate that your system can actually handle, you might be able to use the same figure again - but only if nothing has changed on your system. In practice your system is probably under constant development and modification, and you may not know how recent changes have affected your maximum Queue Rate - so why not start at half your previous measured figure and repeat the above process?
Remember, it's always better to be safe than sorry.