Traditional load estimation
Traditionally, servers are provisioned to support a maximum number of concurrent users, and load is often defined this way.
There is a problem with using this approach to run a queue for a website - a problem we solve with rates.
The problem arises because visitors are only actually using the webserver while their browser is loading each web page.
Here's an example
Let’s say a site is selling tickets. From the start of the visitor journey to the end (the user session), there are six pages, as the visitor
- arrives at the site,
- goes to the events page,
- sees a page for a specific event,
- chooses seats and tickets,
- enters payment details and
- sees an order confirmation page.
The entire transaction process takes five minutes, on average.
Let's say, again for example, that 100 visitors arrive to buy tickets every minute. With the five-minute average transaction time, at any given moment there are 500 people in the transaction process, and the server must be provisioned to support 500 concurrent users.
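To make that arithmetic explicit, here it is as a couple of lines of Python (the figures are just the ones from the example):

```python
# Concurrent users = arrival rate x average time on site.
arrivals_per_minute = 100      # visitors starting the journey each minute
transaction_minutes = 5        # average time from first page to confirmation

concurrent_users = arrivals_per_minute * transaction_minutes
print(concurrent_users)        # 500 - the capacity the server must be provisioned for
```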
Counting concurrent users
One might think you could create a Concurrent User Counter that is incremented every time a new visitor arrives at the home page, and decremented every time a visitor completes the transaction, and that this would tell you how many concurrent users one web server has at any one time. But this won’t work.
It won’t work because not every visitor completes the transaction. People change their minds, or get distracted, or go away and come back later. Because the webserver is only interacting with the visitors when their browsers are loading pages, the webserver has no way of knowing when this has happened, or to whom.
Instead, to determine that a visitor is no longer part of the transaction process, the webserver has to wait to see if no more page requests arrive from that visitor and then time out that visitor’s session. Timeouts must always be set to much longer than the average transaction time (say 20 minutes in this example) as some visitors take much longer than others to complete their transaction.
One could add the facility to also decrement one’s notional Concurrent User Counter every time a visitor session times out in this way, but this gives very poor results.
If 10% of the 100 visitors that arrive every minute do not go on to complete their transaction, the ones that do complete will be active for five minutes on average and the ones that don’t complete will be considered active for 20 minutes always.
That’s 90 * 5 + 10 * 20 = 650, and the server will report 650 concurrent users, even though it is actually less busy!
What about the timing-out users?
Furthermore, as many as 10 * 20 = 200 of those concurrent users are not actually using the site and are in the process of timing out, which is over 30% of the reported concurrent users, even though it’s only 10% of the visitors that fail to complete the transaction.
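Worked through in Python with the same example figures, the inflation looks like this:

```python
arrivals_per_minute = 100     # new visitors per minute
completing = 90               # the 90% who go on to complete the transaction
abandoning = 10               # the 10% who drop out and are left to time out
transaction_minutes = 5       # average time for visitors who complete
timeout_minutes = 20          # session timeout applied to the drop-outs

# Each visitor stays on the counter for as long as their session lives.
reported_concurrent = completing * transaction_minutes + abandoning * timeout_minutes
print(reported_concurrent)                          # 650 reported "concurrent users"

timing_out = abandoning * timeout_minutes           # 200 of them are just timing out
print(round(timing_out / reported_concurrent, 2))   # 0.31 - over 30% of the reported figure
```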
Now let’s say one wishes to add a queue to this website, controlled by our Concurrent User Counter. Once the site is at capacity, then an individual who is at the front of the queue will only be passed to the site once the counter decrements. This is called a one-out-one-in queue. What will happen is that 30% of the time, a person at the front of the queue will be waiting for someone who isn’t going to complete their transaction to finish not completing their transaction.
That is very clearly broken.
There is also the additional technical integration work, and the load on the website, of creating and running the counter and sharing that information with the queue system. If the sharing or counting mechanism goes down or breaks, the queue gets stuck. Even worse, if the web server gets so busy that it can't tell the queue server to let the next person in, the queue gets stuck too.
The solution: Use Rates
All these problems are easily and simply solved by instead sending the visitors from the front of the queue at a constant, fixed rate – in this case 100 visitors per minute.
That way there’s no need to measure the actual number of concurrent users, and no need for complex integration with the ticketing website, and no risk of the queue getting stuck.
That’s why we invented and patented the rate-based Virtual Waiting Room for busy websites in 2004.
How many concurrent users can a web server handle?
If you know how many concurrent users a web server can handle, and the average transaction time or visit duration, from the first page in your transaction flow to the order confirmation page, you can convert this into a Queue Rate by dividing the number of users by the duration, like this:
Queue rate = Concurrent users / Transaction time
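As a minimal sketch in Python, using the example figures from earlier:

```python
def queue_rate(concurrent_users: float, transaction_minutes: float) -> float:
    """Visitors per minute to admit from the front of the queue."""
    return concurrent_users / transaction_minutes

print(queue_rate(500, 5))   # 100.0 visitors per minute, as in the example above
```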
An Easy Way To Calculate the Queue Rate
What if you don't know how many Concurrent Users a server can handle, or your Transaction Time? You can look at the page that is likely to be your bottleneck - usually the one that is the result of clicking a "Buy Now" button. Use Google Analytics to find the monthly unique visitors to that page, or count your monthly orders. Divide this by 30 * 24 * 60 = 43,200, which is (approximately) the number of minutes in a month. That's your average visitors per minute over the whole month. Multiply this by three. That's (approximately) your average visitors per minute during business hours. Double this. That's probably a safe figure for the Queue Rate to use.
For example, let's say you process 100,000 orders per month - that's 100,000 clicks of the "Buy Now" button. That's 100,000 / 43,200 = 2.31 orders per minute. You would expect most of these orders to come during the day, and your servers to be quieter at night, so multiply this by 3 and that's 7 orders per minute as a rough estimate of how busy your server is during business hours. If the resulting figure is less than 50, there will be peaks and troughs in demand, so if your server is not noticeably slow in peak hours, multiply it by 2 to get 14 visitors per minute. If the figure is more than 50, minute-to-minute peaks and troughs will be smaller in comparison, and it's not safe to double it. The number you end up with is probably a safe figure to start with for the Queue Rate, and a reasonable guide to how many visitors you can safely manage; you can always increase it if you find your systems are still responsive at that rate.
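Here is that rule of thumb as a short Python sketch; the 3x business-hours multiplier, the doubling below 50, and the threshold of 50 itself are the rough heuristics described above, not precise constants:

```python
MINUTES_PER_MONTH = 30 * 24 * 60    # 43,200 (approximately)

def estimated_queue_rate(monthly_orders: float) -> float:
    average_per_minute = monthly_orders / MINUTES_PER_MONTH   # whole-month average
    business_hours_rate = average_per_minute * 3              # most traffic is in the day
    if business_hours_rate < 50:
        # Small figures hide big minute-to-minute peaks, so allow some headroom.
        return business_hours_rate * 2
    # At higher volumes the peaks are smaller in comparison - don't double.
    return business_hours_rate

print(round(estimated_queue_rate(100_000)))   # ~14 visitors per minute
```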
If your orders are timestamped, you can also look at the maximum number of orders you took in a single minute over the last month - but use this with caution, as you won't know how many orders you may have dropped during that minute because your servers slowed down, so reduce the figure by 20%.
The rest of this article discusses some other ways to work out the Queue Rate.
Gotcha #1: Concurrent Users vs Concurrent Requests
It's worth pointing out that there are at least two definitions of "Concurrent Users" in common usage.
We use the definition ‘the number of people engaged in a transaction flow at any one time’. That's the key number you need to know to set the Queue Rate, and it's how many people are actually using your site right now. The number of Concurrent Sessions is usually somewhat larger than this, because some of the sessions are in the process of timing out.
Contrast this with Concurrent Requests, which is the number of HTTP requests being processed by your web server at any one time. Very confusingly, a lot of tech people will mean Concurrent Requests when they say Concurrent Users.
Then there's Concurrent Connections (or concurrent TCP connections), which is the number of TCP/IP Sockets open on your server at any one time. When making page requests, browsers will by default leave the connection open in case any further requests are made by the page, or the user goes to a different page. Timeouts for these connections vary by browser, from 60 seconds to never-close. Your server may automatically close connections after a period of no activity too. Again, some people call this "Concurrent Users" too.
Indeed if you ask your hosting provider to tell you the maximum number of Concurrent Users that your web server will support (how much traffic), they will probably actually give you a figure for Concurrent Sessions, Concurrent Requests or Concurrent Connections, for the simple reason that they don't know your average transaction time, number of pages in your transaction flow, or any of the other information that would allow them to tell you how many simultaneous users your web server process can handle.
If you are asking your hosting provider or tech team for information about maximum traffic levels, it's super-important that you clarify whether they mean Concurrent Users, Concurrent Sessions, Concurrent Requests or Concurrent Connections.
Getting this wrong can crash your web site!
Here's why. Each page is a single HTTP request, but all the images, scripts and other files that come from your web application that the browser uses to display the page are also HTTP requests.
Let's imagine you've been told by your tech team that the server supports 500 Concurrent Users, but they actually mean 500 Concurrent Requests. With your 5 minute transaction time, you use the above formula and assume that your site can support 100 visitors per minute.
Can it? No.
As people go through the transaction flow, they are only actually making requests from your servers while each page loads. Out of the five minute transaction time, that's only a few seconds. You might therefore think that 500 Concurrent Requests means you can handle a lot more Concurrent Users, but you may well be wrong. Can you see now how understanding your website capacity is such a complicated business?
Converting Concurrent Requests to Concurrent Users
To work out your maximum Concurrent Users from your maximum Concurrent Requests, you also need to know:
1) The number of pages in your transaction flow
2) The average visitor transaction time from first page to last page in your flow
3) The average number of HTTP requests that make up each page
4) The average time your server takes to process a single HTTP request
You probably know 1) and 2) already - in our example it's 6 pages and 5 minutes. You can easily count the pages you see while making a transaction. If you don't know the average transaction time, Google Analytics may tell you, or you can check your web server logs.
For 3) and 4), the Firefox browser can help. Right-click on a page on your site, choose Inspect Element, and select the Network tab. Then hit CTRL-SHIFT-R to completely refresh the page. You'll see network load times for every element of the page in the list. Make sure you can see transfer sizes in the Transferred column, as otherwise files might be served from a cache, which can mess up your calculations. You might see some scripts and other resources come from servers other than your site, so you can type the domain name for your site in the filter box on the left. To see the Duration column, right-click any column header and select Timings -> Duration from the pop-up menu. Your screen should look like this:
The Firefox Network tab for this page, showing Duration and number of Requests from queue-fair.com
Files used in the display of your pages can come from a number of different sites, so you may also want to use the filter in the top left to show only those from your own site - but only if you are sure that the files from other sites are not the reason for slow page loads, or part of your bottleneck.
Firefox counts the requests for you in the bottom left of the display, and shows 36 HTTP requests for just this one page.
You need to do this for every page in your transaction flow - count the total and divide by the number of pages to find the average number of HTTP requests for each page, number 3) in our list.
For number 4), you need to look at the Duration column and find the average for all the HTTP requests for all your pages. If you're not sure, assume half a second - there's a lot of uncertainty in this anyway (see below).
Doing the math
Let's give some example numbers. We've already said there are six pages in the example flow, which is 1), and that the average transaction time is five minutes, which is 2). Let's assume 36 HTTP requests per page for 3), and half a second for the server processing time for each HTTP request, which is 4).
With those numbers, a server that can handle 500 Concurrent Requests can handle 500 / (0.5 seconds) = 1000 HTTP requests per second, which is 60,000 HTTP requests per minute, when it's completely maxed out.
Over the five minute transaction time, it can handle 5 * 60,000 = 300,000 HTTP requests. Seems like a lot, right?
But for each visitor there are six pages with an average of 36 HTTP requests each, so that's 6 * 36 = 216 requests per visitor.
So the 300,000 HTTP request capacity can in theory handle 300,000 / 216 = 1,389 Concurrent Users.
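Putting the four inputs together in Python, with the example figures (remember these are best-case numbers measured on a quiet site):

```python
max_concurrent_requests = 500   # what the server can process at once
pages_in_flow = 6               # 1) pages in the transaction flow
transaction_minutes = 5         # 2) average time from first page to last
requests_per_page = 36          # 3) average HTTP requests per page
seconds_per_request = 0.5       # 4) average server time per request

requests_per_second = max_concurrent_requests / seconds_per_request          # 1,000
requests_in_transaction_window = requests_per_second * 60 * transaction_minutes   # 300,000
requests_per_visitor = pages_in_flow * requests_per_page                     # 216

max_concurrent_users = requests_in_transaction_window / requests_per_visitor
print(round(max_concurrent_users))   # ~1,389 Concurrent Users, in theory
```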
Gotcha #2: Web Servers Get Slower With Load
Hey, that's great! We thought we could only have a queue rate of 100, but 1,389 / 5 minutes = 278 visitors per minute, so we can have a higher queue rate!
Well, probably not. For one, your visitors won't neatly send requests at intervals of exactly half a second, as the above calculation assumes. More importantly, you'll have been measuring your input data when the site isn't busy. Garbage in, garbage out.
When the site is busy, the server takes longer to process requests - you'll have noticed this on other sites when things are busy, that you're waiting longer for pages. This increases the average time your server takes to process a single HTTP request (4), which decreases the maximum throughput. So take the 278 visitors per minute and halve it. Then halve it again. You're probably realistically looking at about 70 new visitors per minute at maximum load.
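Carrying that theoretical figure forward, the derating might look like this; the two "halve it" factors are the rough safety margins suggested above, not measured values:

```python
theoretical_concurrent_users = 1389
transaction_minutes = 5

theoretical_rate = theoretical_concurrent_users / transaction_minutes   # ~278 per minute

# Halve once for uneven request timing, and again because the measurements
# were taken on a quiet site and servers slow down under load.
realistic_rate = theoretical_rate / 2 / 2
print(round(realistic_rate))   # 69 - about 70 new visitors per minute at maximum load
```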
Other confounding factors include caching, which means your visitors' browsers may not need to make every single request for every single page - this tends to increase the number of new visitors per minute your server can handle.
You'll also find that not all the pages take the same time to complete. Database searches and updates take the longest, so you will have a bottleneck somewhere in your process where people pile up, waiting for credit card details to be processed and orders stored, or waiting for availability to be checked. Every transaction flow has a slowest step so there is always a bottleneck somewhere. In that case, you want to set your Queue Rate low enough to ensure that your server has capacity to process enough people concurrently for the slowest step in your process so that people don't pile up there. Otherwise your webserver can literally grind to a halt.
So what do I do?
Our experience is that, going into their first sale, everybody overestimates the ability of their servers to cope with high volumes of traffic.
Everybody.
Accurately pinpointing the average session duration and end-user performance during slow or peak traffic isn't for the faint-hearted. The best thing to do is run a proper load test, with 'fake' customers actually going through the order process exactly as they would in real life - making the same HTTP requests in the same order, with the same waits between pages that you see from real visitors - and keep an eye on your processor load, IO throughput and response times as you ramp up the number of virtual visitors. You can use Apache JMeter for this (we also like K6 for lighter loads), but whatever tool you use, it's time consuming and tricky to get exactly right (especially with the complexities of caching). Even then, take your numbers and halve them.
In the absence of that, err on the side of caution.
You can easily change the Queue Rate for any Queue-Fair queue at any time using the Queue-Fair Portal. Start at 10 visitors per minute, or your transaction rate on a more normal day, and see how that goes for a little while after your tickets go on sale. If all looks good - your processor load is low, your database is fine and (above all) your pages are responsive when you hit CTRL-SHIFT-R - double it, wait a bit, and repeat. You'll soon find the actual rate you need during this 'load balancing' (see what we did there?). And remember, from a customer experience point of view, it's fine to raise the Queue Rate: it reduces the estimated waits that your customers in the queue are seeing, and everyone is happy to see a shorter estimated wait.
What you want to avoid is setting the Queue Rate too high and then having to lower it, as this a) means people using the site experience slow page loads, and b) causes the estimated waits to increase. All the people in your queue will sigh!
Gotcha #3: Increasing the rate too quickly after a queue opens
Remember, you will have a bottleneck somewhere in your order process - every transaction has a slowest step. What you don't want to do is get a minute into your ticket sale, see that your server processor load is fine, and raise the rate. Your visitors probably haven't got as far as the "Buy Now" button. You want to wait until your database is reporting new orders at the same or similar rate as your Queue Rate and make your measurements and responsiveness tests then. Remember that every time you increase the rate, it will take that same amount of time for the extra visitors to reach your bottleneck, so you won't be able to accurately assess how your server performs at the new rate until after that time has elapsed.
Gotcha #4: Snapping your servers
We've already discussed how it's best to increase the Queue Rate gradually once your queue has opened. You are probably aware that your servers do have a limit that cannot be exceeded without the system crashing and may even be aware of what the limit is - but what you may not know is that as the load is approaching this limit, there is usually very little sign - often just a few errors or warnings, or a processor load above 80%.
When web services fail they tend to 'snap' or seize up very quickly. This is normally because once your system can no longer process requests as quickly as they come in, internal queues of processing build up. Your system then has to do the work of processing, managing and storing its internal queues as well as the requests, and that's what tips servers over the edge. Very quickly. Once that happens, your servers may for a time be able to respond with an error page, but this doesn't help you because the visitors that see it will immediately hit Refresh, compounding the load.
So, don't push your servers any harder than you need to. Going for that last 20% of capacity is usually not worth the risk. If the queue size shown in the Queue-Fair Portal (the yellow Waiting figure and line in the charts) is decreasing or even just increasing more slowly, minute by minute, and the wait time shown is 50 minutes or less, then you are processing orders fast enough and the queue will eventually empty and stop showing Queue Pages automatically, without you having to do anything, and without you having to tell your boss that you pushed it too hard and broke it. You'll get there eventually so long as the speed of the Front of the Queue is higher than the number of Joins every minute (both of which are shown in the Queue-Fair Portal) - the turning point is usually at least a few minutes into each event. If you are selling a limited-quantity product, you will probably sell out before the turning point is reached.
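If you want a rough feel for when that turning point will get you to an empty queue, a simple back-of-the-envelope estimate (only valid while the front-of-queue rate stays above the join rate, ignoring minute-to-minute variation; the figures in the example call are purely illustrative) is:

```python
def minutes_until_queue_empties(waiting_now: int,
                                admit_rate_per_minute: float,
                                join_rate_per_minute: float) -> float:
    """Rough estimate of how long until the queue drains."""
    if admit_rate_per_minute <= join_rate_per_minute:
        return float("inf")   # the queue is still growing
    return waiting_now / (admit_rate_per_minute - join_rate_per_minute)

# e.g. 5,000 people waiting, admitting 100 per minute, 60 per minute still joining
print(minutes_until_queue_empties(5000, 100, 60))   # 125.0 minutes
```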
The good news is that if you do accidentally set the Queue Rate too high and your servers snap, Queue-Fair can help you get up and running again quickly - just put the queue on Hold until your servers are ready to handle visitors again. In Hold mode, people in the queue see a special Hold page that you can design before your online event. No-one is let through from the front of the queue while it is on Hold, but new visitors can still join at the back, to be queued fairly once the blockage is cleared - which will happen very quickly, because Queue-Fair is protecting your site from the demand. The Hold page is a superior user experience to setting the Queue Rate really low, especially if you update it to tell visitors what time you expect the queue to reopen, which is easy to do with the Portal page editor even when hundreds of thousands of people are already in the queue. In Hold mode you can even let people through one at a time with Queue-Fair's unique Admit One button if you need to, while your system recovers from its snap.
So, if you do find your servers need to take a break during your event, the Hold page is just what you need for that, and will help your servers recover more quickly to boot.
Conclusion
In this article we've explained why a rate-based queue is always the way forward, and given two methods to calculate the rate you need, but unless you've done full and accurate virtual visitor load testing on your entire transaction flow, and are really super extra mega certain about that, our advice is always the same:
1) Start with a Queue Rate set to 10, or your transaction rate on a more normal day.
2) Watch your processor load and other performance indicators.
3) Wait until new orders are being recorded in your database at the same or similar rate as your Queue Rate.
4) Hit CTRL-SHIFT-R on your pages to check responsiveness.
5) Increase the Queue Rate by no more than 20%.
6) Go back to Step 2, and wait again.
7) Once the queue size is decreasing, or is steadily increasing less rapidly every minute, and the wait time shown is less than 50 minutes, it doesn't need to go any faster.
8) Sit back and relax! Queue-Fair's got you covered.
If you are selling a limited quantity product, you don't need to pay attention to Step 7 either.
That's for your first queue, when you don't know the actual maximum Queue Rate your system can support. For subsequent queues, once you've measured the Queue Rate that your system can actually handle, you might be able to use the same figure again - but only if nothing has changed on your system. In practice your system is probably under constant development and modification, and you may not know how recent changes have affected your maximum Queue Rate - so why not start at half your previous measured figure and repeat the above process?
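For the automation-minded, the checklist above can also be sketched as a loop. This is purely illustrative: set_queue_rate, get_processor_load, orders_per_minute, pages_are_responsive and queue_is_draining are hypothetical helpers you would wire up to your own monitoring and to however you adjust the rate - they are not part of any Queue-Fair API.

```python
import time

def ramp_up(initial_rate, set_queue_rate, get_processor_load,
            orders_per_minute, pages_are_responsive, queue_is_draining,
            check_minutes=10):
    """Illustrative sketch only: raise the Queue Rate gradually while the site stays healthy."""
    rate = initial_rate
    set_queue_rate(rate)
    while True:
        time.sleep(check_minutes * 60)       # give the new rate time to reach the bottleneck
        if queue_is_draining():              # Step 7: fast enough - leave the rate alone
            return rate
        # Gotcha #3: only judge the new rate once orders roughly match it,
        # i.e. the extra visitors have actually reached the bottleneck.
        if orders_per_minute() < 0.8 * rate:
            continue
        if get_processor_load() > 0.8 or not pages_are_responsive():
            return rate                      # stop here rather than risk a snap
        rate = rate * 1.2                    # Step 5: increase by no more than 20%
        set_queue_rate(rate)

# You would call ramp_up(10, ...) passing functions wired to your own
# monitoring and rate controls - none of these helpers exist out of the box.
```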
Remember, it's always better to be safe than sorry.