ATTN Big Media: Web page speed matters!

July 4th, 2010

I am not a front-end kinda guy, but these days my latest obsession is testing and improving web page loading times. In this blog post I use a selection of websites as examples to show what they are doing right and what *desperately* needs to be improved. As I learn more about this, I will publish more in-depth posts on these issues.

Why does it matter?

These days almost everyone is on high-speed ‘broadband’ connections so the sites should load fast. Right? No! With faster connectivity, users get more impatient and expect websites to load faster.

E-Commerce websites have done studies which show that faster sites result in more sales. Google has also been urging webmasters to improve their pageload times. Pageload time is now a ranking signal in Google, albeit a minor one, but it still matters.

Overall, faster websites result in higher user retention. Faster websites also lower operational costs. Not directly related, but an optimized website is also less susceptible to Frontend SPOF.

Analysis of pageload time

My favorite tool for this is webpagetest.org. They also have an active forum where users can discuss results and exchange tips on making pages faster.

Some test results :-

Site | Alexa Rank | First View | Repeat View
My Sites
Main blog page | 436,669 | 1.512s | 1.038s
A single post page on my blog | 436,669 | 0.822s | 0.648s
Thaindian.com Homepage | 5,368 | 0.740s | 0.514s
A story page on Thaindian News | 5,368 | 4.517s | 3.411s
Sites for Geeks/nerds
Stack Overflow | 445 | 2.233s | 1.734s
A single thread at Stack Overflow | 445 | 2.846s | 2.645s
Slashdot.Org | 1,393 | 3.773s | 2.175s
A single thread at Slashdot | 1,393 | 3.817s | 2.100s
New media
The Huffington Post | 152 | 6.945s | 4.044s
A single post on HuffPo | 152 | 7.563s | 4.416s
TechCrunch | 357 | 16.462s | 14.760s
An individual post on TechCrunch | 357 | 10.987s | 6.764s
ReadWriteWeb | 1,996 | 17.766s | 11.643s
A single post on ReadWriteWeb | 1,996 | 18.890s | 10.408s
Big media
Google News | 1 | 3.253s | 1.365s
Single story on Google News | 1 | 2.688s | 1.238s
Yahoo! News | 4 | 3.174s | 1.707s
Single story on Yahoo! News | 4 | 5.310s | 2.335s
New York Times | 91 | 7.038s | 2.200s
Single story on NYT | 91 | 4.650s | 4.507s
BBC | 40 | 7.485s | 3.427s
Single story on BBC | 40 | 7.148s | 3.851s
CNN | 58 | 10.197s | 2.528s
An article on CNN | 58 | 7.989s | 3.523s
MSNBC | 9 | 8.172s | 6.638s
Single story on MSNBC | 9 | 12.064s | 4.191s
The Washington Post | 316 | 9.878s | 3.828s
Single story on WaPo | 316 | 15.435s | 4.530s
ABC News | 492 | 14.761s | 4.205s
Single story on ABC News | 492 | 8.697s | 3.037s
RollingStone | 3,376 | 18.700s | 3.572s
An article on RollingStone | 3,376 | 10.707s | 2.905s
Fox News | 201 | 26.755s | 8.761s
Single story on Fox News | 201 | 16.281s | 5.839s

The Sajal Kayan award for excellence in web development goes to FOX News
Test conditions :-

  • First View simulates a visitor's first visit to the website. Repeat View is when the user visits the website again after a while, probably closing and reopening the browser in between.
  • All tests were done on Internet Explorer 8. Yes, I know you hate IE, I hate it more than you! But since the majority of the web uses this browser we test on IE8. Also IE8 performs significantly better than its predecessors. Some of the sites tested may take up to 2x the time to load on IE7 due to fewer parallel downloads per host.
  • The tests were done from a test machine in Dulles, VA, provided by AOL, set to work at DSL speeds (1500 Kbps down, 384 Kbps up and 50ms latency).
  • The timings above show when the window.onload event was triggered. It is possible that the page was usable well before that. Very rarely, it is also possible that the page needs to do more stuff after onload in order to be usable.
  • Each test was repeated 5 times and the average was taken.
  • I have tried to be as fair as possible in the selection of pages for the test. I didn't use pages filled with videos, many images, etc.
  • Tests were conducted during this weekend (July 3rd and 4th), so traffic to those sites would be much lower than usual.

You can see from the table above that, except for Google News, Yahoo! News and (to some extent) The New York Times and BBC, none of these websites seem to have given a minute of thought to the Frontend performance of their web pages. This needs to change. Even Google, one of the biggest evangelists of Frontend performance, forgot to set correct cache control headers for images on Google News.
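
One rough way to read the table is to look at how much of the first-view time a repeat view avoids. The sketch below uses four homepage rows copied from the table. The "saving" metric is my own illustration and not something webpagetest reports. A small saving only hints that caching may not be doing much, it does not prove it.

```python
# (first view, repeat view) onload times in seconds, copied from the results table above.
HOMEPAGE_TIMES = {
    "Google News": (3.253, 1.365),
    "CNN": (10.197, 2.528),
    "TechCrunch": (16.462, 14.760),
    "Fox News": (26.755, 8.761),
}


def repeat_view_saving(first_view, repeat_view):
    """Fraction of the first-view load time that a repeat view avoids."""
    return 1.0 - repeat_view / first_view


# Print one summary line per site so the rows can be compared side by side.
for site, (first_view, repeat_view) in HOMEPAGE_TIMES.items():
    saving = repeat_view_saving(first_view, repeat_view)
    print(f"{site:12s} first={first_view:6.3f}s repeat={repeat_view:6.3f}s saved={saving:.0%}")
```

Any other row from the table can be added to HOMEPAGE_TIMES in the same (first view, repeat view) form.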

I place more emphasis on individual story pages since those are the pages which a first time user would encounter first - coming via search, links, tweets etc…

Now, when browsing un-optimized sites from Thailand, the load times increase dramatically due to a much higher latency and occasional packet loss.

Finally, compare each of the above links visually in IE7 in video mode. The WaPo single story is missing due to a bug. For each URL, 3 first-view tests are run from ‘Dulles, VA - IE7’ and the median run is used for comparison.

How to improve?

Essential reading :-

An optimizer's goal should not be simply to get the total load time low. That's important, but more important is the time it takes for the website to become usable. For a news publishing site, this means being able to display the title and body of the story, so the user can start reading the content while other parts of the page load. For a Thaindian News story page this hovers around 0.9 to 1.5 seconds!

In a CDN’d world, OpenDNS is the enemy!

May 18th, 2010

While many people are happy with using DNS service providers such as OpenDNS, Google, etc… I will show you here why they may not produce optimal results.

The way most CDNs work is by using DNS routing. When a user attempts to resolve a hostname, the CDN’s DNS server responds with the IP of the server that is closest, based on the IP address of the requester. A more detailed insight into the workings of a CDN can be found in an earlier post, “Make your own cheap charlie CDN”.

For my test here, I tested from the following locations :-

  1. True - Thailand : My personal internet connection provided by the ISP called True Internet.
  2. Softlayer - United States : A server hosted at Softlayer’s Washington DC Datacenter.
  3. EC2 - United States : An EC2 instance in Amazon's us-east-1c availability zone.
  4. EC2-EU - Ireland : An EC2 instance in Amazon's eu-west-1 region. - Thanks Luke
  5. EC2-APAC - Singapore : An EC2 instance in Amazon's ap-southeast-1a availability zone.
  6. Com Hem - Sweden : An ISP in Sweden. - Thanks Adam
  7. Tata - India : An ISP in India. - Thanks Angsuman

The following DNS servers were used to resolve the domains :-

  1. OpenDNS (208.67.222.222 , 208.67.220.220 )- Has different caches in multiple locations(Anycasted) - Chicago, Illinois, USA; Dallas, Texas, USA; Los Angeles, California, USA; Miami, Florida, USA; New York, New York, USA; Palo Alto, California, USA; Seattle, Washington, USA; Washington, DC, USA; Amsterdam, The Netherlands and London, England, UK
  2. Google Public DNS (8.8.8.8 , 8.8.4.4 ) - “Google Public DNS servers are available worldwide” . I think Google has their DNS servers in all countries where they have hosting infrastructure.
  3. Local DNS - The ISP provided DNS in the different locations.

The test was done to the following CDN providers :-

  1. Internap ( cdn.thaindian.com ) - Uses DNS routing. POPs (Point Of Presence) in the following locations : Atlanta; Boston; Chicago; Dallas; Denver; El Segundo; Houston; Miami; New York; Philadelphia; Phoenix; San Jose; Seattle; Washington, DC; Sydney; Tokyo; Singapore; Hong Kong; Amsterdam; London
  2. Akamai ( profile.ak.fbcdn.net ) - AFAIK they have a POP in almost all countries including Thailand. Note: Akamai does not entertain sales queries from Thai companies.

Results:-

1) Internap ( using cdn.thaindian.com )

Location | OpenDNS: IP Returned | OpenDNS: Ping to IP (ms) | Google: IP Returned | Google: Ping to IP (ms) | Local: IP Returned | Local: Ping to IP (ms)
True (Thailand) | 64.94.126.65 | 256 | 74.201.0.130 | 365 | 203.190.126.131 | 152
Softlayer (US-East Coast) | 69.88.152.250 | 1.253 | 74.201.0.130 | 25.69 | 69.88.152.250 | 1.388
EC2 (US-East Coast) | 69.88.152.250 | 2.144 | 74.201.0.130 | 20.229 | 69.88.152.250 | 2.094
EC2 (Europe) | 77.242.194.130 | 13.331 | 64.7.222.130 | 159.422 | 77.242.194.130 | 12.504
EC2 (Singapore) | 64.94.126.65 | 202 | 74.201.0.130 | 228 | 202.58.12.98 | 37.260
Com Hem (Sweden) | 77.242.194.130 | 40.035 | 64.7.222.130 | 189.647 | 69.88.148.130 | 36.310
Tata (India) | 64.7.222.130 | 313.2 | 64.74.124.65 | 304.1 | 203.190.126.131 | 150

2) Akamai ( using profile.ak.fbcdn.net )

Location | OpenDNS: IP Returned | OpenDNS: Ping to IP (ms) | Google: IP Returned | Google: Ping to IP (ms) | Local: IP Returned | Local: Ping to IP (ms)
True (Thailand) | 208.50.77.112 | 239.4 | 60.254.185.83 | 138.9 | 58.97.45.59 | 18.88
Softlayer (US-East Coast) | 72.246.31.57 | 1.312 | 72.246.31.42 | 1.262 | 24.143.196.88 | 0.877
EC2 (US-East Coast) | 72.246.31.73 | 2.581 | 72.246.31.25 | 1.792 | 72.247.242.51 | 1.941
EC2 (Europe) | 195.59.150.139 | 13.449 | 92.122.207.177 | 29.022 | 195.59.150.138 | 13.516
EC2 (Singapore) | 208.50.77.94 | 202 | 60.254.185.73 | 71.7 | 124.155.222.10 | 7.052
Com Hem (Sweden) | 217.243.192.8 | 51.73 | 92.123.69.82 | 35.972 | 92.123.155.139 | 13.212
Tata (India) | 209.18.46.113 | 300 | 203.106.85.33 | 196 | 125.252.226.58 | 100.5

The ping timings represent the lag to the destination server from the location in question. I will try to update the results with more locations if I can get shell access to a server or PC in other countries. If you are willing to run the tests for me, please contact me (or post in the comments).
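
To put the ping numbers in perspective, here is a small sketch that expresses each global resolver's result as a multiple of the local resolver's ping. The three rows are copied from the Akamai table above. The "penalty" ratio is just my own way of summarising them.

```python
# Ping times in ms copied from the Akamai table above: (OpenDNS, Google, Local).
AKAMAI_PING_MS = {
    "True (Thailand)": (239.4, 138.9, 18.88),
    "EC2 (Singapore)": (202.0, 71.7, 7.052),
    "Tata (India)": (300.0, 196.0, 100.5),
}


def penalty(row):
    """Return how many times slower the OpenDNS and Google answers ping vs. the local one."""
    opendns_ms, google_ms, local_ms = row
    return opendns_ms / local_ms, google_ms / local_ms


# Print the ratios for every location in the sample.
for location, row in AKAMAI_PING_MS.items():
    opendns_x, google_x = penalty(row)
    print(f"{location:16s} OpenDNS {opendns_x:5.1f}x  Google {google_x:5.1f}x  (local = 1.0x)")
```

Ping is only a proxy for real download time, so treat the ratios as an indication rather than a measurement of page speed.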

Conclusion

Using OpenDNS or Google Public DNS may be fast in resolving the DNS, but they do not give the ideal results.

In the case of Global DNS providers, the IP of the original requester is not passed along to the CDN’s DNS servers so they are unable to route the user to the nearest POP.
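
A toy model of the problem, as a sketch only: the POP names and region labels below are made up for illustration, and a real CDN uses far more detailed IP geolocation than a three-entry table. The point is simply that the user's own location never reaches the function that picks the POP.

```python
# Hypothetical mapping from the region of the *resolver* to the POP a CDN hands out.
POP_BY_RESOLVER_REGION = {
    "asia": "pop-singapore",
    "europe": "pop-amsterdam",
    "us": "pop-washington",
}
DEFAULT_POP = "pop-washington"  # hypothetical fallback when the region is unknown


def pick_pop(resolver_region):
    """What the CDN's DNS server does: choose a POP from the resolver's region."""
    return POP_BY_RESOLVER_REGION.get(resolver_region, DEFAULT_POP)


def pop_for_user(user_region, resolver_region):
    """The user's region is accepted here only to show that it is never used."""
    return pick_pop(resolver_region)


# A user in Asia with the ISP's resolver vs. the same user with a US-based global resolver.
print(pop_for_user("asia", "asia"))
print(pop_for_user("asia", "us"))
```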

As you can see in the result tables above, when using OpenDNS from Thailand and trying to access static assets of Facebook, I am directed to a server in the USA, whereas when using Google’s DNS I am directed to a server in Japan, and when using my ISP’s DNS server I access content locally, hosted within my own ISP’s network!

While the effect on large websites using a CDN is significant, smaller non-CDN’d websites are also affected. Most websites embed widgets, advertising and other assets which are likely to be CDN’d.

The solution would be to use your ISP’s DNS server rather than these Global providers. If it really sucks that bad, it's fairly simple to set up BIND as a caching recursive resolver to resolve hostnames directly, bypassing the ISP’s crappy service.

Bill Fumerola, ex-director of network engineering at OpenDNS confirms this problem on OpenDNS forums.

You can run the tests from your own computer using this simple script: dnstest.py
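
For readers who want to see what such a comparison involves, below is a minimal standard-library sketch. It is not the dnstest.py script linked above, which is not reproduced here. It hand-builds a DNS A-record query, sends it over UDP to a resolver of your choice and reads back the IPv4 addresses. The function names are my own, and it deliberately ignores retries, truncated replies and TCP fallback.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def _skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always 2 bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Extract the IPv4 addresses from the answer section of a DNS response."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # also skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record carrying an IPv4 address
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength  # CNAMEs and other record types are skipped
    return addresses


def resolve_via(resolver_ip, hostname, timeout=3.0):
    """Ask one specific resolver for hostname and return the A records it gives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver_ip, 53))
        response, _addr = sock.recvfrom(4096)
    return parse_a_records(response)
```

Calling resolve_via("208.67.222.222", "cdn.thaindian.com"), then the same with "8.8.8.8" and with your ISP's resolver, gives the IPs to ping for each column of the tables above.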

Here is the named.conf for a recursive server. Set your computer to use 127.0.0.1 as the DNS. - config may differ for you, RTFM and adapt accordingly.

options {
        directory "/var/named";
        listen-on {
                127.0.0.1;
        };
        auth-nxdomain yes;
        allow-recursion {
                127.0.0.1;
        };
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";

};

//
// a caching only nameserver config
//
zone "." {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";

include "/etc/named.dnssec.keys";
include "/etc/pki/dnssec-keys/dlv/dlv.isc.org.conf";

EDIT 1: Inverted the axes and added test data from Europe
EDIT 2: Added test data from Singapore
EDIT 3: Added test data from Sweden
EDIT 4: Added test data from India
EDIT 5: Added link to Bill Fumerola’s explanation of the problem.

Simple command to “watch” the webserver access log

April 2nd, 2010

I am often curious as to what bots are crawling my site at any given moment. So much so that I devote one terminal tab to running this script.

Save the following as, say, bot.sh and make it executable :-

  #!/bin/bash
  # Redraw the last 15 access-log lines that match the first argument
  watch "grep -- '$1' /path/to/access.log | tail -15"

Note: the number after tail can be adjusted depending on your terminal size…

Run it on the server as :-

  [user@server ~]# ./bot.sh Googlebot

OR

  [user@server ~]# ./bot.sh msnbot

OR

  [user@server ~]# ./bot.sh <suspicious ip address>

and so on….

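UPDATE: Better alternative, based on a suggestion by willwill, which streams matching lines as they arrive. Save as bot.sh:-

  #!/bin/bash
  # tail -f never exits, so pipe it straight into grep instead of wrapping it in watch
  tail -f /path/to/access.log | grep --line-buffered -- "$1"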
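
If you would rather do the same from Python (say, to count hits or colour the output later), here is a minimal sketch of the tail-and-filter idea. The function name and its parameters are my own invention. It does a plain substring match, and it does not handle log rotation or half-written lines.

```python
import time


def follow(path, needle, poll_seconds=1.0, from_start=False):
    """Yield lines containing needle as they are appended to the file at path."""
    with open(path, "r", errors="replace") as log:
        if not from_start:
            log.seek(0, 2)  # jump to the end of the file, like tail -f does
        while True:
            line = log.readline()
            if not line:
                # Nothing new yet: wait a little and try again.
                time.sleep(poll_seconds)
                continue
            if needle in line:
                yield line.rstrip("\n")
```

Usage would be a simple loop along the lines of: for line in follow("/path/to/access.log", "Googlebot"): print(line). Stop it with Ctrl+C, as the generator never ends on its own.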

Future releases of Firefox to speed page load time considerably?

January 20th, 2010

Living in Thailand has its fair share of disadvantages, the most prominent being bad internet and poor response times. In most cases, the packet shaping, caching and filtering mechanisms used by ISPs do more harm than good. A response from a US server may take anywhere between 100 and 1000 ms longer than it should (not counting the ping lag, server processing overhead, etc). These days most websites integrate a lot of client-side external scripts and APIs, so lagging responses make for a horrible user experience.

This is especially true of ads: within one ad code I have a default (fallback) ad code, and that too has its own default. When an impression needs to be filled, the ad network decides whether it can fill it based on the parameters I set. If not, it passes the impression down the chain to another network, and so on until the last network. In my case the chain is mostly 3 networks. I can't make it longer, as that results in a poorer user experience.

Google Webmaster Tools

Recently, Google started showing average response times in Google Webmaster Tools, so I've started worrying about these things more than I should.

On my site, I have 2 ad blocks (leaderboard and skyscraper, plus another block which shows up on individual story pages) which load before the main content of the page. Recently I moved the ads to Google ad manager, which has a wonderful way of debugging ad loading: add ?google_debug to the end of the URL.

My first impression of Google ad manager was excellent. My page was no longer held up while the ads loaded, but soon I realized that's not an ad manager feature. It is Firefox 3.5.8pre which is speeding things up.

Browser's user agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8pre) Gecko/20100116 Ubuntu/9.04 (jaunty) Shiretoko/3.5.8pre

Tests on my laptop, which runs Firefox 3.0.17, show otherwise.

Browser's user agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.16) Gecko/2009121601 Ubuntu/9.04 (jaunty) Firefox/3.0.17

Chrome and IE are not much faster either… They all show the “Time the page is blocked fetching ads from Google” ranging between 1000ms and 2500ms. The variation is irrelevant, as it's due to network issues and ad server response times. The bottom line is that these browsers do hold up the page while the ads load.

Maybe this is an improvement in the latest Ubuntu nightly build or a general improvement, whatever it is, the future is Firefox and they are fast!

So far, there has been no proper way to load ads such that they don’t block the rest of the page from loading. The 2 ways I know of are very ugly and I don’t like them :-

  1. Load the target adscript from a separate HTML file loaded via iframe - costs one extra request/ad code, may screw up ad targeting, etc.
  2. Place a blank div where the ad should appear, load the ad in a hidden div below the actual content, and then use javascript trickery to swap the contents of the hidden div into the ad div. - Sounds ugly again, not a neat solution.

Of course there is a neat and ideal solution… which is to build your template in such a way (CSS absolute positioning or something) that the HTML of the content appears earlier in the code than the ad javascript… but again this is cumbersome. Interesting discussion here.

In an ideal world, all ad networks would be banned from using document.write in their scripts and would use some form of ajax to fetch the banner code after (or while) the rest of the page has loaded. It's not 2001 anymore!

Here is what I request from you: open the URL http://www.thaindian.com/newsportal/?google_debug and there should be 1 or 2 popups (in some browsers you may need to disable the popup blocker). Look at the popup which resembles the screenshots above, and report your findings in the comments below. Be sure to wait for the main page to finish loading, and don’t forget to include your full useragent. If you can upload screenshots somewhere, then please drop their URLs in the comments too.

The info I need could look like the following example:-

User Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8pre) Gecko/20100116 Ubuntu/9.04 (jaunty) Shiretoko/3.5.8pre
Debug: -
7342 Information Time the page is blocked fetching ads from Google 0 ms
7343 Information Time the page is blocked rendering ads from Google 0 ms

Your useragent can be checked here.

Video of pageload on Google Chrome:-

Video of pageload on Firefox 3.6pre(Ubuntu build):-

UPDATE: I upgraded my main browser to Firefox 3.6 (User Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2pre) Gecko/20100120 Ubuntu/9.04 (jaunty) Namoroka/3.6pre). Same results as 3.5.8pre: it's bloody fast and doesn’t stall the pageload waiting for ads.

UPDATE 2: Based on the comment by Archit below, the speed improvement is not visible on 3.6rc2. My conclusions are based on the nightly builds by the Ubuntu Mozilla Daily Build Team.

UPDATE 3: Added videos

UPDATE 4: For my site I implemented the hidden div trick, so for now the visual delay should not be noticeable in any browser.

I, me and Solid State Drives

September 20th, 2009

Let me first explain my set-up before the upgrade. I use 2 computers, 1 desktop at office and a laptop at home or while traveling.

Laptop : Lenovo; Core 2; 2 GB RAM; regular 5.4k rpm hard disk. Purchased about 1.5 years ago.

Desktop : (cost the same as the laptop but purchased only 3 or 4 months ago) i7 CPU 920 @ 2.67GHz; 6 GB DDR3; a kickass motherboard; 1TB Seagate HDD (ST31000528AS) - 7.2k rpm

After using the Desktop for most of my work, I could no longer work on the Laptop, which was significantly slower than the desktop. Having used a Solid State Drive (SSD) on my server for a few months as the MySQL data directory, I decided to see how it could improve things on the laptop.

Over the last few months, I had saved enough to treat myself to some gadgets :)

Intel X25-M 80GB SSD

On reaching Fortune mall, I couldn’t find SSDs anywhere, no store there had heard of “Solid State Disk” or “SSD”. In fact one shopkeeper thought I wanted to buy SD cards. I only upgraded the memory to 4 GB. After this upgrade, things didn’t speed up much, just that it didn’t lag anymore after opening loads of applications.

I had lost all hope… and even started thinking in what other way to spend my gadget budget when Twitter came to the rescue in the form of a reply from my fav columnist which said “@sajal at fortune jet has some, but for components, zeer is better these days i feel. It swings.” .

Getting my hopes up, I went to that shop. They had a choice between the Intel X25-M (MLC) and a few OCI brands. Since I don’t care about disk space, I was actually looking for the X25-E (SLC) which is in my server, but they hadn’t heard of it here in Bangkok, so I settled for an 80 GB X25-M costing me 13,500 Baht (approx 400 USD). The decision to go for Intel was highly influenced by AnandTech’s reviews. AnandTech has a series of articles on SSD performance, with benchmarks for various applications. (Dear AnandTech : Will you give me a job? All I need is a chance to play with cool things :P )

Disadvantages of using SSD in laptop:-

  1. Unlike before, you can’t feel any vibration or noise from the drive, hence you don’t “feel” whether your disk is being accessed or idling.
  2. The disk activity LED hardly lights up… SSDs idle much more than regular disks since the requested data is returned immediately, so the transaction is completed before the LED can light up fully.
  3. The laptop (running Ubuntu 9.04) boots up in < 20 seconds including me logging in. This doesn't give me enough time to get coffee, pee, etc after reaching home and pushing the on switch.
  4. It is slightly thinner than a regular 2.5″ notebook HDD, hence it sits in my laptop at a very slight angle. Maybe later I should put in a metal sheet or something to compensate.

Now, the performance of my laptop is in fact slightly better than the desktop. I am looking forward to the day when SSDs get more commoditized and we start seeing them compete with regular HDDs in terms of cost per GB. - If I had that kind of money, I'd load up 10 64 GB Intel X25-Es in my main desktop :D

My conclusion is that the main bottleneck in laptops is Disk I/O. I guess that within a year or sooner, we will see most mid-segment laptops coming with SSDs instead of the current HDDs.