Make your own cheap charlie CDN

June 28th, 2008 | by Sajal Kayan

CDN stands for Content Delivery (or Distribution) Network. It is a network of servers, usually located in various geographic locations, that improves the availability and access speed of a website (or webapp or other media). CDNs were mainly used during the 1990s, when inter-continental access was slow, scarce and expensive. The origin server could be in Silicon Valley while users from the UK would access a node located in the UK, so in theory the page would be downloaded from the US server to the UK server only once. UK visitors could then access the page locally, resulting in a huge saving in inter-continental bandwidth costs and improved access times for the end users.

CDNs are traditionally a very expensive solution to implement if using any of the established providers. The solution in itself is not very complex. I am in the process of implementing my very own CDN. The benefits are simple.

  • It would reduce load on the origin server
  • Faster access if user downloads pages from a server closer to them
  • Since load on origin server is low, faster access even if cache needs to be refreshed

Some major portions of the CDN I am looking to implement

  1. Origin Server(s)
  2. Geo targeting DNS servers
  3. Squid Cache - Set as Reverse Proxy or web accelerator

Origin Server : This at the moment is a single server, which may later be split to run mysql and apache on separate boxes to improve performance. It is up and running and in production.

Geo targeting DNS servers : A perl script using the Geo::IP and Net::DNS::Nameserver modules to resolve the query based on the country the request comes from. The DNS script is in very early development. At the moment it is basically the example usage of Net::DNS::Nameserver with Geo::IP loosely bolted on. I still need to implement config files that are re-read every 10 minutes, so that I can run this script on a series of servers, change the config on one server and rsync it across all the nameservers. The perl script is attached at the end. It resolves foo.example.com differently if the request comes from Malaysia.
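To make the config file idea a bit more concrete, here is a rough sketch of the lookup part only. It is in Python rather than Perl, it does no DNS at all, and the JSON layout, the GeoZone class and the "default" key are all made up for illustration. The file is re-read once the loaded copy is 10 minutes old, which is the behaviour described above.

```python
import json
import time
from pathlib import Path


class GeoZone:
    """Country-aware A-record table, re-read from a JSON file when stale.

    Hypothetical file layout (an assumption, not an existing format):
        {"foo.example.com": {"default": "10.1.2.3", "MY": "10.1.2.4"}}
    """

    def __init__(self, path, max_age=600, clock=time.monotonic):
        self.path = Path(path)
        self.max_age = max_age      # seconds before the file is re-read
        self.clock = clock          # injectable so the reload can be tested
        self.records = {}
        self.loaded_at = None

    def _refresh(self):
        now = self.clock()
        if self.loaded_at is None or now - self.loaded_at >= self.max_age:
            self.records = json.loads(self.path.read_text())
            self.loaded_at = now

    def resolve(self, qname, country):
        """Return the address for qname as seen from country, or None."""
        self._refresh()
        # DNS names are case-insensitive and may carry a trailing dot.
        entry = self.records.get(qname.lower().rstrip("."))
        if entry is None:
            return None             # the caller would answer NXDOMAIN
        # Every entry is assumed to have a "default" address.
        return entry.get(country, entry["default"])
```

With the file rsynced to every nameserver, each one would pick up a change within 10 minutes without a restart. A call like GeoZone("zone.json").resolve("foo.example.com", "MY") would return the Malaysian address from the file.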

Squid Cache : Squid is a free proxy server which can be set up as a reverse proxy. Users would query this proxy for pages and the proxy would deliver the content, expiring cached copies based on the rules defined in the squid.conf file and/or the Expires header set by the origin server. It can be set up so that different filetypes and different URL patterns are cached differently. Queries to some URL patterns by logged-in users (detected by cookies) should go straight to the origin server. These configurations are a little complicated. My plan for this configuration is attached at the end.
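The basic behaviour of such a reverse proxy can be sketched in a few lines. This is only a toy model in Python, not how Squid works internally: EdgeCache and fetch_from_origin are hypothetical names, and the origin is assumed to hand back a lifetime in seconds along with the body (standing in for the Expires header or a squid.conf rule).

```python
import time


class EdgeCache:
    """Toy reverse-proxy cache: serve from memory until the entry expires."""

    def __init__(self, fetch_from_origin, clock=time.monotonic):
        # fetch_from_origin(url) -> (body, ttl_seconds); a hypothetical
        # stand-in for the HTTP request to the origin server.
        self.fetch_from_origin = fetch_from_origin
        self.clock = clock
        self.store = {}             # url -> (body, expires_at)

    def get(self, url):
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now < cached[1]:
            return cached[0], "HIT"     # origin is not contacted at all
        body, ttl = self.fetch_from_origin(url)
        if ttl > 0:
            self.store[url] = (body, now + ttl)
        else:
            self.store.pop(url, None)   # ttl 0 means "do not cache"
        return body, "MISS"
```

Every HIT is a request the origin server never sees, which is where the reduced load on the origin comes from.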

The Geo Dns and Squid would be installed on cheap VPS in a few countries. Will start with one to see how well it scales.

EDIT 1 : Playing with Varnish at the moment, considering it over squid.

The Geo Dns perl script :

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Geo::IP;
    use Net::DNS::Nameserver;

    # Open the GeoIP country database once instead of on every query.
    my $gi = Geo::IP->new(GEOIP_STANDARD);

    sub reply_handler {
        my ($qname, $qclass, $qtype, $peerhost, $query) = @_;
        my ($rcode, @ans, @auth, @add);

        print "Received query from $peerhost\n";

        # IPv4 peers can show up as ::ffff:a.b.c.d; strip that prefix
        # (if present) so Geo::IP gets a plain IPv4 address.
        (my $ip = $peerhost) =~ s/^::ffff://;

        # country_code_by_addr returns undef for unknown addresses.
        my $country = $gi->country_code_by_addr($ip) || '';
        print "--$ip--$country\n\n";
        $query->print;

        if ($qtype eq "A" && $qname eq "foo.example.com") {
            my ($ttl, $rdata) = (3600, "10.1.2.3");
            if ($country eq "MY") {
                # requests from Malaysia get a different address
                $rdata = "10.1.2.4";
            }
            push @ans, Net::DNS::RR->new("$qname $ttl $qclass $qtype $rdata");
            $rcode = "NOERROR";
        } elsif ($qname eq "foo.example.com") {
            # the name exists, but we have no record of this type
            $rcode = "NOERROR";
        } else {
            $rcode = "NXDOMAIN";
        }

        # mark the answer as authoritative (by setting the 'aa' flag)
        return ($rcode, \@ans, \@auth, \@add, { aa => 1 });
    }

    my $ns = Net::DNS::Nameserver->new(
        LocalPort    => 53,
        ReplyHandler => \&reply_handler,
        Verbose      => 1,
    ) || die "couldn't create nameserver object\n";

    $ns->main_loop;

My idea for squid.conf :

    1) Forward proxy :-

    Allow the following IPs to browse any website without any caching… allow https also…

    a.b.c.d
    w.x.y.z
    127.0.0.1 (I'll do an ssh tunnel)

    2) Reverse proxy :-

    Rules (to be followed serially; if rules 3 and 5 both match, rule 3 should be used) :-

    1) All URLs ending in the following must be cached for a minimum of 5 days, or as long as the Expires header says. Don't even check to see if the file has been updated.

    .jpg, .gif, .css, .js, .swf, .png  (not case sensitive)

    2) POST should never be cached

    3) A few URLs should be cached for 30 mins (minimum/maximum) no matter what the Expires header says.

    http://www.mysite.com/urla/
    http://www.mysite.com/urlc.html
    etc…

    4) Requests to http://www.mysite.com/sectiona/* :-

    * If the user has any of the following cookies, pass them direct. Hint : "acl cookie_test req_header Cookie ^.*(comment_author_|wordpress|wp-postpass_).*$"
    * http://www.mysite.com/sectiona/sub-section1/* : 30 mins cache no matter what!
    * If the requester is Googlebot : serve from cache only if the copy in cache is less than 5 mins old, else update the cache.
    * For other users cache URLs ending in .html for 60 mins, the rest for 20 mins

    5) Requests to http://www.mysite.com/sectionb/* :-

    * No cache unless images.
    * Allow access only if the user has a particular cookie, e.g. secret_word=another_word

    6) URLs which are NOT in point 4 or 5 :-

    * If the user has a cookie, e.g. no_cache, then pass direct
    * 5 min cache for http://www.mysite.com/sectionc/*
    * Cache the shit out of everything else for 30 mins

    Special Instructions : If the origin server is unreachable then show the cached result, no matter what. The first cache server is a VPS running Ubuntu server with 128 MB of dedicated non-burstable RAM and 4.3 GB of disk space left; resources can be upgraded on request. Where I have written "no matter what" I don't want the proxy server bothering the origin server at all. The origin and proxies will be located far apart geographically, so the connection between them may not be optimal.

    In case squid allows URL rewriting, I would also like to map, for example :-
    us.mysite.com -> www.mysite.com

    so that the user can access the same stuff by going to www.mysite.com or us.mysite.com

    Also, if URL rewriting is possible in Squid, in the future I'd like to be able to map … http://www.mysite.com/subfolder as http://somesite.com/subfolder and http://www.mysite.com/anotherfolder as http://anothersite.com/anotherfolder

    Also… if squid supports SSL, would it be possible to use https (and install a certificate on the squid) so that the user's connection to and from the proxy is encrypted if needed, while the connection between squid and the origin server is plaintext?
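Before turning the reverse proxy rules into real squid.conf directives, it helps to pin down the "first matching rule wins" order in something executable. The sketch below is Python, not Squid configuration. The cache_policy function and its return values are invented for illustration, cookies are matched by crude substring checks, and the order of the checks inside rule 4 is my reading of the list above.

```python
import re
from urllib.parse import urlsplit

MINUTE = 60
DAY = 24 * 60 * MINUTE

STATIC_EXT = re.compile(r"\.(jpg|gif|css|js|swf|png)$", re.IGNORECASE)
LOGGED_IN = re.compile(r"comment_author_|wordpress|wp-postpass_")
FIXED_30_MIN = {"/urla/", "/urlc.html"}     # the "etc…" URLs are not listed


def cache_policy(method, url, cookie_header="", user_agent=""):
    """Return (action, ttl_seconds); action is 'cache', 'pass' or 'deny'.

    Rules are tried in the order they are written; the first match wins.
    """
    path = urlsplit(url).path

    if STATIC_EXT.search(path):                         # rule 1
        return ("cache", 5 * DAY)
    if method.upper() == "POST":                        # rule 2
        return ("pass", 0)
    if path in FIXED_30_MIN:                            # rule 3
        return ("cache", 30 * MINUTE)
    if path.startswith("/sectiona/"):                   # rule 4
        if LOGGED_IN.search(cookie_header):
            return ("pass", 0)
        if path.startswith("/sectiona/sub-section1/"):
            return ("cache", 30 * MINUTE)
        if "Googlebot" in user_agent:
            return ("cache", 5 * MINUTE)
        return ("cache", (60 if path.endswith(".html") else 20) * MINUTE)
    if path.startswith("/sectionb/"):                   # rule 5
        if "secret_word=another_word" not in cookie_header:
            return ("deny", 0)
        return ("pass", 0)
    if "no_cache" in cookie_header:                     # rule 6
        return ("pass", 0)
    if path.startswith("/sectionc/"):
        return ("cache", 5 * MINUTE)
    return ("cache", 30 * MINUTE)
```

One thing this makes visible: because the rules run strictly in order, an image under /sectionb/ is answered by rule 1 before the cookie check in rule 5 ever runs. Whether that is what I want still needs deciding before writing the real config.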

  1. 7 Responses to “Make your own cheap charlie CDN”

  2. By todd on Jul 23, 2009

    have you had anymore thought on this? or given it a try?

    I’ve got servers in thailand and the US and there’s 2 major issues with hosting sites for users of both countries.

    if we host a site in the US, the connection to the US from thailand is poor and slow. so that’s no good, and if we host a site in thailand it’ll be fast for our majority thai users, but slow for the other 30% who are outside of thailand.

    not just slow for the outside countries trying to access a thai website, super crazy slow. might as well not exist.

    so yeah, keen to know more about your idea, i’m going to be trying to build my own CDN of some sort soon.

    probably servers in thailand and the US, primary sql DB in thailand that’s clustered and replicated to the US, and deployment of sites will be dual deployed with capistrano.

    the main issue is geoip redirecting people, you dont want US users having to hit a thai dns first, so thinking of setting up some dns servers as well in thailand and the US and use dns load balancing with geo ip redirection…

  3. By Sajal Kayan on Aug 9, 2009

    @todd

    Due to lack of time, this post is as far as i could get in building the cdn thingy…

    If your main issue is dns requests, then i dont think making a cdn would help. Even using commercial CDNs wouldnt help much.

    Your best bet would be to have DNS servers only in the US. (The Thai ISPs aggressively cache DNS information, so you should be fine).

    IMHO dont worry much about DNS request time as its usually very fast.. and quite likely only one query for an entire session of browsing…

  4. By todd on Aug 13, 2009

    Hey, yeah, i’m interested in CDN’s but i’m more building a DNS redirection - redirecting traffic to the closest server.

    with ruby and capistrano we can deploy all the content to every server (US, UK, TH) so keeping them all in sync is no issue.

    just making sure traffic outside of thailand doesn’t load from the thai server, and vice versa.

    our other option is a CDN, run the site from a primary US server and load the onsite content from the local content servers. but that’s a bit more painful.

  5. By Mark on Oct 13, 2009

    Would setting up DNS servers in multiple locations assist in the resolution times?

    eg. A local thai DNS server and one in the US? This would allow for localised requesting, however I know its not possible to direct the query first time to the local one.

    I am also looking at some kind of DNS resolver detecting the location of the requesting IP and returning the closest matched server.

    This would also be relatively easy to protect from a localised DOS attack or a digg/slashdotting event.

  6. By Sajal Kayan on Nov 6, 2009

    @Mark

    to build a GeoDNS look at this excellent script here.
    http://www.icez.net/blog/373/geodns-for-bind-9-2

    It makes ACL rules for bind, so u can make it use a diff zone file for diff countries.

    Moreover for having multiple DNS servers one solution to look at would be Anycast.

    If i just have a regular Thai and US DNS .. then it would be a major problem when a US visitor needs to lookup from a Thai DNS… Anycast solves that issue… In Anycast, One IP is mapped to multiple servers in multiple locations and when accessed it connects to the nearest server.

  7. By todd on Jan 11, 2010

    you could use what we’re about to setup - http://comwired.com/

    they will geo your dns, so you can run a thai server for thai users, and then a us / uk servers for your international users.

    either single location your mysql, or use mysql replication. store all your images in the amazon cloud and store your applications locally.

  8. By Sajal Kayan on Jan 11, 2010

    @todd: Interesting. I had been earlier searching for a service like comwired but couldnt find any…

    Now i may end up preferring bind. (Ive had a bad experience with an external DNS provider)
