The architecture of the internet is complex. Topologically, it’s a robust, redundant network of interconnected servers, routers, switches, and endpoints. The whole system exists to serve one essential purpose: moving data from point to point as quickly as possible.
The public face of the internet, called the World Wide Web, has experienced a tremendous spike in usage over the last twenty years. Only an estimated 16 million people had access to the web in 1995, compared to 3.8 billion today. As traffic increased, the burden on internet systems grew as well. Several technologies were employed to keep pace with the demand, but one that often goes unnoticed is the widespread use of a concept called web caching.
For those who aren’t familiar with the term, web caching refers to the storage of website data on local systems, or in an otherwise decentralized fashion, to increase page delivery efficiency. Over the years, a few variations evolved as the types of web content have grown and changed.
Initial web caching systems existed on local networks or exclusively in the browser and were intended to combat the glacially slow connection speeds that used to be the norm. As broadband access became more commonplace and web content became more complex, distributed caching via geolocated internet nodes began to see wider use. There are several similarities between these approaches, but there are key differences as well. To clarify, here’s a rundown of web caching technologies and how they work.
Browser Caching
When users around the world first started connecting to the World Wide Web, most of them reached the internet via dial-up modems. The limitations of that technology meant that slow data throughput was a universal impediment to designing complex or multimedia-rich websites. To mitigate the issue, web browsers were built with a data cache that stores website content locally on users’ computers.
The system is designed to interpret and act upon the HTTP response headers that a website sends, which tell each browser what to store on the local hard drive and how long it should remain valid. The time between validity checks (when the browser checks in with the web server to see if there’s been an update) depends on the type of cached content.
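As a rough illustration of that header-driven decision, here is a minimal Python sketch. It assumes the server sends a Cache-Control header with a max-age directive, which is one common way of stating how long content stays valid. The function names parse_max_age and is_fresh are made up for this example, and real browsers weigh several other directives and headers as well.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value (in seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip()
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(stored_at: float, max_age: Optional[int], now: float) -> bool:
    """Decide whether a cached copy can be used without asking the server again.

    stored_at and now are timestamps in seconds. Without a max-age we
    conservatively treat the copy as stale (a simplification for this sketch).
    """
    if max_age is None:
        return False
    return (now - stored_at) < max_age


# Hypothetical header value for a stylesheet that may be cached for one hour.
max_age = parse_max_age("public, max-age=3600")
can_reuse = is_fresh(stored_at=0.0, max_age=max_age, now=120.0)
```

When is_fresh returns False, the browser would go back to the web server to check for an update instead of using the local copy.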
Local Caching Proxy Servers
For large networks, a more centralized approach to web caching is sometimes preferred. Caching proxy servers were developed for those applications. They function like local browser caches, but with a twist: a single cache is shared by every machine on the network.
The major advantage of a caching proxy server is that it prevents duplicative requests and bandwidth usage across entire local networks. Since most local area networks consist of machine interconnections that are orders of magnitude faster than the network’s connection to the internet, serving the most requested web content from a local server makes for much faster page loading times.
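The deduplication effect can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real proxy: the CachingProxy class, the fake_origin function, and the example URL are all hypothetical, and the cache never expires anything.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy shared cache: every client on the network goes through one instance."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times we used the slow internet link

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: only now do we spend bandwidth on the internet connection.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_origin(url: str) -> bytes:
    """Stand-in for a real web server (an assumption for this example)."""
    return f"content of {url}".encode()


proxy = CachingProxy(fake_origin)
for _client in range(3):
    # Three different machines request the same image through the proxy.
    proxy.get("http://example.com/logo.png")
```

After the loop, proxy.origin_requests is 1: the first request crossed the internet link, and the other two were served from the local cache.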
Content Delivery Networks
As websites grew larger and started to include rich multimedia content, the concept of web caching was expanded to a global level. Companies began to utilize caching servers that were set up based upon physical location. Keeping these servers close to the largest clusters of users allows the cached content that they store to be delivered faster than it would have been via a round trip to and from the original web server. These large deployments of caching servers formed what are now referred to as content delivery networks (CDNs).
In a way, CDN servers operate in a similar fashion to local network caching proxy servers. They store the most requested website data at locations that are closer along the network path to the end user. The data that they store, however, is determined solely by the origin web server itself, much like in a local browser cache.
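To make the idea of "closer to the end user" concrete, the following Python sketch picks the nearest caching server by great-circle distance. This is only an approximation for illustration: real CDNs typically route by network conditions rather than pure geography, and the server names and coordinates below are invented for the example.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user: Coord, edges: Dict[str, Coord]) -> str:
    """Return the name of the caching server geographically closest to the user."""
    return min(edges, key=lambda name: haversine_km(user, edges[name]))


# Hypothetical caching server locations (approximate coordinates).
EDGES: Dict[str, Coord] = {
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
    "singapore": (1.35, 103.82),
}

chosen = nearest_edge((48.86, 2.35), EDGES)  # a user near Paris
```

A user near Paris is matched with the Frankfurt server here, so the cached content travels a few hundred kilometers instead of crossing an ocean.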
The other major benefit that CDN caching technology provides is resiliency against denial of service attacks. Websites that utilize CDN services deliver content to users in a decentralized manner, and user requests are very rarely allowed to reach an origin server directly.
The Backbone of the Web
The modern World Wide Web couldn’t exist without the caching technologies outlined above. Overall internet traffic continues to increase, and the demands on web servers are rising in proportion to that traffic. The functionality of the internet has become dependent on web caching services that are deployed locally and at a global level.
It’s impossible to estimate how much traffic web caching saves annually, but the scale of its use means the quantity must be vast. Since the internet has become a vital and valuable part of the global economy as well as a fixture in the daily lives of users everywhere, it’s not a stretch to say that web caching technology is the backbone of today’s World Wide Web, and will be for years to come.