Caching Dynamic Content from Syndicated Web Sites (2002)

The Web continues to grow steadily: the number of active Web sites doubled during 2001, reaching an estimated 6.5 million. Dynamically generated documents make up an increasing share of Web content and are estimated to account for 85% of the volume of the Web. As providing content over the Web becomes commonplace, performance becomes a key differentiator among providers.

Although caching predates the Internet, it still plays an important role in masking latency and bandwidth limitations. A cache needs to know when even a small part of a dynamically generated page changes, because the cached copy must then be refreshed. The Hypertext Transfer Protocol provides clear guidelines for proxies: caching is allowed unless expressly prohibited by the response headers. Such a prohibition is always in place for dynamic pages, which are always tagged uncacheable because they are so prone to going stale.
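As a rough illustration of the "allowed unless expressly prohibited" rule, the Java sketch below decides whether a shared proxy cache may reuse a response. The class and method names are hypothetical and are not taken from the dissertation. Header names are assumed to be lower-cased already, and treating `no-cache` as uncacheable is a conservative simplification rather than the full HTTP semantics.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: a deliberately simplified cacheability check for a shared proxy cache. */
final class CacheabilityCheck {

    /**
     * Returns true unless the headers expressly forbid a shared cache from reusing the response.
     * Assumes header names are lower-cased. A sketch only, not a complete HTTP implementation.
     */
    static boolean isCacheableByProxy(Map<String, String> headers) {
        String cacheControl = headers.getOrDefault("cache-control", "").toLowerCase(Locale.ROOT);
        String pragma = headers.getOrDefault("pragma", "").toLowerCase(Locale.ROOT);

        // Split "no-cache, max-age=0" into directive names and look for a prohibition.
        boolean forbidden = Arrays.stream(cacheControl.split(","))
                .map(String::trim)
                .map(d -> d.contains("=") ? d.substring(0, d.indexOf('=')) : d)
                .anyMatch(d -> d.equals("no-store") || d.equals("no-cache") || d.equals("private"));

        // "Pragma: no-cache" is the older HTTP/1.0 way of expressing the same prohibition.
        return !forbidden && !pragma.contains("no-cache");
    }
}
```

A dynamic page served with `Cache-Control: no-cache` would be rejected by this check as a whole, which is the limitation that caching individual components is meant to work around.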

The solution implemented in this dissertation treats dynamic pages as collections of autonomous components and dissects the HTML into such components, which may then be cached separately. This concept also lays the groundwork for a strong personalisation framework: users may specify the components they wish to view, and pages with this content may be assembled on the fly using templates. The entire system is proposed as a plug-in to existing Web sites that publish dynamic content and want to improve their performance, and possibly introduce personalisation functionality.
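The assembly step can be pictured with a minimal Java sketch. It assumes that each component has already been extracted as an HTML fragment keyed by an identifier, and that the template marks its insertion point with the comment `<!--COMPONENTS-->`. The marker, the class name and the method signature are all illustrative assumptions, not the dissertation's actual design.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical assembler: builds a personalised page from separately cached fragments. */
final class PageAssembler {

    /** Joins the fragments the user selected, in order, and places them in the template's slot. */
    static String assemble(String template, Map<String, String> fragmentsById, List<String> selectedIds) {
        StringBuilder body = new StringBuilder();
        for (String id : selectedIds) {
            String fragment = fragmentsById.get(id);
            if (fragment != null) {
                // Components missing from the map are skipped here;
                // a real system would presumably fetch them from the origin site.
                body.append(fragment).append('\n');
            }
        }
        // String.replace performs a literal (non-regex) substitution of the marker.
        return template.replace("<!--COMPONENTS-->", body.toString());
    }
}
```

Because the user's selection is just a list of identifiers, personalisation in this sketch amounts to choosing which cached fragments to pass in, and in what order.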

Administrators assist the page dissection process by highlighting the components on a typical page and providing information about their refresh rate. An important difference between this solution and others that also make use of partial-page caching is that no input or special markup is required from the page developers.
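One way to picture how an administrator-supplied refresh rate might drive the cache is a per-component expiry time. The Java sketch below is an assumption-laden illustration: the class name, the use of seconds as the unit, and the explicit `nowMillis` parameter (passed in to keep the example deterministic) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical fragment cache in which every component carries its own refresh interval. */
final class ComponentCache {

    private static final class Entry {
        final String html;
        final long expiresAtMillis;

        Entry(String html, long expiresAtMillis) {
            this.html = html;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Stores a fragment that stays fresh for refreshSeconds, counted from nowMillis. */
    void put(String componentId, String html, long refreshSeconds, long nowMillis) {
        entries.put(componentId, new Entry(html, nowMillis + refreshSeconds * 1000L));
    }

    /** Returns the fragment if it is still fresh; otherwise evicts it and returns empty. */
    Optional<String> get(String componentId, long nowMillis) {
        Entry entry = entries.get(componentId);
        if (entry == null || nowMillis >= entry.expiresAtMillis) {
            entries.remove(componentId);
            return Optional.empty();
        }
        return Optional.of(entry.html);
    }
}
```

With this arrangement a slowly changing component, such as a navigation bar, can be given a long interval while a headline list is refreshed every minute, even though both come from the same page.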

This dissertation explains the architecture behind the system and its implementation using a Java platform. All data needed by the system is stored in XML to provide cross-platform compatibility and interoperability.
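The abstract does not describe the XML schema, so the following Java sketch invents one purely for illustration: a list of `<component>` elements, each with an `id` and a `refreshSeconds` attribute. It reads such a document with the standard JAXP DOM parser. The element names, attribute names and the `ComponentConfig` class are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for an invented component-descriptor format. */
final class ComponentConfig {

    /** Parses component elements into a map from component id to refresh interval in seconds. */
    static Map<String, Long> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("component");
            Map<String, Long> refreshById = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                refreshById.put(element.getAttribute("id"),
                        Long.parseLong(element.getAttribute("refreshSeconds")));
            }
            return refreshById;
        } catch (Exception ex) {
            // Malformed XML or a non-numeric interval is reported as a bad descriptor.
            throw new IllegalArgumentException("Invalid component descriptor", ex);
        }
    }
}
```

A descriptor such as `<page><component id="headlines" refreshSeconds="300"/></page>` would yield a single entry mapping `headlines` to 300.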

Go to Citation on CiteSeer

