So I just got a new Web site up and running for this week’s digital history workset, and I thought I’d post a bit about how this stuff all works.
What is this site, exactly? There are several ways of answering that question. First of all, it’s a URL – a Uniform Resource Locator – which is another way of saying it’s got an address. But there’s more to it than that; let’s break apart the URL of one of the site’s pages – http://dighist15.kinias.org/contact – as an example. The first bit, before the colon, is the scheme, http. This is what tells us we’re looking at a Web site, as opposed to some other kind of Internet resource; it specifies the protocol we’re going to use to interact with the resource, which is in this case HTTP, the Hypertext Transfer Protocol. (Encrypted connections use HTTPS as a protocol, yielding https: URLs.) Next we have the hostname portion, //dighist15.kinias.org, which specifies that the site is being served by a machine (a “host”) called dighist15 in the kinias.org domain. Finally, we have the path to the resource we’re requesting, which is the /contact portion of the URL. This says to fetch a file called contact at the top of the site’s file structure (i.e., in the “root” directory).
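If you'd like to see that breakdown done by a program rather than by eye, here is a minimal sketch in Python using the standard library's urllib.parse module. The function name describe_url is my own invention for illustration, and splitting the hostname at its first dot is a simplification that happens to suit this particular address.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three pieces discussed above."""
    parts = urlsplit(url)
    # Simplifying assumption: everything before the first dot is the
    # host's own name, and the rest is the domain it belongs to.
    host, _, domain = (parts.hostname or "").partition(".")
    return {
        "scheme": parts.scheme,      # the protocol, e.g. "http"
        "hostname": parts.hostname,  # the full host name
        "host": host,                # the machine name
        "domain": domain,            # the domain the machine lives in
        "path": parts.path,          # the resource being requested
    }


if __name__ == "__main__":
    for key, value in describe_url("http://dighist15.kinias.org/contact").items():
        print(f"{key}: {value}")
```

Running it prints the scheme, the hostname (along with its host and domain halves), and the path, one per line.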
Nowadays, of course, much of this is virtualized. There isn’t in fact any such machine as dighist15; if you were to check the DNS records for this host, you’d find that it’s simply an alias for another machine – siwa.kinias.org, which is a Dell server running the Debian GNU/Linux operating system and Apache Web server software. Apache allows a single machine to serve multiple Web sites, acting as multiple virtual hosts. In addition to my new dighist15 virtual host, I could also have a hypothetical foo.kinias.org living on the same server, and even a foo.example.com or a quux.foobar.com. Via DNS aliasing, your Web browser’s request for any of these sites would be routed to the same server, but the Apache software would respond by sending you the content assigned to the site you’d requested.
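How does one server know which site you wanted? In HTTP/1.1 the browser names the site in a Host header on every request, and the server can choose its content based on that. The following is only a rough sketch of the idea in Python, not Apache's actual implementation: the directory paths, the fallback site, and the function names are all made up for illustration.

```python
# Hypothetical mapping from site names to content directories.
VIRTUAL_HOSTS = {
    "dighist15.kinias.org": "/var/www/dighist15",
    "foo.kinias.org": "/var/www/foo",
    "foo.example.com": "/var/www/foo-example",
}
DEFAULT_ROOT = "/var/www/default"  # hypothetical fallback site


def build_request(hostname, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def document_root_for(host_header):
    """Choose a content directory from the value of the Host header."""
    # Drop an optional ":port" suffix and ignore letter case.
    # (IPv6 literals are not handled in this sketch.)
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


if __name__ == "__main__":
    print(build_request("dighist15.kinias.org", "/contact"))
    print(document_root_for("dighist15.kinias.org"))
    print(document_root_for("FOO.example.com:80"))
```

Every name in the table would reach the same machine; only the Host line in the request differs, and that alone decides which directory the reply comes from.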
There’s also no file actually called contact in the root directory of the site. On older Web sites you would typically have seen something like http://info.cern.ch/hypertext/WWW/TheProject.html, where the path portion of the URL ended with something-dot-html. That pointed to an HTML file living on the server, and Apache (or its predecessor) would simply grab that file and dump it to your browser over the HTTP connection. In this case, though, the actual file is contact.php, a PHP file that Apache’s mod_php module executes to generate the HTML code which is served up to your browser. On most of today’s more complex Web sites, the pages don’t live in files at all, but are stored as data in a MySQL or similar database, and Apache modules do all kinds of back-end processing to generate the HTML on the fly.
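To make "generating the HTML on the fly" a little more concrete, here is a toy sketch in Python rather than PHP. The PAGES dictionary merely stands in for a database table, and its contents, like the render_page function, are invented for illustration; this is not how my site or any particular content management system is actually built.

```python
from html import escape

# Hypothetical stand-in for rows in a database table of pages.
PAGES = {
    "/contact": {
        "title": "Contact",
        "body": "Questions & comments are welcome.",
    },
}


def render_page(path):
    """Build an HTML document for `path` at request time.

    Returns None when no page is stored under that path.
    """
    record = PAGES.get(path)
    if record is None:
        return None
    title = escape(record["title"])  # escape so stored text cannot break the markup
    body = escape(record["body"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>\n"
    )


if __name__ == "__main__":
    print(render_page("/contact"))
```

No file named contact exists anywhere in this sketch; the HTML comes into being only when somebody asks for the path.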