Using PHP and Apache, you can turn your "Page Not Found" messages into more than bland error reports. You can serve an alternate page based on the name of the page that was not found, create a page on the fly from a database, or send an email about the missing page to a webmaster.
Building a custom error page with PHP and Apache requires two steps. You need to tell Apache to run a PHP program when it encounters a 404 ("Page Not Found") error. And you need to write the corresponding program that takes the appropriate action.
Configuring Apache
To tell Apache what to do on a 404 error, use the ErrorDocument directive:
ErrorDocument 404 /error-404.php
This tells Apache to serve up error-404.php in the document root directory when it encounters a 404 error. The ErrorDocument directive can go in Apache's httpd.conf file, but it also works in .htaccess files in individual directories (provided AllowOverride permits FileInfo overrides there). You can have a site-wide error-handling page or different error-handling pages for different parts of your site. Apache also sets some server variables that the error-handling page can access:
REDIRECT_URL: the URL-path that was not found. If a user asks for the nonexistent page http://www.example.com/lunch/pastrami.html, for example, this variable is set to /lunch/pastrami.html.
REDIRECT_STATUS: the HTTP response status resulting from the request for the original page. In our case, this is always "404". You can use ErrorDocument with other status codes, though, so if you have one error-handling page for multiple statuses, you can use this variable to determine which error status caused the error-handling page to be loaded.
REDIRECT_ERROR_NOTES: a brief description of what went wrong, for example, "File does not exist: /usr/local/apache/docroot/lunch/pastrami.html".
REDIRECT_REQUEST_METHOD: the method of the request for the original page, such as GET or POST.
If there is a query string in the original request, it is stored in REDIRECT_QUERY_STRING. The error page does not have access to the GET or POST variables via $_GET, $_POST, or $_REQUEST, but cookie variables are still available in $_COOKIE.
These REDIRECT variables are available in the PHP superglobal array $_SERVER: $_SERVER['REDIRECT_URL'], $_SERVER['REDIRECT_STATUS'], and so forth.
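Because one handler can be wired to several status codes, it can help to read all of these values in one place. The following is a minimal sketch, assuming PHP 7 or later; redirect_info() is a hypothetical helper, and the sample array stands in for $_SERVER.

```php
<?php
// Collect the REDIRECT_* values Apache passes to an ErrorDocument handler.
// $server is normally $_SERVER; missing entries fall back to empty defaults.
function redirect_info(array $server): array
{
    return [
        'url'    => $server['REDIRECT_URL'] ?? '',
        // REDIRECT_STATUS arrives as a string such as "404"
        'status' => (int) ($server['REDIRECT_STATUS'] ?? 0),
        'method' => $server['REDIRECT_REQUEST_METHOD'] ?? '',
        'query'  => $server['REDIRECT_QUERY_STRING'] ?? '',
        'notes'  => $server['REDIRECT_ERROR_NOTES'] ?? '',
    ];
}

// Sample values modeled on the pastrami.html example above.
$info = redirect_info([
    'REDIRECT_URL'            => '/lunch/pastrami.html',
    'REDIRECT_STATUS'         => '404',
    'REDIRECT_REQUEST_METHOD' => 'GET',
]);
```

A handler shared by several statuses could then branch on $info['status'] (for example, 404 versus 403) before deciding what to send.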
Taking Action
The information in the REDIRECT variables can be used to do many different things in response to a request for a nonexistent page. If your site has been recently reorganized, you can transparently redirect users to the new URL that corresponds to a particular old URL:
<?php
// old URL-path => new URL-path
$map = array('/old/1' => '/new/2.html',
             '/old/2' => '/new/3.html');
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    // keep any GET variables from the original request
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
A redirect response needs to include the query string in the redirect URL if the query string was present in the original request. When PHP sends a Location header it also sends a 302 status, and browsers follow a 302 redirect with a GET request. Including the query string preserves any GET variables from the original request, but POST data is lost.
Additionally, the protocol and host name need to be at the beginning of the redirect URL sent with the Location header. This example hardcodes "http" as the protocol and gets the host name from the HTTP_HOST server variable. To work transparently under https as well as http, your code should test for the presence of $_SERVER['HTTPS']. If this variable is set to "on", then the protocol should be "https" instead of "http".
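Putting the last two paragraphs together, a handler can build the redirect URL in one function that picks the scheme from $_SERVER['HTTPS'] and carries over the original query string. This is a sketch under those assumptions; build_redirect_url() is a hypothetical name, and the HTTPS test treats only the value "on" (in any letter case) as a secure request.

```php
<?php
// Build an absolute redirect URL for the error handler.
// $server is normally $_SERVER; $path is the new URL-path, e.g. '/new/2.html'.
function build_redirect_url(array $server, string $path): string
{
    // use https when the original request arrived over SSL
    $scheme = (isset($server['HTTPS']) && strtolower($server['HTTPS']) === 'on')
        ? 'https' : 'http';
    $url = $scheme . '://' . $server['HTTP_HOST'] . $path;
    // carry over any GET variables from the original request
    if (isset($server['REDIRECT_QUERY_STRING'])
        && $server['REDIRECT_QUERY_STRING'] !== '') {
        $url .= '?' . $server['REDIRECT_QUERY_STRING'];
    }
    return $url;
}

// In error-404.php:
// header('Location: ' . build_redirect_url($_SERVER, '/new/2.html'));
```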
Basic redirection could also be accomplished with a list of Apache Redirect or RedirectMatch directives, but you can construct more complicated expressions in PHP. You can easily redirect multiple old URLs to the same new URL:
<?php
// each new URL is listed with all of the old URLs that should redirect to it
$rev_map = array('/new.html' => array('/old-1.html',
                                      '/old-2.html',
                                      '/old-3.html'));
// flip it into an old URL => new URL lookup table
$map = array();
foreach ($rev_map as $new => $ar) {
    foreach ($ar as $old) {
        $map[$old] = $new;
    }
}
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
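The nested foreach loops can also be written with array_fill_keys(), which builds an array whose keys are the old URLs and whose values are all the same new URL. A minimal sketch; invert_redirect_map() and the sample URLs (including '/other.html') are hypothetical.

```php
<?php
// Turn a "new URL => list of old URLs" table into the flat
// "old URL => new URL" lookup table the error handler consults.
function invert_redirect_map(array $rev_map): array
{
    $map = [];
    foreach ($rev_map as $new => $olds) {
        // every old URL in the list maps to the same target;
        // + keeps an existing key, so the first target listed wins
        $map += array_fill_keys($olds, $new);
    }
    return $map;
}

$map = invert_redirect_map([
    '/new.html'   => ['/old-1.html', '/old-2.html', '/old-3.html'],
    '/other.html' => ['/old-4.html'],
]);
```

One design difference: because of the + operator, an old URL listed under two targets keeps the first one, whereas the loop version above keeps the last.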
You can look up the new URLs to which the old ones map in a database:
<?php
// connect to the database (DSN, user name, and password are placeholders)
$db = new PDO('mysql:host=localhost;dbname=example', 'user', 'password');
// a prepared statement keeps the URL from being interpreted as SQL
$st = $db->prepare('SELECT new FROM pages WHERE old = ?');
$st->execute(array($_SERVER['REDIRECT_URL']));
$new = $st->fetchColumn();
if ($new !== false) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] . $new;
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
If you need values from the original query string to determine the new URL, parse $_SERVER['REDIRECT_QUERY_STRING'] into variables with parse_str(). If $_SERVER['REDIRECT_QUERY_STRING'] is artist=weird+al&album=dare+to+be+stupid, then parse_str($_SERVER['REDIRECT_QUERY_STRING'], $vars) sets $vars['artist'] to "weird al" and $vars['album'] to "dare to be stupid".
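A short sketch of that step: the query string is the one from the text, while the /music/... URL layout it is turned into is purely hypothetical. rawurlencode() re-encodes the decoded values so that spaces become %20 in the new path.

```php
<?php
// Sample query string from the text; in the handler this would be
// $_SERVER['REDIRECT_QUERY_STRING'].
$query = 'artist=weird+al&album=dare+to+be+stupid';

// parse_str() decodes the query string ("+" becomes a space) into $vars.
parse_str($query, $vars);

// Hypothetical URL scheme: build a new path from the decoded values.
$new_path = '/music/' . rawurlencode($vars['artist']) . '/'
          . rawurlencode($vars['album']) . '.html';
```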
You can even use the error document to make a simple caching system. If a page isn't found, get its contents from your database and write them to disk. Then, redirect the user to the same URL they just asked for. Since the page now exists, they'll get it, and not the error page:
<?php
// connect to the database (DSN, user name, and password are placeholders)
$db = new PDO('mysql:host=localhost;dbname=example', 'user', 'password');
// look for the page in the database
$st = $db->prepare('SELECT page FROM pages WHERE url = ?');
$st->execute(array($_SERVER['REDIRECT_URL']));
$page = $st->fetchColumn();
if ($page !== false) {
    if ($fp = fopen($_SERVER['DOCUMENT_ROOT'] .
                    $_SERVER['REDIRECT_URL'],'w')) {
        // write the page to disk
        fwrite($fp,$page);
        fclose($fp);
        // send the user back to the same URL
        $new_loc = 'http://' .
                   $_SERVER['HTTP_HOST'] .
                   $_SERVER['REDIRECT_URL'];
        if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
            $new_loc .= '?' .
                        $_SERVER['REDIRECT_QUERY_STRING'];
        }
        header("Location: $new_loc");
    } else {
        // couldn't generate the page
        print "This page is really not found.";
    }
} else {
    // couldn't find the page in the database
    print "This page is really not found.";
}
?>
In this example, the entire contents of a page are stored in the page column of the pages table and are written to a file with fwrite(). You could do more interesting or complicated things when generating a page, such as pulling multiple pieces of the page from different places or populating a template with dynamic data. However you generate the page, publishing a new version of it is easy. Just update the database and delete the file from disk. The next time a user asks for that page, it won't be found. The error-handling page will load the updated page (or its components) from the database and write the new version to a file.
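Publishing a change then comes down to deleting the cached file after the database row is updated. A minimal sketch of the deletion half, assuming the cache mirrors URL-paths under the document root as in the example above; expire_cached_page() is hypothetical, and $url should be a URL-path you trust, not raw user input.

```php
<?php
// Remove the cached copy of a page so the next request for it triggers
// the 404 handler, which rebuilds the file from the database.
// $docroot is normally $_SERVER['DOCUMENT_ROOT'];
// $url is a URL-path such as '/lunch/pastrami.html'.
function expire_cached_page(string $docroot, string $url): bool
{
    $path = rtrim($docroot, '/') . $url;
    // nothing to do if the page was never cached or was already removed
    if (!is_file($path)) {
        return false;
    }
    return unlink($path);
}
```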
If you're sending a user to a new PHP page, it's important to use a redirect instead of just loading the page with include(). The error page doesn't have GET or POST variables set, and some server variables are different (for example, $_SERVER['PHP_SELF'] points to the error page, not the original URL). If you're sending the user to a static page, however, including content without a redirect can be useful. For example, you can use an error-handling page to provide access to a library of files without keeping the files under the web server document root.
If this error-handling page is set up for the root directory of http://www.example.com/, asking for http://www.example.com/EatIt sends you the file /usr/local/songs/e/eatit.mp3, if that file exists. Checking whether the output of realpath() begins with $file_root (the directory that holds the song files) prevents a user from passing directory-changing strings like "/../" in the URL. If a file is found, the page sends the right status code and headers to tell the user that they're getting an MP3 file and then sends the contents of the song file.
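A minimal sketch of such a handler, assuming song files are stored lowercase in one-letter subdirectories of the library root, as in the /EatIt to /usr/local/songs/e/eatit.mp3 example. The function names song_path() and send_song() are hypothetical; the realpath() prefix check is the safeguard the paragraph describes.

```php
<?php
// Map a request such as "/EatIt" to a file like <root>/e/eatit.mp3.
// Assumed layout: lowercase file names grouped in one-letter subdirectories.
function song_path(string $file_root, string $redirect_url): ?string
{
    $name = strtolower(trim($redirect_url, '/'));
    $root = realpath($file_root);
    if ($name === '' || $root === false) {
        return null;
    }
    $root .= DIRECTORY_SEPARATOR;
    // realpath() resolves "." and ".." and returns false if the file is missing
    $path = realpath($root . $name[0] . DIRECTORY_SEPARATOR . $name . '.mp3');
    // refuse anything that resolves outside the library directory
    if ($path === false || strncmp($path, $root, strlen($root)) !== 0) {
        return null;
    }
    return $path;
}

// Replace the 404 status with 200 and send the file as MP3 audio.
function send_song(string $path): void
{
    http_response_code(200);
    header('Content-Type: audio/mpeg');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Hypothetical wiring in error-404.php:
// if ($path = song_path('/usr/local/songs', $_SERVER['REDIRECT_URL'])) {
//     send_song($path);
// } else {
//     print "This page is really not found.";
// }
```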
The error-handling page doesn't just have to find a new page to send to users. It can notify the webmaster that a page is missing. You can use this to find out if your own site has bad links to itself:
// only check requests that arrived with a Referer header;
// preg_quote() makes the dots in the host name match literally
if (isset($_SERVER['HTTP_REFERER']) &&
    preg_match('{^http(s)?://'.preg_quote($_SERVER['HTTP_HOST']).'}',
               $_SERVER['HTTP_REFERER'])) {
    ob_start();
    print_r($_SERVER);
    $data = ob_get_contents();
    ob_end_clean();
    mail($_SERVER['SERVER_ADMIN'],
         'Page Not Found: '.$_SERVER['REDIRECT_URL'],
         $data);
}
The preg_match() statement finds referrer URLs on the same host as the current request by comparing the beginning of the referring URL to $_SERVER['HTTP_HOST']. If they match, the output of print_r($_SERVER) is captured in $data using output buffering:
ob_start() tells PHP to capture output in a buffer instead of printing it.
ob_get_contents() returns the contents of that buffer.
ob_end_clean() turns off output buffering without printing the buffer.
The mail() function sends a message to the server administrator. The body of the message (all the $_SERVER variables in $data) contains the referring URL and other information that you can use to fix the page with the bad link.
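The regular expression only compares the start of the referring URL with the host name. An alternative, sketched below, lets parse_url() split the referrer and compares host names exactly, which also rejects look-alike hosts such as www.example.com.evil.test. is_internal_referrer() is a hypothetical helper, and the sketch assumes HTTP_HOST may include a ":port" suffix.

```php
<?php
// Return true when $referer points at the same host (and port) as $host.
// $host is normally $_SERVER['HTTP_HOST'], which may include ":port".
function is_internal_referrer(?string $referer, string $host): bool
{
    if ($referer === null || $referer === '') {
        return false; // direct request, or the browser sent no Referer
    }
    $parts = parse_url($referer);
    if ($parts === false || !isset($parts['host'], $parts['scheme'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    $ref_host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // host names are case-insensitive
    return strcasecmp($ref_host, $host) === 0;
}
```

In the handler this could be called as is_internal_referrer($_SERVER['HTTP_REFERER'] ?? null, $_SERVER['HTTP_HOST']). Note also that print_r() accepts a second argument, true, which makes it return its output as a string; print_r($_SERVER, true) is an alternative to the three output-buffering calls.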
More Information
Documentation for Apache custom error responses is at http://httpd.apache.org/docs/custom-error.html. The www.php.net site uses a custom error response to turn handy shortcut URLs like http://www.php.net/xml into the correct URL for the XML section of the manual. You can see the source code to it at http://cvs.php.net/co.php/phpweb/error/index.php?r=HEAD.