Z. Wagner - Content negotiation in PHP scripts

The page on content negotiation explains the basic principles and links to further information. Here we look at how, and for what purpose, content negotiation can be used with pages generated dynamically by PHP. There are two possibilities:

  1. Internationalization of PHP scripts
  2. Simulation of content negotiation when the server does not support it or the server administrator has not allowed it.

This document describes both methods. We deal only with PHP, but a similar approach can be used with other scripting languages such as ASP, Perl, Python or Ruby.

1. Internationalization of PHP scripts

A PHP script is usually a mixture of program instructions, using both standard and user-defined functions, and HTML code. Most of it is therefore common to all language versions. We do not want to maintain a complete separate file for every language, distinguished by extension, as we did for static pages: copying the code from file to file would easily lead to errors and inconsistencies. We still want files distinguished by extension so that content negotiation with MultiViews works, but we want them to share the program code.

The PHP script writes text into the WWW pages by means of echo and printf. The strings passed to them have different values in each language version. A monolingual version uses string literals; we will instead use variables or constants defined with the define function. The program code then lives in a common file. Each language version only defines the language-dependent constants and variables and then reads the common file with require. The multilingual text can look like this:

Czech file named file.cs.php:

<?php
define('_title_', 'Sjednání obsahu');
require 'incl.file.php';
?>

English file named file.en.php:

<?php
define('_title_', 'Content negotiation');
require 'incl.file.php';
?>

Common file named incl.file.php:

<?php
printf('<title>%s</title>', _title_);
?>

The corresponding page will be referred to just as file without any extension. Apache will then select the appropriate variant.

Finally we must prohibit direct access to the common files because they do not work on their own. This is achieved by the following directive in the configuration file:

<FilesMatch "incl\.">
  Order allow,deny
  Deny from all
</FilesMatch>

Notice that incl. appears at the beginning of the name. If the common file were called file.incl.php and access were prohibited by a similar FilesMatch directive, then Apache, when looking for variants of file, would write into the error log that access to this file was denied by the server configuration.

Now consider the case when we wish to insert two large blocks of HTML code. We create two separate files, file1.cs.html and file2.cs.html, for the Czech version, and similarly file1.en.html and file2.en.html for the English version. If we put virtual('file1'); and virtual('file2'); in the proper places, Apache inserts the correct files by means of content negotiation. Unfortunately, this does not always work. Recall the case where no acceptable language variant was found for the URL called file and the user selected the language version explicitly. Apache will react in exactly the same way to the request made by virtual('file1'); and offer a list of useless links. The file for each language should therefore define a string constant called _ext_ containing the correct extension, and the HTML code will then be inserted by virtual('file1' . _ext_);.
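
The pieces fit together roughly as follows. This is only a sketch: the value '.cs.html' for _ext_ and the helper name insertPart are assumptions made for illustration, and because virtual() exists only when PHP runs as an Apache module, the call is guarded.

```php
<?php
// In the Czech wrapper (file.cs.php) this would be '.cs.html',
// in the English wrapper (file.en.php) '.en.html'. Assumed values.
define('_ext_', '.cs.html');

// Hypothetical helper: build the explicit variant name and let Apache insert it.
function insertPart(string $base): string {
    $name = $base . _ext_;
    // virtual() is available only under the Apache module.
    if (function_exists('virtual')) {
        virtual($name);
    }
    return $name;
}

insertPart('file1');   // asks for file1.cs.html, so no negotiation is needed
```

Since the name passed to virtual() already carries the full extension, Apache serves that file directly and the list of variants can never appear in the middle of the page.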

Content negotiation in PHP scripts is used in the MemoDisx program. You can try its demo version, or download the MemoDisx distribution and look at the source code to see how it is done. MemoDisx makes use of frames and is entered via a main page with the following content:

<!--#include virtual="html-head.html" -->
<!--#if expr="$DOCUMENT_URI != /\/index$/" -->
<body>
<p class="error">Please add "English" to the language preferences of your browser and then
<a target="_top" href="./">try again</a>, otherwise MemoDisx will not work properly.</p>

<p>More details can be found in the description of <a
href="http://hroch486.icpf.cas.cz/wagner/content-negotiation">Content Negotiation</a>.</p>

</body>
<!--#else -->
<frameset cols="15%,*">
  <!--#include virtual="noframes" -->
  <frame name="left" src="menu">
  <frame name="right" src="main">
</frameset>
<!--#endif -->
</html>

The variable $DOCUMENT_URI contains the requested document name. If the name has no extension, the language version was selected according to the preferences set in the browser. If the name has an extension, we know that the user selected the language explicitly, and handling that case would make the whole program more complex. MemoDisx, however, is an application intended for repeated use, so we can afford to ask the user to spend a little more time configuring her WWW browser.

2. Simulation of content negotiation

This document is not a programmer's toolkit that you can take and use directly; its aim is to explain the method. You can, of course, extract the code examples, put them into files and get a working solution. You will have to replace character entities with the corresponding characters. To avoid doing this by hand, you can use a simple Perl script (which you first have to edit manually):

#!perl
while (<STDIN>) {
  # Decode &amp; last so that "&amp;lt;" becomes "&lt;" and is not decoded twice.
  s/&lt;/</g;  s/&gt;/>/g;  s/&amp;/&/g;  print;
}

The examples assume that global variables are registered (register_globals), so that $HTTP_ACCEPT_LANGUAGE and $REQUEST_URI can be used directly. If globals are not registered, use $_SERVER['HTTP_ACCEPT_LANGUAGE'] and $_SERVER['REQUEST_URI'] instead. This is always the case from PHP 5.4 on, where register_globals was removed.
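
When globals are not registered, the two variables can be filled from $_SERVER at the top of the script so that the rest of the examples can be read unchanged. A minimal sketch; the helper name serverVar is ours and is not part of the original scripts:

```php
<?php
// Hypothetical helper: fetch a request variable from $_SERVER,
// falling back to a default when the server did not set it.
function serverVar(string $name, string $default = ''): string {
    return isset($_SERVER[$name]) ? (string) $_SERVER[$name] : $default;
}

// Populate the two names used throughout the examples.
$HTTP_ACCEPT_LANGUAGE = serverVar('HTTP_ACCEPT_LANGUAGE');
$REQUEST_URI          = serverVar('REQUEST_URI');
```

Functions that read these variables still have to declare them with global, as LangLinks does for $REQUEST_URI.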

All links should point to the main file filename.php, which chooses the correct language variant according to the request received from the WWW browser. The names of the files with the language variants do not carry several extensions. They have the form filename-XXX_YYY.EXT, where XXX and YYY specify the language and encoding and EXT is the extension. In the examples shown here the extension can be either php or html. The structure of the main file is simple:

<?php
require 'engine.php';
if ($inc) {
  include $fn;
} else {
  virtual($fn);
}
exit;
?>

The action is executed in file engine.php. It returns the name of the file containing the requested language in variable $fn. If $inc is set, the file is a PHP script which should be processed by the include command, otherwise we use the virtual function.

File engine.php contains code that is executed directly. It uses functions defined in functions.php, so that file must first be included with require. Neither of these files may be requested directly, so we put the following lines at the beginning of functions.php:

$bsn = basename($REQUEST_URI);
if ($bsn == 'engine.php' || $bsn == 'functions.php') die('Access not allowed!');
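
One caveat: basename() keeps any query string, so a request such as engine.php?x=1 would slip past this comparison. A possible variation, not part of the original scripts, compares only the path component obtained with parse_url(). The function name and the way the protected names are passed in are illustrative:

```php
<?php
// Hypothetical variation of the guard: compare only the path component.
function isProtectedRequest(string $uri, array $protected): bool {
    // '/dir/engine.php?x=1' -> '/dir/engine.php'
    $path = (string) parse_url($uri, PHP_URL_PATH);
    return in_array(basename($path), $protected, true);
}

$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
if (isProtectedRequest($uri, array('engine.php', 'functions.php'))) {
    die('Access not allowed!');
}
```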

The program code in engine.php first finds the list of files. Their names are matched against a regular expression generated by the script.

# Find the list of the files, $re evaluated from $REQUEST_URI
$rq = $REQUEST_URI;
if (substr($rq, -1) == '/') $rq .= 'index.php';
$re = '/^' . substr(basename($rq), 0, -4) . '-(.*)\./';
$flist = array();
$dirhandle = opendir('.');
while ($fn = readdir($dirhandle)) {
  if (preg_match($re, $fn, $m)) {
    $f = array('fn' => $fn, 'ext' => explode('_', $m[1]));
    $flist[] = $f;
  }
}
closedir($dirhandle);
if ($flist) sort($flist);
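
To see what the pattern and the resulting list look like, the same steps can be run on a fixed list of names instead of a real directory. This is a sketch under assumptions: the helper names and the sample file names are invented, and preg_quote() is an addition that the code above does not use.

```php
<?php
// Hypothetical helper: build the pattern that engine.php derives from the request.
function variantPattern(string $requestUri): string {
    if (substr($requestUri, -1) == '/') $requestUri .= 'index.php';
    $stem = substr(basename($requestUri), 0, -4);      // drop '.php'
    // preg_quote() is an addition here; the original inserts the stem unescaped.
    return '/^' . preg_quote($stem, '/') . '-(.*)\./';
}

// Hypothetical helper: apply the pattern to a list of names instead of readdir().
function collectVariants(string $pattern, array $names): array {
    $flist = array();
    foreach ($names as $fn) {
        if (preg_match($pattern, $fn, $m)) {
            $flist[] = array('fn' => $fn, 'ext' => explode('_', $m[1]));
        }
    }
    sort($flist);
    return $flist;
}

$names = array('index.php', 'index-en.html', 'index-cs_iso2.html', 'other-en.html');
$flist = collectVariants(variantPattern('/docs/'), $names);
// $flist[0] is index-cs_iso2.html with ext ['cs', 'iso2'],
// $flist[1] is index-en.html with ext ['en'].
```

The request for the directory /docs/ is treated as a request for index.php, so the pattern is /^index-(.*)\./ and only the two index variants are kept.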

Now we build the list of acceptable languages. If the browser does not supply one, we use our own default, much as the Apache LanguagePriority directive does.

# Get the list of acceptable languages
unset($acclang);
if ($HTTP_ACCEPT_LANGUAGE) {
  $acclang = explode(',', $HTTP_ACCEPT_LANGUAGE);
  for ($i = 0; $i < count($acclang);  $i++) {
    $L = explode(';', $acclang[$i]);
    $acclang[$i] = trim($L[0]);
  }
} else $acclang = array('en', 'cs');
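
The effect of this loop is easiest to see on a concrete header value. The following sketch wraps the same logic in a function; the function name and the sample header are assumptions made for illustration.

```php
<?php
// Hypothetical helper wrapping the same logic: keep the language tags,
// drop the ";q=..." parts, fall back to a default list for an empty header.
function acceptedLanguages(string $header, array $default = array('en', 'cs')): array {
    if ($header === '') return $default;
    $tags = array();
    foreach (explode(',', $header) as $item) {
        $parts = explode(';', $item);
        $tags[] = trim($parts[0]);
    }
    return $tags;
}

$acclang = acceptedLanguages('cs, en-US;q=0.8, en;q=0.5');
// $acclang is now ['cs', 'en-US', 'en'].
```

Note that the q-values are discarded, so it is the order in which the browser lists the languages that decides.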

The user may request a specific language variant (e.g. en-zw) that is not available on the server. Thus if no matching file is found, we strip the variant specifications from the language tags and search again.

$xlang = FindLanguage($flist, $acclang);

# Remove language variants and find a file
if ($xlang < 0) {
  for ($i = 0; $i < count($acclang);  $i++) {
    $L = explode('-', $acclang[$i]);
    $acclang[$i] = trim($L[0]);
  }
  $xlang = FindLanguage($flist, $acclang);
}

The FindLanguage function is defined in file functions.php. We must, however, first define the list of known languages and the list of encodings (e.g. the Czech language requires ISO-8859-2).

$languages = array(
  'cs' => 'CS',
  'en' => 'EN',
  'fr' => 'FR',
  'de' => 'DE'
);

$encodings = array(
  'iso2' => 'iso-8859-2'
);

# Function for finding a language
function FindLanguage($flist, $acclang) {
$lx = -1;
for ($j = 0;  $lx < 0 && $j < count($acclang);  $j++) {
  for ($i = 0;  $lx < 0 && $i < count($flist);  $i++) {
    $ext = $flist[$i]['ext'];
    for ($k = 0;  $lx < 0 && $k < count($ext);  $k++) {
      if ($acclang[$j] == $ext[$k]) $lx = $i;
    }
  }
}
return $lx;
}
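
The two-pass search can be traced on a small example. The sketch below uses a compact function that is meant to behave like FindLanguage; its name, the helper stripSubtags and the sample data are assumptions made for illustration.

```php
<?php
// Hypothetical compact equivalent of FindLanguage(): index of the first file
// matching the most preferred language, or -1 if nothing matches.
function findVariantIndex(array $flist, array $acclang): int {
    foreach ($acclang as $lang) {
        foreach ($flist as $i => $file) {
            if (in_array($lang, $file['ext'], true)) return $i;
        }
    }
    return -1;
}

// Hypothetical helper: 'en-zw' -> 'en'.
function stripSubtags(array $acclang): array {
    return array_map(function ($tag) {
        $parts = explode('-', $tag);
        return trim($parts[0]);
    }, $acclang);
}

$flist = array(
    array('fn' => 'index-cs_iso2.html', 'ext' => array('cs', 'iso2')),
    array('fn' => 'index-en.html',      'ext' => array('en')),
);
$acclang = array('en-zw', 'de');

$xlang = findVariantIndex($flist, $acclang);        // first pass finds nothing
if ($xlang < 0) {
    // second pass: 'en-zw' has become 'en' and matches index-en.html
    $xlang = findVariantIndex($flist, stripSubtags($acclang));
}
```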

If no acceptable variant is found, we display the list of available variants and exit the script.

# If no file found, display variants and exit, otherwise fill the file name
if ($xlang < 0) {
?>
<html><body>
<p>No acceptable variant was found. Your browser is set to require text in languages:
<code><?php echo $HTTP_ACCEPT_LANGUAGE; ?></code>
but only the following variants are available:</p>
<ul>
<?php
  for ($i = 0;  $i < count($flist);  $i++) {
    $ext = $flist[$i]['ext'];  $f = $flist[$i]['fn'];
    foreach ($languages as $k => $v) {
      for ($j = 0;  $f && $j < count($ext);  $j++) {
        if ($ext[$j] == $k) {
          echo "<li><a href=\"$f\">$v</a></li>\n";  $f = false;
        }
      }
    }
  }
?>
</ul>
<p>You can read more about <a title="Content negotiation" target="content_neg"
href="http://hroch486.icpf.cas.cz/wagner/content-negotiation.shtml.en">content negotiation</a> if
you wish to know how to setup your WWW browser correctly.</p>
<hr>
</body></html>
<?php
  exit;
}

Finally we assign values to the variables and, where needed, have Apache send an additional response header.

$fn = $flist[$xlang]['fn'];
EmitHeader($fn);
$inc = preg_match('/\.php$/', $fn);

The EmitHeader function is also defined in file functions.php. Some language variants specify a required character encoding. It is then necessary to send a response header with the corresponding information. File functions.php maps such extensions to headers.

$headers = array(
  'iso2' => 'Content-Type: text/html; charset=iso-8859-2'
);

# Function for emitting a header
function EmitHeader($fn) {
global $headers;
if (preg_match('/-([^-]+)\./', $fn, $m)) {
  $e = explode('_', $m[1]);
  foreach ($e as $v) {
    if (isset($headers[$v])) header($headers[$v]);
  }
}}
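
To check which headers a given file name would produce without actually sending them, the lookup can be separated from the call to header(). A sketch, with a hypothetical function name and sample file names:

```php
<?php
// Assumed mapping, same shape as $headers in functions.php.
$headers = array(
    'iso2' => 'Content-Type: text/html; charset=iso-8859-2',
);

// Hypothetical helper: return the headers that would be sent for a file name.
function headersFor(string $fn, array $headers): array {
    $out = array();
    if (preg_match('/-([^-]+)\./', $fn, $m)) {
        foreach (explode('_', $m[1]) as $tag) {
            if (isset($headers[$tag])) $out[] = $headers[$tag];
        }
    }
    return $out;
}

$toSend = headersFor('index-cs_iso2.html', $headers);
// $toSend holds the single Content-Type header registered for 'iso2';
// for index-en.html the result would be an empty array.
```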

Dynamically generated pages are not usually cached. Our pages, however, are in fact static, so we can ask Apache to send expiration information.

# Longer time units
define('HOUR', 3600);
define('DAY', 24 * HOUR);
define('MONTH', 30 * DAY);

# Expiration
function Expires($after = DAY) {
setlocale(LC_ALL, 'EN_US');
Header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $after) . ' GMT');
}
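
The format of the resulting header is easier to verify when the current time is passed in as a parameter. The following variant is a sketch for illustration only; the function name expiresHeader is an assumption.

```php
<?php
// Hypothetical pure variant of Expires(): build the header text for a given moment.
function expiresHeader(int $after, int $now): string {
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $now + $after) . ' GMT';
}

// One day after the start of the Unix epoch:
echo expiresHeader(24 * 3600, 0), "\n";
// prints: Expires: Fri, 02 Jan 1970 00:00:00 GMT
```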

The LangLinks function prepares links to other language variants.

# Links to languages
function LangLinks() {
global $fn, $REQUEST_URI, $languages;
echo '<div id="langlinks">&nbsp;';
$rq = $REQUEST_URI;
if (substr($rq, -1) == '/') $rq = 'index.php';
else $rq = basename($rq);
if (!$fn) $fn = $rq;
if (preg_match('/^([^-]+)-[^-]+$/', $fn, $m)) $rq = $m[1];
$re = '/^' . $rq . '-(.*)\./';
$dirhandle = opendir('.');
while ($f = readdir($dirhandle)) {
  if (preg_match($re, $f, $m)) {
    if ($f != $fn) {
      $ext = explode('_', $m[1]);
      foreach ($ext as $v) {
        if (isset($languages[$v])) {
          $lang = $languages[$v];
          printf('<a title="%s" href="%s">%s</a>&nbsp;', $lang, $f, $lang);
        }
      }
    }
  }
}
closedir($dirhandle);
echo "</div>\n";
}
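
The markup produced by LangLinks can be previewed without reading a directory. The sketch below builds the links from a fixed list of names; the function name, the reduced $languages table and the file names are assumptions made for illustration.

```php
<?php
$languages = array('cs' => 'CS', 'en' => 'EN');   // assumed subset of $languages

// Hypothetical helper: build the links for every variant except the current one.
function langLinksHtml(array $names, string $current, array $languages): string {
    $html = '';
    foreach ($names as $f) {
        if ($f === $current || !preg_match('/-([^-]+)\./', $f, $m)) continue;
        foreach (explode('_', $m[1]) as $tag) {
            if (isset($languages[$tag])) {
                $html .= sprintf('<a title="%s" href="%s">%s</a>&nbsp;',
                                 $languages[$tag], $f, $languages[$tag]);
            }
        }
    }
    return $html;
}

echo langLinksHtml(array('index-cs_iso2.html', 'index-en.html'), 'index-en.html', $languages);
// prints: <a title="CS" href="index-cs_iso2.html">CS</a>&nbsp;
```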

The links are written near the end of the page but displayed at its top by means of a cascading style sheet, so that indexing robots do not index the page under language names or abbreviations. The style sheet contains:

h1 {
  font-size: 180%;
  margin-top: 2.5em;
  font-family: "Tms Rmn", "Times New Roman", "Times Roman", serif
}

#langlinks {
  position: absolute; top:0.5em; right:0;
  background-color: rgb(153,0,51);
  color: rgb(204,221,255);
  width: 100%;
  text-align: right;
  font-family: "Lucida Sans", Helvetica, Arial, sans-serif;
  font-weight: bold;
  font-size: 125%;
}

If the files containing the language variants are PHP scripts, their structure will be:

<?php
if (!function_exists('FindLanguage')) {
require 'functions.php';
EmitHeader(basename($REQUEST_URI));
}
Expires();
?>
<html>
<head>
...
<link rel="STYLESHEET" type="text/css" href="style.css" />
</head>
<body>
...
<?php LangLinks(); ?>
</body>
</html>

Such scripts behave much like the standard method of content negotiation.

Friday, 09-Sep-2005 10:19:20 CEST