<?php
// Define a test HTML document containing two tables. Note that the first table
// has a row that combines <th> and <td> cells, while the second one only has
// <th> cells.
$xhtml = <<<XHTML
<html>
<head>
<title>Table test page</title>
</head>
<body>
<table>
<tr>
<th>Cell 1</th>
<td>Cell 2</td>
</tr>
</table>
<table>
<tr>
<th>Cell 1</th>
<th>Cell 2</th>
</tr>
</table>
</body>
</html>
XHTML;
$document = new DOMDocument();
$document->loadXML($xhtml);
$xpath = new DOMXPath($document);
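// Wrap the DOMXPath object in a helper that runs a query and prints the
// matches. The closure captures the DOMXPath instance by value before the
// variable is reassigned, so the $xpath(...) calls below invoke this helper.
$xpath = function ($query) use ($xpath) {
  echo $query . "\n";
  $result = $xpath->query($query);
  if (method_exists($result, 'count')) {
    $count = $result->count();
  } else {
    // Fallback for older PHP versions that don't have DOMNodeList::count().
    $result_array = iterator_to_array($result);
    // Filter out any bogus results that equal `NULL`.
    $result_array = array_filter($result_array);
    $count = count($result_array);
  }
  echo $count . ' result' . ($count != 1 ? 's' : '') . ($count ? ':' : '') . "\n";
  foreach ($result as $element) {
    // Copy each match into a fresh document so that only its own markup is
    // printed. This $document is local to the closure.
    $document = new DOMDocument();
    $document->appendChild($document->importNode($element->cloneNode(TRUE), TRUE));
    echo $document->saveHTML();
  }
  echo "\n";
};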
echo "I am using the following XPath expression to select table cells from\n";
echo "the first row of the first table, by combining the results for <th>\n";
echo "and <td>. This works fine, I get both cells back.\n";
echo "Note that the first table combines <th> and <td> in a single row.\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)');
echo "When I try to get the first cell by appending [1] to the expression\n";
echo "I get the second cell back instead of the first. Why?\n";
echo "The expected result of this expression is '<th>Cell 1</th>'\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[1]');
echo "Appending [2] yields the second cell successfully. But why not the first?\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[2]');
echo "Similarly, for the second table I can get all cells with this expression.\n";
echo "The second table only contains <th> cells in the first row.\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)');
echo "However, when I append [1] to get only the first cell back, I get an empty result.\n";
echo "The expected result for this expression is '<th>Cell 1</th>'\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[1]');
echo "The second cell can be retrieved successfully by appending [2].\n";
echo "But why not the first?\n";
$xpath('(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[2]');
echo "Note that by replacing the first 'descendant-or-self' with '//' the expression\n";
echo "seems to work as expected in all cases for both the first and second table:\n";
$xpath('(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)');
$xpath('(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[1]');
$xpath('(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[2]');
$xpath('(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)');
$xpath('(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[1]');
$xpath('(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[2]');
I am using the following XPath expression to select table cells from
the first row of the first table, by combining the results for <th>
and <td>. This works fine, I get both cells back.
Note that the first table combines <th> and <td> in a single row.
(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)
2 results:
<th>Cell 1</th>
<td>Cell 2</td>
When I try to get the first cell by appending [1] to the expression
I get the second cell back instead of the first. Why?
The expected result of this expression is '<th>Cell 1</th>'
(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[1]
1 result:
<td>Cell 2</td>
Appending [2] yields the second cell successfully. But why not the first?
(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[2]
1 result:
<td>Cell 2</td>
Similarly, for the second table I can get all cells with this expression.
The second table only contains <th> cells in the first row.
(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)
2 results:
<th>Cell 1</th>
<th>Cell 2</th>
However, when I append [1] to get only the first cell back, I get an empty result.
The expected result for this expression is '<th>Cell 1</th>'
(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[1]
0 results
The second cell can be retrieved successfully by appending [2].
But why not the first?
(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[2]
1 result:
<th>Cell 2</th>
Note that by replacing the first 'descendant-or-self' with '//' the expression
seems to work as expected in all cases for both the first and second table:
(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)
2 results:
<th>Cell 1</th>
<td>Cell 2</td>
(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[1]
1 result:
<th>Cell 1</th>
(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[2]
1 result:
<td>Cell 2</td>
(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)
2 results:
<th>Cell 1</th>
<th>Cell 2</th>
(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[1]
1 result:
<th>Cell 1</th>
(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[2]
1 result:
<th>Cell 2</th>
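
For comparison, here is a minimal sketch of a query that avoids the `td|th` union altogether by selecting the row's child elements and filtering them with a `self::` test. It does not explain the behaviour shown above. It only assumes the goal is "the n-th header or data cell of the first row, in document order". The helper name `nthCell` and its parameters are hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): returns the $n-th
// <th> or <td> cell of the first row of the $table-th table, or null if
// there is no such cell.
function nthCell(DOMXPath $xpath, int $table, int $n): ?DOMNode
{
    // A single child step with a self:: test stands in for the td|th union,
    // so the positional predicate applies to the row's cells directly.
    $query = sprintf('((//table)[%d]//tr)[1]/*[self::th or self::td][%d]', $table, $n);
    $nodes = $xpath->query($query);
    return $nodes->length > 0 ? $nodes->item(0) : null;
}

// Same table structure as the test document above, trimmed to the essentials.
$document = new DOMDocument();
$document->loadXML('<html><body>'
    . '<table><tr><th>Cell 1</th><td>Cell 2</td></tr></table>'
    . '<table><tr><th>Cell 1</th><th>Cell 2</th></tr></table>'
    . '</body></html>');
$xpath = new DOMXPath($document);

// Print every cell of the first row of both tables.
foreach ([1, 2] as $table) {
    foreach ([1, 2] as $n) {
        $cell = nthCell($xpath, $table, $n);
        echo "table $table, cell $n: ", $cell ? $document->saveXML($cell) : '(none)', "\n";
    }
}
```

Because the cells are selected in a single step from the row, the `[n]` predicate counts positions among that row's `<th>`/`<td>` children rather than over the result of a union.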