Monday, 2 January 2017

php - Can't separate cells properly with simplehtmldom

I am trying to write a web scraper. I want to get all the cells in a row. The row before the one I want has THOROUGHBRED MEETINGS as its plain text value. I can successfully get this row. But I can't figure out how to get the next row's children which are the cells or tags.



if ($foundTag = FindTagByText("THOROUGHBRED MEETINGS", $html))
{
$cell = $foundTag->parent();
$row = $cell->parent();

$nextRow = $row->next_sibling();
echo "Row: ".$row->plaintext."
\n";
echo "Next Row: ".$nextRow->plaintext."
\n";
$cells = $nextRow->children();

foreach ($cells as $cell)
{
echo "Cell: ".$cell->plaintext."
\n";
}
}


function FindTagByText($text, $html)
{
// Use Simple_HTML_DOM special selector 'text'
// to retrieve all text nodes from the document
$textNodes = $html->find('text');
$foundTag = null;

foreach($textNodes as $textNode)
{

if($textNode->plaintext == $text)
{
// Get the parent of the text node
// (A text node is always a child of
// its container)
$foundTag = $textNode->parent();
break;
}
}


return $foundTag;
}


Here is the html I am trying to parse:




THOROUGHBRED MEETINGS




BR SUNSHINE COAST
FINE/DEAD
R1@12:30pm
1
2
3

4
5

6
7
8
 



Here is my output:





Row: THOROUGHBRED MEETINGS
Next Row: BR SUNSHINE COAST FINE/DEAD R1@12:30pm 1 2 3 4 5 6 7 8 CR NEW ZEALAND FINE/DEAD R3@11:10am 1 2 3 4 5 6 7 8 9 DR HOBART OCAST/HVY R1@12:15pm 1 2 3 4 5 6 7 MR CRANBOURNE OCAST/SLOW R1@12:20pm 1 2 3 4 5 6 7 8 NR COFFS HARBOUR OCAST/SLOW R1@12:45pm 1 2 3 4 5 6 7 8 SR MORUYA FINE/GOOD R1@12:25pm 1 2 3 4 5 6 7 8 VR BENALLA OCAST/SLOW R1@12:35pm 1 2 3 4 5 6 7 8 XR KALGOORLIE FINE/GOOD R1@ 3:00pm 1 2 3 4 5 6 7 HARNESS MEETINGS DT LAUNCESTON SHWRY/GOOD R1@ 4:57pm 1 2 3 4 5 6 7 8 9 10 MT CRANBOURNE OCAST/GOOD R1@ 5:05pm 1 2 3 4 5 6 7 8 GREYHOUND MEETINGS AD GAWLER OCAST/GOOD R1@ 5:10pm 1 2 3 4 5 6 7 8 9 10 11 CD CANBERRA OCAST/GOOD R1@ 5:02pm 1 2 3 4 5 6 7 8 9 10 11 MD SALE FINE/GOOD R1@ 4:54pm 1 2 3 4 5 6 7 8 9 10 11 12
Cell: BR SUNSHINE COAST
Cell: FINE/DEAD
Cell: R1@12:30pm
Cell: 1 2 3 4 5 6 7 8 CR NEW ZEALAND FINE/DEAD R3@11:10am 1 2 3 4 5 6 7 8 9 DR HOBART OCAST/HVY R1@12:15pm 1 2 3 4 5 6 7 MR CRANBOURNE OCAST/SLOW R1@12:20pm 1 2 3 4 5 6 7 8 NR COFFS HARBOUR OCAST/SLOW R1@12:45pm 1 2 3 4 5 6 7 8 SR MORUYA FINE/GOOD R1@12:25pm 1 2 3 4 5 6 7 8 VR BENALLA OCAST/SLOW R1@12:35pm 1 2 3 4 5 6 7 8 XR KALGOORLIE FINE/GOOD R1@ 3:00pm 1 2 3 4 5 6 7 HARNESS MEETINGS DT LAUNCESTON SHWRY/GOOD R1@ 4:57pm 1 2 3 4 5 6 7 8 9 10 MT CRANBOURNE OCAST/GOOD R1@ 5:05pm 1 2 3 4 5 6 7 8 GREYHOUND MEETINGS AD GAWLER OCAST/GOOD R1@ 5:10pm 1 2 3 4 5 6 7 8 9 10 11 CD CANBERRA OCAST/GOOD R1@ 5:02pm 1 2 3 4 5 6 7 8 9 10 11 MD SALE FINE/GOOD R1@ 4:54pm 1 2 3 4 5 6 7 8 9 10 11 12

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...