Parsing info from a table without headers, using PHP, DOM and cURL

I need to parse data from a table that I scrape from another website using PHP.
The table looks like this:
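
Dag | Datum | Lesuur | Lokaal | Docent(en) | Vak | Groep(en) | Toelichting
Di | 12-11-2013 | 5 - 6 | B2.33 | LKH02 | SWSP14SLB1V13_SWSP15PRA1V13 | MAV1SP10 | SLB major 1 / praktijkleren

(The headers are Dutch: Day, Date, Period, Room, Teacher(s), Subject, Group(s), Notes.)
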
This table is generated by JavaScript.
In this table, the first tr holds the td cells that contain the headers, while all the other rows hold the information I need to parse.
I've been struggling with this for a while. I found an answer on this website that helped me a little, but it reads the table by the ids of the th and td elements, whereas my table has no ids on its rows or cells.
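Since the rows and cells carry no ids, one option is to address them by position with XPath instead. A minimal sketch, assuming the table keeps the id IWGRD used in the code further down and that, as described above, the headers sit in td cells of the first tr; the sample HTML is a hypothetical stand-in for the scraped page:

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = '<table id="IWGRD">'
    . '<tr><td>Dag</td><td>Datum</td><td>Lesuur</td></tr>'
    . '<tr><td>Di</td><td>12-11-2013</td><td>5 - 6</td></tr>'
    . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// (...)[1] picks the first tr of the table, i.e. the header row.
$headers = array();
foreach ($xpath->query('(//table[@id="IWGRD"]//tr)[1]/td') as $cell) {
    $headers[] = trim($cell->textContent);
}

// td[2] selects the second cell of every data row by position, no id needed.
$dates = array();
foreach ($xpath->query('//table[@id="IWGRD"]//tr[position() > 1]/td[2]') as $cell) {
    $dates[] = trim($cell->textContent);
}

print_r($headers); // lists Dag, Datum, Lesuur
print_r($dates);   // lists 12-11-2013
```

Positional selectors like td[2] break if the site reorders its columns, so reading the header row first and matching on its text is the more robust variant.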
I'm using cURL to fetch the table HTML from another website and load it into DOM like this:



// PHP's built-in DOM extension is used below, so simple_html_dom isn't needed

// Step 1: initialise a cURL session
$cSession = curl_init();

// Step 2: configure the request
$tmpfname = dirname(__FILE__) . '/cookie.txt';
curl_setopt($cSession, CURLOPT_COOKIEJAR, $tmpfname);   // write cookies here on close
curl_setopt($cSession, CURLOPT_COOKIEFILE, $tmpfname);  // read cookies from here
curl_setopt($cSession, CURLOPT_URL, "http://anonymusurlbecauseofprivacyreasons?somegetters");
curl_setopt($cSession, CURLOPT_RETURNTRANSFER, true);   // return the page as a string
curl_setopt($cSession, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($cSession, CURLOPT_HEADER, false);          // body only, no response headers
curl_setopt($cSession, CURLOPT_COOKIESESSION, true);    // ignore old session cookies from the file
curl_setopt($cSession, CURLOPT_CAINFO, dirname(__FILE__) . "/cacert.pem");
curl_setopt($cSession, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

// Step 3: run the request
$result = curl_exec($cSession);
if ($result === false) {
    // curl_error() needs the session handle ($cSession; $ch is undefined here)
    echo "cURL Error: " . curl_error($cSession);
    curl_close($cSession);
    exit;
}
curl_close($cSession);

// Load the HTML into a DOM document; @ hides warnings about malformed markup
$dom = new DOMDocument;
@$dom->loadHTML($result);
$xpath = new DOMXPath($dom);
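Instead of silencing loadHTML() with @, libxml's own error buffer can collect the parser warnings that real-world HTML often triggers, and a quick check afterwards shows whether the table made it into the downloaded HTML at all. That check matters here: cURL does not run JavaScript, so if the table is only built client-side it will not be in $result. A sketch; the helper name loadQuietly is hypothetical:

```php
<?php
// Hypothetical helper: parse possibly malformed HTML without the @ operator.
function loadQuietly(string $html): DOMXPath
{
    $previous = libxml_use_internal_errors(true); // buffer parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();                        // discard the buffered warnings
    libxml_use_internal_errors($previous);        // restore the earlier setting
    return new DOMXPath($dom);
}

// A stray closing tag, the sort of markup that can trigger parser warnings.
$xpath = loadQuietly('<table id="IWGRD"><tr><td>Di</td><td>12-11-2013</td></tr></table></span>');

if ($xpath->query('//table[@id="IWGRD"]')->length === 0) {
    // The table may be generated by JavaScript and absent from the raw HTML.
    exit("Table IWGRD not found in the downloaded HTML\n");
}
echo $xpath->query('//table[@id="IWGRD"]//td')->length, "\n";
```

Inspecting libxml_get_errors() before libxml_clear_errors() is an option if the warnings themselves are of interest.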


Okay so far, so good.

But now comes the part of the code I can't figure out how to get working.
To read out the data, I copied and edited the code from this thread: (How to parse this table and extract data from it?), but I can't get it to work.


// collect data
foreach ($xpath->query('//table[@id="IWGRD"]/tr') as $node) {
    $rowData = array();
    foreach ($xpath->query('td', $node) as $cell) {
        $rowcleaned = str_replace("\xc2\xa0", "", $cell->textContent);
        $rowData[] = $rowcleaned;
    }
}
print_r($rowData);


This gives me the following output:
Array ( [0] => [1] => [2] => 7 - 8 [3] => S0.20 [4] => SPHdeBruin [5] => SWSP17KBOOV13 [6] => MAV1SP09,MAV1SP10 [7] => Bewegingsagogiek )


That is the correct output for the last row, but I need all of the rows except the top (header) row, like this:

Array[1] = ([0] => Mon [1] => 11-11-2013 [2] => 7 - 8 [3] => S0.20 [4] => SPHdeBruin [5] => SWSP17KBOOV13 [6] => MAV1SP09,MAV1SP10 [7] => Bewegingsagogiek)
Array[2] = ([0] => Mon [1] => 11-11-2013 [2] => 8 - 9 [3] => S0.20 [4] => name [5] => SWSP17KBOOV13 [6] => MAV1SP09,MAV1SP10 [7] => randomresult)

so I can put the values into variables and pass them on to an app.
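Once all rows are collected as arrays of cell strings (header row first), getting them into a shape that is easy to hand to an app is plain array work. The sketch below uses the header row as keys via array_combine(). It also assumes, without confirmation from the site, that Dag and Datum are left blank on later rows of the same day (which would explain the empty [0] and [1] in the output above), and carries the last seen value forward. The function name rowsToRecords and the sample rows are hypothetical:

```php
<?php
// Hypothetical helper: $rows[0] holds the header cells, later entries the data rows.
function rowsToRecords(array $rows, array $fillDown = array('Dag', 'Datum')): array
{
    if (count($rows) === 0) {
        return array();
    }
    $headers = array_map('trim', array_shift($rows));
    $records = array();
    $last = array(); // value carried down per fill-down column
    foreach ($rows as $row) {
        if (count($row) !== count($headers)) {
            continue; // skip rows whose cell count doesn't match the header row
        }
        $record = array_combine($headers, array_map('trim', $row));
        // Assumption: an empty Dag/Datum cell means "same as the row above".
        foreach ($fillDown as $key) {
            if (!array_key_exists($key, $record)) {
                continue;
            }
            if ($record[$key] === '' && isset($last[$key])) {
                $record[$key] = $last[$key];
            }
            $last[$key] = $record[$key];
        }
        $records[] = $record;
    }
    return $records;
}

// Sample rows reusing values from the question; the blanks mimic the assumed layout.
$rows = array(
    array('Dag', 'Datum', 'Lesuur', 'Lokaal'),
    array('Mon', '11-11-2013', '7 - 8', 'S0.20'),
    array('', '', '8 - 9', 'S0.20'),
);
$records = rowsToRecords($rows);
echo $records[1]['Dag'] . ' ' . $records[1]['Datum'] . ' ' . $records[1]['Lesuur'] . "\n";
// prints: Mon 11-11-2013 8 - 9
```

Each record can then be read by column name, e.g. $records[0]['Lokaal'], instead of by numeric index.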


Does anyone know how to do this? I've been working on it for hours, and I have no experience with cURL or DOM whatsoever.
Any help is much appreciated! :)

 Answer
 

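It looks like you're not collecting each row as you go: $rowData is reset on every pass of the outer loop, so after the loop only the last row is left. Append each row to an outer array instead: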


$tableData = array();

foreach ($xpath->query('//table[@id="IWGRD"]/tr') as $node) {

  $rowData = array();
  foreach ($xpath->query('td', $node) as $cell) {
      $rowcleaned = str_replace("\xc2\xa0","", $cell->textContent);
      $rowData[] = $rowcleaned;
  }
  $tableData[] = $rowData;
}

print_r($tableData);
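To leave the header row out of $tableData, as the question asks, the XPath can skip the first tr by position, and trim() after the non-breaking-space cleanup tidies each cell. A sketch run against hypothetical sample markup; using //tr instead of /tr is an assumption that also tolerates a tbody element if the real page has one:

```php
<?php
// Hypothetical sample markup standing in for the $result returned by cURL.
$result = '<table id="IWGRD">'
    . '<tr><td>Dag</td><td>Datum</td><td>Lesuur</td></tr>'
    . '<tr><td>Mon</td><td>11-11-2013</td><td>7 - 8</td></tr>'
    . '<tr><td>&nbsp;</td><td>&nbsp;</td><td>8 - 9</td></tr>'
    . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($result);
$xpath = new DOMXPath($dom);

$tableData = array();
// tr[position() > 1] skips the first tr, which holds the headers.
foreach ($xpath->query('//table[@id="IWGRD"]//tr[position() > 1]') as $node) {
    $rowData = array();
    foreach ($xpath->query('td', $node) as $cell) {
        // Remove non-breaking spaces (UTF-8 bytes C2 A0), then trim ordinary whitespace.
        $rowData[] = trim(str_replace("\xc2\xa0", '', $cell->textContent));
    }
    $tableData[] = $rowData;
}

print_r($tableData);
```

Each entry of $tableData is then one data row, so $tableData[0][2] would hold the period of the first lesson.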

April 23, 2016