Friday, 19 May 2017

PHP Parse HTML code










How can I parse HTML code held in a PHP variable if it is something like:

   <h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!


I only want the text that's between the headings, and I understand that it's not a good idea to use regular expressions for this.


Answer



Use PHP's Document Object Model (DOM) extension:



   <?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   $DOM = new DOMDocument;
   $DOM->loadHTML($str);

   // get all <h1> elements
   $items = $DOM->getElementsByTagName('h1');

   // display the text of each <h1>
   for ($i = 0; $i < $items->length; $i++) {
       echo $items->item($i)->nodeValue . "\n";
   }
   ?>



This outputs:

   T1
   T2
   T3






[EDIT]: After the OP's clarification:

If you want the content that follows the headings (Lorem ipsum. and so on), you can strip the headings directly with this regex:



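   <?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   // remove every <h1>...</h1> element, leaving only the text in between
   echo preg_replace("#<h1.*?>.*?</h1>#", "", $str);
   ?>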



This outputs:

   Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
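
Since the question asks to avoid regular expressions, the same text can probably be collected with the DOM alone. The following is only a minimal sketch, assuming each piece of text sits directly after its <h1> as in the sample above; the $sections array and the lowercase $dom variable are names made up for this illustration, not part of the original answer.

```php
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

$dom = new DOMDocument;
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();

// Hypothetical result array: heading text => text that follows the heading.
$sections = [];
foreach ($dom->getElementsByTagName('h1') as $h1) {
    $text = '';
    // Walk the siblings after this <h1> until the next <h1> (or the end).
    for ($node = $h1->nextSibling; $node !== null; $node = $node->nextSibling) {
        if (strtolower($node->nodeName) === 'h1') {
            break;
        }
        $text .= $node->textContent;
    }
    $sections[trim($h1->textContent)] = trim($text);
}

print_r($sections);
```

Keying the result by heading keeps each piece of text attached to its title, which the regex version loses because it joins everything into one string. If two headings share the same text, the later one would overwrite the earlier, so a plain list may be a better fit in that case.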


