Saturday, 27 February 2016

HTML Scraping in Php




I've been doing some HTML scraping in PHP using regular expressions. This works, but the result is finicky and fragile. Has anyone used any packages that provide a more robust solution? A config driven solution would be ideal, but I'm not picky.


Answer



I would recomend PHP Simple HTML DOM Parser after you have scraped the HTML from the page. It supports invalid HTML, and provides a very easy way to handle HTML elements.



No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...