Blog

Saturday, 17 September 2016

php - Scraping HTML with JavaScript postbacks



I'm trying to scrape some HTML (with permission from the author). I was using the PHP library suggested here, and it was working well until I encountered a link that looks like this:







I believe this is an ASP.NET postback link. When I click it, the URL doesn't change; it just loads some new content into the page, which I'd also like to scrape.



How can I get around this?



I suppose I would need to simulate the click, but I can't do that when processing raw HTML. I'd need some kind of browser or JavaScript interpreter, no?



Is there a better-suited library for this task? I'm not limited to PHP, but it's preferred.


Answer



I ended up using Python with the Selenium Firefox WebDriver. Since I'm driving a real browser, I can do everything Firefox can, including clicking links that trigger JavaScript postbacks.
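For anyone who wants a starting point, here is a minimal sketch of that approach, not the exact script I used. It assumes the link is a standard ASP.NET WebForms postback (an href that calls `__doPostBack(...)`), that the click causes a full page reload, and that Selenium 4 and Firefox with geckodriver are installed. The names `is_postback_href`, `scrape_after_postback`, `url`, and `link_text` are hypothetical and only for illustration.

```python
def is_postback_href(href):
    """Return True if href looks like an ASP.NET postback link.

    Assumption: the link calls the WebForms __doPostBack() helper.
    """
    return href is not None and "__doPostBack(" in href


def scrape_after_postback(url, link_text, timeout=10):
    """Load url, click the link with the given visible text, and return
    the page HTML before and after the postback."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # needs Firefox and geckodriver
    try:
        driver.get(url)
        link = driver.find_element(By.LINK_TEXT, link_text)
        if not is_postback_href(link.get_attribute("href")):
            raise ValueError("link does not look like a postback: %r" % link_text)

        before = driver.page_source
        link.click()

        # Assumption: a full postback reloads the document, so the old link
        # element becomes stale. For a partial (AJAX) update, wait for the
        # new content to appear instead.
        WebDriverWait(driver, timeout).until(EC.staleness_of(link))
        return before, driver.page_source
    finally:
        driver.quit()  # always close the browser
```

The two HTML strings it returns can then be passed to whatever parser you were already using for the static pages. If the site only updates part of the page, the staleness wait may time out, and waiting for a specific element in the new content is likely the safer condition.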

