I was recently tasked with a rather complex project involving scraping hundreds of thousands of HTML documents. Normally scraping is quite easy: I have a lot of experience with it and just use the wonderful Simple HTML DOM library. Simple HTML DOM has some issues, though. It chokes on large HTML documents. And when running...