How to optimize performance?

Here are some tips to help you optimize performance:

  1. Keep the timeout as low as possible (the minimum is 1 second). A higher timeout can increase your page processing time when the content comes from slow servers.
  2. It is strongly recommended to use a Persistent Cache Plugin and enable a disk- or memory-based Object Cache for better caching performance. Scraping heavily without a persistent cache plugin can cause serious issues.
  3. If you are not using a Persistent Cache Plugin, the underlying Transients API falls back on the WordPress options table (wp_options) to store the cache. This might lead to the issues detailed in this thread. To avoid them, either use a Persistent Cache Plugin or delete expired transients occasionally.
  4. If you are scraping a lot, keep an eye on your cache size too, and clear (flush) the cache occasionally.
  5. If you plan to use multiple scrapers on a single page, set the cache timeout to a longer period, possibly a day (i.e. 1440 minutes) or even more. This keeps the content cached on your server and reduces how often the sources are scraped.
  6. Use fast-loading pages (URL sources) as your content source, and prefer small pages to optimize performance.
  7. Keep a close watch on your scraper. If the source website changes its page layout, your selector may fail to fetch the right content.
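
To make the trade-off between fetch timeout and cache timeout concrete, here is a minimal Python sketch. It is not the plugin's code and does not use the WordPress Transients API; `TTLCache`, `fetch_with_timeout`, and `fetches_per_day` are hypothetical names introduced only for illustration. It shows a short request timeout (tip 1), a cache whose entries expire after a set number of minutes (tip 5), and a purge of expired entries (tips 3 and 4).

```python
import time
from typing import Callable, Dict, Tuple
from urllib.request import urlopen


def fetch_with_timeout(url: str, timeout_seconds: float = 1.0) -> str:
    """Fetch a page, giving up quickly so a slow source cannot stall the page."""
    with urlopen(url, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8", errors="replace")


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_minutes."""

    def __init__(self, ttl_minutes: float, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_minutes * 60
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}
        self.fetch_count = 0  # number of times the remote source was actually hit

    def get(self, url: str, fetch: Callable[[str], str] = fetch_with_timeout) -> str:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: served from cache, no remote request
        content = fetch(url)
        self.fetch_count += 1
        self.store[url] = (now, content)
        return content

    def purge_expired(self) -> int:
        """Delete stale entries so the cache does not grow without bound."""
        now = self.clock()
        stale = [u for u, (t, _) in self.store.items() if now - t >= self.ttl_seconds]
        for u in stale:
            del self.store[u]
        return len(stale)


def fetches_per_day(scrapers: int, cache_minutes: float) -> float:
    """Rough upper bound on remote requests per day for a page with several scrapers."""
    return scrapers * (1440 / cache_minutes)
```

Under these assumptions, a page with 5 scrapers and a 60-minute cache timeout triggers at most `fetches_per_day(5, 60)` = 120 remote requests a day, while a 1440-minute cache timeout brings that down to 5.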
