Here are some tips to help you optimize usage:
- Keep the timeout as low as possible (the minimum is 1 second). A higher timeout may increase your page processing time if you are fetching content from slow servers.
- It is strongly recommended to use a Persistent Cache Plugin and enable a disk- or memory-based Object Cache for better caching performance. Scraping heavily without a persistent cache plugin can cause serious issues.
- If you are not using a Persistent Cache Plugin, the underlying Transients API will fall back on the WordPress options table (wp_options) to store the cache. This might lead to issues detailed in this thread. To avoid such issues, either use a Persistent Cache Plugin or delete expired transients occasionally.
- If you are scraping a lot, keep an eye on your cache size too. Clear or flush the cache occasionally.
- If you plan to use multiple scrapers on a single page, set the cache timeout to a longer period, possibly as long as a day (i.e. 1440 minutes) or even more. This caches content on your server and reduces how often the source is scraped.
- Use fast-loading pages (URL sources) as your content source. Also prefer small pages to optimize performance.
- Keep a close watch on your scraper. If the website changes its page layout, your selector may fail to fetch the right content.
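
The caching tips above can be illustrated with a small, language-neutral sketch. It is written in Python and is not the plugin's code: `TTLCache`, `get_or_fetch`, `purge_expired`, and `fetch_with_timeout` are hypothetical names chosen for this example. It shows why a long cache timeout reduces scraping (the source is only fetched again after the entry expires), why expired entries should be purged occasionally, and how a short fetch timeout bounds the time spent on a slow server.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_with_timeout(url: str, timeout_seconds: float = 1.0) -> str:
    """Fetch a page, giving up after timeout_seconds (kept low on purpose)."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8", errors="replace")


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return cached content if still fresh, otherwise fetch and store it."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the source is not contacted
        content = fetch(url)
        self._store[url] = (now + self.ttl_seconds, content)
        return content

    def purge_expired(self) -> int:
        """Delete expired entries and return how many were removed."""
        now = self.clock()
        expired = [key for key, (expires_at, _) in self._store.items()
                   if expires_at <= now]
        for key in expired:
            del self._store[key]
        return len(expired)
```

With a cache timeout of one day (1440 minutes, i.e. `TTLCache(ttl_seconds=1440 * 60)`), repeated calls to `get_or_fetch` for the same URL within that day reuse the stored content instead of contacting the source again. Calling `purge_expired` from time to time plays the same role as deleting expired transients: it keeps stale entries from piling up in the store.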