This article details the use of queries, which are the heart of WP Web Scraper. For parsing HTML, the plugin supports three types of queries: CSS selectors, XPath and regular expressions. WP Web Scraper uses queries not only to extract data from the source URL, but also to remove or replace parts of the scraped content.
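To make the idea of removing content concrete, here is a minimal Python sketch that drops every element with `class="ad"` from a scraped fragment before its text is used. It relies only on the standard-library `xml.etree.ElementTree`, so it assumes well-formed markup. The fragment and the `ad` class are hypothetical, and this only illustrates the concept; it is not how WP Web Scraper itself implements removal.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped fragment; ElementTree needs well-formed markup.
fragment = """<div id="article">
  <p>Useful text.</p>
  <div class="ad">Buy now!</div>
  <p>More useful text.</p>
</div>"""

root = ET.fromstring(fragment)

# ElementTree elements have no parent pointer, so visit every element as a
# potential parent and inspect its direct children. list() takes snapshots
# because the tree is modified inside the loop.
for parent in list(root.iter()):
    for child in list(parent):
        if child.get("class") == "ad":
            parent.remove(child)

print([child.tag for child in root])  # ['p', 'p']
print([p.text for p in root.iter("p")])  # ['Useful text.', 'More useful text.']
```

Replacing works the same way: instead of calling `remove`, assign a new value to the matched element's `text`.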
For scraping DOM documents (XML, HTML, etc.), CSS selectors and XPath cover virtually every use case. Regular expressions are provided as a query option for extreme edge cases or non-DOM content.
CSS selectors are patterns that select elements in a document: in a stylesheet they pick the elements to style, and in WP Web Scraper they pick the elements to query. CSS selectors are less powerful than XPath, but far easier to write, read and understand.
The CSS Selector Reference on W3Schools is a good place to start, and the W3Schools CSS Selector Tester lets you try selectors out interactively.
Internally, WP Web Scraper converts each CSS selector into an XPath expression using Symfony’s CssSelector Component.
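The sketch below illustrates the kind of translation involved, for a tiny subset of CSS only: type selectors, `#id`, `.class` and the descendant combinator (a space). The function name `css_to_xpath` is hypothetical. Symfony's CssSelector handles far more syntax and its output differs in detail (for example, it prefixes expressions with `descendant-or-self::` where this sketch uses `//`).

```python
import re

# One compound selector: optional tag (or *), then any mix of #id / .class parts.
COMPOUND = re.compile(r"^(?P<tag>[A-Za-z][\w-]*|\*)?(?P<rest>(?:[.#][\w-]+)*)$")
PART = re.compile(r"([.#])([\w-]+)")


def css_to_xpath(selector: str) -> str:
    """Translate a tiny CSS subset (tag, #id, .class, descendant space) to XPath 1.0.

    Hypothetical helper for illustration only; this is not Symfony's CssSelector.
    """
    compounds = selector.split()
    if not compounds:
        raise ValueError("empty selector")
    steps = []
    for compound in compounds:
        match = COMPOUND.match(compound)
        if match is None:
            raise ValueError(f"unsupported selector part: {compound!r}")
        tag = match.group("tag") or "*"
        conditions = []
        for kind, name in PART.findall(match.group("rest")):
            if kind == "#":
                conditions.append(f"@id='{name}'")
            else:
                # Match name as one whole token in the space-separated class list.
                conditions.append(
                    f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
                )
        steps.append(tag + "".join(f"[{c}]" for c in conditions))
    # "//" selects descendants at any depth, like the CSS descendant combinator.
    return "//" + "//".join(steps)


print(css_to_xpath("div.item"))
# //div[contains(concat(' ', normalize-space(@class), ' '), ' item ')]
print(css_to_xpath("#main a"))
# //*[@id='main']//a
```

The `contains(concat(...))` form is needed because `.item` in CSS matches any element whose class list contains the token `item`, whereas `[@class='item']` in XPath would only match an attribute whose whole value is exactly `item`.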
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as “an XPath”.
When you’re parsing an HTML or XML document, XPath is by far the most powerful query method. XPath expressions can select nodes by name, attribute, position or text content, and can move up to parents as well as down to descendants, so there is almost always an XPath expression that will find the element you need.
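As a small illustration of XPath-style querying, the sketch below uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath (paths, `//`, `..` and simple predicates such as `[@attr='value']` and positions). The markup is hypothetical and must be well-formed; full XPath 1.0 over real-world HTML needs a proper engine such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment.
page = """<html><body>
  <div id="prices">
    <span class="price">10.00</span>
    <span class="price">12.50</span>
    <span class="note">tax incl.</span>
  </div>
  <div id="other">
    <span class="price">99.99</span>
  </div>
</body></html>"""

root = ET.fromstring(page)

# Roughly //div[@id='prices']/span[@class='price'] in full XPath:
# only the price spans that are direct children of the #prices div.
prices = [el.text for el in root.findall(".//div[@id='prices']/span[@class='price']")]
print(prices)  # ['10.00', '12.50']

# Positional predicate: the second span inside #prices (positions start at 1).
second = root.find(".//div[@id='prices']/span[2]")
print(second.text)  # 12.50
```

Note that `[@class='price']` compares the whole attribute value, so it would miss an element with `class="price sale"`; the CSS selector `.price` matches individual class tokens instead.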
A regular expression (abbreviated regex or regexp, and sometimes called a rational expression) is a sequence of characters that forms a search pattern, used mainly for pattern matching with strings, i.e. “find and replace”-like operations. Each character in a regular expression is either a metacharacter with a special meaning, or a regular character with its literal meaning.
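Here is a short Python sketch of both uses, matching and find-and-replace, on a hypothetical plain-text snippet (the kind of non-DOM content that regular expressions suit best). In the pattern, `\$` is an escaped metacharacter standing for a literal dollar sign, while `\d`, `+` and `{2}` keep their special meanings.

```python
import re

# Hypothetical non-DOM text, e.g. a plain-text price feed.
text = "Widget: $10.00 (was $12.50). Shipping: free."

# "\$" matches a literal dollar sign (the backslash escapes the metacharacter);
# "\d+" means one or more digits, "\." a literal dot, "\d{2}" exactly two digits.
# The parentheses capture just the number.
price = re.compile(r"\$(\d+\.\d{2})")

matches = price.findall(text)
print(matches)  # ['10.00', '12.50']

# "Find and replace": substitute every match with a placeholder.
redacted = price.sub("[price]", text)
print(redacted)  # Widget: [price] (was [price]). Shipping: free.
```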
Need Help with Queries?
Crafting a correct, optimized query can be tricky at times. Try the paid support to have a perfectly optimized web scrape crafted for you.