This engine fetches the URLs of items from any HTML webpage.
Engine settings:
- Container URLs: the URL of the HTML webpage you want to grab items from.
- Note: you can add multiple URLs.
- Container Area Extraction Method: lets you choose the method (XPath or CSS selector) used to find and extract links from the page.
- Container Area: the XPath or CSS selector queries that point to the area where the items are located. Make sure the queries are correct, otherwise no items will be found.
- Example: find all links whose parent has the class uk-article-title:
- Using XPath:
//*[@class="uk-article-title"]/a
- Using CSS Selector:
.uk-article-title > a
- Item Format: the general format of the items' links. For example: http://www.domain.com/(*)/(*)/(*).html
- Limit: option to limit the number of fetched items.
- Decode: option to decode the title into UTF-8.
- Format Page: option to try fetching items from subsequent pages.
- Absolute Host: option to add the missing domain when the item links are not absolute.
- List Elements: option to pick up only specific items from the list, based on their order.
- Force Title: option to fetch links even if they do not have a title.
- Get Title From Link: option that lets you extract the title from the link URL if no title is found.
- Note: useful if your links don't have titles.
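The settings above can be illustrated with a minimal Python sketch. This is not the engine's actual implementation: the sample page, the function names, and the reading of `(*)` in Item Format as "exactly one path segment" are all assumptions made for illustration. It uses the standard-library XML parser, which only accepts well-formed markup, so real-world HTML would need a more tolerant parser.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# Hypothetical sample page; well-formed so the standard-library parser accepts it.
PAGE = """
<html><body>
  <h2 class="uk-article-title"><a href="/news/2020/first-post.html" title="First post">First post</a></h2>
  <h2 class="uk-article-title"><a href="/news/2020/second-post.html"></a></h2>
  <h2 class="uk-article-title"><a href="/about.html">About</a></h2>
  <p class="other"><a href="/news/2020/ignored.html">Ignored</a></p>
</body></html>
"""

def format_to_regex(item_format):
    """Turn an Item Format string into a regex.
    ASSUMPTION: each (*) stands for exactly one path segment."""
    parts = item_format.split("(*)")
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in parts) + "$")

def title_from_link(url):
    """Derive a readable title from the last path segment of a URL."""
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    stem = segment.rsplit(".", 1)[0]  # drop the file extension, if any
    return stem.replace("-", " ").replace("_", " ").strip().capitalize()

def fetch_items(page, host, item_format, limit=None, title_from_url=False):
    """Collect links from the container area and build one record per item."""
    root = ET.fromstring(page.strip())
    pattern = format_to_regex(item_format)
    items = []
    # Container Area: links whose parent has the class uk-article-title.
    for a in root.findall(".//*[@class='uk-article-title']/a"):
        # Absolute Host: resolve relative links against the given host.
        link = urljoin(host, a.get("href", ""))
        # Item Format: skip links that do not fit the expected shape.
        if not pattern.match(link):
            continue
        title = (a.text or "").strip()
        # Get Title From Link: fall back to the URL when no title is found.
        if not title and title_from_url:
            title = title_from_link(link)
        items.append({
            "title": title,
            "link": link,
            "title_attribute": a.get("title", ""),
            "alt_attribute": a.get("alt", ""),
        })
        # Limit: stop once enough items have been collected.
        if limit is not None and len(items) >= limit:
            break
    return items

items = fetch_items(
    PAGE,
    "http://www.domain.com",
    "http://www.domain.com/(*)/(*)/(*).html",
    title_from_url=True,
)
for item in items:
    print(item["title"], item["link"])
```

In this sketch the /about.html link is dropped because it does not fit the three-segment Item Format, and the second link gets its title from the URL because its anchor text is empty.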
Output fields of each fetched item:
- title
- link
- title_attribute
- alt_attribute
The results from this engine are probably of little use without the help of the Get Fulltext processor and others.
For example, you can use the Get Fulltext processor to get the full text of a link or webpage, or the Slug processor to create a slug from the title.
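To give a rough idea of what creating a slug from a title involves, here is a small Python sketch. The `slugify` function is hypothetical and is not the Slug processor's actual algorithm; it simply shows one common approach (strip accents, lowercase, replace runs of other characters with hyphens).

```python
import re
import unicodedata

def slugify(title):
    """Build a URL-friendly slug from a title (illustrative only)."""
    # Split accented characters into base letter + combining mark.
    normalized = unicodedata.normalize("NFKD", title)
    # Drop everything that is not plain ASCII, including the combining marks.
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")

print(slugify("Héllo, World! 2020"))
```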