This engine fetches the URLs of items from any HTML webpage.
Engine settings:
- Container URLs: the URLs of the HTML webpages you want to grab items from.
- Container Area: the XPath queries that select the area where the items are located. Make sure the queries are correct!
- Example: find all links whose parent has the class uk-article-title:
//*[@class="uk-article-title"]/a
- Item Format: the general format of the item links. For example: http://www.domain.com/(*)/(*)/(*).html
- Limit: option to limit the number of fetched items.
- Decode: option to decode the title to UTF-8.
- Format Page: option to try fetching items from subsequent pages.
- Absolute Host: option to add the missing domain when the item links are not absolute.
- List Elements: option to pick up only specific items based on their order.
- Force Title: option to fetch even links that do not have a title.
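The way Container Area, Item Format, Absolute Host and Limit work together can be sketched in Python. This is only an illustration, not the engine's own code: the sample page, the fetch_items and format_to_regex helpers, and the reading of (*) as "one path segment" are all assumptions. It uses the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed fragment standing in for a fetched Container URL.
PAGE = """
<div>
  <h2 class="uk-article-title"><a href="/news/2020/first.html" title="First">First post</a></h2>
  <h2 class="uk-article-title"><a href="/news/2020/second.html">Second post</a></h2>
  <h2 class="uk-article-title"><a href="/about.html">About</a></h2>
  <p class="other"><a href="/news/2020/ignored.html">Ignored</a></p>
</div>
"""

def format_to_regex(item_format):
    """Turn an Item Format into a regex.
    Assumption: each (*) stands for exactly one path segment."""
    parts = item_format.split("(*)")
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in parts) + "$")

def fetch_items(page, absolute_host, item_format, limit=None):
    root = ET.fromstring(page)
    pattern = format_to_regex(item_format)
    items = []
    # Same query as the Container Area example; ElementTree needs the leading '.'.
    for a in root.findall('.//*[@class="uk-article-title"]/a'):
        # Absolute Host: complete relative links with the missing domain.
        link = urljoin(absolute_host, a.get("href", ""))
        # Item Format: skip links that do not follow the expected pattern.
        if not pattern.match(link):
            continue
        items.append({
            "title": (a.text or "").strip(),
            "link": link,
            "title_attribute": a.get("title", ""),
        })
        # Limit: stop once enough items have been collected.
        if limit is not None and len(items) >= limit:
            break
    return items

items = fetch_items(
    PAGE,
    "http://www.domain.com",
    "http://www.domain.com/(*)/(*)/(*).html",
)
for item in items:
    print(item["title"], item["link"])
```

In this sketch the /about.html link is dropped because it does not match the three-segment Item Format, and the link inside the "other" paragraph is never selected by the XPath query.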
Output fields of each fetched item:
- title
- link
- title_attribute
- alt_attribute
The results from this engine are probably of little use without the help of the Get Fulltext processor and other processors.
For example, you can use the Get Fulltext processor to get the full text from a link or webpage, or the Slug processor to create a slug from the title.
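To illustrate the kind of post-processing meant here, the following Python sketch derives a slug from an item's title. It is not the Slug processor's implementation; the slugify helper and the sample item are hypothetical, and the exact rules (ASCII folding, hyphen separator) are assumptions.

```python
import re
import unicodedata

def slugify(title):
    # Fold accented characters to their ASCII base letters; drop the rest.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")

# Hypothetical item shaped like the engine's output fields.
item = {"title": "Hello, World!", "link": "http://www.domain.com/a/b/c.html"}
item["slug"] = slugify(item["title"])
print(item["slug"])
```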