Documentation

Find documentation for our Joomla extensions on this page.

How to use the HTML Parser processor in JoomGrabber

This processor is a powerful alternative to the “Get Fulltext” processor, for cases where “Get Fulltext” does not work with your source.

To be honest, it is a little complicated because it uses its own small command language, but we will try to make it clear and easy.

There are five commands for working with the HTML source: ginner, remove, split, wrap and replace. Each command needs to be placed on a new line.

Function: ginner

Gets the inner content of an HTML tag from the input HTML source.

Syntax

ginner|{LINE}|{TAG}|{DELIMITER}|{RETURN}|{DEBUG}|
  • {LINE}: the line whose output is used as input. You can write many lines; each line produces its own output, and that output can be used as the input of another line. “0” means the original input of the processor, “1” means the output of line#1.
  • {TAG}: the target HTML tag.
  • {DELIMITER}: a string that appears inside that target tag.
  • {RETURN}: the number of the part to return, starting at 0; L stands for the last part.
  • {DEBUG}: debug mode, used when the {DELIMITER} cannot be found in the INPUT HTML source.
    • 0: return "" (empty string) if an error occurs.
    • 1: stop immediately if an error occurs.
    • 2: return the INPUT HTML source.

Example:

ginner|0|div|post|L|1|

Gets the inner content of the “div” tag that has the string “post” inside it, no matter whether that string is an id, a class or any other attribute. For example:

<html>...<body>...<div class="post" id="whatever" what_ever="attribute">I want to get this text</div>...</body></html>

Will return “I want to get this text” when the sample above is used.
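
The behaviour described above can be sketched in Python. This is only an illustrative model, not JoomGrabber's actual implementation: the function name ginner_sketch, the regex-based tag matching, and the reading of {RETURN} as "which matching tag to return" are all assumptions.

```python
import re


def ginner_sketch(source, tag, delimiter, ret, debug):
    """Illustrative model of ginner; not JoomGrabber's real code."""
    # Matches opening and closing tags of `tag` (simplified: assumes no ">"
    # inside attribute values and no self-closing form of the tag).
    tag_re = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    matches, depth, start = [], 0, 0
    for m in tag_re.finditer(source):
        closing = m.group(1) == "/"
        if depth == 0:
            # Outside a match: look for an opening tag containing the delimiter.
            if not closing and delimiter in m.group(0):
                depth, start = 1, m.end()
        else:
            # Inside a match: track nested tags of the same name.
            depth += -1 if closing else 1
            if depth == 0:
                matches.append(source[start:m.start()])
    if not matches:
        if debug == 0:
            return ""            # DEBUG 0: empty string
        if debug == 1:
            raise LookupError("delimiter not found")  # DEBUG 1: stop
        return source            # DEBUG 2: return the input unchanged
    return matches[-1] if ret == "L" else matches[int(ret)]


page = ('<html><body><div class="post" id="whatever">'
        'I want to get this text</div></body></html>')
print(ginner_sketch(page, "div", "post", "L", 1))  # I want to get this text
```

Counting nested tags of the same name keeps an inner <div> from ending the match too early; whether the real processor does the same is not stated in this page.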

Function: remove

Removes an HTML tag from the input HTML source.

Syntax

remove|{LINE}|{TAG}|{DELIMITER}|

Example

remove|0|div|post|

Removes the div tag that has the string “post” inside it. For example:

<html>...<body>...ABC<div class="post" id="whatever" what_ever="attribute">I want to get this text</div>XYZ...</body></html>

Will return

<html>...<body>...ABCXYZ...</body></html>

This is the input HTML source without the div tag that has the string “post” inside it.
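
A rough Python model of this command is shown here. It is a sketch under assumptions, not the processor's real code: remove_sketch is a hypothetical name, tag matching is done with a simplified regex, and removing every matching tag (rather than only the first) is a guess.

```python
import re


def remove_sketch(source, tag, delimiter):
    """Illustrative model of remove; not JoomGrabber's real code."""
    # Simplified tag matcher: assumes no ">" inside attribute values.
    tag_re = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    kept, pos, depth = [], 0, 0
    for m in tag_re.finditer(source):
        closing = m.group(1) == "/"
        if depth == 0:
            if not closing and delimiter in m.group(0):
                kept.append(source[pos:m.start()])  # keep text before the tag
                pos, depth = m.start(), 1
        else:
            depth += -1 if closing else 1           # track nested same-name tags
            if depth == 0:
                pos = m.end()                       # skip the whole element
    kept.append(source[pos:])  # an unclosed match is left untouched
    return "".join(kept)


page = ('<html>...<body>...ABC<div class="post" id="whatever">'
        'I want to get this text</div>XYZ...</body></html>')
print(remove_sketch(page, "div", "post"))
# <html>...<body>...ABCXYZ...</body></html>
```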

Function: split

Splits the HTML source into parts based on a delimiter. This function is very similar to the explode function in PHP (if you know the PHP programming language).

Syntax

split|{LINE}|{DELIMITER}|{RETURN}|{DEBUG}|
  • {LINE}: the line whose output is used as input. You can write many lines; each line produces its own output, and that output can be used as the input of another line. “0” means the original input of the processor, “1” means the output of line#1.
  • {DELIMITER}: the HTML or text used to split the INPUT HTML source.
  • {RETURN}: the number of the part to return, starting at 0; L stands for the last part.
  • {DEBUG}: debug mode, used when the {DELIMITER} cannot be found in the INPUT HTML source.
    • 0: return "" (empty string) if an error occurs.
    • 1: stop immediately if an error occurs.
    • 2: return the INPUT HTML source.

Example

Example 1
split|0|<div class="post">|L|1|

Splits the INPUT HTML source into parts using the delimiter <div class="post"> and returns the last part. If the delimiter is not found, processing stops immediately and starts over with the next item.

Example 2
split|2|<p class="paragraph">|1|2|

Splits the output of line#2 using the delimiter <p class="paragraph"> and returns part 1 (the second part, since numbering starts at 0). If the delimiter is not found, it returns its own input unchanged.
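
The {RETURN} and {DEBUG} rules can be modelled with a few lines of Python. This is a minimal sketch, not JoomGrabber's implementation: split_sketch is a hypothetical name, and a {RETURN} index beyond the last part simply raises IndexError here because this page does not describe that case.

```python
def split_sketch(source, delimiter, ret, debug):
    """Illustrative model of split; not JoomGrabber's real code."""
    parts = source.split(delimiter)  # comparable to PHP's explode()
    if len(parts) == 1:              # delimiter was not found
        if debug == 0:
            return ""                # DEBUG 0: empty string
        if debug == 1:
            raise LookupError("delimiter not found")  # DEBUG 1: stop
        return source                # DEBUG 2: return the input unchanged
    # "L" selects the last part; otherwise parts are numbered from 0.
    return parts[-1] if ret == "L" else parts[int(ret)]


html = 'intro<div class="post">first<div class="post">second'
print(split_sketch(html, '<div class="post">', "L", 1))  # second
print(split_sketch(html, '<div class="post">', 1, 2))    # first
```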

Function: wrap

Wraps/combines one or more parts (returned by other lines) in a new HTML format.

Syntax

wrap|{INPUT_LINE1,INPUT_LINE2,...}|{WRAP_HTML}|
  • {INPUT_LINE1,INPUT_LINE2,...}: the lines whose outputs are to be wrapped.
  • {WRAP_HTML}: the new HTML, which can contain these variables:
    • {ogb-0} stands for the first line parameter (INPUT_LINE1); it is replaced by the output value of INPUT_LINE1.
    • {ogb-1} stands for the second line parameter (INPUT_LINE2); it is replaced by the output value of INPUT_LINE2.

Example

wrap|3,5|<div class="content">{ogb-0}<hr />{ogb-1}|

Combines line#3 and line#5 into the newly formatted HTML source: {ogb-0} is replaced by the first line parameter (the output of line#3), and {ogb-1} is replaced by the second line parameter (the output of line#5).
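
The placeholder substitution can be sketched in Python as follows. The names wrap_sketch and outputs (a dictionary mapping line numbers to their outputs) are hypothetical, the sample outputs are made up for illustration, and leaving an unmatched placeholder untouched is an assumption.

```python
import re


def wrap_sketch(outputs, input_lines, wrap_html):
    """Illustrative model of wrap; not JoomGrabber's real code."""
    def substitute(match):
        position = int(match.group(1))   # the N in {ogb-N}
        if position >= len(input_lines):
            return match.group(0)        # no such parameter: keep as-is
        return outputs[input_lines[position]]
    # A single pass, so text inserted for one placeholder is never rescanned.
    return re.sub(r"\{ogb-(\d+)\}", substitute, wrap_html)


outputs = {3: "<h1>Title</h1>", 5: "<p>Body</p>"}  # sample line outputs
print(wrap_sketch(outputs, [3, 5], '<div class="content">{ogb-0}<hr />{ogb-1}'))
# <div class="content"><h1>Title</h1><hr /><p>Body</p>
```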

Function: replace

Replaces a string in the input source with a new one.

Syntax

replace|{INPUT_LINE}|{SEARCH}|{REPLACE}|
  • {INPUT_LINE}: the line whose output is used as input.
  • {SEARCH}: the string to search for.
  • {REPLACE}: the string to replace it with.

Example

replace|5|<div class="abc"|<div class="xyz" |

Finds <div class="abc" in the output of line#5 and replaces it with <div class="xyz" (including the trailing space).
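
In Python terms, this command behaves much like a plain string replacement on the output of an earlier line. The sketch below is an assumption-laden model: replace_sketch and the outputs dictionary are hypothetical, the sample HTML is made up, and replacing every occurrence with a literal (non-regex) match is a guess about the real behaviour.

```python
def replace_sketch(outputs, input_line, search, replacement):
    """Illustrative model of replace; not JoomGrabber's real code."""
    # Literal search and replace on the chosen line's output.
    return outputs[input_line].replace(search, replacement)


outputs = {5: '<div class="abc">Hello</div>'}  # sample output of line#5
print(replace_sketch(outputs, 5, '<div class="abc"', '<div class="xyz" '))
# <div class="xyz" >Hello</div>
```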
