Documentation

Find documentation for our Joomla extensions on this page.

Article Extractor Processor for JoomGrabber

The Article Extractor processor is an enhanced content extraction tool for the JoomGrabber component. It uses the PHP-Readability library as its primary extraction engine, with optional AI refinement and content spinning.

Features

  • PHP-Readability Integration: Primary content extraction using the PHP-Readability library
  • AI Refinement: Optional AI processing to clean and enhance extracted content
  • Content Spinning: Integrated content rewriting for unique article generation
  • Multiple AI Providers: Supports OpenAI (GPT) and Google AI Studio (Gemini)
  • HTML Structure Preservation: Maintains article formatting, images, and links
  • Smart Fallback System: Graceful degradation when AI services are unavailable
  • Comprehensive Error Handling: Detailed error messages and fallback mechanisms

Configuration Options

Extraction Settings

Parameter | Type | Default | Description
Use AI Refinement | Radio | Yes | Enable/disable AI content refinement
AI Service | Select | Google | Choose between OpenAI or Google AI Studio
Spin Content | Radio | No | Enable content rewriting (requires AI)
Spinning Intensity | Select | Moderate | Control the amount of content rewriting

Spinning Intensity Options:

  • Light: 20-30% changes
  • Moderate: 50-60% changes
  • Heavy: 80-90% changes
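
As a rough illustration of how an intensity setting could translate into an instruction for the AI service, here is a minimal sketch. It is written in Python purely for illustration (the processor itself is a Joomla/PHP extension), and the names SPIN_INTENSITY and spin_instruction, as well as the prompt wording, are hypothetical and not taken from the extension. Only the percentage ranges come from the list above.

```python
# Percentage ranges copied from the "Spinning Intensity Options" list.
SPIN_INTENSITY = {
    "light": (20, 30),
    "moderate": (50, 60),
    "heavy": (80, 90),
}


def spin_instruction(intensity: str = "moderate") -> str:
    """Build a hypothetical prompt fragment for the chosen intensity.

    The wording is an assumption; the real processor may phrase its
    request to the AI service differently.
    """
    key = intensity.strip().lower()
    if key not in SPIN_INTENSITY:
        raise ValueError(f"Unknown spinning intensity: {intensity!r}")
    low, high = SPIN_INTENSITY[key]
    return (
        f"Rewrite roughly {low}-{high}% of the wording while keeping "
        "the original meaning and HTML structure."
    )
```

Rejecting unknown values up front, rather than silently defaulting, makes a mistyped setting visible before any API call is made.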

API Settings

Parameter | Type | Default | Description
OpenAI API Key | Text | Empty | API key for OpenAI services
Google AI API Key | Text | Empty | API key for Google AI Studio
OpenAI Token Limit | Text | 100000 | Maximum characters to send to OpenAI
Google Token Limit | Text | 60000 | Maximum characters to send to Google AI
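
The limits above are described as character counts, so one plausible reading is that content is cut to the configured length before it is sent. The sketch below shows that idea in Python for illustration only; clip_for_provider and DEFAULT_LIMITS are hypothetical names, and the actual truncation strategy used by the processor is not documented here.

```python
# Default limits taken from the API Settings table (in characters).
DEFAULT_LIMITS = {"openai": 100000, "google": 60000}


def clip_for_provider(content: str, provider: str, limits=None) -> str:
    """Cut content to the character limit configured for a provider.

    Assumption: a plain character cut. This can split an HTML tag in
    half, so a real implementation may prefer to cut at an element
    boundary instead.
    """
    limits = limits or DEFAULT_LIMITS
    if provider not in limits:
        raise KeyError(f"No limit configured for provider {provider!r}")
    return content[: limits[provider]]
```

With the defaults, a 70,000-character page would be sent to OpenAI unchanged but cut to 60,000 characters for Google AI.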

Input/Output Fields

Input Fields

  • url: URL of the page to extract content from
  • html: Raw HTML content (alternative to URL)

Output Fields

  • extracted_article: Extracted HTML content
  • title: Article title
  • summary: Brief summary (max 3 sentences)
  • stop: Error handling object with state and msg properties
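
To make the shape of the output easier to picture, here is a small model of the fields listed above, sketched in Python for illustration. The field names come from this page; the types, the defaults, and the reading that a truthy stop.state means "processing should halt" are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Stop:
    # Assumption: state is truthy when processing should stop.
    state: bool = False
    msg: str = ""


@dataclass
class ExtractorOutput:
    extracted_article: str = ""   # extracted HTML content
    title: str = ""               # article title
    summary: str = ""             # brief summary (max 3 sentences)
    stop: Stop = field(default_factory=Stop)


def is_usable(result: ExtractorOutput) -> bool:
    """Return True when no stop was signalled and content is present."""
    return not result.stop.state and bool(result.extracted_article)
```

A downstream step could check is_usable(result) before saving an article, and log result.stop.msg otherwise.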

Processing Workflow

  1. Content Extraction

    • Fetches content from URL if provided
    • Uses PHP-Readability for primary extraction
    • Extracts the title and content, and generates a summary
  2. AI Refinement (Optional)

    • Sends content to selected AI service
    • Removes non-article elements (ads, navigation)
    • Preserves HTML structure and important elements
  3. Content Spinning (Optional)

    • Integrated with AI refinement step
    • Rewrites content based on intensity setting
    • Maintains original meaning and structure
  4. Error Handling

    • Falls back to Readability-only results if AI fails
    • Returns original content on spinning failure
    • Provides detailed error messages
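
The fallback behaviour in the workflow above can be sketched as follows. This is an illustrative Python outline, not the extension's PHP code: run_pipeline and the refine callback are hypothetical stand-ins for the Readability result and the call to the selected AI service.

```python
from typing import Callable, List, Optional, Tuple


def run_pipeline(
    readability_html: str,
    refine: Optional[Callable[[str], str]] = None,
) -> Tuple[str, List[str]]:
    """Return (content, messages), falling back to the Readability result.

    refine is a hypothetical callable wrapping the AI refinement
    (and optional spinning) request.
    """
    messages: List[str] = []
    if refine is None:
        # AI refinement disabled: Readability output is the final result.
        return readability_html, messages
    try:
        refined = refine(readability_html)
    except Exception as exc:
        # AI service failed: keep the Readability-only content.
        messages.append(f"AI refinement failed: {exc}")
        return readability_html, messages
    if not refined or not refined.strip():
        messages.append("AI returned empty content; using Readability result")
        return readability_html, messages
    return refined, messages
```

Keeping the Readability result as the baseline means an AI outage lowers output quality but does not block the extraction.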

Debugging

Enable debugging by adding ?pdebug=1 to the URL. This will output:

  • Input data
  • Processor parameters
  • API responses
  • Error messages
  • Processing steps
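
If you build the debug URL programmatically, appending the parameter safely means handling URLs that already carry a query string. The helper below is a small Python sketch using only the standard library; with_pdebug is a hypothetical name, and example.com stands in for whichever URL you use to run the processor.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_pdebug(url: str) -> str:
    """Return url with pdebug=1 set, preserving other query parameters."""
    parts = urlparse(url)
    # Drop any existing pdebug value so the parameter appears only once.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pdebug"
    ]
    query.append(("pdebug", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_pdebug("https://example.com/index.php?option=com_x"))
```

The same function works for URLs without a query string, in which case it adds ?pdebug=1.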
