KItinerary::ScriptExtractor

KItinerary::ScriptExtractor Class Reference

#include <scriptextractor.h>

Inherits KItinerary::AbstractExtractor.

Public Member Functions

bool canHandle (const ExtractorDocumentNode &node) const override
 
ExtractorResult extract (const ExtractorDocumentNode &node, const ExtractorEngine *engine) const override
 
const std::vector< ExtractorFilter > & filters () const
 
QString mimeType () const
 
QString name () const override
 
QString scriptFileName () const
 
QString scriptFunction () const
 
Public Member Functions inherited from KItinerary::AbstractExtractor

Detailed Description

A single unstructured data extraction rule set.

These rules are loaded from JSON meta-data files in a compiled-in qrc file, or from $XDG_DATA_DIRS/kitinerary/extractors.

Meta Data Format

The meta-data files either contain a single JSON object or an array of JSON objects with the following content:

  • mimeType: The MIME type of the extractor, text if not specified.
  • filter: An array of filters that are used to select this extractor for a given input file.
  • script: A JavaScript file to execute.
  • function: The entry point in the above mentioned script, main if not specified.
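
As a rough illustration of the defaults listed above, the following JavaScript sketch normalizes parsed meta-data into a list of entries. The helper name normalizeMetaData and the file name example.js are hypothetical and not part of KItinerary. Treating a missing filter as an empty list is likewise an assumption. The sketch only mirrors the documented rules: a single object or an array, mimeType defaulting to text, and function defaulting to main.

```javascript
// Hypothetical helper, not part of KItinerary: mirrors the documented defaults.
function normalizeMetaData(parsed) {
  // A meta-data file holds either one JSON object or an array of them.
  const entries = Array.isArray(parsed) ? parsed : [parsed];
  return entries.map((entry) => ({
    mimeType: entry.mimeType ?? "text", // documented default
    filter: entry.filter ?? [],         // assumption: no filter means an empty list
    script: entry.script,
    function: entry.function ?? "main", // documented default
  }));
}

// "example.js" is a made-up script name used only for this demonstration.
const single = normalizeMetaData(JSON.parse('{ "script": "example.js" }'));
console.log(single);
```

Normalizing up front means later code can always iterate over an array and never has to check for missing keys.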

The following extractor types are supported:

  • text/plain: plain text, the argument to the script function is a single string.
  • text/html: HTML documents, the argument to the script function is a KItinerary::HtmlDocument instance.
  • application/pdf: PDF documents, the argument to the script function is a KItinerary::PdfDocument instance.
  • application/vnd.apple.pkpass: Apple Wallet passes, the argument to the script function is a KPkPass::Pass instance.
  • internal/event: iCalendar events, the argument to the script function is a KCalendarCore::Event instance.
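
For the simplest case, text/plain, the script function receives the document text as one string. The sketch below shows what such an entry point might look like. The sample text, the regular expression, and the shape of the returned object are all assumptions made for illustration, since this reference does not specify what the function returns.

```javascript
// Sketch of a text/plain entry point. "main" is the documented default
// function name; everything else here is illustrative.
function main(text) {
  // Assumed input format: a line such as "Booking reference: ABC123".
  const match = text.match(/Booking reference:\s*([A-Z0-9]+)/);
  if (!match) {
    return null; // nothing recognizable in this document
  }
  // Assumed result shape; consult real extractor scripts for the actual one.
  return { reservationNumber: match[1] };
}

const sample = "Dear customer,\nBooking reference: ABC123\nThank you.";
console.log(main(sample));
```

For the other extractor types the argument would be the respective document object instead of a string. Their properties are not described here and are therefore not shown.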

Filter definitions have the following fields:

  • mimeType: The MIME type of the document part this filter can match against.
  • field: The name of the field to match against. This can be a field id in an Apple Wallet pass, a MIME message header name, or a property on a JSON-LD object or an iCal calendar or event. For plain text or binary content, this is ignored.
  • match: A regular expression that is matched against the specified value (see QRegularExpression).
  • scope: Specifies how the filter should be applied relative to the document node that is being extracted. One of Current, Parent, Children, Ancestors, Descendants (Current is the default).
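
To make the scope values concrete, here is a toy JavaScript model of filter evaluation over a small document tree. It is a sketch under stated assumptions, not the library's implementation: the node structure and the helper names makeNode, nodesInScope and filterMatches are hypothetical, the match is assumed to be an unanchored search, and JavaScript's RegExp stands in for QRegularExpression. The rule that field is ignored for plain text or binary content is not modeled.

```javascript
// Toy document tree: each node has a MIME type, a field lookup table,
// a parent and children. This structure is an assumption for illustration.
function makeNode(mimeType, fields, parent = null) {
  const node = { mimeType, fields, parent, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

// Collect the nodes a filter is evaluated against for a given scope.
function nodesInScope(node, scope = "Current") {
  switch (scope) {
    case "Current": return [node];
    case "Parent": return node.parent ? [node.parent] : [];
    case "Children": return [...node.children];
    case "Ancestors": {
      const result = [];
      for (let n = node.parent; n; n = n.parent) result.push(n);
      return result;
    }
    case "Descendants":
      return node.children.flatMap((c) => [c, ...nodesInScope(c, "Descendants")]);
    default: throw new Error(`unknown scope: ${scope}`);
  }
}

// A filter matches if any node in scope has the filter's MIME type and a
// field value that the regular expression matches (assumed unanchored).
function filterMatches(filter, node) {
  const re = new RegExp(filter.match);
  return nodesInScope(node, filter.scope).some(
    (n) => n.mimeType === filter.mimeType && re.test(String(n.fields[filter.field] ?? ""))
  );
}

// A PDF attached to an email: the filter looks at the email's From header.
const mail = makeNode("message/rfc822", { From: "noreply@swiss.com" });
const pdf = makeNode("application/pdf", {}, mail);
const filter = { field: "From", match: "@swiss.com", mimeType: "message/rfc822", scope: "Ancestors" };
console.log(filterMatches(filter, pdf));
```

With scope set to Current, the same filter would not match the PDF node, because the PDF node itself is not of type message/rfc822.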

Example:

[
    {
        "mimeType": "application/pdf",
        "filter": [ { "field": "From", "match": "@swiss.com", "mimeType": "message/rfc822", "scope": "Ancestors" } ],
        "script": "swiss.js",
        "function": "parsePdf"
    },
    {
        "mimeType": "application/vnd.apple.pkpass",
        "filter": [ { "field": "passTypeIdentifier", "match": "pass.booking.swiss.com", "mimeType": "application/vnd.apple.pkpass", "scope": "Current" } ],
        "script": "swiss.js",
        "function": "parsePkPass"
    }
]
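
One detail worth keeping in mind when writing match values like those in the example: because match is a regular expression, an unescaped . matches any character, not only a literal dot. The JavaScript sketch below demonstrates this with RegExp, used here as a stand-in for QRegularExpression, which treats an unescaped dot the same way. The sample addresses and the stricter pattern are made up for illustration.

```javascript
// Pattern as written in the example: the dot matches any character.
const loose = new RegExp("@swiss.com");
// A stricter, hypothetical variant: escaped dot, anchored at the end.
const strict = new RegExp("@swiss\\.com$");

for (const address of ["noreply@swiss.com", "noreply@swissXcom"]) {
  console.log(address, loose.test(address), strict.test(address));
}
```

In a JSON meta-data file the backslash itself needs escaping, so the stricter pattern would be written as "@swiss\\.com$".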

Development

For development it's convenient to symlink the extractors source folder to $XDG_DATA_DIRS/kitinerary/extractors, so you can re-run a changed extractor script without recompiling or restarting the application.
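
Since XDG_DATA_DIRS is a colon-separated list rather than a single directory, it can help to print the concrete candidate locations before creating the symlink. The Node.js sketch below does that. The helper name extractorSearchPaths is hypothetical, and the fallback value is the one given by the XDG Base Directory Specification. Whether the library also consults other locations is not stated in this reference.

```javascript
const path = require("path");

// Hypothetical helper: lists the directories implied by
// $XDG_DATA_DIRS/kitinerary/extractors for a given environment.
function extractorSearchPaths(env) {
  // XDG_DATA_DIRS is a colon-separated list; the XDG Base Directory
  // Specification defines this fallback when it is unset or empty.
  const dataDirs = env.XDG_DATA_DIRS || "/usr/local/share:/usr/share";
  return dataDirs
    .split(":")
    .filter((dir) => dir.length > 0)
    .map((dir) => path.posix.join(dir, "kitinerary", "extractors"));
}

console.log(extractorSearchPaths(process.env));
```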

Definition at line 76 of file scriptextractor.h.

Constructor & Destructor Documentation

◆ ScriptExtractor()

ScriptExtractor::ScriptExtractor ( )
explicit

Definition at line 37 of file scriptextractor.cpp.

Member Function Documentation

◆ canHandle()

bool ScriptExtractor::canHandle ( const ExtractorDocumentNode & node) const
override virtual

Fast check whether this extractor is applicable for node.

Implements KItinerary::AbstractExtractor.

Definition at line 159 of file scriptextractor.cpp.

◆ extract()

ExtractorResult ScriptExtractor::extract ( const ExtractorDocumentNode & node, const ExtractorEngine * engine ) const
override virtual

Extract data from node.

Implements KItinerary::AbstractExtractor.

Definition at line 175 of file scriptextractor.cpp.

◆ filters()

const std::vector< ExtractorFilter > & ScriptExtractor::filters ( ) const

Returns the filters deciding whether this extractor should be applied.

Definition at line 144 of file scriptextractor.cpp.

◆ mimeType()

QString ScriptExtractor::mimeType ( ) const

MIME type this script extractor supports.

Definition at line 109 of file scriptextractor.cpp.

◆ name()

QString ScriptExtractor::name ( ) const
override virtual

Identifier for this extractor.

Mainly used for diagnostics and tooling.

Implements KItinerary::AbstractExtractor.

Definition at line 104 of file scriptextractor.cpp.

◆ scriptFileName()

QString ScriptExtractor::scriptFileName ( ) const

The JS script containing the code of the extractor.

Definition at line 119 of file scriptextractor.cpp.

◆ scriptFunction()

QString ScriptExtractor::scriptFunction ( ) const

The JS function entry point for this extractor, main if empty.

Definition at line 129 of file scriptextractor.cpp.


The documentation for this class was generated from the following files:

  • scriptextractor.h
  • scriptextractor.cpp

This file is part of the KDE documentation.
Documentation copyright © 1996-2024 The KDE developers.
Generated on Fri Nov 22 2024 12:00:34 by doxygen 1.12.0 written by Dimitri van Heesch, © 1997-2006
