
Class pdfParser




This class parses PDF documents to find email addresses and URLs. It is based on the pyPdf library.


Author: Andres Riancho ( andres.riancho@gmail.com )

Instance Methods
 
__init__(self, httpResponse)
 
_preParse(self, document)
 
_parse(self, contentText)
 
getPDFContent(self, documentString)
 
getReferences(self)
Returns: A list of URL strings.
 
_returnEmptyList(self)
Returns an empty list. Called when the caller invokes getForms, getComments, getMetaRedir, getMetaTags or getReferencesOfTag.
 
getMetaTags(self)
Always returns an empty list (see _returnEmptyList).
 
getMetaRedir(self)
Always returns an empty list (see _returnEmptyList).
 
getComments(self)
Always returns an empty list (see _returnEmptyList).
 
getForms(self)
Always returns an empty list (see _returnEmptyList).
 
getReferencesOfTag(self)
Always returns an empty list (see _returnEmptyList).

Inherited from abstractParser.abstractParser: findEmails, getEmails, getScripts

Inherited from abstractParser.abstractParser (private): _decodeString

Method Details

__init__(self, httpResponse)
(Constructor)

 
Overrides: abstractParser.abstractParser.__init__

getReferences(self)

 
Returns:
A list of URL strings.
Overrides: abstractParser.abstractParser.getReferences
(inherited documentation)
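
How the URL list might be built from text already extracted from a PDF can be sketched as follows. This is only an illustration: the regular expressions and the names find_urls and find_emails are hypothetical, and this page does not show the expressions the actual parser uses.

```python
import re

# Hypothetical patterns, chosen for illustration only; the expressions
# used by the real parser are not documented on this page.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    found = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


def find_emails(text):
    """Return the unique email addresses found in text, in order of first appearance."""
    found = []
    for address in EMAIL_RE.findall(text):
        if address not in found:
            found.append(address)
    return found
```

Both helpers take a plain string, so they work on whatever text the PDF library manages to extract, and they deduplicate because the same link often appears on several pages of a document.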

_returnEmptyList(self)

 
This method is called when the caller invokes any of:
  • getForms
  • getComments
  • getMetaRedir
  • getMetaTags
  • getReferencesOfTag
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.
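
One common way to route several public methods to a single implementation in Python is to bind them as class-level aliases. The sketch below assumes that approach; the class names AbstractParserSketch and PdfParserSketch are hypothetical stand-ins, and the real source may wire the methods differently.

```python
class AbstractParserSketch:
    """Hypothetical stand-in for abstractParser.abstractParser."""

    def getForms(self):
        raise NotImplementedError

    def getComments(self):
        raise NotImplementedError

    def getMetaRedir(self):
        raise NotImplementedError

    def getMetaTags(self):
        raise NotImplementedError


class PdfParserSketch(AbstractParserSketch):
    """Hypothetical sketch of the empty-list behaviour described above."""

    def _returnEmptyList(self):
        # A PDF has no HTML forms, comments or meta tags to report.
        return []

    # Assumption: each HTML-specific getter is an alias of the same function.
    getForms = getComments = getMetaRedir = _returnEmptyList
    getMetaTags = getReferencesOfTag = _returnEmptyList
```

With this layout, a call such as PdfParserSketch().getForms() runs _returnEmptyList directly and returns a fresh empty list, so callers written for HTML parsers can iterate over the result without special-casing PDF responses.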

getMetaTags(self)

 
Handled by _returnEmptyList, as are getForms, getComments, getMetaRedir and getReferencesOfTag.
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.
Overrides: abstractParser.abstractParser.getMetaTags

getMetaRedir(self)

 
Handled by _returnEmptyList, as are getForms, getComments, getMetaTags and getReferencesOfTag.
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.
Overrides: abstractParser.abstractParser.getMetaRedir

getComments(self)

 
Handled by _returnEmptyList, as are getForms, getMetaRedir, getMetaTags and getReferencesOfTag.
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.
Overrides: abstractParser.abstractParser.getComments

getForms(self)

 
Handled by _returnEmptyList, as are getComments, getMetaRedir, getMetaTags and getReferencesOfTag.
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.
Overrides: abstractParser.abstractParser.getForms

getReferencesOfTag(self)

 
Handled by _returnEmptyList, as are getForms, getComments, getMetaRedir and getMetaTags.
Returns:
An empty list. A PDF document has none of the structures that an HTML document has, so there is nothing to report.