
Class sgmlParser




This class is an SGML document parser.


Author: Andres Riancho ( andres.riancho@gmail.com )

Instance Methods
  __init__(self, document, baseUrl, useTidy=True, verbose=0)
  _findMetaRedir(self, tag, attrs)
Find meta tag redirections, like this one: <META HTTP-EQUIV="refresh" content="4;URL=http://www.f00.us/">
  _findReferences(self, tag, attrs)
This method finds references inside a document.
  _parse(self, s)
This method parses the document.
  _parseMetaTags(self, tag, attrs)
This method parses the meta tags and creates a list of tuples with their values.
  getAccounts(self)
  getComments(self)
  getForms(self)
  getMetaRedir(self)
  getMetaTags(self)
  getReferences(self)
Searches for references on a page.
  handle_comment(self, text)
This method is called by parse when a comment is found.
  unknown_starttag(self, tag, attrs)
Called for each start tag; attrs is a list of (attr, value) tuples.

Inherited from abstractParser.abstractParser: findAccounts

Inherited from sgmllib.SGMLParser: close, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_charref, handle_data, handle_decl, handle_endtag, handle_entityref, handle_pi, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, reset, setliteral, setnomoretags, unknown_charref, unknown_endtag, unknown_entityref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name


Class Variables

Inherited from sgmllib.SGMLParser: entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars


Method Details

__init__(self, document, baseUrl, useTidy=True, verbose=0)
(Constructor)

 
None
Overrides: abstractParser.abstractParser.__init__

_findMetaRedir(self, tag, attrs)

 
Find meta tag redirections, like this one: <META HTTP-EQUIV="refresh" content="4;URL=http://www.f00.us/">
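
The following is a minimal sketch of how such a redirection could be detected, not the actual w3af code. The function name find_meta_redir and the regular expression are assumptions made for illustration; the (tag, attrs) arguments follow the shape described for unknown_starttag.

```python
import re

# Hypothetical pattern: "<seconds>;URL=<target>", matched case-insensitively.
_REFRESH_RE = re.compile(r'^\s*\d+\s*;\s*url\s*=\s*(.+?)\s*$', re.IGNORECASE)


def find_meta_redir(tag, attrs):
    """Return the redirect target of a meta refresh tag, or None."""
    if tag.lower() != 'meta':
        return None
    # Attribute names are compared case-insensitively.
    attr_map = {name.lower(): value for name, value in attrs}
    if (attr_map.get('http-equiv') or '').lower() != 'refresh':
        return None
    match = _REFRESH_RE.match(attr_map.get('content') or '')
    if match is None:
        return None
    # Some pages quote the URL inside the content value.
    return match.group(1).strip('\'"')


attrs = [('HTTP-EQUIV', 'refresh'), ('content', '4;URL=http://www.f00.us/')]
print(find_meta_redir('META', attrs))
```

A refresh tag without a URL part (for example content="30") yields None in this sketch, since it reloads the same page rather than redirecting.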

_findReferences(self, tag, attrs)

 
This method finds references inside a document.

_parse(self, s)

 
This method parses the document.
Parameters:
  • s - The document to parse.

_parseMetaTags(self, tag, attrs)

 
This method parses the meta tags and creates a list of tuples with their values. The only exception is meta redirections, which are handled by "_findMetaRedir".
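
As a rough illustration of this behaviour, the sketch below collects the attributes of every meta tag except refresh redirections. It is an assumption-laden stand-in: sgmllib exists only in Python 2, so the example uses html.parser from the Python 3 standard library, and the class name MetaTagCollector and the exact shape of the stored data are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical collector; stores one list of (name, value) tuples per meta tag."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag != 'meta':
            return
        attr_map = dict(attrs)
        if (attr_map.get('http-equiv') or '').lower() == 'refresh':
            return  # redirections are left to a separate handler
        self.meta_tags.append(attrs)


collector = MetaTagCollector()
collector.feed('<meta name="author" content="x">'
               '<META HTTP-EQUIV="refresh" content="4;URL=http://www.f00.us/">')
collector.close()
print(collector.meta_tags)
```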

getAccounts(self)

 
None
Overrides: abstractParser.abstractParser.getAccounts

getComments(self)

 
Returns:
A list of comment strings.
Overrides: abstractParser.abstractParser.getComments

getForms(self)

 
Returns:
A list of forms.
Overrides: abstractParser.abstractParser.getForms

getMetaRedir(self)

 
Returns:
A list of meta redirections.
Overrides: abstractParser.abstractParser.getMetaRedir

getMetaTags(self)

 
Returns:
A list of all meta tags.
Overrides: abstractParser.abstractParser.getMetaTags

getReferences(self)

 
Searches for references on a page. w3af searches for references in every HTML tag, including:
  • a
  • forms
  • images
  • frames
  • etc.
Returns:
A list of links.
Overrides: abstractParser.abstractParser.getReferences
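
The idea of gathering references from any tag can be sketched as follows. This is not the w3af implementation: it uses html.parser from Python 3 in place of sgmllib (which is Python 2 only), and the names ReferenceCollector, get_references, and URL_ATTRS, as well as the choice of attributes inspected, are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReferenceCollector(HTMLParser):
    """Hypothetical collector of URL-bearing attributes from any tag."""

    # Assumed set of attributes that carry references.
    URL_ATTRS = ('href', 'src', 'action')

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Resolve relative references against the document URL.
                url = urljoin(self.base_url, value)
                if url not in self.references:
                    self.references.append(url)


def get_references(document, base_url):
    collector = ReferenceCollector(base_url)
    collector.feed(document)
    collector.close()
    return collector.references


page = ('<a href="/a">x</a><img src="i.png">'
        '<form action="login"></form><frame src="f.html">')
links = get_references(page, 'http://example.com/dir/page.html')
print(links)
```

Checking attributes rather than tag names is one way to cover "every HTML tag" without keeping a list of tags up to date.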

handle_comment(self, text)

 
This method is called by parse when a comment is found.
Overrides: sgmllib.SGMLParser.handle_comment
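
A comment callback of this kind typically just stores the text it receives. The sketch below shows the pattern with html.parser from Python 3, used here as a stand-in because sgmllib is Python 2 only; the class name CommentCollector is hypothetical.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical parser that records the text of every comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the comment body without the <!-- and --> delimiters.
        self.comments.append(data)


parser = CommentCollector()
parser.feed('<p>hi</p><!-- todo: remove --><!--x-->')
parser.close()
print(parser.comments)
```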

unknown_starttag(self, tag, attrs)

 

Called for each start tag. attrs is a list of (attr, value) tuples; e.g. for <pre class="screen">, tag="pre" and attrs=[("class", "screen")].

Note that improperly embedded non-HTML code (like client-side JavaScript) may be parsed incorrectly by the ancestor class, causing runtime script errors. All non-HTML code must be enclosed in HTML comment tags (<!-- code -->) to ensure that it passes through this parser unaltered (in handle_comment).
Overrides: sgmllib.SGMLParser.unknown_starttag
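
The shape of the tag and attrs arguments can be checked with a small recorder. This sketch assumes Python 3, where sgmllib is no longer available, so it uses html.parser instead; that module has no unknown_starttag, and its handle_starttag receives every start tag. The class name StartTagRecorder is hypothetical.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Hypothetical parser that records each (tag, attrs) pair it is given."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


recorder = StartTagRecorder()
recorder.feed('<pre class="screen">')
recorder.close()
print(recorder.seen)  # [('pre', [('class', 'screen')])]
```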