
Class htmlParser




This class parses HTML documents.


Author: Andres Riancho ( andres.riancho@gmail.com )

Instance Methods
  __init__(self, document, baseUrl, useTidy=True, verbose=0)
  _preParse(self, HTMLDocument)
  unknown_endtag(self, tag)
Called for each end tag, e.g. for </pre>, tag will be "pre".
  _findForms(self, tag, attrs)
This method finds forms inside an HTML document.

Inherited from sgmlParser.sgmlParser: getAccounts, getComments, getForms, getMetaRedir, getMetaTags, getReferences, handle_comment, unknown_starttag

Inherited from abstractParser.abstractParser: findAccounts

Inherited from sgmllib.SGMLParser: close, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_charref, handle_data, handle_decl, handle_endtag, handle_entityref, handle_pi, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, reset, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name


Class Variables

Inherited from sgmllib.SGMLParser: entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars


Method Details

__init__(self, document, baseUrl, useTidy=True, verbose=0)
(Constructor)

Overrides: sgmlParser.sgmlParser.__init__

_preParse(self, HTMLDocument)


unknown_endtag(self, tag)

 
Called for each end tag, e.g. for </pre>, tag will be "pre".
Overrides: sgmllib.SGMLParser.unknown_endtag
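
The end-tag callback contract described above can be illustrated with a small sketch. It does not use this class or sgmllib (which exists only in Python 2); it assumes Python 3 and substitutes the standard library's html.parser.HTMLParser, whose handle_endtag hook likewise receives the bare tag name. The EndTagRecorder class and its closed_tags attribute are hypothetical names chosen for the illustration.

```python
from html.parser import HTMLParser


class EndTagRecorder(HTMLParser):
    """Hypothetical stand-in that records every name passed to the end-tag hook."""

    def __init__(self):
        super().__init__()
        self.closed_tags = []

    def handle_endtag(self, tag):
        # Called once per closing tag; `tag` is the lowercased name
        # without the surrounding "</" and ">".
        self.closed_tags.append(tag)


recorder = EndTagRecorder()
recorder.feed("<pre>some text</pre><p>more</p>")
recorder.close()
print(recorder.closed_tags)  # ['pre', 'p']
```

In sgmllib, unknown_endtag is only the fallback for tags that have no dedicated end_<tag> handler, whereas html.parser routes every end tag through handle_endtag, so the sketch mirrors the argument format rather than the exact dispatch rules.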

_findForms(self, tag, attrs)

 
This method finds forms inside an HTML document.
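
How a (tag, attrs) callback can collect forms may be easier to see in code. The following is a minimal sketch, not the actual body of _findForms, which this page does not show. It assumes Python 3 and the standard library's html.parser.HTMLParser in place of sgmllib; the FormFinder class, its forms list and the dictionary layout are all hypothetical.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Hypothetical sketch: collects forms and the names of their inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being filled in, or None outside a form

    def handle_starttag(self, tag, attrs):
        # `attrs` arrives as a list of (name, value) pairs.
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                # A missing or valueless method attribute falls back to "get".
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # Inputs outside any <form> are ignored in this sketch.
            self._current["inputs"].append(attrs.get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


finder = FormFinder()
finder.feed(
    '<form action="/login" method="POST">'
    '<input name="user"><input name="pass">'
    '</form><input name="outside">'
)
finder.close()
print(finder.forms)
```

Tracking the open form in a single attribute keeps the sketch short; it does not handle nested or unclosed forms, which a parser for real-world HTML would need to consider.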