Package core :: Package data :: Package parsers :: Module urlParser

Module urlParser



urlParser.py

Copyright 2006 Andres Riancho

This file is part of w3af, w3af.sourceforge.net .

w3af is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation version 2 of the License.

w3af is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with w3af; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Functions
 
hasQueryString(uri)
Analyzes the uri to check for a query string.
 
getQueryString(url, ignoreExceptions=True)
Parses the query string and returns a dict.
 
uri2url(url)
Returns: A string containing the URL without the query string.
 
removeFragment(url)
Returns: A string containing the URL without the fragment.
 
baseUrl(url)
Returns: A string containing the URL without the query string and without any path.
 
normalizeURL(url)
This method was added to avoid issues caused by the different ways in which browsers and urlparse.urljoin join URLs.
 
urlJoin(baseurl, relative)
Construct a full ("absolute") URL by combining a "base URL" (baseurl) with a "relative URL" (relative).
 
getDomain(url)
Input: http://localhost:4444/f00_bar.html Output: localhost
 
getNetLocation(url)
Input: http://localhost:4444/f00_bar.html Output: localhost:4444
 
getProtocol(url)
Returns: The protocol (scheme) of the url.
 
getRootDomain(input)
Get the root domain name.
 
getDomainPath(url)
Returns: The domain name and the path for the url.
 
getFileName(url)
Returns: The file name for the given url.
 
getExtension(url)
Returns: The extension of the file name if it has one, otherwise ''.
 
allButScheme(url)
Returns: The domain name and the path for the url.
 
getPath(url)
Returns: The path for the url.
 
getPathQs(url)
 
urlDecode(url)
UrlDecode the url.
 
getDirectories(url)
Get a list of all directories and subdirectories.
Function Details

hasQueryString(uri)

 
Analyzes the uri to check for a query string.
Parameters:
  • uri - The uri to analyze.
Returns:
True if the URI has a query string.

getQueryString(url, ignoreExceptions=True)

 
Parses the query string and returns a dict.
Parameters:
  • url - The url with the query string to parse.
Returns:
A QueryString Object, example :
  • input url : http://localhost/foo.asp?xx=yy&bb=dd
  • output dict : { xx:yy , bb:dd }
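
The input/output pair above can be reproduced with the standard library. This is only a sketch: it returns a plain `dict` instead of the QueryString object mentioned above, the name `query_string_as_dict` is hypothetical, and keeping only the last value of a repeated parameter is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit


def query_string_as_dict(url):
    """Map each query parameter name to its value (last value wins on repeats)."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters such as "a=" instead of dropping them.
    return dict(parse_qsl(query, keep_blank_values=True))


print(query_string_as_dict("http://localhost/foo.asp?xx=yy&bb=dd"))
```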

uri2url(url)

 
Parameters:
  • url - The url with the query string.
Returns:
Returns a string containing the URL without the query string. Example :
  • input url : http://localhost/foo.asp?xx=yy&bb=dd#fragment
  • output url string : http://localhost/foo.asp

removeFragment(url)

 
Parameters:
  • url - The url with fragments
Returns:
Returns a string containing the URL without the fragment. Example :
  • input url : http://localhost/foo.asp?xx=yy&bb=dd#fragment
  • output url string : http://localhost/foo.asp?xx=yy&bb=dd

baseUrl(url)

 
Parameters:
  • url - The url with the query string.
Returns:
Returns a string containing the URL without the query string and without any path. Example :
  • input url : http://localhost/dir1/foo.asp?xx=yy&bb=dd
  • output url string : http://localhost/

normalizeURL(url)

 

This method was added to avoid issues caused by the different ways in which browsers and urlparse.urljoin
join URLs. A clear example is the following case:
    baseURL = 'http://abc/'
    relativeURL = '/../f00.b4r'

w3af would try to GET http://abc/../f00.b4r, while Mozilla would try to GET http://abc/f00.b4r. In some cases the first form works;
in others it fails and the server returns a 403 error.

To sum up, this method takes a URL and returns a normalized URL. For the example above,
it returns 'http://abc/f00.b4r' instead of the usual result of urlparse.urljoin, 'http://abc/../f00.b4r'.
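
The normalization described above amounts to resolving `.` and `..` segments in the path the way a browser would. The sketch below shows one way to do that. The name `normalize_path` is hypothetical, and the choices to ignore `..` at the root and to collapse empty segments are assumptions, not confirmed behavior of normalizeURL.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_path(url):
    """Collapse '.' and '..' path segments; '..' never climbs above the root."""
    parts = urlsplit(url)
    resolved = []
    for segment in parts.path.split("/"):
        if segment == "..":
            if resolved:
                resolved.pop()  # step up one directory if there is one
        elif segment not in ("", "."):
            resolved.append(segment)
    path = "/" + "/".join(resolved)
    # Preserve a trailing slash so directory URLs stay directory URLs.
    if parts.path.endswith("/") and path != "/":
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(normalize_path("http://abc/../f00.b4r"))
```

Note that newer versions of the standard library's `urljoin` already resolve dot segments, so the difference described above applies to the older `urlparse` module.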

urlJoin(baseurl, relative)

 

Construct a full ("absolute") URL by combining a "base URL" (baseurl) with a "relative URL" (relative). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.

Example: urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') yields the string 'http://www.cwi.nl/%7Eguido/FAQ.html'
Parameters:
  • baseurl - The base url to join
  • relative - The relative url to add to the base url
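
For reference, the example above runs as follows against the standard library's `urljoin` (Python 3's `urllib.parse`). This shows the stdlib call only. Whether urlJoin returns identical results in every case is not stated here. The second call and its URLs are an illustrative assumption.

```python
from urllib.parse import urljoin

# The relative name replaces the last path segment of the base URL.
joined = urljoin("http://www.cwi.nl/%7Eguido/Python.html", "FAQ.html")
print(joined)

# A relative path starting with '../' is resolved against the base directory.
parent = urljoin("http://localhost/a/b/", "../c.html")
print(parent)
```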

getDomain(url)

 
Input: http://localhost:4444/f00_bar.html Output: localhost
Parameters:
  • url - The url to parse.
Returns:
Returns the domain name for the url.

getNetLocation(url)

 
Input: http://localhost:4444/f00_bar.html Output: localhost:4444
Parameters:
  • url - The url to parse.
Returns:
Returns the net location for the url.
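
The difference between the domain and the net location in the two examples above is the port. With Python 3's `urllib.parse` (an assumption, not the module's own code), both values are available from a single split:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://localhost:4444/f00_bar.html")

# .hostname drops the port (and lowercases the host); .netloc keeps it.
print(parts.hostname)
print(parts.netloc)
```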

getProtocol(url)

 
Parameters:
  • url - The url to parse.
Returns:
Returns the protocol (scheme) for the url.

getRootDomain(input)

 

Get the root domain name. Examples:

input: www.ciudad.com.ar output: ciudad.com.ar

input: i.love.myself.ru output: myself.ru

Code taken from: http://getoutfoxed.com/node/41
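
The two examples above differ because `com.ar` acts as a single suffix while `ru` stands alone. The sketch below reproduces both with a simple heuristic. It is not the referenced code: the name `root_domain` and the small `SECOND_LEVEL_LABELS` set are assumptions made for illustration, and a robust solution would consult the Public Suffix List instead.

```python
# Assumption: a small hand-picked set of generic labels used under country-code TLDs.
SECOND_LEVEL_LABELS = {"com", "co", "org", "net", "gov", "edu", "ac"}


def root_domain(host):
    """Return the registrable part of a host name using a label-count heuristic."""
    labels = host.lower().split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Two-letter TLD preceded by a generic label (e.g. "com.ar"): keep three labels.
    if len(labels[-1]) == 2 and labels[-2] in SECOND_LEVEL_LABELS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(root_domain("www.ciudad.com.ar"))
print(root_domain("i.love.myself.ru"))
```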

getDomainPath(url)

 
Parameters:
  • url - The url to parse.
Returns:
Returns the domain name and the path for the url.
>>> getDomainPath('http://localhost/')
'http://localhost/'
>>> getDomainPath('http://localhost/abc/')
'http://localhost/abc/'
>>> getDomainPath('http://localhost/abc/def.html')
'http://localhost/abc/'
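
The doctest results above can be reproduced by cutting the path after its last slash. The following is a sketch under that assumption, using Python 3's `urllib.parse`. The name `domain_path` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def domain_path(url):
    """Return scheme, host and directory part of the path (no file name)."""
    parts = urlsplit(url)
    # Keep everything up to and including the last '/', or '/' for an empty path.
    directory = parts.path[: parts.path.rfind("/") + 1] or "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


print(domain_path("http://localhost/abc/def.html"))
```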

getFileName(url)

 
Parameters:
  • url - The url to parse.
Returns:
Returns the file name for the given url.
>>> getFileName('http://localhost/')
''
>>> getFileName('http://localhost/abc')
'abc'
>>> getFileName('http://localhost/abc.html')
'abc.html'
>>> getFileName('http://localhost/def/abc.html')
'abc.html'

getExtension(url)

 
Parameters:
  • url - The url to parse.
Returns:
Returns the extension of the filename, if possible, else, ''.
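
File name and extension can both be derived from the last path segment. The sketch below uses `posixpath` from the standard library. The names `file_name` and `extension` are hypothetical, and returning the extension without its leading dot is an assumption, since the text above does not say which form is used.

```python
import posixpath
from urllib.parse import urlsplit


def file_name(url):
    """Return the last path segment, or '' when the path ends with '/'."""
    return posixpath.basename(urlsplit(url).path)


def extension(url):
    """Return the file name's extension without the dot, or '' if there is none."""
    return posixpath.splitext(file_name(url))[1].lstrip(".")


print(file_name("http://localhost/def/abc.html"))
print(extension("http://localhost/def/abc.html"))
```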

allButScheme(url)

 
Parameters:
  • url - The url to parse.
Returns:
Returns the domain name and the path for the url.

getPath(url)

 

Parameters:
  • url - The url to parse.
Returns:
Returns the path for the url. Example :
  • input url : http://localhost/pepe/0a0a
  • output path : /pepe/0a0a

getPathQs(url)

 
>>> urlParser.getPathQs( 'http://localhost/a/b/c/hh.html' )
'/a/b/c/hh.html'
Parameters:
  • url - The url to parse.
Returns:
Returns the path and the query string for the url.
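
As the function name suggests, the result is the path followed by the query string when one is present. A minimal sketch, assuming Python 3's `urllib.parse`: the name `path_qs` and the second URL are hypothetical.

```python
from urllib.parse import urlsplit


def path_qs(url):
    """Return the path, followed by '?query' when the URL has a query string."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


print(path_qs("http://localhost/a/b/c/hh.html"))
print(path_qs("http://localhost/a/b/c/hh.html?id=1"))
```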

getDirectories(url)

 
Get a list of all directories and subdirectories. Example:
  • url = 'http://www.o.com/a/b/c/'
  • return: ['http://www.o.com/a/b/c/','http://www.o.com/a/b/','http://www.o.com/a/','http://www.o.com/']
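
The list in the example above can be built by walking the path from the deepest directory up to the root. The following sketch does that with Python 3's `urllib.parse`. The name `directories` is hypothetical, and ignoring a trailing file name is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def directories(url):
    """List every directory URL from the deepest one up to the site root."""
    parts = urlsplit(url)
    # Drop whatever follows the last '/', then keep the non-empty directory names.
    segments = [s for s in parts.path.split("/")[:-1] if s]
    result = []
    for depth in range(len(segments), -1, -1):
        path = "/" + "".join(name + "/" for name in segments[:depth])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


print(directories("http://www.o.com/a/b/c/"))
```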