lesscode.org


scanner

A rudimentary Python source code scanner.

A regular-expression-based Python source code scanner. The scan function reads a file line by line and collects information that object introspection does not provide, such as line numbers and the source order of definitions.

Synopsis

Consider the following source file (example.py):

    foo = 'FOO'
    def bar_function():
        print foo
    class Bling:
        foo = "Bling's FOO"
        def bar_method(self):
            print self.foo

The scan function returns a Token instance representing the file that was scanned:

>>> import pudge.scanner as scanner
>>> file_tok = scanner.scan('example.py')
>>> (file_tok.type, file_tok.name)
('file', 'example.py')

Traverse the token tree using Token.find:

>>> tok = file_tok.find('foo')
>>> (tok.type, tok.name, tok.line, tok.last_line)
('=', 'foo', 1, 2)

Line numbers are one piece of information not available via introspection.

You can traverse several levels at once using dot notation:

>>> tok = file_tok.find('Bling.bar_method')
>>> (tok.type, tok.name, tok.line, tok.last_line)
('def', 'bar_method', 6, 8)

Token instances support dictionary-style indexing as syntactic sugar; indexing a token is equivalent to calling find:

>>> tok = file_tok['Bling']
>>> (tok.type, tok.name, tok.children)
('class', 'Bling', [<Token('=', 'foo')>, <Token('def', 'bar_method')>])

Note also that the Token.children attribute holds a list of the token's immediate children, in the order they appear in the source. This source-level ordering is not available via introspection.
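The lookup behaviour described above (dotted paths, indexing as an alias for find, ordered children) can be illustrated with a tiny stand-in class. This is a minimal sketch, not pudge's implementation: the Node class is hypothetical, and returning None for an unknown name is an assumption, since the behaviour of the real find on a missing name is not documented here.

```python
class Node:
    """Hypothetical stand-in for Token: a type, a name and ordered children."""

    def __init__(self, kind, name, children=()):
        self.type = kind
        self.name = name
        self.children = list(children)  # kept in source order

    def find(self, dotted_name):
        """Walk a dotted path such as 'Bling.bar_method', one level per part."""
        node = self
        for part in dotted_name.split("."):
            for child in node.children:
                if child.name == part:
                    node = child
                    break
            else:
                return None  # assumption: an unknown name yields None
        return node

    # Indexing is plain sugar for find(), as with file_tok['Bling'].
    __getitem__ = find

    def __repr__(self):
        return "<Token(%r, %r)>" % (self.type, self.name)


bling = Node("class", "Bling", [Node("=", "foo"), Node("def", "bar_method")])
root = Node("file", "example.py",
            [Node("=", "foo"), Node("def", "bar_function"), bling])
print(root.find("Bling.bar_method").name)  # bar_method
print(root["Bling"].children)  # [<Token('=', 'foo')>, <Token('def', 'bar_method')>]
```

Note that find('foo') and find('Bling.foo') reach different tokens: each dotted part is resolved only among the children of the previous match.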


Attributes

a __url__

'$URL: svn://lesscode.org/pudge/trunk/pudge/scanner.py $'

a token_patterns

[('def', <_sre.SRE_Pattern object at 0x13a3c58>),
 ('class', <_sre.SRE_Pattern object at 0x135a8a8>),
 ('=', <_sre.SRE_Pattern object at 0x135a950>)]

Functions

f empty_cache() ...

Empties the filename -> Token cache.

If you pass a true cache argument to scan, it is a good idea to call this every once in a while so that the garbage collector can free the cached objects.

f scan(filename, file=None, cache=0) ...

Scan a file and return the collected bits.

filename is the name of the file to scan. If file is specified, it must be a file-like object that provides a readline method. When a true cache argument is provided, this function memoizes the result keyed on the filename argument. The cache is not thread-safe.

A single Token object representing the root of the tree is returned. The Token's type will be 'file'.
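Because only a readline method is required of the file argument, source text held in memory should in principle be scannable without writing it to disk. The helper below is a hypothetical sketch of that input handling, not pudge's code: read_source_lines opens filename when no file object is given, and otherwise drains the supplied object using the two-argument form of the built-in iter.

```python
import io


def read_source_lines(filename, file=None):
    """Return the source lines of filename, or of file if one is supplied."""
    if file is None:
        with open(filename) as fh:
            return list(iter(fh.readline, ""))
    # Only readline() is used, so any object that provides it will do;
    # iter() stops at the empty string that readline returns at EOF.
    return list(iter(file.readline, ""))


source = io.StringIO("foo = 1\ndef bar():\n    return foo\n")
lines = read_source_lines("example.py", source)
print(len(lines))  # 3
```

When a file object is supplied, filename is still worth passing, because in scan it names the root token and serves as the cache key.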

Classes

C Token(...) ...

A Python syntax token.

This class provides access to information about a named Python object. Token objects are arranged into a hierarchy that should mirror the introspection object hierarchy exactly.

Token objects have six important attributes:

  • type - The token's type. This will be one of the following string values:
    • 'file' - The token is a root file token. The name attribute contains the name of the file.
    • 'def' - The token describes a function or method.
    • 'class' - The token describes a class.
    • '=' - The token describes an attribute.
  • name - The name of the file, class, function, or attribute.
  • indent - The indent level as an integer starting at 0.
  • line - The line number that the token appears on.
  • last_line - The line at which the token is no longer 'on the stack'; in the examples above, this is the line just after the token's definition ends.
  • children - A list of child tokens, in source order.
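The indent, line, last_line and children attributes fit together naturally if the scanner keeps a stack of open tokens while it reads lines. The sketch below assumes that design and is not pudge's code: SketchToken, sketch_scan and the three regular expressions are hypothetical stand-ins for Token and token_patterns, indent here counts leading columns rather than nesting levels, and continuation lines, decorators and nested function bodies are handled only naively. Run on the example.py lines from the synopsis, it yields foo at lines 1 and 2 and Bling.bar_method at lines 6 and 8, matching the values shown there.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical (type, pattern) pairs in the spirit of token_patterns; pudge's
# actual regular expressions are not reproduced here.
PATTERNS = [
    ("def", re.compile(r"^(\s*)def\s+(\w+)")),
    ("class", re.compile(r"^(\s*)class\s+(\w+)")),
    ("=", re.compile(r"^(\s*)(\w+)\s*=(?!=)")),  # 'x = 1' but not 'x == 1'
]


@dataclass
class SketchToken:
    type: str
    name: str
    indent: int = -1                  # leading columns, not nesting depth
    line: int = 0
    last_line: Optional[int] = None
    children: list = field(default_factory=list)


def sketch_scan(name, lines):
    """Build a token tree from an iterable of source lines."""
    root = SketchToken("file", name)
    stack = [root]                    # open tokens, innermost last
    lineno = 0
    for lineno, text in enumerate(lines, start=1):
        stripped = text.strip()
        if not stripped or stripped.startswith("#"):
            continue                  # blank lines and comments close nothing
        indent = len(text) - len(text.lstrip())
        # A line at the same or a shallower indent takes every token that is
        # at least that deep off the stack; this line is their last_line.
        while len(stack) > 1 and stack[-1].indent >= indent:
            stack.pop().last_line = lineno
        for tok_type, pattern in PATTERNS:
            match = pattern.match(text)
            if match:
                tok = SketchToken(tok_type, match.group(2), indent, lineno)
                stack[-1].children.append(tok)   # children stay in source order
                stack.append(tok)
                break
    # Tokens still open at end of file close one line past the last line.
    while len(stack) > 1:
        stack.pop().last_line = lineno + 1
    return root
```

A regular-expression approach like this never executes or imports the file, which is what lets it record positions that introspection of a loaded module cannot.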

This class contains 6 members.

See the source for more information.