scanner
A rudimentary Python source code scanner.
The scanner module is accessible via the pudge module.
A regular-expression-based Python source code scanner. The scan function reads a file line by line and collects information that object introspection does not provide, such as line numbers and the source order of definitions.
Synopsis
Consider the following source file (example.py):
>>> foo = 'FOO'
>>> def bar_function():
...     print foo
>>> class Bling:
...     foo = "Bling's FOO"
...     def bar_method(self):
...         print self.foo
The scan function returns a Token instance representing the file that was scanned:
>>> import pudge.scanner as scanner
>>> file_tok = scanner.scan('example.py')
>>> (file_tok.type, file_tok.name)
('file', 'example.py')
Traverse the token tree using Token.find:
>>> tok = file_tok.find('foo')
>>> (tok.type, tok.name, tok.line, tok.last_line)
('=', 'foo', 1, 2)
Line numbers are one piece of information not available via introspection.
You can traverse multiple levels of depth using dot notation:
>>> tok = file_tok.find('Bling.bar_method')
>>> (tok.type, tok.name, tok.line, tok.last_line)
('def', 'bar_method', 6, 8)
Token instances also support dictionary-style indexing as a convenience; this is equivalent to calling find:
>>> tok = file_tok['Bling']
>>> (tok.type, tok.name, tok.children)
('class', 'Bling', [<Token('=', 'foo')>, <Token('def', 'bar_method')>])
Note also that the Token.children attribute is a list of the token's immediate children. The list preserves the source-level order of tokens, which is not available via introspection.
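The traversal behaviour described above can be sketched with a small stand-in class. This is not the pudge implementation: the Token class below is a hypothetical substitute written for Python 3, the walk helper is invented for illustration, and returning None on a failed lookup is an assumption. Line numbers are filled in only where the synopsis states them.

```python
# Hypothetical stand-in for pudge.scanner.Token, for illustration only.
class Token:
    def __init__(self, type, name, line=None, last_line=None):
        self.type = type
        self.name = name
        self.line = line
        self.last_line = last_line
        self.children = []  # immediate children, kept in source order

    def find(self, path):
        """Resolve a dotted path such as 'Bling.bar_method'."""
        tok = self
        for part in path.split('.'):
            for child in tok.children:
                if child.name == part:
                    tok = child
                    break
            else:
                return None  # assumption: a failed lookup yields None
        return tok

    def __getitem__(self, path):
        # Indexing simply delegates to find().
        return self.find(path)

    def __repr__(self):
        return '<Token(%r, %r)>' % (self.type, self.name)


def walk(tok, depth=0):
    """Yield (depth, token) pairs in source order (hypothetical helper)."""
    yield depth, tok
    for child in tok.children:
        yield from walk(child, depth + 1)


# Build the tree from the synopsis by hand.
root = Token('file', 'example.py')
bling = Token('class', 'Bling')
bling.children = [Token('=', 'foo'), Token('def', 'bar_method', 6, 8)]
root.children = [Token('=', 'foo', 1, 2), Token('def', 'bar_function'), bling]

names = [(depth, tok.name) for depth, tok in walk(root)]
```

Because children is an ordinary list, a depth-first walk visits tokens in the order they appear in the file, which is the property the scanner adds on top of introspection.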
Attributes
token_patterns
[('def', <_sre.SRE_Pattern object at 0x13a3c58>),
('class', <_sre.SRE_Pattern object at 0x135a8a8>),
('=', <_sre.SRE_Pattern object at 0x135a950>)]
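The listing above shows only the shape of token_patterns: a list of (type, compiled pattern) pairs. The actual expressions are not shown, so the patterns below are guesses meant to illustrate how such a list could classify a single line. The classify helper and the group names indent and name are hypothetical.

```python
import re

# Illustrative patterns only; the real expressions in token_patterns may differ.
token_patterns = [
    ('def', re.compile(r'^(?P<indent>\s*)def\s+(?P<name>[A-Za-z_]\w*)\s*\(')),
    ('class', re.compile(r'^(?P<indent>\s*)class\s+(?P<name>[A-Za-z_]\w*)\s*[(:]')),
    # (?!=) keeps comparisons such as "x == 3" from being read as assignments.
    ('=', re.compile(r'^(?P<indent>\s*)(?P<name>[A-Za-z_]\w*)\s*=(?!=)')),
]


def classify(line):
    """Return (type, name, indent width) for the first matching pattern, else None."""
    for type_, pattern in token_patterns:
        match = pattern.match(line)
        if match:
            return type_, match.group('name'), len(match.group('indent'))
    return None


result = classify("    def bar_method(self):")
```

Trying the patterns in list order means the first match wins, so the more specific def and class patterns are listed before the assignment pattern.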
Functions
empty_cache()
Empties the filename -> Token cache.
It isn't a bad idea to do this every once in a while if you use the cache argument to scan so that the garbage collector can free up the objects.
scan(filename, file=None, cache=0)
Scan a file and return the collected information.
filename is the name of the file to scan. If file is specified, it is a file object that supports the readline method. When a true cache argument is provided, this function memoizes the result, keyed on the filename argument. The cache is not thread safe.
A single Token object is returned that represents the root of the tree. The Token's type will be 'file'.
Classes
Token(...)
A Python syntax token.
This class provides access to information about a named Python object. Token objects are arranged into a hierarchy that should mirror the introspection object hierarchy exactly.
Token objects have six important attributes:
- type - The token's type. This will be one of the following string values:
  - 'file' - The token is a root file token. The name attribute contains the name of the file.
  - 'def' - The token describes a function or method.
  - 'class' - The token describes a class.
  - '=' - The token describes an attribute.
- name - The name of the file, class, function, or attribute.
- indent - The indent level as an integer starting at 0.
- line - The line number that the token appears on.
- last_line - The line at which the token is no longer 'on the stack'. In the synopsis, foo appears on line 1 and has a last_line of 2.
- children - A list of child tokens.
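One way to read the line, last_line, and indent attributes together is as the product of an indentation stack. The sketch below is an assumption about how such a scanner could work, not the pudge source: scan_lines and TOKEN_RE are hypothetical, tokens are plain dictionaries, and indent is measured as leading whitespace width rather than a level.

```python
import re

# Hypothetical single pattern covering the three non-file token kinds.
TOKEN_RE = re.compile(
    r'^(?P<indent>\s*)(?:(?P<kw>def|class)\s+(?P<kwname>\w+)|(?P<attr>\w+)\s*=(?!=))')


def scan_lines(lines):
    """Return a list of dicts with type, name, indent, line and last_line."""
    tokens, stack = [], []
    lineno = 0
    for lineno, text in enumerate(lines, 1):
        if not text.strip():
            continue  # blank lines never close a block
        indent = len(text) - len(text.lstrip())
        # A line indented no deeper than an open token takes it off the stack.
        while stack and stack[-1]['indent'] >= indent:
            stack.pop()['last_line'] = lineno
        match = TOKEN_RE.match(text)
        if match:
            tok = {'type': match.group('kw') or '=',
                   'name': match.group('kwname') or match.group('attr'),
                   'indent': indent, 'line': lineno, 'last_line': None}
            tokens.append(tok)
            stack.append(tok)
    # Tokens still open at end of file close one line past the end.
    for tok in stack:
        tok['last_line'] = lineno + 1
    return tokens


source = """foo = 'FOO'
def bar_function():
    print foo
class Bling:
    foo = "Bling's FOO"
    def bar_method(self):
        print self.foo
"""
tokens = scan_lines(source.splitlines())
spans = [(t['name'], t['line'], t['last_line']) for t in tokens]
```

Run against the synopsis file, this sketch gives the top-level foo a span of (1, 2) and bar_method a span of (6, 8), matching the values shown earlier. A line-based approach like this would be misled by assignments or def statements that appear inside multi-line strings.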
This class contains 6 members.
See the source for more information.