Articles

  • Escaping square brackets in Python's glob

    Widely supported in both Unix and Windows shells, globbing is the mechanism that expands a path specification (e.g. ls readme*.txt) into the list of files that match it. Besides the familiar * wildcard, a few lesser-known characters also act as globbing operators. Among them are square brackets, which match exactly one character from the set they enclose. Unfortunately, square brackets are also legal characters in filenames, and some files use them in their names by convention. Some glob implementations, Bash and PowerShell for example, provide an escape character so that square brackets can be matched literally. Others do not, so a bracket you meant literally is interpreted as a globbing operator and the results are wrong.

    For these situations, there is a trick to escape globbing operators: enclose the operator you want matched literally inside its own pair of square brackets, like this:

    ls project[[]X[]].*.pdf

    This will match all files starting with project[X]. and ending with .pdf. Enclosing each bracket in its own pair of square brackets tells the glob implementation to match exactly one character from a set that contains only that bracket, which is the bracket itself.
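
    The difference is easy to check with Python's fnmatch module, which glob uses for its pattern matching. The sketch below is only an illustration; the filenames are made up.

```python
import fnmatch

# Hypothetical filenames, used only for illustration.
names = ['project[X].v1.pdf', 'projectX.v1.pdf', 'project[X].v1.txt']

# Unescaped: [X] is a character set, so it matches a bare X, not "[X]".
unescaped = [n for n in names if fnmatch.fnmatchcase(n, 'project[X].*.pdf')]

# Escaped: [[] matches a literal "[" and []] matches a literal "]".
escaped = [n for n in names if fnmatch.fnmatchcase(n, 'project[[]X[]].*.pdf')]

print(unescaped)
print(escaped)
```

    The unescaped pattern picks up projectX.v1.pdf, which is exactly the incorrect result described earlier, while the escaped pattern picks up only the file whose name really contains the brackets.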

    Python is one of the implementations that needs this trick. Internally, Python converts the glob pattern into a regular expression (see the translate function in python/lib/fnmatch.py), but the converter has no notion of an escape character.
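
    A small check of that behaviour, offered as a sketch rather than a specification: a backslash placed before a bracket does not escape it, whereas the bracket-wrapping form does match. The filename is hypothetical.

```python
import fnmatch
import re

name = 'project[X].pdf'

# A backslash is treated as a literal character rather than an escape,
# so this shell-style pattern does not match the bracketed name.
backslash_match = fnmatch.fnmatchcase(name, r'project\[X\].pdf')

# The bracket-wrapping trick does match it.
wrapped_match = fnmatch.fnmatchcase(name, 'project[[]X[]].pdf')

# translate() returns the regular expression Python builds from the pattern;
# the exact text varies between Python versions, so it is only printed here.
regex = fnmatch.translate('project[[]X[]].pdf')
print(regex)
print(backslash_match, wrapped_match)
```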

    Here is some Python code that applies this trick to every square bracket in the given glob_pattern, so that glob matches all of them literally:

    import glob
    import re
    
    # given the following glob_pattern
    glob_pattern = 'project[X].*.pdf'
    
    # replace the left square bracket with [[]
    glob_pattern = re.sub(r'\[', '[[]', glob_pattern)
    # replace the right square bracket with []] but be careful not to replace
    # the right square brackets in the left square bracket's 'escape' sequence.
    glob_pattern = re.sub(r'(?<!\[)\]', '[]]', glob_pattern)
    
    files = glob.glob(glob_pattern)
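
    To try this end to end, the two substitutions can be wrapped in a small helper and run against a scratch directory. This is a minimal sketch: escape_brackets is a hypothetical name, the filenames are invented, and it assumes the temporary directory's own path contains no globbing operators.

```python
import glob
import os
import re
import tempfile


def escape_brackets(pattern):
    """Wrap every [ and ] in its own set so glob matches it literally.

    Hypothetical helper; * and ? are left alone and keep acting as wildcards.
    """
    # Same two steps as above: left brackets first, then the right brackets
    # that are not already part of a [[] sequence.
    pattern = re.sub(r'\[', '[[]', pattern)
    return re.sub(r'(?<!\[)\]', '[]]', pattern)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical filenames, created only for this demonstration.
    for name in ('project[X].draft.pdf', 'projectX.draft.pdf'):
        open(os.path.join(tmp, name), 'w').close()

    pattern = escape_brackets('project[X].*.pdf')
    # Assumes the temporary directory path itself contains no glob operators.
    matches = glob.glob(os.path.join(tmp, pattern))
    found = sorted(os.path.basename(m) for m in matches)

print(pattern)
print(found)
```

    On Python 3.4 and later, the standard library's glob.escape does similar wrapping, but it also escapes * and ?, so it fits best when the whole string is a literal filename rather than a pattern whose wildcards should stay active.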
     

Comments

1.

Line 5 isn't right - glob_pattern isn't defined yet so you can't re.sub against it quite yet. Just wanted to give ya a heads up that you might want to change the var name there.

Posted by JFray, 20th March 2013 7:12 PM
 
2.

@JFray - I was assuming the glob pattern to escape was already defined as glob_pattern; should've made that clearer. I've updated the post; thanks.

Posted by [edgylogic] sam, 23rd March 2013 4:37 PM
 
3.

you could do it in one re.sub line. it surrounds any found [ or ] with []. in the find and replace string, the [, ] and \1 needed to be escaped

glob_pattern = re.sub(r'([\[\]])', '[\\1]', glob_pattern)

Posted by dev0, 3rd October 2013 9:58 AM
 
