Escaping square brackets in Python's glob
Widely supported in both Unix and Windows shells, globbing is the magic that expands a path specification (e.g. ls readme*.txt) into the list of files that match it. Besides the * wildcard, there are some lesser-known characters that act as globbing operators as well. These include square brackets, which match exactly one of the characters enclosed in them. Unfortunately, square brackets are also legal characters in filenames, and some files use them in their names by convention. Some glob implementations, Bash and PowerShell for example, have an escape character that allows square brackets to be matched literally. Others do not, so when you intend to match square brackets literally, they are interpreted as globbing operators and yield incorrect results.
For these situations, there is a trick to escape globbing operators: enclose each operator you want matched literally inside its own pair of square brackets, like this:
    project[[]X[]].*.pdf
This will match all files starting with project[X]. and ending with .pdf. Enclosing each bracket in square brackets turns it into a one-character set, which tells the glob implementation to match that character literally, once.
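The effect can be checked without touching the filesystem. The sketch below assumes Python and uses fnmatch.fnmatchcase from the standard library, which follows the same wildcard rules as glob; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["project[X].draft.pdf", "projectX.draft.pdf", "project[X].notes.txt"]

# Unescaped: [X] is a character set that matches the single character "X".
unescaped = [n for n in names if fnmatchcase(n, "project[X].*.pdf")]

# Escaped: [[] matches a literal "[" and []] matches a literal "]".
escaped = [n for n in names if fnmatchcase(n, "project[[]X[]].*.pdf")]

print(unescaped)
print(escaped)
```

The unescaped pattern picks up projectX.draft.pdf, which is probably not what was intended, while the escaped pattern picks up only the PDF whose name contains the literal brackets.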
Python is one of the implementations that requires this trick. Internally, Python converts the glob pattern into a regular expression (see the translate function in python/lib/fnmatch.py), and the converter has no support for escape characters.
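To see this behaviour directly, the following sketch feeds two patterns through the standard fnmatch module. The patterns and the file name are illustrative assumptions, and the exact regular expression printed by translate differs between Python versions, so no particular output is shown for it.

```python
import fnmatch
import re

# A backslash is not treated as an escape character; it is matched literally,
# so this shell-style pattern does not match the bracketed file name.
backslash_pattern = r"project\[X\].pdf"
backslash_result = fnmatch.fnmatchcase("project[X].pdf", backslash_pattern)
print(backslash_result)

# The bracket trick survives the translation into a regular expression.
bracket_pattern = "project[[]X[]].pdf"
regex = fnmatch.translate(bracket_pattern)
print(regex)  # exact text varies between Python versions
bracket_result = re.match(regex, "project[X].pdf") is not None
print(bracket_result)
```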
Here is some Python code that applies this trick to every square bracket in the given glob_pattern, causing glob to match all square brackets literally:
    import glob
    import re

    # given the following glob_pattern
    glob_pattern = 'project[X].*.pdf'

    # replace each left square bracket with [[]
    glob_pattern = re.sub(r'\[', '[[]', glob_pattern)

    # replace each right square bracket with []], but be careful not to replace
    # the right square bracket that belongs to a left bracket's '[[]' escape sequence
    glob_pattern = re.sub(r'(?<!\[)\]', '[]]', glob_pattern)

    # glob_pattern is now 'project[[]X[]].*.pdf'
    files = glob.glob(glob_pattern)
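The two substitutions can also be folded into a single pass, which avoids the lookbehind. The sketch below is one possible variant, not the only way to do it: escape_brackets is a hypothetical helper name, and the files are created in a temporary directory purely for the demonstration.

```python
import glob
import os
import re
import tempfile


def escape_brackets(pattern):
    """Wrap every "[" and "]" in brackets so glob matches them literally.

    Hypothetical helper: one pass turns "[" into "[[]" and "]" into "[]]".
    Wildcards such as "*" and "?" are left untouched.
    """
    return re.sub(r"[\[\]]", lambda m: "[" + m.group(0) + "]", pattern)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file names for the demonstration.
    for name in ("project[X].draft.pdf", "projectX.draft.pdf"):
        open(os.path.join(tmp, name), "w").close()

    # glob.escape protects the directory part in case it contains glob characters.
    pattern = os.path.join(glob.escape(tmp), escape_brackets("project[X].*.pdf"))
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(escape_brackets("project[X].*.pdf"))
print(matches)
```

On Python 3.4 and later, the standard library's glob.escape escapes *, ? and [ in a string that should be matched entirely literally. It is used here only for the directory part, because applying it to the whole pattern would also neutralise the * wildcard.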