Articles

  • Escaping square brackets in Python's glob

    Widely supported in both Unix and Windows shells, globbing is the magic that expands a path specification (e.g. ls readme*.txt) into the list of files that match it. Besides the * wildcard, a few lesser-known characters act as globbing operators as well. These include square brackets, which match exactly one of the characters they enclose. Unfortunately, square brackets are also legal characters in filenames, and some files use them in their names by convention. Some glob implementations, Bash and PowerShell for example, have an escape character that allows square brackets to be matched literally. Others do not, so when you intend to match square brackets literally they are interpreted as globbing operators and yield incorrect results.

    For these situations, there is a trick to escape a globbing operator: enclose the character you want matched literally inside square brackets, like this:

    ls project[[]X[]].*.pdf

    This matches all files starting with project[X]. and ending with .pdf. Enclosing each bracket in its own pair of square brackets tells the glob implementation to match exactly one of the enclosed characters, which here is the bracket itself.
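
    The effect can be checked without touching the file system. The following is a minimal sketch using fnmatch.fnmatchcase from the standard library, which glob relies on for matching file names; the file names are made up for illustration.

```python
import fnmatch

# Hypothetical file names, used only for illustration.
names = ['project[X].draft.pdf', 'projectX.draft.pdf']

# Unescaped: [X] is a character class that matches a single 'X'.
unescaped = [n for n in names if fnmatch.fnmatchcase(n, 'project[X].*.pdf')]

# Escaped: [[] matches a literal '[' and []] matches a literal ']'.
escaped = [n for n in names if fnmatch.fnmatchcase(n, 'project[[]X[]].*.pdf')]

print(unescaped)
print(escaped)
```

    The unescaped pattern picks up only the name without brackets, while the escaped pattern picks up only the name with them.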

    Python is one of the implementations that requires this trick. Internally, Python converts the glob pattern into a regular expression (see the translate function in python/lib/fnmatch.py), but the converter has no ability to handle escape characters.
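
    One way to see that a backslash is not treated as an escape character is a quick check with fnmatch.fnmatchcase. This is only a sketch, and the names in it are hypothetical.

```python
import fnmatch

# A Bash-style pattern that tries to escape the brackets with backslashes.
bash_style = r'a\[b\]'

# It does not match the name that contains literal brackets ...
matches_brackets = fnmatch.fnmatchcase('a[b]', bash_style)

# ... because the first backslash is matched as an ordinary character and
# [b\] is still read as a character class (containing 'b' and '\').
matches_backslash = fnmatch.fnmatchcase('a\\b', bash_style)

print(matches_brackets, matches_backslash)
```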

    Here is some Python code that applies this trick to every square bracket in the given glob_pattern, causing glob to match all square brackets literally:

    import glob
    import re
    
    # given the following glob_pattern
    glob_pattern = 'project[X].*.pdf'
    
    # replace the left square bracket with [[]
    glob_pattern = re.sub(r'\[', '[[]', glob_pattern)
    # replace the right square bracket with []] but be careful not to replace
    # the right square brackets in the left square bracket's 'escape' sequence.
    glob_pattern = re.sub(r'(?<!\[)\]', '[]]', glob_pattern)
    
    files = glob.glob(glob_pattern)
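
    The same substitution can also be done in a single pass, which avoids the lookbehind. The sketch below wraps it in a helper; escape_brackets is a hypothetical name, and the temporary directory and file names exist only for the demonstration. On Python 3.4 and later the standard library also provides glob.escape, which escapes *, ? and [ in a literal path; it is used here only to protect the directory part.

```python
import glob
import os
import re
import tempfile


def escape_brackets(pattern):
    """Wrap every '[' and ']' in its own character class."""
    return re.sub(r'[\[\]]', lambda m: '[' + m.group(0) + ']', pattern)


with tempfile.TemporaryDirectory() as tmp:
    # Create two throwaway files; only the first has brackets in its name.
    for name in ('project[X].draft.pdf', 'projectX.draft.pdf'):
        open(os.path.join(tmp, name), 'w').close()

    # glob.escape (Python 3.4+) guards against glob characters in tmp itself.
    pattern = os.path.join(glob.escape(tmp), escape_brackets('project[X].*.pdf'))
    found = [os.path.basename(p) for p in glob.glob(pattern)]

print(found)
```

    Unlike glob.escape, the helper leaves * and ? active, so wildcards in the pattern keep working.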
     
  • Network shares and XBMC on Apple TV 2

  • Changing fonts on boxee box

  • Rendering variable-sized SVGs

  • Downloading large files with VBScript
