Currently the re module supports only simple set syntax, but a future version may support extended syntax with nested sets and set operations. Unfortunately, that syntax is not fully compatible with the current one. In particular, an open bracket '[' inside a character set would start a nested set. The html5lib code contains a regular expression that will break if the new syntax is accepted:
ascii_punctuation_re = re.compile("[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]")
It would be good to guard the code against possible future breakage. Adding a backslash before [ is enough: replace \u005B with \u005C\u005B, \\\u005B or \\[.
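A minimal sketch of the escaped form follows. It assumes Python 3 string literals (or `unicode_literals` on Python 2). The names `RANGES`, `spellings`, `matched` and `expected` are illustrative only and do not come from html5lib. The sketch checks that the escaped pattern still matches exactly the intended code points, and that the three suggested spellings give the same pattern text.

```python
import re

# Escaped form: "\\" places a literal backslash before "[" (U+005B),
# so the regex engine sees \[ inside the set instead of a bare bracket.
ascii_punctuation_re = re.compile(
    "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\\\u005B-\u0060\u007B-\u007E]"
)

# The three suggested spellings are the same two characters: backslash + "[".
spellings = ["\u005C\u005B", "\\\u005B", "\\["]

# Code point ranges the original set is meant to cover (inclusive bounds).
RANGES = [(0x09, 0x0D), (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]
expected = {cp for lo, hi in RANGES for cp in range(lo, hi + 1)}

# ASCII code points that the escaped pattern actually matches.
matched = {cp for cp in range(128) if ascii_punctuation_re.fullmatch(chr(cp))}

print(matched == expected)
print(len(set(spellings)) == 1)
```

Because the backslash only escapes the bracket, the range `\[-\u0060` keeps the same bounds as before, so the change should not alter what the pattern matches.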
See Python issue: https://bugs.python.org/issue30349.