Fix quadratic regex performance issue

filter_leading_non_json_lines effectively does

re.match(r".*\w+=\w+.*", line)

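for every line of output. This performs abysmally on long lines of
Base64-encoded data, such as the diffs returned by the template module.
Such a line never matches the full regex, but it does match the .*\w+=
part, so the engine backtracks through every split between .* and \w+
before giving up, which is quadratic in the line length.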
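To see the cost, here is a minimal Python 3 sketch (not part of the patch; the helper name time_old_match and the all-'A' test line are illustrative assumptions) that times the old pattern on a Base64-like line ending in '==' padding. The line never matches, so re.match has to exhaust the backtracking before failing; on CPython the timings should grow roughly quadratically with n, though exact numbers depend on the machine.

```python
import re
import time

# The pattern from the original code; re.match anchors it at the line start.
OLD_KV_REGEX = re.compile(r'.*\w+=\w+.*')


def time_old_match(n):
    """Time OLD_KV_REGEX.match on n word characters followed by '=='.

    The '==' padding satisfies the '.*\\w+=' prefix but is never followed
    by a word character, so the overall match always fails.
    """
    line = 'A' * n + '=='
    start = time.perf_counter()
    result = OLD_KV_REGEX.match(line)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == '__main__':
    for n in (1000, 2000, 4000):
        result, elapsed = time_old_match(n)
        # result is always None; elapsed should roughly quadruple as n doubles
        print(n, result, f'{elapsed:.4f}s')
```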

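Replacing the match with

re.search(r"\w=\w", line)

gives the same result (a line matches .*\w+=\w+.* exactly when it
contains a word character, '=' and another word character in a row) and
brings the complexity back down to linear. That makes the filter usable
with large diffs from the template module (a 150 KB Base64 diff kept
Ansible spinning at 100% CPU for minutes).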
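A quick way to check that the two patterns agree is to run both on a handful of lines. This is a Python 3 sketch with made-up sample lines; old_is_kv, new_is_kv and SAMPLES are hypothetical names, not Ansible code. Each sample should print the same boolean twice.

```python
import re

OLD_KV_REGEX = re.compile(r'.*\w+=\w+.*')  # original, used with match()
NEW_KV_REGEX = re.compile(r'\w=\w')        # replacement, used with search()


def old_is_kv(line):
    return OLD_KV_REGEX.match(line) is not None


def new_is_kv(line):
    return NEW_KV_REGEX.search(line) is not None


SAMPLES = [
    'changed=true msg=ok',    # key=value output
    'Welcome to the server',  # MOTD-style noise, no '='
    'x = 1',                  # spaces around '=': neither pattern matches
    'QUJD==',                 # Base64 padding: '=' not followed by a word char
    '{"a": 1}',               # JSON object, no '='
    'prefix a=b suffix',      # key=value inside a longer line
]

if __name__ == '__main__':
    for line in SAMPLES:
        print(repr(line), old_is_kv(line), new_is_kv(line))
    # The new pattern handles a 150 KB Base64-like line quickly;
    # the old one is deliberately not run on it here.
    big = 'A' * 150_000 + '=='
    print(new_is_kv(big))  # False
```

re.search does a constant amount of work per starting position for this pattern, which is where the linear bound comes from.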

Also, while we're here, check the cheap cases (line.startswith) first, so
the regex is not run at all on a line that already starts with '{' or '['.
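The ordering matters because module output can arrive as one long JSON line: with the regex checked first, that whole line (Base64 diff included) goes through the regex before the '{' check gets a chance. The Python 3 sketch below uses the new pattern under both orderings to isolate the effect and counts regex calls; CountingRegex, first_kept_index and SAMPLE_LINES are hypothetical helpers written for this illustration.

```python
import re

KV_REGEX = re.compile(r'\w=\w')


class CountingRegex:
    """Hypothetical wrapper around a compiled pattern that counts search() calls."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.calls = 0

    def search(self, line):
        self.calls += 1
        return self.pattern.search(line)


def first_kept_index(lines, regex, regex_first):
    """Index of the first line that would stop filtering, or None.

    regex_first=True mimics the old condition order, False the new one.
    """
    for i, line in enumerate(lines):
        if regex_first:
            hit = regex.search(line) or line.startswith('{') or line.startswith('[')
        else:
            hit = line.startswith('{') or line.startswith('[') or regex.search(line)
        if hit:
            return i
    return None


# A login banner followed by one long JSON line carrying a Base64-like diff.
SAMPLE_LINES = [
    'Last login: Fri Aug 1 14:34:37 2014 from 10.0.0.1',
    '{"diff": "' + 'A' * 150_000 + '=="}',
]

if __name__ == '__main__':
    for regex_first in (True, False):
        counter = CountingRegex(KV_REGEX)
        index = first_kept_index(SAMPLE_LINES, counter, regex_first)
        label = 'regex first' if regex_first else 'startswith first'
        print(label, index, counter.calls)
```

Both orderings stop at line 1, but the regex runs twice when checked first and only once (on the banner) when the startswith checks come first.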

Closes: #8932
Grzegorz Nosek 2014-08-01 14:34:37 +02:00
parent c93b89fa63
commit 7f33580eba

@@ -1041,11 +1041,11 @@ def filter_leading_non_json_lines(buf):
     filter only leading lines since multiline JSON is valid.
     '''
-    kv_regex = re.compile(r'.*\w+=\w+.*')
+    kv_regex = re.compile(r'\w=\w')
     filtered_lines = StringIO.StringIO()
     stop_filtering = False
     for line in buf.splitlines():
-        if stop_filtering or kv_regex.match(line) or line.startswith('{') or line.startswith('['):
+        if stop_filtering or line.startswith('{') or line.startswith('[') or kv_regex.search(line):
             stop_filtering = True
             filtered_lines.write(line + '\n')
     return filtered_lines.getvalue()
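For reference, a Python 3 sketch of the patched function as it reads after this change. It is a port for illustration, not the code in the Ansible tree: io.StringIO stands in for the Python 2 StringIO.StringIO, the pattern is compiled once at module level, and the docstring is paraphrased rather than copied.

```python
import io
import re

# Compiled once; finds a word character, '=', word character anywhere in a line.
KV_REGEX = re.compile(r'\w=\w')


def filter_leading_non_json_lines(buf):
    """Drop leading lines until the first one that starts with '{' or '['
    or contains a key=value pair, then keep everything from that line on
    (only leading lines are filtered, since multiline JSON is valid)."""
    filtered_lines = io.StringIO()
    stop_filtering = False
    for line in buf.splitlines():
        # Cheap prefix checks first; the regex only runs on lines that fail them.
        if stop_filtering or line.startswith('{') or line.startswith('[') or KV_REGEX.search(line):
            stop_filtering = True
            filtered_lines.write(line + '\n')
    return filtered_lines.getvalue()


if __name__ == '__main__':
    noisy = 'Welcome to host\nLast login: today\n{"changed": true,\n "msg": "ok"}\n'
    print(repr(filter_leading_non_json_lines(noisy)))
```

Run on the sample above, the two banner lines are dropped and the function returns '{"changed": true,\n "msg": "ok"}\n'.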