python - Unexpected Performance Decrease


I have to parse a huge (250 MB) text file, which for some reason is all a single line, causing every text editor I tried (Notepad++, Visual Studio, Matlab) to fail loading it. Therefore I read it piece by piece, and parse it whenever a logical line (starting with a #) is completely read:

f = open(filename, "rt")
line = ""
buffer = "blub"
while buffer != "":
    buffer = f.read(10000)
    i = buffer.find('#')
    if i != -1: # end of line found
        line += buffer[:i]
        processline(line)
        line = buffer[i+1:] # skip the '#'
    else: # still reading the current line
        line += buffer

This works reasonably well. However, it might happen that a line is shorter than the buffer, which would cause me to skip a line, so I replaced my loop by:

while buffer != "":     buffer = f.read(10000)     = buffer.find('#')     while != -1:         pixels += 1         line += buffer[:i]         buffer = buffer[i+1:]         processline(line)         = buffer.find('#')     line += buffer 

This does the trick. However, it is at least a hundred times slower, rendering it useless for reading large files. I don't really see how this can happen: I do have an inner loop now, but most of the time it is only repeated once. I also copy the buffer (buffer = buffer[i+1:]), so I could somehow understand it if the performance dropped by half, but I don't see how this could make it 100 times slower.

As a side note: my (logical) lines are about 27,000 bytes. Therefore, if my buffer is 10,000 bytes, I never skip lines in the first implementation, while with 30,000 bytes I do. This does not seem to impact performance, though: even if the inner loop in the second implementation is evaluated at most once, performance is still horrible.

What is going on under the hood that I am missing?

If I understood correctly what you want to do, then both versions of your code are wrong. As @Leon said, in the second version you are missing line = "" after processline(line). In the first version only the first line is correct; as you said, if a line is shorter than the buffer you use the first part of the buffer in line += buffer[:i], but the problem is in the line line = buffer[i+1:]: if your line is 1,000 characters long and the buffer is 10,000 characters long, then after line = buffer[i+1:] your line is 9,000 characters long and contains more than one logical line. You were already aware of this, having written:

"this works reasonably well, however, might happen, line shorter buffer, cause me skip line"

I think you already realised that; the reason I am writing it out in detail is that it is also the reason why the first version works faster: in the first version, line is reset (to the tail of the buffer) after every processed line, while in the second version, without the line = "" reset, the accumulator keeps growing, and every line += buffer has to copy an ever longer string. That quadratic behaviour easily accounts for a 100x slowdown.
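For reference, here is a minimal sketch of a corrected chunked reader. The function wrapper and the chunk_size parameter are mine, not from the original post; it assumes processline is your existing parsing function:

def parse_file(filename, processline, chunk_size=10000):
    # Sketch of the fixed second version: reset the accumulator after
    # each processed line and flush any leftover tail at the end.
    line = ""
    with open(filename, "rt") as f:
        while True:
            buffer = f.read(chunk_size)
            if buffer == "":              # end of file
                break
            i = buffer.find('#')
            while i != -1:                # handle every '#' in this chunk
                line += buffer[:i]
                processline(line)
                line = ""                 # the missing reset
                buffer = buffer[i+1:]     # drop the processed part and the '#'
                i = buffer.find('#')
            line += buffer                # keep the partial line for the next chunk
    if line:                              # trailing line without a final '#'
        processline(line)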

Having explained that, I think it is best to read the whole file and then split the text into lines, with code like this:

f = open('textfile.txt', "rt")
buffer = f.read()
f.close()
l = buffer.split('#')

and then you can use it like:

for line in l:
    processline(line)

Getting the list l took me less than 2 seconds.
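If a file were ever too big to read into memory at once, a chunked generator built around the same split('#') idea would be a possible middle ground. This is just a sketch under that assumption, not something from the original answer (the name read_records and the chunk_size default are made up):

def read_records(filename, chunk_size=1 << 16):
    # Stream '#'-separated records without loading the whole file.
    tail = ""
    with open(filename, "rt") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk == "":
                break
            parts = (tail + chunk).split('#')
            tail = parts.pop()    # last piece may be an incomplete record
            for record in parts:
                yield record
    if tail:
        yield tail

# Usage:
# for line in read_records('textfile.txt'):
#     processline(line)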

PS: You shouldn't have problems opening large files (like 250 MB) with Notepad++; I have opened 500 MB files with it.

