.NET - How to traverse multiple log/text files of approx. 200 MB each using C# and apply a regex?
I have developed a utility that accepts the path of a folder containing multiple log/text files of around 200 MB each, traverses the files, and picks four elements from the lines where they exist.
I have tried multiple solutions. All of them work fine for smaller files, but when I load a bigger file the Windows Form hangs or shows an "OutOfMemory exception". Please help.
Solution 1:
string textFile;
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
if (!string.IsNullOrWhiteSpace(fbd.SelectedPath))
{
    string[] files = Directory.GetFiles(fbd.SelectedPath);
    System.Windows.Forms.MessageBox.Show("Files found: " + files.Length.ToString(), "Message");
    foreach (string fileName in files)
    {
        // Reads the whole file into memory at once.
        textFile = File.ReadAllText(fileName);
        MatchCollection mc = Regex.Matches(textFile, re1);
        foreach (Match m in mc)
        {
            string a = m.ToString();
            path.Text += a; // temporary, to check output
            path.Text += Environment.NewLine;
        }
    }
}
Solution 2:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath))
{
    const Int32 bufferSize = 512;
    using (var fileStream = File.OpenRead(file))
    using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, bufferSize))
    {
        string line;
        // Reads one line at a time instead of the whole file.
        while ((line = streamReader.ReadLine()) != null)
        {
            MatchCollection mc = Regex.Matches(line, re1);
            foreach (Match m in mc)
            {
                string a = m.ToString();
                path.Text += a; // temporary, to check output
                path.Text += Environment.NewLine;
            }
        }
    }
}
Solution 3:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
// "file" is the path of the log file being read; the snippet does not show where it is set.
using (StreamReader r = new StreamReader(file))
{
    try
    {
        string line = string.Empty;
        while (!r.EndOfStream)
        {
            line = r.ReadLine();
            MatchCollection mc = Regex.Matches(line, re1);
            foreach (Match m in mc)
            {
                string a = m.ToString();
                path.Text += a; // temporary, to check output
                path.Text += Environment.NewLine;
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
A few things should be taken care of:
- Don't append to a string with path.Text += ... because every append copies all of the text accumulated so far. Assuming this is test code, it should be thrown out.
- You can use a simple File.ReadLines call; there is no practical difference in file reading speed in your case.
- You should compile the regex.
- You can try to simplify the regex.
- You can add simple string-based pre-checks before doing the regex matches.
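As an illustration of the "simplify the regex" point, here is one possible rewrite of the pattern from the question. It is a sketch, not a tested drop-in: the class and member names (TimestampPatterns, Original, Simplified, Find) are made up for this example, and the date/time separator is assumed to be an uppercase T. The single-character alternations become character classes and the redundant groups are dropped, which should leave the set of matched strings unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TimestampPatterns
{
    // Pattern from the question (separator assumed to be an uppercase 'T').
    public const string Original =
        "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)" +
        "(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)" +
        "(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";

    // Hypothetical simplified form: character classes instead of
    // one-character alternations, and no outer capturing group.
    public const string Simplified =
        @"[12]\d{3}[-/](?:0[1-9]|1[0-2])[-/](?:0[1-9]|[12][0-9]|3[01])" +
        @"[T\s](?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]";

    // Built once and reused for every line, as the answer recommends.
    public static readonly Regex SimplifiedRegex =
        new Regex(Simplified, RegexOptions.Compiled);

    // Helper for comparing the two patterns on the same input.
    public static string[] Find(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Running Find with both patterns over a handful of real log lines and comparing the results is a cheap way to check that the simplification did not change what gets matched.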
The sample code below implements the above guidelines:
string re1 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
var buf = new List<string>();
// Compile the regex once and reuse it for every line.
var re2 = new Regex(re1, RegexOptions.Compiled);
FolderBrowserDialog fbd = new FolderBrowserDialog();
DialogResult result = fbd.ShowDialog();
foreach (string file in System.IO.Directory.GetFiles(fbd.SelectedPath))
{
    // File.ReadLines streams the file line by line.
    foreach (var line in File.ReadLines(file))
    {
        // Cheap pre-check: a timestamp needs a '-' followed later by a ':'.
        int indx;
        if ((indx = line.IndexOf('-')) == -1 || line.IndexOf(':', indx + 1) == -1)
            continue;
        MatchCollection mc = re2.Matches(line);
        foreach (Match m in mc)
        {
            string a = m.ToString();
            buf.Add(a + Environment.NewLine); // temporary, to check output
        }
    }
}
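The hang mentioned in the question is not addressed directly above, so here is a rough sketch of one way to structure the scan so that it can be moved off the UI thread and the text box is set only once. LogScanner, Scan and ScanToText are hypothetical names introduced for this example, and the regex is passed in rather than fixed so that any pattern can be used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class LogScanner
{
    // Lazily yields every regex match from every file in the folder.
    // Only one line is held in memory at a time.
    public static IEnumerable<string> Scan(string folder, Regex regex)
    {
        foreach (string file in Directory.EnumerateFiles(folder))
        {
            foreach (string line in File.ReadLines(file))
            {
                foreach (Match m in regex.Matches(line))
                {
                    yield return m.Value;
                }
            }
        }
    }

    // Collects all matches into one string with a StringBuilder,
    // avoiding repeated string concatenation.
    public static string ScanToText(string folder, Regex regex)
    {
        var sb = new StringBuilder();
        foreach (string match in Scan(folder, regex))
        {
            sb.AppendLine(match);
        }
        return sb.ToString();
    }
}
```

In a Windows Forms handler, ScanToText could then be called through Task.Run from an async event handler and its result assigned to path.Text once at the end, assuming .NET Framework 4.5 or later. If the number of matches is very large, writing them to a file instead of a text box may be the safer choice.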