java - Not able to replace text in a PDF using PDFBox 2.0.2
My requirements:
1) I need to identify a particular text pattern.
2) I need to replace the text pattern with a pre-defined text value in the same format as the text pattern, such as font, font colour, bold …
3) I am able to identify the text and replace it with the predefined values, but writing the PDF is failing.
I tried the following 2 approaches to write the PDF:
1) overriding writeString(String string, List<TextPosition> textPositions) of PDFTextStripper
2) using cosArray.add(new COSString(replacedField)); or cosArray.set(…)
Results of approach 1 - overriding writeString:
The PDF generated by the code cannot be opened as a PDF. I can open it in Word, but without any of the formatting of the original text.
Results of approach 2 - using cosArray.add or cosArray.set(…):
I am seeing boxes in the generated PDF.
Code of approach 1 - overriding writeString:
```java
public void rewrite(String templatePdfPath) throws IOException {
    PDDocument document = null;
    Writer pdfWriter = null;
    try {
        File templateFile = new File(templatePdfPath);
        document = PDDocument.load(templateFile);
        this.setSortByPosition(true);
        this.setStartPage(0);
        this.setEndPage(document.getNumberOfPages());
        pdfWriter = new PrintWriter(Utils.getFilePathWithTimestamp(templatePdfPath).toString());
        this.writeText(document, pdfWriter);
    } finally {
        if (document != null) {
            document.close();
        }
        if (pdfWriter != null) {
            pdfWriter.close();
        }
    }
}

@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
    for (int i = 0; i < textPositions.size(); i++) {
        TextPosition text = textPositions.get(i);
        String currentCharacter = text.getUnicode();
        // System.out.println("String[" + text.getXDirAdj() + "," + text.getYDirAdj()
        //         + " fs=" + text.getFontSize() + " xscale=" + text.getXScale()
        //         + " height=" + text.getHeightDir() + " space=" + text.getWidthOfSpace()
        //         + " width=" + text.getWidthDirAdj() + "]" + currentCharacter);
    }
    String replacedString = replaceFields(string.trim());
    if (!string.equals(replacedString)) {
        System.out.println("Field " + string + " replaced with value " + replacedString);
        // super.writeString(replacedString, textPositions);
        super.writeString(replacedString);
    }
}
```
Code of approach 2 - using cosArray.add or cosArray.set(…):
```java
public List<String> replaceFieldsInCOSArray(COSArray cosArray) {
    List<String> replacedStrings = new ArrayList<String>();
    String stringsOfCOSArray = "";
    for (int cosArrayIndex = 0; cosArrayIndex < cosArray.size(); cosArrayIndex++) {
        Object cosObject = cosArray.get(cosArrayIndex);
        if (cosObject instanceof COSString) {
            COSString cosString = (COSString) cosObject;
            stringsOfCOSArray += cosString.getString();
        }
    }
    stringsOfCOSArray = stringsOfCOSArray.trim();
    // cosArray.clear();
    String replacedField = this.replaceFields(stringsOfCOSArray);
    System.out.println("COSText:" + stringsOfCOSArray + ":replacedField:" + replacedField);
    cosArray.add(new COSString(replacedField));
    if (!stringsOfCOSArray.equals(replacedField)) {
        replacedStrings.add(replacedField);
    }
    return replacedStrings;
}
```
> 1) overriding writeString(String string, List<TextPosition> textPositions) of PDFTextStripper

PDFTextStripper is a tool for the extraction of plain text. Thus, it is not surprising that its output cannot be opened as a PDF. Furthermore, Word can open it because Word recognises it as plain text and opens it as such.
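In PDFBox 2.0, changes to a document are written out with PDDocument.save(...); the Writer passed to PDFTextStripper.writeText(...) only ever receives extracted text. A quick way to confirm what approach 1 actually produced is to check for the "%PDF-" header that a conforming PDF file starts with. The class and method names below are hypothetical, and the check is only a sanity test, not a PDF validator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PdfHeaderCheck {

    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the "%PDF-" header of a PDF file. */
    static boolean startsWithPdfHeader(byte[] data) {
        if (data.length < PDF_HEADER.length) {
            return false;
        }
        for (int i = 0; i < PDF_HEADER.length; i++) {
            if (data[i] != PDF_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Check a real file, e.g. the file written by approach 1.
            System.out.println(startsWithPdfHeader(Files.readAllBytes(Paths.get(args[0]))));
            return;
        }
        // Plain text, such as PDFTextStripper output, has no PDF header.
        System.out.println(startsWithPdfHeader("Invoice number: 12345".getBytes(StandardCharsets.UTF_8))); // false
        System.out.println(startsWithPdfHeader("%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII)));          // true
    }
}
```

Run against the approach 1 output, this should print false, which is consistent with Word treating the file as plain text.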
> 2) using cosArray.add(new COSString(replacedField)); or cosArray.set(…)

It is not clear what you mean here. In particular, which COSArray are you talking about?
One might assume you mean the parameter of the TJ operator, but there are multiple reasons against that assumption:
- The TJ operator is only one of many text showing operators, and the only one accepting an array argument; thus, you look at only a few of the operators in question at most;
- Your code assumes that the whole text pattern you try to identify is drawn in the same operation; why should it be?
- You seem to assume that COSString.getString() returns something intelligible; unfortunately that is not the case in general, only if the fonts in question use some standard encoding, and such encodings have been becoming less and less common;
- Furthermore, you assume that the glyphs of the replacement text are contained in the font of the replaced text. Why should they be? Embedded font subsets have become more and more common...
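The last three points can be illustrated with a toy model in plain Java. No PDFBox is involved, and the class and method names are hypothetical. It assumes a subset font that embeds only the glyphs N, A, M and E under the codes 1 to 4, with a code-to-Unicode map standing in for the font's encoding/ToUnicode information, and the pattern NAME drawn by two separate text-showing operations.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SubsetFontPitfalls {

    /** Decodes glyph codes through a (hypothetical) code-to-Unicode map; '?' marks unmapped codes. */
    static String decode(byte[] codes, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (byte b : codes) {
            Character c = toUnicode.get(b & 0xFF);
            sb.append(c == null ? '?' : c);
        }
        return sb.toString();
    }

    /** Indices of the chunks (one per operation) that the first match of the pattern spans; empty if none. */
    static List<Integer> chunksSpannedBy(List<String> chunks, String pattern) {
        List<Integer> spanned = new ArrayList<>();
        String all = String.join("", chunks);
        int start = all.indexOf(pattern);
        if (start < 0 || pattern.isEmpty()) {
            return spanned;
        }
        int end = start + pattern.length(); // exclusive
        int offset = 0;
        for (int i = 0; i < chunks.size(); i++) {
            int chunkEnd = offset + chunks.get(i).length();
            if (offset < end && chunkEnd > start) {
                spanned.add(i);
            }
            offset = chunkEnd;
        }
        return spanned;
    }

    /** Characters of the replacement for which the (subset) font has no glyph code. */
    static Set<Character> missingGlyphs(String replacement, Map<Integer, Character> toUnicode) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : replacement.toCharArray()) {
            if (!toUnicode.containsValue(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical subset font: only N, A, M, E are embedded, under codes 1..4.
        Map<Integer, Character> subset = Map.of(1, 'N', 2, 'A', 3, 'M', 4, 'E');
        // "NAME" drawn by two separate text-showing operations.
        byte[] op1 = {1, 2};
        byte[] op2 = {3, 4};

        // Reading the codes as if they were a standard single-byte encoding yields control characters.
        String naive = new String(op1, StandardCharsets.ISO_8859_1);
        System.out.println("naive decode is \"NA\": " + naive.equals("NA"));                      // false

        List<String> chunks = List.of(decode(op1, subset), decode(op2, subset));
        System.out.println("decoded chunks: " + chunks);                                            // [NA, ME]
        System.out.println("in one chunk: " + chunks.stream().anyMatch(s -> s.contains("NAME"))); // false
        System.out.println("chunks spanned: " + chunksSpannedBy(chunks, "NAME"));                   // [0, 1]
        System.out.println("missing glyphs for JOHN: " + missingGlyphs("JOHN", subset));            // [J, O, H]
    }
}
```

In this model, per-operation matching misses the pattern, raw bytes are not text, and most of the replacement simply has no glyph in the embedded subset, which is one way to end up with boxes like those seen in approach 2.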
Thus, what do you mean here?
That being said, if you happen to work with merely naively built PDFs, you might want to look at the answer to the question @tilmann pointed to. There is a small set of PDFs that code may work for.
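Even for such naively built PDFs, TJ is only one of four text-showing operators (Tj, TJ, ' and "). The following naive scanner lists which of them an uncompressed content stream uses. It is a hypothetical, stdlib-only sketch and not PDFBox's PDFStreamParser, which is what you would use on real documents. It ignores inline image data and other edge cases, so treat it as a way to inspect simple content streams, not as a parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TextOperatorScanner {

    // The four text-showing operators of the PDF content stream syntax.
    static final Set<String> TEXT_SHOWING = Set.of("Tj", "TJ", "'", "\"");

    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\0';
    }

    static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    /** Naively lists, in order, the text-showing operators in an uncompressed content stream. */
    static List<String> textShowingOperators(String content) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (isWhitespace(c) || c == '[' || c == ']' || c == '{' || c == '}' || c == '>') {
                i++;                                    // separators and array/dictionary brackets
            } else if (c == '%') {                      // comment: skip to end of line
                while (i < content.length() && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '(') {                      // literal string operand
                i = skipLiteralString(content, i);
            } else if (c == '<') {
                if (i + 1 < content.length() && content.charAt(i + 1) == '<') {
                    i += 2;                             // start of a dictionary
                } else {                                // hex string operand
                    int close = content.indexOf('>', i);
                    i = close < 0 ? content.length() : close + 1;
                }
            } else {                                    // name, number or operator token
                int start = i++;
                while (i < content.length() && !isWhitespace(content.charAt(i)) && !isDelimiter(content.charAt(i))) {
                    i++;
                }
                String token = content.substring(start, i);
                if (TEXT_SHOWING.contains(token)) {
                    found.add(token);
                }
            }
        }
        return found;
    }

    /** Returns the index just past the literal string opening at 'open', honouring nesting and escapes. */
    static int skipLiteralString(String s, int open) {
        int depth = 0;
        int i = open;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2;                                 // skip the escaped character
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i + 1;
            }
            i++;
        }
        return s.length();
    }

    public static void main(String[] args) {
        String sample = "BT /F1 12 Tf 72 700 Td (Hello) Tj [(W) 120 (orld)] TJ (next line) ' 0 0 (again) \" ET";
        System.out.println(textShowingOperators(sample)); // [Tj, TJ, ', "]
    }
}
```

Text inside string operands (for example a literal "(say Tj)") is skipped, so only real operators are reported.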
If your PDFs happen to be more sophisticated, though, describing an approach is beyond the scope of a single Stack Overflow answer.
By the way, your requirements are not well defined, in particular

> replace text pattern with pre-defined text value in the same format as the text pattern, such as font, font colour, bold …

If the text pattern has 3 letters, the predefined replacement has 2 letters, and the found occurrence has its first glyph in red, the second in green, and the third in blue, how should the 2 replacement glyphs be drawn using the 3 colors?
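To make the ambiguity concrete: any implementation has to pick a policy, and reasonable policies disagree. The sketch below compares two made-up policies (hypothetical class and method names, styles reduced to colour names) for the red/green/blue occurrence and a 2-glyph replacement.

```java
import java.util.ArrayList;
import java.util.List;

public class StylePolicies {

    /** Hypothetical policy A: glyph i keeps the style of original glyph i; extra glyphs reuse the last style. */
    static List<String> byIndex(List<String> originalStyles, int replacementLength) {
        requireStyles(originalStyles);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < replacementLength; i++) {
            result.add(originalStyles.get(Math.min(i, originalStyles.size() - 1)));
        }
        return result;
    }

    /** Hypothetical policy B: spread the original styles evenly from the first to the last glyph. */
    static List<String> proportional(List<String> originalStyles, int replacementLength) {
        requireStyles(originalStyles);
        List<String> result = new ArrayList<>();
        int last = originalStyles.size() - 1;
        for (int i = 0; i < replacementLength; i++) {
            int source = replacementLength == 1 ? 0
                    : (int) Math.round((double) i * last / (replacementLength - 1));
            result.add(originalStyles.get(source));
        }
        return result;
    }

    private static void requireStyles(List<String> styles) {
        if (styles.isEmpty()) {
            throw new IllegalArgumentException("need at least one original glyph style");
        }
    }

    public static void main(String[] args) {
        List<String> found = List.of("red", "green", "blue"); // styles of the 3 found glyphs
        System.out.println("by index:     " + byIndex(found, 2));      // [red, green]
        System.out.println("proportional: " + proportional(found, 2)); // [red, blue]
    }
}
```

Neither result is more correct than the other; the requirement itself has to say which behaviour is wanted.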