token - Using bigrams with Stanford NLP in Java -


I am using the Stanford NLP API on a document collection, and this is the code I use for tokenization:

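    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.process.CoreLabelTokenFactory;
    import edu.stanford.nlp.process.PTBTokenizer;

    // reader is a java.io.Reader over the document text; "" means no tokenizer options
    PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(reader, new CoreLabelTokenFactory(), "");
    while (ptbt.hasNext()) {
        CoreLabel token = ptbt.next();
        // the surface form of the current token
        String word = token.get(TextAnnotation.class);
    }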

This code breaks the text into single-word tokens, essentially at whitespace. That means the phrase "alarm activated" is converted into two tokens, "alarm" and "activated". I guess bigrams would solve this problem, but I am not sure how to use them here. Can anybody suggest how to use bigrams with PTBTokenizer, or how to build bigrams during tokenization with Stanford NLP?
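To make the goal concrete, here is a minimal sketch of what I mean by bigrams: each pair of adjacent tokens joined into one string. It uses only plain Java, so it does not show any bigram feature of PTBTokenizer itself. The class name `BigramSketch` and the method `bigrams` are hypothetical names for illustration, and the hard-coded token list stands in for the `word` values that would be collected in the tokenizer loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BigramSketch {

    // Hypothetical helper: joins each pair of adjacent tokens with a single space.
    // A list with fewer than two tokens yields an empty result.
    static List<String> bigrams(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for the words read from PTBTokenizer.
        List<String> tokens = Arrays.asList("the", "alarm", "activated");
        System.out.println(bigrams(tokens)); // prints [the alarm, alarm activated]
    }
}
```

With this approach the single-word tokens are still produced first, and "alarm activated" then appears as one entry in the bigram list.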

