java - Apache Geode - Query performance on joins -

- May 15, 2013

I am using Apache Geode as a caching solution. I have a requirement to store data in two different regions and retrieve it with a simple join query.

I have tried both replicated and partitioned regions, but have found that the query takes a long time to return results. I have added indexes on both regions, which has improved the performance, but it is still not fast enough. Can someone please help with how to improve the performance of this query?

Here is what I have tried.

Example 1 - Partitioned regions

Time taken to retrieve 7300 records from the cache: 36 seconds

Configuration in cache.xml

<region name="department">
    <region-attributes>
        <partition-attributes redundant-copies="1">
        </partition-attributes>
    </region-attributes>
    <index name="deptindex" from-clause="/department" expression="deptid"/>
</region>

<region name="employee">
    <region-attributes>
        <partition-attributes redundant-copies="1" colocated-with="department">
        </partition-attributes>
    </region-attributes>
    <index name="empindex" from-clause="/employee" expression="deptid"/>
</region>

QueryFunction

@Override
public void execute(FunctionContext context) {
    // TODO Auto-generated method stub
    Cache cache = CacheFactory.getAnyInstance();
    QueryService queryService = cache.getQueryService();

    ArrayList arguments = (ArrayList) context.getArguments();
    String queryStr = (String) arguments.get(0);

    Query query = queryService.newQuery(queryStr);

    try {
        SelectResults result = (SelectResults) query.execute((RegionFunctionContext) context);

        ArrayList arrayResult = (ArrayList) result.asList();
        context.getResultSender().sendResult(arrayResult);
        context.getResultSender().lastResult(null);
    } catch (FunctionDomainException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TypeMismatchException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (NameResolutionException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (QueryInvocationTargetException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

Executing the function

Function function = new QueryFunction();
String queryStr = "select * from /department d, /employee e where d.deptid=e.deptid";
ArrayList argList = new ArrayList();
argList.add(queryStr);
Object result = FunctionService.onRegion(CacheFactory.getAnyInstance().getRegion("department"))
        .withArgs(argList)
        .execute(function)
        .getResult();

ArrayList resultList = (ArrayList) result;
ArrayList<StructImpl> finalList = (ArrayList) resultList.get(0);

Example 2 - Replicated regions

Time taken to retrieve 7300 records from the cache: 29 seconds

Configuration in cache.xml

<region name="department">
    <region-attributes refid="replicate">
    </region-attributes>
    <index name="deptindex" from-clause="/department" expression="deptid"/>
</region>

<region name="employee">
    <region-attributes refid="replicate">
    </region-attributes>
    <index name="empindex" from-clause="/employee" expression="deptid"/>
</region>

Query

@Override
public SelectResults fetchJoinedDataForIndex() {
    QueryService queryService = getClientCache().getQueryService();
    Query query = queryService.newQuery("select * from /department d, /employee e where d.deptid=e.deptid");
    SelectResults result = null;
    try {
        result = (SelectResults) query.execute();
        System.out.println(result.size());
    } catch (FunctionDomainException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TypeMismatchException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (NameResolutionException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (QueryInvocationTargetException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return result;
}

Can you please describe your domain objects? What are the keys and values in the employee and department regions? Are you using PDX?

One simple approach would be to make deptid the key of the department region. Then, in your function, you can iterate over the employee region and do a get(deptid) on the department region. To reduce latency further, you can send chunks of results to the client while the server keeps running the function. Since you mention that you have more than 7000 entries in the result, you can batch them 500 at a time from the server. Something like this:

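@Override
public void execute(FunctionContext context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // EmpId and DeptId are placeholders for whatever key types the regions use.
    // Assumes the function is executed on the employee region.
    Region<EmpId, PdxInstance> employee = PartitionRegionHelper.getLocalPrimaryData(rfc.getDataSet());
    // Assumption: the department region is looked up by name and is keyed by deptid.
    Region<DeptId, PdxInstance> department = CacheFactory.getAnyInstance().getRegion("department");
    int count = 0;
    Map<PdxInstance, PdxInstance> results = new HashMap<>();
    for (Region.Entry<EmpId, PdxInstance> e : employee.entrySet()) {
        // Key lookup on the department region replaces the OQL join.
        PdxInstance dept = department.get(e.getValue().getField("deptid"));
        results.put(e.getValue(), dept);
        count++;
        if (count == 500) {
            // Send a full batch and start a fresh map so the sent batch is not modified.
            context.getResultSender().sendResult(results);
            results = new HashMap<>();
            count = 0;
        }
    }
    // The last (possibly partial) batch also signals the end of the results.
    context.getResultSender().lastResult(results);
}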

Then, on the client, you can use a custom ResultCollector that is able to process the results chunk by chunk as they arrive from the server.
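If you want to get a feel for the batching before wiring it into Geode, here is a minimal sketch in plain Java. It uses no Geode classes at all: the two maps stand in for the employee and department regions, and the Consumer stands in for the ResultSender on the server and the chunk handling in a custom collector on the client. The class name BatchedJoinSketch, the method joinInBatches and the sample data are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java stand-in for the batching idea; no Geode classes are used.
class BatchedJoinSketch {

    // Joins employees (empId -> deptId) to departments (deptId -> name) by key
    // lookup and hands each batch to the sink, which plays the role of
    // sendResult/lastResult. Returns the number of batches handed over.
    static int joinInBatches(Map<Integer, Integer> employees,
                             Map<Integer, String> departmentsById,
                             int batchSize,
                             Consumer<Map<Integer, String>> sink) {
        int batchesSent = 0;
        Map<Integer, String> batch = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : employees.entrySet()) {
            // One hash lookup per employee instead of scanning every department.
            batch.put(e.getKey(), departmentsById.get(e.getValue()));
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new HashMap<>(); // fresh map, so a sent batch is never mutated
                batchesSent++;
            }
        }
        sink.accept(batch); // final batch, possibly partial or empty
        return batchesSent + 1;
    }

    public static void main(String[] args) {
        Map<Integer, String> departments = new HashMap<>();
        for (int d = 0; d < 10; d++) {
            departments.put(d, "dept-" + d);
        }
        Map<Integer, Integer> employees = new HashMap<>();
        for (int i = 0; i < 7300; i++) {
            employees.put(i, i % 10);
        }

        // "Client side": handle each chunk as it arrives instead of waiting for all rows.
        int[] rowsSeen = {0};
        int batches = joinInBatches(employees, departments, 500,
                chunk -> rowsSeen[0] += chunk.size());
        System.out.println(batches + " batches, " + rowsSeen[0] + " rows");
        // prints "15 batches, 7300 rows"
    }
}
```

With 7300 rows and a batch size of 500, the sink sees 14 full batches followed by a final batch of 300, so the client can start working after the first 500 rows rather than after the whole join has finished.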
