Java - Apache Geode - Query performance on joins
I am using Apache Geode as a caching solution. I have a requirement to store data in two different regions and retrieve it with a simple join query.
I have tried both replicated and partitioned regions and found that the query takes a long time to return results. I added indexes on both regions, which improved performance, but it is still not fast enough. Can someone please suggest how to improve the performance of this query?
Here is what I have tried.
Example 1 - partitioned regions
Time taken to retrieve 7300 records from the cache: 36 seconds
Configuration in cache.xml:
<region name="department">
    <region-attributes>
        <partition-attributes redundant-copies="1">
        </partition-attributes>
    </region-attributes>
    <index name="deptindex" from-clause="/department" expression="deptid"/>
</region>
<region name="employee">
    <region-attributes>
        <partition-attributes redundant-copies="1" colocated-with="department">
        </partition-attributes>
    </region-attributes>
    <index name="empindex" from-clause="/employee" expression="deptid"/>
</region>
QueryFunction:
@Override
public void execute(FunctionContext context) {
    Cache cache = CacheFactory.getAnyInstance();
    QueryService queryService = cache.getQueryService();
    ArrayList arguments = (ArrayList) context.getArguments();
    String queryStr = (String) arguments.get(0);
    Query query = queryService.newQuery(queryStr);
    try {
        // Run the query against this member's data for the region the function was executed on
        SelectResults result = (SelectResults) query.execute((RegionFunctionContext) context);
        ArrayList arrayResult = (ArrayList) result.asList();
        context.getResultSender().sendResult(arrayResult);
        context.getResultSender().lastResult(null);
    } catch (FunctionDomainException e) {
        e.printStackTrace();
    } catch (TypeMismatchException e) {
        e.printStackTrace();
    } catch (NameResolutionException e) {
        e.printStackTrace();
    } catch (QueryInvocationTargetException e) {
        e.printStackTrace();
    }
}
Executing the function:
Function function = new QueryFunction();
String queryStr = "SELECT * FROM /department d, /employee e WHERE d.deptid = e.deptid";
ArrayList argList = new ArrayList();
argList.add(queryStr);
Object result = FunctionService.onRegion(CacheFactory.getAnyInstance().getRegion("department"))
        .withArgs(argList)
        .execute(function)
        .getResult();
ArrayList resultList = (ArrayList) result;
ArrayList<StructImpl> finalList = (ArrayList) resultList.get(0);
Example 2 - replicated regions
Time taken to retrieve 7300 records from the cache: 29 seconds
Configuration in cache.xml:
<region name="department">
    <region-attributes refid="REPLICATE">
    </region-attributes>
    <index name="deptindex" from-clause="/department" expression="deptid"/>
</region>
<region name="employee">
    <region-attributes refid="REPLICATE">
    </region-attributes>
    <index name="empindex" from-clause="/employee" expression="deptid"/>
</region>
Query (executed from the client):
@Override
public SelectResults fetchJoinedDataForIndex() {
    QueryService queryService = getClientCache().getQueryService();
    Query query = queryService.newQuery("SELECT * FROM /department d, /employee e WHERE d.deptid = e.deptid");
    SelectResults result = null;
    try {
        result = (SelectResults) query.execute();
        System.out.println(result.size());
    } catch (FunctionDomainException e) {
        e.printStackTrace();
    } catch (TypeMismatchException e) {
        e.printStackTrace();
    } catch (NameResolutionException e) {
        e.printStackTrace();
    } catch (QueryInvocationTargetException e) {
        e.printStackTrace();
    }
    return result;
}
Can you please describe your domain objects? What are the keys and values in the employee and department regions? Are you using PDX?
One simple approach is to make deptid the key of the department region. Then, in a function executed on the employee region, you can iterate over the employees and call get(deptid) on the department region. To reduce latency further, you can send the results to the client in chunks while the server keeps running the function. Since you mention that you have more than 7000 entries in the result, you could send batches of 500 from the server, something like this:
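@Override
public void execute(FunctionContext context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // The function is executed on the employee region; iterate only over the
    // primary buckets hosted on this member so each employee is handled once.
    Region<Object, PdxInstance> employee = PartitionRegionHelper.getLocalPrimaryData(rfc.getDataSet());
    // department is keyed by deptid, so the join becomes one get() per employee.
    Region<Object, PdxInstance> department = CacheFactory.getAnyInstance().getRegion("department");
    int count = 0;
    Map<PdxInstance, PdxInstance> results = new HashMap<>();
    for (Map.Entry<Object, PdxInstance> e : employee.entrySet()) {
        PdxInstance dept = department.get(e.getValue().getField("deptid"));
        results.put(e.getValue(), dept);
        count++;
        if (count == 500) {
            // Ship this batch now while the server keeps working on the rest
            context.getResultSender().sendResult(results);
            results = new HashMap<>(); // start a fresh batch instead of clearing the one just sent
            count = 0;
        }
    }
    // The last (possibly partial) batch ends the result stream
    context.getResultSender().lastResult(results);
}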
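Why keying department by deptid helps can be shown without a cluster. The following is a plain-Java model, not Geode code: two Maps stand in for the regions (employees as empId to deptId, departments as deptId to name; the class, method, and data names are all hypothetical). It compares a naive nested-loop equi-join, which needs one comparison per employee/department pair, with the key-lookup join described in the answer, which needs one get() per employee. Both use inner-join semantics, like the WHERE clause of the original query, so an employee whose deptId has no department is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Geode): Maps stand in for the two regions.
// employees: empId -> deptId, departments: deptId -> department name.
// All names here are hypothetical, chosen only for illustration.
class KeyLookupJoin {

    // Naive equi-join: compares every employee with every department,
    // i.e. #employees * #departments comparisons.
    static List<String> nestedLoopJoin(Map<String, String> employees,
                                       Map<String, String> departments) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> emp : employees.entrySet()) {
            for (Map.Entry<String, String> dept : departments.entrySet()) {
                if (emp.getValue().equals(dept.getKey())) {
                    rows.add(emp.getKey() + "->" + dept.getValue());
                }
            }
        }
        return rows;
    }

    // Key-lookup join: departments are keyed by deptId, so each employee
    // needs a single get(), i.e. #employees lookups.
    static List<String> keyLookupJoin(Map<String, String> employees,
                                      Map<String, String> departments) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> emp : employees.entrySet()) {
            String deptName = departments.get(emp.getValue());
            if (deptName != null) { // inner-join semantics: skip unknown deptIds
                rows.add(emp.getKey() + "->" + deptName);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // TreeMap keeps iteration order deterministic for the printout
        Map<String, String> employees = new TreeMap<>(Map.of(
                "e1", "d1", "e2", "d2", "e3", "d1", "e4", "d9"));
        Map<String, String> departments = new TreeMap<>(Map.of(
                "d1", "Sales", "d2", "Support"));
        System.out.println(keyLookupJoin(employees, departments));
        // prints [e1->Sales, e2->Support, e3->Sales]
        System.out.println(nestedLoopJoin(employees, departments)
                .equals(keyLookupJoin(employees, departments)));
        // prints true
    }
}
```

With 4 employees and 2 departments the nested loop does 8 comparisons and the lookup join does 4 get() calls, and the gap grows with the size of the regions. The model says nothing about how Geode's OQL engine actually plans the original query, which may well use the indexes.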
On the client side, you can use a custom result collector to process the results chunk by chunk as they arrive from the server.
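The chunk-by-chunk idea can also be modelled in plain Java. In this hypothetical sketch (not Geode's API), a BlockingQueue stands in for the server-to-client channel, produce() plays the role of the function calling sendResult() every 500 rows and finishing with the remainder, and consume() plays the role of a custom collector that handles each chunk as soon as it is received. In real Geode code this logic would live in a class implementing the ResultCollector interface, whose addResult(...) method is invoked as each result arrives; treat every name below as illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Plain-Java model (no Geode) of chunked result delivery. The queue stands in
// for the server-to-client channel; produce() mimics a function sending
// batches, consume() mimics a collector. All names are hypothetical.
class ChunkedDelivery {

    // End-of-stream marker, compared by identity
    private static final List<Integer> END = new ArrayList<>();

    // "Server side": emits rows in batches of batchSize, then the remainder.
    static void produce(List<Integer> rows, int batchSize,
                        BlockingQueue<List<Integer>> channel) throws InterruptedException {
        List<Integer> batch = new ArrayList<>();
        for (Integer row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                channel.put(batch);          // like sendResult(batch)
                batch = new ArrayList<>();   // fresh list; the sent one is never reused
            }
        }
        if (!batch.isEmpty()) {
            channel.put(batch);              // final partial batch, as lastResult(batch) would carry
        }
        channel.put(END);                    // signals that no more chunks follow
    }

    // "Client side": handles each chunk as soon as it arrives instead of
    // waiting for the complete result set. Returns the number of chunks seen.
    static int consume(BlockingQueue<List<Integer>> channel,
                       Consumer<List<Integer>> onChunk) throws InterruptedException {
        int chunks = 0;
        for (List<Integer> chunk = channel.take(); chunk != END; chunk = channel.take()) {
            onChunk.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 7300; i++) rows.add(i);
        BlockingQueue<List<Integer>> channel = new LinkedBlockingQueue<>();
        Thread server = new Thread(() -> {
            try {
                produce(rows, 500, channel);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();
        int[] total = {0};
        int chunks = consume(channel, chunk -> total[0] += chunk.size());
        server.join();
        System.out.println(chunks + " chunks, " + total[0] + " rows");
        // prints 15 chunks, 7300 rows
    }
}
```

Allocating a fresh list per batch, rather than clearing and reusing the one just sent, avoids mutating a chunk the receiving side may still be reading.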