Mongodb - Sort by computed field -

- January 15, 2015

i'm struggling finding solution problem mongo db:

i need run query on collection high write/read ratio. query consists in sorting documents field derived other fields belonging same document. moreover, 1 of fields size of array, makes harder.

a simple example:

d1 - { _id: 1, field: 1, array_field: [a,b,c,d] } -> score = 1 + 4 = 5 d2 - { _id: 2, field: 2, array_field: [a,b] }     -> score = 2 + 2 = 4

expected result:

d1 - { _id: 2, score: 4 } d2 - { _id: 1, score: 5 }

(the score not required in resultset)

the solutions i've tried far:

add score field of document, consistently updated other fields updated. problems:
- it not possible parameterize query (tuning) once score has been computed
- it expensive because index on score has updated frequently
create aggregation pipeline makes things easy develop , solves parameterization problem. however, performance drop high beacuse mongo can't rely on use indexes on computed fields, causing memory issues (100mb query error). possible solution enable allowdiskuse flag. however, query become slow.

update: i'd point out query run 10 times second. therefore, pre-computing , storing score in different document might not viable solution.

pratical use: since problem difficult. let me give bit more of context. have document of posts (like facebook posts) sorting creation date , last update. i'd able sort posts "hotness" defined score talking about. thought interesting way compute score follow:

score = * likes - b * dislikes + c * num_comments + d * ( - creation_date)

where a, b, c , d parameters can change tune algorithm. likes , dislikes arrays of objectids referencing users, while num_comments number of comments. query run provide response rest endpoint. no further operations: request -> query -> response.

have had experience derived/aggregated fields? thanks!

it looks complex issue.

this query job, i'd hear performance.

db.perlz.aggregate([ // {$match:{whatever needed here}}         {             $project : {                 _id : 1,                 score : {                     $sum : [{                             "$size" : "$array_field"                         }, "$field"]                 }             }         }, {             $sort : {                 score : 1             }         }      ])

as done on busy server consider replica set setup, , try balance load issuing queries on slave server.

edit

as per update, i'm wondering if steps applicable problem:

update document structure have 2 types of likes: processed , new. processed like added document score worker (that affects likes, dislikes, numcomments fields) , setup score - need calculate delta/difference value.
try determine lowest input set based on previous point (pre-computed score)
limit output known amount of documents (implement paging)

as per dynamic field value - there no huge amount of computation needed score value. considered project fields used in computation , _id, use $lookup last stage , macz parent document scored , sorted result.

any comments welcome!

Search This Blog

Ant COmde

Mongodb - Sort by computed field -

Comments

Post a Comment

Popular posts from this blog

sql - invalid in the select list because it is not contained in either an aggregate function -

Angularjs unit testing - ng-disabled not working when adding text to textarea -

How to start daemon on android by adb -