lucene - Getting total word frequencies for a subset of documents in Solr


I'm interested in using Solr to analyze documents and obtain word frequencies for the documents matching particular criteria.

I tried the TermVectorComponent and was able to get term frequencies for individual documents, but not totals over groups of documents.

For example, given the following data:

  {
    "id": "1",
    "category": "cat1",
    "includes": "the green car."
  },
  {
    "id": "2",
    "category": "cat1",
    "includes": "the red car."
  },
  {
    "id": "3",
    "category": "cat2",
    "includes": "the black car."
  }

I would like to be able to get total term frequency counts per category, e.g.:

  <category name="cat1">
    <lst name="the">2</lst>
    <lst name="car">2</lst>
    <lst name="green">1</lst>
    <lst name="red">1</lst>
  </category>
  <category name="cat2">
    <lst name="the">1</lst>
    <lst name="car">1</lst>
    <lst name="black">1</lst>
  </category>
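To make the target concrete, here is a minimal client-side sketch in Python that reproduces the per-category totals above from the three sample documents. It does not talk to Solr at all. The `tokenize` helper is a hypothetical stand-in for whatever analyzer the `includes` field would use (here it simply lowercases and keeps runs of letters), and `term_totals_by_category` is a name made up for this illustration.

```python
import re
from collections import Counter, defaultdict

# Sample documents from the question.
DOCS = [
    {"id": "1", "category": "cat1", "includes": "the green car."},
    {"id": "2", "category": "cat1", "includes": "the red car."},
    {"id": "3", "category": "cat2", "includes": "the black car."},
]


def tokenize(text):
    """Assumed analysis: lowercase, then keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def term_totals_by_category(docs, group_field="category", text_field="includes"):
    """Sum term frequencies over all documents sharing the same group value."""
    totals = defaultdict(Counter)
    for doc in docs:
        totals[doc[group_field]].update(tokenize(doc[text_field]))
    return totals


totals = term_totals_by_category(DOCS)
for category in sorted(totals):
    print(category, dict(totals[category]))
```

For the sample data this gives the, car, green and red counts of 2, 2, 1 and 1 for cat1, and a count of 1 each for the, black and car in cat2, which matches the desired output.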

I tried using facets but was unable to get them to combine the word counts from individual documents as shown above. I also noticed that the TermVectorComponent can give the document frequency of terms across the entire index, but that is not useful to me. I need total frequency counts for subsets of documents.

Does anyone have suggestions on how to get this information out of Solr/Lucene?

Thanks in advance.

I found this link; you'll have to modify TermsComponent.java: link (SolrJ, perhaps?)

I've never tried it, but could you use a FunctionQuery (i.e. sum) to add up the tv.df values? Here's the full list of FunctionQueries: link
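A different route from the FunctionQuery idea, sketched here only as an assumption-laden illustration: restrict the request to the subset with a filter query, ask the TermVectorComponent for per-document term frequencies (`tv.tf`), and add them up on the client. The sketch assumes the `includes` field is indexed with term vectors enabled, that the request goes to a handler with the TermVectorComponent configured, and that the response is requested with `wt=json` and `json.nl=map` so that `termVectors` arrives as nested maps. The exact response layout varies between Solr versions, so the sample response below is hand-written in the assumed shape, not captured from a real server. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode


def term_vector_query(filter_query, field="includes", rows=100):
    """Build the query string for a term-vector request limited to a subset."""
    params = {
        "q": "*:*",
        "fq": filter_query,   # e.g. "category:cat1" selects the subset
        "fl": "id",
        "rows": rows,         # must be large enough to cover the whole subset
        "tv": "true",
        "tv.tf": "true",      # per-document term frequency
        "tv.fl": field,
        "wt": "json",
        "json.nl": "map",     # assumed: makes termVectors a nested map
    }
    return urlencode(params)


def sum_term_frequencies(response, field="includes"):
    """Add up the per-document tf values for one field across all returned docs."""
    totals = Counter()
    for doc_entry in response.get("termVectors", {}).values():
        if not isinstance(doc_entry, dict):
            continue  # skip scalar entries some versions place at this level
        for term, stats in doc_entry.get(field, {}).items():
            totals[term] += stats.get("tf", 0)
    return totals


# Hand-written response in the ASSUMED shape, for the two cat1 documents.
sample_response = {
    "termVectors": {
        "1": {"uniqueKey": "1",
              "includes": {"the": {"tf": 1}, "green": {"tf": 1}, "car": {"tf": 1}}},
        "2": {"uniqueKey": "2",
              "includes": {"the": {"tf": 1}, "red": {"tf": 1}, "car": {"tf": 1}}},
    }
}

print(term_vector_query("category:cat1"))
print(dict(sum_term_frequencies(sample_response)))
```

Note that this pulls a term vector for every matching document, so it is only practical for reasonably small subsets; for large ones, a server-side change such as the modified TermsComponent mentioned above would likely scale better.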

