The option to adjust how words are grouped in a text frequency query is very useful. However, when the pointer is set to its second level ("include stemmed words"), the resulting list is a bit messy. Rather than displaying the most basic word for a given stem, NVivo appears either to pick one at random or perhaps to choose the word with the greatest number of occurrences for that stem. As a result, I often get a plural form, or the word "busy" standing for "business" and "businesses". Although I can work out what I need by looking at the list, a word cloud based on it will not be very helpful.
Therefore, I would like to propose the following options:
- Choose in "real time", by clicking on a line in the resulting text frequency query, which word is shown as the "base" word (i.e. in the example above, I'd choose "business" over "busy" or "businesses"). An extension of this would be the option to split a word out of the group represented by a single result line and list it on a separate line. In the example above, I could take "busy" out of the group "busy, business, businesses" and have it as a separate line, a kind of "unstemming". This would be especially helpful when I double-click a term to generate a text search query, so that I don't see references to "busy" when I am actually looking for references to "business";
- Show the word that is closest to the stem (i.e. singular forms preferred over plural ones);
- Make it possible to see the number of occurrences of each word form sharing a stem. That way I could see whether "business" is more frequent than "businesses".
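To make the proposals more concrete, here is a minimal Python sketch of how they could work together. It is not how NVivo works internally. The `TOY_STEMS` table is a hypothetical stand-in for a real stemmer (the classic Porter algorithm, for instance, does reduce "busy", "business" and "businesses" to the same stem "busi"), and picking the shortest form is only an assumed proxy for "closest to the stem". All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a real stemmer. Like a Porter-style stemmer,
# it maps "busy", "business" and "businesses" to the same stem "busi".
TOY_STEMS = {
    "busy": "busi", "business": "busi", "businesses": "busi",
    "report": "report", "reports": "report",
}


def stem(word):
    """Return the stem of `word`; unknown words are their own stem."""
    return TOY_STEMS.get(word, word)


def stem_groups(tokens, unstemmed=frozenset()):
    """Group tokens by stem, counting each word form separately.

    Words listed in `unstemmed` get a group of their own ("unstemming").
    Tuple keys keep such groups apart from stem groups with the same text.
    """
    groups = defaultdict(Counter)
    for tok in tokens:
        key = ("word", tok) if tok in unstemmed else ("stem", stem(tok))
        groups[key][tok] += 1
    return groups


def representative(forms, preferred=frozenset()):
    """Choose the word displayed for a group.

    A form the user picked wins; otherwise the shortest form is used as a
    rough (assumed) proxy for "closest to the stem".
    """
    pool = [w for w in forms if w in preferred] or list(forms)
    return min(pool, key=lambda w: (len(w), w))


def frequency_table(tokens, unstemmed=frozenset(), preferred=frozenset()):
    """Rows of (displayed word, total count, count per word form)."""
    rows = []
    for forms in stem_groups(tokens, unstemmed).values():
        rows.append((representative(forms, preferred),
                     sum(forms.values()), dict(forms)))
    # Most frequent first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


tokens = "business businesses busy business reports report report".split()
for row in frequency_table(tokens, unstemmed={"busy"}):
    print(row)
```

Without any options, the shortest-form heuristic still shows "busy" for the whole group, which is why the per-form counts and the "unstemming" choice matter: once "busy" is split off, the remaining group is displayed as "business" and "busy" gets its own line.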
In fact, these options are not necessarily mutually exclusive, so a combination of them could really improve stemming, and with it the results and usefulness of word frequency queries.