Stata: how to tabulate/display the frequency of the combinations of values that different groups take on
Using Stata, suppose I have the following data:
    clear
    set more off

    input id str5 value
    1 fox
    1 ox
    1 cow
    2 fox
    2 fox
    3 ox
    3 fox
    3 cow
    4 cow
    4 ox
    end
As in a previous answer, if one wants to determine within a group whether the values are the same, one can use:
    bysort id (value) : gen onevalue = value[1] == value[_n]
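To make the logic of that one-liner concrete outside Stata, here is a minimal Python sketch (standard library only; an illustration, not Stata code, and the function names are hypothetical). It sorts each id's values, flags every observation whose value equals the first one in its group, and treats an id as single-valued when all of its flags are 1.

```python
from itertools import groupby

# The question's example data: one (id, value) pair per observation.
rows = [(1, "fox"), (1, "ox"), (1, "cow"), (2, "fox"), (2, "fox"),
        (3, "ox"), (3, "fox"), (3, "cow"), (4, "cow"), (4, "ox")]


def onevalue_flags(rows):
    """Mimic `bysort id (value): gen onevalue = value[1] == value[_n]`.

    Returns (id, value, flag) triples sorted by id, then value; flag is 1
    when the value equals the first (smallest) value within its id.
    """
    out = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        values = [v for _, v in group]
        out.extend((key, v, int(v == values[0])) for v in values)
    return out


def single_valued_ids(flags):
    """Hypothetical helper: ids whose observations all carry flag 1."""
    ok = {}
    for i, _, f in flags:
        ok[i] = ok.get(i, True) and f == 1
    return {i for i, all_same in ok.items() if all_same}


flags = onevalue_flags(rows)
print(single_valued_ids(flags))  # {2}
```

Only id 2 (fox, fox) takes on a single value, which is what the Stata flag identifies when every observation in the id has onevalue == 1.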
My question relates to this, but takes it one step further. I want to know the frequencies of the combinations of value that each id takes on. I do not want to consider the frequency or order within an id; I only care whether a value appears at least once. This may be a little confusing, so to illustrate, I want to know the following:
There are 3 different groups that occur in the data: a) fox, ox, cow; b) fox; and c) cow, ox. Note that ids 1 and 3 both belong to group a, id 2 belongs to group b, and id 4 belongs to group c.
    combination     freq
    fox, ox, cow    2
    fox             1
    cow, ox         1
I do not need it in this exact format, but knowing this information would be helpful to me. Is there a simple way to accomplish this task? The best I have thought of involves creating a bunch of new variables that indicate whether each element of value is in an id, and then tabbing combinations of these variables. I feel there should be a better way.
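The indicator-variable approach just described can be sketched in Python (standard library only) as follows. This is an illustration of the idea, not Stata code, and names such as levels, present and combo_freq are hypothetical. Each id gets one 0/1 indicator per distinct value, and identical indicator patterns are counted once per id.

```python
from collections import Counter

# The question's example data: one (id, value) pair per observation.
rows = [(1, "fox"), (1, "ox"), (1, "cow"), (2, "fox"), (2, "fox"),
        (3, "ox"), (3, "fox"), (3, "cow"), (4, "cow"), (4, "ox")]

levels = sorted({v for _, v in rows})  # ['cow', 'fox', 'ox']

# Collect the set of values each id takes on (frequency and order ignored).
present = {}
for i, v in rows:
    present.setdefault(i, set()).add(v)

# One 0/1 indicator per level, analogous to hypothetical has_cow, has_fox, has_ox.
indicators = {i: tuple(int(level in s) for level in levels)
              for i, s in present.items()}

# Frequency of each combination, counting each id once.
combo_freq = Counter(indicators.values())

for pattern, freq in combo_freq.most_common():
    names = ", ".join(name for name, flag in zip(levels, pattern) if flag)
    print(f"{names:<15}{freq}")
```

This reproduces the desired counts (cow, fox, ox: 2; fox: 1; cow, ox: 1), but the number of indicators grows with the number of distinct values, which is why a more direct approach is attractive.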
I would also like to be able to drop ids based on the results of the above.
This is a variation on @Roberto Ferrer's helpful answer. It concatenates in place, avoiding a reshape. The assumption is that you are looking at a string variable; if not, apply tostring or string() first.
    . clear

    . input id str5 value

                id      value
      1. 1 fox
      2. 1 ox
      3. 1 cow
      4. 2 fox
      5. 2 fox
      6. 3 ox
      7. 3 fox
      8. 3 cow
      9. 4 cow
     10. 4 ox
     11. 5 cow
     12. 5 fox
     13. 5 fox
     14. end

    . bysort id (value) : gen all = value if _n == 1
    (8 missing values generated)

    . by id : replace all = cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1]) if _n > 1
    (8 real changes made)

    . by id : replace all = all[_N]
    (6 real changes made)

    . tab all, sort

            all |      Freq.     Percent        Cum.
    ------------+-----------------------------------
     cow fox ox |          6       46.15       46.15
        cow fox |          3       23.08       69.23
         cow ox |          2       15.38       84.62
            fox |          2       15.38      100.00
    ------------+-----------------------------------
          Total |         13      100.00

    . egen tag = tag(id)

    . tab all if tag, sort

            all |      Freq.     Percent        Cum.
    ------------+-----------------------------------
     cow fox ox |          2       40.00       40.00
        cow fox |          1       20.00       60.00
         cow ox |          1       20.00       80.00
            fox |          1       20.00      100.00
    ------------+-----------------------------------
          Total |          5      100.00

    . groups id all

      +-----------------------------------+
      | id          all   Freq.   Percent |
      |-----------------------------------|
      |  1   cow fox ox       3     23.08 |
      |  2          fox       2     15.38 |
      |  3   cow fox ox       3     23.08 |
      |  4       cow ox       2     15.38 |
      |  5      cow fox       3     23.08 |
      +-----------------------------------+
groups here is user-written, installed with ssc inst groups. To count identifiers, not observations, use egen, tag() to tag each identifier once.
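For readers who want to check the logic outside Stata, this minimal Python sketch (standard library only; not part of the original answer, and the function name is hypothetical) mirrors the three concatenation steps in the log above, then counts combinations both per observation, like tab all, sort, and per identifier, like tab all if tag, sort.

```python
from collections import Counter
from itertools import groupby

# The answer's example data (the question's data plus id 5).
rows = [(1, "fox"), (1, "ox"), (1, "cow"), (2, "fox"), (2, "fox"),
        (3, "ox"), (3, "fox"), (3, "cow"), (4, "cow"), (4, "ox"),
        (5, "cow"), (5, "fox"), (5, "fox")]


def concat_in_place(rows):
    """Hypothetical helper returning (id, value, all) triples like the Stata steps."""
    out = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        values = [v for _, v in group]  # sorted within id, like bysort id (value)
        acc = values[0]                 # gen all = value if _n == 1
        for prev, cur in zip(values, values[1:]):
            # cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1])
            if cur != prev:
                acc = acc + " " + cur
        # replace all = all[_N]: every observation gets the final string
        out.extend((key, v, acc) for v in values)
    return out


data = concat_in_place(rows)

# Like `tab all, sort`: every observation counts.
by_obs = Counter(a for _, _, a in data)

# Like `egen tag = tag(id)` then `tab all if tag, sort`: one tagged row per id.
seen = set()
by_id = Counter()
for i, _, a in data:
    if i not in seen:  # tag one observation per id (here the first)
        seen.add(i)
        by_id[a] += 1

for combo, freq in by_id.most_common():
    print(f"{combo:>12} {freq:3d} {100 * freq / len(seen):7.2f}")
```

Duplicates within an id (such as fox, fox for id 2) are skipped because the values are sorted first, so equal values are always adjacent.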
Another immediate trick is to apply wordcount() to all, which returns the number of distinct values each id takes on. Dropping identifiers conditionally on these results should then be (more) straightforward.
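In Stata, that could look like gen nvals = wordcount(all) followed by drop if nvals == 1, or a rule on the combination itself such as drop if all == "cow ox". The same idea as a short Python sketch, assuming one space-separated combination string per id as in the log above; the two keep rules and all names are hypothetical examples.

```python
# One space-separated combination string per id, as in the answer's results.
combos = {1: "cow fox ox", 2: "fox", 3: "cow fox ox", 4: "cow ox", 5: "cow fox"}

# Analogue of wordcount(all): number of distinct values each id takes on.
nvals = {i: len(s.split()) for i, s in combos.items()}

# Hypothetical rule 1: drop ids that take on only one distinct value.
keep_multi = sorted(i for i, n in nvals.items() if n > 1)

# Hypothetical rule 2: drop ids whose combination is exactly "cow ox".
keep_not_cow_ox = sorted(i for i, s in combos.items() if s != "cow ox")

print(nvals)            # {1: 3, 2: 1, 3: 3, 4: 2, 5: 2}
print(keep_multi)       # [1, 3, 4, 5]
print(keep_not_cow_ox)  # [1, 2, 3, 5]
```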
If the string values include spaces, use other concatenating punctuation as appropriate, such as commas, semicolons or colons.
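A small Python illustration of why the delimiter matters, using hypothetical values that contain a space (not from the question): splitting on spaces miscounts the values, while a semicolon delimiter keeps them intact. In Stata, the corresponding change would be to concatenate with ";" instead of " " in the cond() step.

```python
# Hypothetical values for one id, where one value contains a space.
values = sorted({"arctic fox", "ox"})

joined_space = " ".join(values)  # 'arctic fox ox'
joined_semi = ";".join(values)   # 'arctic fox;ox'

# Splitting on spaces miscounts: 'arctic', 'fox', 'ox'.
print(len(joined_space.split()))    # 3
# Splitting on the chosen delimiter recovers the two original values.
print(len(joined_semi.split(";")))  # 2
```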