Stata - How to tabulate, table, or display the frequency with which different groups take on combinations of values of a variable


Using Stata, suppose I have the following data:

    clear
    set more off
    input ///
    id str5 value
    1    fox
    1    ox
    1    cow
    2    fox
    2    fox
    3    ox
    3    fox
    3    cow
    4    cow
    4    ox
    end

As in a previous answer, if one wants to determine within a group whether the values are all the same, one can use:

    bysort id (value) : gen onevalue = value[1] == value[_N]
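The same sorting idea extends from "are all values the same?" to "how many distinct values does each id take on?". The following is a minimal sketch, assuming the question's example data; the variable names firstocc and nvals are chosen here for illustration only. Within each id, the first occurrence of each distinct value is flagged and the flags are summed, so onevalue is equivalent to nvals == 1.

```stata
* Example data from the question
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
end

* All values within an id are the same iff the first and the last
* sorted values agree
bysort id (value) : gen byte onevalue = value[1] == value[_N]

* Flag the first occurrence of each distinct value within id
* (hypothetical name firstocc), then sum the flags to get the
* number of distinct values per id (hypothetical name nvals)
by id (value) : gen byte firstocc = _n == 1 | value != value[_n-1]
by id : egen nvals = total(firstocc)
```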

My question relates to this but takes it one step further. I want to know the frequencies of the combinations of values that each id takes on. I do not want to consider frequency or order within an id; I only care whether a value appears at least once. This may be a little confusing, so to illustrate, I want to know the following:

There are 3 different groups that occur in the data: a) fox, ox, cow; b) fox; and c) cow, ox. Note that ids 1 and 3 both belong to group a, id 2 belongs to group b, and id 4 belongs to group c.

    combination    freq
    fox, ox, cow   2
    fox            1
    cow, ox        1

I do not need it in this exact format, but knowing this information would be helpful to me. Is there a simple way to accomplish this task? The best I have thought of involves creating a bunch of new indicator variables for whether each element of value is present in an id, and then tabbing the combinations of these variables. I feel there should be a better way.
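For comparison, the indicator-variable approach described in the question might look like the following minimal sketch. It assumes the question's example data and that every distinct value can be used as a variable-name suffix; the names has_cow, has_fox, has_ox and tag are made up here. Each indicator is maximised within id, one observation per id is kept, and the built-in contract command counts ids per combination.

```stata
* Example data from the question
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
end

* One 0/1 indicator per distinct value: has_X is 1 on every
* observation of an id that takes value X at least once
levelsof value, local(levels)
foreach v of local levels {
    bysort id : egen byte has_`v' = max(value == "`v'")
}

* Keep one observation per id, then count ids per combination
* of indicators (contract creates the count variable _freq)
egen byte tag = tag(id)
keep if tag
contract has_*
list
```

This works, but it needs one new variable per distinct value, which is what the answer below avoids.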

I would also like to be able to drop ids based on the results of the above.

This is a variation on @Roberto Ferrer's helpful answer. We concatenate in place, avoiding a reshape. The assumption is that we are looking at a string variable. If not, apply tostring or string() first.

    . clear

    . input id str5 value

                id      value
      1. 1    fox
      2. 1    ox
      3. 1    cow
      4. 2    fox
      5. 2    fox
      6. 3    ox
      7. 3    fox
      8. 3    cow
      9. 4    cow
     10. 4    ox
     11. 5    cow
     12. 5    fox
     13. 5    fox
     14. end

    . bysort id (value) : gen all = value if _n == 1
    (8 missing values generated)

    . by id : replace all = cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1]) if _n > 1
    (8 real changes made)

    . by id : replace all = all[_N]
    (6 real changes made)

    . tab all, sort

            all |      Freq.     Percent        Cum.
    ------------+-----------------------------------
     cow fox ox |          6       46.15       46.15
        cow fox |          3       23.08       69.23
         cow ox |          2       15.38       84.62
            fox |          2       15.38      100.00
    ------------+-----------------------------------
          Total |         13      100.00

    . egen tag = tag(id)

    . tab all if tag, sort

            all |      Freq.     Percent        Cum.
    ------------+-----------------------------------
     cow fox ox |          2       40.00       40.00
        cow fox |          1       20.00       60.00
         cow ox |          1       20.00       80.00
            fox |          1       20.00      100.00
    ------------+-----------------------------------
          Total |          5      100.00

    . groups id all

      +-----------------------------------+
      | id          all   Freq.   Percent |
      |-----------------------------------|
      |  1   cow fox ox       3     23.08 |
      |  2          fox       2     15.38 |
      |  3   cow fox ox       3     23.08 |
      |  4       cow ox       2     15.38 |
      |  5      cow fox       3     23.08 |
      +-----------------------------------+

groups here is user-written and can be installed with ssc inst groups. To count identifiers rather than observations, use egen, tag() to tag each identifier just once.

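Another immediately useful trick is to apply wordcount() to the concatenated variable, which gives the number of distinct values each identifier takes on. Dropping identifiers conditionally on these results should then be (more) straightforward.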
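As an illustration, here is a minimal sketch assuming the answer's 13-observation example. It rebuilds all exactly as in the log above, counts distinct values with wordcount(), stores the number of ids sharing each combination with egen, tag(), and then drops ids. The names nvals and nids, and both drop rules, are hypothetical; substitute your own criteria.

```stata
* The answer's example data (13 observations, 5 ids)
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
5 cow
5 fox
5 fox
end

* Concatenate the distinct values of each id in place, as in the answer
bysort id (value) : gen all = value if _n == 1
by id : replace all = cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1]) if _n > 1
by id : replace all = all[_N]

* Number of distinct values each id takes on
gen nvals = wordcount(all)

* Number of ids (not observations) sharing each combination
egen byte tag = tag(id)
bysort all : egen nids = total(tag)

* Hypothetical drop rules, for illustration only
drop if nvals == 1    // ids that take on a single distinct value
drop if nids < 2      // combinations taken on by fewer than 2 ids
```

With these rules, id 2 (fox only) goes first, then ids 4 and 5 because their combinations occur for one id each, leaving ids 1 and 3.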

If the string values themselves include spaces, use other concatenating punctuation as appropriate, such as commas, semicolons, or colons.
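A minimal sketch with made-up data in which some values contain spaces: the separator is a comma instead of a space. Because wordcount() would then split "red fox" into two words, the number of elements is computed by counting commas with the built-in length() and subinstr() functions. The values and the name nvals are illustrative assumptions.

```stata
* Hypothetical data in which some values contain spaces
clear
input id str12 value
1 "red fox"
1 "ox"
1 "ox"
2 "arctic fox"
2 "red fox"
3 "ox"
end

* Same in-place concatenation as in the answer, separated by commas
bysort id (value) : gen all = value if _n == 1
by id : replace all = cond(value != value[_n-1], all[_n-1] + "," + value, all[_n-1]) if _n > 1
by id : replace all = all[_N]

* Number of elements = number of commas + 1
gen nvals = length(all) - length(subinstr(all, ",", "", .)) + 1
```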

