python - Pandas: frequency of an element in two groups of columns


I have a DataFrame, data, laid out like this:

observation     a_1    a_2    a_3    b_1    b_2    b_3
obs1            yes    no     yes    no     no     no
obs2            no     no     no     yes    yes    yes
obs3            yes    yes    yes    yes    yes    yes

The goal: calculate the frequency of observations that are marked "yes":

  • only in "a" samples
  • only in "b" samples
  • in both groups

Edit: this means I need to exclude, from the first two counts, observations that contain "yes" in both the a and b groups (see the third line).

I thought of using groupby:

grouper = data.groupby(lambda x: x.split("_")[0], axis=1)
grouped = grouper.agg(lambda x: sum(x == "yes"))

But then I get the counts divided by row, which is not what I want.

What is the best course of action here?

Edit: as requested, more information on the output. I'd like something like:

frequency of valid [meaning "yes"] observations in group "a": x
frequency of valid observations in group "b": y
frequency of valid observations: z

where x, y, and z are the counts returned.

I don't care about the specific output for individual observations. I'm only interested in the values across all of them.

In [129]: a = ['a_1', 'a_2', 'a_3']

In [130]: b = ['b_1', 'b_2', 'b_3']

In [131]: ina = (df[a] == 'yes').any(axis=1)

In [132]: inb = (df[b] == 'yes').any(axis=1)

In [133]: ina & ~inb
Out[133]:
observation
obs1     True
obs2    False
obs3    False
dtype: bool

In [134]: ~ina & inb
Out[134]:
observation
obs1    False
obs2     True
obs3    False
dtype: bool

In [135]: ina & inb
Out[135]:
observation
obs1    False
obs2    False
obs3     True
dtype: bool

Counting can be done using value_counts: (ina & inb).value_counts()[True]
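For reference, here is a minimal self-contained sketch that puts the pieces together. It rebuilds the example table from the question (the `df` construction and the `counts` dictionary are illustrative assumptions, not part of the original answer) and counts each case by summing the boolean masks, since True counts as 1.

```python
import pandas as pd

# Rebuild the example table from the question (illustrative data only).
df = pd.DataFrame(
    {
        "a_1": ["yes", "no", "yes"],
        "a_2": ["no", "no", "yes"],
        "a_3": ["yes", "no", "yes"],
        "b_1": ["no", "yes", "yes"],
        "b_2": ["no", "yes", "yes"],
        "b_3": ["no", "yes", "yes"],
    },
    index=pd.Index(["obs1", "obs2", "obs3"], name="observation"),
)

a = ["a_1", "a_2", "a_3"]
b = ["b_1", "b_2", "b_3"]

# One boolean per observation: does the row contain "yes" in the group?
ina = (df[a] == "yes").any(axis=1)
inb = (df[b] == "yes").any(axis=1)

# Summing a boolean Series counts its True values.
counts = {
    "only_a": int((ina & ~inb).sum()),
    "only_b": int((~ina & inb).sum()),
    "both": int((ina & inb).sum()),
}
print(counts)  # {'only_a': 1, 'only_b': 1, 'both': 1}
```

Using .sum() instead of value_counts()[True] is a design choice made here: it returns 0 when no observation matches, whereas looking up True in the value_counts result may raise a KeyError in that case.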

