python - Pandas: frequency of an element in two groups of columns
I have a dataframe, data, laid out like this:

observation  a_1  a_2  a_3  b_1  b_2  b_3
obs1         yes  no   yes  no   no   no
obs2         no   no   no   yes  yes  yes
obs3         yes  yes  yes  yes  yes  yes

The goal: calculate the frequency of observations that are marked "yes":
- only in "a" samples
- only in "b" samples
- in both groups
Edit: this means I need to exclude, from the first two counts, observations that contain "yes" in both the a and b groups (see the third line).
I thought of using groupby:

    grouper = data.groupby(lambda x: x.split("_")[0], axis=1)
    grouped = grouper.agg(lambda x: sum(x == "yes"))

but then I have the counts divided by row, which is not what I want.
What is the best course of action here?
Edit: as requested, more information on the output. I'd like:

    frequency of valid [meaning "yes"] observations in group "a": x
    frequency of valid observations in group "b": y
    frequency of valid observations: z

where x, y, and z are the counts returned.
I don't care about the specific output for individual observations. I'm interested in the values across all of them.
In [129]: a = ['a_1', 'a_2', 'a_3']

In [130]: b = ['b_1', 'b_2', 'b_3']

In [131]: ina = (df[a] == 'yes').any(axis=1)

In [132]: inb = (df[b] == 'yes').any(axis=1)

In [133]: ina & ~inb
Out[133]:
observation
obs1     True
obs2    False
obs3    False
dtype: bool

In [134]: ~ina & inb
Out[134]:
observation
obs1    False
obs2     True
obs3    False
dtype: bool

In [135]: ina & inb
Out[135]:
observation
obs1    False
obs2    False
obs3     True
dtype: bool

Counting can be done using value_counts: (ina & inb).value_counts()[True]
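For reference, the same approach can be put together as a standalone script that returns the three counts directly. This is only a sketch: the sample frame is rebuilt from the table in the question, and the names a_cols, b_cols, in_a, in_b and counts are chosen here for illustration. It also assumes the column groups can be picked out by their "a_" and "b_" prefixes.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame(
    {
        "a_1": ["yes", "no", "yes"],
        "a_2": ["no", "no", "yes"],
        "a_3": ["yes", "no", "yes"],
        "b_1": ["no", "yes", "yes"],
        "b_2": ["no", "yes", "yes"],
        "b_3": ["no", "yes", "yes"],
    },
    index=pd.Index(["obs1", "obs2", "obs3"], name="observation"),
)

# Assumption: the groups are identified by column-name prefix.
a_cols = [c for c in df.columns if c.startswith("a_")]
b_cols = [c for c in df.columns if c.startswith("b_")]

# True for each row that has at least one "yes" in the group.
in_a = (df[a_cols] == "yes").any(axis=1)
in_b = (df[b_cols] == "yes").any(axis=1)

# Summing a boolean Series counts its True values.
counts = {
    "only_a": int((in_a & ~in_b).sum()),
    "only_b": int((~in_a & in_b).sum()),
    "both": int((in_a & in_b).sum()),
}
print(counts)
```

Using sum() on the boolean mask instead of value_counts()[True] is a small design choice: sum() gives 0 when no row matches, whereas looking up True in the value_counts() result can raise a KeyError if the mask contains no True values.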