Filter by AVG using Apache Pig


I have a file "error_data.txt" as below:

10474   3.0     2013-05-01  7
10474   94.0    2013-05-01  3
10538   72.0    2013-05-01  15
11001   95.0    2013-05-01  248
13113   78.0    2013-05-01  18
13116   53.0    2013-05-01  4
13116   95.0    2013-05-01  1
13122   89.0    2013-05-01  2
10001   56.0    2013-05-02  7
10413   61.0    2013-05-02  6
.........
.........

This is what I have so far, and it works fine:

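-- Load the four-column file (Pig's default loader splits on tabs).
error_data = load 'error_data.txt' as (ppapi_error_code:int, api_version:chararray, day:chararray, count:long);
-- Keep only the rows for API version 61.0.
filtered_data = filter error_data by api_version == '61.0';
-- Total the error counts per day.
grouped_data = group filtered_data by day;
grouped_count = foreach grouped_data generate group as day, SUM(filtered_data.count) as error_count;
store grouped_count into 'out_1';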

Now I want to filter grouped_count so that it keeps only the rows whose error_count is greater than the average.

I have obtained the average as follows:

grouped_count_bag = group grouped_count all;
average = foreach grouped_count_bag generate AVG(grouped_count.error_count);

When I dump it, the value comes back in a tuple: (578.9444444444445). I am able to filter using that literal value:

filtered_grouped_count = filter grouped_count by (error_count > 578.9444444444445);

But I want to write it as:

filtered_grouped_count = filter grouped_count by (error_count > average);

which does not seem to be allowed. Any assistance is appreciated.

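You can name the average field, attach it to every row of grouped_count with a cross, and then filter on it:

average = foreach grouped_count_bag generate AVG(grouped_count.error_count) as avg;
grouped_count_average = cross grouped_count, average;
filtered_grouped_count = filter grouped_count_average by (error_count > avg);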

I know the cross seems wasteful, but as far as I know that's the only way to do it.
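
A possible alternative to the cross, offered as a sketch rather than a tested solution: Pig 0.8 and later can treat a relation that holds exactly one tuple as a scalar, so a field of average can be referenced directly inside the filter. The snippet below assumes grouped_count is defined as in the question; the field name avg_error_count is introduced here for illustration and is not part of the original script.

```pig
-- Assumes grouped_count (day, error_count) already exists, as in the question.
grouped_count_bag = group grouped_count all;

-- 'average' holds exactly one tuple; avg_error_count is a hypothetical field name.
average = foreach grouped_count_bag generate AVG(grouped_count.error_count) as avg_error_count;

-- Reference the single-tuple relation as a scalar (Pig 0.8+ assumed).
filtered_grouped_count = filter grouped_count by error_count > average.avg_error_count;
```

If average ever contained more than one tuple, Pig would fail at runtime, so this only fits cases like "group ... all" where a single row is guaranteed.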

