Filter by AVG using Apache Pig
I have a file "error_data.txt" as below:
10474 3.0 2013-05-01 7
10474 94.0 2013-05-01 3
10538 72.0 2013-05-01 15
11001 95.0 2013-05-01 248
13113 78.0 2013-05-01 18
13116 53.0 2013-05-01 4
13116 95.0 2013-05-01 1
13122 89.0 2013-05-01 2
10001 56.0 2013-05-02 7
10413 61.0 2013-05-02 6
.........
.........
This is what I have so far, and it works fine:
    error_data = load 'error_data.txt' as (ppapi_error_code:int, api_version:chararray, day:chararray, count:long);
    filtered_data = filter error_data by api_version == '61.0';
    grouped_data = group filtered_data by day;
    grouped_count = foreach grouped_data generate group as day, SUM(filtered_data.count) as error_count;
    store grouped_count into 'out_1';
Now I want to filter grouped_count down to the rows whose error_count is greater than the average.
I have obtained the average as follows:
    grouped_count_bag = group grouped_count all;
    average = foreach grouped_count_bag generate AVG(grouped_count.error_count);
When I dump it, the value comes back in a tuple: (578.9444444444445). I am able to filter on that literal value as
    filtered_grouped_count = filter grouped_count by (error_count > 578.9444444444445);
but I want to write it as

    filtered_grouped_count = filter grouped_count by (error_count > average);

which does not seem to be allowed. Any assistance is appreciated.
    average = foreach grouped_count_bag generate AVG(grouped_count.error_count) as avg;
    grouped_count_average = cross grouped_count, average;
    filtered_grouped_count = filter grouped_count_average by (error_count > avg);
I know the cross seems wasteful, but as far as I know that's the only way to do it.
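A possible alternative, assuming Pig 0.8 or later: a relation that holds exactly one tuple can be referenced as a scalar by projecting one of its fields, which avoids the cross. The sketch below reuses the relations from the question. The alias avg_count and the result name above_average are names made up for this example, and the load assumes the default tab-delimited input, which may not match the real file.

```pig
-- Same pipeline as in the question; assumes tab-delimited input (the default loader).
error_data = load 'error_data.txt' as (ppapi_error_code:int, api_version:chararray, day:chararray, count:long);
filtered_data = filter error_data by api_version == '61.0';
grouped_data = group filtered_data by day;
grouped_count = foreach grouped_data generate group as day, SUM(filtered_data.count) as error_count;

-- Collapse everything into one group so AVG yields a single tuple.
grouped_count_bag = group grouped_count all;
-- avg_count is a hypothetical alias chosen for this sketch.
average = foreach grouped_count_bag generate AVG(grouped_count.error_count) as avg_count;

-- average has exactly one row, so average.avg_count can be used as a scalar.
above_average = filter grouped_count by error_count > average.avg_count;
dump above_average;
```

If average ever held more than one tuple, Pig would fail at runtime rather than pick a row, so this only fits the "group ... all" case shown here.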