PHP - Counting similarity of arrays inside an array
I have a problem and I am pretty unsure how to solve it.
Given are arrays in the following format:
$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);
$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);
Now I want to calculate some kind of similarity value for these arrays. The arrays are the result of clustering terms according to their meaning.
What I want to know is how similarly these terms were clustered by two different users ($array01 = user 1, $array02 = user 2).
The keys 0, 1 and 2 are the clusters (they don't have the same length).
Edit: I'll try to describe it a little further. Every array is the result of one user clustering the terms (hallo, welt, du, ich, ...) according to their meaning. Every sub-array is one cluster defined by that user. The problem is that the user is not restricted in where he places a term or a whole cluster, so I cannot simply compare $array01[0] with $array02[0]. I guess I need to compare the sub-arrays by the terms they have in common. Every user has to cluster the terms, though.
So, for example:
$array01[0] and $array02[2] have two terms in common: "du" and "ich" -> +1
The other terms have no clear clustering, so I guess this example should yield 1, because the clusterings are not similar.
How can I do this?
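One way to read the rule from the example above is: compare every cluster of user 1 with every cluster of user 2 and add 1 whenever they share enough terms. The following is only a minimal sketch of that reading. The function name cluster_overlap_score and the threshold of 2 shared terms are assumptions made for illustration; the question does not state an exact threshold.

```php
<?php
// Hypothetical helper: counts the pairs of clusters (one from each user)
// that share at least $minShared terms. The default of 2 is an assumption
// taken from the "du"/"ich" example in the question.
function cluster_overlap_score(array $clustersA, array $clustersB, $minShared = 2)
{
    $score = 0;
    foreach ($clustersA as $clusterA) {
        foreach ($clustersB as $clusterB) {
            // array_intersect keeps the values of $clusterA that also occur in $clusterB.
            $shared = array_intersect($clusterA, $clusterB);
            if (count($shared) >= $minShared) {
                $score++;
            }
        }
    }
    return $score;
}

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);
$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

echo cluster_overlap_score($array01, $array02), "\n"; // prints 1
```

For the two example arrays only $array01[0] and $array02[2] share two terms, so the score is 1, which matches the value expected in the question. A different threshold would of course give a different score.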
get_similar_items
code:
<?php
$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);
$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

// Returns the terms of the first argument that occur in every other argument.
function get_similar_items()
{
    $arrs = func_get_args();
    foreach ($arrs as &$arr) {
        // Flatten one level: replace every sub-array (cluster) by its terms.
        $flat = array();
        foreach ($arr as $v) {
            if (is_array($v)) {
                $flat = array_merge($flat, array_values($v));
            } else {
                $flat[] = $v;
            }
        }
        $arr = $flat;
    }
    unset($arr); // drop the reference left over from the foreach
    return call_user_func_array('array_intersect', $arrs);
}

print_r(get_similar_items($array01, $array02));
result:
Array
(
    [0] => hallo
    [1] => welt
    [2] => du
    [3] => ich
    [4] => mag
    [5] => dich
    [6] => nicht
    [7] => haha
    [8] => huhu
)
get_similar_items_count
code:
<?php
$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);
$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);
$array03 = array(
    0 => array("haha", "haha", "dich"),
    1 => array("dich", "mag", "mag"),
    2 => array("du", "ich", "haha")
);

// Returns how often each term occurs across all arguments.
function get_similar_items_count()
{
    $arrs = func_get_args();
    foreach ($arrs as &$arr) {
        // Flatten one level: replace every sub-array (cluster) by its terms.
        $flat = array();
        foreach ($arr as $v) {
            if (is_array($v)) {
                $flat = array_merge($flat, array_values($v));
            } else {
                $flat[] = $v;
            }
        }
        $arr = $flat;
    }
    unset($arr); // drop the reference left over from the foreach

    $counts = array();
    foreach ($arrs as $arr) {
        // Add the per-array occurrences of each term to the running total.
        foreach (array_count_values($arr) as $k => $v) {
            if (!isset($counts[$k])) {
                $counts[$k] = $v;
            } else {
                $counts[$k] += $v;
            }
        }
    }
    return $counts;
}

print_r(get_similar_items_count($array01, $array02, $array03));
result:
Array
(
    [hallo] => 2
    [welt] => 2
    [du] => 3
    [ich] => 3
    [mag] => 4
    [dich] => 4
    [nicht] => 2
    [haha] => 5
    [huhu] => 2
)
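Both functions work on the flattened term lists, so they do not look at which cluster a term was placed in. If the cluster membership should count, one possible alternative is to count the pairs of terms that both users put into the same cluster. This is only a sketch of that idea: the helper names term_to_cluster and count_shared_pairs are made up for illustration, and it assumes that every term appears in exactly one cluster per user.

```php
<?php
// Hypothetical helper: maps every term to the index of the cluster it is in.
// Assumes a term appears in only one cluster per user.
function term_to_cluster(array $clusters)
{
    $map = array();
    foreach ($clusters as $index => $cluster) {
        foreach ($cluster as $term) {
            $map[$term] = $index;
        }
    }
    return $map;
}

// Hypothetical helper: counts the pairs of terms that share a cluster
// in $clustersA and also share a cluster in $clustersB.
function count_shared_pairs(array $clustersA, array $clustersB)
{
    $mapB = term_to_cluster($clustersB);
    $shared = 0;
    foreach ($clustersA as $cluster) {
        $terms = array_values($cluster);
        $n = count($terms);
        // Visit every unordered pair of terms inside this cluster once.
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $a = $terms[$i];
                $b = $terms[$j];
                if (isset($mapB[$a], $mapB[$b]) && $mapB[$a] === $mapB[$b]) {
                    $shared++;
                }
            }
        }
    }
    return $shared;
}

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);
$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

echo count_shared_pairs($array01, $array02), "\n"; // prints 1
```

For the example data only the pair "du" / "ich" is clustered together by both users, so the count is 1. To turn this into a value between 0 and 1, the count could be divided by the number of pairs either user formed, but which normalisation fits best depends on the use case.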