arrays - Checking two strings for approximate match in PHP
I'm trying to check the approximate similarity of two strings.
Here are the criteria I use for that:
1) the order of the words is important; 2) words can match with 80% similarity.
Example:
$string1 = "how cost me";  // string in the vocabulary (all the "right" words are here)
$string2 = "how costs it"; // "costs" instead of "cost" is a deliberate mistake (user input)
Algorithm:
1) Check the similarity of the words and create a clean string of the "right" words, in the order they appear in the vocabulary. Output: "how cost"
2) Create a clean string of the "right" words in the order they appear in the user input. Output: "how cost it"
3) Compare the two outputs: if they are not the same, return no; if they are the same, return yes.
Any suggestions?
I started to write the code, but I'm not familiar with the tools in PHP, so I don't know how to do it rationally and efficiently.
It looks more like JavaScript than PHP.
    $string1 = "how cost me";
    $string2 = "how costs it";

    function comparestrings($s1, $s2) {
        if (strlen($s1) == 0 || strlen($s2) == 0) {
            return 0;
        }
        // collapse runs of spaces into a single space
        while (strpos($s1, "  ") !== false) {
            $s1 = str_replace("  ", " ", $s1);
        }
        while (strpos($s2, "  ") !== false) {
            $s2 = str_replace("  ", " ", $s2);
        }
        $ar1 = explode(" ", $s1); // vocabulary words
        $ar2 = explode(" ", $s2); // user input words
        $l1 = count($ar1);
        $l2 = count($ar2);
        $meaning = "";
        $rightorder = array();
        for ($i = 0; $i < $l1; $i++) {
            for ($j = 0; $j < $l2; $j++) {
                // $perc receives the similarity of the two words in percent
                similar_text($ar1[$i], $ar2[$j], $perc);
                if ($perc >= 85) {
                    $meaning = $meaning . " " . $ar1[$i]; // generating the string of the first output
                    $rightorder[$j] = $ar1[$i];           // generating the array of the second output
                }
            }
        }
    }
The idea is that $meaning becomes "how cost", and $rightorder becomes:
    $rightorder[0] = 'how'
    $rightorder[1] = 'much'
    $rightorder[2] = ''
    $rightorder[3] = 'cost'
    $rightorder[4] = 'it'
After that I somehow convert it to the string "how cost it"
and compare the two:
    if ("how cost it" == "how cost") return true; else return false;
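The three steps described in the question could be sketched roughly as follows. This is only one possible reading of the algorithm, not a definitive implementation: the function names words, matchWords and sameOrder are hypothetical, each input word is assumed to be used for at most one vocabulary word, and only words that actually matched are kept, so the literal outputs will differ from the ones quoted in the question. The 80% threshold is the one stated in the criteria, and similar_text() is the built-in PHP function already used in the question.

```php
<?php
// Hypothetical helpers; only similar_text(), preg_split(), array_flip()
// and ksort() are PHP built-ins.

/** Split a string into words, ignoring repeated whitespace. */
function words(string $s): array {
    return preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * For every vocabulary word, find the first unused input word that is at
 * least $threshold percent similar. Returns [vocabIndex => inputIndex].
 */
function matchWords(array $vocab, array $input, float $threshold = 80.0): array {
    $matches = [];
    $used = [];
    foreach ($vocab as $i => $v) {
        foreach ($input as $j => $w) {
            if (isset($used[$j])) {
                continue; // assumption: one input word matches one vocabulary word
            }
            similar_text($v, $w, $percent);
            if ($percent >= $threshold) {
                $matches[$i] = $j;
                $used[$j] = true;
                break;
            }
        }
    }
    return $matches;
}

/** True when the matched words appear in the same order in both strings. */
function sameOrder(string $vocabulary, string $userInput, float $threshold = 80.0): bool {
    $vocab = words($vocabulary);
    $input = words($userInput);
    $matches = matchWords($vocab, $input, $threshold);
    if (count($matches) === 0) {
        return false; // nothing matched at all
    }

    // Step 1: matched "right" words in vocabulary order.
    $byVocab = [];
    foreach ($matches as $i => $j) {
        $byVocab[] = $vocab[$i];
    }

    // Step 2: the same words, ordered by their position in the user input.
    $byInput = [];
    $inputToVocab = array_flip($matches); // inputIndex => vocabIndex
    ksort($inputToVocab);
    foreach ($inputToVocab as $j => $i) {
        $byInput[] = $vocab[$i];
    }

    // Step 3: compare the two word sequences.
    return $byVocab === $byInput;
}
```

Under these assumptions, sameOrder("how cost me", "how costs it") would accept the input, while sameOrder("how cost me", "costs how it") would reject it because the matched words come in a different order.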
Your problem belongs to the science of NLP (natural language processing).
Each issue mentioned in your question has a field of study of its own:
Splitting a string into words is called tokenization. It seems trivial in English, but it is not in other languages, such as German. There is also the problem of how to parse punctuation marks.
Creating the "right words" is called stemming. There are a number of tools for that. If your words are in English, you may try the Porter stemming algorithm. Other languages may have their own stemming techniques, and a dictionary-based algorithm also exists.
Calculating the similarity of strings based on individual word occurrences is called "cosine similarity". There are a number of other techniques. There is also the problem of synonymy and polysemy.
I hope this helps: your problem is a mixture of the above-mentioned problems.
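To make two of these terms more concrete, here is a minimal sketch of tokenization and of cosine similarity over raw word counts. The names tokenize and cosineSimilarity are hypothetical, the lowercasing is ASCII-only, and no stemming is applied, so "cost" and "costs" count as different words here. Note also that this measure ignores word order, which the question treats as important.

```php
<?php
// Hypothetical helpers for illustration; no stemming is performed.

/** Tokenize: lowercase, then split on anything that is not a letter or digit. */
function tokenize(string $text): array {
    // strtolower() only lowercases ASCII letters; other alphabets would
    // need something like mb_strtolower() from the mbstring extension.
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Euclidean length of a word-count vector. */
function vectorNorm(array $counts): float {
    $sum = 0;
    foreach ($counts as $n) {
        $sum += $n * $n;
    }
    return sqrt($sum);
}

/** Cosine similarity of two texts, using word counts as vectors. */
function cosineSimilarity(string $a, string $b): float {
    $countsA = array_count_values(tokenize($a));
    $countsB = array_count_values(tokenize($b));

    // Dot product over the words that occur in both texts.
    $dot = 0;
    foreach ($countsA as $word => $n) {
        $dot += $n * ($countsB[$word] ?? 0);
    }

    $normA = vectorNorm($countsA);
    $normB = vectorNorm($countsB);
    if ($normA == 0 || $normB == 0) {
        return 0.0; // at least one text has no words
    }
    return $dot / ($normA * $normB);
}
```

For the strings from the question, "how cost me" and "how costs it" share only the word "how", so the score is low until the words are stemmed or matched approximately first.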