Checking two strings for approximate match in PHP


I'm trying to check the approximate similarity of two strings.

Here are the criteria I use for that:

1) The order of the words is important.
2) Two words count as the same if they are at least 80% similar.
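Criterion 2 needs a concrete definition of "80% similar". As a sketch, assuming plain ASCII words (both functions below compare bytes, not characters), here are two measures built into PHP, similar_text() and levenshtein(), applied to the "cost"/"costs" pair from the example. The helper names similarTextPercent and levenshteinPercent are hypothetical.

```php
<?php
// similar_text(): percent = 2 * matching_chars / (len1 + len2) * 100,
// where matching_chars is counted by similar_text's own algorithm.
function similarTextPercent(string $a, string $b): float
{
    similar_text($a, $b, $percent);
    return $percent;
}

// Levenshtein ratio (assumed definition): 1 - edit_distance / length of the longer word.
// levenshtein() works on bytes, so this assumes ASCII input.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $maxLen) * 100;
}

printf("similar_text: %.1f%%\n", similarTextPercent('cost', 'costs'));  // 88.9%
printf("levenshtein:  %.1f%%\n", levenshteinPercent('cost', 'costs')); // 80.0%
```

The two measures give different scores for the same pair, so an "80%" threshold only means something once you decide which measure it applies to.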

Example:

```php
$string1 = "how cost me"; // string in the vocabulary (all the "right" words are here)
$string2 = "how costs ";  // "costs" instead of "cost" is a deliberate mistake (user input)
```

Algorithm:
1) Check the similarity of the words and create a clean string of the "right" words, in the order they appear in the vocabulary. Output: "how cost"
2) Create a clean string of the "right" words in the order they appear in the user input. Output: "how cost it"
3) Compare the two outputs: if they are not the same, return no; if they are the same, return yes.
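A minimal sketch of the three steps above, assuming words are separated by whitespace and that similar_text() with an 80% threshold is the similarity measure (both are assumptions; approximatelyEqual and wordsMatch are hypothetical names):

```php
<?php
// Hypothetical helper: two words "match" if similar_text() rates them
// at least $threshold percent similar (criterion 2).
function wordsMatch(string $a, string $b, float $threshold = 80.0): bool
{
    similar_text($a, $b, $percent);
    return $percent >= $threshold;
}

function approximatelyEqual(string $vocabulary, string $input, float $threshold = 80.0): bool
{
    $vocabWords = preg_split('/\s+/', trim($vocabulary), -1, PREG_SPLIT_NO_EMPTY);
    $inputWords = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    // Step 1: "right" words in the order they appear in the vocabulary.
    $byVocabOrder = [];
    foreach ($vocabWords as $v) {
        foreach ($inputWords as $w) {
            if (wordsMatch($v, $w, $threshold)) {
                $byVocabOrder[] = $v;
                break;
            }
        }
    }

    // Step 2: the same "right" words in the order they appear in the user input;
    // each input word is replaced by the vocabulary word it matches.
    $byInputOrder = [];
    foreach ($inputWords as $w) {
        foreach ($vocabWords as $v) {
            if (wordsMatch($v, $w, $threshold)) {
                $byInputOrder[] = $v;
                break;
            }
        }
    }

    // Step 3: equal only if both orders produce the same string (criterion 1).
    return implode(' ', $byVocabOrder) === implode(' ', $byInputOrder);
}
```

With the question's strings, approximatelyEqual("how cost me", "how costs it") returns true, while reordering the input to "costs how it" returns false, because steps 1 and 2 then produce "how cost" and "cost how". Unlike the example outputs above, this sketch drops input words that match nothing (such as "it"), so extra words alone do not make the comparison fail; keep them if they should.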

Any suggestions?

I started writing the code, but I'm not familiar with the tools PHP offers and don't know how to do this sensibly and efficiently.

It looks more like a mix of JavaScript and PHP:

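```php
<?php
$string1 = "how cost me";
$string2 = "how costs it";

function comparestrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }

    // Collapse runs of spaces into a single space
    while (strpos($s1, "  ") !== false) {
        $s1 = str_replace("  ", " ", $s1);
    }
    while (strpos($s2, "  ") !== false) {
        $s2 = str_replace("  ", " ", $s2);
    }

    // Keep plain indexed word arrays (array_flip would turn the words into keys)
    $ar1 = explode(" ", trim($s1)); // vocabulary words
    $ar2 = explode(" ", trim($s2)); // user-input words
    $l1 = count($ar1);
    $l2 = count($ar2);

    $meaning = "";    // first output: matched words in vocabulary order
    $rightorder = []; // second output: matched words keyed by position in the input

    for ($i = 0; $i < $l1; $i++) {     // "<", not "<=": indexes run 0..count-1
        for ($j = 0; $j < $l2; $j++) {
            similar_text($ar1[$i], $ar2[$j], $perc);
            if ($perc >= 80) {          // criterion 2: at least 80% similar
                $meaning .= " " . $ar1[$i];
                $rightorder[$j] = $ar1[$i];
            }
        }
    }

    // Return both outputs so they can be compared
    return [trim($meaning), $rightorder];
}

var_dump(comparestrings($string1, $string2));
```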

The idea is that $meaning will be "how cost" and $rightorder will be:

```php
$rightorder[0] = 'how';
$rightorder[1] = 'much';
$rightorder[2] = '';
$rightorder[3] = 'cost';
$rightorder[4] = 'it';
```

After that, I would somehow convert it into the string "how cost it" and compare the two:

```php
if ("how cost it" == "how cost") return true; else return false;
```
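One way to do the "somehow convert" step, assuming $rightorder is keyed by word position in the user input as in the array above: sort by key with ksort() (the nested loops can fill the keys out of order), drop the empty slots with array_filter(), and join with implode(). The array contents here are made up for illustration.

```php
<?php
// Illustrative data: keys are positions in the user input, '' marks an unmatched slot.
$rightorder = [0 => 'how', 2 => 'cost', 1 => '', 3 => 'it'];

ksort($rightorder); // restore input order by key
$second = implode(' ', array_filter($rightorder, fn($w) => $w !== '')); // "how cost it"

$meaning = ' how cost'; // built with leading spaces, like $meaning in the question's code

var_dump(trim($meaning) === $second); // bool(false)
```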

Your problem belongs to the field of NLP (natural language processing).

Each issue mentioned in your question has its own field of study:

  • Splitting a string into words is called tokenization. It seems trivial in English, but it is not in other languages, such as German. There is also the problem of how to handle punctuation marks.

  • Reducing words to their "right" form is called stemming. There are a number of tools for that. If the words are in English, you may try the Porter stemming algorithm. Other languages may have their own stemming techniques, or a dictionary-based approach may exist for them.

  • One common way to calculate the similarity of strings based on individual word occurrences is "cosine similarity". There are a number of other techniques. There is also the problem of synonymy and polysemy.
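A small PHP sketch of two of these ideas, tokenization and bag-of-words cosine similarity. The function names are hypothetical, and tokenize() assumes that any run of non-letter, non-digit characters separates words, which is a simplification. Two caveats follow from the definition: cosine similarity over word counts ignores word order, so on its own it cannot check criterion 1, and exact token matching treats "cost" and "costs" as different words, which is the gap stemming is meant to close.

```php
<?php
// Split text into lowercase word tokens; punctuation and whitespace act as separators.
// strtolower() only lowercases ASCII reliably; mb_strtolower() (mbstring) handles other scripts.
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Cosine similarity between the word-count vectors (word => count) of two strings.
function cosineSimilarity(string $a, string $b): float
{
    $va = array_count_values(tokenize($a));
    $vb = array_count_values(tokenize($b));

    // Dot product over the words both strings share
    $dot = 0;
    foreach ($va as $word => $count) {
        if (isset($vb[$word])) {
            $dot += $count * $vb[$word];
        }
    }

    // Euclidean length of each vector
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $va)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $vb)));
    if ($normA == 0 || $normB == 0) {
        return 0.0; // an empty string shares nothing with anything
    }
    return $dot / ($normA * $normB);
}
```

For example, "it cost how" and "how cost it" score 1.0 despite the different order, and "how cost me" versus "how costs it" scores only about 0.33 because just "how" matches exactly.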

I think your problem is a mixture of the problems mentioned above.

