html - PHP, extracting mailing address
I have a problem that I need to fix. I am trying to create a script that crawls websites for mailing addresses, mainly German addresses. I am unsure how to create such a script; I have already created one that extracts email addresses from those websites. The address one is puzzling because there isn't a real format. Here are a couple of example German addresses; I am looking for a way to extract this kind of data.

Ilona Mustermann
Hauptstr. 76
27852 Musterheim

Andreas Mustermann
Schwarzwaldhochstraße 1
27812 Musterhausen

D. Mustermann
Kaiser-Wilhelm-Str.3
27852 Mustach

Those are a few examples of what I am looking to extract from the websites. Is this possible with PHP?
Edit:
This is what I have so far:
    function extract_address($str) {
        $str = strip_tags($str);
        $name = null;
        $zcc = null;
        $street = null;
        foreach (preg_split('/([^a-zA-Z0-9üß\-\@\.\(\) .])+/', $str) as $token) {
            if (preg_match('/([a-zA-Z\.])+ ([a-zA-Z\.])+/', $token)) {
                $name = $token;
            }
            if (preg_match('/ /', $token)) {
                $street = $token;
            }
            if (preg_match('/[0-9]{5} [a-zA-Zü]+/', $token)) {
                $zcc = $token;
            }
            if (isset($name) && isset($zcc) && isset($street)) {
                echo($name."<br />".$street."<br />".$zcc."<br /><br />");
                $name = null;
                $street = null;
                $zcc = null;
            }
        }
    }

It works for retrieving $name (i.e. Ilona Mustermann) and the city/zip code (27852 Musterheim), but I am unsure of the regex to retrieve streets.
Well, I have come this far, and it seems to be working about 60% of the time on streets. Zip/city works 100% of the time, and so does the name. It is when it tries to extract the street that it fails. Any idea why?
    function extract_address($str) {
        $str = strip_tags($str);
        $name = null;
        $zcc = null;
        $street = null;
        foreach (preg_split('/([^a-zA-Z0-9üß\-\@\.\(\)\& .])+/', $str) as $token) {
            if (preg_match('/([a-zA-Z\&.])+ ([a-zA-Z.])+/', $token) && !preg_match('/([a-zA-Zß])+ ([0-9])+/', $token)) {
                //echo("n:$token<br />");
                $name = $token;
            }
            if (preg_match('/(\.)+/', $token) || preg_match('/(ß)+/', $token) || preg_match('/([a-zA-Zß\.])+ ([0-9])+/', $token)) {
                $street = $token;
            }
            if (preg_match('/([0-9]){5} [a-zA-Züß]+/', $token)) {
                $zcc = $token;
            }
            /*echo("<br /> n:$name <br /> s:$street <br /> z:$zcc <br /> ");*/
            if (isset($name) && isset($zcc) && isset($street)) {
                echo($name."<br />".$street."<br />".$zcc."<br /><br />");
                $name = null;
                $street = null;
                $zcc = null;
            }
        }
    }
Of course it is possible. You need to use the preg_match() function; the rest comes down to making the right regex pattern.

For example, for the post code:

    <?php
    $str = "your addresses string here";
    preg_match('/([0-9]+) ([a-zA-Z]+)/', $str, $matches);
    print_r($matches);
    ?>

This regex matches the addresses you've given; you still need to put in the native characters:
    [a-zA-Züß.]+ [a-zA-Z.üß]+\s[a-zA-Z. 0-9ß-]+\s[0-9]+ [a-zA-Züß.]+
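
As a rough sketch of how a pattern like this could be applied to a whole page of text, the snippet below uses preg_match_all() with named groups. The function name extract_addresses() and the group names are made up for illustration and are not from the posts above. It assumes the name, the street and the "zip city" part each sit on their own line, and it uses the u modifier with \p{L} so that umlauts and ß count as letters without listing them by hand. It will not cover every German address format.

```php
<?php
// Hypothetical helper (not from the original posts): returns one array per
// address found in $text. Assumes name, street and "zip city" are separated
// by line breaks, as in the sample addresses.
function extract_addresses(string $text): array
{
    $pattern = '/
        (?<name>[\p{L}.]+(?:[ ][\p{L}.\-]+)+)                          # two or more name parts, initials allowed
        \s+
        (?<street>[\p{L}.\-]+(?:[ ][\p{L}.\-]+)*[ ]?\d+[ ]?[a-zA-Z]?)  # street name, then house number
        \s+
        (?<zip>\d{5})[ ](?<city>[\p{L}\-]+(?:[ ][\p{L}\-]+)*)          # five-digit zip code and city
    /xu';

    // PREG_SET_ORDER gives one entry per address instead of one per group.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $addresses = [];
    foreach ($matches as $m) {
        $addresses[] = [
            'name'   => $m['name'],
            'street' => $m['street'],
            'zip'    => $m['zip'],
            'city'   => $m['city'],
        ];
    }
    return $addresses;
}

// Sample input taken from the question.
$sample = "Ilona Mustermann\nHauptstr. 76\n27852 Musterheim\n"
        . "Andreas Mustermann\nSchwarzwaldhochstraße 1\n27812 Musterhausen\n"
        . "D. Mustermann\nKaiser-Wilhelm-Str.3\n27852 Mustach\n";

foreach (extract_addresses($sample) as $a) {
    echo $a['name'] . " | " . $a['street'] . " | " . $a['zip'] . " " . $a['city'] . "\n";
}
```

Note that strip_tags() removes `<br />` tags without putting a line break in their place, so on real pages the tags would probably have to be replaced with "\n" first. Otherwise the three parts of an address run together and the city group can swallow the next name.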