How to split a string of biographical info into different dictionaries using regex in Python?
I recently got my hands on a research project that would benefit from learning how to parse a string of biographical data on several individuals into a set of dictionaries, one for each individual.
The string contains break words, and I was hoping to create keys from those break words and to separate the dictionaries at line breaks. Here are the 2 people I want to create 2 different dictionaries for within my data:
```python
bankers = [''' bakstansky, peter; senior vice president, federal reserve bank of new york, in charge of public information since 1976, when joined ny fed vice president. senior officer in charge of die office of regional , community affairs, ombudsman bank , senior administrative officer executive group, m 0 children educ city college of new york (bachelor of business administration, 1961); university of illinois, graduate school, , new york university, graduate school of business. 1962-6: business , financial writer, new york, on american banker, new york-world telegram & sun, neia york herald tribune (banking editor 1964-6). 1966-74: chase manhattan bank: manager of public relations, based in paris, 1966-71; manager of chase's european marketing , planning, based in brussels, 1971-2; vice president , director of public relations, 1972-4.1974-76: bache & co., vice president , director of corporate communications. barron, patrick k.; first vice president , < operating officer of federal reserve bank o atlanta since february 1996. member of fed" reserve systems conference of first vice preside vice chairman of bank's management con , of discount committee, m 3 child educ university of miami (bachelor's degree in management); harvard business school (prog management development); stonier graduate sr of banking, rutgers university. 1967: joined fed reserve bank of atlanta in computer operations 1971: transferred miami branch; 1974: assist: president; 1987: senior vice president.1988: re1- atlanta head of corporate services. member executive committee of georgia council on igmic education; former vice diairman of greater ji§?charnber of commerce , president'sof university of miami; in atlanta, former ||mte vice chairman united way of atlanta feisinber of leadership atlanta. member of council on economic education. interest. ''']
```
For example, the data contains 2 people, Peter Bakstansky and Patrick K. Barron. I want to create a dictionary for each individual with these 4 keys: bankerjobs, number of children, education, and nonbankerjobs.
The text contains break words: "m" introduces the number of children and "educ" introduces the education. Everything before "m" is bankerjobs, and everything after the first "." following "educ" is nonbankerjobs. The break between individuals seems to be a "." followed by more than one space.
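The record-boundary rule described above (a "." followed by more than one space) can be sketched with re.split. This is a minimal sketch, assuming the raw text really keeps those runs of spaces (the pasted sample may have collapsed them); the function name split_individuals and the shortened sample are hypothetical, made up for illustration.

```python
import re

# Assumed boundary between people: a "." followed by 2+ whitespace characters.
# Note that \s also matches newlines, so ".\n\n" counts as a boundary too.
RECORD_BREAK = re.compile(r'\.\s{2,}')

def split_individuals(text):
    """Split a blob of biographies into one string per individual (hypothetical helper)."""
    parts = RECORD_BREAK.split(text.strip())
    # Drop empty pieces and the trailing period left on the last record.
    return [p.strip().rstrip('.') for p in parts if p.strip()]

# Shortened, hypothetical sample with two spaces between the two people.
sample = (
    "bakstansky, peter; senior vice president, federal reserve bank of new york, "
    "m 0 children educ city college of new york. 1962-6: business writer.  "
    "barron, patrick k.; first vice president, federal reserve bank of atlanta, "
    "m 3 child educ university of miami. 1967: joined fed reserve bank of atlanta."
)

for bio in split_individuals(sample):
    print(bio.split(";")[0])  # the name that precedes the first ";"
```

Single-space periods such as "new york. 1962-6" and the "k.;" in "patrick k.;" are left alone, because the split only fires on two or more whitespace characters after the period.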
How can I create a dictionary for each of these 2 individuals with these 4 keys, using regular expressions on these break words?
Specifically, what set of regexes would let me create a dictionary for each of these 2 individuals with these 4 keys (built on the break words specified above)?
A pattern I was thinking of, in Perl style:
pattern = [r'(m/[ '(.*);(.*)m(.*)educ(.*)/)']
But I'm not sure.
I'm thinking the code would look similar to this; please correct me if I'm wrong:
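```python
import re

# Placeholder: the actual pattern is what this question is asking for.
my_banker_parser = re.compile(r'somefancyregex')

def nested_dict_from_text(text):
    m = my_banker_parser.search(text)
    if not m:
        raise ValueError("biography did not match the pattern")
    d = m.groupdict()
    return {"centralbanker": d}

# bankers is a list, so parse its string element rather than the list itself.
result = nested_dict_from_text(bankers[0])
print(result)
```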
My hope is to take this code and run it over the rest of the biographies of the individuals of interest.
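One possible way to fill in the somefancyregex placeholder in the template above is a single pattern with named groups that follows the break words in the order the question describes: name, ";", bankerjobs, "m N child(ren)", "educ", education up to the first ".", then nonbankerjobs. This is a sketch under that layout assumption, not a tested parser for the full data set; the group names and the two shortened sample biographies are hypothetical. Since regex group names must be Python identifiers, the count is captured as children and renamed to "number of children" afterwards.

```python
import re

# Assumed layout: name ; bankerjobs, m <N> child(ren) educ <education>. <nonbankerjobs>
my_banker_parser = re.compile(
    r'^(?P<name>[^;]+);\s*'                        # "surname, given names;"
    r'(?P<bankerjobs>.*?),?\s*'                    # everything before the "m" break word
    r'\bm\s+(?P<children>\d+)\s+child(?:ren)?\s+'  # "m 0 children" / "m 3 child"
    r'educ\s+(?P<education>[^.]*)\.\s*'            # up to the first "." after "educ"
    r'(?P<nonbankerjobs>.*)$',                     # the rest of the biography
    re.DOTALL,
)

def nested_dict_from_text(text):
    m = my_banker_parser.search(text)
    if not m:
        raise ValueError(f"no match for biography: {text[:40]!r}")
    d = m.groupdict()
    # Group names must be identifiers, so rename to the question's key here.
    d["number of children"] = int(d.pop("children"))
    return {"centralbanker": d}

# Hypothetical, shortened biographies (one string per individual).
biographies = [
    "bakstansky, peter; senior vice president, federal reserve bank of new york, "
    "m 0 children educ city college of new york. 1962-6: business writer",
    "barron, patrick k.; first vice president, federal reserve bank of atlanta, "
    "m 3 child educ university of miami. 1967: joined fed reserve bank of atlanta",
]

results = [nested_dict_from_text(bio) for bio in biographies]
for r in results:
    print(r)
```

The lazy .*? for bankerjobs stops at the first ", m <digits> child" sequence, so words that merely start with "m" (such as "manhattan") do not end the field. Because the pattern is positional, it will raise ValueError on any biography whose break words are missing, misspelled by OCR, or out of order.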
Using named groups is less brittle, since it doesn't depend on the pieces of data being in the same order in each biography. This should work:
```python
>>> import re
>>> regex = re.compile(r'(?P<foo>foo)|(?P<bar>bar)|(?P<baz>baz)')
>>> data = {}
>>> for match in regex.finditer('bar baz foo something'):
...     data.update((k, v) for k, v in match.groupdict().items() if v is not None)
...
>>> data
{'bar': 'bar', 'baz': 'baz', 'foo': 'foo'}
```
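Applied to the biographies, the same alternation-plus-finditer idea works for fields that carry their own delimiters, such as the children count and the education. The sketch below assumes the "m N child(ren)" and "educ ... ." shapes from the question; the name extract_fields is hypothetical. bankerjobs and nonbankerjobs are defined by their position relative to the break words, so they do not fit this order-independent style and would still need a positional pattern.

```python
import re

# Each alternative captures one field, so the fields may appear in any order.
# Field shapes are assumptions based on the break words in the question.
FIELD_REGEX = re.compile(
    r'\bm\s+(?P<children>\d+)\s+child(?:ren)?\b'  # "m 3 child" / "m 0 children"
    r'|\beduc\s+(?P<education>[^.]*)\.'           # "educ ..." up to the next "."
)

def extract_fields(bio):
    """Collect whichever fields are present, regardless of their order (hypothetical helper)."""
    data = {}
    for match in FIELD_REGEX.finditer(bio):
        # Only the alternative that matched has a non-None group value.
        data.update((k, v) for k, v in match.groupdict().items() if v is not None)
    return data

print(extract_fields("barron, patrick k.; m 3 child educ university of miami. 1967: joined"))
print(extract_fields("educ university of miami. m 3 child"))
```

The \s+ after "educ" keeps words such as "education" from being mistaken for the break word, and the \s+ after "m" keeps words such as "miami" from being read as a children marker.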