Ruby - Why don't my regex scans work?
I have a script that scans a text file and puts the results into a CSV file. It grabs the debtor information, then puts the creditor information that follows it.

The problem is that although it gets each debtor, it puts the same creditor information under every debtor; it's not getting the new information listed below each debtor:
    FasterCSV.open('data.csv', 'a') do |csv|
      debtor_info = results.scan(/^(\d{2}\-\d{5})(\s+)(.*)(\s+)(total:)(\s+)(\$(\d+\,?\.?)+)/)
      debtor_info.each do |line|
        case_number = line.at(0)
        debtor = line.at(2).strip
        total_amount = line.at(6)
        csv << [case_number, debtor, total_amount]
        creditor_info = results.scan(/((\d{1,2})\/(\d{1,2})\/(\d{1,4}))\s+(\$(\d+\,?\.?)+)\s+(\d{1,5}bk)\s+(.*)/)
        creditor_info.each do |info|
          date = info.at(0)
          amount = info.at(4)
          fund_number = info.at(6)
          creditor = info.at(7)
          empty = " "
          csv << [empty, date, amount, fund_number, creditor]
        end
      end
    end
Here is some sample input:
    00-000## company inc total: $3,134.55
    2/25/2003 $416.02 0000bk comp inc
    2/25/2003 $105.60 0000bk california imprinted apparel
    2/25/2003 $58.41 0000bk john doe
    2/25/2003 $33.41 0000bk e doe & assoc
    2/25/2003 $78.28 0000bk candle candles
    2/25/2003 $44.74 0000bk personnel svcs
    2/25/2003 $28.34 0000bk jane doe
    2/25/2003 $32.77 0000bk water co
    2/25/2003 $141.21 0000bk xyx
    2/25/2003 $250.96 0000bk pdq inc
    2/25/2003 $146.17 0000bk rs fm
    2/25/2003 $722.91 0000bk corp
    2/25/2003 $841.14 0000bk bac corp
    2/25/2003 $202.57 0000bk abc communications
    2/25/2003 $32.02 0000bk yxy sa corp
    00-00128 may june total: $29.60
    6/26/2002 $29.60 0000bk may june
    00-00653 joe doey total: $347.10
    7/10/2002 $59.62 0000bk financial corp
    7/10/2002 $287.48 0000bk abc corp
    00-00657 thomas p public total: $1,096.75
    7/2/2003 $1,096.75 0000bk contract svc
    00-00735 jean jane total: $29.89
    6/18/2003 $29.89 0000bk jean jane
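To see why every debtor ends up with the same creditors, here is a minimal reproduction of the nested loop structure, assuming a shortened two-debtor excerpt of the sample input and simplified stand-ins for the two regexes (these are not the exact patterns from the question):

```ruby
# Hypothetical five-line excerpt of the sample input, one record per line.
results = <<~TEXT
  00-00128 may june total: $29.60
  6/26/2002 $29.60 0000bk may june
  00-00653 joe doey total: $347.10
  7/10/2002 $59.62 0000bk financial corp
  7/10/2002 $287.48 0000bk abc corp
TEXT

# Simplified stand-ins for the question's debtor and creditor regexes.
DEBTOR   = /^(\d{2}-\d{5})\s+(.*?)\s+total:\s+(\$[\d,.]+)/
CREDITOR = /^(\d{1,2}\/\d{1,2}\/\d{1,4})\s+(\$[\d,.]+)\s+(\d{1,5}bk)\s+(.*)/

rows = []
results.scan(DEBTOR).each do |case_number, debtor, total|
  rows << [case_number, debtor, total]
  # Same nesting as the question: this scan starts from the top of results
  # on every pass, so it returns all three creditors for each debtor.
  results.scan(CREDITOR).each do |date, amount, fund, creditor|
    rows << [" ", date, amount, fund, creditor]
  end
end

rows.each { |row| p row }
```

The inner scan knows nothing about which debtor the outer loop is on, so both debtors get all three creditor rows.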
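With the given structure, you need to scan for either a creditor or a debtor line into a single list, then work through that list, setting the "current debtor" whenever you encounter one. Your current code runs the creditor scan over the whole of results inside the debtor loop, so every debtor gets every creditor line in the file.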
Assuming results is a slurp of the input file (you don't say, but it looks likely):
    FasterCSV.open('data.csv', 'a') do |csv|
      combined_info = results.scan(/^(\d{2}\-\d{5})(\s+)(.*)(\s+)(total:)(\s+)(\$(\d+\,?\d+\.?)+)|((\d{1,2})\/(\d{1,2})\/(\d{1,4}))\s+(\$(\d+\,?\.?)+)\s+(\d{1,5}bk)\s+(.*)/)

      case_number = "unknown"
      debtor = "unknown"
      total_amount = "unknown"

      combined_info.each do |line|
        # If it's a debtor, set the variables but don't output anything
        if line.at(0)
          case_number = line.at(0)
          debtor = line.at(2).strip
          total_amount = line.at(6)
          next
        end
        # Otherwise it's a creditor: collect the data and output it.
        # Note that the capture indices have moved . . .
        date = line.at(8)
        amount = line.at(12)
        fund_number = line.at(14)
        creditor = line.at(15)
        empty = " "
        csv << [case_number, debtor, total_amount, empty, date, amount, fund_number, creditor]
      end
    end
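FasterCSV became the standard library's CSV in Ruby 1.9, so on a current Ruby the combined-scan approach can be exercised end to end with require "csv". This is a minimal sketch that keeps the answer's regex and index logic, assuming a hypothetical five-line excerpt of the sample input and collecting rows in a string with CSV.generate instead of appending to data.csv:

```ruby
require "csv"

# Hypothetical five-line excerpt of the sample input, one record per line.
results = <<~TEXT
  00-00128 may june total: $29.60
  6/26/2002 $29.60 0000bk may june
  00-00653 joe doey total: $347.10
  7/10/2002 $59.62 0000bk financial corp
  7/10/2002 $287.48 0000bk abc corp
TEXT

# The combined regex from the answer, unchanged.
COMBINED = /^(\d{2}\-\d{5})(\s+)(.*)(\s+)(total:)(\s+)(\$(\d+\,?\d+\.?)+)|((\d{1,2})\/(\d{1,2})\/(\d{1,4}))\s+(\$(\d+\,?\.?)+)\s+(\d{1,5}bk)\s+(.*)/

# CSV.generate builds the CSV text in memory instead of appending to a file.
output = CSV.generate do |csv|
  case_number = debtor = total_amount = "unknown"
  results.scan(COMBINED).each do |line|
    if line.at(0)
      # Debtor line: remember it, write nothing yet.
      case_number = line.at(0)
      debtor = line.at(2).strip
      total_amount = line.at(6)
      next
    end
    # Creditor line: pair it with the most recently seen debtor.
    csv << [case_number, debtor, total_amount, " ",
            line.at(8), line.at(12), line.at(14), line.at(15)]
  end
end

puts output
```

Each creditor row now carries its own debtor: one row for 00-00128 and two for 00-00653.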
Your regexes need a little more work (such as removing unnecessary captures), but this should get you started.
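As one illustration of trimming the captures, here is a hypothetical tightened combined regex that captures only the seven values written to the CSV, so debtor fields land in slots 0 to 2 and creditor fields in slots 3 to 6. It assumes amounts only ever contain digits, commas and a decimal point, and uses Ruby's extended (x) mode so each alternative can carry a comment:

```ruby
# Hypothetical tightened regex: one capture per CSV column, nothing else.
COMBINED = %r{
  ^(\d{2}-\d{5})\s+(.*?)\s+total:\s+(\$[\d,.]+)                  # debtor: case, name, total
  |
  ^(\d{1,2}/\d{1,2}/\d{1,4})\s+(\$[\d,.]+)\s+(\d{1,5}bk)\s+(.*)   # creditor: date, amount, fund, name
}x

sample = "00-00653 joe doey total: $347.10\n7/10/2002 $59.62 0000bk financial corp\n"
matches = sample.scan(COMBINED)

# Debtor matches fill slots 0-2, creditor matches fill slots 3-6; the rest are nil.
matches.each do |case_number, debtor, total, date, amount, fund, creditor|
  if case_number
    puts "debtor #{case_number}: #{debtor} (#{total})"
  else
    puts "  creditor #{creditor}: #{amount} on #{date} (#{fund})"
  end
end
```

With fewer groups, the "indices have moved" bookkeeping shrinks to a fixed seven-slot layout that can be destructured directly in the block parameters.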
There may be other approaches that fit the input data in a cleaner way, for instance identifying each line as you read the input rather than using .scan, but this answer is intended to build on your existing approach.
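A line-by-line version of the same idea might look like the sketch below. The creditor_rows helper and the two per-line patterns are hypothetical; the helper accepts anything that responds to each_line, so with the real data you could pass File.open(path) { |f| creditor_rows(f) }, where path is your input file:

```ruby
require "csv"

# Hypothetical per-line patterns; \A anchors each one to the start of a stripped line.
DEBTOR   = /\A(\d{2}-\d{5})\s+(.*?)\s+total:\s+(\$[\d,.]+)/
CREDITOR = /\A(\d{1,2}\/\d{1,2}\/\d{1,4})\s+(\$[\d,.]+)\s+(\d{1,5}bk)\s+(.*)/

# Classifies each line as it is read, remembering the most recent debtor,
# and returns one row per creditor prefixed by that debtor's fields.
def creditor_rows(input)
  current = ["unknown", "unknown", "unknown"]
  rows = []
  input.each_line do |raw|
    line = raw.strip
    case line
    when DEBTOR # Regexp#=== sets $1..$3 for this match
      current = [$1, $2, $3]
    when CREDITOR
      rows << [*current, " ", $1, $2, $3, $4]
    end
  end
  rows
end

sample = <<~TEXT
  00-00128 may june total: $29.60
  6/26/2002 $29.60 0000bk may june
  00-00653 joe doey total: $347.10
  7/10/2002 $59.62 0000bk financial corp
TEXT

rows = creditor_rows(sample)
puts CSV.generate { |csv| rows.each { |row| csv << row } }
```

Because each line is matched on its own, there is no combined regex and no shifting capture indices to track; lines that match neither pattern are simply ignored.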