ruby - Why don't my regex scans work? -


I have a script that scans a text file and writes the results to a CSV file. It grabs the debtor information, then puts the creditor information following it.

The problem is that it gets each debtor but puts the same creditor information under every debtor; it's not picking up the new information below each debtor:

    FasterCSV.open('data.csv', 'a') do |csv|
      debtor_info = results.scan(/^(\d{2}\-\d{5})(\s+)(.*)(\s+)(total:)(\s+)(\$(\d+\,?\d+\.?)+)/)

      debtor_info.each do |line|
        case_number = line.at(0)
        debtor = line.at(2).strip
        total_amount = line.at(6)
        csv << [case_number, debtor, total_amount]

        creditor_info = results.scan(/((\d{1,2})\/(\d{1,2})\/(\d{1,4}))\s+(\$(\d+\,?\.?)+)\s+(\d{1,5}bk)\s+(.*)/)
        creditor_info.each do |info|
          date = info.at(0)
          amount = info.at(4)
          fund_number = info.at(6)
          creditor = info.at(7)
          empty = " "
          csv << [empty, date, amount, fund_number, creditor]
        end
      end
    end

This is sample input:

    00-000##     company inc                            total: $3,134.55
       2/25/2003       $416.02    0000bk       comp inc
       2/25/2003       $105.60    0000bk       california imprinted apparel
       2/25/2003        $58.41    0000bk       john doe
       2/25/2003        $33.41    0000bk       e doe & assoc
       2/25/2003        $78.28    0000bk       candle candles
       2/25/2003        $44.74    0000bk       personnel svcs
       2/25/2003        $28.34    0000bk       jane doe
       2/25/2003        $32.77    0000bk       water co
       2/25/2003       $141.21    0000bk       xyx
       2/25/2003       $250.96    0000bk       pdq inc
       2/25/2003       $146.17    0000bk       rs fm
       2/25/2003       $722.91    0000bk       corp
       2/25/2003       $841.14    0000bk       bac corp
       2/25/2003       $202.57    0000bk       abc communications
       2/25/2003        $32.02    0000bk       yxy sa corp
    00-00128     may june                                           total: $29.60
       6/26/2002        $29.60    0000bk       may june
    00-00653     joe doey                                             total: $347.10
       7/10/2002        $59.62    0000bk       financial corp
       7/10/2002       $287.48    0000bk       abc corp
    00-00657     thomas p public                                        total: $1,096.75
       7/2/2003     $1,096.75    0000bk       contract svc
    00-00735     jean jane                                            total: $29.89
       6/18/2003        $29.89    0000bk       jean jane

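With the given structure, you need to scan for either a creditor line or a debtor line into a single list, then work through it, setting the "current debtor" whenever you encounter one. Your version re-scans the whole of results for creditors inside the debtor loop, which is why every debtor gets the same creditor rows.

Assuming results is a slurp of the input file (you don't say, but it looks likely):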

    combined_info = results.scan(/^(\d{2}\-\d{5})(\s+)(.*)(\s+)(total:)(\s+)(\$(\d+\,?\d+\.?)+)|((\d{1,2})\/(\d{1,2})\/(\d{1,4}))\s+(\$(\d+\,?\.?)+)\s+(\d{1,5}bk)\s+(.*)/)

    case_number = "unknown"
    debtor = "unknown"
    total_amount = "unknown"

    combined_info.each do |line|
      # If it's a debtor, set the variables; no output
      if line.at(0)
        case_number = line.at(0)
        debtor = line.at(2).strip
        total_amount = line.at(6)
        next
      end

      # Creditor: collect the data and output it.
      # Note that our capture indices have moved . . .
      date = line.at(8)
      amount = line.at(12)
      fund_number = line.at(14)
      creditor = line.at(15)
      empty = " "
      csv << [case_number, debtor, total_amount, empty, date, amount, fund_number, creditor]
    end

Your regexes need a little work (such as removing unnecessary captures), but this should get you started.
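As a rough idea of what trimming the captures could look like, here is a minimal sketch. It assumes a reasonably modern Ruby (the `<<~` heredoc needs 2.3 or later) and the lowercase `total:` / `bk` markers as they appear in the sample above. The names `DEBTOR_RE`, `CREDITOR_RE`, `COMBINED_RE` and the shortened `sample` text are made up for illustration; they are not from your script.

```ruby
# Hypothetical, trimmed-down patterns: only the fields we want are captured.
DEBTOR_RE   = /^(\d{2}-\d{5})\s+(.*?)\s+total:\s+(\$[\d,]+(?:\.\d+)?)/
CREDITOR_RE = /(\d{1,2}\/\d{1,2}\/\d{1,4})\s+(\$[\d,]+(?:\.\d+)?)\s+(\d{1,5}bk)\s+(.*)/

# Regexp.union joins the two with "|"; captures are numbered left to right,
# so debtor fields are captures 1-3 and creditor fields are captures 4-7.
COMBINED_RE = Regexp.union(DEBTOR_RE, CREDITOR_RE)

# Shortened stand-in for the slurped input file.
sample = <<~TEXT
  00-00128     may june          total: $29.60
    6/26/2002        $29.60    0000bk       may june
  00-00653     joe doey          total: $347.10
    7/10/2002        $59.62    0000bk       financial corp
    7/10/2002       $287.48    0000bk       abc corp
TEXT

rows = []
current = ["unknown", "unknown", "unknown"]

sample.scan(COMBINED_RE) do |case_number, debtor, total, date, amount, fund, creditor|
  if case_number
    # Debtor line: remember it, no output yet.
    current = [case_number, debtor.strip, total]
  else
    # Creditor line: pair it with the most recent debtor.
    rows << current + [date, amount, fund, creditor.strip]
  end
end

rows.each { |row| puts row.join(" | ") }
```

With only seven captures the block parameters can be named directly, so there is no index arithmetic to get wrong when a pattern changes.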

There may be other approaches that fit the input data in a cleaner way, for instance identifying each line as it is read rather than using .scan, but this answer is intended to build on your existing approach.
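For comparison, a line-by-line version might look roughly like the sketch below. It is not a drop-in replacement: it assumes the standard-library `CSV` class (which took over from FasterCSV in Ruby 1.9), it builds the CSV as a string instead of appending to data.csv, and `DEBTOR_LINE`, `CREDITOR_LINE` and `debtors_to_csv` are hypothetical names. It writes the layout you originally described: a debtor row, then that debtor's creditor rows with an empty first column.

```ruby
require "csv"

# Hypothetical per-line patterns; each is matched against one line at a time.
DEBTOR_LINE   = /\A(\d{2}-\d{5})\s+(.*?)\s+total:\s+(\$[\d,.]+)/
CREDITOR_LINE = /\A\s*(\d{1,2}\/\d{1,2}\/\d{1,4})\s+(\$[\d,.]+)\s+(\d{1,5}bk)\s+(.*)/

# Classify each line as debtor or creditor and emit a CSV row for it.
# Lines that match neither pattern are skipped.
def debtors_to_csv(input)
  CSV.generate do |csv|
    input.each_line do |line|
      line = line.chomp
      if (m = DEBTOR_LINE.match(line))
        csv << [m[1], m[2], m[3]]
      elsif (m = CREDITOR_LINE.match(line))
        # nil leaves the first column empty under the debtor row.
        csv << [nil, m[1], m[2], m[3], m[4].strip]
      end
    end
  end
end
```

Because the creditor rows are written in file order directly beneath their debtor, no "current debtor" state is needed for this layout. Something like `File.write("data.csv", debtors_to_csv(File.read(path)))`, with `path` pointing at your input file, would produce the output file.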

