excel - Parsing downloaded data to a simpler structure -


every month download data vendor of ours small in format not easy use lookup formulas on. read mess of cell references , hope looking right spot. best way read data , structure in image below. need read columns a:g 1 month , next month a:h have 12 months max ever, , structure work in report pictured in i2:k10,

the "location" may not have data in download vendor. locations changing. need download 30 of these small data ranges them put in bigger report. data pasted own sheet , pulled data on another. open vba suggestions cell formulas.

the different colors there show trying read , need written.

thanks,

-scheballs enter image description here

this part 2 of answer introduces solution , contains main routine. part 3 contains sub routines. part 1 introduces techniques use in solution.

my solution requires macro's workbook contain 2 worksheets: 1 errors , 1 consolidated data. names of these workbooks defined constants can changed required.

i created number of csv files believe match format of download. typical example is:

 1   caution: rates have not been adjusted patient mix  2   st anthony's hospital  3   jan 2013 - april 2013 location comparison based on 6 locations  4   cms qualified hcahps data service lines  5   communications medications composite results  6   location,jan 2013,feb 2013,mar 2013,apr 2013,composite rate,percentile  7   2e,70,72.22,64.62,81.82,72.17,49th  8   2s,60,62.22,54.62,71.82,62.17,39th  9   3n,78.57,83.33,66.67,nr,76.19,74th 10   3s,50,90,50,100,72.5,56th 11   4n,88.89,75,77.27,100,85.29,85th 12   icu/pcu,72.73,50,80,100,75.68,54th 13 14   st anthony's hospital,73.5,73.28,67.89,84.21,74.72,59th 15   vendor db % top box,72.29,72.86,73.58,75.17,73.48 

the hospital names real although coincidence if ones of interest you. questions believe correct. locations , data imaginary.

my code thoroughly checks format of csv file because have been caught authors changing format of such files without warning. gross changes might crash macro minor changes can go unnoticed months.

checks include matching date range of row 3 against idnividal dates on row 6. failed check result in message in error worksheet. checks result in file being rejected. however, fatal error 2 csv files have different date ranges.

i had planned create consolidated worksheet based on data found. however, use absolute addresses copy values reporting worksheets not want data move month month depending on locations included in csv file. instead have created fixed layout:

consolidate worksheet before updating

hospital names in column 1. name must against first location hospital optional susequent rows. no doubt pick 1 style or other mixed styles tests. csv file hospital name other 1 of in listed here rejected.

locations in column 2. there no significance sequence of location except final row must total/average/summary. have used "total" row title can change anything. not every location listed here need appear in csv file if csv file contains unexpected location, rejected.

the questions listed a3 onwards. csv file containing question not listed here rejected.

the initial contents of data area of worksheet not matter because cleared macro.

after running macro, worksheet might this. gaps mean have no test data hospital/questions:

consolidate worksheet after updating

i believe comments within code sufficient change match format of csv files if different guess.

this code designed in own module. code not rely on in demo macros. luck.

option explicit  ' constants convenient way of defining values not change ' during run of macro.  particular suitable for: '  (1) replacing numbers meaningful name.  if column 5 used '      names, say, using colname instead of 5 helps document macro. '  (2) values used in several places , might change. when '      change, 1 amendment sufficient update macro. const colconsolhosp long = 1        '\ const colconsollocn long = 2        '| if columns of consolidate const colconsolquestfirst long = 3  '| worksheet rearranged, these const colconsolquestlast long = 12  '/ valuesmust ajusted match. const colerrortime long = 1 const colerrorfile long = 2 const colerrorrow long = 3 const colerrorcol long = 4 const colerrormsg long = 5 const fmtdate string = "dmmmyy" const fmtdatetime string = "dmmmyy hh:mm" const wkshtnameconsol string = "consolidate" '\ change if require output const wkshtnameerror string = "error"        '/ different worksheets. sub consolidate()    dim cellvalueconsol() variant      ' cell values used range                                         ' of consoldate worksheet   dim colsrccompositerate long       ' column hold composite rate   dim colconsolcrnt long   dim datestartall date   dim datestartcrnt date   dim dateendall date   dim dateendcrnt date   dim errmsg string   dim filecellvaluesrc() variant     ' value of usedrange each csv file   dim fileerror() boolean            ' error state each file   dim fileinxhosp() long             ' hospital each csv file   dim fileinxquest() long            ' question each csv file   dim filename() string              ' name each csv file   dim filesysobj object   dim fileobj object   dim folderobj object   dim found boolean   dim hospname() variant             ' names of hospitals   dim hospnamecrnt string   dim inxfilecrnt long   dim inxfiledate long   dim inxhospcrnt long   dim inxlocncrnt long   dim inxquestcrnt long   dim locn() variant                 ' locations each hosital   dim numcsvfile long                ' number of csv files   dim numhosps long   dim nummonthsdata long   dim pathname string   dim quest variant                  ' array of questions   dim rowconsolcrnt long   dim rowconsolhospfirst() long      ' first row each hospital                                         ' within consolidate worksheet   dim rowconsoltemp long   dim rowerrorcrnt long   dim rowsrccrnt long   dim wkbksrc workbook    application.screenupdating = false        ' reduces screen flash , increases speed    ' load csv files   ' ==============    pathname = application.thisworkbook.path    set filesysobj = createobject("scripting.filesystemobject")   set folderobj = filesysobj.getfolder(pathname)    numcsvfile = 0    ' loop through files count number of csv files   each fileobj in folderobj.files     if lcase(right(fileobj.name, 4)) = ".csv"       numcsvfile = numcsvfile + 1     end if   next    ' size arrays holding data per file   redim filecellvaluesrc(1 numcsvfile)   redim fileerror(1 numcsvfile)   redim fileinxhosp(1 numcsvfile)   redim fileinxquest(1 numcsvfile)   redim filename(1 numcsvfile)    inxfilecrnt = 0    ' loop through files save names , cell values.   each fileobj in folderobj.files     if lcase(right(fileobj.name, 4)) = ".csv"       inxfilecrnt = inxfilecrnt + 1       filename(inxfilecrnt) = fileobj.name       set wkbksrc = workbooks.open(pathname & "\" & fileobj.name)       filecellvaluesrc(inxfilecrnt) = wkbksrc.activesheet.usedrange       wkbksrc.close   ' close csv file     end if   next    ' release resources   set filesysobj = nothing   set folderobj = nothing    ' extract controlling values consolidate worksheet   ' =====================================================    worksheets(wkshtnameconsol)     cellvalueconsol = .usedrange.value   end    'debug.print ubound(cellvalueconsol, 1)   'debug.print ubound(cellvalueconsol, 2)    ' code assumes single header row consisting of:   '   hospital  location  question1  question2 ...   ' appropriate names in first 2 columns.  cells under   ' questions overwritten.   ' these columns accessed using constants.  limited variation   ' achieved within amending code changing constants.    ' execution stop @ debug.assert statement if expression has   ' value of false. easy way of confirming worksheet   ' expected. if user might change format of output worksheet,   ' should replaced msgbox statement.   debug.assert cellvalueconsol(1, colconsolhosp) = "hospital"   debug.assert cellvalueconsol(1, colconsollocn) = "location"    ' count number of hospitals.   ' code assumes locations hospital , start @   ' row 2. hospital name may repeated or may blank on second ,   ' subsequent rows hospital.  is, following acceptable:   '   hospitala   x   '   hospitala   y   '   hospitala   z   '   hospitalb   x   '               y   '               z    ' count number of hospitals   hospnamecrnt = cellvalueconsol(2, colconsolhosp)   numhosps = 1   rowconsolcrnt = 3 ubound(cellvalueconsol, 1)     if cellvalueconsol(rowconsolcrnt, colconsolhosp) <> hospnamecrnt , _        cellvalueconsol(rowconsolcrnt, colconsolhosp) <> ""       numhosps = numhosps + 1       hospnamecrnt = cellvalueconsol(rowconsolcrnt, colconsolhosp)     end if   next    'debug.print numhosps    ' size hospname, locn , rowconsolhospfirst number of hospitals   redim hospname(1 numhosps)   redim locn(1 numhosps)   redim rowconsolhospfirst(1 numhosps)    ' load hospital , location arrays   inxhospcrnt = 1   hospnamecrnt = cellvalueconsol(2, colconsolhosp)   hospname(inxhospcrnt) = hospnamecrnt   rowconsolhospfirst(inxhospcrnt) = 2   rowconsolcrnt = 3 ubound(cellvalueconsol, 1)     if cellvalueconsol(rowconsolcrnt, colconsolhosp) <> hospnamecrnt , _        cellvalueconsol(rowconsolcrnt, colconsolhosp) <> ""       ' load locations worksheet location array       call extractsubarray(cellvalueconsol, locn(inxhospcrnt), _                            rowconsolhospfirst(inxhospcrnt), colconsollocn, _                            rowconsolcrnt - 1, colconsollocn)       hospnamecrnt = cellvalueconsol(rowconsolcrnt, colconsolhosp)       inxhospcrnt = inxhospcrnt + 1       hospname(inxhospcrnt) = hospnamecrnt       rowconsolhospfirst(inxhospcrnt) = rowconsolcrnt     end if   next   ' load locations final hospital worksheet location array   call extractsubarray(cellvalueconsol, locn(inxhospcrnt), _                   rowconsolhospfirst(inxhospcrnt), colconsollocn, _                   ubound(cellvalueconsol, 1), colconsollocn)    ' load questions   call extractsubarray(cellvalueconsol, quest, _                   1, colconsolquestfirst, _                   1, colconsolquestlast)    ' clear data area of consolidate worksheet   ' =======================================    rowconsolcrnt = 2 ubound(cellvalueconsol, 1)     colconsolcrnt = colconsolquestfirst colconsolquestlast       cellvalueconsol(rowconsolcrnt, colconsolcrnt) = ""     next   next    ' prepare error worksheet   '========================    worksheets(wkshtnameerror)     .cells.entirerow.delete     .cells(1, colerrortime).value = "time"     .cells(1, colerrorfile)       .value = "file"       .columnwidth = 71.71     end     .cells(1, colerrorrow)       .value = "row"       .horizontalalignment = xlright       .columnwidth = 4     end     .cells(1, colerrorcol)       .value = "col"       .horizontalalignment = xlright       .columnwidth = 4     end     .cells(1, colerrormsg)       .value = "error"       .columnwidth = 71.71     end   end   rowerrorcrnt = 1    ' validate csv files , extract key information   ' ==================================================    inxfiledate = -1     'date range not yet found   nummonthsdata = 0    inxfilecrnt = 1 ubound(filename)     fileerror(inxfilecrnt) = false      ' no error found file     if isempty(filecellvaluesrc(inxfilecrnt))       ' csv file empty       call recorderror(filename(inxfilecrnt), 0, 0, _                                               "empty csv file", rowerrorcrnt)       fileerror(inxfilecrnt) = true     ' csv file ignored     elseif vartype(filecellvaluesrc(inxfilecrnt)) = vbstring       ' csv file contained single value       call recorderror(filename(inxfilecrnt), 0, 0, _                            "csv file contains single string", rowerrorcrnt)       fileerror(inxfilecrnt) = true     ' csv file ignored     else       ' remaining format returned range       ' array        ' check cells contain values expected.       ' checking code has been placed in subroutines.  keeps code       ' in main routine clean , simple , allows subroutines       ' copied new workbooks macros performing similar tasks.        ' check cell a1 = "caution: rates have not been adjusted patient mix"       call checkcellvaluesingle(filename(inxfilecrnt), _                      filecellvaluesrc(inxfilecrnt), fileerror(inxfilecrnt), _                      1, 1, _                      "caution: rates have not been adjusted patient mix", _                      rowerrorcrnt)        ' check cell a2 known hospital.  save inxhosp against file       call checkcellvaluemultiple(filename(inxfilecrnt), _                                   filecellvaluesrc(inxfilecrnt), _                                   fileerror(inxfilecrnt), 2, 1, hospname, _                                   fileinxhosp(inxfilecrnt), rowerrorcrnt)        ' check cell a3 is: date - date location comparison based on n locations       call checkdaterangelocn(filename(inxfilecrnt), _                                      filecellvaluesrc(inxfilecrnt), _                                      fileerror(inxfilecrnt), 3, 1, _                                      datestartcrnt, dateendcrnt, rowerrorcrnt)        ' save datestartcrnt , datendcrnt or check same       ' saved values       if inxfiledate = -1         ' first set of dates         datestartall = datestartcrnt         dateendall = dateendcrnt         inxfiledate = inxfilecrnt  ' first file found these dates       else         if datestartall = datestartcrnt , dateendall = dateendcrnt           ' date range csv file matches of previous files         else         call recorderror(filename(inxfilecrnt), 3, 1, _                        "**fatal error**: date ranges not match:" & vblf & _                        format(datestartall, fmtdate) & " - " & _                        format(dateendall, fmtdate) & "  " & _                        filename(inxfiledate) & vblf & _                        format(datestartcrnt, fmtdate) & " - " & _                        format(dateendcrnt, fmtdate) & "  " & _                        filename(inxfilecrnt), rowerrorcrnt)           ' there incompatible csv files.  fatal error.  give up.           exit sub         end if       end if        ' check cell a4 = "cms qualified hcahps data service lines"       call checkcellvaluesingle(filename(inxfilecrnt), _                          filecellvaluesrc(inxfilecrnt), _                          fileerror(inxfilecrnt), 4, 1, _                          "cms qualified hcahps data service lines", _                          rowerrorcrnt)        ' check cell a5 = question " composite results"       if not checkbound(filecellvaluesrc(inxfilecrnt), 5, 1, errmsg)         call recorderror(filename(inxfilecrnt), 5, 1, errmsg, rowerrorcrnt)         fileerror(inxfilecrnt) = true       else         fileinxquest(inxfilecrnt) = -1     ' no match against question         inxquestcrnt = 1 ubound(quest)           if filecellvaluesrc(inxfilecrnt)(5, 1) = _              quest(inxquestcrnt) & " composite results"              fileinxquest(inxfilecrnt) = inxquestcrnt             exit           end if         next         if fileinxquest(inxfilecrnt) = -1           ' no match found           fileerror(inxfilecrnt) = true           call recorderror(filename(inxfilecrnt), 5, 1, """" & _                            filecellvaluesrc(inxfilecrnt)(5, 1) & _                            """ not match known question", rowerrorcrnt)         end if       end if        ' check cell a6 is: "location"       call checkcellvaluesingle(filename(inxfilecrnt), _                          filecellvaluesrc(inxfilecrnt), _                          fileerror(inxfilecrnt), 6, 1, "location", _                          rowerrorcrnt)       ' check cells b6 x6 1st day of month       '                                         datestartall dateendall       call checkdatesequence(filename(inxfilecrnt), _                              filecellvaluesrc(inxfilecrnt), _                              fileerror(inxfilecrnt), 6, 2, datestartall, _                              dateendall, "a", "m", rowerrorcrnt)       ' check cells y6 "composite rate"       if not fileerror(inxfilecrnt)         ' data range not guaranteed until file error free         nummonthsdata = datediff("m", datestartall, dateendall) + 1         colsrccompositerate = nummonthsdata + 2         call checkcellvaluesingle(filename(inxfilecrnt), _                            filecellvaluesrc(inxfilecrnt), _                            fileerror(inxfilecrnt), 6, colsrccompositerate, _                            "composite rate", rowerrorcrnt)       end if        if not fileerror(inxfilecrnt)         ' row 7 down first empty column a, check column contains         ' known location , colsrccompositerate numeric.         rowsrccrnt = 7         inxhospcrnt = fileinxhosp(inxfilecrnt)         while true           if not checkbound(filecellvaluesrc(inxfilecrnt), _                                                    rowsrccrnt, 1, errmsg)             ' row not present             call recorderror(filename(inxfilecrnt), rowsrccrnt, 1, _                                                          errmsg, rowerrorcrnt)             fileerror(inxfilecrnt) = true             exit           end if           if not checkbound(filecellvaluesrc(inxfilecrnt), _                                  rowsrccrnt, colsrccompositerate, errmsg)             ' composite rate missing             call recorderror(filename(inxfilecrnt), rowsrccrnt, _                                     colsrccompositerate, errmsg, rowerrorcrnt)             fileerror(inxfilecrnt) = true             exit           elseif not isnumeric(filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _                                                      colsrccompositerate))             ' composite rate not numeric             call recorderror(filename(inxfilecrnt), rowsrccrnt, _                           colsrccompositerate, "composite rate """ & _                           filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _                           colsrccompositerate) & """ not numeric", _                           rowerrorcrnt)           end if           if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = ""             ' end of location list within file             exit           end if           found = false           inxlocncrnt = 1 ubound(locn(inxhospcrnt))             if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = _                locn(inxhospcrnt)(inxlocncrnt)               ' location csv file found in list consolidate worksheet               found = true               exit             end if           next           if not found             call recorderror(filename(inxfilecrnt), rowsrccrnt, 1, _                               "location """ & _                               filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) & _                               """ not found in list worksheet """ & _                               wkshtnameconsol & """", rowerrorcrnt)             fileerror(inxfilecrnt) = true           end if           rowsrccrnt = rowsrccrnt + 1         loop       end if       if not fileerror(inxfilecrnt)         ' row rowsrccrnt have blank column 1         rowsrccrnt = rowsrccrnt + 1         ' check column total line hospital         call checkcellvaluesingle(filename(inxfilecrnt), _                            filecellvaluesrc(inxfilecrnt), _                            fileerror(inxfilecrnt), rowsrccrnt, 1, _                            hospname(fileinxhosp(inxfilecrnt)), _                            rowerrorcrnt)         if not checkbound(filecellvaluesrc(inxfilecrnt), _                                  rowsrccrnt, colsrccompositerate, errmsg)           ' composite rate missing           call recorderror(filename(inxfilecrnt), rowsrccrnt, _                                     colsrccompositerate, errmsg, rowerrorcrnt)           fileerror(inxfilecrnt) = true           elseif not isnumeric(filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _                                                      colsrccompositerate))             ' composite rate not numeric             call recorderror(filename(inxfilecrnt), rowsrccrnt, _                           colsrccompositerate, "composite rate """ & _                           filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _                           colsrccompositerate) & """ not numeric", _                           rowerrorcrnt)         end if       end if     end if   next inxfilecrnt    ' if here there has not been fatal error although 1 or more   ' individual files may have been rejected.    inxfilecrnt = 1 ubound(filename)     if not fileerror(inxfilecrnt)       ' no error has been found in file       inxhospcrnt = fileinxhosp(inxfilecrnt)       inxquestcrnt = fileinxquest(inxfilecrnt)       colconsolcrnt = 2 + inxquestcrnt       rowsrccrnt = 7        ' first location row       while true         if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = ""           ' end of location list within file           exit         end if         inxlocncrnt = 1 ubound(locn(inxhospcrnt))           if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = _              locn(inxhospcrnt)(inxlocncrnt)             ' location csv file found in list consolidate worksheet             rowconsolcrnt = rowconsolhospfirst(inxhospcrnt) + inxlocncrnt - 1             cellvalueconsol(rowconsolcrnt, colconsolcrnt) = _                  filecellvaluesrc(inxfilecrnt)(rowsrccrnt, colsrccompositerate)             exit           end if         next         rowsrccrnt = rowsrccrnt + 1       loop       rowsrccrnt = rowsrccrnt + 1  ' advance hospital total line       ' assume last location row total       rowconsolcrnt = rowconsolhospfirst(inxhospcrnt) + _                                                  ubound(locn(inxhospcrnt)) - 1       cellvalueconsol(rowconsolcrnt, colconsolcrnt) = _                  filecellvaluesrc(inxfilecrnt)(rowsrccrnt, colsrccompositerate)     end if   next    ' write new values consolidate worksheet   ' ==============================================    worksheets(wkshtnameconsol)     .usedrange.value = cellvalueconsol   end  end sub 

Comments

Popular posts from this blog

jquery - How can I dynamically add a browser tab? -

node.js - Getting the socket id,user id pair of a logged in user(s) -

keyboard - C++ GetAsyncKeyState alternative -