excel - Parsing downloaded data to a simpler structure -
every month download data vendor of ours small in format not easy use lookup formulas on. read mess of cell references , hope looking right spot. best way read data , structure in image below. need read columns a:g 1 month , next month a:h have 12 months max ever, , structure work in report pictured in i2:k10,
the "location" may not have data in download vendor. locations changing. need download 30 of these small data ranges them put in bigger report. data pasted own sheet , pulled data on another. open vba suggestions cell formulas.
the different colors there show trying read , need written.
thanks,
-scheballs
this part 2 of answer introduces solution , contains main routine. part 3 contains sub routines. part 1 introduces techniques use in solution.
my solution requires macro's workbook contain 2 worksheets: 1 errors , 1 consolidated data. names of these workbooks defined constants can changed required.
i created number of csv files believe match format of download. typical example is:
1 caution: rates have not been adjusted patient mix 2 st anthony's hospital 3 jan 2013 - april 2013 location comparison based on 6 locations 4 cms qualified hcahps data service lines 5 communications medications composite results 6 location,jan 2013,feb 2013,mar 2013,apr 2013,composite rate,percentile 7 2e,70,72.22,64.62,81.82,72.17,49th 8 2s,60,62.22,54.62,71.82,62.17,39th 9 3n,78.57,83.33,66.67,nr,76.19,74th 10 3s,50,90,50,100,72.5,56th 11 4n,88.89,75,77.27,100,85.29,85th 12 icu/pcu,72.73,50,80,100,75.68,54th 13 14 st anthony's hospital,73.5,73.28,67.89,84.21,74.72,59th 15 vendor db % top box,72.29,72.86,73.58,75.17,73.48
the hospital names real although coincidence if ones of interest you. questions believe correct. locations , data imaginary.
my code thoroughly checks format of csv file because have been caught authors changing format of such files without warning. gross changes might crash macro minor changes can go unnoticed months.
checks include matching date range of row 3 against idnividal dates on row 6. failed check result in message in error worksheet. checks result in file being rejected. however, fatal error 2 csv files have different date ranges.
i had planned create consolidated worksheet based on data found. however, use absolute addresses copy values reporting worksheets not want data move month month depending on locations included in csv file. instead have created fixed layout:
hospital names in column 1. name must against first location hospital optional susequent rows. no doubt pick 1 style or other mixed styles tests. csv file hospital name other 1 of in listed here rejected.
locations in column 2. there no significance sequence of location except final row must total/average/summary. have used "total" row title can change anything. not every location listed here need appear in csv file if csv file contains unexpected location, rejected.
the questions listed a3 onwards. csv file containing question not listed here rejected.
the initial contents of data area of worksheet not matter because cleared macro.
after running macro, worksheet might this. gaps mean have no test data hospital/questions:
i believe comments within code sufficient change match format of csv files if different guess.
this code designed in own module. code not rely on in demo macros. luck.
option explicit ' constants convenient way of defining values not change ' during run of macro. particular suitable for: ' (1) replacing numbers meaningful name. if column 5 used ' names, say, using colname instead of 5 helps document macro. ' (2) values used in several places , might change. when ' change, 1 amendment sufficient update macro. const colconsolhosp long = 1 '\ const colconsollocn long = 2 '| if columns of consolidate const colconsolquestfirst long = 3 '| worksheet rearranged, these const colconsolquestlast long = 12 '/ valuesmust ajusted match. const colerrortime long = 1 const colerrorfile long = 2 const colerrorrow long = 3 const colerrorcol long = 4 const colerrormsg long = 5 const fmtdate string = "dmmmyy" const fmtdatetime string = "dmmmyy hh:mm" const wkshtnameconsol string = "consolidate" '\ change if require output const wkshtnameerror string = "error" '/ different worksheets. sub consolidate() dim cellvalueconsol() variant ' cell values used range ' of consoldate worksheet dim colsrccompositerate long ' column hold composite rate dim colconsolcrnt long dim datestartall date dim datestartcrnt date dim dateendall date dim dateendcrnt date dim errmsg string dim filecellvaluesrc() variant ' value of usedrange each csv file dim fileerror() boolean ' error state each file dim fileinxhosp() long ' hospital each csv file dim fileinxquest() long ' question each csv file dim filename() string ' name each csv file dim filesysobj object dim fileobj object dim folderobj object dim found boolean dim hospname() variant ' names of hospitals dim hospnamecrnt string dim inxfilecrnt long dim inxfiledate long dim inxhospcrnt long dim inxlocncrnt long dim inxquestcrnt long dim locn() variant ' locations each hosital dim numcsvfile long ' number of csv files dim numhosps long dim nummonthsdata long dim pathname string dim quest variant ' array of questions dim rowconsolcrnt long dim rowconsolhospfirst() long ' first row each hospital ' within consolidate worksheet dim rowconsoltemp long dim rowerrorcrnt long dim rowsrccrnt long dim wkbksrc workbook application.screenupdating = false ' reduces screen flash , increases speed ' load csv files ' ============== pathname = application.thisworkbook.path set filesysobj = createobject("scripting.filesystemobject") set folderobj = filesysobj.getfolder(pathname) numcsvfile = 0 ' loop through files count number of csv files each fileobj in folderobj.files if lcase(right(fileobj.name, 4)) = ".csv" numcsvfile = numcsvfile + 1 end if next ' size arrays holding data per file redim filecellvaluesrc(1 numcsvfile) redim fileerror(1 numcsvfile) redim fileinxhosp(1 numcsvfile) redim fileinxquest(1 numcsvfile) redim filename(1 numcsvfile) inxfilecrnt = 0 ' loop through files save names , cell values. each fileobj in folderobj.files if lcase(right(fileobj.name, 4)) = ".csv" inxfilecrnt = inxfilecrnt + 1 filename(inxfilecrnt) = fileobj.name set wkbksrc = workbooks.open(pathname & "\" & fileobj.name) filecellvaluesrc(inxfilecrnt) = wkbksrc.activesheet.usedrange wkbksrc.close ' close csv file end if next ' release resources set filesysobj = nothing set folderobj = nothing ' extract controlling values consolidate worksheet ' ===================================================== worksheets(wkshtnameconsol) cellvalueconsol = .usedrange.value end 'debug.print ubound(cellvalueconsol, 1) 'debug.print ubound(cellvalueconsol, 2) ' code assumes single header row consisting of: ' hospital location question1 question2 ... ' appropriate names in first 2 columns. cells under ' questions overwritten. ' these columns accessed using constants. limited variation ' achieved within amending code changing constants. ' execution stop @ debug.assert statement if expression has ' value of false. easy way of confirming worksheet ' expected. if user might change format of output worksheet, ' should replaced msgbox statement. debug.assert cellvalueconsol(1, colconsolhosp) = "hospital" debug.assert cellvalueconsol(1, colconsollocn) = "location" ' count number of hospitals. ' code assumes locations hospital , start @ ' row 2. hospital name may repeated or may blank on second , ' subsequent rows hospital. is, following acceptable: ' hospitala x ' hospitala y ' hospitala z ' hospitalb x ' y ' z ' count number of hospitals hospnamecrnt = cellvalueconsol(2, colconsolhosp) numhosps = 1 rowconsolcrnt = 3 ubound(cellvalueconsol, 1) if cellvalueconsol(rowconsolcrnt, colconsolhosp) <> hospnamecrnt , _ cellvalueconsol(rowconsolcrnt, colconsolhosp) <> "" numhosps = numhosps + 1 hospnamecrnt = cellvalueconsol(rowconsolcrnt, colconsolhosp) end if next 'debug.print numhosps ' size hospname, locn , rowconsolhospfirst number of hospitals redim hospname(1 numhosps) redim locn(1 numhosps) redim rowconsolhospfirst(1 numhosps) ' load hospital , location arrays inxhospcrnt = 1 hospnamecrnt = cellvalueconsol(2, colconsolhosp) hospname(inxhospcrnt) = hospnamecrnt rowconsolhospfirst(inxhospcrnt) = 2 rowconsolcrnt = 3 ubound(cellvalueconsol, 1) if cellvalueconsol(rowconsolcrnt, colconsolhosp) <> hospnamecrnt , _ cellvalueconsol(rowconsolcrnt, colconsolhosp) <> "" ' load locations worksheet location array call extractsubarray(cellvalueconsol, locn(inxhospcrnt), _ rowconsolhospfirst(inxhospcrnt), colconsollocn, _ rowconsolcrnt - 1, colconsollocn) hospnamecrnt = cellvalueconsol(rowconsolcrnt, colconsolhosp) inxhospcrnt = inxhospcrnt + 1 hospname(inxhospcrnt) = hospnamecrnt rowconsolhospfirst(inxhospcrnt) = rowconsolcrnt end if next ' load locations final hospital worksheet location array call extractsubarray(cellvalueconsol, locn(inxhospcrnt), _ rowconsolhospfirst(inxhospcrnt), colconsollocn, _ ubound(cellvalueconsol, 1), colconsollocn) ' load questions call extractsubarray(cellvalueconsol, quest, _ 1, colconsolquestfirst, _ 1, colconsolquestlast) ' clear data area of consolidate worksheet ' ======================================= rowconsolcrnt = 2 ubound(cellvalueconsol, 1) colconsolcrnt = colconsolquestfirst colconsolquestlast cellvalueconsol(rowconsolcrnt, colconsolcrnt) = "" next next ' prepare error worksheet '======================== worksheets(wkshtnameerror) .cells.entirerow.delete .cells(1, colerrortime).value = "time" .cells(1, colerrorfile) .value = "file" .columnwidth = 71.71 end .cells(1, colerrorrow) .value = "row" .horizontalalignment = xlright .columnwidth = 4 end .cells(1, colerrorcol) .value = "col" .horizontalalignment = xlright .columnwidth = 4 end .cells(1, colerrormsg) .value = "error" .columnwidth = 71.71 end end rowerrorcrnt = 1 ' validate csv files , extract key information ' ================================================== inxfiledate = -1 'date range not yet found nummonthsdata = 0 inxfilecrnt = 1 ubound(filename) fileerror(inxfilecrnt) = false ' no error found file if isempty(filecellvaluesrc(inxfilecrnt)) ' csv file empty call recorderror(filename(inxfilecrnt), 0, 0, _ "empty csv file", rowerrorcrnt) fileerror(inxfilecrnt) = true ' csv file ignored elseif vartype(filecellvaluesrc(inxfilecrnt)) = vbstring ' csv file contained single value call recorderror(filename(inxfilecrnt), 0, 0, _ "csv file contains single string", rowerrorcrnt) fileerror(inxfilecrnt) = true ' csv file ignored else ' remaining format returned range ' array ' check cells contain values expected. ' checking code has been placed in subroutines. keeps code ' in main routine clean , simple , allows subroutines ' copied new workbooks macros performing similar tasks. ' check cell a1 = "caution: rates have not been adjusted patient mix" call checkcellvaluesingle(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), fileerror(inxfilecrnt), _ 1, 1, _ "caution: rates have not been adjusted patient mix", _ rowerrorcrnt) ' check cell a2 known hospital. save inxhosp against file call checkcellvaluemultiple(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 2, 1, hospname, _ fileinxhosp(inxfilecrnt), rowerrorcrnt) ' check cell a3 is: date - date location comparison based on n locations call checkdaterangelocn(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 3, 1, _ datestartcrnt, dateendcrnt, rowerrorcrnt) ' save datestartcrnt , datendcrnt or check same ' saved values if inxfiledate = -1 ' first set of dates datestartall = datestartcrnt dateendall = dateendcrnt inxfiledate = inxfilecrnt ' first file found these dates else if datestartall = datestartcrnt , dateendall = dateendcrnt ' date range csv file matches of previous files else call recorderror(filename(inxfilecrnt), 3, 1, _ "**fatal error**: date ranges not match:" & vblf & _ format(datestartall, fmtdate) & " - " & _ format(dateendall, fmtdate) & " " & _ filename(inxfiledate) & vblf & _ format(datestartcrnt, fmtdate) & " - " & _ format(dateendcrnt, fmtdate) & " " & _ filename(inxfilecrnt), rowerrorcrnt) ' there incompatible csv files. fatal error. give up. exit sub end if end if ' check cell a4 = "cms qualified hcahps data service lines" call checkcellvaluesingle(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 4, 1, _ "cms qualified hcahps data service lines", _ rowerrorcrnt) ' check cell a5 = question " composite results" if not checkbound(filecellvaluesrc(inxfilecrnt), 5, 1, errmsg) call recorderror(filename(inxfilecrnt), 5, 1, errmsg, rowerrorcrnt) fileerror(inxfilecrnt) = true else fileinxquest(inxfilecrnt) = -1 ' no match against question inxquestcrnt = 1 ubound(quest) if filecellvaluesrc(inxfilecrnt)(5, 1) = _ quest(inxquestcrnt) & " composite results" fileinxquest(inxfilecrnt) = inxquestcrnt exit end if next if fileinxquest(inxfilecrnt) = -1 ' no match found fileerror(inxfilecrnt) = true call recorderror(filename(inxfilecrnt), 5, 1, """" & _ filecellvaluesrc(inxfilecrnt)(5, 1) & _ """ not match known question", rowerrorcrnt) end if end if ' check cell a6 is: "location" call checkcellvaluesingle(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 6, 1, "location", _ rowerrorcrnt) ' check cells b6 x6 1st day of month ' datestartall dateendall call checkdatesequence(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 6, 2, datestartall, _ dateendall, "a", "m", rowerrorcrnt) ' check cells y6 "composite rate" if not fileerror(inxfilecrnt) ' data range not guaranteed until file error free nummonthsdata = datediff("m", datestartall, dateendall) + 1 colsrccompositerate = nummonthsdata + 2 call checkcellvaluesingle(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), 6, colsrccompositerate, _ "composite rate", rowerrorcrnt) end if if not fileerror(inxfilecrnt) ' row 7 down first empty column a, check column contains ' known location , colsrccompositerate numeric. rowsrccrnt = 7 inxhospcrnt = fileinxhosp(inxfilecrnt) while true if not checkbound(filecellvaluesrc(inxfilecrnt), _ rowsrccrnt, 1, errmsg) ' row not present call recorderror(filename(inxfilecrnt), rowsrccrnt, 1, _ errmsg, rowerrorcrnt) fileerror(inxfilecrnt) = true exit end if if not checkbound(filecellvaluesrc(inxfilecrnt), _ rowsrccrnt, colsrccompositerate, errmsg) ' composite rate missing call recorderror(filename(inxfilecrnt), rowsrccrnt, _ colsrccompositerate, errmsg, rowerrorcrnt) fileerror(inxfilecrnt) = true exit elseif not isnumeric(filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _ colsrccompositerate)) ' composite rate not numeric call recorderror(filename(inxfilecrnt), rowsrccrnt, _ colsrccompositerate, "composite rate """ & _ filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _ colsrccompositerate) & """ not numeric", _ rowerrorcrnt) end if if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = "" ' end of location list within file exit end if found = false inxlocncrnt = 1 ubound(locn(inxhospcrnt)) if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = _ locn(inxhospcrnt)(inxlocncrnt) ' location csv file found in list consolidate worksheet found = true exit end if next if not found call recorderror(filename(inxfilecrnt), rowsrccrnt, 1, _ "location """ & _ filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) & _ """ not found in list worksheet """ & _ wkshtnameconsol & """", rowerrorcrnt) fileerror(inxfilecrnt) = true end if rowsrccrnt = rowsrccrnt + 1 loop end if if not fileerror(inxfilecrnt) ' row rowsrccrnt have blank column 1 rowsrccrnt = rowsrccrnt + 1 ' check column total line hospital call checkcellvaluesingle(filename(inxfilecrnt), _ filecellvaluesrc(inxfilecrnt), _ fileerror(inxfilecrnt), rowsrccrnt, 1, _ hospname(fileinxhosp(inxfilecrnt)), _ rowerrorcrnt) if not checkbound(filecellvaluesrc(inxfilecrnt), _ rowsrccrnt, colsrccompositerate, errmsg) ' composite rate missing call recorderror(filename(inxfilecrnt), rowsrccrnt, _ colsrccompositerate, errmsg, rowerrorcrnt) fileerror(inxfilecrnt) = true elseif not isnumeric(filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _ colsrccompositerate)) ' composite rate not numeric call recorderror(filename(inxfilecrnt), rowsrccrnt, _ colsrccompositerate, "composite rate """ & _ filecellvaluesrc(inxfilecrnt)(rowsrccrnt, _ colsrccompositerate) & """ not numeric", _ rowerrorcrnt) end if end if end if next inxfilecrnt ' if here there has not been fatal error although 1 or more ' individual files may have been rejected. inxfilecrnt = 1 ubound(filename) if not fileerror(inxfilecrnt) ' no error has been found in file inxhospcrnt = fileinxhosp(inxfilecrnt) inxquestcrnt = fileinxquest(inxfilecrnt) colconsolcrnt = 2 + inxquestcrnt rowsrccrnt = 7 ' first location row while true if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = "" ' end of location list within file exit end if inxlocncrnt = 1 ubound(locn(inxhospcrnt)) if filecellvaluesrc(inxfilecrnt)(rowsrccrnt, 1) = _ locn(inxhospcrnt)(inxlocncrnt) ' location csv file found in list consolidate worksheet rowconsolcrnt = rowconsolhospfirst(inxhospcrnt) + inxlocncrnt - 1 cellvalueconsol(rowconsolcrnt, colconsolcrnt) = _ filecellvaluesrc(inxfilecrnt)(rowsrccrnt, colsrccompositerate) exit end if next rowsrccrnt = rowsrccrnt + 1 loop rowsrccrnt = rowsrccrnt + 1 ' advance hospital total line ' assume last location row total rowconsolcrnt = rowconsolhospfirst(inxhospcrnt) + _ ubound(locn(inxhospcrnt)) - 1 cellvalueconsol(rowconsolcrnt, colconsolcrnt) = _ filecellvaluesrc(inxfilecrnt)(rowsrccrnt, colsrccompositerate) end if next ' write new values consolidate worksheet ' ============================================== worksheets(wkshtnameconsol) .usedrange.value = cellvalueconsol end end sub
Comments
Post a Comment