numpy - Pandas: fancy indexing a DataFrame
I have a pandas DataFrame, df1, which is a year-long 5-minute time series with columns A-Z.

    df1.shape
    (105121, 26)
    df1.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2002-01-02 00:00:00, ..., 2003-01-02 00:00:00]
    Length: 105121, Freq: 5T, Timezone: None

I have a second DataFrame, df2, which is a year-long daily time series (over the same period) with matching columns. The values of the second frame are booleans.

    df2.shape
    (365, 26)
    df2.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2002-01-02 00:00:00, ..., 2003-01-01 00:00:00]
    Length: 365, Freq: D, Timezone: None

I want to use df2 as a fancy index into df1, i.e. "df1.ix[df2]" or somesuch, so that I get back a subset of df1's columns for each date: the columns for which df2 says True on that date, with all of that date's timestamps. The shape of the result should therefore be (105121, width), where width is the number of distinct columns the booleans imply (width <= 26).
Currently, df1.ix[df2] only partially works. Only the 00:00 values for each day are picked out, which makes sense in light of df2's 'point-like' time series.

I next tried using time spans as the df2 index:

    df2.index
    PeriodIndex: 365 entries, 2002-01-02 to 2003-01-01

This time, I get an error:

    /home/wchapman/.local/lib/python2.7/site-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_indexer(self, target, method, limit)
        844             this = self.astype(object)
        845             target = target.astype(object)
    --> 846             return this.get_indexer(target, method=method, limit=limit)
        847
        848         if not self.is_unique:

    AttributeError: 'numpy.ndarray' object has no attribute 'get_indexer'

My interim solution is to loop by date, but that seems inefficient. Is pandas capable of this kind of fancy indexing? I don't see examples anywhere in the documentation.
Here's one way to do this:

    t_index = df1.index
    d_index = df2.index
    mask = t_index.map(lambda t: t.date() in d_index)
    df1[mask]

A faster way (but the same idea) is to use:

    mask = pd.to_datetime([datetime.date(*t_tuple) for t_tuple in zip(t_index.year, t_index.month, t_index.day)]).isin(d_index)
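The same idea can be sketched end to end on a recent pandas version (an assumption, since the question uses 0.11.0). The toy data below and the names `row_mask`, `elem_mask`, and `selected` are illustrative and not from the original post. The sketch builds the row mask by normalizing each timestamp to midnight, then goes one step further toward what the question asks for by repeating each day's booleans across that day's timestamps.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's frames (assumed data): three days of
# 5-minute samples plus one trailing midnight timestamp, three columns.
days = pd.date_range("2002-01-02", periods=3, freq="D")
t_index = pd.date_range("2002-01-02", periods=3 * 288 + 1, freq="5min")

df1 = pd.DataFrame(
    np.arange(len(t_index) * 3).reshape(len(t_index), 3),
    index=t_index,
    columns=list("ABC"),
)
df2 = pd.DataFrame(
    [[True, False, False],
     [False, False, True],
     [False, False, False]],
    index=days,
    columns=list("ABC"),
)

# Row mask: keep every timestamp whose calendar day appears in df2's index.
# normalize() sets the time of day to 00:00, so it can be compared with days.
row_mask = df1.index.normalize().isin(df2.index)
rows_on_known_days = df1[row_mask]

# Element-wise mask: look up each timestamp's day in df2, so that a day's
# booleans are repeated for all of its timestamps. Days that are missing
# from df2 are treated as all-False.
elem_mask = df2.reindex(df1.index.normalize(), fill_value=False)
elem_mask.index = df1.index  # restore the 5-minute timestamps as the index

# Keep values where the mask is True (NaN elsewhere), then drop columns
# that are never selected on any day.
selected = df1.where(elem_mask).loc[:, elem_mask.any(axis=0)]
```

With this toy data, `selected` keeps all 865 rows and only columns A and C, which corresponds to the (105121, width) shape described in the question. Whether NaN is the right filler for the unselected cells depends on what the result is used for.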