numpy - Pandas: fancy indexing a DataFrame

I have a pandas DataFrame, df1, which is a year-long 5-minute time series with columns A-Z.

    df1.shape
    (105121, 26)
    df1.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2002-01-02 00:00:00, ..., 2003-01-02 00:00:00]
    Length: 105121, Freq: 5T, Timezone: None

I have a second DataFrame, df2, which is a year-long daily time series (over the same period) with matching columns. The values of the second frame are booleans.

    df2.shape
    (365, 26)
    df2.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2002-01-02 00:00:00, ..., 2003-01-01 00:00:00]
    Length: 365, Freq: D, Timezone: None
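
To experiment without the real data, frames with the same layout can be mimicked. This is a minimal sketch assuming a recent pandas and NumPy. The random contents, the seed and the 30% True rate are hypothetical. Only the index ranges, frequencies, shapes and A-Z columns come from the description above ("5min" is the current spelling of the old "5T" alias).

```python
import string

import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2: random values, but the same
# index ranges, frequencies and A-Z columns as described in the question.
cols = list(string.ascii_uppercase)
t_index = pd.date_range("2002-01-02", "2003-01-02", freq="5min")
d_index = pd.date_range("2002-01-02", "2003-01-01", freq="D")

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((len(t_index), len(cols))),
                   index=t_index, columns=cols)
# Daily boolean mask with the same columns (about 30% True, arbitrarily)
df2 = pd.DataFrame(rng.random((len(d_index), len(cols))) < 0.3,
                   index=d_index, columns=cols)

print(df1.shape)  # (105121, 26)
print(df2.shape)  # (365, 26)
```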

I want to use df2 to fancy-index df1, i.e. "df1.ix[df2]" or some such, so that for each date I get the subset of df1's columns that df2 marks as True on that date (with all the timestamps on that date). The shape of the result should be (105121, width), where width is the number of distinct columns the booleans imply (width <= 26).

Currently, df1.ix[df2] only partially works: only the 00:00 value of each day is picked out, which makes sense given that df2 is a 'point-like' time series.
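
The 00:00-only behaviour comes from exact label alignment. A daily DatetimeIndex holds midnight timestamps, so an exact-label lookup matches only one fine-grained row per day. The small sketch below uses hypothetical toy data and assumes a recent pandas. It uses 6-hourly timestamps instead of 5-minute ones to keep the output readable, and it shows how forward-filling spreads each day's value over the whole day.

```python
import pandas as pd

# Hypothetical toy data: two days of 6-hourly timestamps and a daily flag.
t_index = pd.date_range("2002-01-02", periods=8, freq="360min")
daily = pd.Series([True, False],
                  index=pd.date_range("2002-01-02", periods=2, freq="D"))

# Exact-label alignment: the daily labels exist only at 00:00,
# so every other timestamp gets NaN.
aligned = daily.reindex(t_index)
print(aligned.notna().sum())  # 2

# Forward-filling carries each day's flag over the rest of that day.
filled = daily.reindex(t_index, method="ffill")
print(filled.tolist())  # [True, True, True, True, False, False, False, False]
```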

Next, I tried using time spans for the df2 index:

    df2.index
    PeriodIndex: 365 entries, 2002-01-02 to 2003-01-01

This time, I get an error:

    /home/wchapman/.local/lib/python2.7/site-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_indexer(self, target, method, limit)
        844             this = self.astype(object)
        845             target = target.astype(object)
    --> 846             return this.get_indexer(target, method=method, limit=limit)
        847 
        848         if not self.is_unique:

    AttributeError: 'numpy.ndarray' object has no attribute 'get_indexer'
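
The time-span idea can be made to work from the other direction. Instead of giving df2 a PeriodIndex and asking the 5-minute timestamps to match it, map each of df1's timestamps to the daily period that contains it with `DatetimeIndex.to_period("D")`, then look df2 up by period. This is a sketch on hypothetical toy data (6-hourly instead of 5-minute), assuming a recent pandas. It has not been checked against pandas 0.11.

```python
import pandas as pd

# Hypothetical toy data: 6-hourly values over three days, plus a daily
# boolean mask indexed by a PeriodIndex (one span per day).
t_index = pd.date_range("2002-01-02", periods=12, freq="360min")
df1 = pd.DataFrame({"A": range(12), "B": range(12)}, index=t_index)
df2 = pd.DataFrame({"A": [True, False, True], "B": [False, False, True]},
                   index=pd.period_range("2002-01-02", periods=3, freq="D"))

# Map every timestamp of df1 to the daily period that contains it ...
day_of_row = df1.index.to_period("D")
# ... and look up that day's flags, so each row carries its day's mask.
row_mask = df2.reindex(day_of_row)
row_mask.index = df1.index  # put the mask back on df1's timestamps

print(row_mask["A"].tolist())
```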

My interim solution is to loop over each date, but that seems inefficient. Is pandas capable of this kind of fancy indexing? I don't see any examples in the documentation.
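
Pandas can do this without a Python-level loop. Broadcast the daily mask onto the 5-minute index, then mask with `DataFrame.where`. This is a minimal sketch assuming a recent pandas, where `.ix` no longer exists (it was removed in pandas 1.0). It also assumes that "a subset of columns per date" means values on False days become NaN and columns that are never True are dropped. `select_by_daily_mask` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def select_by_daily_mask(fine: pd.DataFrame, daily: pd.DataFrame) -> pd.DataFrame:
    """Keep fine[t, c] only where daily[midnight of t, c] is True.

    Assumes `fine` has a DatetimeIndex and `daily` is a boolean frame on a
    midnight-stamped DatetimeIndex with the same column labels. Values on
    False days become NaN, and columns that are never True are dropped, so
    the result has shape (len(fine), width) with width <= len(fine.columns).
    """
    # Look up each row's calendar day (timestamp truncated to midnight);
    # days or columns missing from `daily` count as False.
    row_mask = daily.reindex(index=fine.index.normalize(),
                             columns=fine.columns, fill_value=False)
    row_mask.index = fine.index  # realign onto the original timestamps
    kept = fine.where(row_mask)
    return kept.loc[:, row_mask.any()]


# Usage on hypothetical toy data: two days of 6-hourly rows.
t_index = pd.date_range("2002-01-02", periods=8, freq="360min")
fine = pd.DataFrame({"A": range(8), "B": range(8), "C": range(8)}, index=t_index)
daily = pd.DataFrame({"A": [True, False], "B": [False, True], "C": [False, False]},
                     index=pd.date_range("2002-01-02", periods=2, freq="D"))
out = select_by_daily_mask(fine, daily)
print(out.shape)  # (8, 2)
```

Column C is never True, so it is dropped. A is kept only on the first day and B only on the second. Using `reindex` with `fill_value=False` keeps the mask boolean even if df2 is missing a day.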

Here's one way to do this:

    t_index = df1.index
    d_index = df2.index
    mask = t_index.map(lambda t: t.date() in d_index)
    df1[mask]

A faster version (using the same idea) is:

    import datetime
    import pandas as pd

    mask = pd.to_datetime([datetime.date(*t_tuple)
                           for t_tuple in zip(t_index.year,
                                              t_index.month,
                                              t_index.day)]).isin(d_index)
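
In current pandas, the same row-selection idea works without per-element date conversion. `DatetimeIndex.normalize()` truncates every timestamp to midnight in one vectorized step, and `isin` then compares the result against df2's daily index. This is a sketch on hypothetical toy data, assuming a recent pandas. Like the answer above, it selects whole rows whose date is in d_index.

```python
import pandas as pd

# Hypothetical toy data: 6-hourly rows over three days, and the days to keep.
t_index = pd.date_range("2002-01-02", periods=12, freq="360min")
d_index = pd.DatetimeIndex(["2002-01-02", "2002-01-04"])
df1 = pd.DataFrame({"A": range(12)}, index=t_index)

# normalize() maps each timestamp to midnight of its day, replacing the
# date() / datetime.date(*t_tuple) round trip with one vectorized call.
mask = t_index.normalize().isin(d_index)
selected = df1[mask]
print(len(selected))  # 8
```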
