python - Efficient time series data extraction -


I have a problem in Python that I'm not sure how to solve efficiently. I have a large set of time series data that I read in through a generator. As of now, when I call yield, each piece of data is given to me one at a time. That is fine when each time series has the same index, i.e., each starts on the same date and ends on the same date. The problem arises when the time series do not share the same start date and end date.

What is the best implementation whereby, when I query, it returns the values for a specific date? That way I would not have to worry about start dates, only the point in time.

I use pandas, but I have no clue how to implement this efficiently.

Here is the code I use to import each CSV file:

def _open_convert_csv_files(self):
    comb_index = None
    for s in self.symbol_list:
        print(s)
        # Load the CSV file with no header information, indexed on date
        self.symbol_data[s] = pd.io.parsers.read_csv(
            os.path.join(self.csv_dir, '%s.csv' % s),
            header=0, index_col=0, parse_dates=True,
            names=['date', 'open', 'high', 'low', 'close', 'total volume']
        ).sort_index()

        # Combine the indexes so values can be padded forward.
        # Note: Index.union returns a new index, so reassignment is required.
        if comb_index is None:
            comb_index = self.symbol_data[s].index
        else:
            comb_index = comb_index.union(self.symbol_data[s].index)

        # Set the latest symbol data to an empty list
        self.latest_symbol_data[s] = []

    print('')
    # Reindex the dataframes onto the combined index
    for s in self.symbol_list:
        self.symbol_data[s] = self.symbol_data[s].reindex(
            index=comb_index, method='pad').iterrows()
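One subtlety worth noting: pandas' `Index.union` returns a new index rather than modifying the index in place, so the result must be reassigned. A minimal, self-contained sketch of the union-and-pad pattern (the dates and values are invented for illustration):

```python
import pandas as pd

# Two hypothetical series with different start dates.
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]))
b = pd.Series([10.0, 20.0],
              index=pd.to_datetime(["2020-01-02", "2020-01-03"]))

# Build the union of both indexes; union() does not mutate in place,
# so the returned index must be assigned to a variable.
comb_index = a.index.union(b.index)

# Reindex each series onto the combined index, padding forward: a symbol
# that has not started yet stays NaN, and gaps carry the last known value.
a2 = a.reindex(comb_index, method="pad")
b2 = b.reindex(comb_index, method="pad")

print(b2)  # first entry is NaN because b has no data on 2020-01-01
```

After reindexing, every series shares the same index, so iterating date by date lines up across symbols; dates before a symbol's first observation simply show NaN.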

As you can see, self.symbol_data[s] works fine when the time series all have the same start date, but when they don't, it won't work during the simulation, where I loop through each symbol within a loop over the data. In other words, I need to take into account the cross-sectional price data for each date of the iteration.

I would love to hear how others are achieving this.

I understand I can line them up side by side so the dates match and then loop row by row, but when I have 100k different securities, that is slow and memory-heavy. Besides, each CSV file is not a single column but multiple columns...
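One way to avoid materializing a single giant aligned frame is to keep each symbol's DataFrame separate and pull a cross-section on demand with `DataFrame.asof`, which returns the last row at or before a given date. A sketch under that assumption, with made-up symbols and prices:

```python
import pandas as pd

# Hypothetical per-symbol frames; symbols, dates and prices are invented.
symbol_data = {
    "AAA": pd.DataFrame({"close": [1.0, 2.0]},
                        index=pd.to_datetime(["1999-11-18", "1999-11-19"])),
    "BBB": pd.DataFrame({"close": [5.0]},
                        index=pd.to_datetime(["1999-11-19"])),
}

def cross_section(date):
    """Return the latest known close for each symbol as of `date`,
    skipping symbols that have not started trading yet."""
    date = pd.Timestamp(date)
    out = {}
    for sym, df in symbol_data.items():
        row = df.asof(date)        # last row at or before `date`
        if not row.isna().all():   # all-NaN means no data yet -> skip
            out[sym] = row["close"]
    return out

print(cross_section("1999-11-18"))  # only AAA exists on this date
```

This trades a little per-query work for never building the full 100k-column aligned structure; each frame only needs a sorted DatetimeIndex.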

thanks,


date        open         high         low          close        total volume
19991118    29.69620186  32.63318885  26.10655108  28.71720619  685497
19991119    28.02375093  28.06454241  25.98417662  26.3513      166963
19991122    26.96317229  28.71720619  26.14734257  28.71720619  72092
19991123    27.73821052  28.47245727  26.10655108  26.10655108  65492
19991124    26.18813405  27.37108715  26.10655108  26.80000634  53081
19991126    26.67763189  27.08554675  26.59604891  26.88158932  18955

Let's start with this:

pd.read_csv(file_path, parse_dates=True, index_col=0)

                 open       high        low      close  total volume
date
1999-11-18  29.696202  32.633189  26.106551  28.717206        685497
1999-11-19  28.023751  28.064542  25.984177  26.351300        166963
1999-11-22  26.963172  28.717206  26.147343  28.717206         72092
1999-11-23  27.738211  28.472457  26.106551  26.106551         65492
1999-11-24  26.188134  27.371087  26.106551  26.800006         53081
1999-11-26  26.677632  27.085547  26.596049  26.881589         18955

How is this not sufficient for your needs?
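If you do want everything queryable by date in one structure, one option (a sketch, with invented symbols and prices) is to concatenate the per-symbol frames into a single DataFrame with a (date, symbol) MultiIndex and slice it by date:

```python
import pandas as pd

# Two small frames standing in for parsed CSV files; symbols are made up.
aaa = pd.DataFrame({"close": [28.7, 26.3]},
                   index=pd.to_datetime(["1999-11-18", "1999-11-19"]))
bbb = pd.DataFrame({"close": [50.1]},
                   index=pd.to_datetime(["1999-11-19"]))

# Stack all symbols into one frame keyed by (symbol, date), then swap
# the levels so that date comes first and sort for fast slicing.
panel = pd.concat({"AAA": aaa, "BBB": bbb}, names=["symbol"])
panel = panel.swaplevel().sort_index()

# All prices available on a given date, regardless of each symbol's start:
print(panel.loc[pd.Timestamp("1999-11-19")])
```

Selecting a date this way naturally yields only the symbols that have data on that date, which is exactly the cross-sectional view the simulation loop needs.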

