sql - Rolling average in Postgres
I am running Postgres 9.2 and have a large table like:

    CREATE TABLE sensor_values (
        ts        timestamp with time zone NOT NULL,
        value     double precision NOT NULL DEFAULT 'nan'::real,
        sensor_id integer NOT NULL
    );
I have values coming into the system constantly, i.e. many per minute. I want to maintain a rolling standard deviation / average over the last 200 values so I can determine whether new values entering the system are within 3 standard deviations of the mean. For that I need the current standard deviation and mean, kept up to date over the last 200 values. Since the table can contain hundreds of millions of rows, I do not want to fetch the last 200 rows for a sensor ordered by time and run avg(value), var_samp(value) for every new value coming in; I am assuming it will be faster to update the standard deviation and mean incrementally.
I have started writing a PL/pgSQL function that updates the rolling variance and mean on each new value entering the system for a particular sensor.
I can do this using pseudocode like:
    newavg = oldavg + (new_value - old_value) / window_size
    new_variance += (new_value - old_value)
                    * (new_value - newavg + old_value - oldavg) / (window_size - 1)
This is based on http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/.
Basically it is a window of 200 values, where old_value is the first (oldest) value of the window. When a new value comes in, I shift the window forward by one. After that I store the following three values for the sensor: the first value of the window, the mean of the window values, and the variance of the window values. This way I don't have to fetch the last 200 values and sum them, etc.; I can reuse the stored values when a new sensor value comes in.
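Roughly what I have in mind, as a sketch (the sensor_rolling_stats state table, the function name, and the OFFSET lookup for the new oldest value are my own placeholders, not part of the schema above):

    -- Hypothetical per-sensor state table (a placeholder, not part of the
    -- schema above): the oldest value in the window, plus running stats.
    CREATE TABLE sensor_rolling_stats (
        sensor_id    integer PRIMARY KEY,
        oldest_value double precision NOT NULL,
        rolling_avg  double precision NOT NULL,
        rolling_var  double precision NOT NULL
    );

    -- Sketch of the O(1) update for one incoming value. Assumes the new
    -- value is already inserted into sensor_values and the window is full.
    CREATE OR REPLACE FUNCTION update_rolling_stats(p_sensor_id integer,
                                                    p_new_value double precision)
    RETURNS void AS $$
    DECLARE
        c_window constant integer := 200;
        v_old    double precision;  -- value dropping out of the window
        v_avg    double precision;
        v_var    double precision;
        v_newavg double precision;
        v_next   double precision;  -- value that is now oldest in the window
    BEGIN
        SELECT oldest_value, rolling_avg, rolling_var
          INTO v_old, v_avg, v_var
          FROM sensor_rolling_stats
         WHERE sensor_id = p_sensor_id;

        -- The rolling update from the article linked above.
        v_newavg := v_avg + (p_new_value - v_old) / c_window;
        v_var    := v_var + (p_new_value - v_old)
                          * (p_new_value - v_newavg + v_old - v_avg)
                          / (c_window - 1);

        -- The new oldest value is the 200th-newest row; with a suitable
        -- index on (sensor_id, ts) this only touches ~200 index entries.
        SELECT "value" INTO v_next
          FROM sensor_values
         WHERE sensor_id = p_sensor_id
         ORDER BY ts DESC
         OFFSET c_window - 1 LIMIT 1;

        UPDATE sensor_rolling_stats
           SET oldest_value = v_next,
               rolling_avg  = v_newavg,
               rolling_var  = v_var
         WHERE sensor_id = p_sensor_id;
    END;
    $$ LANGUAGE plpgsql;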
My problem is that when this first runs, I have no previous window data for the sensor, i.e. none of the three values above, so I have to compute them the slow way. Something like:
    WITH s AS (
        SELECT value
          FROM sensor_values
         WHERE sensor_values.sensor_id = $1
           AND ts >= (now() - interval '2 days')::timestamptz
         ORDER BY ts DESC
         LIMIT 200
    )
    SELECT avg(value)      AS last_window_average,
           var_samp(value) AS last_window_variance
      FROM s;
But how do I get the last (earliest) value to save out of that select statement? Can I access the first row of s in PL/pgSQL?
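One idea I had is to pull it out in the same statement with an ordered array_agg (a sketch; I am assuming the ordered-aggregate form here, which 9.2 supports):

    WITH s AS (
        SELECT value, ts
          FROM sensor_values
         WHERE sensor_id = $1
           AND ts >= (now() - interval '2 days')::timestamptz
         ORDER BY ts DESC
         LIMIT 200
    )
    SELECT avg(value)                            AS last_window_average,
           var_samp(value)                       AS last_window_variance,
           (array_agg(value ORDER BY ts ASC))[1] AS first_window_value
      FROM s;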
I thought PL/pgSQL would be the faster / cleaner approach, but maybe this is better done in client code? Are there better ways to perform this type of rolling statistic update?
I assume it would not be drastically slow to re-calculate the latest 200 entries each time, given proper indexing. If you create an index like:
    CREATE INDEX i_sensor_values ON sensor_values(sensor_id, ts DESC);
you'll be able to get the results by doing:
select sum("value") -- add more expressions required sensor_values sensor_id=$1 order ts desc limit 200;
You can execute such a query in a loop from a PL/pgSQL function. And if you migrate to 9.3 (or higher) any time soon, you'll also be able to use LATERAL joins for this purpose.
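A sketch of how the LATERAL form might look on 9.3+ (deriving the sensor list from sensor_values here is just for illustration; a dedicated sensors table would be better):

    SELECT s.sensor_id, w.window_avg, w.window_var
      FROM (SELECT DISTINCT sensor_id FROM sensor_values) AS s
     CROSS JOIN LATERAL (
           SELECT avg("value")      AS window_avg,
                  var_samp("value") AS window_var
             FROM (SELECT "value"
                     FROM sensor_values v
                    WHERE v.sensor_id = s.sensor_id
                    ORDER BY v.ts DESC
                    LIMIT 200) AS last200
         ) AS w;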
I do not think a covering index would help here, as the table is constantly changing and IndexOnlyScan will not kick in.
It is worth checking out loose index scans also.
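For reference, a sketch of the recursive-CTE emulation of a loose index scan (as described on the PostgreSQL wiki), which makes the DISTINCT sensor_id step above cheap:

    WITH RECURSIVE t AS (
        (SELECT sensor_id FROM sensor_values ORDER BY sensor_id LIMIT 1)
        UNION ALL
        SELECT (SELECT v.sensor_id
                  FROM sensor_values v
                 WHERE v.sensor_id > t.sensor_id
                 ORDER BY v.sensor_id
                 LIMIT 1)
          FROM t
         WHERE t.sensor_id IS NOT NULL
    )
    SELECT sensor_id FROM t WHERE sensor_id IS NOT NULL;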
P.S. The column name value should be double quoted, as it is an SQL reserved word.