sql - Rolling average in Postgres -


I am running Postgres 9.2 and have a large table like:

create table sensor_values (
  ts timestamp with time zone not null,
  value double precision not null default 'NaN'::double precision,
  sensor_id integer not null
);

I have values coming into the system continuously, i.e. many per minute. I want to maintain a rolling standard deviation / average over the last 200 values so I can determine whether each new value entering the system is within 3 standard deviations of the mean. For that I need the current standard deviation and mean to be kept up to date for the last 200 values. Since the table can contain hundreds of millions of rows, I do not want to fetch the last 200 rows for a sensor ordered by time and run avg(value) and var_samp(value) for every new value coming in, and I am assuming an incrementally updated standard deviation and mean will be faster.

I have started writing a PL/pgSQL function that updates the rolling variance and mean on each new value entering the system for a particular sensor.

I can do this with pseudocode like:

newavg = oldavg + (new_value - old_value) / window_size
new_variance += (new_value - old_value) * (new_value - newavg + old_value - oldavg) / (window_size - 1)

This is based on http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/

Basically it is a window of 200 values, where old_value is the first (oldest) value in the window. When a new value comes in, the window shifts forward by one. After computing the result I store the following values for the sensor:

- the first value of the window
- the mean (average) of the window values
- the variance of the window values

This way I don't have to fetch the last 200 values and aggregate them each time; I can reuse the stored values when a new sensor value comes in.
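For illustration, here is a minimal sketch of such a function. The state table sensor_rolling_stats and the function name update_rolling_stats are hypothetical; it assumes the new reading has already been inserted into sensor_values and that a full 200-value window already exists for the sensor:

create table sensor_rolling_stats (
    sensor_id integer primary key,
    mean      double precision not null,
    variance  double precision not null
);

create or replace function update_rolling_stats(p_sensor_id integer,
                                                p_new_value double precision)
returns void
language plpgsql
as $$
declare
    w constant integer := 200;   -- window size
    old_value double precision;  -- value that just left the window
    old_mean  double precision;
    new_mean  double precision;
begin
    -- With the new row already inserted, the departing value is the
    -- 201st newest row for this sensor.
    select "value" into strict old_value
    from sensor_values
    where sensor_id = p_sensor_id
    order by ts desc
    offset w limit 1;

    select mean into strict old_mean
    from sensor_rolling_stats
    where sensor_id = p_sensor_id;

    new_mean := old_mean + (p_new_value - old_value) / w;

    update sensor_rolling_stats
       set mean = new_mean,
           variance = variance
                    + (p_new_value - old_value)
                    * (p_new_value - new_mean + old_value - old_mean)
                    / (w - 1)
     where sensor_id = p_sensor_id;
end;
$$;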

My problem is that when this first runs there is no previous window data for the sensor (i.e. the three values above), so I have to compute them the slow way.

Something like:

with s as (
  select value
  from sensor_values
  where sensor_values.sensor_id = $1
    and ts >= (now() - interval '2 day')::timestamptz
  order by ts desc
  limit 200
)
select avg(value)      as last_window_average,
       var_samp(value) as last_window_variance
from s;

But how do I also get the last (earliest) value out of that select statement to save? Can I access the first row of s in PL/pgSQL?
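For reference, one possible sketch: an ordered array_agg (available in 9.2) can pull the earliest value out of the same window in a single statement; the alias first_window_value is illustrative:

with s as (
  select value, ts
  from sensor_values
  where sensor_values.sensor_id = $1
    and ts >= (now() - interval '2 day')::timestamptz
  order by ts desc
  limit 200
)
select avg(value)                            as last_window_average,
       var_samp(value)                       as last_window_variance,
       (array_agg(value order by ts asc))[1] as first_window_value
from s;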

I thought PL/pgSQL would be the faster / cleaner approach, but maybe this is better done in client code? Are there better ways to perform this type of rolling statistic update?

I assume it will not be drastically slow to re-calculate the statistics over the latest 200 entries each time, given proper indexing. If you create an index like:

create index i_sensor_values on sensor_values(sensor_id, ts desc); 

you will be able to get your results by doing:

select sum(t."value") -- add more expressions as required
from (
  select "value"
  from sensor_values
  where sensor_id = $1
  order by ts desc
  limit 200
) t; -- the subquery applies the LIMIT before aggregating

You can execute this query in a loop inside a PL/pgSQL function. If you migrate to 9.3 (or higher) any time soon, you will also be able to use LATERAL joins for this purpose.
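A sketch of what that LATERAL variant might look like on 9.3+ (the derived list of sensors here is just for illustration):

select s.sensor_id,
       w.last_window_average,
       w.last_window_variance
from (select distinct sensor_id from sensor_values) s,
     lateral (
       select avg(t."value")      as last_window_average,
              var_samp(t."value") as last_window_variance
       from (select "value"
             from sensor_values sv
             where sv.sensor_id = s.sensor_id
             order by sv.ts desc
             limit 200) t
     ) w;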

I do not think a covering index is the thing here, as the table is constantly changing and IndexOnlyScan will not kick in.

It is worth checking loose index scans also.
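For example, a loose index scan can be emulated with a recursive CTE that walks the distinct sensor_ids via the index; this is a sketch of the pattern described on the PostgreSQL wiki:

with recursive distinct_sensors as (
  (select sensor_id from sensor_values order by sensor_id limit 1)
  union all
  select (select sv.sensor_id
          from sensor_values sv
          where sv.sensor_id > d.sensor_id
          order by sv.sensor_id
          limit 1)
  from distinct_sensors d
  where d.sensor_id is not null  -- stop once no greater id is found
)
select sensor_id
from distinct_sensors
where sensor_id is not null;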

P.S. The column name value should be double-quoted, as this is an SQL reserved word.

