MatLab: handling real-time data while trending, continue trend with additional data next iteration


SteveOnYourSide (Mechanical)
Feb 26, 2016
I have a system here that takes in some data and then needs to decide what to do depending on that data, but there's no real data yet (all the pieces are clunking along before refining). I was hoping to get some guidance on handling the data.
So, no filters here, just creating a smoothed version (or trend) from the data (time is of the essence in this system's run-time). I'm starting with the theory in MatLab, then converting to C/C++ later (I speak MatLab, the other guys speak C; I wish I spoke both). I'll break it down; say you want to, for example:

1. Take in data for 150 (or N) samples, then after that you want to
2. generate a trend for the initial data set (from the first sample to N),
3. take in new data for 20 (or F) samples and fold the past trend into the new iteration of the current trend (from F to N+F). Keep in mind this is NOT like moving F forward and taking a NEW trend across the new data window; that would not take the past trend into account.
4. Repeat step 3 (see the sketch after this list for the loop structure I mean).
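
Here's roughly what I mean, in sketch form (smoothFcn is just a placeholder for whatever smoother we end up picking, and the stand-in data is only there so the sketch runs):

Code:
    % Sketch of steps 1-4. smoothFcn and the stand-in y are placeholders.
    y = cumsum(0.01*randn(2000,1)) + 3;            % stand-in data vector
    smoothFcn = @(v) conv(v, ones(9,1)/9, 'same'); % stand-in smoother (moving average)
    N = 150; F = 20;
    trend = smoothFcn(y(1:N));            % steps 1-2: first trend over N samples
    for start = N+1:F:numel(y)-F+1        % step 3: every F new samples...
        newData = y(start:start+F-1);
        % ...fit over [tail of old trend, new data] rather than raw data
        % alone, so the previous trend carries into the new one.
        window = [trend(end-(N-F)+1:end); newData];
        trend = smoothFcn(window);        % step 4: repeat
    end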

Right now, we've selected kriging for smoothing. I have some code that runs, though I'm not entirely sure it's doing what I really want. The data for testing the code is fake, but one important anomaly is mimicked in it: a sharp spike at about the 1000-sample mark, lasting about 45 samples (see picture).
[Image: JustData_jtwv4u.jpg, a plot of the fake test data]

(The y-data is included as an attachment, if anyone would like to plug it into the code and run it.)
The question is: if you're taking in data at (almost) real time and want to ignore this or similar anomalies via computational and statistical methods, how do I maintain the trend from the first chunk of data and then continue on as smaller chunks are added? I have seen some other examples, but unfortunately nothing like this in MatLab that I can interpret. Here's the code I'm working with (thanks to the developer of KrigingP.m):

Code:
    %y= sorry, really long, included as txt link in post.
    x=0:1:length(y)-1; % sample number (one index per data point).
    sigma=2; % standard deviation, for use with the kriging function.
    
    N=150; % total number of samples in each interpolation.
    F=20; % number of samples added/discarded each time.
    k=1; % separate counter, so the loop start can move without breaking anything.
    m=0; % flag for conditional control (first window done or not).
    yTrend=zeros(size(x)); % preallocate the trend vector.
    for j=950:1:length(y) % run through the samples.
        if k==N+1 && m==0 % first full window of N samples.
            figure
            plot(x(j-N:j),y(j-N:j),'r-'); % plot data.
            hold on
            [Res]=KrigingP([x(j-N:j)' y(j-N:j)'],1,sigma,2,3); % calculate krig.
            yTrend(j-N:j)=Res(:,2);
            
            plot(x(j-N:j),yTrend(j-N:j),'b-'); % plot krig-ed interpolation.
            axis( [ x(j-N) x(j) 0 6 ] ); % for prediction: x(j+F+1)
            legend('Data','Kriging','Location','NorthEast');
            hold off
            m=m+1;
        elseif (mod(k,F)==0) && (j < length(y)-N) && m==1 % every F samples, while at least N samples remain, once the first window has run.
            figure % ('units','normalized','outerposition',[0 0 1 1])
            plot(x(j-N:j),y(j-N:j),'r-'); % plot data.
            hold on
            [Res]=KrigingP([x(j-N:j)' y(j-N:j)'],1,sigma,2,3); % krig the raw window.
            yTrend(j-N:j)=Res(:,2); % NOTE: this overwrites the stored trend in the overlap,
            [Res]=KrigingP([x(j-N:j)' yTrend(j-N:j)'],1,sigma,2,3); % so this second krig smooths the fresh first pass, not past history.
            yTrend(j-N:j)=Res(:,2);
            
            plot(x(j-N:j),yTrend(j-N:j),'b-'); % plot krig-ed interpolation.
            axis( [ x(j-N) x(j) 0 6 ] );
            legend('Data','Kriging','Location','NorthEast');
            hold off
        elseif j>=1200 % early stop for code testing (fewer figures made).
            %fprintf('break'); % for debugging.
            break
        else
            %fprintf('else, j= %d \n',j); % for debugging.
        end
        k=k+1;
    end

Here's an example output plot:
[Image: TrendWithAnomaly_eudpsf.jpg, data in red with the kriged trend in blue across the anomaly]

Later, this will be used to mathematically fit a function to the trend and plug in a prediction point some number of samples ahead (say 20-50), which is why it's important to travel across the anomaly with little influence on the trend.

More specifically: am I using an incorrect approach to handle the data under these conditions, and if so, could you please back that up with an explanation?

Thanks for any help you can provide (with statistical theory and/or coding).


 

Most data processing approaches have some scheme for rejecting obviously bad data. However, your simulated data does not really simulate a spike: it consists of multiple, locally monotonic data points, which looks more like aberrant behavior than a spike. Is it really a spike? It also has no noise, i.e., it's much smoother than the actual data. So, is the smooth behavior inherent in the anomalous data?
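
For instance, a typical rejection scheme looks something like the sketch below (the stand-in data, window length, and threshold are all arbitrary choices):

Code:
    % Illustrative rejection test: compare each new sample against a
    % robust estimate from the previous window, and hold the last
    % accepted value when the sample fails the test.
    y = 3 + 0.1*randn(1,2500); y(1000:1045) = 5;   % stand-in data with a 45-sample spike
    N = 150;                                       % reference window length
    yClean = y;
    for i = N+1:numel(y)
        ref = yClean(i-N:i-1);                     % previous (already screened) samples
        s = 1.4826*median(abs(ref - median(ref))); % robust sigma via the MAD
        if abs(y(i) - median(ref)) > 4*s           % ~4-sigma rejection test
            yClean(i) = yClean(i-1);               % hold the last good value
        end
    end
    % Caveat: a genuine, persistent level shift would be rejected forever
    % without some re-acceptance rule.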

TTFN
I can do absolutely anything. I'm an expert!
homework forum: //faq731-376 forum1529
 
@IRstuff Ah yes, that's a good point. When I generated that fake data, the "spike" didn't include the noise the other data did, though I'm not sure it would make much of a difference when running it through the trend. So, to answer your question, the smoothness of the spike is not necessarily part of the anomaly, though in the real situation there may well be less noise during that short time period anyway.
Also, as far as rejecting bad data goes, I would really like to just bundle that type of rejection into the forming of my trend, but that's the tricky part. Technically, that anomaly would be acceptable data, and the trend should be more influenced by it if that type of shape occurred for a longer time period. I hope you see what I mean.
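
The closest thing I can picture is a robust local regression, which down-weights points far from the local fit while the trend is being formed (just a sketch; smooth is in the Curve Fitting Toolbox, and the span is a guess):

Code:
    % Sketch: robust loess folds the rejection into the trend itself by
    % iteratively down-weighting points that sit far from the local fit.
    y = 3 + 0.1*randn(2500,1); y(1000:1045) = 5; % stand-in data with the spike
    span = 151;                                  % samples per local fit (a guess)
    yTrendRobust = smooth(y, span, 'rloess');    % robust quadratic loess
    % A ~45-sample anomaly inside a 151-sample span should get heavily
    % down-weighted, while a long-lasting excursion eventually dominates
    % its windows and pulls the trend, which is the behavior I described.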

"What you have there is a bad case of the Murphy's."
 
@IRstuff Yes, that's what I'm trying to do, but by tweaking the parameters of the trending function (or the trending method). With the sensors I have, technically all the data is true (or at least not false), but I am building this trend to create a predicted output. I just don't want my system to react to the anomaly, especially if the trend that the predictive model takes as its input throws the prediction to a value far off from the trend of the data toward the end of the data set.

For example, in the post with the output plot, imagine I fit that trend (blue line, kriging) to a mathematical function (like a polynomial) and then plugged in an x-value 50 samples ahead of the data. Already with this trending method, you can see that the prediction would be at least about 0.8 off of the data (data at y=3.2, trend at y=4.0).

I understand the method of creating rejection criteria for observing or logging data, but in this case I am using a system whose outputs react in very short time periods, so I don't have the luxury of processing data for long stretches. I am really asking, on a theoretical/mathematical basis, whether this method can be improved.
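
In code terms, the prediction step I mean looks roughly like this (illustrative values; x and yTrend are as produced by the loop in my first post, and j is some sample index where the trend is already filled):

Code:
    % Sketch of the fit-and-extrapolate step described above.
    N = 150; F = 50; n = 3;           % window, lookahead, polynomial order
    j = 1150;                         % some index where yTrend is filled
    xw = x(j-N:j);                    % last N+1 samples of the window
    yw = yTrend(j-N:j);
    p = polyfit(xw, yw, n);           % fit a polynomial to the trend
    yPred = polyval(p, x(j) + F);     % evaluate F samples ahead
    % A higher order n hugs the window but can swing far off once you
    % extrapolate, which is where offsets like the ~0.8 above come from.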

"What you have there is a bad case of the Murphy's."
 
I've been playing around with different methods, and I am starting to realize that for the prediction I may really want a linear fit, because that type of anomaly, once rejected, simply leaves data that fits a nice line. I may still want to store the smoothed result for evaluating other trends in the data, but that will be for later. Any thoughts from those more talented in the way of fitting or statistics? I imagine the knowledgeable would tell me I can use many methods; the very knowledgeable would tell me which ones I don't need to use. That's what I'm getting at here: what can I get away with in trying to jump from trended data to prediction? (This may sound "unsafe" to a lot of data analysts, but for what I'm doing, it's just fine.)

See the pictures for results using linear methods (including predictions this time). Also, note that my sensor choice has changed (more resolution and a higher range, 0-10 now); ignore the colored horizontal lines (they mean little in the way of the problem).

First picture: all the data, with two anomalies (a relatively abrupt disturbance at samples 1000-1050 and a relatively subtle disturbance at ~2200-2700).
[Image: 2016-02-27dataOnlyZerotoTen_yg06ab.jpg]

Second picture: figures of the data around the first anomaly, with the linear trend method in dark blue (sorry, labelled incorrectly as kriging here) and predictions (part 1).
[Image: 2016-02-27_first_anomaly_prediciton_figures_using_linear_methods_prt1_igr0fj.jpg]

Third picture: figures of the data around the first anomaly, with the linear trend method in dark blue (sorry, labelled incorrectly as kriging here) and predictions (part 2).
[Image: 2016-02-27_first_anomaly_prediciton_figures_using_linear_methods_prt2_ks2epi.jpg]

Fourth picture: figures of the data around the second anomaly, with the linear trend method in dark blue (sorry, labelled incorrectly as kriging here) and predictions.
[Image: 2016-02-27_second_anomaly_prediciton_figures_using_linear_methods_qgwvfe.jpg]


Any thoughts on the matter, still would be appreciated.


"What you have there is a bad case of the Murphy's."
 
There is nothing you can do to "tweak" simple algorithms against problems like this. One thing you could potentially play with, assuming timing allows (which you didn't specifically answer), is a 100+ sample median filter. This will still produce some humps, though greatly reduced ones; how small they get depends on the anomalous data being sufficiently short in duration.
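
In MatLab terms, something like this (medfilt1 is in the Signal Processing Toolbox, and the stand-in data just mimics your spike):

Code:
    % Running median: a 100+ sample window passes the slow trend but
    % largely ignores a ~45-sample excursion.
    y = 3 + 0.1*randn(1,2500); y(1000:1045) = 5; % stand-in data with the spike
    win = 101;                     % odd length for a centered median
    yMed = medfilt1(y, win);       % median-filtered data
    % An anomaly shorter than about win/2 never holds the majority of any
    % window, so it barely moves the output; longer anomalies leave the
    % residual humps mentioned above.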

TTFN
I can do absolutely anything. I'm an expert!
homework forum: //faq731-376 forum1529
 
Thanks for your input, I really appreciate it, but I'm not sure I agree. For example, I was using the polyfit function in MatLab, and I did indeed just tweak my code by passing a first-order parameter to polyfit (which is really just a linear model), and my prediction error improved.
That is the type of tweaking I'm speaking of. Also, someone else may have other clever methods of resolving these types of issues, and I would love to learn from anyone with experience or insight into the matter.
In the meantime, I'll continue being optimistic. [wink]

"...But he with a chuckle replied
That 'maybe it couldn’t,' but he would be one
Who wouldn’t say so till he’d tried..." -an excerpt from It Couldn’t Be Done by Edgar Albert Guest

"What you have there is a bad case of the Murphy's."
 