class RobustShell( IterativeFitter ) | Source |
---|
RobustShell tries to make a fit more robust in the presence of outliers.
It is a shell around another fitter. Technically it is in itself a a Fitter, but with limited possiblities.
A RobustShell tries to make a fit more robust in the presence of outliers. It does it by manipulating the weights: outliers are downweighted. "Normal" points keep their weights, more or less.
Apart from methods specific to the robustification, RobustShell has a fit method and little else from the Fitter family. Methods to get chi2, the covariance matrix, the evidence, the noise scale etc. should be taken from the embedded fitter.
The theory behind robust fitting can be found in Wikipedia: Robust Statistics, and the references therein.
To determine which points are "normal" and which are "outliers" we need a previous fit. The difference to this earlier fit determines the normalcy. Its amount is used to adjust the weights.
The fact that we need a previous iteration makes robust estimation a iterative procedure. Typically the procedure should stabilize in half a dozen steps maximally.
There are several schemes to adjust the weights. But before we go into that we need two values in each scheme. Firstly the amount of noise present in the data. By default the noise is taken from the previous fit via Fitter.scale. The second value needed is the size of the influence domain in terms of the noise scale.
For all schemes the deviant is calculated as the difference between data and fit divided by noisescale and domainsize
d = ( data - fit ) / ( noisescale * domainsize )
The domainsize is defined such that deviants upto 3 times the noisescale fall within the fwhm.
A number of weighting schemes are provided.
Kernel | domain | support | edges | comment |
---|---|---|---|---|
Biweight | 5.54 | bound | smooth | Tukey: Default kernel |
CosSquare | 6.00 | bound | smooth | |
Tricube | 5.08 | bound | smooth | |
Triweight | 6.60 | bound | smooth | |
Uniform | 3.00 | bound | hard | clip outside domain |
Cosine | 4.50 | bound | hard | |
Triangle | 6.00 | bound | hard | |
Parabola | 4.50 | bound | hard | |
Huber | 1.50 | unbound | smooth | In domain mean; out domain median |
Gauss | 2.12 | unbound | smooth | |
Lorentz | 3.00 | unbound | smooth |
Other schemes can be written by making another Kernel or writing a function
wgts = func( d )
where d is the deviant as above.
Attributes
- fitter : BaseFitter
The fitter to be used - kernel : Kernel or callable
Kernel take function from this kernel
callable in the form f(d), where d = ( data - mock ) / ( domain * scale ) - domain : float
domain of the kernel function - onesided : [-1,0,+1]
-1 apply to negative residuals
0 aplly to both sided (not onesided)
+1 apply to positive residuals.
Notes
Robust fitting is even more dangerous than ordinary fitting. Never trust what you get without thorough checking.
Example
model = PolynomialModel( 1 ) # some model
x = numpy.arange( 100, dtype=float ) / 100 # some x values
y = numpy.arange( 100, dtype=float ) / 4 # digitization noise
y[1,11,35,67] += 10 # create outliers
ftr = Fitter( x, model ) # a fitter, a model and a x
rob = RobustShell( ftr, domain=7 ) # robust fitter using someFtr
par = rob.fit( y ) # solution
print( rob ) #
print( rob.weights ) # print final weights
print( ftr.chisq ) # get from someFtr
Author Do Kester.
RobustShell( fitter, kernel=Biweight, domain=None, onesided=None, **kwargs ) |
---|
Create a new class, providing the fitter to be used.
Parameters
- fitter : BaseFitter
to be used - kernel : Kernel or callable
All Kernels have a methodresult( d )
which is applied to the deviants.
where d = ( data - model ) / ( domain * scale )
If kernel is a callable method it is assumed to be a similar result mathod. - domain : None or float
Width of the kernel.
None : automatic calculation of domain according to table in class doc.
float : overrides autocalculation. - onesided : None or "positive" or "p" or "negative" or "n"
None : apply robust weights to positive and negative residuals
"positive" : apply robust weights to positive residuals only
"negative" : apply robust weights to negative residuals only
setKernel( kernel ) |
---|
Parameters
- kernel : Kernel or callable
All Kernels have a methodresult( d )
which is applied to the deviants.
where d = ( data - model ) / ( domain * scale )
If kernel is a callable method it is assumed to be a similar result mathod.
Raises
ValueError when kernel is not recognized.
setOneSided( onesided ) |
---|
Parameters
- onesided : None or "positive" or "negative"
None : apply robust weights to positive and negative residuals
"positive" : apply robust weights to positive residuals only
"negative" : apply robust weights to negative residuals only
Raises
ValueError when onesided could not be interpreted.
fit( data, weights=None, kernel=None, domain=None, onesided=None, **kwargs ) |
---|
Parameters
- data : array_like
the data as they go into a fitter - kwargs : dict
keyword args to be passed to fitter.fit()
Methods inherited from IterativeFitter |
---|
- setParameters( params )
- doPlot( param, force=False )
- fitprolog( ydata, weights=None, accuracy=None, keep=None )
- report( verbose, param, chi, more=None, force=False )
Methods inherited from BaseFitter |
---|
- setMinimumScale( scale=0 )
- fitpostscript( ydata, plot=False )
- keepFixed( keep=None )
- insertParameters( fitpar, index=None, into=None )
- modelFit( ydata, weights=None, keep=None )
- limitsFit( ydata, weights=None, keep=None )
- checkNan( ydata, weights=None, accuracy=None )
- getVector( ydata, index=None )
- getHessian( params=None, weights=None, index=None )
- getInverseHessian( params=None, weights=None, index=None )
- getCovarianceMatrix( )
- makeVariance( scale=None )
- normalize( normdfdp, normdata, weight=1.0 )
- getDesign( params=None, xdata=None, index=None )
- chiSquared( ydata, params=None, weights=None )
- getStandardDeviations( )
- monteCarloError( xdata=None, monteCarlo=None)
- getScale( )
- getEvidence( limits=None, noiseLimits=None )
- getLogLikelihood( autoscale=False, var=1.0 )
- getLogZ( limits=None, noiseLimits=None )
- plotResult( xdata=None, ydata=None, model=None, residuals=True,