Subject: Re: statistics in Roberts. Was: RAW vs. raw image format
From: David Jones
Newsgroups: sci.stat.math
Organization: A noiseless patient Spider
Date: Mon, 20 Mar 2023 17:23 UTC
Path: eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: dajhawkxx@nowherel.com (David Jones)
Newsgroups: sci.stat.math
Subject: Re: statistics in Roberts. Was: RAW vs. raw image format
Date: Mon, 20 Mar 2023 17:23:53 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 203
Message-ID: <tva4r9$3kbgn$1@dont-email.me>
References: <f5a15ad4-4faf-440a-a59f-c5890d395961n@googlegroups.com> <20230219220058.8d3d14741e18cce1bf19e256@gmail.com> <51151e80-a719-46ef-8095-6535309e7d02n@googlegroups.com> <20230220003936.ca90df6f8848a095271a0cbe@gmail.com> <m35ybw2609.fsf@leonis4.robolove.meer.net> <tt3eil$183th$2@dont-email.me> <tt5fue$1iapr$1@dont-email.me> <20230223193132.41882edd1d9110b60e745dac@gmail.moc> <d7ufvhh40n67k40iqim6ikhnuil7luoavb@4ax.com> <20230225001353.60271597ed5a42bec16e8d54@gmail.moc> <0u3qvhlnu50kk3kg7e7jn6ujnene2fo8jk@4ax.com> <ttksrl$3jrcu$1@dont-email.me> <20230319004103.a1d8cad77b443543374dc671@gmail.moc> <tv5jp8$2ms1e$1@dont-email.me> <20230320010854.d766debddd20812faa887c04@gmail.moc> <tv8504$37218$1@dont-email.me> <20230320115849.90c21c2892f5496d42646f0f@g{oogle}mail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 20 Mar 2023 17:23:53 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="a3a4618c19882437bdc45fca4dc9eea3";
logging-data="3812887"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19COD0ZKn9Gl5C+hLHqtEPazRUYurUMTGk="
User-Agent: XanaNews/1.21-f3fb89f (x86; Portable ISpell)
Cancel-Lock: sha1:+iPBg1tYVvHiRkbwF49M+lm283s=

Anton Shepelev wrote:

> David Jones to Anton Shepelev:
> > > David Jones to Anton Shepelev:
> > >
> > > > > The data of a "run" is physically a time-series.
> > > >
> > > > unless you can be sure that doing so
> > > > (a) does not remove or mask effects you are looking for
> > > > (b) does not introduce effects of the kind you are
> > > > looking for.
> > >
> > > I am sure the serialisation in question does neither.
> >
> > The question will be: will anyone else be sure?
>
> Indeed, but it is hard to prove the absence of either loss
> you mention.
>
> If you, or anybody else, think that the serialisation of the
> turns of a run may introduce some distortion or make the
> signal otherwise less noticeable, then please share your
> specific concerns, that we may discuss whether they are
> justified.
>
> For my part, I can only repeat that each "run" represents
> twenty or more consecutive turns of the interferometer within
> a space of 15-20 minutes. It contains 20*16+1=321
> observations made over twenty "observation turns",
> occasionally interrupted by "adjustment turns", during which
> no observations were recorded. The data, therefore, is a
> physical time series with gaps. You can view them in this
> form in the seq_t directory in this archive:
>
> http://freeshell.de/~antonius/file_host/RobertsMillerData.7z
>
> Since the signal we seek is half-periodic in a turn,
> adjustment turns do not disrupt it (in any way that I can
> think of).
>
> > I think I see a common approach between you and Prof.
> > Roberts: "I think I see a problem, this is what I think
> > will solve the problem, this is what I have done,
> > therefore I have solved the problem"
>
> Please note, that I initiated discussion of the statistical
> analysis of the Miller experiments in this group,
> specifically because I needed your help and advice as expert
> statisticians. Mr. Roberts, on the other hand, professes no
> such desire...
>
> > > I thought that the spectral-significance (SigSpec)
> > > measure was made to answer such questions.
> >
> > You will need to get someone competent to check all the
> > assumptions involved.
>
> Can you help me first to identify those assumptions? That
> the signal sought is stationary and periodic in a half-turn
> is a fact. Noise is not periodic. The instrumental drift may
> be assumed to be aperiodic from looking at the measurements,
> but a specific physical or statistical justification is
> welcome. The key point is to determine whether it may pose
> as signal or not.
>
> > Let me expand on that. It seems that the "statistical
> > tests" are based on asymptotic properties/results that are
> > only valid if there is a stationary process to be
> > analysed. You agreed that the observed series looks non-
> > stationary. So the basic results cannot be used. However
> > the package might contain something to allow some version
> > to be applied.
>
> How does one determine whether the instrumental drift is a
> stationary process? What do you think can make that process
> non-stationary? The dominance of the basic linear drift
> during the entire run seems to indicate that it is
> stationary within the period of the run. After consulting
> the definition of a stationary process, I retract my
> previous statement to the contrary.
>
> The SigSpec program performs a multisine analysis of a time
> series, finding its most significant spectral components (in
> no way limited to multiples of a fundamental frequency),
> their respective significance, and the residual data. This
> should work as well if the signal is stationary and the
> error is not.
>
> > You may be hoping that a spectral analysis package will
> > provide all your answers, but recall that results of the
> > FFT are just a sophisticated version of regression
> > analysis, and you may be better off looking to that for a
> > way to proceed.... provided that you don't apply the parts
> > of the theory of regression that are not valid here.
>
> With FFT, we know our basis beforehand. With multisine, we
> do not, which makes it less "prejudiced" to what is sought.
> If a significant half-period component appear in multisine,
> it will indicate much more than such a component in the FFT,
> where it is mathematically bound to appear, as Mr. Roberts
> correctly observes. Thank you for the advice about
> regression. I will think how I can apply it to the data in a
> way different from that of Mr. Roberts. Basically,
>

<snip>

Obviously we can’t hope to deal with the whole of statistical theory
here. But we can look, in some simple cases, at the effects of dealing
or not dealing with pre-analysis data-manipulations within the data
analysis.

Even the most basic statistical work involves allowing for
within-analysis manipulations. For example the usual formula for the
estimated variance contains the divisor (n-1) instead of the divisor n,
and this can be considered to be an adjustment to take account of the
fact that you subtract-off the sample mean within the analysis.
Similarly, in regression, the sum-of-squares is divided by (n-p) to
take account of fitting a total of p parameters. In both cases the
adjustment is made to get an unbiased estimate of the variance.
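
The (n-1) adjustment can be checked with exact arithmetic. The following is only a minimal sketch, assuming n independent observations with a common mean and a common variance of 1; the function name is made up for illustration.

```python
from fractions import Fraction

def expected_sum_of_squares(n, variance=Fraction(1)):
    """Exact E[sum((x_i - xbar)^2)] for n independent observations that
    share one mean and one variance (an assumption of this sketch)."""
    # sum((x_i - xbar)^2) = sum(x_i^2) - n * xbar^2.  Measuring about the
    # common mean, E[sum(x_i^2)] = n * variance and E[xbar^2] = variance / n.
    return n * variance - n * (variance / n)

n = 5
ess = expected_sum_of_squares(n)
biased = ess / n          # expected value when dividing by n
unbiased = ess / (n - 1)  # expected value when dividing by n - 1
print(biased, unbiased)
```

Subtracting the sample mean uses up one degree of freedom, which is why only the second divisor returns the true variance on average.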

So, let’s consider some pre-analysis data manipulations. Let’s assume
you have two pairs of observations (X1,X2) and (Y1,Y2), with
statistical independence within and between pairs. Let the theoretical
mean of each observation in the first pair be M1, and let the
theoretical mean of each observation in the second pair be M2. Suppose
it is assumed the theoretical variance for each of the four
observations is the same, and consider two cases where this is either
known to be 1, or else it needs to be estimated. Then consider four
versions of analyses with different pre-analysis manipulations as
follows.

(a) Separate analysis. Here the data being analysed consists of the two
pairs (X1,X2) and (Y1,Y2). Then the sample means within each pair
provide unbiased estimates of the two values M1 and M2, and the
theoretical variance of each estimate is 1/2 if the variance of the
observations is assumed known at 1. If the variance of observations is
unknown, one could get and use two different estimates of that variance
from the sample variance applied within each pair. Each such estimate
would be unbiased.

(b) Separate analysis, but pooled. This is the same as for (a), above,
except that the variance of the observations is estimated by the
average of the sample variances from the two pairs. The theoretical
variances of the means remain the same as in (a), but one gets better
estimates of those variances. This is achieved by making use of an
assumed structure across the pairs (that the variances are the same).

(c) Subtraction of means. To yield a special case of what might be done
for longer series, suppose that a single dataset of 4 values
(Q1,Q2,Q3,Q4) is constructed from the two pairs by subtracting the two
sample means, giving
Q1=(X1-X2)/2, Q2=(X2-X1)/2, Q3=(Y1-Y2)/2, Q4=(Y2-Y1)/2
Obviously doing this prevents any estimation of the means M1 and M2.
Applying the usual formula to get a sample variance from (Q1,Q2,Q3,Q4)
gives an estimate that has a mean value of 2/3 when the true
observation variance is known to be 1. To get a good (unbiased)
estimate you have to know the structure of the pre-analysis data
manipulation that yielded the data-to-be-analysed (Q1,Q2,Q3,Q4). Here
that means dividing the sum of squares by 2 rather than 3, since two
means were subtracted; the result is exactly the pooled sample variance
from the original pairs as in (b). Thus, not all is necessarily lost in doing
pre-analysis data-manipulations, provided that the actual analysis
takes account of those manipulations.
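
The figures for case (c) can be reproduced exactly by writing each Q as a linear combination of the four original observations. This is a sketch under the stated assumptions (independent observations, each with variance 1); the helper names are hypothetical.

```python
from fractions import Fraction as F

def covariance(L):
    """Covariance matrix L L^T of linear combinations of independent,
    unit-variance observations; each row of L holds the coefficients."""
    return [[sum(F(a) * b for a, b in zip(r, s)) for s in L] for r in L]

def expected_centred_ss(cov):
    """E[sum((q_i - qbar)^2)] = trace(cov) - (sum of all entries)/n,
    valid when every q_i has the same theoretical mean."""
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    total = sum(sum(row) for row in cov)
    return trace - total / n

h = F(1, 2)
# Rows give Q1..Q4 in terms of (X1, X2, Y1, Y2).
L_c = [[h, -h, 0, 0],
       [-h, h, 0, 0],
       [0, 0, h, -h],
       [0, 0, -h, h]]

ess_c = expected_centred_ss(covariance(L_c))
usual_c = ess_c / 3   # usual formula, divisor n - 1 = 3
pooled_c = ess_c / 2  # divisor allowing for the two subtracted means
print(usual_c, pooled_c)
```

The usual divisor understates the variance, while the divisor that allows for the two subtracted means recovers it.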

(d) Joining of data. To emulate the data-joining of the paper and of
your proposed analysis, we can consider dealing with a revised dataset
(Z1,Z2,Z3), where
Z1=X1, Z2=X2, Z3=Y2+X2-Y1
Then the mean of each value is M1, and it is clear that M1 can be
estimated but not M2. One might use the sample mean of (Z1,Z2,Z3) to
estimate M1: this estimate has a theoretical variance of 7/9. Thus this
estimate is worse than the sample mean of just (Z1,Z2), which is the
same as the sample mean of (X1,X2), whose variance is 1/2. The usual
sample variance obtained from (Z1,Z2,Z3) has an expected value of 4/3
when the theoretical observation variance is 1 (the variances of
Z1, Z2, Z3 are 1, 1, 3, so the expected sum of squared deviations is
5 - 3*(7/9) = 8/3, which is then divided by 2). If the usual sample
variance obtained from (Z1,Z2,Z3) is used to estimate the variance of
the sample mean of (Z1,Z2,Z3), this would have an expected value of 4/9
rather than the true variance of this sample mean which is 7/9. So
here, if one ignores the way in which (Z1,Z2,Z3) were obtained and just
uses the usual sample estimates, we get an estimate for M1 which is
worse (in terms of variance) than what might have been obtained by just
using the one sample pair (X1,X2). Moreover the usual formula would
give estimated variances which are biased in either case of trying to
estimate the observation variance or the variance of the sample mean.
One might consider other estimates here, derived from (Z1,Z2,Z3), but
whether or not one looked for optimal estimates this would involve
taking into account the structure by which the dataset was created. To
summarise, poor performance will arise from any attempt to analyse the
constructed dataset without taking into account the details of how it
was constructed. In this example, the data-manipulation throws away any
ability to estimate an important property (M2) of one part of the original
dataset whereas retaining all the original data and the structure
therein allows everything to be estimated.
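
The same kind of exact calculation can be applied to the joined dataset of case (d), so that the quoted expectations can be checked rather than taken on trust. Again this is only a sketch assuming independent unit-variance observations; the function and variable names are invented for the example.

```python
from fractions import Fraction as F

def covariance(L):
    """Covariance matrix L L^T of linear combinations of independent,
    unit-variance observations; each row of L holds the coefficients."""
    return [[sum(F(a) * b for a, b in zip(r, s)) for s in L] for r in L]

# Rows give Z1, Z2, Z3 in terms of (X1, X2, Y1, Y2); Z3 = Y2 + X2 - Y1.
L_d = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 1, -1, 1]]

cov = covariance(L_d)
n = len(cov)
trace = sum(cov[i][i] for i in range(n))   # sum of Var(Z_i)
total = sum(sum(row) for row in cov)       # Var(Z1 + Z2 + Z3)

true_var_of_mean = total / n**2            # true variance of the sample mean
# All Z_i have mean M1, so E[sum((Z_i - Zbar)^2)] = trace - total / n.
expected_sample_var = (trace - total / n) / (n - 1)
naive_var_of_mean = expected_sample_var / n  # what the usual formula reports

print(true_var_of_mean, expected_sample_var, naive_var_of_mean)
```

Comparing the first and last printed values shows how far the usual formula understates the uncertainty of the mean when the joining step is ignored.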

So my conclusion is that you should not try to merge groups of data
into one supposedly-continuous time-series as you don’t have to do so.
It is possible to do a combined analysis of all groups without joining
them. Since there is just one pre-specified frequency there is no need
to do a spectral analysis. But, if you really wanted to do a spectral
analysis combining all groups without joining them together, this is
certainly possible ... you just have to understand the meaning of the
quantities produced in the analysis of a single series.
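
One possible shape for such a combined analysis is an ordinary least-squares fit in which every group keeps its own intercept and linear drift while a single cosine/sine pair at the known frequency is shared by all groups. The sketch below is not a recommendation for the actual data: the per-group linear drift, the data layout, the function name and the synthetic numbers are all assumptions made for illustration, with time measured in turns so that a half-turn period corresponds to a frequency of 2.

```python
import numpy as np

def fit_common_harmonic(groups, freq):
    """Least-squares fit of one shared sinusoid at a known frequency to
    several separate series, each with its own intercept and linear drift.

    groups: list of (t, y) array pairs, one per series (hypothetical layout).
    freq:   the pre-specified frequency, in cycles per unit of t.
    Returns (A, B), the shared cosine and sine coefficients.
    """
    n_groups = len(groups)
    blocks, ys = [], []
    for g, (t, y) in enumerate(groups):
        t = np.asarray(t, dtype=float)
        X = np.zeros((t.size, 2 * n_groups + 2))
        X[:, 2 * g] = 1.0                 # intercept for this series only
        X[:, 2 * g + 1] = t - t.mean()    # linear drift for this series only
        X[:, -2] = np.cos(2 * np.pi * freq * t)  # shared by all series
        X[:, -1] = np.sin(2 * np.pi * freq * t)  # shared by all series
        blocks.append(X)
        ys.append(np.asarray(y, dtype=float))
    coef, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(ys),
                               rcond=None)
    return coef[-2], coef[-1]

# Synthetic, noise-free check (made-up numbers, not the Miller data):
# two runs of 17 readings over one turn, with different offsets and drifts.
t = np.arange(17) / 16.0
signal = 0.1 * np.cos(4 * np.pi * t) - 0.2 * np.sin(4 * np.pi * t)
run1 = (t, 3.0 + 0.5 * t + signal)
run2 = (t, -1.0 - 0.3 * t + signal)
A, B = fit_common_harmonic([run1, run2], freq=2.0)
print(A, B)
```

Because the groups are stacked rather than spliced end to end, no group's level is adjusted to match its neighbour, and the residual degrees of freedom automatically account for every offset and drift term that was fitted.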
