Subject: Q confidence intervals for model parameters and future predictions
From: Cosine
Newsgroups: sci.stat.math
Date: Sun, 16 Apr 2023 22:06 UTC

Hi:

Often we want to build a model that makes predictions about a population. To do that, we draw a sample and then estimate the model's parameters according to some criterion, e.g., in the least-squares sense. Having the model, we can use it to predict future outcomes. However, since we are dealing with random variables, the estimated parameters are uncertain, i.e., their values would differ if we drew another sample to estimate them. Therefore, we need to determine confidence intervals for these parameters. For the same reason, the model's predictions of future outcomes also need such an interval (for a single future outcome, usually called a prediction interval).

We have explicit expressions for these confidence intervals when the model is fitted by linear least squares. The question is: how do we determine these confidence intervals when using a model other than linear least squares?
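
For reference, the explicit linear least-squares expressions can be sketched in Python as follows. This is a minimal sketch assuming i.i.d. Gaussian errors with constant variance; the function name ols_intervals and the simulated data are illustrative only. It returns t-based intervals for the coefficients, for the mean response at a new point x0, and a wider prediction interval for a single future outcome at x0.

```python
import numpy as np
from scipy import stats


def ols_intervals(X, y, x0, alpha=0.05):
    """Classical OLS intervals, assuming i.i.d. Gaussian errors with constant variance.

    X  : (n, p) design matrix (include a column of ones for an intercept)
    y  : (n,) responses
    x0 : (p,) design row for a new point
    Returns (beta_hat, coef_ci, mean_ci, pred_interval).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)              # unbiased residual variance
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)

    # Coefficient intervals: beta_j +/- t * s * sqrt([(X'X)^-1]_jj)
    se_beta = np.sqrt(s2 * np.diag(XtX_inv))
    coef_ci = np.column_stack([beta_hat - t_crit * se_beta,
                               beta_hat + t_crit * se_beta])

    y0_hat = x0 @ beta_hat
    h0 = x0 @ XtX_inv @ x0                    # x0' (X'X)^-1 x0
    se_mean = np.sqrt(s2 * h0)                # uncertainty of the fitted mean only
    se_pred = np.sqrt(s2 * (1 + h0))          # adds the noise of one new outcome
    mean_ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pred_interval = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
    return beta_hat, coef_ci, mean_ci, pred_interval


# Simulated straight-line data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=50)
beta_hat, coef_ci, mean_ci, pred_interval = ols_intervals(X, y, np.array([1.0, 5.0]))
```

The extra "1 +" inside se_pred is what makes the interval for a future outcome wider than the interval for the fitted mean.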

Subject: Re: Q confidence intervals for model parameters and future predictions
From: David Jones
Newsgroups: sci.stat.math
Organization: A noiseless patient Spider
Date: Mon, 17 Apr 2023 17:42 UTC
References: 1

Cosine wrote:

> Hi:
>
> Often we want to build a model to predict the population. To do
> that, we need to draw a set of samples and then determine the
> parameters of the model in some sense, e.g., least-squares sense.
> Having the model, we could use it to predict future outcomes.
> However, as we are dealing with random variables, the obtained model
> parameters have uncertainty, i.e., their values would be different
> when we draw another set of samples to determine them. Therefore, we
> need to determine the confidence intervals of these parameters. Due
> to the same reason, the future outcome of the model also needs such a
> confidence interval.
>
> We have explicit expressions for these confidence intervals when we
> use the linear least-squares model. The question is, how do we
> determine these confidence intervals when using a model other than
> the linear least-squares?

The question is answered by the theory of maximum likelihood. You might
find the details already worked out for some specific models. In
particular, see https://en.wikipedia.org/wiki/Generalized_linear_model
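
As a concrete instance of the maximum-likelihood route, here is a minimal sketch for one GLM, Poisson regression with a log link, chosen purely as an example. The parameter intervals are Wald intervals, beta_hat +/- z * SE with SE taken from the inverse Fisher information at the MLE, and the interval for the mean at a new point is formed on the linear-predictor scale and then exponentiated. The function names and simulated data are hypothetical; in practice a GLM routine from a statistics package would do the fitting.

```python
import numpy as np
from statistics import NormalDist


def poisson_glm_wald(X, y, alpha=0.05, n_iter=50, tol=1e-10):
    """Poisson regression with a log link, fitted by Newton-Raphson.

    For the canonical log link, Newton-Raphson coincides with Fisher scoring
    (IRLS). Wald intervals use the inverse Fisher information at the MLE.
    """
    # Start near the answer: least squares on log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                # gradient of the log-likelihood
        info = X.T @ (mu[:, None] * X)        # Fisher information X' W X, W = diag(mu)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))  # asymptotic covariance of beta_hat
    se = np.sqrt(np.diag(cov))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = np.column_stack([beta - z * se, beta + z * se])
    return beta, cov, ci


def mean_interval(x0, beta, cov, alpha=0.05):
    """Wald interval for exp(x0' beta): built on the linear predictor, then exponentiated."""
    eta = x0 @ beta
    se_eta = np.sqrt(x0 @ cov @ x0)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return np.exp(eta - z * se_eta), np.exp(eta + z * se_eta)


# Simulated count data (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=200)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))
beta, cov, ci = poisson_glm_wald(X, y)
lo, hi = mean_interval(np.array([1.0, 1.0]), beta, cov)
```

Note that (lo, hi) covers the mean count at x = 1, not a single future count; an interval for a new observation would also have to include the Poisson variation. Wald intervals rely on approximate normality of the MLE; profile-likelihood intervals are a common alternative when that is doubtful.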

Subject: Re: Q confidence intervals for model parameters and future predictions
From: Cosine
Newsgroups: sci.stat.math
Date: Tue, 18 Apr 2023 03:45 UTC
References: 1

What if we use cross-validation, e.g., the k-fold method?

Then we would have k estimates of each parameter and of the predicted value.

We could then calculate the sample mean and standard error of each to build the corresponding confidence interval.

However, this requires assuming that the parameter estimates and predicted values are normally or Student-t distributed.
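
A minimal sketch of this proposal for a straight-line model, with hypothetical function names and simulated data: fit on each of the k training sets, then form mean +/- t * s / sqrt(k) from the k estimates. One caution, separate from the normality assumption: any two of the k training sets share a fraction (k - 2)/k of the observations, so the k estimates are positively correlated rather than independent, and s / sqrt(k) will typically understate the sampling uncertainty of a full-data estimate.

```python
import numpy as np
from scipy import stats


def kfold_estimates(x, y, k=5, seed=0):
    """Fit a straight line on each of the k training sets (all folds but one)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    estimates = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        slope, intercept = np.polyfit(x[train], y[train], deg=1)
        estimates.append((intercept, slope))
    return np.array(estimates)                 # shape (k, 2): intercept, slope


def naive_t_interval(values, alpha=0.05):
    """Mean +/- t * s / sqrt(k), treating the k values as independent (they are not)."""
    k = len(values)
    m = values.mean()
    se = values.std(ddof=1) / np.sqrt(k)
    t_crit = stats.t.ppf(1 - alpha / 2, df=k - 1)
    return m - t_crit * se, m + t_crit * se


# Simulated straight-line data (illustrative only).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=100)
est = kfold_estimates(x, y, k=5)
slope_lo, slope_hi = naive_t_interval(est[:, 1])
```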

Subject: Re: Q confidence intervals for model parameters and future predictions
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Tue, 18 Apr 2023 04:54 UTC
References: 1 2

On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

>
>What if we use the method of cross-validation, e.g., the k-fold method?
>
>Then we will have k sample values for each of the parameters and the predicted value.
>
>We could then calculate the sample mean and standard error for each of them to build the corresponding confidence interval.
>
>However, this requires the assumption that the parameter and predicted value are normal distributions or student distributions.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/

Here is a long article from a generally good site, discussing the
authors' own proposal and earlier ones. They combine k-fold with the
bootstrap, and aim to remove the biases in parameter estimates (and
their errors) that are inherent in simple applications of k-fold or
the bootstrap.

In the early part of it that I read, it does mention CIs as a
product.
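
For comparison, a plain pairs (case-resampling) bootstrap with percentile intervals can be sketched as below. This is not the combined k-fold/bootstrap method of the linked article, just the simple bootstrap whose biases it aims to remove; the function name, number of replicates, and simulated data are assumptions for illustration.

```python
import numpy as np


def bootstrap_percentile(x, y, x0, n_boot=2000, alpha=0.05, seed=0):
    """Pairs bootstrap for a straight-line fit.

    Returns percentile intervals for the slope and for the fitted mean at x0.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample (x, y) pairs with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        slopes[b] = slope
        preds[b] = intercept + slope * x0
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return np.percentile(slopes, q), np.percentile(preds, q)


# Simulated straight-line data (illustrative only).
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=100)
slope_ci, pred_ci = bootstrap_percentile(x, y, x0=5.0)
```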

--
Rich Ulrich

Subject: Re: Q confidence intervals for model parameters and future predictions
From: David Jones
Newsgroups: sci.stat.math
Organization: A noiseless patient Spider
Date: Tue, 18 Apr 2023 08:33 UTC
References: 1 2 3

Rich Ulrich wrote:

> On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <asecant@gmail.com>
> wrote:
>
> >
> > What if we use the method of cross-validation, e.g., the k-fold
> > method?
> >
> > Then we will have k sample values for each of the parameters and
> > the predicted value.
> >
> > We could then calculate the sample mean and standard error for each
> > of them to build the corresponding confidence interval.
> >
> > However, this requires the assumption that the parameter and
> > predicted value are normal distributions or student distributions.
>
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/
>
> Here is a long article from a generally good site, discussing their
> own proposal and earlier ones. They are using k-fold plus bootstrap,
> and intend to remove the biases for parameter-estimates (and their
> errors) inherent in the simple applications of k-fold or bootstrap.
>
> In the early fraction of it that I read, it does mention CIs as
> product.

Some of the ideas here relate to the now-old idea of balanced
bootstrapping: see
https://mathweb.ucsd.edu/~ronspubs/90_09_bootstrap.pdf
for example.
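
The simplest (first-order) balanced bootstrap can be sketched as follows: concatenate B copies of the indices 0..n-1, shuffle, and cut the result into B resamples of size n, so every observation is used exactly B times in total. This is a minimal sketch with a hypothetical function name and simulated data; it is not necessarily the scheme in the linked paper. One consequence is that the average of the B bootstrap sample means equals the original sample mean exactly.

```python
import numpy as np


def balanced_bootstrap_indices(n, n_boot, seed=0):
    """Balanced bootstrap: every index 0..n-1 appears exactly n_boot times
    across the n_boot resamples (concatenate copies, permute, cut)."""
    rng = np.random.default_rng(seed)
    pool = np.tile(np.arange(n), n_boot)
    rng.shuffle(pool)                          # in-place random permutation
    return pool.reshape(n_boot, n)             # one resample per row


# Skewed data (illustrative only).
rng = np.random.default_rng(5)
data = rng.exponential(size=40)
idx = balanced_bootstrap_indices(len(data), n_boot=1000)
boot_means = data[idx].mean(axis=1)
ci = np.percentile(boot_means, [2.5, 97.5])
```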

I have seen early work on cross-validation for model selection in
multiple regression where a typical suggestion was to leave out 20%
of the samples at a time, but that may relate to the context: the
overall sample size, and having data that are not from designed
experiments.
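
Leaving out 20% of the observations at a time corresponds to 5-fold cross-validation. A minimal sketch of using it to compare candidate predictor subsets in a multiple regression follows; the candidate subsets, function name, and simulated data are hypothetical. Reusing the same seed gives every candidate the same folds, so the comparison is paired.

```python
import numpy as np


def cv_mse(X, y, k=5, seed=0):
    """k-fold CV estimate of mean squared prediction error for OLS on X.
    With k=5, each fold leaves out 20% of the observations."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = np.empty(len(y))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err[held_out] = (y[held_out] - X[held_out] @ beta) ** 2
    return sq_err.mean()


# Simulated data: only the first two of four predictors matter (illustrative only).
rng = np.random.default_rng(3)
n = 120
Z = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(size=n)
ones = np.ones((n, 1))
candidates = {
    "x1": np.hstack([ones, Z[:, [0]]]),
    "x1+x2": np.hstack([ones, Z[:, [0, 1]]]),
    "x1..x4": np.hstack([ones, Z]),
}
scores = {name: cv_mse(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)
```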

But the joint themes of "balance" and "designed experiments" raise
the question of whether any of the considerations of partially-balanced
factorial designs can be employed or extended to give a scheme for
dividing the data into slices to be treated as units in
cross-validation or some other analysis.

The OP says "However, this requires the assumption that the parameter
and predicted value are normal distributions or student distributions."
This may indicate that the plan would be to do multiple analyses on
small sections of the data, in contrast to doing multiple analyses on
nearly-complete versions of the data where only a small part is
left-out each time. The possible benefits of either approach would
depend on what is being attempted. In theory, if all the usual
assumptions apply, the best answers come from a single analysis of the
complete dataset. That one contemplates doing something else suggests
that there are worries about the assumptions: not having a fixed model
in mind, not having Gaussian random errors, or not having independence
between observations.
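
Doing multiple analyses on nearly complete versions of the data, each time leaving out a small part, is the idea behind the jackknife. A minimal delete-one sketch for the slope of a straight-line fit is below, with a hypothetical function name and simulated data; when the usual assumptions hold, its standard error is typically close to the classical OLS one.

```python
import numpy as np


def jackknife_se(x, y):
    """Delete-one jackknife standard error of the least-squares slope."""
    n = len(x)
    leave_one_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # drop observation i
        leave_one_out[i] = np.polyfit(x[keep], y[keep], deg=1)[0]
    mean_loo = leave_one_out.mean()
    return np.sqrt((n - 1) / n * np.sum((leave_one_out - mean_loo) ** 2))


# Simulated data satisfying the usual assumptions (illustrative only).
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=60)
jk_se = jackknife_se(x, y)

# Classical OLS slope SE for comparison: s / sqrt(sum((x - xbar)^2)).
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)
ols_se = np.sqrt(resid @ resid / (len(x) - 2) / np.sum((x - x.mean()) ** 2))
```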

Subject: Re: Q confidence intervals for model parameters and future predictions
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Tue, 18 Apr 2023 22:13 UTC
References: 1 2 3 4

On Tue, 18 Apr 2023 08:33:45 -0000 (UTC), "David Jones"
<dajhawkxx@nowherel.com> wrote:

> In theory, if all the usual
>assumptions apply, the best answers come from a single analysis of the
>complete dataset. That one contemplates doing something else suggests
>that there are worries about the assumptions: not having a fixed model
>in mind, not having Gaussian random errors, or not having independence
>between observations.

Nicely put.

"All the usual assumptions" must include having the proper
model, scales of measurement, and suitable sample.

--
Rich Ulrich
