Rocksolid Light


Subject (Author)
* Q differences between these two ways of comparisons (Cosine)
`* Re: Q differences between these two ways of comparisons (Rich Ulrich)
 `* Re: Q differences between these two ways of comparisons (Cosine)
  `* Re: Q differences between these two ways of comparisons (Rich Ulrich)
   `* Re: Q differences between these two ways of comparisons (Cosine)
    `* Re: Q differences between these two ways of comparisons (Rich Ulrich)
     `* Re: Q differences between these two ways of comparisons (Cosine)
      `- Re: Q differences between these two ways of comparisons (Rich Ulrich)

Subject: Q differences between these two ways of comparisons
From: Cosine
Newsgroups: sci.stat.math
Date: Sat, 12 Aug 2023 01:28 UTC

Hi:

Suppose we have five algorithms: A, B, C, D, and E, and we did the following two kinds of performance comparison, where each comparison compares two algorithms' values of a given performance metric, M.

Kind-1:

M_A > M_B, M_A > M_C, M_A > M_D, and M_A > M_E

Then we claim that A performs better than the other four algorithms.

Kind-2:

M_A > M_B, M_A > M_C, M_A > M_D, M_A > M_E,
M_B > M_C, M_B > M_D, M_B > M_E,
M_C > M_D, M_C > M_E, and
M_D > M_E

Then, we claim that A performs best among all five algorithms.
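The two kinds of claims can be checked mechanically. A minimal sketch, using made-up metric values (the numbers are assumptions, not from the thread):

```python
from itertools import combinations

# Hypothetical metric values for the five algorithms (made-up numbers).
M = {"A": 0.91, "B": 0.88, "C": 0.85, "D": 0.82, "E": 0.79}

# Kind-1: A beats each of the other four algorithms on M.
kind1 = all(M["A"] > M[x] for x in "BCDE")

# Kind-2: every one of the C(5,2) = 10 pairwise comparisons holds,
# giving a strict ordering of all five algorithms on M.
order = sorted(M, key=M.get, reverse=True)
kind2 = all(M[a] > M[b] for a, b in combinations(order, 2))

print(kind1, kind2, order)
```

Kind-2 implies Kind-1 whenever A tops the ordering, which is the difference between "A beat the rest" and "the five are strictly ordered."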

Subject: Re: Q differences between these two ways of comparisons
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Sat, 12 Aug 2023 04:09 UTC
References: 1

On Fri, 11 Aug 2023 18:28:50 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

>Hi:
>
> Suppose we have 5 algorithms: A, B, C, D, and E, and we did the following two kinds of performance comparison. The performance comparison is to compare the two algorithms' values of a given performance metric, M.
>
>Kind-1:
>
> M_A > M_B, M_A > M_C, M_A >M_D, and M_A >M_E
>
> Then we claim that A performs better than all the rest 4 algorithms.

It seems that you are describing the RESULT of a set
of comparisons. The two 'kinds' would be A versus each of
the others, and all pairwise comparisons among them.

You should say, "on these test data" and "better on M than ..."
and "performed" (past tense).

>
>Kind-2:
>
> M_A > M_B, M_A > M_C, M_A > M_D, M_A > M_E,
> M_B > M_C, M_B > M_D, M_B > M_E,
> M_C > M_D, M_C > M_E, and
> M_D > M_E
>
> Then, we claim that A performs best among all the 5 algorithms.
>

I would state that A performed better (on M) than the rest, and also
the rest were strictly ordered in how well they performed.

--
Rich Ulrich

Subject: Re: Q differences between these two ways of comparisons
From: Cosine
Newsgroups: sci.stat.math
Date: Sat, 12 Aug 2023 06:31 UTC
References: 1 2

Rich Ulrich wrote on Saturday, August 12, 2023 at 12:10:06 PM [UTC+8]:
> On Fri, 11 Aug 2023 18:28:50 -0700 (PDT), Cosine
> wrote:
> >Hi:
> >
> > Suppose we have 5 algorithms: A, B, C, D, and E, and we did the following two kinds of performance comparison. The performance comparison is to compare the two algorithms' values of a given performance metric, M.
> >
> >Kind-1:
> >
> > M_A > M_B, M_A > M_C, M_A >M_D, and M_A >M_E
> >
> > Then we claim that A performs better than all the rest 4 algorithms.
> It seems that you are describing the RESULT of a set
> of comparisons. The two 'kinds' would be, A versus each other,
> and "all comparisons among them."
>
> You should say, "on these test data" and "better on M than ..."
> and "performed" (past tense).
> >
> >Kind-2:
> >
> > M_A > M_B, M_A > M_C, M_A > M_D, M_A > M_E,
> > M_B > M_C, M_B > M_D, M_B > M_E,
> > M_C > M_D, M_C > M_E, and
> > M_D > M_E
> >
> > Then, we claim that A performs best among all the 5 algorithms.
> >
> I would state that A performed better (on M) than the rest, and also
> the rest were strictly ordered in how well they performed.
>
> --
> Rich Ulrich

In other words, if the purpose is only to demonstrate that A performed better on M than the other four algorithms,
we only need to do the first kind of comparison. We do the second kind only if we want to demonstrate the ordering.

By the way, it seems that to reach the desired conclusion, both kinds of comparison require doing multiple comparisons.

The first kind requires 4 (= 5 - 1) and the second requires C(5,2) = 10.

Therefore, if we use a Bonferroni correction, the significance level will be corrected to alpha/(n-1) and alpha/C(n,2), respectively.

If we use more than one metric, say M_1 to M_m, then we need to further divide the previous alphas by m, right?

But wouldn't the corrected alpha value become too small, especially for larger values of n and m?
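The arithmetic above can be sketched directly. This assumes n = 5 algorithms, a nominal alpha of 0.05, and m = 7 metrics (the metric count is a hypothetical, not from the thread):

```python
from math import comb

alpha = 0.05   # nominal significance level (assumed)
n = 5          # number of algorithms
m = 7          # number of metrics (hypothetical)

# Kind-1: n - 1 comparisons of A against each other algorithm.
alpha_kind1 = alpha / (n - 1)

# Kind-2: all C(n, 2) pairwise comparisons.
alpha_kind2 = alpha / comb(n, 2)

# With m metrics, divide again by m, as the post suggests.
alpha_kind2_m = alpha_kind2 / m

print(alpha_kind1, alpha_kind2, alpha_kind2_m)
```

With n = 5 and m = 7, the fully corrected per-test alpha for Kind-2 is 0.05/70, which illustrates the "too small" worry in the post.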

Subject: Re: Q differences between these two ways of comparisons
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Sat, 12 Aug 2023 19:24 UTC
References: 1 2 3

On Fri, 11 Aug 2023 23:31:41 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

>Rich Ulrich wrote on Saturday, August 12, 2023 at 12:10:06 PM [UTC+8]:
>> On Fri, 11 Aug 2023 18:28:50 -0700 (PDT), Cosine
>> wrote:
>> >Hi:
>> >
>> > Suppose we have 5 algorithms: A, B, C, D, and E, and we did the following two kinds of performance comparison. The performance comparison is to compare the two algorithms' values of a given performance metric, M.
>> >
>> >Kind-1:
>> >
>> > M_A > M_B, M_A > M_C, M_A >M_D, and M_A >M_E
>> >
>> > Then we claim that A performs better than all the rest 4 algorithms.
>> It seems that you are describing the RESULT of a set
>> of comparisons. The two 'kinds' would be, A versus each other,
>> and "all comparisons among them."
>>
>> You should say, "on these test data" and "better on M than ..."
>> and "performed" (past tense).
>> >
>> >Kind-2:
>> >
>> > M_A > M_B, M_A > M_C, M_A > M_D, M_A > M_E,
>> > M_B > M_C, M_B > M_D, M_B > M_E,
>> > M_C > M_D, M_C > M_E, and
>> > M_D > M_E
>> >
>> > Then, we claim that A performs best among all the 5 algorithms.
>> >
>> I would state that A performed better (on M) than the rest, and also
>> the rest were strictly ordered in how well they performed.
>>
>> --
>> Rich Ulrich
>
>In other words, if the purpose is only to demonstrate that A performed better on M than the rest 4 algorithms,
>we only need to do the first kind of comparison. We do the second kind only if we want to demonstrate the ordering.
>
> By the way. it seems that to reach the desired conclusion, both kinds of comparison require doing multiple comparisons.
>
>The first kind requires 4 ( = 5-1 ) and the second requires C(5,2) = 10.

Before you take on 'multiple comparisons' and p-levels, you ought
to have a Decision to be made, or a question: What do you have
here? Making a statement about what happens to fit the sample
best does not require assumptions; drawing inferences to elsewhere
does require assumptions.

Who or what does your sample /represent/? Where do the algorithms
come from? (and how do they differ?). What are you hoping to
generalize to?

I can imagine that your second set of results could be a summary
of step-wise regression, where Metric is the R-squared and A is
the result after multiple steps. Each step shows an increase in
R-squared, by definition. Ta-da!

The hazards of step-wise regression are well-advertised by now.
I repeated Frank Harrell's commentary multiple times in the stats
Usenet groups, and others picked it up. I can add: When there
are dozens of candidate variables to Enter, each step is apt to
provide a WORSE algorithm when applied to a separate sample for
validation. Sensible algorithms usually require the application of
good sense by the developers -- instead of over-capitalizing on
chance in a model built on limited data.

If you have huge data, then you should also pay attention to
robustness and generalizability across sub-populations, rather
than focus on p-levels for the whole shebang.

>
>Therefore, if we use Bonferroni correction, the significant level will be corrected to alpha/(n-1) and alpha/C(n,2), respectively.

In my experience, I talked people out of corrections many times
by cleaning up their questions. Bonferroni fits best when you
have /independent/ questions of equal priority. And when you
have a reason to pay heed to family-wise error.

>
>If we use more than one metric, e.g., M_1, to M_m, then we need to further divide the previous alphas by m, right?
>
>But wouldn't the corrected alpha value be too small, especially when we have certain numbers of n and m?

If you don't have any idea what you are looking for, one common
procedure is to proclaim the effort 'exploratory' and report
the nominal levels.

--
Rich Ulrich

Subject: Re: Q differences between these two ways of comparisons
From: Cosine
Newsgroups: sci.stat.math
Date: Sat, 12 Aug 2023 23:03 UTC
References: 1 2 3 4

Hmm, let's start by asking or clarifying the research questions then.

Many machine learning papers I read often use a set of metrics to show that the developed algorithm performs best, compared to a set of benchmarks.

Typically, the authors list metrics like accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), recall, F1-score, Dice score, etc.

Next, the authors list 4-6 published algorithms as benchmarks. These algorithms have similar designs and are designed for the same purpose as the developed one, e.g., segmentation, classification, or detection/diagnosis.

Then the authors run the developed algorithm and the benchmarks on the same dataset to get the values of each of the metrics listed.

Next, the authors conduct the statistical analysis by comparing the values of the metrics to demonstrate that the developed algorithm is the best, and sometimes to establish the rank of the algorithms (the developed one and all the benchmarks).

Finally, the authors pick out the results showing favorable comparisons and claim these as the contribution(s) of the developed algorithm.

It looks to me as if the authors are doing statistical tests comparing multiple algorithms on multiple metrics to conclude the final (single or multiple) contribution(s) of the developed algorithm.

Subject: Re: Q differences between these two ways of comparisons
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Mon, 14 Aug 2023 23:03 UTC
References: 1 2 3 4 5

On Sat, 12 Aug 2023 16:03:52 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

>Hmm, let's start by asking or clarifying the research questions then.
>
>Many machine learning papers I read often used a set fo metrics to show that the developed algorithm runs the best, compared to a set of benchmarks.
>
>Typically, the authors list the metrics like accuracy, sensitivity, specificity, the area under the receiver operating characteristic (AUC) curve, recall, F1-score, and Dice score, etc.
>
>Next, the authors list 4-6 published algorithms as benchmarks. These algorithms have similar designs and are designed for the same purpose as the developed one, e.g., segmentation, classification, and detection/diagnosis.

Okay. You are outside the scope of what I have read.
Whatever I read about machine learning, decades ago, was
far more primitive or preliminary than this. I can offer a note
or two on 'reading' such papers.

>
>Then the authors run the developed algorithm and the benchmarks using the same dataset to get the values of each of the metrics listed.
>
>Next, the authors conduct the statistical analysis y comparing the values of the metrics to demonstrate that the developed algorithm is the best, and sometimes, the rank of the algorithms (the developed one and all the benchmarks.)

Did the statistics include p-values?

The comparison I can think of is the demonstrations I have
seen of 'statistical tests' offered for consideration. That is,
authors compare (say, too simplistically) Student's t-test to
a t-test for unequal variances, or to a t on rank-orders.

Here, everyone can inspect the tests and imagine when they
will differ; randomized samples are created which feature various
aspects of non-normality, for various matches of Ns. What is
known is that the tests will differ -- a 5% test does not 'reject'
2.5% at each end, when computed on 10,000 generated samples,
when its assumptions are intentionally violated.

What is interesting is how MUCH they differ, and how much more
they differ for smaller N or for smaller alpha.
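A small simulation in the spirit of the paragraphs above. The setup (unequal Ns, unequal variances, 10,000 generated null samples) is an assumed illustration, not data from the thread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n1, n2 = 10, 40          # unequal group sizes (assumed)
sd1, sd2 = 3.0, 1.0      # unequal standard deviations (assumed)
reps = 10_000
reject_student = reject_welch = 0

for _ in range(reps):
    # Both groups share the same mean, so every rejection is a false positive.
    x = rng.normal(0, sd1, n1)
    y = rng.normal(0, sd2, n2)
    if stats.ttest_ind(x, y, equal_var=True).pvalue < 0.05:
        reject_student += 1            # pooled-variance Student's t
    if stats.ttest_ind(x, y, equal_var=False).pvalue < 0.05:
        reject_welch += 1              # Welch's t for unequal variances

# With the small group having the large variance, Student's t rejects far
# more than the nominal 5%, while Welch's version stays near 5%.
print(reject_student / reps, reject_welch / reps)
```

Shrinking N or alpha widens the gap between the two tests, which is the "how MUCH they differ" point.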

>
>Finally, the authors pick up those results showing favorable comparisons and claim these as the contribution(s) of the developed algorithm.
>
> This looks to me that the authors are doing the statistical tests by comparing multiple algorithms with multiple metrics to conclude the final (single or multiple) contribution(s) of the developed algorithm.

So, what I know (above) won't apply if you have to treat the
algorithms as 'black-box' operations -- you can't predict when
an algorithm will perform its best.

I think I would be concerned about the generality of the test
bank, and the legitimacy/credibility of the authors.

I can readily imagine a situation like with the 'meta-analyses'
that I read in the 1990s: You need a good statistician and a
good subject-area scientist to create a good meta-analysis, and
most of the ones I read had neither.

--
Rich Ulrich

Subject: Re: Q differences between these two ways of comparisons
From: Cosine
Newsgroups: sci.stat.math
Date: Tue, 15 Aug 2023 15:10 UTC
References: 1 2 3 4 5 6

Well, let's consider a more classical problem.

Regarding English teaching methods for high school students, suppose we develop a new method (A1) and want to demonstrate that it performs better than other methods (A2, A3, and A4) by comparing the average scores of experimental classes using the different methods. Each comparison uses a paired t-test. Since each comparison is independent of the others, the corrected significance level using the Bonferroni correction is alpha_original/(4 - 1).

Suppose we want to investigate whether the developed method (A1) is better than the other methods (A2, A3, and A4) for English, Spanish, and German; then the corrected alpha = alpha_original/(4 - 1)/3.
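The single-language setup might be sketched as follows. All scores are synthetic, and the choice of 12 paired units (e.g. teachers each teaching two methods, as discussed later in the thread) is an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
k = 3                          # comparisons: A1 vs A2, A1 vs A3, A1 vs A4
alpha_corrected = alpha / k    # Bonferroni: alpha_original / (4 - 1)

# Synthetic paired scores for 12 units, one array per method.
a1 = rng.normal(75, 5, 12) + 4                       # assumed: A1 really is better
others = [rng.normal(75, 5, 12) for _ in range(k)]   # A2, A3, A4

pvals = []
for i, a in enumerate(others, start=2):
    stat, p = stats.ttest_rel(a1, a)   # paired t-test, as in the post
    pvals.append(p)
    print(f"A1 vs A{i}: p = {p:.4f}, "
          f"significant at corrected alpha: {p < alpha_corrected}")
```

Each p-value is judged against alpha/3 rather than alpha, which is exactly the correction described above.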

Subject: Re: Q differences between these two ways of comparisons
From: Rich Ulrich
Newsgroups: sci.stat.math
Date: Sat, 19 Aug 2023 04:25 UTC
References: 1 2 3 4 5 6 7

On Tue, 15 Aug 2023 08:10:14 -0700 (PDT), Cosine <asecant@gmail.com>
wrote:

>Well, let's consider a more classical problem.
>
> Regarding the English teaching method for high school students, we
> develop a new method (A1) and want to demonstrate if it performs
> better than other methods (A2, A3, and A4) by comparing the average
> scores of the experimental class using different methods. Each
> comparison uses paired t-test. Since each comparison is independent of
> the other, the correct significance level using the Bonferroni test is
> alpha_original/( 4-1 ).

It took me a bit to figure out how this was a classical problem,
especially with paired t-tests -- I've never read that literature
in particular. 'Paired' on individuals does not work because you
can't teach the same material to the same student in two ways
from the same starting point.

Maybe I got it. 'Teachers' account for so much variance in
learning that the same teacher needs to teach two methods
to two different classes. 'Teachers' are the units of analyses,
comparing success for pairs of methods.

Doing this would be similar to what I've read a little more about,
testing two methods of clinical intervention. What also seems
similar for both is that the PI wants to know that the teacher/
clinician can and will properly administer the Method without too
much contamination.

>
> Suppose we want to investigate if the developed method (A1) is
> better than other methods (A2. A3. and A4) for English, Spanish, and
> German, then the correct alpha = alpha_original/( 4-1 )/3.

From my own consulting world, 'power of analysis' was always
a major concern. So I must mention that there is a very good
reason that studies usually compare only TWO methods if they
want a firm answer: More than two comparisons will require
larger Ns for the same power, and funding agencies (in the US,
nowadays) typically care about power. So if cost/size
is a problem, there won't be four Methods or four Languages.

For the combined experiment, I bring up what I said before:
Are you sure you are asking the question you want? (or that
you need?)

One way to compose a simple design would be to look at the
two-way analysis of Method x Language. The main effect for
Method would matter, and the interaction of Method x Language
would say that the methods don't work the same across languages.
A main effect for Language would mainly be confusing.
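For a balanced design, the Method x Language decomposition sketched above can be computed by hand. This is a minimal sketch on synthetic data (4 methods, 3 languages, 5 classes per cell, all assumed); the point is the partition of the total sum of squares into Method, Language, interaction, and error:

```python
import numpy as np

rng = np.random.default_rng(1)
methods, languages, reps = 4, 3, 5            # assumed balanced design
# scores[i, j, k]: method i, language j, replicate k (synthetic)
scores = rng.normal(70, 8, (methods, languages, reps))
scores[0] += 5                                # assumed: method A1 helps everywhere

grand = scores.mean()
m_means = scores.mean(axis=(1, 2))            # method means
l_means = scores.mean(axis=(0, 2))            # language means
cell = scores.mean(axis=2)                    # cell means

ss_method = languages * reps * ((m_means - grand) ** 2).sum()
ss_language = methods * reps * ((l_means - grand) ** 2).sum()
ss_inter = reps * ((cell - m_means[:, None]
                    - l_means[None, :] + grand) ** 2).sum()
ss_error = ((scores - cell[:, :, None]) ** 2).sum()
ss_total = ((scores - grand) ** 2).sum()

# F for the Method main effect: MS_method / MS_error
f_method = (ss_method / (methods - 1)) / \
           (ss_error / (methods * languages * (reps - 1)))
print(f_method)
```

A large F for Method with a small interaction term is the "Method matters, and works the same across languages" reading; a large interaction would say the methods do not work the same across languages.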

Beyond that, there is what I mentioned before, Are you sure
that family-wise alpha error deserves to be protected?

For educational methods -- or clinical ones -- being 'just as good'
may be fine if the teachers and students like it better. In fact, for
drug treatments (which I never dealt with on this level), NIH
had some (maybe confusing) prescriptions for how to 'show
equivalence'.

I say '(confusing)' because I do remember reading some criticism
and contradictory advice -- when I read about it, 20 years ago.
(I hope they've figured it out by now.)

--
Rich Ulrich
