Rocksolid Light

News from da outaworlds

mail  files  register  groups  login

Message-ID:  

Give your very best today. Heaven knows it's little enough.


comp / comp.lang.c / Re: realloc() - frequency, conditions, or experiences about relocation?

SubjectAuthor
* realloc() - frequency, conditions, or experiences about relocation?Janis Papanagnou
+* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|`* Re: realloc() - frequency, conditions, or experiences about relocation?Vir Campestris
| `* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|  `* Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
|   +* Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
|   |+* Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
|   ||+* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
|   |||+* Re: realloc() - frequency, conditions, or experiences about relocation?Scott Lurndal
|   ||||`* Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
|   |||| +- Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
|   |||| `- Re: realloc() - frequency, conditions, or experiences about relocation?Chris M. Thomasson
|   |||`* Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
|   ||| `* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|   |||  `- Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
|   ||+* Re: realloc() - frequency, conditions, or experiences about relocation?Chris M. Thomasson
|   |||`- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|   ||`- Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
|   |+- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|   |`* Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
|   | +* Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
|   | |+- Re: realloc() - frequency, conditions, or experiences about relocation?Richard Damon
|   | |+* Re: realloc() - frequency, conditions, or experiences about relocation?Phil Carmody
|   | ||`- Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
|   | |`* Re: realloc() - frequency, conditions, or experiences about relocation?James Kuyper
|   | | `- Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
|   | `- Re: realloc() - frequency, conditions, or experiences about relocation?James Kuyper
|   `* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
|    `* Down the hall, past the water cooler, third door on the left... (Was: realloc() Kenny McCormack
|     `- Re: Down the hall, past the water cooler, third door on the left...Bonita Montero
+* Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
|`* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| +* Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
| |`* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | +* Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
| | |+* Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | ||`* Re: realloc() - frequency, conditions, or experiences about relocation?Tim Rentsch
| | || +* Indefinite pronouns [was:Re: realloc() - frequency, conditions, or experiences aAnton Shepelev
| | || |+* Re: Indefinite pronounsvallor
| | || ||`* Re: Indefinite pronounsDavid Brown
| | || || `- Re: Indefinite pronounsKeith Thompson
| | || |+- Re: Indefinite pronouns [was:Re: realloc() - frequency, conditions, or experiencTim Rentsch
| | || |+- Re: Indefinite pronouns [was:Re: realloc() - frequency, conditions, or experiencCri-Cri
| | || |`* Re: Indefinite pronouns [was:Re: realloc() - frequency, conditions, or experiencKenny McCormack
| | || | `- Re: Indefinite pronouns [was: Re: <something technical>]Janis Papanagnou
| | || `* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | ||  +* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | ||  |`* Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
| | ||  | `* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | ||  |  +- Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
| | ||  |  `- Re: realloc() - frequency, conditions, or experiences about relocation?Lawrence D'Oliveiro
| | ||  `* Re: realloc() - frequency, conditions, or experiences about relocation?Tim Rentsch
| | ||   `- Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
| | |+* Re: realloc() - frequency, conditions, or experiences about relocation?Richard Harnden
| | ||`- Re: realloc() - frequency, conditions, or experiences about relocation?Chris M. Thomasson
| | |`- Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | +* Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | |+* Re: realloc() - frequency, conditions, or experiences about relocation?David Duffy
| | ||+* Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | |||+* Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
| | ||||`* Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
| | |||| `* Re: realloc() - frequency, conditions, or experiences about relocation?Ben Bacarisse
| | ||||  `- Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
| | |||`* Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | ||| `- Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | ||`- Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | |+- Re: realloc() - frequency, conditions, or experiences about relocation?David Jones
| | |`* Re: realloc() - frequency, conditions, or experiences about relocation?Rich Ulrich
| | | +* Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
| | | |`* Re: realloc() - frequency, conditions, or experiences about relocation?Rich Ulrich
| | | | `* Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | | |  `* Re: realloc() - frequency, conditions, or experiences about relocation?Rich Ulrich
| | | |   `- Re: realloc() - frequency, conditions, or experiences about relocation?Anton Shepelev
| | | `* Re: realloc() - frequency, conditions, or experiences about relocation?Paul
| | |  `* Re: realloc() - frequency, conditions, or experiences about relocation?Rich Ulrich
| | |   `* Re: realloc() - frequency, conditions, or experiences about relocation?Rich Ulrich
| | |    `* Re: realloc() - frequency, conditions, or experiences about relocation?Paul
| | |     +- Re: realloc() - frequency, conditions, or experiences about relocation?James Kuyper
| | |     `- Re: realloc() - frequency, conditions, or experiences about relocation?James Kuyper
| | +* Re: realloc() - frequency, conditions, or experiences about relocation?Keith Thompson
| | |+- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
| | |`- Re: realloc() - frequency, conditions, or experiences about relocation?Malcolm McLean
| | `* Re: realloc() - frequency, conditions, or experiences about relocation?Scott Lurndal
| |  `- Re: realloc() - frequency, conditions, or experiences about relocation?Chris M. Thomasson
| `- Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
+- Re: realloc() - frequency, conditions, or experiences about relocation?Rosario19
+* Re: realloc() - frequency, conditions, or experiences about relocation?Scott Lurndal
|`* Re: realloc() - frequency, conditions, or experiences about relocation?Michael S
| `- Re: realloc() - frequency, conditions, or experiences about relocation?Scott Lurndal
+- Re: realloc() - frequency, conditions, or experiences about relocation?Janis Papanagnou
+* Re: realloc() - frequency, conditions, or experiences about relocation?David Brown
|`- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
+- Re: realloc() - frequency, conditions, or experiences about relocation?Chris M. Thomasson
`* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
 +* Re: realloc() - frequency, conditions, or experiences about relocation?Vir Campestris
 |`* Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
 | `* Re: realloc() - frequency, conditions, or experiences about relocation?Vir Campestris
 |  `- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero
 `* Re: realloc() - frequency, conditions, or experiences about relocation?DFS
  `- Re: realloc() - frequency, conditions, or experiences about relocation?Bonita Montero

Pages:1234
Subject: realloc() - frequency, conditions, or experiences about relocation?
From: Janis Papanagnou
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 06:08 UTC
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.c
Subject: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 08:08:07 +0200
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 08:08:08 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d070642afdec4e1c027704024e3cba98";
logging-data="556658"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18zVZuXOS3rElx9IzUwsgU4"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:mvFpxMID1iO/VaYXGIWCR7GzvNA=
X-Enigmail-Draft-Status: N1110
X-Mozilla-News-Host: news://news.eternal-september.org:119
View all headers

In a recent thread realloc() was a substantial part of the discussion.
"Occasionally" the increased data storage will be relocated along
with the previously stored data. On huge data sets that might be a
performance factor. Is there any experience or are there any concrete
factors about the conditions when this relocation happens? - I could
imagine that it's no issue as long as you're in some kB buffer range,
but if, say, we're using realloc() to substantially increase buffers
often it might be an issue to consider. It would be good to get some
feeling about that internal.

Janis

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Chris M. Thomasson
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 06:34 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Sun, 16 Jun 2024 23:34:49 -0700
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <v4ole9$h7t3$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 08:34:50 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c5aa21a2ffdc11ceb19a167156cbaefd";
logging-data="565155"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19A7ze9uNiDyLojGC2IMFulJ1SmO1JOGvA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:6bBbXmgMOZ8vC23oyIbDEnc9+AU=
In-Reply-To: <v4ojs8$gvji$1@dont-email.me>
Content-Language: en-US
View all headers

On 6/16/2024 11:08 PM, Janis Papanagnou wrote:
> In a recent thread realloc() was a substantial part of the discussion.
> "Occasionally" the increased data storage will be relocated along
> with the previously stored data. On huge data sets that might be a
> performance factor.

[...]

It sure can be a factor to consider. Usually its pretty nice, but not so
much for certain workloads.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Ben Bacarisse
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 09:18 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 10:18:40 +0100
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <875xu8vsen.fsf@bsb.me.uk>
References: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 17 Jun 2024 11:18:41 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f69b21fe46179def9aae9c77317b77e3";
logging-data="615472"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/V+zg+EbDwO71Sgv3/5Ez+X7ySuhqKr58="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:Pjjz8v/eeUGJR2brdGh36h8PPoE=
sha1:f8Xh0Ql2zjrM3ay/qgmv4Y/P/jw=
X-BSB-Auth: 1.007fdcf76d0baf05e01e.20240617101840BST.875xu8vsen.fsf@bsb.me.uk
View all headers

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

> In a recent thread realloc() was a substantial part of the discussion.
> "Occasionally" the increased data storage will be relocated along
> with the previously stored data. On huge data sets that might be a
> performance factor. Is there any experience or are there any concrete
> factors about the conditions when this relocation happens? - I could
> imagine that it's no issue as long as you're in some kB buffer range,
> but if, say, we're using realloc() to substantially increase buffers
> often it might be an issue to consider. It would be good to get some
> feeling about that internal.

There is obviously a cost, but there is (usually) no alternative if
contiguous storage is required. In practice, the cost is usually
moderate and can be very effectively managed by using an exponential
allocation scheme: at every reallocation multiply the storage space by
some factor greater than 1 (I often use 3/2, but doubling is often used
as well). This results in O(log(N)) rather than O(N) allocations as in
your code that added a constant to the size. Of course, some storage is
wasted (that /might/ be retrieved by a final realloc down to the final
size) but that's rarely significant.

--
Ben.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Bonita Montero
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 09:22 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!raubtier-asyl.eternal-september.org!.POSTED!not-for-mail
From: Bonita.Montero@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 11:22:31 +0200
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <v4ov8h$j2q2$1@raubtier-asyl.eternal-september.org>
References: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 11:22:26 +0200 (CEST)
Injection-Info: raubtier-asyl.eternal-september.org; posting-host="567fea0737ace9203b0a3bd3650a3401";
logging-data="625474"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8BNSnwWyKIgP5ybK0Se6a+z2dTCi3hSQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Re76+ouRepIyq7HdnyD+uvJlRqE=
Content-Language: de-DE
In-Reply-To: <v4ojs8$gvji$1@dont-email.me>
View all headers

Am 17.06.2024 um 08:08 schrieb Janis Papanagnou:
> In a recent thread realloc() was a substantial part of the discussion.
> "Occasionally" the increased data storage will be relocated along
> with the previously stored data. On huge data sets that might be a
> performance factor. Is there any experience or are there any concrete
> factors about the conditions when this relocation happens? - I could
> imagine that it's no issue as long as you're in some kB buffer range,
> but if, say, we're using realloc() to substantially increase buffers
> often it might be an issue to consider. It would be good to get some
> feeling about that internal.
>
> Janis

realloc() is just a convenience funciton. Usually the reallocation
can't happen in-place and a second malloc() followed by a copy and
a free() does the same.
For large data it would be nice if the pages being deallocated later
would be incrementally marked as discardable after copying a portion.
This would result in only a small portion of additional physical
memory being allocated since the newly allocated pages become asso-
ciated with phyiscal pages when they're touched first. Windows has
VirtualAlloc() with MEM_RESET for that, Linux has madvise() with
MADV_DONTNEED.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Malcolm McLean
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 09:31 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: malcolm.arthur.mclean@gmail.com (Malcolm McLean)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 10:31:59 +0100
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <v4ovqf$j5hq$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 11:31:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1d441ad42b482865d1a884aec9340332";
logging-data="628282"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+A6CVh/+JuwpPEkqndZUxwE26JA/zJbJI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:AUcTheegqU4BMNKip5ZjWBIQ+58=
Content-Language: en-GB
In-Reply-To: <875xu8vsen.fsf@bsb.me.uk>
View all headers

On 17/06/2024 10:18, Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>
>> In a recent thread realloc() was a substantial part of the discussion.
>> "Occasionally" the increased data storage will be relocated along
>> with the previously stored data. On huge data sets that might be a
>> performance factor. Is there any experience or are there any concrete
>> factors about the conditions when this relocation happens? - I could
>> imagine that it's no issue as long as you're in some kB buffer range,
>> but if, say, we're using realloc() to substantially increase buffers
>> often it might be an issue to consider. It would be good to get some
>> feeling about that internal.
>
> There is obviously a cost, but there is (usually) no alternative if
> contiguous storage is required. In practice, the cost is usually
> moderate and can be very effectively managed by using an exponential
> allocation scheme: at every reallocation multiply the storage space by
> some factor greater than 1 (I often use 3/2, but doubling is often used
> as well). This results in O(log(N)) rather than O(N) allocations as in
> your code that added a constant to the size. Of course, some storage is
> wasted (that /might/ be retrieved by a final realloc down to the final
> size) but that's rarely significant.
>
So can we work it out?

Let's assume for the moment that the allocations have a semi-normal
distribution, with negative values disallowed. Now ignoring the first
few values, if we have allocated, say, 1K, we ought to be able to
predict the value by integrating the distribution from 1k to infinity
and taking the mean.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Ben Bacarisse
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 09:55 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 10:55:33 +0100
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <87zfrjvqp6.fsf@bsb.me.uk>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 17 Jun 2024 11:55:34 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f69b21fe46179def9aae9c77317b77e3";
logging-data="615472"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX199v8u6tLRfSX+cYrx+9x9xbaFUdoCmeCI="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:StE9LcOr9N8aN8/cCOP2wtfcoh8=
sha1:Oll7KiAojbwOqTdghjGzVCJD4AM=
X-BSB-Auth: 1.d5809cbd62f99a5f327a.20240617105533BST.87zfrjvqp6.fsf@bsb.me.uk
View all headers

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On 17/06/2024 10:18, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>
>>> In a recent thread realloc() was a substantial part of the discussion.
>>> "Occasionally" the increased data storage will be relocated along
>>> with the previously stored data. On huge data sets that might be a
>>> performance factor. Is there any experience or are there any concrete
>>> factors about the conditions when this relocation happens? - I could
>>> imagine that it's no issue as long as you're in some kB buffer range,
>>> but if, say, we're using realloc() to substantially increase buffers
>>> often it might be an issue to consider. It would be good to get some
>>> feeling about that internal.
>> There is obviously a cost, but there is (usually) no alternative if
>> contiguous storage is required. In practice, the cost is usually
>> moderate and can be very effectively managed by using an exponential
>> allocation scheme: at every reallocation multiply the storage space by
>> some factor greater than 1 (I often use 3/2, but doubling is often used
>> as well). This results in O(log(N)) rather than O(N) allocations as in
>> your code that added a constant to the size. Of course, some storage is
>> wasted (that /might/ be retrieved by a final realloc down to the final
>> size) but that's rarely significant.
>>
> So can we work it out?

What is "it"?

> Let's assume for the moment that the allocations have a semi-normal
> distribution,

What allocations? The allocations I talked about don't have that
distribution.

> with negative values disallowed. Now ignoring the first few
> values, if we have allocated, say, 1K, we ought to be able to predict the
> value by integrating the distribution from 1k to infinity and taking the
> mean.

I have no idea what you are talking about. What "value" are you looking
to calculate?

--
Ben.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: David Brown
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 12:15 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 14:15:11 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <v4p9cg$lbo4$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 14:15:12 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="6eee6dfb82180fb756db1a7758fc4b5a";
logging-data="700164"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18hoQkLiBqUX7MJ+stryC9e4Fy003XgYyY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:49/onGY4mU4/oKMULHEjs17B9xY=
Content-Language: en-GB
In-Reply-To: <v4ojs8$gvji$1@dont-email.me>
View all headers

On 17/06/2024 08:08, Janis Papanagnou wrote:
> In a recent thread realloc() was a substantial part of the discussion.
> "Occasionally" the increased data storage will be relocated along
> with the previously stored data. On huge data sets that might be a
> performance factor. Is there any experience or are there any concrete
> factors about the conditions when this relocation happens? - I could
> imagine that it's no issue as long as you're in some kB buffer range,
> but if, say, we're using realloc() to substantially increase buffers
> often it might be an issue to consider. It would be good to get some
> feeling about that internal.
>
> Janis

Consider your target audience and their hardware, the target OS, and the
realistic size of your data. If the target is a PC, you can happily
malloc tens of MB at the start without a care, and for systems that do
not actually allocate system memory until you try to access the area,
there is no cost to this.

So in many situations where you are reading and parsing data from a
file, you can just do the initial malloc with more than enough space for
any realistic input file. You might still implement a realloc solution
for occasional extreme uses, and because it is nice to avoid artificial
limits for programs, but efficiency matters a lot less in those cases.

It may also be the case that even if realloc returns a different address
and logically copies a lot of data, that this is done by smarter virtual
memory mapping so that only the mapping changes, and the underlying
physical ram does not need to be copied. But I don't know if OS's and
realloc implementations are smart enough to do that.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Bonita Montero
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 12:18 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!raubtier-asyl.eternal-september.org!.POSTED!not-for-mail
From: Bonita.Montero@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 14:18:17 +0200
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <v4p9i2$lc3v$1@raubtier-asyl.eternal-september.org>
References: <v4ojs8$gvji$1@dont-email.me> <v4p9cg$lbo4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 17 Jun 2024 14:18:11 +0200 (CEST)
Injection-Info: raubtier-asyl.eternal-september.org; posting-host="567fea0737ace9203b0a3bd3650a3401";
logging-data="700543"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/mypHm6CzaUJNGENqmZXKHY6INhdo4q4E="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:W2rLe3FuUpXxzM6H8uF+mG4PNCE=
In-Reply-To: <v4p9cg$lbo4$1@dont-email.me>
Content-Language: de-DE
View all headers

Am 17.06.2024 um 14:15 schrieb David Brown:

> Consider your target audience and their hardware, the target OS, and the
> realistic size of your data.  If the target is a PC, you can happily
> malloc tens of MB at the start without a care, and for systems that do
> not actually allocate system memory until you try to access the area,
> there is no cost to this.

If you have a memcpy() from a to b at the end there's twice the memory
allocated even if the destination pages are assigned while touching a
page.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Malcolm McLean
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 12:45 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: malcolm.arthur.mclean@gmail.com (Malcolm McLean)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 13:45:19 +0100
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <v4pb4v$lhgk$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 14:45:19 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f3df68eb903434ea9efb6aee921a4925";
logging-data="706068"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19sTFwr4AmENuJZLmuA9zuFimiGNcxcCwY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:SEhC4UDf/MewTmUE3uaxLRCL4nE=
Content-Language: en-GB
In-Reply-To: <87zfrjvqp6.fsf@bsb.me.uk>
View all headers

On 17/06/2024 10:55, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>
>>>> In a recent thread realloc() was a substantial part of the discussion.
>>>> "Occasionally" the increased data storage will be relocated along
>>>> with the previously stored data. On huge data sets that might be a
>>>> performance factor. Is there any experience or are there any concrete
>>>> factors about the conditions when this relocation happens? - I could
>>>> imagine that it's no issue as long as you're in some kB buffer range,
>>>> but if, say, we're using realloc() to substantially increase buffers
>>>> often it might be an issue to consider. It would be good to get some
>>>> feeling about that internal.
>>> There is obviously a cost, but there is (usually) no alternative if
>>> contiguous storage is required. In practice, the cost is usually
>>> moderate and can be very effectively managed by using an exponential
>>> allocation scheme: at every reallocation multiply the storage space by
>>> some factor greater than 1 (I often use 3/2, but doubling is often used
>>> as well). This results in O(log(N)) rather than O(N) allocations as in
>>> your code that added a constant to the size. Of course, some storage is
>>> wasted (that /might/ be retrieved by a final realloc down to the final
>>> size) but that's rarely significant.
>>>
>> So can we work it out?
>
> What is "it"?
>
>> Let's assume for the moment that the allocations have a semi-normal
>> distribution,
>
> What allocations? The allocations I talked about don't have that
> distribution.
>
>> with negative values disallowed. Now ignoring the first few
>> values, if we have allocated, say, 1K, we ought to be able to predict the
>> value by integrating the distribution from 1k to infinity and taking the
>> mean.
>
> I have no idea what you are talking about. What "value" are you looking
> to calculate?
>
We have a continuously growing buffer, and we want the best strategy for
reallocations as the stream of characters comes at us. So, given we now
how many characters have arrived, can we predict how many will arrive,
and therefore ask for the best amount when we reallocate, so that we
neither make too many reallocation (reallocate on every byte received)
or ask for too much (demand SIZE_MAX memory when the first byte is
received).?
Your strategy for avoiding these extremes is exponential growth. You
allocate a small amount for the first few bytes. Then you use
exponential growth, with a factor of ether 2 or 1.5. My question is
whether or not we can be cuter. And of course we need to know the
statistical distribution of the input files. And I'm assuming a
semi-normal distribution, ignoring the files with small values, which we
will allocate enough for anyway.

And so we integrate the distribution between the point we are at and
infinity. Then we tkae the mean. And that gives us a best estimate of
how many bytes are to come, and therefore how much to grow the buffer by.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Janis Papanagnou
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 13:21 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 15:21:24 +0200
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <v4pd8l$m519$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 15:21:26 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="fe0891cf9321877d7d070b255a029de4";
logging-data="726057"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18eEZO575jvtDRJNaRZ8do7"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:wgdljxK8H/T0ErZqxclWGIjhl8A=
In-Reply-To: <v4ojs8$gvji$1@dont-email.me>
X-Enigmail-Draft-Status: N1110
View all headers

On 17.06.2024 08:08, Janis Papanagnou wrote:
> In a recent thread realloc() was a substantial part of the discussion.
> "Occasionally" the increased data storage will be relocated along
> with the previously stored data. On huge data sets that might be a
> performance factor. Is there any experience or are there any concrete
> factors about the conditions when this relocation happens? - I could
> imagine that it's no issue as long as you're in some kB buffer range,
> but if, say, we're using realloc() to substantially increase buffers
> often it might be an issue to consider. It would be good to get some
> feeling about that internal.

Let me add...

I'd assume that there's some basic allocation size defined; some
simple test sample with a handful of bytes didn't relocate the data.
Yet I don't know whether allocated memory is managed sequentially
or has linked blocks. A peek info the source code might help. What
I found is this comment for extending chunks:[*]
* Extending forward into following adjacent free chunk.
* Shifting backwards, joining preceding adjacent space
* Both shifting backwards and extending forward.
* Extending into newly sbrked space
Going to investigate that source code[*] later...

Janis

[*] https://elixir.bootlin.com/glibc/glibc-2.1.2/source/malloc/malloc.c
(line 3077 ff)

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Ben Bacarisse
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 14:33 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 15:33:47 +0100
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <87tthrvdtg.fsf@bsb.me.uk>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 17 Jun 2024 16:33:49 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f69b21fe46179def9aae9c77317b77e3";
logging-data="754711"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xjJeqMTh2duIlWjnxKBGQgZFsVH+QD8Y="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:/lyU4PKPDHSE2sovhi5TWKIdXT0=
sha1:ghF6UUTOh4O+sE2BeP31c1OFOpA=
X-BSB-Auth: 1.c9787052b0049375b894.20240617153347BST.87tthrvdtg.fsf@bsb.me.uk
View all headers

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On 17/06/2024 10:55, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>
>>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>
>>>>> In a recent thread realloc() was a substantial part of the discussion.
>>>>> "Occasionally" the increased data storage will be relocated along
>>>>> with the previously stored data. On huge data sets that might be a
>>>>> performance factor. Is there any experience or are there any concrete
>>>>> factors about the conditions when this relocation happens? - I could
>>>>> imagine that it's no issue as long as you're in some kB buffer range,
>>>>> but if, say, we're using realloc() to substantially increase buffers
>>>>> often it might be an issue to consider. It would be good to get some
>>>>> feeling about that internal.
>>>> There is obviously a cost, but there is (usually) no alternative if
>>>> contiguous storage is required. In practice, the cost is usually
>>>> moderate and can be very effectively managed by using an exponential
>>>> allocation scheme: at every reallocation multiply the storage space by
>>>> some factor greater than 1 (I often use 3/2, but doubling is often used
>>>> as well). This results in O(log(N)) rather than O(N) allocations as in
>>>> your code that added a constant to the size. Of course, some storage is
>>>> wasted (that /might/ be retrieved by a final realloc down to the final
>>>> size) but that's rarely significant.
>>>>
>>> So can we work it out?
>> What is "it"?
>>
>>> Let's assume for the moment that the allocations have a semi-normal
>>> distribution,
>> What allocations? The allocations I talked about don't have that
>> distribution.
>>
>>> with negative values disallowed. Now ignoring the first few
>>> values, if we have allocated, say, 1K, we ought to be able to predict the
>>> value by integrating the distribution from 1k to infinity and taking the
>>> mean.
>> I have no idea what you are talking about. What "value" are you looking
>> to calculate?
>>
> We have a continuously growing buffer, and we want the best strategy for
> reallocations as the stream of characters comes at us. So, given we now how
> many characters have arrived, can we predict how many will arrive, and
> therefore ask for the best amount when we reallocate, so that we neither
> make too many reallocation (reallocate on every byte received) or ask for
> too much (demand SIZE_MAX memory when the first byte is received).?

Obviously not, or we'd use the prediction. You question was probably
rhetorical, but it didn't read that way.

> Your strategy for avoiding these extremes is exponential growth.

It's odd to call it mine. It's very widely know and used. "The one I
mentioned" might be less confusing description.

> You
> allocate a small amount for the first few bytes. Then you use exponential
> growth, with a factor of ether 2 or 1.5. My question is whether or not we
> can be cuter. And of course we need to know the statistical distribution of
> the input files. And I'm assuming a semi-normal distribution, ignoring the
> files with small values, which we will allocate enough for anyway.
>
> And so we integrate the distribution between the point we are at and
> infinity. Then we tkae the mean. And that gives us a best estimate of how
> many bytes are to come, and therefore how much to grow the buffer by.

I would be surprised if that were worth the effort at run time. A
static analysis of "typical" input sizes might be interesting as that
could be used to get an estimate of good factors to use, but anything
more complicated than maybe a few factors (e.g. doubling up to 1MB then
3/2 thereafter) is likely to be too messy to useful.

Also, the cost of reallocations is not constant. Larger ones are
usually more costly than small ones, so if one were going to a lot of
effort to make run-time guesses, that cost should be factored in as
well.

--
Ben.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Anton Shepelev
Newsgroups: comp.lang.c, sci.stat.math
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 15:02 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton.txt@g{oogle}mail.com (Anton Shepelev)
Newsgroups: comp.lang.c,sci.stat.math
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 18:02:49 +0300
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <20240617180249.96dfaafa89392827aa162434@g{oogle}mail.com>
References: <v4ojs8$gvji$1@dont-email.me>
<875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me>
<87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 17:02:50 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4efe3e0e0736d482d811e40a75979fd";
logging-data="765209"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qs70P/WDPq5rQMWzxksVuDUCmf03wz6Y="
Cancel-Lock: sha1:YDo8xQBFeGxeLQcXyFO0Ml++fwE=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
View all headers

[cross-posted to: ci.stat.math]

Malcolm McLean:

> We have a continuously growing buffer, and we want the
> best strategy for reallocations as the stream of
> characters comes at us. So, given we now how many
> characters have arrived, can we predict how many will
> arrive,

Do you mean in the next bunch, or in total (till the end of
the buffer's lifetime)?

> and therefore ask for the best amount when we reallocate,
> so that we neither make too many reallocation (reallocate
> on every byte received) or ask for too much (demand
> SIZE_MAX memory when the first byte is received).?
>
> Your strategy for avoiding these extremes is exponential
> growth. You allocate a small amount for the first few
> bytes. Then you use exponential growth, with a factor of
> ether 2 or 1.5.

This strategy ensures a constant ratio between the amount of
reallocated data to the length of the buffer by making
reallocations less frequent as the buffer grows.

> And so we integrate the distribution between the point we
> are at and infinity. Then we tkae the mean. And that gives
> us a best estimate of how many bytes are to come, and
> therefore how much to grow the buffer by.

You have an apriori distribution of the buffer size (can be
tracked on-the-fly, if unknown beforehand) and a partially
filled buffer. The task is to calculate the a-posteriori
distribution of /that/ buffer's final size, and then to
allocate the predicted value based on a good percentile.

How about using a percentile instead of the mean, e.g. if
the current size corresponds to percentile p, you allocate a
capacity corresponding to percentile 1-(1-p)/k , where k>1
denotes the balance between space and time efficency. For
example, if the 60th percentile of the buffer is required
and k=2, you allocate a capacity sufficient to hold
100-(100-60)/2=80% of buffers.

--
() ascii ribbon campaign -- against html e-mail
/\ www.asciiribbon.org -- against proprietary attachments

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Anton Shepelev
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 15:10 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton.txt@g{oogle}mail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 18:10:34 +0300
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <20240617181034.74fb4cca1f4a9a3ea032825e@g{oogle}mail.com>
References: <v4ojs8$gvji$1@dont-email.me>
<875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me>
<87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me>
<87tthrvdtg.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 17:10:35 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4efe3e0e0736d482d811e40a75979fd";
logging-data="765209"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ptEGIl30xzR/Af4nLwO8MbpdaoWXCw1U="
Cancel-Lock: sha1:dK8VWqCndljAy7vcvU9jbT5fx+Y=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
View all headers

Ben Bacarisse to Malcolm McLean:

> > We have a continuously growing buffer, and we want the
> > best strategy for reallocations as the stream of
> > characters comes at us. So, given we now how many
> > characters have arrived, can we predict how many will
> > arrive, and therefore ask for the best amount when we
> > reallocate, so that we neither make too many
> > reallocation (reallocate on every byte received) or ask
> > for too much (demand SIZE_MAX memory when the first byte
> > is received).?
>
> Obviously not, or we'd use the prediction.

Not so obvious to me, for the exponential algorithm may be
the best when the distribution of buffer size is /not/
known, whereas Malcolm is interested in the cases when we
know it.

> > Your strategy for avoiding these extremes is exponential
> > growth.
>
> It's odd to call it mine. It's very widely know and used.
> "The one I mentioned" might be less confusing description.

I think it is a modern English idiom, which I dislike as
well. StackOverflow is full of questions starting like:
"How do you do this?" and "How do I do that?" They are
informal ways of the more literary "How does one do this?"
or "What is the way to do that?"

--
() ascii ribbon campaign -- against html e-mail
/\ www.asciiribbon.org -- against proprietary attachments

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Richard Harnden
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 15:15 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: richard.nospam@gmail.invalid (Richard Harnden)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 16:15:27 +0100
Organization: A noiseless patient Spider
Lines: 100
Message-ID: <v4pjuf$n98h$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me> <87tthrvdtg.fsf@bsb.me.uk>
Reply-To: richard.harnden@invalid.com
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 17:15:27 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="adebfc758c55f317ee4a5e3ad2ba1ae4";
logging-data="763153"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18DEZbDBn2/EW7dF+Uy3L5HP8AllpeU+Eo="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:gaB1NPToFxo/ulraijCfCFyn8hU=
Content-Language: en-GB
In-Reply-To: <87tthrvdtg.fsf@bsb.me.uk>
View all headers

On 17/06/2024 15:33, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
>> On 17/06/2024 10:55, Ben Bacarisse wrote:
>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>
>>>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>
>>>>>> In a recent thread realloc() was a substantial part of the discussion.
>>>>>> "Occasionally" the increased data storage will be relocated along
>>>>>> with the previously stored data. On huge data sets that might be a
>>>>>> performance factor. Is there any experience or are there any concrete
>>>>>> factors about the conditions when this relocation happens? - I could
>>>>>> imagine that it's no issue as long as you're in some kB buffer range,
>>>>>> but if, say, we're using realloc() to substantially increase buffers
>>>>>> often it might be an issue to consider. It would be good to get some
>>>>>> feeling about that internal.
>>>>> There is obviously a cost, but there is (usually) no alternative if
>>>>> contiguous storage is required. In practice, the cost is usually
>>>>> moderate and can be very effectively managed by using an exponential
>>>>> allocation scheme: at every reallocation multiply the storage space by
>>>>> some factor greater than 1 (I often use 3/2, but doubling is often used
>>>>> as well). This results in O(log(N)) rather than O(N) allocations as in
>>>>> your code that added a constant to the size. Of course, some storage is
>>>>> wasted (that /might/ be retrieved by a final realloc down to the final
>>>>> size) but that's rarely significant.
>>>>>
>>>> So can we work it out?
>>> What is "it"?
>>>
>>>> Let's assume for the moment that the allocations have a semi-normal
>>>> distribution,
>>> What allocations? The allocations I talked about don't have that
>>> distribution.
>>>
>>>> with negative values disallowed. Now ignoring the first few
>>>> values, if we have allocated, say, 1K, we ought to be able to predict the
>>>> value by integrating the distribution from 1k to infinity and taking the
>>>> mean.
>>> I have no idea what you are talking about. What "value" are you looking
>>> to calculate?
>>>
>> We have a continuously growing buffer, and we want the best strategy for
>> reallocations as the stream of characters comes at us. So, given we now how
>> many characters have arrived, can we predict how many will arrive, and
>> therefore ask for the best amount when we reallocate, so that we neither
>> make too many reallocation (reallocate on every byte received) or ask for
>> too much (demand SIZE_MAX memory when the first byte is received).?
>
> Obviously not, or we'd use the prediction. You question was probably
> rhetorical, but it didn't read that way.
>
>> Your strategy for avoiding these extremes is exponential growth.
>
> It's odd to call it mine. It's very widely know and used. "The one I
> mentioned" might be less confusing description.
>
>> You
>> allocate a small amount for the first few bytes. Then you use exponential
>> growth, with a factor of ether 2 or 1.5. My question is whether or not we
>> can be cuter. And of course we need to know the statistical distribution of
>> the input files. And I'm assuming a semi-normal distribution, ignoring the
>> files with small values, which we will allocate enough for anyway.
>>
>> And so we integrate the distribution between the point we are at and
>> infinity. Then we tkae the mean. And that gives us a best estimate of how
>> many bytes are to come, and therefore how much to grow the buffer by.
>
> I would be surprised if that were worth the effort at run time. A
> static analysis of "typical" input sizes might be interesting as that
> could be used to get an estimate of good factors to use, but anything
> more complicated than maybe a few factors (e.g. doubling up to 1MB then
> 3/2 thereafter) is likely to be too messy to useful.
>
> Also, the cost of reallocations is not constant. Larger ones are
> usually more costly than small ones, so if one were going to a lot of
> effort to make run-time guesses, that cost should be factored in as
> well.
>

I usually keep track:

struct
{ size_t used;
size_t allocated;
void *data;
};

Then, if used + new_size is more than what's already been allocated then
a realloc will be required.

Start with an initial allocated size that's 'resonable' - the happy path
will never need any reallocs.

Otherwise multiply by some factor. Typicall I just double it.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Malcolm McLean
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 15:36 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: malcolm.arthur.mclean@gmail.com (Malcolm McLean)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 16:36:57 +0100
Organization: A noiseless patient Spider
Lines: 127
Message-ID: <v4pl6q$nn9o$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me> <87tthrvdtg.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 17:36:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f27d645ef399f596eab7376e05efef48";
logging-data="777528"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ErpMsDws0af1qDI3+Tb9KEvhvu7nzNSs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ALtfStmMaQYyNc8q7OPL/l+bMDs=
In-Reply-To: <87tthrvdtg.fsf@bsb.me.uk>
Content-Language: en-GB
View all headers

On 17/06/2024 15:33, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
>> On 17/06/2024 10:55, Ben Bacarisse wrote:
>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>
>>>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>
>>>>>> In a recent thread realloc() was a substantial part of the discussion.
>>>>>> "Occasionally" the increased data storage will be relocated along
>>>>>> with the previously stored data. On huge data sets that might be a
>>>>>> performance factor. Is there any experience or are there any concrete
>>>>>> factors about the conditions when this relocation happens? - I could
>>>>>> imagine that it's no issue as long as you're in some kB buffer range,
>>>>>> but if, say, we're using realloc() to substantially increase buffers
>>>>>> often it might be an issue to consider. It would be good to get some
>>>>>> feeling about that internal.
>>>>> There is obviously a cost, but there is (usually) no alternative if
>>>>> contiguous storage is required. In practice, the cost is usually
>>>>> moderate and can be very effectively managed by using an exponential
>>>>> allocation scheme: at every reallocation multiply the storage space by
>>>>> some factor greater than 1 (I often use 3/2, but doubling is often used
>>>>> as well). This results in O(log(N)) rather than O(N) allocations as in
>>>>> your code that added a constant to the size. Of course, some storage is
>>>>> wasted (that /might/ be retrieved by a final realloc down to the final
>>>>> size) but that's rarely significant.
>>>>>
>>>> So can we work it out?
>>> What is "it"?
>>>
>>>> Let's assume for the moment that the allocations have a semi-normal
>>>> distribution,
>>> What allocations? The allocations I talked about don't have that
>>> distribution.
>>>
>>>> with negative values disallowed. Now ignoring the first few
>>>> values, if we have allocated, say, 1K, we ought to be able to predict the
>>>> value by integrating the distribution from 1k to infinity and taking the
>>>> mean.
>>> I have no idea what you are talking about. What "value" are you looking
>>> to calculate?
>>>
>> We have a continuously growing buffer, and we want the best strategy for
>> reallocations as the stream of characters comes at us. So, given we now how
>> many characters have arrived, can we predict how many will arrive, and
>> therefore ask for the best amount when we reallocate, so that we neither
>> make too many reallocation (reallocate on every byte received) or ask for
>> too much (demand SIZE_MAX memory when the first byte is received).?
>
> Obviously not, or we'd use the prediction. You question was probably
> rhetorical, but it didn't read that way.
>
>> Your strategy for avoiding these extremes is exponential growth.
>
> It's odd to call it mine. It's very widely know and used. "The one I
> mentioned" might be less confusing description.
>
>> You
>> allocate a small amount for the first few bytes. Then you use exponential
>> growth, with a factor of ether 2 or 1.5. My question is whether or not we
>> can be cuter. And of course we need to know the statistical distribution of
>> the input files. And I'm assuming a semi-normal distribution, ignoring the
>> files with small values, which we will allocate enough for anyway.
>>
>> And so we integrate the distribution between the point we are at and
>> infinity. Then we tkae the mean. And that gives us a best estimate of how
>> many bytes are to come, and therefore how much to grow the buffer by.
>
> I would be surprised if that were worth the effort at run time. A
> static analysis of "typical" input sizes might be interesting as that
> could be used to get an estimate of good factors to use, but anything
> more complicated than maybe a few factors (e.g. doubling up to 1MB then
> 3/2 thereafter) is likely to be too messy to useful.
>
There's virualy no run-time effort, unless you ask caller to pass in a
customised distribution, which you analyse on the fly, which would be
quite a bit of work.
All the work is done beforehand. We need a statistical distribution of
the files sizes of the files we are interesed in. So, probably, text
files on personal computers. Then we'll excude the small files, say
under 1kb which will have an odd distribution for various reasons, and
which we are not interested in as we can easily afford 1k as an initial
buffer.

And we're probably looking at a semi- normal, maybe log- normal
distribution. There's no reason to suspect it would be anything odd. And
with the normal distribution there is no closed form integral, but
tables of integrals are published.

So we convert 1K to a Z-score, integrate from that to infinity, halve
the result, and that gives us an estimate of the most likely file size -
having established that the file is over 1k, half will be below and half
above this size. So that's the next amount to realloc. Say, for the sake
of argument, 4K. Then we do the same thing, starting from 4k, and
working out the most likely file size, given that the file is over 4K.
Now the disribution tends to flatten out towards the tail, so if best
guess, given at least 1K, was 4K, best guess diven 4k, won't be 8K. It
will be 10k, maybe 12k. Do the same again for 12k. And we'll get a
series of numbers like this.

1k, 4k, 10k, 20k, 50k, 120k, 500k, 2MB, 8MB ...

and so on, rapidly increasing to SIZE_MAX. And then at runtime we just
hardcode those in, it's a lookup table with not too many entries.

Becuase we've chosen the mean, half the time you will reallocate. You
can easily fine tune the strategy by choosing a proportion other than
0.5, depending on whether saving memory or reducing allocations is more
important to you.

and the hard part is getting some real statistics to work on.
>
> Also, the cost of reallocations is not constant. Larger ones are
> usually more costly than small ones, so if one were going to a lot of
> effort to make run-time guesses, that cost should be factored in as
> well.
>
Unfortunately yes. Real optimisation problems can be almost impossible
for reasons like this. iF the cost of reallocations isn't constant,
tou've got to put in correctiong factors, and then what was a fairly
simple procedure becomes difficult.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Scott Lurndal
Newsgroups: comp.lang.c
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 16:50 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!tncsrv06.tnetconsulting.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Newsgroups: comp.lang.c
References: <v4ojs8$gvji$1@dont-email.me>
Lines: 16
Message-ID: <3PZbO.32803$Kxzd.19726@fx15.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 17 Jun 2024 16:50:07 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 16:50:07 GMT
X-Received-Bytes: 1511
View all headers

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>In a recent thread realloc() was a substantial part of the discussion.
>"Occasionally" the increased data storage will be relocated along
>with the previously stored data. On huge data sets that might be a
>performance factor. Is there any experience or are there any concrete
>factors about the conditions when this relocation happens? - I could
>imagine that it's no issue as long as you're in some kB buffer range,
>but if, say, we're using realloc() to substantially increase buffers
>often it might be an issue to consider. It would be good to get some
>feeling about that internal.

I've not found a use for realloc in the last forty five years, myself.

I suspect that the performance issues are not an issue for relatively
small datasets, and are often exhibited during the non-performance critical
'setup' phase of an algorithm.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Scott Lurndal
Newsgroups: comp.lang.c
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 16:58 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Newsgroups: comp.lang.c
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk> <v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk> <v4pb4v$lhgk$1@dont-email.me>
Lines: 27
Message-ID: <gXZbO.32804$Kxzd.2258@fx15.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 17 Jun 2024 16:58:52 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 16:58:52 GMT
X-Received-Bytes: 1660
View all headers

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>On 17/06/2024 10:55, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>

>>
>> I have no idea what you are talking about. What "value" are you looking
>> to calculate?
>>
>We have a continuously growing buffer,

At this point, you should be asking yourself
if there are better alternatives for storing
the incoming data than to a continuously growing
dynamically allocated piecemeal buffer.

C character stdio tends to work well for streaming applications
(i.e. pipelines where the input is (minimally) processed and forwarded
to the output), but not so efficiently for applications that need to
look at the data en masse.

Personnally, I'd mmap the input file and eschew stdio completely
and just walk through memory with the appropriate pointer.

(mmap showed up in the late 80s, so you can pretend it
is C90 if you like).

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Michael S
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 17:20 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 20:20:57 +0300
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <20240617202057.000001a2@yahoo.com>
References: <v4ojs8$gvji$1@dont-email.me>
<3PZbO.32803$Kxzd.19726@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 19:20:40 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="21c0d5cac32843964269a269a3ca89e9";
logging-data="607686"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/CEuwtSPRezCEzXKVecOH4AUbxt33hI/E="
Cancel-Lock: sha1:/mvWFAxutm96jFpyloNuDudqGNs=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
View all headers

On Mon, 17 Jun 2024 16:50:07 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
> >In a recent thread realloc() was a substantial part of the
> >discussion. "Occasionally" the increased data storage will be
> >relocated along with the previously stored data. On huge data sets
> >that might be a performance factor. Is there any experience or are
> >there any concrete factors about the conditions when this relocation
> >happens? - I could imagine that it's no issue as long as you're in
> >some kB buffer range, but if, say, we're using realloc() to
> >substantially increase buffers often it might be an issue to
> >consider. It would be good to get some feeling about that internal.
>
> I've not found a use for realloc in the last forty five years, myself.
>

Did you find use for std::vector:resize()?
If yes, that could be major reason behind not finding use for realloc().
Another possible reason is coding for environments where dynamic
allocation either not used at all or used only during start up.

At least for me those are major reasons why I very rarely used realloc
since beginning of programming as a pro.

> I suspect that the performance issues are not an issue for relatively
> small datasets, and are often exhibited during the non-performance
> critical 'setup' phase of an algorithm.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: David Brown
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 18:11 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 20:11:48 +0200
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <v4pu94$rlv8$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 17 Jun 2024 20:11:48 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c4e4b5756e2db6b50d94114c01c52c76";
logging-data="907240"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+vZHzQ924OK8Uhg3e+SZwoh06+tm9mgHw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:BJ2wRFizYpsBjMf9s254l3077tk=
In-Reply-To: <v4ovqf$j5hq$1@dont-email.me>
Content-Language: en-GB
View all headers

On 17/06/2024 11:31, Malcolm McLean wrote:
> On 17/06/2024 10:18, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>
>>> In a recent thread realloc() was a substantial part of the discussion.
>>> "Occasionally" the increased data storage will be relocated along
>>> with the previously stored data. On huge data sets that might be a
>>> performance factor. Is there any experience or are there any concrete
>>> factors about the conditions when this relocation happens? - I could
>>> imagine that it's no issue as long as you're in some kB buffer range,
>>> but if, say, we're using realloc() to substantially increase buffers
>>> often it might be an issue to consider. It would be good to get some
>>> feeling about that internal.
>>
>> There is obviously a cost, but there is (usually) no alternative if
>> contiguous storage is required.  In practice, the cost is usually
>> moderate and can be very effectively managed by using an exponential
>> allocation scheme: at every reallocation multiply the storage space by
>> some factor greater than 1 (I often use 3/2, but doubling is often used
>> as well).  This results in O(log(N)) rather than O(N) allocations as in
>> your code that added a constant to the size.  Of course, some storage is
>> wasted (that /might/ be retrieved by a final realloc down to the final
>> size) but that's rarely significant.
>>
> So can we work it out?
>
> Let's assume for the moment that the allocations have a semi-normal
> distribution, with negative values disallowed. Now ignoring the first
> few values, if we have allocated, say, 1K, we ought to be able to
> predict the value by integrating the distribution from 1k to infinity
> and taking the mean.
>

First, there is no reason for assuming such a distribution, other than
saying "lots of things are roughly normal".

Secondly, knowing the distribution gives you /no/ information about any
given particular case. You know the distribution for the results of
rolling two die - does that mean you can predict the next roll?

Thirdly, not all distributions have a mean (look up the Cauchy
distribution if you like).

Fourthly, even if you know the mean, it tells you nothing of use.

Knowing a bit about the distribution of file sizes can be useful, but
not nearly in the way you describe here. If you know that the files are
rarely or never bigger than 10 MB, malloc 10 MB and forget the realloc.
If you know they are often bigger than that, mmap the file and forget
the realloc.

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Scott Lurndal
Newsgroups: comp.lang.c
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 19:02 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!tncsrv06.tnetconsulting.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx34.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Newsgroups: comp.lang.c
References: <v4ojs8$gvji$1@dont-email.me> <3PZbO.32803$Kxzd.19726@fx15.iad> <20240617202057.000001a2@yahoo.com>
Lines: 32
Message-ID: <VK%bO.4804$Gurd.1876@fx34.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 17 Jun 2024 19:02:13 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 17 Jun 2024 19:02:13 GMT
X-Received-Bytes: 2137
View all headers

Michael S <already5chosen@yahoo.com> writes:
>On Mon, 17 Jun 2024 16:50:07 GMT
>scott@slp53.sl.home (Scott Lurndal) wrote:
>
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>> >In a recent thread realloc() was a substantial part of the
>> >discussion. "Occasionally" the increased data storage will be
>> >relocated along with the previously stored data. On huge data sets
>> >that might be a performance factor. Is there any experience or are
>> >there any concrete factors about the conditions when this relocation
>> >happens? - I could imagine that it's no issue as long as you're in
>> >some kB buffer range, but if, say, we're using realloc() to
>> >substantially increase buffers often it might be an issue to
>> >consider. It would be good to get some feeling about that internal.
>>
>> I've not found a use for realloc in the last forty five years, myself.
>>
>
>Did you find use for std::vector:resize()?

I'm pretty sure (checks) that I posted this reply to comp.lang.c.

std::vector::resize() doesn't work well from C (well, I can mangle
the names and use an explicit this pointer, but why bother?).

>If yes, that could be major reason behind not finding use for realloc().
>Another possible reason is coding for environments where dynamic
>allocation either not used at all or used only during start up.

Or because the algorithms used don't call for realloc. Or there
are better alternatives (like mmap).

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Chris M. Thomasson
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 20:05 UTC
References: 1 2 3 4 5 6 7
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 13:05:23 -0700
Organization: A noiseless patient Spider
Lines: 142
Message-ID: <v4q4u4$t6bd$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me> <87tthrvdtg.fsf@bsb.me.uk>
<v4pjuf$n98h$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 17 Jun 2024 22:05:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c5aa21a2ffdc11ceb19a167156cbaefd";
logging-data="956781"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+G9Kcch5a33ZEK89uiG62rx2EmqOyiyBc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:vfJQnXZdYe9Frj3PnfnoMFrydrA=
In-Reply-To: <v4pjuf$n98h$1@dont-email.me>
Content-Language: en-US
View all headers

On 6/17/2024 8:15 AM, Richard Harnden wrote:
> On 17/06/2024 15:33, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>
>>> On 17/06/2024 10:55, Ben Bacarisse wrote:
>>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>>
>>>>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>>>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>>
>>>>>>> In a recent thread realloc() was a substantial part of the
>>>>>>> discussion.
>>>>>>> "Occasionally" the increased data storage will be relocated along
>>>>>>> with the previously stored data. On huge data sets that might be a
>>>>>>> performance factor. Is there any experience or are there any
>>>>>>> concrete
>>>>>>> factors about the conditions when this relocation happens? - I could
>>>>>>> imagine that it's no issue as long as you're in some kB buffer
>>>>>>> range,
>>>>>>> but if, say, we're using realloc() to substantially increase buffers
>>>>>>> often it might be an issue to consider. It would be good to get some
>>>>>>> feeling about that internal.
>>>>>> There is obviously a cost, but there is (usually) no alternative if
>>>>>> contiguous storage is required.  In practice, the cost is usually
>>>>>> moderate and can be very effectively managed by using an exponential
>>>>>> allocation scheme: at every reallocation multiply the storage
>>>>>> space by
>>>>>> some factor greater than 1 (I often use 3/2, but doubling is often
>>>>>> used
>>>>>> as well).  This results in O(log(N)) rather than O(N) allocations
>>>>>> as in
>>>>>> your code that added a constant to the size.  Of course, some
>>>>>> storage is
>>>>>> wasted (that /might/ be retrieved by a final realloc down to the
>>>>>> final
>>>>>> size) but that's rarely significant.
>>>>>>
>>>>> So can we work it out?
>>>> What is "it"?
>>>>
>>>>> Let's assume for the moment that the allocations have a semi-normal
>>>>> distribution,
>>>> What allocations?  The allocations I talked about don't have that
>>>> distribution.
>>>>
>>>>> with negative values disallowed. Now ignoring the first few
>>>>> values, if we have allocated, say, 1K, we ought to be able to
>>>>> predict the
>>>>> value by integrating the distribution from 1k to infinity and
>>>>> taking the
>>>>> mean.
>>>> I have no idea what you are talking about.  What "value" are you
>>>> looking
>>>> to calculate?
>>>>
>>> We have a continuously growing buffer, and we want the best strategy for
>>> reallocations as the stream of characters comes at us. So, given we
>>> now how
>>> many characters have arrived, can we predict how many will arrive, and
>>> therefore ask for the best amount when we reallocate, so that we neither
>>> make too many reallocation (reallocate on every byte received) or ask
>>> for
>>> too much (demand SIZE_MAX memory when the first byte is received).?
>>
>> Obviously not, or we'd use the prediction.  You question was probably
>> rhetorical, but it didn't read that way.
>>
>>> Your strategy for avoiding these extremes is exponential growth.
>>
>> It's odd to call it mine.  It's very widely know and used.  "The one I
>> mentioned" might be less confusing description.
>>
>>> You
>>> allocate a small amount for the first few bytes. Then you use
>>> exponential
>>> growth, with a factor of ether 2 or 1.5. My question is whether or
>>> not we
>>> can be cuter. And of course we need to know the statistical
>>> distribution of
>>> the input files. And I'm assuming a semi-normal distribution,
>>> ignoring the
>>> files with small values, which we will allocate enough for anyway.
>>>
>>> And so we integrate the distribution between the point we are at and
>>> infinity. Then we tkae the mean. And that gives us a best estimate of
>>> how
>>> many bytes are to come, and therefore how much to grow the buffer by.
>>
>> I would be surprised if that were worth the effort at run time.  A
>> static analysis of "typical" input sizes might be interesting as that
>> could be used to get an estimate of good factors to use, but anything
>> more complicated than maybe a few factors (e.g. doubling up to 1MB then
>> 3/2 thereafter) is likely to be too messy to useful.
>>
>> Also, the cost of reallocations is not constant.  Larger ones are
>> usually more costly than small ones, so if one were going to a lot of
>> effort to make run-time guesses, that cost should be factored in as
>> well.
>>
>
> I usually keep track:
>
> struct
> {
>     size_t used;
>     size_t allocated;
>     void *data;
> };
>
> Then, if used + new_size is more than what's already been allocated then
> a realloc will be required.

Fwiw, I remember way back using an n-ary tree of regions to accomplish
it. The trigger for creating a new region was very similar to your
logic. It performed pretty good and did not use realloc. Indeed it was
for a special use case. I remember having region partitions that would
link to other regions. Actually, it kind of reminded me of a strange
version of ropes:

https://en.wikipedia.org/wiki/Rope_(data_structure)

Fwiw, here is my old C code for a region:

https://pastebin.com/raw/f37a23918
(raw text, no ads)

Iirc, I first mentioned it in:

https://groups.google.com/g/comp.lang.c/c/7oaJFWKVCTw/m/sSWYU9BUS_QJ

>
> Start with an initial allocated size that's 'resonable' - the happy path
> will never need any reallocs.
>
> Otherwise multiply by some factor.  Typicall I just double it.
>
>
>

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Chris M. Thomasson
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 20:12 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Mon, 17 Jun 2024 13:12:54 -0700
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <v4q5c7$t6bd$2@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me> <gXZbO.32804$Kxzd.2258@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 22:12:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c5aa21a2ffdc11ceb19a167156cbaefd";
logging-data="956781"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19R5y7GSA0vK7hlE5IobJSnvtEbXDc1BUQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:DPM1zLWSVFJujfc2Ft7sxTt8bas=
Content-Language: en-US
In-Reply-To: <gXZbO.32804$Kxzd.2258@fx15.iad>
View all headers

On 6/17/2024 9:58 AM, Scott Lurndal wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>> On 17/06/2024 10:55, Ben Bacarisse wrote:
>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>
>
>>>
>>> I have no idea what you are talking about. What "value" are you looking
>>> to calculate?
>>>
>> We have a continuously growing buffer,
>
> At this point, you should be asking yourself
> if there are better alternatives for storing
> the incoming data than to a continuously growing
> dynamically allocated piecemeal buffer.
>
> C character stdio tends to work well for streaming applications
> (i.e. pipelines where the input is (minimally) processed and forwarded
> to the output), but not so efficiently for applications that need to
> look at the data en masse.

Indeed.

> Personnally, I'd mmap the input file and eschew stdio completely
> and just walk through memory with the appropriate pointer.

That works for sure. Actually, its been a while since I have used memory
mapped files. I made a little database thing with them that multiple
processes could use, lock-free... ;^)

Also, I remember a nifty trick to use memory mapped files with certain
lock-free algorithms. I need to find that old post over on
comp.programming.threads. Have a feeling that its going to be hard to find!

> (mmap showed up in the late 80s, so you can pretend it
> is C90 if you like).

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Keith Thompson
Newsgroups: comp.lang.c
Organization: None to speak of
Date: Mon, 17 Jun 2024 23:39 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 16:39:38 -0700
Organization: None to speak of
Lines: 36
Message-ID: <87h6dr16md.fsf@nosuchdomain.example.com>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Tue, 18 Jun 2024 01:39:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="aed299878570cb32e21d076f9aa05b90";
logging-data="1021804"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18vKDNGLP35MD0XerET/ycB"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:GkmtLc3j2YrjEpiDJo0z/fcUCxc=
sha1:Wj3JtfDwnhVNU2WpzKIMs699QaM=
View all headers

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
> We have a continuously growing buffer, and we want the best strategy
> for reallocations as the stream of characters comes at us. So, given
> we now how many characters have arrived, can we predict how many will
> arrive, and therefore ask for the best amount when we reallocate, so
> that we neither make too many reallocation (reallocate on every byte
> received) or ask for too much (demand SIZE_MAX memory when the first
> byte is received).?
[...]

That's going to depend on the behavior of the implementations
realloc() function.

malloc() and realloc() typically allocate more memory than they're
asked for, so that another realloc() call can often expand the
allocated memory in place.

I have a small test program that uses realloc() to expand a 1-byte
allocated buffer to 2 bytes, then 3, then 4, and so on. In one
implementation, in one test run, it reallocates at 25 bytes, and
then not again until just over 128 kbytes. Other implementations
behave differently.

A realloc() call that's able to return its first argument is likely
to be much quicker than one that does an actual reallocation.
If you want to fine tune for performance, you'll have to allow
for the implementation you're using, either by performing run-time
measurements or by analyzing the implementation (which could change
in the next release).

Multiplying the size by 2 or 1.5 as needed is likely to be good enough.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Malcolm McLean
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Tue, 18 Jun 2024 05:12 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: malcolm.arthur.mclean@gmail.com (Malcolm McLean)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about
relocation?
Date: Tue, 18 Jun 2024 06:12:36 +0100
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <v4r504$16npo$1@dont-email.me>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk>
<v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk>
<v4pb4v$lhgk$1@dont-email.me> <87h6dr16md.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 18 Jun 2024 07:12:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="cba6eb24be5a0db449e57ccf4e727e7c";
logging-data="1269560"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Ad20qp3LCI+KWC9pBx6yGe3EEimANu+4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4yZS0NC2wXBXxP72OJStLtEYbM4=
In-Reply-To: <87h6dr16md.fsf@nosuchdomain.example.com>
Content-Language: en-GB
View all headers

On 18/06/2024 00:39, Keith Thompson wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> [...]
>> We have a continuously growing buffer, and we want the best strategy
>> for reallocations as the stream of characters comes at us. So, given
>> we now how many characters have arrived, can we predict how many will
>> arrive, and therefore ask for the best amount when we reallocate, so
>> that we neither make too many reallocation (reallocate on every byte
>> received) or ask for too much (demand SIZE_MAX memory when the first
>> byte is received).?
> [...]
>
> That's going to depend on the behavior of the implementations
> realloc() function.
>
> malloc() and realloc() typically allocate more memory than they're
> asked for, so that another realloc() call can often expand the
> allocated memory in place.
>
> I have a small test program that uses realloc() to expand a 1-byte
> allocated buffer to 2 bytes, then 3, then 4, and so on. In one
> implementation, in one test run, it reallocates at 25 bytes, and
> then not again until just over 128 kbytes. Other implementations
> behave differently.
>
> A realloc() call that's able to return its first argument is likely
> to be much quicker than one that does an actual reallocation.
> If you want to fine tune for performance, you'll have to allow
> for the implementation you're using, either by performing run-time
> measurements or by analyzing the implementation (which could change
> in the next release).
>
> Multiplying the size by 2 or 1.5 as needed is likely to be good enough.
>
A small test program can hog all of memory for itself, and so isn't a
good test. What matters is where you're running something like a video
game, and the artists stuff the comupter's memory full of artwork until
it runs out, and then slowly and reluctantly remove their masterworks
until the other routines have enough memory to run. So the game won't
run out of memory, but only just. It's very tight.

--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
From: Tim Rentsch
Newsgroups: comp.lang.c
Organization: A noiseless patient Spider
Date: Tue, 18 Jun 2024 07:09 UTC
References: 1 2 3 4 5 6 7
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Tue, 18 Jun 2024 00:09:24 -0700
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <86r0cuk9qz.fsf@linuxsc.com>
References: <v4ojs8$gvji$1@dont-email.me> <875xu8vsen.fsf@bsb.me.uk> <v4ovqf$j5hq$1@dont-email.me> <87zfrjvqp6.fsf@bsb.me.uk> <v4pb4v$lhgk$1@dont-email.me> <87tthrvdtg.fsf@bsb.me.uk> <20240617181034.74fb4cca1f4a9a3ea032825e@g{oogle}mail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Tue, 18 Jun 2024 09:09:43 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d4aeba7b4aaab0d94b94221c1561c3e8";
logging-data="1308441"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+yOLNC0RjDvqKnhVRuT8FdbZHBFE8qdA4="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:zzu+i8AmkwEyN6BxS0HlLQGwnC0=
sha1:i53ICkcLz00HEMeAsaGS7HB40Vs=
View all headers

Anton Shepelev <anton.txt@g{oogle}mail.com> writes:

> Ben Bacarisse to Malcolm McLean:
>
>> [next is a comment from Malcolm]
>>
>>> Your strategy for avoiding these extremes is exponential
>>> growth.
>>
>> It's odd to call it mine. It's very widely know and used.
>> "The one I mentioned" might be less confusing description.
>
> I think it is a modern English idiom, which I dislike as
> well. StackOverflow is full of questions starting like:
> "How do you do this?" and "How do I do that?" They are
> informal ways of the more literary "How does one do this?"
> or "What is the way to do that?"

I have a different take here. First the "your" of "your
strategy" reads as a definite pronoun, meaning it refers
specifically to Ben and not to some unknown other party.
(And incidentally is subtly insulting because of that,
whether it was meant that way or not.)

Second the use of "you" to mean an unspecified other person
is not idiom but standard usage. The word "you" is both a
definite pronoun and an indefinite pronoun, depending on
context. The word "they" also has this property. Consider
these two examples:

The bank downtown was robbed. They haven't been caught
yet.

They say the sheriff isn't going to run for re-election.

In the first example "they" is a definite pronoun, referring
to the people who robbed the bank. In the second example,
"they" is an indefinite pronoun, referring to unspecified
people in general (perhaps but not necessarily everyone).
The word "you" is similar: it can mean specifically the
listener, or it can mean generically anyone in a broader
audience, even those who never hear or read the statement
with "you" in it.

The word "one" used as a pronoun is more formal, and to me
at least often sounds stilted. In US English "one" is most
often an indefinite pronoun, either second person or third
person. But "one" can also be used as a first person
definite pronoun (referring to the speaker), which an online
reference tells me is chiefly British English. (I would
guess that this usage predominates in "the Queen's English"
dialect of English, but I have very little experience in
such things.)

Finally I would normally read "I" as a first person definite
pronoun, and not an indefinite pronoun. So I don't have any
problem with someone saying "how should I ..." when asking
for advice. They aren't asking how someone else should ...
but how they should ..., and what advice I might give could
very well depend on who is doing the asking.

Pages:1234

rocksolid light 0.9.8
clearnet tor