Rocksolid Light




Subject (Author)
* Script to conditionally find and compress files recursively (J Newman)
+- Re: Script to conditionally find and compress files recursively (J Newman)
+* Re: Script to conditionally find and compress files recursively (Grant Taylor)
|+* Re: Script to conditionally find and compress files recursively (Richard Kettlewell)
||`* Re: Script to conditionally find and compress files recursively (D)
|| `* Re: Script to conditionally find and compress files recursively (J Newman)
||  `* Re: Script to conditionally find and compress files recursively (D)
||   `* Re: Script to conditionally find and compress files recursively (Grant Taylor)
||    `- Re: Script to conditionally find and compress files recursively (D)
|`* Re: Script to conditionally find and compress files recursively (J Newman)
| `* Re: Script to conditionally find and compress files recursively (Anssi Saari)
|  +* Re: Script to conditionally find and compress files recursively (Computer Nerd Kev)
|  |`- Re: Script to conditionally find and compress files recursively (Computer Nerd Kev)
|  `- Re: Script to conditionally find and compress files recursively (D)
+- Re: Script to conditionally find and compress files recursively (Joe Beanfish)
`- Re: Script to conditionally find and compress files recursively (D)

Subject: Script to conditionally find and compress files recursively
From: J Newman
Newsgroups: comp.os.linux.misc
Organization: A noiseless patient Spider
Date: Tue, 11 Jun 2024 06:53 UTC
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jenniferkatenewman@gmail.com (J Newman)
Newsgroups: comp.os.linux.misc
Subject: Script to conditionally find and compress files recursively
Date: Tue, 11 Jun 2024 14:53:27 +0800
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <v48s96$u6fg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 11 Jun 2024 08:53:26 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="266e61b684cc2eb7051d022a594540be";
logging-data="989680"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Qh3dnJcOnpFJRpGSU+y1j7y81tZrGwEdUFh/jrlgoow=="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:UNhoIlSc/v2g2L6b/S499Y9m2N0=
Content-Language: en-US

Hi, I'm interested in writing a script that will:

1. Find and compress files recursively
2. After the first 5 seconds of compressing, if the compression ratio >1
(i.e. the compressed file will be larger than the uncompressed file), it
tries another compression algorithm.
3. If the other compression algorithm still has a ratio >1, it tries
another algorithm, until a list is exhausted.
4. If the list is exhausted, it skips compressing that file.

Any suggestions on how to proceed?

Subject: Re: Script to conditionally find and compress files recursively
From: D
Newsgroups: comp.os.linux.misc
Organization: i2pn2 (i2pn.org)
Date: Tue, 11 Jun 2024 08:51 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!panix!weretis.net!feeder9.news.weretis.net!i2pn.org!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: nospam@example.net (D)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Tue, 11 Jun 2024 10:51:45 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <2e0ae86d-ae03-5231-b2c3-1da13d22de72@example.net>
References: <v48s96$u6fg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Injection-Info: i2pn2.org;
logging-data="3844917"; mail-complaints-to="usenet@i2pn2.org";
posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
In-Reply-To: <v48s96$u6fg$1@dont-email.me>
X-Spam-Checker-Version: SpamAssassin 4.0.0

On Tue, 11 Jun 2024, J Newman wrote:

> Hi, I'm interested in writing a script that will:
>
> 1. Find and compress files recursively
> 2. After the first 5 seconds of compressing, if the compression ratio >1
> (i.e. the compressed file will be larger than the uncompressed file), it
> tries another compression algorithm.
> 3. If the other compression algorithm still has a ratio >1, it tries another
> algorithm, until a list is exhausted.
> 4. If the list is exhausted, it skips compressing that file.
>
> Any suggestions on how to proceed?
>

It is difficult to estimate the compression ratio without analyzing the
entire file. In theory you could make a guess based on the file type, but
that's the best I can come up with.

Subject: Re: Script to conditionally find and compress files recursively
From: Joe Beanfish
Newsgroups: comp.os.linux.misc
Organization: A noiseless patient Spider
Date: Tue, 11 Jun 2024 14:58 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: joebeanfish@nospam.duh (Joe Beanfish)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Tue, 11 Jun 2024 14:58:23 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <v49omf$12c3q$1@dont-email.me>
References: <v48s96$u6fg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 Jun 2024 16:58:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b9582056cd300859638960b2ad17292e";
logging-data="1126522"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AEF37CrBSy2PqgpJNxVFWR7HwjMl4xa4="
User-Agent: Pan/0.146 (Hic habitat felicitas; 8107378
git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:yKWB5Aej1ArX0+mCfegxqAaElC8=

On Tue, 11 Jun 2024 14:53:27 +0800, J Newman wrote:

> Hi, I'm interested in writing a script that will:
>
> 1. Find and compress files recursively
> 2. After the first 5 seconds of compressing, if the compression ratio >1
> (i.e. the compressed file will be larger than the uncompressed file), it
> tries another compression algorithm.
> 3. If the other compression algorithm still has a ratio >1, it tries
> another algorithm, until a list is exhausted.
> 4. If the list is exhausted, it skips compressing that file.
>
> Any suggestions on how to proceed?

You could use dd to extract a representative chunk of the file to
compress and compare size.

uncompressedsize=$(dd status=none if="$file" bs=1M count=1|wc -c)
compressedsize=$(dd status=none if="$file" bs=1M count=1|$compresscmd|wc -c)

You could get fancy and try all the compression commands you have
and pick the one with smallest output for the actual compression.
That's all assuming the beginning of the file is representative of
the content throughout. If it's not, no way to tell without compressing
the whole thing.
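
A minimal sketch of that sampling idea, for illustration only. The names `sample_ratio_pct` and `SAMPLE_BYTES` are made up here and do not come from the post. The compressor is whatever command is passed in, and it is assumed to read stdin and write stdout (for example `gzip -c`). The function prints the compressed sample size as a percentage of the raw sample size, so a result at or above 100 suggests that tool will not shrink the file.

```shell
# Sketch: estimate compressibility from the first chunk of a file.
# SAMPLE_BYTES and sample_ratio_pct are hypothetical names.
SAMPLE_BYTES=1048576   # 1 MiB, matching bs=1M count=1 in the post

# Usage: sample_ratio_pct FILE COMPRESSOR [ARGS...]
# Prints compressed size as a percentage of the sampled size.
sample_ratio_pct() {
    local file="$1"
    shift
    local raw packed
    # Size of the sample actually read (smaller than SAMPLE_BYTES for small files)
    raw=$(dd if="$file" bs="$SAMPLE_BYTES" count=1 2>/dev/null | wc -c)
    if [ "$raw" -eq 0 ]; then
        echo 100    # empty file: nothing to gain
        return 0
    fi
    # Size of the same sample after piping it through the compressor
    packed=$(dd if="$file" bs="$SAMPLE_BYTES" count=1 2>/dev/null | "$@" | wc -c)
    echo $(( packed * 100 / raw ))
}
```

Something like `sample_ratio_pct video.avi xz -c` could then be compared across several compressors before committing to a full run. As the post says, this only helps if the start of the file is representative.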

Subject: Re: Script to conditionally find and compress files recursively
From: Grant Taylor
Newsgroups: comp.os.linux.misc
Organization: TNet Consulting
Date: Wed, 12 Jun 2024 03:21 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!tncsrv06.tnetconsulting.net!tncsrv09.home.tnetconsulting.net!.POSTED.omega.home.tnetconsulting.net!not-for-mail
From: gtaylor@tnetconsulting.net (Grant Taylor)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Tue, 11 Jun 2024 22:21:00 -0500
Organization: TNet Consulting
Message-ID: <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
References: <v48s96$u6fg$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 12 Jun 2024 03:21:00 -0000 (UTC)
Injection-Info: tncsrv09.home.tnetconsulting.net; posting-host="omega.home.tnetconsulting.net:198.18.1.11";
logging-data="7601"; mail-complaints-to="newsmaster@tnetconsulting.net"
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <v48s96$u6fg$1@dont-email.me>

On 6/11/24 01:53, J Newman wrote:
> Any suggestions on how to proceed?

As others have said, it's very difficult to tell within the first five
seconds what the ultimate compression ratio will be.

If you have the disk space, compress using all of the compression
options and then remove all but the smallest file.

Then go on to the next file.
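
A rough sketch of this brute-force approach, under some assumptions: the helper name `keep_smallest` is invented for illustration, the tool list (gzip, bzip2, xz) is just an example, and tools that are not installed are skipped. It leaves the original file in place and prints the name of the surviving compressed file, or returns non-zero if no tool beat the original size.

```shell
# Sketch: compress FILE with each available tool, keep only the smallest result.
# keep_smallest is a hypothetical helper; the tool list is an assumption.
keep_smallest() {
    local file="$1"
    local tool out size best best_size
    best=""
    best_size=$(wc -c < "$file")      # a candidate must beat the original size
    for tool in gzip bzip2 xz; do
        command -v "$tool" >/dev/null 2>&1 || continue
        case $tool in
            gzip)  out="$file.gz" ;;
            bzip2) out="$file.bz2" ;;
            xz)    out="$file.xz" ;;
        esac
        # Note: an existing file with the same name as $out is overwritten
        if ! "$tool" -c < "$file" > "$out"; then
            rm -f "$out"
            continue
        fi
        size=$(wc -c < "$out")
        if [ "$size" -lt "$best_size" ]; then
            # New winner: drop the previous best, remember this one
            if [ -n "$best" ]; then rm -f "$best"; fi
            best=$out
            best_size=$size
        else
            rm -f "$out"
        fi
    done
    if [ -n "$best" ]; then
        printf '%s\n' "$best"
    else
        return 1
    fi
}
```

This costs one full compression pass per tool, so it trades CPU time and temporary disk space for a guaranteed best pick among the tools tried.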

--
Grant. . . .

Subject: Re: Script to conditionally find and compress files recursively
From: Richard Kettlewell
Newsgroups: comp.os.linux.misc
Organization: terraraq NNTP server
Date: Wed, 12 Jun 2024 07:17 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.gegeweb.eu!gegeweb.org!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: invalid@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Wed, 12 Jun 2024 08:17:11 +0100
Organization: terraraq NNTP server
Message-ID: <wwvo7868waw.fsf@LkoBDZeT.terraraq.uk>
References: <v48s96$u6fg$1@dont-email.me>
<v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="49231"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:oF/uCnMFUHnl1ia3hSXpSchuvrc=
X-Boydie: NO

Grant Taylor <gtaylor@tnetconsulting.net> writes:
> On 6/11/24 01:53, J Newman wrote:
>> Any suggestions on how to proceed?
>
> As others have said, it's very difficult to tell within the first five
> seconds what the ultimate compression ratio will be.

Not just difficult but impossible in general: the input file could
change character in its second half, switching the overall result from
one that is (for example) a gzip win to an xz win.

--
https://www.greenend.org.uk/rjk/

Subject: Re: Script to conditionally find and compress files recursively
From: D
Newsgroups: comp.os.linux.misc
Organization: i2pn2 (i2pn.org)
Date: Wed, 12 Jun 2024 08:13 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: nospam@example.net (D)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Wed, 12 Jun 2024 10:13:43 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <083d0e35-e02d-8668-726f-7aa89980e9b2@example.net>
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <wwvo7868waw.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Injection-Info: i2pn2.org;
logging-data="3943562"; mail-complaints-to="usenet@i2pn2.org";
posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
In-Reply-To: <wwvo7868waw.fsf@LkoBDZeT.terraraq.uk>
X-Spam-Checker-Version: SpamAssassin 4.0.0

On Wed, 12 Jun 2024, Richard Kettlewell wrote:

> Grant Taylor <gtaylor@tnetconsulting.net> writes:
>> On 6/11/24 01:53, J Newman wrote:
>>> Any suggestions on how to proceed?
>>
>> As others have said, it's very difficult to tell within the first five
>> seconds what the ultimate compression ratio will be.
>
> Not just difficult but impossible in general: the input file could
> change character in its second half, switching the overall result from
> one that is (for example) a gzip win to an xz win.
>
>

This is true! The only things I can imagine are parsing the file type and
drawing conclusions about the compressibility of the data from it, or
doing a (flawed) statistical analysis. But as said, the end could be
vastly different from the start.

Subject: Re: Script to conditionally find and compress files recursively
From: J Newman
Newsgroups: comp.os.linux.misc
Organization: A noiseless patient Spider
Date: Thu, 13 Jun 2024 04:43 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jenniferkatenewman@gmail.com (J Newman)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 12:43:43 +0800
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <v4dtdt$23kjq$1@dont-email.me>
References: <v48s96$u6fg$1@dont-email.me>
<v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 Jun 2024 06:43:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d01cce3e8d98c9e911112d89133c53f3";
logging-data="2216570"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+/0MLn+HayXMQBKbJB+nz83T42H1euwvN5+liNmKSoEQ=="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:OdSNP3yqHaGSejLLHbdMcKzM45Q=
In-Reply-To: <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
Content-Language: en-US

On 12/06/2024 11:21, Grant Taylor wrote:
> On 6/11/24 01:53, J Newman wrote:
>> Any suggestions on how to proceed?
>
> As others have said, it's very difficult to tell within the first five
> seconds what the ultimate compression ratio will be.
>
> If you have the disk space, compress using all of the compression
> options and then remove all but the smallest file.
>
> Then go on to the next file.
>
>
>

It's true that you cannot know for certain within the first 5 seconds
what the ultimate compression ratio will be. But in my experience
(compressing avi/mp4/mov files with lzma -9evv), the ratio after the
first 5 seconds predicts the ultimate ratio to within +/- 5% with a
high degree of confidence.

Subject: Re: Script to conditionally find and compress files recursively
From: J Newman
Newsgroups: comp.os.linux.misc
Organization: A noiseless patient Spider
Date: Thu, 13 Jun 2024 04:46 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jenniferkatenewman@gmail.com (J Newman)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 12:46:10 +0800
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <v4dtih$23kjq$2@dont-email.me>
References: <v48s96$u6fg$1@dont-email.me>
<v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
<wwvo7868waw.fsf@LkoBDZeT.terraraq.uk>
<083d0e35-e02d-8668-726f-7aa89980e9b2@example.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 Jun 2024 06:46:10 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d01cce3e8d98c9e911112d89133c53f3";
logging-data="2216570"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18a8gk9aLxpG2/w/6qLAD6ReWJOXQFkNF6YYudxfeyeQw=="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:nlX7IFLkZVQgSbsSYglMFiBUQyU=
In-Reply-To: <083d0e35-e02d-8668-726f-7aa89980e9b2@example.net>
Content-Language: en-US

On 12/06/2024 16:13, D wrote:
>
>
> On Wed, 12 Jun 2024, Richard Kettlewell wrote:
>
>> Grant Taylor <gtaylor@tnetconsulting.net> writes:
>>> On 6/11/24 01:53, J Newman wrote:
>>>> Any suggestions on how to proceed?
>>>
>>> As others have said, it's very difficult to tell within the first five
>>> seconds what the ultimate compression ratio will be.
>>
>> Not just difficult but impossible in general: the input file could
>> change character in its second half, switching the overall result from
>> one that is (for example) a gzip win to an xz win.
>>
>>
>
> This is true! The only things I can imagine are parsing the file type
> and drawing conclusions about the compressibility of the data from it,
> or doing a (flawed) statistical analysis. But as said, the end could
> be vastly different from the start.

OK good point...as mentioned elsewhere my experience is with compressing
video files with lzma.

But if we accept that the script will sometimes choose the wrong
compression algorithm, which would you suggest as the less error-prone
option: parsing the file type, or trying to compress each file for the
first 5 seconds?

Subject: Re: Script to conditionally find and compress files recursively
From: Anssi Saari
Newsgroups: comp.os.linux.misc
Organization: An impatient and LOUD arachnid
Date: Thu, 13 Jun 2024 07:13 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anssi.saari@usenet.mail.kapsi.fi (Anssi Saari)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 10:13:20 +0300
Organization: An impatient and LOUD arachnid
Lines: 14
Message-ID: <sm05xudwc1b.fsf@lakka.kapsi.fi>
References: <v48s96$u6fg$1@dont-email.me>
<v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
<v4dtdt$23kjq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Thu, 13 Jun 2024 09:13:20 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c845ab9ef829f5f32e27a601c461a126";
logging-data="2265331"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PYfbihKW1wDuhOycDzQpg"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:cpVSj2H6wHHLop4IWkzEi7cVXsU=
sha1:ONb3re38za9X4rjbxHK2czH1MFM=

J Newman <jenniferkatenewman@gmail.com> writes:

> It's true that you cannot tell within the first 5 seconds what the
> ultimate compression ratio will be, but it seems to me (from
> compressing avi/mp4/mov files with lzma -9evv) that you can tell
> within +/- 5% to a high degree of confidence, what the ultimate
> compression ratio will be given the first 5 seconds.

Well then, I believe the solution was already posted. Grab 5% of your
files with dd and see how it compresses.

I'm a little curious, what kind of space savings do you expect to get by
doing this? And wouldn't it make more sense to re-encode for lower
bitrate if space saving is your goal?

Subject: Re: Script to conditionally find and compress files recursively
From: D
Newsgroups: comp.os.linux.misc
Organization: i2pn2 (i2pn.org)
Date: Thu, 13 Jun 2024 09:55 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!panix!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: nospam@example.net (D)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 11:55:09 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <909e65ae-69f4-8619-e563-7d6565a48bc3@example.net>
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <v4dtdt$23kjq$1@dont-email.me> <sm05xudwc1b.fsf@lakka.kapsi.fi>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
Injection-Info: i2pn2.org;
logging-data="4054319"; mail-complaints-to="usenet@i2pn2.org";
posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
X-Spam-Checker-Version: SpamAssassin 4.0.0
In-Reply-To: <sm05xudwc1b.fsf@lakka.kapsi.fi>

On Thu, 13 Jun 2024, Anssi Saari wrote:

> J Newman <jenniferkatenewman@gmail.com> writes:
>
>> It's true that you cannot tell within the first 5 seconds what the
>> ultimate compression ratio will be, but it seems to me (from
>> compressing avi/mp4/mov files with lzma -9evv) that you can tell
>> within +/- 5% to a high degree of confidence, what the ultimate
>> compression ratio will be given the first 5 seconds.
>
> Well then, I believe the solution was already posted. Grab 5% of your
> files with dd and see how it compresses.
>
> I'm a little curious, what kind of space savings do you expect to get by
> doing this? And wouldn't it make more sense to re-encode for lower
> bitrate if space saving is your goal?
>

If it's about space saving, don't forget deduplication. Alternatively,
depending on your file system of choice, you could maybe use file system
functionality to save space as well. But caveat emptor: always have
off-site (or off-machine) backups.

Subject: Re: Script to conditionally find and compress files recursively
From: D
Newsgroups: comp.os.linux.misc
Organization: i2pn2 (i2pn.org)
Date: Thu, 13 Jun 2024 09:55 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: nospam@example.net (D)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 11:55:23 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <647f0226-265e-2757-bd2a-3aa89de38107@example.net>
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <wwvo7868waw.fsf@LkoBDZeT.terraraq.uk> <083d0e35-e02d-8668-726f-7aa89980e9b2@example.net> <v4dtih$23kjq$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
Injection-Info: i2pn2.org;
logging-data="4054350"; mail-complaints-to="usenet@i2pn2.org";
posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
X-Spam-Checker-Version: SpamAssassin 4.0.0
In-Reply-To: <v4dtih$23kjq$2@dont-email.me>

On Thu, 13 Jun 2024, J Newman wrote:

> On 12/06/2024 16:13, D wrote:
>>
>>
>> On Wed, 12 Jun 2024, Richard Kettlewell wrote:
>>
>>> Grant Taylor <gtaylor@tnetconsulting.net> writes:
>>>> On 6/11/24 01:53, J Newman wrote:
>>>>> Any suggestions on how to proceed?
>>>>
>>>> As others have said, it's very difficult to tell within the first five
>>>> seconds what the ultimate compression ratio will be.
>>>
>>> Not just difficult but impossible in general: the input file could
>>> change character in its second half, switching the overall result from
>>> one that is (for example) a gzip win to an xz win.
>>>
>>>
>>
>> This is true! The only things I can imagine are parsing the file type and
>> drawing conclusions about the compressibility of the data from it, or
>> doing a (flawed) statistical analysis. But as said, the end could be
>> vastly different from the start.
>
> OK good point...as mentioned elsewhere my experience is with compressing
> video files with lzma.
>
> But if we accept that the script will make mistakes sometimes in choosing the
> right algorithm for compression, do you suggest parsing the file type, or
> trying to compress each file for the first 5 seconds, as the option with the
> least errors in choosing the right compression algorithm?
>

Hmm, I'd say parsing file types first, and perhaps have a little database
that maps file type to compression algorithm, and if that doesn't yield
anything, proceed with "brute force".
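
One way such a "little database" might look, sketched as a plain text table inside the script. Everything here is illustrative: the table entries are guesses rather than measured recommendations, `TYPE_TABLE` and `pick_compressor` are invented names, and the lookup goes by file extension rather than by inspecting the file contents. Unknown extensions fall through to `brute-force`, which a caller would treat as "try everything".

```shell
# Sketch: map file extension -> compressor, with a brute-force fallback.
# TYPE_TABLE and pick_compressor are hypothetical; entries are examples only.
TYPE_TABLE='txt xz
log xz
csv xz
jpg skip
jpeg skip
mp4 skip
gz skip'

# Usage: pick_compressor PATH
# Prints the table entry for the file's extension, or "brute-force".
pick_compressor() {
    local name ext key tool
    name=${1##*/}                      # strip directories
    case $name in
        ?*.*) ext=${name##*.} ;;       # text after the last dot
        *)    ext= ;;                  # no extension, or a dotfile like .bashrc
    esac
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    while read -r key tool; do
        if [ "$key" = "$ext" ]; then
            echo "$tool"
            return 0
        fi
    done <<EOF
$TYPE_TABLE
EOF
    echo brute-force
}
```

Keeping the table as data rather than code means new types can be added without touching the lookup logic. Here `skip` is meant for formats that are already compressed.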

Subject: Re: Script to conditionally find and compress files recursively
From: Computer Nerd Kev
Newsgroups: comp.os.linux.misc
Organization: Ausics - https://newsgroups.ausics.net
Date: Thu, 13 Jun 2024 23:06 UTC
References: 1 2 3 4
Message-ID: <666b7b6c@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Script to conditionally find and compress files recursively
Newsgroups: comp.os.linux.misc
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <v4dtdt$23kjq$1@dont-email.me> <sm05xudwc1b.fsf@lakka.kapsi.fi>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 14 Jun 2024 09:06:21 +1000
Organization: Ausics - https://newsgroups.ausics.net
Lines: 30
X-Complaints: abuse@ausics.net
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.bbs.nz!news.ausics.net!not-for-mail

Anssi Saari <anssi.saari@usenet.mail.kapsi.fi> wrote:
> J Newman <jenniferkatenewman@gmail.com> writes:
>
>> It's true that you cannot tell within the first 5 seconds what the
>> ultimate compression ratio will be, but it seems to me (from
>> compressing avi/mp4/mov files with lzma -9evv) that you can tell
>> within +/- 5% to a high degree of confidence, what the ultimate
>> compression ratio will be given the first 5 seconds.
>
> Well then, I believe the solution was already posted. Grab 5% of your
> files with dd and see how it compresses.

The solution that I see grabs the first 1MB, but it would make more
sense to sample e.g. 1% of the file size at five places within the
file. For a 100MB file that means a 1MB sample size and a spacing of
100MB/5 = 20MB: use dd to grab one 1MB sample from the start of the
file, then four more at an offset that increases by 20MB each time.
Store these separately, compress them separately, then average the
compression ratios of all the samples.
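
A sketch of that multi-point sampling, with some liberties taken: the name `spread_ratio_pct` is invented, the samples are piped straight into the compressor instead of being stored separately, and the compressor is assumed to read stdin and write stdout. Setting dd's block size to the sample size (1% of the file) lets `skip` count in samples, so skipping 0, 20, 40, 60 and 80 blocks lands at 0%, 20%, 40%, 60% and 80% of the file.

```shell
# Sketch: average the compression ratio of five samples spread over the file.
# spread_ratio_pct is a hypothetical name.
# Usage: spread_ratio_pct FILE COMPRESSOR [ARGS...]
spread_ratio_pct() {
    local file="$1"
    shift
    local total chunk i raw packed sum n
    total=$(wc -c < "$file")
    chunk=$(( total / 100 ))           # each sample is about 1% of the file
    if [ "$chunk" -lt 1 ]; then
        echo 100                       # too small to sample; report "no gain"
        return 0
    fi
    sum=0
    n=0
    for i in 0 1 2 3 4; do
        # bs=chunk, so skip=i*20 starts the sample at i*20% of the file
        raw=$(dd if="$file" bs="$chunk" skip=$(( i * 20 )) count=1 2>/dev/null | wc -c)
        [ "$raw" -gt 0 ] || continue
        packed=$(dd if="$file" bs="$chunk" skip=$(( i * 20 )) count=1 2>/dev/null | "$@" | wc -c)
        sum=$(( sum + packed * 100 / raw ))
        n=$(( n + 1 ))
    done
    if [ "$n" -eq 0 ]; then
        echo 100
        return 0
    fi
    echo $(( sum / n ))                # mean of the per-sample percentages
}
```

For small files the samples get tiny and fixed container overhead dominates, so the estimate is only meaningful once each sample is reasonably large. For very large files dd will allocate a correspondingly large buffer, so capping the sample size may be worth considering.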

> I'm a little curious, what kind of space savings do you expect to get by
> doing this? And wouldn't it make more sense to re-encode for lower
> bitrate if space saving is your goal?

Maybe he's using lossless video compression? Otherwise yes it seems
like the wrong approach.

--
__ __
#_ < |\| |< _#

Subject: Re: Script to conditionally find and compress files recursively
From: Computer Nerd Kev
Newsgroups: comp.os.linux.misc
Organization: Ausics - https://newsgroups.ausics.net
Date: Fri, 14 Jun 2024 02:25 UTC
References: 1 2 3 4 5
Message-ID: <666baa01@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Script to conditionally find and compress files recursively
Newsgroups: comp.os.linux.misc
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <v4dtdt$23kjq$1@dont-email.me> <sm05xudwc1b.fsf@lakka.kapsi.fi> <666b7b6c@news.ausics.net>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 14 Jun 2024 12:25:06 +1000
Organization: Ausics - https://newsgroups.ausics.net
Lines: 26
X-Complaints: abuse@ausics.net
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.bbs.nz!news.ausics.net!not-for-mail

Computer Nerd Kev <not@telling.you.invalid> wrote:
> Anssi Saari <anssi.saari@usenet.mail.kapsi.fi> wrote:
>>
>> Well then, I believe the solution was already posted. Grab 5% of your
>> files with dd and see how it compresses.
>
> The solution that I see grabs the first 1MB, but it would make more
> sense to sample eg. 1% of the file size in five places within the
> file. 100MB file = 1MB sample, 100MB/5 = 20MB, so use dd to grab
> one 1MB sample from the start of the file then four more at an
> offset that increments by 20MB each time. Store these separately,
> compress them separately, then average the compression ratio of all
> the samples.

Also for some types of data (if it's not all video), like text, some
more advanced compressors build a dictionary to better compress
larger files. But this requires a minimum file size, so the small
samples might not represent the compression ratio of the whole file
with a dictionary included. A solution is to pre-generate a
dictionary based on a collection of the same type of files you're
compressing, then you could compress the small samples using that
dictionary and get a more accurate result.

--
__ __
#_ < |\| |< _#

Subject: Re: Script to conditionally find and compress files recursively
From: Grant Taylor
Newsgroups: comp.os.linux.misc
Organization: TNet Consulting
Date: Fri, 14 Jun 2024 03:35 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!tncsrv06.tnetconsulting.net!tncsrv09.home.tnetconsulting.net!.POSTED.omega.home.tnetconsulting.net!not-for-mail
From: gtaylor@tnetconsulting.net (Grant Taylor)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Thu, 13 Jun 2024 22:35:26 -0500
Organization: TNet Consulting
Message-ID: <v4gdpu$cts$1@tncsrv09.home.tnetconsulting.net>
References: <v48s96$u6fg$1@dont-email.me>
<v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net>
<wwvo7868waw.fsf@LkoBDZeT.terraraq.uk>
<083d0e35-e02d-8668-726f-7aa89980e9b2@example.net>
<v4dtih$23kjq$2@dont-email.me>
<647f0226-265e-2757-bd2a-3aa89de38107@example.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 14 Jun 2024 03:35:26 -0000 (UTC)
Injection-Info: tncsrv09.home.tnetconsulting.net; posting-host="omega.home.tnetconsulting.net:198.18.1.11";
logging-data="13244"; mail-complaints-to="newsmaster@tnetconsulting.net"
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <647f0226-265e-2757-bd2a-3aa89de38107@example.net>

On 6/13/24 04:55, D wrote:
> perhaps have a little database that maps file type to compression algorithm

case ${FILE##*.} in
    txt)
        #...
        ;;
    jpg|jpeg)
        # Jpeg
        ;;
    *)
        echo "unknown file type"
        ;;
esac

;-)

Subject: Re: Script to conditionally find and compress files recursively
From: D
Newsgroups: comp.os.linux.misc
Organization: i2pn2 (i2pn.org)
Date: Fri, 14 Jun 2024 09:07 UTC
References: 1 2 3 4 5 6 7
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: nospam@example.net (D)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Fri, 14 Jun 2024 11:07:15 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <34907e1a-2413-bfc6-724b-f4798e73cd17@example.net>
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <wwvo7868waw.fsf@LkoBDZeT.terraraq.uk> <083d0e35-e02d-8668-726f-7aa89980e9b2@example.net> <v4dtih$23kjq$2@dont-email.me> <647f0226-265e-2757-bd2a-3aa89de38107@example.net>
<v4gdpu$cts$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Injection-Info: i2pn2.org;
logging-data="4153101"; mail-complaints-to="usenet@i2pn2.org";
posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
In-Reply-To: <v4gdpu$cts$1@tncsrv09.home.tnetconsulting.net>
X-Spam-Checker-Version: SpamAssassin 4.0.0

On Thu, 13 Jun 2024, Grant Taylor wrote:

> On 6/13/24 04:55, D wrote:
>> perhaps have a little database that maps file type to compression algorithm
>
> case ${FILE##*.} in
> txt)
> #...
> ;;
> jpg|jpeg)
> # Jpeg
> ;;
> *)
> echo "unknown file type"
> ;;
> esac
>
> ;-)
>

See.. half way there! Just cut n' paste and fill in the details. =)

Subject: Re: Script to conditionally find and compress files recursively
From: J Newman
Newsgroups: comp.os.linux.misc
Organization: A noiseless patient Spider
Date: Sat, 15 Jun 2024 03:30 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jenniferkatenewman@gmail.com (J Newman)
Newsgroups: comp.os.linux.misc
Subject: Re: Script to conditionally find and compress files recursively
Date: Sat, 15 Jun 2024 11:30:35 +0800
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <v4j1sp$39rdv$1@dont-email.me>
References: <v48s96$u6fg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Jun 2024 05:30:34 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3b79a49ddc9335bb4a69a9bb0296020a";
logging-data="3468735"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19KzHEXEBfQZuYlBcsNZ6/Nffvj4ai8zZcQBnnMstdpGQ=="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:77BhXNh3/8/bM9maLLg2VrLq7dE=
Content-Language: en-US
In-Reply-To: <v48s96$u6fg$1@dont-email.me>

On 11/06/2024 14:53, J Newman wrote:
> Hi, I'm interested in writing a script that will:
>
> 1. Find and compress files recursively
> 2. After the first 5 seconds of compressing, if the compression ratio >1
> (i.e. the compressed file will be larger than the uncompressed file), it
> tries another compression algorithm.
> 3. If the other compression algorithm still has a ratio >1, it tries
> another algorithm, until a list is exhausted.
> 4. If the list is exhausted, it skips compressing that file.
>
> Any suggestions on how to proceed?

This is the script ChatGPT gives. After some thought, I decided to just
go with one compression algorithm for simplicity, and just not compress
the files if the compression ratio >1.

#!/bin/bash

# Compress a file with lzma and keep the result only if it is smaller
# than the original (compression ratio < 1). The original file is kept.
compress_file() {
    local file=$1
    local orig_size lzma_size
    orig_size=$(stat --printf="%s" "$file")

    # Compress with lzma, writing to a separate file
    if ! lzma -z -k -c "$file" > "$file.lzma"; then
        rm -f "$file.lzma"
        echo "Compression failed for $file" >&2
        return 1
    fi
    lzma_size=$(stat --printf="%s" "$file.lzma")

    # If the lzma compressed file is smaller than the original, keep it
    if (( lzma_size < orig_size )); then
        mv "$file.lzma" "$file.compressed"
        echo "File compressed using lzma: $file -> $file.compressed"
    else
        rm -f "$file.lzma"
        echo "No compression applied for $file as the compressed size was not smaller than the original."
    fi
}

# Export the function so it's available to the bash started by find -exec
export -f compress_file

# Recursively find all files and compress them, skipping output from
# earlier runs (*.compressed) and leftover temporary files (*.lzma)
find . -type f ! -name '*.compressed' ! -name '*.lzma' \
    -exec bash -c 'compress_file "$1"' _ {} \;

echo "Compression process complete."
