Subject  Author
* IFS=$'\n'  Lawrence D'Oliveiro
`* Re: IFS=$'\n'  Helmut Waitzmann
 `* Re: IFS=$'\n'  Ralf Damaschke
  +- Re: IFS=$'\n'  Lawrence D'Oliveiro
  `* Re: IFS=$'\n'  Christian Weisgerber
   +* Re: IFS=$'\n'  Ralf Damaschke
   |`- Re: IFS=$'\n'  Ed Morton
   `- xargs -x (was: IFS=$'\n')  Geoff Clare

Subject: IFS=$'\n'
From: Lawrence D'Oliveiro
Newsgroups: comp.unix.shell
Organization: A noiseless patient Spider
Date: Tue, 13 Aug 2024 08:26 UTC
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ldo@nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.unix.shell
Subject: IFS=$'\n'
Date: Tue, 13 Aug 2024 08:26:41 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <v9f5c1$3q99m$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 13 Aug 2024 10:26:41 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="dcc5997f875115997406f45b29a1ceee";
logging-data="4007222"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Um9kI9EH0A8g53UQ5FpDu"
User-Agent: Pan/0.159 (Vovchansk; )
Cancel-Lock: sha1:pZEtGWaKjxcO2MIo+WACZyamcOU=

I like having spaces in file/directory names, but I avoid putting newlines
in them.

This plays havoc with the shell’s word-splitting rules, because the
default value for IFS is

IFS=$' \t\n'

which means names with spaces in them get split into separate items,
triggering lots of errors about items not found (or the wrong items found/
created).
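To see this splitting directly, one can count the words an unquoted command substitution produces. A minimal sketch, assuming any POSIX shell; `count_fields` is a hypothetical helper name, not a standard command:

```sh
# count_fields is a hypothetical helper: it prints how many words an
# unquoted command substitution yields under the caller's current IFS.
count_fields() {
  set -f                        # turn off globbing so only splitting counts
  set -- $(printf '%s\n' "$@")  # unquoted on purpose: this is what splits
  set +f                        # turn globbing back on
  echo "$#"
}

count_fields 'my fred file.txt'   # default IFS (space, tab, newline): prints 3
```

Setting IFS to a newline alone makes the same call print 1.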

However, if you change this to

IFS=$'\n'

then this can make things much more convenient (provided you can be sure
there are no newlines in your file names). For example, I can do

ls -lt $(find . -type f -iname \*fred\*)

to search for all filenames containing “fred” in the hierarchy rooted at
the current directory, and display them in reverse chronological order.
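A minimal sketch of the same idea as a reusable function, assuming a POSIX shell plus a `find` that supports `-iname` (GNU and BSD versions do). `list_fred` is a hypothetical name. It also adds `set -f`, because an unquoted expansion undergoes pathname expansion as well as field splitting, so a name such as `fred[1].txt` could otherwise be globbed:

```sh
# list_fred is a hypothetical wrapper around the ls/find command above.
# Its body is a subshell, so IFS and "set -f" do not leak into the caller.
list_fred() (
  IFS='
'                             # newline only; same as IFS=$'\n' in bash/ksh/zsh
  set -f                      # keep names such as "fred[1].txt" from being globbed
  set -- $(find . -type f -iname '*fred*')
  [ "$#" -gt 0 ] || exit 0    # leaves only the subshell; a bare "ls" would list "."
  ls -lt -- "$@"
)

# usage: cd to the top of the hierarchy, then run
#   list_fred
```

Writing the newline as a quoted literal keeps the function usable in shells that lack `$'...'` quoting; in bash, ksh, or zsh, `IFS=$'\n'` is equivalent.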

Subject: Re: IFS=$'\n'
From: Helmut Waitzmann
Newsgroups: comp.unix.shell
Organization: A noiseless patient Spider
Date: Tue, 13 Aug 2024 11:14 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nn.throttle@xoxy.net (Helmut Waitzmann)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: Tue, 13 Aug 2024 13:14:21 +0200
Organization: A noiseless patient Spider
Lines: 54
Sender: Helmut Waitzmann <12f7e638@mail.de>
Message-ID: <83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
References: <v9f5c1$3q99m$1@dont-email.me>
Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Tue, 13 Aug 2024 13:14:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f7c9b9bcec49b0f9ca39587ab8c93f76";
logging-data="4064294"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185iwASsPxD0l10Of5S8E7uhGQkZM+7A8E="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:kCK94ESoMG2we8Uf29vja3wnZAM=
sha1:NBkSoxfxsFyzlpHNtxgBHMh2eUU=
Mail-Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>
Mail-Copies-To: nobody

Lawrence D'Oliveiro <ldo@nz.invalid>:
> However, if you change this to
>
>
> IFS=$'\n'
>
> then this can make things much more convenient (provided you can
> be sure there are no newlines in your file names). For example,
> I can do
>
>
> ls -lt $(find . -type f -iname \*fred\*)
>
> to search for all filenames containing “fred” in the hierarchy
> rooted at the current directory, and display them in reverse
> chronological order.
>

Even if one is sure there are no linefeeds in the file names, a
cautious design could do

(
IFS=$'\n' &&
ls -lt -- $(find . -name \*$'\n'\* ! -prune -o \
! -name \*$'\n'\* -iname \*fred\* -type f -print
)
)

That will work if there are really no linefeeds in file names.
And if there inadvertently were any linefeeds in file names, it
would at least not list file names that don't match the
“\*fred\*” file name pattern.
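The cautious variant can be packaged as a function too. A sketch, assuming a POSIX shell and a `find` with `-iname`; `list_fred_cautious` is a hypothetical name, the linefeed is held in a variable instead of being written as `$'\n'`, and `set -f` is an addition that keeps the found names from being glob-expanded:

```sh
# list_fred_cautious is a hypothetical name; the find expression follows
# Helmut's, with the linefeed stored in $nl.
list_fred_cautious() (
  nl='
'
  IFS=$nl                     # split the find output on linefeeds only
  set -f                      # do not glob-expand the found names
  # Names containing a linefeed are pruned and never printed.
  set -- $(find . -name "*$nl*" ! -prune -o \
                ! -name "*$nl*" -iname '*fred*' -type f -print)
  [ "$#" -gt 0 ] || exit 0    # leaves only the subshell; avoids a bare "ls"
  ls -lt -- "$@"
)
```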

But – according to the new POSIX standard
(<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/find.html#top>
and
<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/xargs.html#top>) –
one can use “xargs” to get rid of any linefeed trouble at all:

find . -iname \*fred\* -type f -print0 |
xargs -0 -r -x -- ls -lt --

That will work with any file names, even those containing a
linefeed character.
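One way to check that claim is to count how many arguments actually reach the command. A minimal sketch, assuming `find -print0` and `xargs -0` as provided by GNU, BSD, and now POSIX; `count_fred_args` is a hypothetical helper that substitutes `sh -c` for `ls -lt` so the argument count can be printed, and it leaves out `-r` and `-x`:

```sh
# count_fred_args is a hypothetical helper: it prints how many arguments
# xargs hands to one command, so a name with an embedded linefeed still
# counts as a single argument.
count_fred_args() {
  find . -type f -iname '*fred*' -print0 |
    xargs -0 sh -c 'printf "%s\n" "$#"' sh
}
```

Run in a directory holding `my fred file.txt` and a file named `a<linefeed>fred`, it prints 2: each name arrives as exactly one argument. Without `-r`, some implementations (GNU, for example) still run the command once when nothing matches.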

Subject: Re: IFS=$'\n'
From: Ralf Damaschke
Newsgroups: comp.unix.shell
Organization: C.H.A.O.S.
Date: Tue, 13 Aug 2024 21:56 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rwspam@gmx.de (Ralf Damaschke)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: 13 Aug 2024 21:56:56 GMT
Organization: C.H.A.O.S.
Lines: 18
Message-ID: <pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net uOcmAPrTN0G1MEmEKA5oOA8wdaQU24JPZrxEyMk/OFnybKRJXR
Cancel-Lock: sha1:WkGirXCgt4ASxydA0IbJemQMS1I= sha256:WSShuO94T5RHRjkbuyg65bzSSzRFWMLJQ+6ts9m8rSI=
User-Agent: Pan/0.144 (Time is the enemy; 28ab3ba git.gnome.org/pan2)

Helmut Waitzmann wrote:

> But – according to the new POSIX standard
> ([links to 2024 opengroup specs for find and xargs]) –
> one can use “xargs” to get rid of any linefeed trouble at all:
>
> find . -iname \*fred\* -type f -print0 |
> xargs -0 -r -x -- ls -lt --
>
> That will work with any file names, even those containing a linefeed
> character.

OK, print0 is going to become standard, but nowadays I already prefer
(when I use iname for my comfort)

find . -iname \*fred\* -type f -exec ls -lt -- {} +

I don't see any advantage in using -print0 and xargs -0.
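The `-exec ... {} +` form can be checked by counting the arguments the command receives. A sketch, assuming a POSIX `find` (plus `-iname`); `count_fred_exec_args` is a hypothetical helper that substitutes `sh -c` for `ls -lt`:

```sh
# count_fred_exec_args is a hypothetical helper: with "-exec ... {} +",
# find itself passes each name as one argument, so no NUL handling is needed.
count_fred_exec_args() {
  find . -type f -iname '*fred*' -exec sh -c 'printf "%s\n" "$#"' sh {} +
}
```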

Subject: Re: IFS=$'\n'
From: Lawrence D'Oliveiro
Newsgroups: comp.unix.shell
Organization: A noiseless patient Spider
Date: Tue, 13 Aug 2024 22:13 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ldo@nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: Tue, 13 Aug 2024 22:13:10 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 6
Message-ID: <v9glpm$30as$11@dont-email.me>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
<pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 14 Aug 2024 00:13:11 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="09768462440b3e073b404311a341494e";
logging-data="98652"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX198lTbZIYeZhMyUx6eCNKyv"
User-Agent: Pan/0.159 (Vovchansk; )
Cancel-Lock: sha1:UYQoOQUgBdgC7kHsyEXeJA4voUk=

On 13 Aug 2024 21:56:56 GMT, Ralf Damaschke wrote:

> OK, print0 is going to become standard ...

Sure, but there needs to be a more convenient way to use NUL as a word
delimiter.
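In bash 4.4 and later, `mapfile -d ''` (also spelled `readarray`) is one existing way to split input on NUL without touching IFS. A minimal sketch, assuming bash; `collect_fred` and the array `fred_files` are hypothetical names:

```bash
# Needs bash 4.4 or later, where "mapfile -d ''" splits its input on NUL bytes.
# collect_fred and fred_files are hypothetical names.
collect_fred() {
  mapfile -d '' -t fred_files < <(find . -type f -iname '*fred*' -print0)
}

# usage:
#   collect_fred && (( ${#fred_files[@]} )) && ls -lt -- "${fred_files[@]}"
```

Because the array is expanded into a single `ls` command line, an over-long list makes that one command fail with "Argument list too long" rather than being silently split into batches.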

Subject: Re: IFS=$'\n'
From: Christian Weisgerber
Newsgroups: comp.unix.shell
Date: Wed, 14 Aug 2024 13:59 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: naddy@mips.inka.de (Christian Weisgerber)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: Wed, 14 Aug 2024 13:59:39 -0000 (UTC)
Message-ID: <slrnvbpe2b.20ve.naddy@lorvorc.mips.inka.de>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
<pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
Injection-Date: Wed, 14 Aug 2024 13:59:39 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
logging-data="66543"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)

On 2024-08-13, Ralf Damaschke <rwspam@gmx.de> wrote:

>> find . -iname \*fred\* -type f -print0 |
>> xargs -0 -r -x -- ls -lt --
>
> OK, print0 is going to become standard, but nowadays I already prefer
> (when I use iname for my comfort)
>
> find . -iname \*fred\* -type f -exec ls -lt -- {} +

If sufficiently many files accrue, find(1) will invoke ls(1) several
times, which will not produce the expected result. That may be
unlikely in this specific example, but it can happen in the general
case.

Wait, you say, xargs(1) will also split its input across multiple
invocations. I mean, that's very much the point of xargs. Which
is why Helmut added the -x flag, which is supposed to prevent this
behavior.

On BSD, that will be a syntax error because -x is only available
in combination with -n.
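The batching is easy to observe by forcing a small batch size with the POSIX `-n` option. A sketch, assuming an `xargs` that supports `-0`; `show_batches` is a hypothetical helper and `f1` to `f5` are made-up names:

```sh
# show_batches is a hypothetical helper: it feeds its arguments to xargs as
# NUL-separated names, allows at most two per command line (-n 2), and has
# each invocation report how many arguments it received.
show_batches() {
  printf '%s\0' "$@" | xargs -0 -n 2 sh -c 'echo "batch of $#"' sh
}

show_batches f1 f2 f3 f4 f5
# prints:
# batch of 2
# batch of 2
# batch of 1
```

Were the command `ls -lt`, each batch would be sorted on its own, so the concatenated output would not be in overall time order.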

--
Christian "naddy" Weisgerber naddy@mips.inka.de

Subject: Re: IFS=$'\n'
From: Ralf Damaschke
Newsgroups: comp.unix.shell
Organization: C.H.A.O.S.
Date: Wed, 14 Aug 2024 22:55 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rwspam@gmx.de (Ralf Damaschke)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: 14 Aug 2024 22:55:40 GMT
Organization: C.H.A.O.S.
Lines: 15
Message-ID: <pan$2d437$c79a548c$945396fd$527603cd@y2plugh.fqdn.th-h.de>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
<pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
<slrnvbpe2b.20ve.naddy@lorvorc.mips.inka.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net ZioMIYENfrlNp96DhSiRfgXNNZmyW1bEv9wmVDe00ySHJtJLZD
Cancel-Lock: sha1:2oBWYRLMyBkZkKzm76h2qGuwZGw= sha256:/+b4Fay/TObK4HzjPbCo1lIotPFTDE2GUDf6axrDK3w=
User-Agent: Pan/0.144 (Time is the enemy; 28ab3ba git.gnome.org/pan2)

Christian Weisgerber wrote:

> If sufficiently many files accrue, find(1) will invoke ls(1) several
> times, which will not produce the expected result. That may be unlikely
> in this specific example, but it can happen in the general case.
>
> Wait, you say, xargs(1) will also split its input across multiple
> invocations. I mean, that's very much the point of xargs. Which is why
> Helmut added the -x flag, which is supposed to prevent this behavior.

I see the point, but I hope I never meet a use case that says
"do something with the files found, but throw the list away if it can't
be done all at once". I would rather first assemble the list, try to execute
the command with it and if needed switch to some different approach of
handling the files.

Subject: Re: IFS=$'\n'
From: Ed Morton
Newsgroups: comp.unix.shell
Organization: A noiseless patient Spider
Date: Thu, 15 Aug 2024 11:30 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: mortonspam@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: IFS=$'\n'
Date: Thu, 15 Aug 2024 06:30:18 -0500
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <v9kosa$udgv$1@dont-email.me>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
<pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
<slrnvbpe2b.20ve.naddy@lorvorc.mips.inka.de>
<pan$2d437$c79a548c$945396fd$527603cd@y2plugh.fqdn.th-h.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Aug 2024 13:30:19 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="44bec365f1f2253c46953dcad205fd20";
logging-data="996895"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+gFbEj5P9XDopIMFV1YlMh"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:/S0V0a6SF/B517YjjDeNdjhfoeg=
X-Antivirus-Status: Clean
X-Antivirus: Avast (VPS 240815-0, 8/14/2024), Outbound message
Content-Language: en-US
In-Reply-To: <pan$2d437$c79a548c$945396fd$527603cd@y2plugh.fqdn.th-h.de>

On 8/14/2024 5:55 PM, Ralf Damaschke wrote:
> Christian Weisgerber wrote:
>
>> If sufficiently many files accrue, find(1) will invoke ls(1) several
>> times, which will not produce the expected result. That may be unlikely
>> in this specific example, but it can happen in the general case.
>>
>> Wait, you say, xargs(1) will also split its input across multiple
>> invocations. I mean, that's very much the point of xargs. Which is why
>> Helmut added the -x flag, which is supposed to prevent this behavior.
>
> I see the point, but I hope I never meet a use case that says
> "do something with the files found, but throw the list away if it can't
> be done all at once". I would rather first assemble the list, try to execute
> the command with it and if needed switch to some different approach of
> handling the files.

Needing to process all of the files at once happens more often than you
might think. For example, to merge CSVs we need to keep the header line
from only the first file read, so the naive approach would be:

find . -type f -name '*.csv' -exec awk 'NR==1; FNR>1' {} +

but that would fail if `find` had to call `awk` several times, once per
batch of files, because `NR==1` would then be true once per `awk`
invocation, and so the header lines from multiple files would be
printed. The solution is something like (untested):

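awk -v RS='\0' '
NR == FNR { ARGV[ARGC++]=$0; next }   # reading stdin: queue each NUL-terminated name as a file operand
(FNR == 1) && !doneHdr++              # print the header line of the first CSV only
FNR > 1                               # print every non-header line
' - RS='\n' < <(find . -type f -name '*.csv' -print0)
# "-" makes awk read the find output (with RS set to NUL) before RS='\n' takes effect for the CSVs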

We have to read the output of `find` in `awk` to populate `ARGV[]`,
instead of calling `awk` with the output of `find` as an argument list,
because if that output is so long that `find` has to split it up in the
first script above, then it's also too long to be passed to `awk` as an
argument list: both are bound by the same system limit on argument size
(`ARG_MAX`). Having `-print0` is obviously useful in that situation.
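The failure of the naive script can be reproduced without thousands of files by forcing one `awk` process per file with `xargs -n 1`. A minimal sketch, assuming a POSIX shell, any POSIX `awk`, and an `xargs` with `-0`; `merge_one_call` and `merge_batched` are hypothetical helpers:

```sh
# merge_one_call and merge_batched are hypothetical helpers around the
# naive awk program 'NR==1; FNR>1'.
merge_one_call() {
  awk 'NR==1; FNR>1' "$@"          # one awk process sees every file
}

merge_batched() {
  # -n 1 forces a separate awk per file, mimicking find splitting the list;
  # each awk starts again at NR=1, so every header line is printed.
  printf '%s\0' "$@" | xargs -0 -n 1 awk 'NR==1; FNR>1'
}
```

With two files that both start with the line `id,name`, `merge_one_call` prints that header once, while `merge_batched` prints it twice.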

Regards,

Ed.

Subject: xargs -x (was: IFS=$'\n')
From: Geoff Clare
Newsgroups: comp.unix.shell
Date: Thu, 15 Aug 2024 12:53 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: geoff@clare.See-My-Signature.invalid (Geoff Clare)
Newsgroups: comp.unix.shell
Subject: xargs -x (was: IFS=$'\n')
Date: Thu, 15 Aug 2024 13:53:04 +0100
Lines: 31
Message-ID: <gaa1pk-lia.ln1@ID-313840.user.individual.net>
References: <v9f5c1$3q99m$1@dont-email.me>
<83a5hgad4i.fsf@helmutwaitzmann.news.arcor.de>
<pan$989b5$e7369b52$2a19adc3$14541b89@y2plugh.fqdn.th-h.de>
<slrnvbpe2b.20ve.naddy@lorvorc.mips.inka.de>
Reply-To: netnews@gclare.org.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net vmbwvHomS5dddZfW0MimhwY5hxeDF/sDjO5PQv/maz6PSnXDeV
X-Orig-Path: ID-313840.user.individual.net!not-for-mail
Cancel-Lock: sha1:3xpJcilrux5j3UPtYmZOKn4LMEY= sha256:TMbaQkkupF5yJu2BEjc7az97k2q9Iv/1Rbmxrpct+GQ=
User-Agent: Pan/0.154 (Izium; 517acf4)

Christian Weisgerber wrote:

> On 2024-08-13, Ralf Damaschke <rwspam@gmx.de> wrote:
>
>> find . -iname \*fred\* -type f -exec ls -lt -- {} +
>
> If sufficiently many files accrue, find(1) will invoke ls(1) several
> times, which will not produce the expected result. That may be
> unlikely in this specific example, but it can happen in the general
> case.
>
> Wait, you say, xargs(1) will also split its input across multiple
> invocations. I mean, that's very much the point of xargs. Which
> is why Helmut added the -x flag, which is supposed to prevent this
> behavior.

It isn't supposed to do that, and it doesn't.

$ echo 1234567890 1234567890 | xargs -s 50 echo
1234567890 1234567890
$ echo 1234567890 1234567890 | xargs -s 20 echo
1234567890
1234567890
$ echo 1234567890 1234567890 | xargs -x -s 20 echo
1234567890
1234567890

Tested with GNU, macOS (with -n 10 added), and Solaris versions of xargs.

--
Geoff Clare <netnews@gclare.org.uk>
