Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 13:30 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 13:30:34 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhq11q$nq7$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com> <vhngoi$2p6$1@reader2.panix.com> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 13:30:34 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="24391"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>
>>>[...]
>>>
>>>> Personally I think that writing bulky procedural stuff for something
>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>> like \d+ are the better direction to go if targeting a good interface.
>>>> YMMV.
>>>
>>>Assuming that p is a pointer to the current position in a string, e is a
>>>pointer to the end of it (ie, point just past the last byte) and -
>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>C equivalent of [0-9]+ is
>>>
>>>while (p < e && *p - '0' < 10) ++p;
>>>
>>>That's not too bad. And it's really a hell lot faster than a
>>>general-purpose automaton programmed to recognize the same pattern
>>>(which might not matter most of the time, but sometimes, it does).
>>
>> It's also not exactly right. `[0-9]+` would match one or more
>> characters; this possibly matches 0 (ie, if `p` pointed to
>> something that wasn't a digit).
>
>The regex won't match any digits if there aren't any. In this case, the
>match will fail. I didn't include the code for handling that because it
>seemed pretty pointless for the example.

That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
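
A minimal sketch of the promotion issue mentioned in the parenthetical above, assuming an ASCII execution character set. The function names are hypothetical and do not come from the thread. With an `unsigned char` operand, the value is promoted to `int` before the subtraction, so a byte below '0' yields a negative difference that still compares less than 10.

```c
#include <assert.h>
#include <stdbool.h>

/* Naive test in the style of the quoted one-liner: c is promoted to int,
   so ' ' - '0' is -16, and -16 < 10 holds. */
bool
naive_isdigit(unsigned char c)
{
    return c - '0' < 10;
}

/* Converting to unsigned int first makes the subtraction wrap around,
   so only '0'..'9' land in the range 0..9. */
bool
wrapping_isdigit(unsigned char c)
{
    return (unsigned int)c - '0' < 10;
}

/* Usage: a space is wrongly accepted by the naive version only. */
void
check_examples(void)
{
    assert(naive_isdigit(' '));
    assert(!wrapping_isdigit(' '));
    assert(naive_isdigit('7') && wrapping_isdigit('7'));
}
```

Whether this matters in practice depends on the input; it only shows why the operand type alone does not make the comparison unsigned.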

But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits. And then there are other matters of context; does the
user intend for the regexp to match the _whole_ string? Or any
portion of the string (a la `grep`)? So, for example, does the
string "aaa1234aaa" match `[0-9]+`? As written, the above
snippet is actually closer to advancing `p` over `^[0-9]*`. One
might differentiate between `*` and `+` after the fact, by
examining `p` against some (presumably saved) source value, but
that's more code.
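
The "saved source value" idea can be sketched as follows. This is an illustration only, assuming ASCII and byte pointers as in the quoted snippet; `skip_digits` and `match_digits_prefix` are hypothetical names, not code from either poster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Advance over a run of digits starting at p, never past e.
   Returns the position of the first non-digit byte (or e). */
const unsigned char *
skip_digits(const unsigned char *p, const unsigned char *e)
{
    assert(p != NULL && e != NULL);
    while (p < e && (unsigned int)*p - '0' < 10)
        ++p;
    return p;
}

/* Roughly ^[0-9]+ : succeed only if at least one digit was consumed.
   On success, *end is set to one past the last digit matched. */
bool
match_digits_prefix(const unsigned char *start, const unsigned char *e,
    const unsigned char **end)
{
    const unsigned char *p = skip_digits(start, e);

    if (p == start)
        return false;   /* empty run: what ^[0-9]* would still accept */
    *end = p;
    return true;
}
```

The bare loop corresponds to `skip_digits`; the comparison against the saved `start` is the extra step that turns `*` into `+`.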

These are just not equivalent. That's not to say that your
snippet is not _useful_ in context, but to pretend that it's the
same as the regular expression is pointlessly reductive.

By the way, something that _would_ match `^[0-9]+$` might be:

term% cat mdp.c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static bool
mdigit(unsigned int c)
{
    return c - '0' < 10;
}

bool
mdp(const char *str, const char *estr)
{
    if (str == NULL || estr == NULL || str == estr)
        return false;
    if (!mdigit(*str))
        return false;
    while (str < estr && mdigit(*str))
        str++;
    return str == estr;
}

bool
probe(const char *s, bool expected)
{
    if (mdp(s, s + strlen(s)) != expected) {
        fprintf(stderr, "test failure: `%s` (expected %s)\n",
            s, expected ? "true" : "false");
        return false;
    }
    return true;
}

int
main(void)
{
    bool success = true;

    success = probe("1234", true) && success;
    success = probe("", false) && success;
    success = probe("ab", false) && success;
    success = probe("0", true) && success;
    success = probe("0123456789", true) && success;
    success = probe("a0123456", false) && success;
    success = probe("0123456b", false) && success;
    success = probe("0123c456", false) && success;
    success = probe("0123#456", false) && success;

    return success ? EXIT_SUCCESS : EXIT_FAILURE;
}
term% cc -Wall -Wextra -Werror -pedantic -std=c11 mdp.c -o mdp
term% ./mdp
term% echo $?
0
term%

Granted the test scaffolding and `#include` boilerplate makes
this appear rather longer than it would be in context, but it's
still not nearly as succinct.

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 15:41 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 15:41:09 +0000
Lines: 79
Message-ID: <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
<vhngoi$2p6$1@reader2.panix.com>
<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
<vhq11q$nq7$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net aYUphB09PBy6lcoaSj34Swm3i0lwPf03FmwSb7ble34V6ED2M=
Cancel-Lock: sha1:h9PE3JBvHCiqY64ZP/coYDEUmXk= sha1:CEuBJr0uRnd6WR1Cq99Ai8M9hxU= sha256:+AdCiP/4Fm8VtwatV2sOxScePGqkkSde0Lk3PYua570=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>
>>>>[...]
>>>>
>>>>> Personally I think that writing bulky procedural stuff for something
>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>> YMMV.
>>>>
>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>C equivalent of [0-9]+ is
>>>>
>>>>while (p < e && *p - '0' < 10) ++p;
>>>>
>>>>That's not too bad. And it's really a hell lot faster than a
>>>>general-purpose automaton programmed to recognize the same pattern
>>>>(which might not matter most of the time, but sometimes, it does).
>>>
>>> It's also not exactly right. `[0-9]+` would match one or more
>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>> something that wasn't a digit).
>>
>>The regex won't match any digits if there aren't any. In this case, the
>>match will fail. I didn't include the code for handling that because it
>>seemed pretty pointless for the example.
>
> That's rather the point though, isn't it? The program snippet
> (modulo the promotion to signed int via the "usual arithmetic
> conversions" before the subtraction and comparison giving you
> unexpected values; nothing to do with whether `char` is signed
> or not) is a snippet that advances a pointer while it points to
> a digit, starting at the current pointer position; that is, it
> just increments a pointer over a run of digits.

That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.

> But that's not the same as a regex matcher, which has a semantic
> notion of success or failure. I could run your snippet against
> a string such as, say, "ZZZZZZ" and it would "succeed" just as
> it would against an empty string or a string of one or more
> digits.

Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?

[...]

> By the way, something that _would_ match `^[0-9]+$` might be:

[too much code]

Something which would match [0-9]+ in its first argument (if any) would
be:

#include "string.h"
#include "stdlib.h"

int main(int argc, char **argv)
{
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 10) ++p;
    if (!c) exit(1);
    return 0;
}

but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
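
For comparison, here is a hedged sketch of one reading of an unanchored `[0-9]+` search written as a function rather than a whole program. It assumes ASCII and a NUL-terminated string; `find_digits` and its out-parameters are hypothetical names, and it reports where the leftmost run of digits starts and ends, which the program above does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Search s for the leftmost run of one or more digits.
   On success, *mstart points at the first digit and *mend one past the
   last digit of that run. Returns false if s contains no digit. */
bool
find_digits(const char *s, const char **mstart, const char **mend)
{
    const char *p, *q;

    assert(s != NULL && mstart != NULL && mend != NULL);

    /* Skip everything that is not a digit. */
    for (p = s; *p != '\0' && (unsigned int)*p - '0' >= 10; ++p)
        ;
    if (*p == '\0')
        return false;

    /* Extend over the run; the terminating NUL is not a digit. */
    for (q = p; (unsigned int)*q - '0' < 10; ++q)
        ;
    *mstart = p;
    *mend = q;
    return true;
}
```

Called on "aaa1234aaa" this reports the span covering "1234"; called on "ZZZZZZ" or on an empty string it reports failure.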

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 15:52 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 15:52:41 +0000
Lines: 23
Message-ID: <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
<vhngoi$2p6$1@reader2.panix.com>
<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
<vhq11q$nq7$1@reader2.panix.com>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net RKpHm73RGsdSmv5ebKSNDQUh1YKgCkbw9Ud6bFKo+f8L2VXP8=
Cancel-Lock: sha1:nu23iFHOJs/HQQlVIFQW+yxDvjk= sha1:X+HXhanEoqdfRmFSQ+zjszjqeoI= sha256:OXifvwpxIIJU/o71cuCm6SFHxz4DV5hjUzGwDavMCdc=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Rainer Weikusat <rweikusat@talktalk.net> writes:

[...]

> Something which would match [0-9]+ in its first argument (if any) would
> be:
>
> #include "string.h"
> #include "stdlib.h"
>
> int main(int argc, char **argv)
> {
> char *p;
> unsigned c;
>
> p = argv[1];
> if (!p) exit(1);
> while (c = *p, c && c - '0' > 10) ++p;

This needs to be

while (c = *p, c && c - '0' > 9) ++p

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 17:17 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:17:46 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqebq$c71$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com> <vhq11q$nq7$1@reader2.panix.com> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 17:17:46 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="12513"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> [snip]
>>>> It's also not exactly right. `[0-9]+` would match one or more
>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>> something that wasn't a digit).
>>>
>>>The regex won't match any digits if there aren't any. In this case, the
>>>match will fail. I didn't include the code for handling that because it
>>>seemed pretty pointless for the example.
>>
>> That's rather the point though, isn't it? The program snippet
>> (modulo the promotion to signed int via the "usual arithmetic
>> conversions" before the subtraction and comparison giving you
>> unexpected values; nothing to do with whether `char` is signed
>> or not) is a snippet that advances a pointer while it points to
>> a digit, starting at the current pointer position; that is, it
>> just increments a pointer over a run of digits.
>
>That's the core part of matching something equivalent to the regex [0-9]+
>and the only part of it which is at least remotely interesting.

Not really, no. The interesting thing in this case appears to
be knowing whether or not the match succeeded, but you omitted
that part.

>> But that's not the same as a regex matcher, which has a semantic
>> notion of success or failure. I could run your snippet against
>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>> it would against an empty string or a string of one or more
>> digits.
>
>Why do you believe that p being equivalent to the starting position
>would be considered a "successful match", considering that this
>obviously doesn't make any sense?

Because absent any surrounding context, there's no indication
that the source is even saved. You'll note that I did mention
that as a means to differentiate later on, but that's not the
snippet you posted.

>[...]
>
>> By the way, something that _would_ match `^[0-9]+$` might be:
>
>[too much code]
>
>Something which would match [0-9]+ in its first argument (if any) would
>be:
>
>#include "string.h"
>#include "stdlib.h"
>
>int main(int argc, char **argv)
>{
> char *p;
> unsigned c;
>
> p = argv[1];
> if (!p) exit(1);
> while (c = *p, c && c - '0' > 10) ++p;
> if (!c) exit(1);
> return 0;
>}
>
>but that's 14 lines of text, 13 of which have absolutely no relation to
>the problem of recognizing a digit.

This is wrong in many ways. Did you actually test that program?

First of all, why `"string.h"` and not `<string.h>`? Ok, that's
not technically an error, but it's certainly unconventional, and
raises questions that are ultimately a distraction.

Second, suppose that `argc==0` (yes, this can happen under
POSIX).

Third, the loop: why `> 10`? Don't you mean `< 10`? You are
trying to match digits, not non-digits.

Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
at the end, but `!c` there means you've reached the end of the
string; which should be success.

Fifth and finally, you `return 0;` which is EXIT_SUCCESS, in the
failure case.

Compare:

#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    regex_t reprog;
    int ret;

    if (argc != 2) {
        fprintf(stderr, "Usage: regexp pattern\n");
        return(EXIT_FAILURE);
    }
    (void)regcomp(&reprog, "^[0-9]+$", REG_EXTENDED | REG_NOSUB);
    ret = regexec(&reprog, argv[1], 0, NULL, 0);
    regfree(&reprog);

    return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

This is only marginally longer, but is correct.
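
The program above discards the result of `regcomp`. A variant that checks it might look like the sketch below, which wraps the same POSIX calls in a hypothetical helper `all_digits_re` (the name and the error message format are assumptions, not part of the post). With a fixed, valid pattern the check rarely fires, but it costs only a few lines.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns true if s consists entirely of one or more digits, as decided
   by the POSIX extended regular expression ^[0-9]+$.
   A compile failure is reported on stderr and treated as "no match". */
bool
all_digits_re(const char *s)
{
    regex_t re;
    char msg[128];
    int rc;

    assert(s != NULL);

    rc = regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return false;
    }
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Compiling the pattern on every call keeps the sketch short; code that matches many strings would normally compile once and reuse the `regex_t`.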

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 17:18 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:18:26 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqed2$c71$2@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <vhq11q$nq7$1@reader2.panix.com> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 17:18:26 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="12513"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>Rainer Weikusat <rweikusat@talktalk.net> writes:
>
>[...]
>
>
>> Something which would match [0-9]+ in its first argument (if any) would
>> be:
>>
>> #include "string.h"
>> #include "stdlib.h"
>>
>> int main(int argc, char **argv)
>> {
>> char *p;
>> unsigned c;
>>
>> p = argv[1];
>> if (!p) exit(1);
>> while (c = *p, c && c - '0' > 10) ++p;
>
>This needs to be
>
>while (c = *p, c && c - '0' > 9) ++p

No, that's still wrong. Try actually running it.

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 17:35 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:35:29 +0000
Lines: 31
Message-ID: <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me> <vhq11q$nq7$1@reader2.panix.com>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
<87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>
<vhqed2$c71$2@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net P9e1qs6w9Yd9RbHSWz8NUAkxhn8fsv7f/4sXNsweia24JjxbU=
Cancel-Lock: sha1:uMyXjHzIe9ePdISeRVexBDKlsr8= sha1:ovfgefFPHIcuD6sm0yYExG40mkI= sha256:s9QyuRg5Z7b0ASxV2UeYtWek/oGh+JPiPL9SZUNHNH0=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>
>>[...]
>>
>>
>>> Something which would match [0-9]+ in its first argument (if any) would
>>> be:
>>>
>>> #include "string.h"
>>> #include "stdlib.h"
>>>
>>> int main(int argc, char **argv)
>>> {
>>> char *p;
>>> unsigned c;
>>>
>>> p = argv[1];
>>> if (!p) exit(1);
>>> while (c = *p, c && c - '0' > 10) ++p;
>>
>>This needs to be
>>
>>while (c = *p, c && c - '0' > 9) ++p
>
> No, that's still wrong. Try actually running it.

If you know something that's wrong with that, why not write it instead
of utilizing the claim for pointless (and wrong) snide remarks?

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 17:43 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:43:24 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqfrs$bit$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com> <vhqed2$c71$2@reader2.panix.com> <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 17:43:24 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="11869"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>
>>>[...]
>>>
>>>
>>>> Something which would match [0-9]+ in its first argument (if any) would
>>>> be:
>>>>
>>>> #include "string.h"
>>>> #include "stdlib.h"
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>> char *p;
>>>> unsigned c;
>>>>
>>>> p = argv[1];
>>>> if (!p) exit(1);
>>>> while (c = *p, c && c - '0' > 10) ++p;
>>>
>>>This needs to be
>>>
>>>while (c = *p, c && c - '0' > 9) ++p
>>
>> No, that's still wrong. Try actually running it.
>
>If you know something that's wrong with that, why not write it instead
>of utilizing the claim for pointless (and wrong) snide remarks?

I did, at length, in my other post.

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 17:43 UTC
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:43:59 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqfsv$bit$2@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <vhqed2$c71$2@reader2.panix.com> <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com> <vhqfrs$bit$1@reader2.panix.com>
Injection-Date: Fri, 22 Nov 2024 17:43:59 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="11869"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <vhqfrs$bit$1@reader2.panix.com>,
Dan Cross <cross@spitfire.i.gajendra.net> wrote:
>In article <87v7wfrx26.fsf@doppelsaurus.mobileactivedefense.com>,
>Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> In article <87zflrs1ti.fsf@doppelsaurus.mobileactivedefense.com>,
>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>>
>>>>[...]
>>>>
>>>>
>>>>> Something which would match [0-9]+ in its first argument (if any) would
>>>>> be:
>>>>>
>>>>> #include "string.h"
>>>>> #include "stdlib.h"
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>> char *p;
>>>>> unsigned c;
>>>>>
>>>>> p = argv[1];
>>>>> if (!p) exit(1);
>>>>> while (c = *p, c && c - '0' > 10) ++p;
>>>>
>>>>This needs to be
>>>>
>>>>while (c = *p, c && c - '0' > 9) ++p
>>>
>>> No, that's still wrong. Try actually running it.
>>
>>If you know something that's wrong with that, why not write it instead
>>of utilizing the claim for pointless (and wrong) snide remarks?
>
>I did, at length, in my other post.

Cf. <vhqebq$c71$1@reader2.panix.com>

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 17:48 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:48:37 +0000
Lines: 100
Message-ID: <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
<vhq11q$nq7$1@reader2.panix.com>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
<vhqebq$c71$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net hfohPWedBstS1vxq6TSp9AJJRp2sKj1S2yXzvVrOkcOYErMl0=
Cancel-Lock: sha1:0otNO/ubBQK1Q70qb59n/1CodL8= sha1:YlOlUMnxHyitMH63oAAiChGNM/Q= sha256:3lBx0tIXs2ZdbLlEJ9byiSMpnlISnEtT9ucvh1kgUcc=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>> [snip]
>>>>> It's also not exactly right. `[0-9]+` would match one or more
>>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>>> something that wasn't a digit).
>>>>
>>>>The regex won't match any digits if there aren't any. In this case, the
>>>>match will fail. I didn't include the code for handling that because it
>>>>seemed pretty pointless for the example.
>>>
>>> That's rather the point though, isn't it? The program snippet
>>> (modulo the promotion to signed int via the "usual arithmetic
>>> conversions" before the subtraction and comparison giving you
>>> unexpected values; nothing to do with whether `char` is signed
>>> or not) is a snippet that advances a pointer while it points to
>>> a digit, starting at the current pointer position; that is, it
>>> just increments a pointer over a run of digits.
>>
>>That's the core part of matching someting equivalent to the regex [0-9]+
>>and the only part of it is which is at least remotely interesting.
>
> Not really, no. The interesting thing in this case appears to
> be knowing whether or not the match succeeded, but you omited
> that part.

This is of interest to you as it enables you to base an 'argumentation'
(sarcasm) on arbitrary assumptions you've chosen to make. It's not
something I consider interesting and it's beside the point of the
example I posted.

>>> But that's not the same as a regex matcher, which has a semantic
>>> notion of success or failure. I could run your snippet against
>>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>>> it would against an empty string or a string of one or more
>>> digits.
>>
>>Why do you believe that p being equivalent to the starting position
>>would be considered a "successful match", considering that this
>>obviously doesn't make any sense?
>
> Because absent any surrounding context, there's no indication
> that the source is even saved.

A text usually doesn't contain information about things which aren't
part of its content. I congratulate you on this rather obvious observation.

[...]

>>Something which would match [0-9]+ in its first argument (if any) would
>>be:
>>
>>#include "string.h"
>>#include "stdlib.h"
>>
>>int main(int argc, char **argv)
>>{
>> char *p;
>> unsigned c;
>>
>> p = argv[1];
>> if (!p) exit(1);
>> while (c = *p, c && c - '0' > 10) ++p;
>> if (!c) exit(1);
>> return 0;
>>}
>>
>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>the problem of recognizing a digit.
>
> This is wrong in many ways. Did you actually test that program?
>
> First of all, why `"string.h"` and not `<string.h>`? Ok, that's
> not technically an error, but it's certainly unconventional, and
> raises questions that are ultimately a distraction.

Such as your paragraph above.

> Second, suppose that `argc==0` (yes, this can happen under
> POSIX).

It can happen in case of some piece of functionally hostile software
intentionally creating such a situation. Tangential, irrelevant
point. If you break it, you get to keep the parts.

> Third, the loop: why `> 10`? Don't you mean `< 10`? You are
> trying to match digits, not non-digits.

Mistake I made. The opposite of < 10 is > 9.

> Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
> at the end, but `!c` there means you've reached the end of the
> string; which should be success.

Mistake you made: [0-9]+ matches if there's at least one digit in the
string. That's why the loop terminates once one was found. In this case,
c cannot be 0.
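
The two points argued above (the unsigned comparison and the integer promotion) can be illustrated with a minimal sketch. The function names is_digit_u, promoted_range_check and find_digit are hypothetical and not taken from any post in this thread; the sketch assumes an ASCII-compatible character set and an int wider than char.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true if c is one of '0'..'9'.
 * c is unsigned, so c - '0' is computed in unsigned arithmetic and any
 * c below '0' wraps around to a very large value that fails the test. */
int is_digit_u(unsigned c)
{
    return c - '0' <= 9;
}

/* The same comparison written against an unsigned char operand.
 * ch is promoted to (signed) int before the subtraction, so '/' yields
 * -1, and -1 < 10 is true: this test wrongly accepts characters below '0'. */
int promoted_range_check(unsigned char ch)
{
    return ch - '0' < 10;
}

/* Returns a pointer to the first digit in s, or NULL if there is none.
 * This is the "skip non-digits, stop at the first digit" behaviour. */
const char *find_digit(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (is_digit_u((unsigned char)*s))
            return s;
    return NULL;
}
```

With these definitions, find_digit("ZZZZZZ") and find_digit("") both return NULL, which is the failure case the posted loop signals through c being 0.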

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 18:12 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:12:34 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqhii$d5e$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <vhqebq$c71$1@reader2.panix.com> <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 18:12:34 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="13486"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>Something which would match [0-9]+ in its first argument (if any) would
>>>be:
>>>
>>>#include "string.h"
>>>#include "stdlib.h"
>>>
>>>int main(int argc, char **argv)
>>>{
>>> char *p;
>>> unsigned c;
>>>
>>> p = argv[1];
>>> if (!p) exit(1);
>>> while (c = *p, c && c - '0' > 10) ++p;
>>> if (!c) exit(1);
>>> return 0;
>>>}
>>>
>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>the problem of recognizing a digit.
>>
>> This is wrong in many ways. Did you actually test that program?
>>
>> First of all, why `"string.h"` and not `<string.h>`? Ok, that's
>> not technically an error, but it's certainly unconventional, and
>> raises questions that are ultimately a distraction.
>
>Such as your paragraph above.
>
>> Second, suppose that `argc==0` (yes, this can happen under
>> POSIX).
>
>It can happen in case of some piece of functionally hostile software
>intentionally creating such a situation. Tangential, irrelevant
>point. If you break it, you get to keep the parts.
>
>> Third, the loop: why `> 10`? Don't you mean `< 10`? You are
>> trying to match digits, not non-digits.
>
>Mistake I made. The opposite of < 10 is > 9.

I see. So you want to skip non-digits and exit the first time
you see a digit. Ok, fair enough, though that program has
already been written, and is called `grep`.

>> Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
>> at the end, but `!c` there means you've reached the end of the
>> string; which should be success.
>
>Mistake you made: [0-9]+ matches if there's at least one digit in the
>string. That's why the loop terminates once one was found. In this case,
>c cannot be 0.

Ah, you are trying to match `[0-9]` (though you're calling it
`[0-9]+`). Yeah, your program was not at all equivalent to the one
I wrote, though this is what you posted in response to mine, so
I assumed you were trying to emulate that behavior (matching
`^[0-9]+$`).

But I see above that you mentioned `[0-9]+`. But as I mentioned
above, really you're just matching any digit, so you may as well
be matching `[0-9]`; again, this is not the same as the actual
regexp, because you are ignoring the semantics of what regular
expressions actually describe.

In any event, this seems simpler than what you posted:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: matchd <str>\n");
        return EXIT_FAILURE;
    }

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

    return EXIT_FAILURE;
}
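
For comparison, the two behaviours being discussed can be put side by side as functions. This is only a sketch: contains_digit and all_digits are hypothetical names, and the second function is one possible reading of "matches `^[0-9]+$`", not a copy of any program posted in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Yes/no answer for an unanchored search with [0-9]+ (equivalently [0-9]):
 * succeeds as soon as any digit is found anywhere in s. */
int contains_digit(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ('0' <= *s && *s <= '9')
            return 1;
    return 0;
}

/* Yes/no answer for ^[0-9]+$: s must be non-empty and consist of
 * digits only. */
int all_digits(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;           /* "+" requires at least one digit */
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}
```

The input "23v23" separates the two: contains_digit accepts it, all_digits rejects it.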

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Scott Lurndal
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 22 Nov 2024 18:14 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx40.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Command Languages Versus Programming Languages
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
References: <uu54la$3su5b$6@dont-email.me> <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com> <vhngoi$2p6$1@reader2.panix.com> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com> <vhq11q$nq7$1@reader2.panix.com> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
Lines: 109
Message-ID: <sS30P.4663$YSkc.427@fx40.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Fri, 22 Nov 2024 18:14:48 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 22 Nov 2024 18:14:48 GMT
X-Received-Bytes: 4396

Rainer Weikusat <rweikusat@talktalk.net> writes:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>
>>>>>[...]
>>>>>
>>>>>> Personally I think that writing bulky procedural stuff for something
>>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>>> YMMV.
>>>>>
>>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>>C equivalent of [0-9]+ is
>>>>>
>>>>>while (p < e && *p - '0' < 10) ++p;
>>>>>
>>>>>That's not too bad. And it's really a hell lot faster than a
>>>>>general-purpose automaton programmed to recognize the same pattern
>>>>>(which might not matter most of the time, but sometimes, it does).
>>>>
>>>> It's also not exactly right. `[0-9]+` would match one or more
>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>> something that wasn't a digit).
>>>
>>>The regex won't match any digits if there aren't any. In this case, the
>>>match will fail. I didn't include the code for handling that because it
>>>seemed pretty pointless for the example.
>>
>> That's rather the point though, isn't it? The program snippet
>> (modulo the promotion to signed int via the "usual arithmetic
>> conversions" before the subtraction and comparison giving you
>> unexpected values; nothing to do with whether `char` is signed
>> or not) is a snippet that advances a pointer while it points to
>> a digit, starting at the current pointer position; that is, it
>> just increments a pointer over a run of digits.
>
>That's the core part of matching someting equivalent to the regex [0-9]+
>and the only part of it is which is at least remotely interesting.
>
>> But that's not the same as a regex matcher, which has a semantic
>> notion of success or failure. I could run your snippet against
>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>> it would against an empty string or a string of one or more
>> digits.
>
>Why do you believe that p being equivalent to the starting position
>would be considered a "successful match", considering that this
>obviously doesn't make any sense?
>
>[...]
>
>> By the way, something that _would_ match `^[0-9]+$` might be:
>
>[too much code]
>
>Something which would match [0-9]+ in its first argument (if any) would
>be:
>
>#include "string.h"
>#include "stdlib.h"
>
>int main(int argc, char **argv)
>{
> char *p;
> unsigned c;
>
> p = argv[1];
> if (!p) exit(1);
> while (c = *p, c && c - '0' > 10) ++p;
> if (!c) exit(1);
> return 0;
>}
>
>but that's 14 lines of text, 13 of which have absolutely no relation to
>the problem of recognizing a digit.

Personally, I'd use:

$ cat /tmp/a.c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, const char **argv)
{
    char *cp;
    uint64_t value;

    if (argc < 2) return 1;

    value = strtoull(argv[1], &cp, 10);
    if ((cp == argv[1])
        || (*cp != '\0')) {
        return 1;
    }
    return 0;
}
$ cc -o /tmp/a /tmp/a.c
$ /tmp/a 13254
$ echo $?
0
$ /tmp/a 23v23
$ echo $?
1
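
One caveat with the strtoull approach, sketched below: strtoull skips leading whitespace and accepts an optional sign, so the end-pointer test alone accepts a few strings that `^[0-9]+$` would reject. On overflow it still consumes the whole run of digits (returning ULLONG_MAX and setting errno to ERANGE), so very long digit strings pass the test as well. The names strtoull_accepts and strict_digits are hypothetical; the stricter variant is one possible fix, not something proposed in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* The end-pointer test from the program above, as a function:
 * 1 if strtoull consumed the whole, non-empty string, else 0. */
int strtoull_accepts(const char *s)
{
    char *end;

    assert(s != NULL);
    (void)strtoull(s, &end, 10);
    return end != s && *end == '\0';
}

/* Stricter variant: additionally require the first character to be a
 * digit, which rules out the leading whitespace and sign that strtoull
 * would otherwise skip. */
int strict_digits(const char *s)
{
    assert(s != NULL);
    return '0' <= *s && *s <= '9' && strtoull_accepts(s);
}
```

For example, strtoull_accepts(" +42") and strtoull_accepts("-1") both yield 1, while strict_digits rejects both strings.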

Subject: Re: Command Languages Versus Programming Languages
From: Kaz Kylheku
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: A noiseless patient Spider
Date: Fri, 22 Nov 2024 18:18 UTC
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 643-408-1753@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 70
Message-ID: <20241122101217.134@kylheku.com>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org>
<0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk>
<uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me>
<87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com>
<ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me>
<vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com>
<vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me>
<20241121110710.49@kylheku.com> <vhpl9c$14mdr$1@dont-email.me>
Injection-Date: Fri, 22 Nov 2024 19:18:04 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9c13f6f155aa81285f56a101d8a781a7";
logging-data="1355154"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1YjAEdL3pBFp3AfMoZy2zKLSkP+/qOCU="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:XoG/QI4HVnBAoehaicgMevaqX3g=

On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
> On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
> Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
>>On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> I'm curious what you mean by Regexps presented in a "procedural" form.
>>> Can you give some examples?
>>
>>Here is an example: using a regex match to capture a C comment /* ... */
>>in Lex compared to just recognizing the start sequence /* and handling
>>the discarding of the comment in the action.
>>
>>Without non-greedy repetition matching, the regex for a C comment is
>>quite obtuse. The procedural handling is straightforward: read
>>characters until you see a * immediately followed by a /.
>
> Its not that simple I'm afraid since comments can be commented out.

Umm, no.
>
> eg:
>
> // int i; /*

This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.

Just as an "int i;" inside a string literal is not recognized
as a keyword, whitespace, identifier and semicolon.

> int j;
> /*
> int k;
> */
> ++j;
>
> A C99 and C++ compiler would see "int j" and compile it, a regex would
> simply remove everything from the first /* to */.

No, it won't, because that's not how regexes are used in a lexical
analyzer. At the start of the input, the lexical analyzer faces
the characters "// int i; /*\n". This will trigger the pattern match
for // comments. Essentially that entire sequence through the newline
is treated as a kind of token, equivalent to a space.

Once a token is recognized and removed from the input, it is gone;
no other regular expression can match into it.

> Also the same probably applies to #ifdef's.

Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing. But logical lines (backslash
continuations) are recognized below comments; i.e. this is one
comment:

// comment \
split \
into \
physical \
lines

A lexical scanner can have an input routine which transparently handles
this low-level detail, so that it doesn't have to deal with the
line continuations in every token pattern.
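
The procedural handling described above might look roughly like the following sketch. The function name strip_comments is hypothetical, and the code makes simplifying assumptions that a real scanner could not: it does no backslash-newline splicing and ignores character constants and trigraphs. Each comment is replaced by a single space, and string literals are copied verbatim so that comment markers inside them are never seen by the comment rules.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, replacing each comment with one space, and returns
 * the number of characters written (not counting the terminator).
 * dst must have room for strlen(src) + 1 bytes. */
size_t strip_comments(const char *src, char *dst)
{
    char *start = dst;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '/') {
            /* line comment: discard up to, but not including, the newline */
            while (*src != '\0' && *src != '\n')
                src++;
            *dst++ = ' ';
        } else if (src[0] == '/' && src[1] == '*') {
            /* block comment: read until a star immediately followed by a slash */
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src != '\0')
                src += 2;               /* skip the closing star-slash */
            *dst++ = ' ';
        } else if (*src == '"') {
            /* string literal: copy verbatim, honouring backslash escapes */
            *dst++ = *src++;
            while (*src != '\0' && *src != '"') {
                if (*src == '\\' && src[1] != '\0')
                    *dst++ = *src++;    /* copy the backslash itself */
                *dst++ = *src++;
            }
            if (*src == '"')
                *dst++ = *src++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - start);
}
```

Run on the example quoted earlier in this post, the first line is consumed whole as a line comment, so its trailing slash-star never opens a block comment and "int j;" survives.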

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Subject: Re: Command Languages Versus Programming Languages
From: Kaz Kylheku
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: A noiseless patient Spider
Date: Fri, 22 Nov 2024 18:19 UTC
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 643-408-1753@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:19:30 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <20241122101819.161@kylheku.com>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org>
<0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk>
<uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me>
<87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com>
<ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me>
<vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com>
<vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me>
<20241121110710.49@kylheku.com> <vhpp96$15bjl$1@dont-email.me>
Injection-Date: Fri, 22 Nov 2024 19:19:30 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9c13f6f155aa81285f56a101d8a781a7";
logging-data="1355154"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18LDuchuLzlP4azkg+rMyMbF9wSR2pfxNI="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:JOnqzSI8MBDTaVDSvoM2qsqIgU8=

On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> On 21.11.2024 20:12, Kaz Kylheku wrote:
>> [...]
>>
>> In the wild, you see regexes being used for all sorts of stupid stuff,
>
> No one can prevent folks using features for stupid things. Yes.

But the thing is that "modern" regular expressions (Perl regex and its
progeny) have features that are designed to exclusively cater to these
folks.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Subject: Re: Command Languages Versus Programming Languages
From: Scott Lurndal
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 22 Nov 2024 18:22 UTC
References: 1 2 3 4 5 6 7
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!news.quux.org!weretis.net!feeder9.news.weretis.net!news.cmpublishers.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!news-out.netnews.com!netnews.com!s1-4.netnews.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx40.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Command Languages Versus Programming Languages
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
References: <uu54la$3su5b$6@dont-email.me> <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com> <vhngoi$2p6$1@reader2.panix.com> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com> <vhq11q$nq7$1@reader2.panix.com> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <sS30P.4663$YSkc.427@fx40.iad>
Lines: 115
Message-ID: <VZ30P.4664$YSkc.1894@fx40.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Fri, 22 Nov 2024 18:22:45 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 22 Nov 2024 18:22:45 GMT
X-Received-Bytes: 4669

scott@slp53.sl.home (Scott Lurndal) writes:
>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>>
>>>>>>[...]
>>>>>>
>>>>>>> Personally I think that writing bulky procedural stuff for something
>>>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>>>> YMMV.
>>>>>>
>>>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>>>C equivalent of [0-9]+ is
>>>>>>
>>>>>>while (p < e && *p - '0' < 10) ++p;
>>>>>>
>>>>>>That's not too bad. And it's really a hell lot faster than a
>>>>>>general-purpose automaton programmed to recognize the same pattern
>>>>>>(which might not matter most of the time, but sometimes, it does).
>>>>>
>>>>> It's also not exactly right. `[0-9]+` would match one or more
>>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>>> something that wasn't a digit).
>>>>
>>>>The regex won't match any digits if there aren't any. In this case, the
>>>>match will fail. I didn't include the code for handling that because it
>>>>seemed pretty pointless for the example.
>>>
>>> That's rather the point though, isn't it? The program snippet
>>> (modulo the promotion to signed int via the "usual arithmetic
>>> conversions" before the subtraction and comparison giving you
>>> unexpected values; nothing to do with whether `char` is signed
>>> or not) is a snippet that advances a pointer while it points to
>>> a digit, starting at the current pointer position; that is, it
>>> just increments a pointer over a run of digits.
>>
>>That's the core part of matching someting equivalent to the regex [0-9]+
>>and the only part of it is which is at least remotely interesting.
>>
>>> But that's not the same as a regex matcher, which has a semantic
>>> notion of success or failure. I could run your snippet against
>>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>>> it would against an empty string or a string of one or more
>>> digits.
>>
>>Why do you believe that p being equivalent to the starting position
>>would be considered a "successful match", considering that this
>>obviously doesn't make any sense?
>>
>>[...]
>>
>>> By the way, something that _would_ match `^[0-9]+$` might be:
>>
>>[too much code]
>>
>>Something which would match [0-9]+ in its first argument (if any) would
>>be:
>>
>>#include "string.h"
>>#include "stdlib.h"
>>
>>int main(int argc, char **argv)
>>{
>> char *p;
>> unsigned c;
>>
>> p = argv[1];
>> if (!p) exit(1);
>> while (c = *p, c && c - '0' > 10) ++p;
>> if (!c) exit(1);
>> return 0;
>>}
>>
>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>the problem of recognizing a digit.
>
>Personally, I'd use:

Albeit the converted value is only correct for strings of digits whose
value does not exceed ULLONG_MAX...

>
>$ cat /tmp/a.c
>#include <stdint.h>
>#include <string.h>
>
>int
>main(int argc, const char **argv)
>{
> char *cp;
> uint64_t value;
>
> if (argc < 2) return 1;
>
> value = strtoull(argv[1], &cp, 10);
> if ((cp == argv[1])
> || (*cp != '\0')) {
> return 1;
> }
> return 0;
>}
>$ cc -o /tmp/a /tmp/a.c
>$ /tmp/a 13254
>$ echo $?
>0
>$ /tmp/a 23v23
>$ echo $?
>1

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 18:30 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqik7$nn0$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <sS30P.4663$YSkc.427@fx40.iad> <VZ30P.4664$YSkc.1894@fx40.iad>
Injection-Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="24288"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <VZ30P.4664$YSkc.1894@fx40.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
>scott@slp53.sl.home (Scott Lurndal) writes:
>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>>>
>>>>>>>[...]
>>>>>>>
>>>>>>>> Personally I think that writing bulky procedural stuff for something
>>>>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>>>>> YMMV.
>>>>>>>
>>>>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>>>>C equivalent of [0-9]+ is
>>>>>>>
>>>>>>>while (p < e && *p - '0' < 10) ++p;
>>>>>>>
>>>>>>>That's not too bad. And it's really a hell lot faster than a
>>>>>>>general-purpose automaton programmed to recognize the same pattern
>>>>>>>(which might not matter most of the time, but sometimes, it does).
>>>>>>
>>>>>> It's also not exactly right. `[0-9]+` would match one or more
>>>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>>>> something that wasn't a digit).
>>>>>
>>>>>The regex won't match any digits if there aren't any. In this case, the
>>>>>match will fail. I didn't include the code for handling that because it
>>>>>seemed pretty pointless for the example.
>>>>
>>>> That's rather the point though, isn't it? The program snippet
>>>> (modulo the promotion to signed int via the "usual arithmetic
>>>> conversions" before the subtraction and comparison giving you
>>>> unexpected values; nothing to do with whether `char` is signed
>>>> or not) is a snippet that advances a pointer while it points to
>>>> a digit, starting at the current pointer position; that is, it
>>>> just increments a pointer over a run of digits.
>>>
>>>That's the core part of matching someting equivalent to the regex [0-9]+
>>>and the only part of it is which is at least remotely interesting.
>>>
>>>> But that's not the same as a regex matcher, which has a semantic
>>>> notion of success or failure. I could run your snippet against
>>>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>>>> it would against an empty string or a string of one or more
>>>> digits.
>>>
>>>Why do you believe that p being equivalent to the starting position
>>>would be considered a "successful match", considering that this
>>>obviously doesn't make any sense?
>>>
>>>[...]
>>>
>>>> By the way, something that _would_ match `^[0-9]+$` might be:
>>>
>>>[too much code]
>>>
>>>Something which would match [0-9]+ in its first argument (if any) would
>>>be:
>>>
>>>#include "string.h"
>>>#include "stdlib.h"
>>>
>>>int main(int argc, char **argv)
>>>{
>>> char *p;
>>> unsigned c;
>>>
>>> p = argv[1];
>>> if (!p) exit(1);
>>> while (c = *p, c && c - '0' > 10) ++p;
>>> if (!c) exit(1);
>>> return 0;
>>>}
>>>
>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>the problem of recognizing a digit.
>>
>>Personally, I'd use:
>
>Albeit this is limited to strings of digits that sum to less than
>ULONG_MAX...

It's not quite equivalent to his program, which just exits with
success if it sees any input string with a digit in it; yours
is closer to what I wrote, which matches `^[0-9]+$`. His is not
an interesting program and certainly not a recognizable
equivalent of a regular expression matcher in any reasonable sense,
but I think the cognitive dissonance is too strong to get that
across.

- Dan C.

>>$ cat /tmp/a.c
>>#include <stdint.h>
>>#include <string.h>
>>
>>int
>>main(int argc, const char **argv)
>>{
>> char *cp;
>> uint64_t value;
>>
>> if (argc < 2) return 1;
>>
>> value = strtoull(argv[1], &cp, 10);
>> if ((cp == argv[1])
>> || (*cp != '\0')) {
>> return 1;
>> }
>> return 0;
>>}
>>$ cc -o /tmp/a /tmp/a.c
>>$ /tmp/a 13254
>>$ echo $?
>>0
>>$ /tmp/a 23v23
>>$ echo $?
>>1

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 18:48 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:48:55 +0000
Lines: 49
Message-ID: <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
<vhqebq$c71$1@reader2.panix.com>
<87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>
<vhqhii$d5e$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net YS8e0gRyMBEiEaWSUSJgKA2jTsJfxfYCXyVcDLZ7+KlY4Q+aE=
Cancel-Lock: sha1:23pSVbbu9KcP3hkWBk7RGGWeqSU= sha1:HTufyZ15xopopEWP8hDFvBeqp7k= sha256:WHu9+MCUSSUEUP8DKBABPjCN2ErMg76Bm4gM46hZR3E=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

> In any event, this seems simpler than what you posted:
>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int
> main(int argc, char *argv[])
> {
>     if (argc != 2) {
>         fprintf(stderr, "Usage: matchd <str>\n");
>         return EXIT_FAILURE;
>     }
>
>     for (const char *p = argv[1]; *p != '\0'; p++)
>         if ('0' <= *p && *p <= '9')
>             return EXIT_SUCCESS;
>
>     return EXIT_FAILURE;
> }

It's not only 4 lines longer but in just about every individual aspect
syntactically more complicated and more messy and functionally more
clumsy. This is particularly noticeable in the loop

    for (const char *p = argv[1]; *p != '\0'; p++)
        if ('0' <= *p && *p <= '9')
            return EXIT_SUCCESS;

the loop header containing a spuriously qualified variable declaration,
the loop body and half of the termination condition. The other half then
follows as special-case in the otherwise useless loop body.

It looks like a copy of my code with each individual bit redesigned
under the guiding principle of "Can we make this more complicated?", eg,

char **argv

declares an array of pointers (as each pointer in C points to an array)
and

char *argv[]

accomplishes exactly the same but uses both more characters and more
different kinds of characters.

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 18:59 UTC
References: 1 2 3 4 5 6 7
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:59:43 +0000
Lines: 58
Message-ID: <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
<vhngoi$2p6$1@reader2.panix.com>
<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
<vhq11q$nq7$1@reader2.panix.com>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
<sS30P.4663$YSkc.427@fx40.iad>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net +Q0OL9El3NdnKOAkupJSKg0V6ZEwdHAq48dmXtlEdkGq5g5Wk=
Cancel-Lock: sha1:ILq2vul4SsPFvOKlRGJSZ9o1KA8= sha1:7hJ4qW9RkmiEklspC40RVrctrck= sha256:1htV1YK6ai/38YEq1g1uWnYsJenlFg/dzXyoPAYpCkk=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

scott@slp53.sl.home (Scott Lurndal) writes:
> Rainer Weikusat <rweikusat@talktalk.net> writes:

[...]

>>Something which would match [0-9]+ in its first argument (if any) would
>>be:
>>
>>#include "string.h"
>>#include "stdlib.h"
>>
>>int main(int argc, char **argv)
>>{
>>    char *p;
>>    unsigned c;
>>
>>    p = argv[1];
>>    if (!p) exit(1);
>>    while (c = *p, c && c - '0' > 10) ++p;
>>    if (!c) exit(1);
>>    return 0;
>>}
>>
>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>the problem of recognizing a digit.
>
> Personally, I'd use:
>
> $ cat /tmp/a.c
> #include <stdint.h>
> #include <string.h>
>
> int
> main(int argc, const char **argv)
> {
>     char *cp;
>     uint64_t value;
>
>     if (argc < 2) return 1;
>
>     value = strtoull(argv[1], &cp, 10);
>     if ((cp == argv[1])
>         || (*cp != '\0')) {
>         return 1;
>     }
>     return 0;
> }

This will accept a string of digits whose numerical value is <=
ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
content limits.

return !strstr(argv[1], "0123456789");

would be a better approximation, just a much more complicated algorithm
than necessary. Even in strictly conforming ISO-C "digitness" of a
character can be determined by a simple calculation instead of some kind
of search loop.
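
A minimal sketch of such a calculation, assuming only the ISO C guarantee
that the characters '0' through '9' have consecutive values. The names
is_digit_calc and has_digit are made up for illustration and are not
taken from any of the programs posted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if c is one of '0'..'9'. After the unsigned subtraction, every
 * character below '0' wraps around to a huge value, so one comparison
 * covers both ends of the range. */
bool is_digit_calc(unsigned char c)
{
    return (unsigned)c - '0' < 10u;
}

/* True if s contains at least one decimal digit, i.e. [0-9]+ matches
 * somewhere in it. A null pointer counts as "no match". */
bool has_digit(const char *s)
{
    if (s == NULL)
        return false;
    for (; *s != '\0'; s++)
        if (is_digit_calc((unsigned char)*s))
            return true;
    return false;
}
```

The comparison is against 10 with a strict "less than"; a character such
as ':' (the one after '9' in ASCII) yields 10 and is rejected.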

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 19:05 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:05:42 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqkm6$7dv$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <87o727rwga.fsf@doppelsaurus.mobileactivedefense.com> <vhqhii$d5e$1@reader2.panix.com> <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 19:05:42 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="7615"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>
>[...]
>
>> In any event, this seems simpler than what you posted:
>>
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int
>> main(int argc, char *argv[])
>> {
>> if (argc != 2) {
>> fprintf(stderr, "Usage: matchd <str>\n");
>> return EXIT_FAILURE;
>> }
>>
>> for (const char *p = argv[1]; *p != '\0'; p++)
>> if ('0' <= *p && *p <= '9')
>> return EXIT_SUCCESS;
>>
>> return EXIT_FAILURE;
>> }
>
>It's not only 4 lines longer but in just about every individual aspect
>syntactically more complicated and more messy and functionally more
>clumsy.

That's a lot of opinion, and not particularly well-founded
opinion at that, given that your code was incorrect to begin
with.

>This is particularly noticeable in the loop
>
> for (const char *p = argv[1]; *p != '\0'; p++)
> if ('0' <= *p && *p <= '9')
> return EXIT_SUCCESS;
>
>the loop header containing a spuriously qualified variable declaration,

Ibid. Const qualifying a pointer that I'm not going to assign
through is just good hygiene, IMHO.

>the loop body and half of the termination condition.

I think you're trying to project a value judgement onto that
loop in order to make it fit a particular world view, but I
think this is an odd way to look at it.

Another way to look at it is that the loop is only concerned
with the iteration over the string, while the body is concerned
with applying some predicate to the element, and doing something
if that predicate evaluates to true.
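
That split can be sketched as a helper that only iterates and takes the
predicate as a parameter. The name any_char is hypothetical; isdigit and
isspace are the standard <ctype.h> functions.

```c
#include <ctype.h>
#include <stdbool.h>

/* Walks the string and applies pred to each character; returns true as
 * soon as the predicate reports a match, false if none does. */
bool any_char(const char *s, int (*pred)(int))
{
    for (const char *p = s; *p != '\0'; p++)
        if (pred((unsigned char)*p))   /* cast keeps the argument valid for <ctype.h> */
            return true;
    return false;
}
```

With this, any_char(argv[1], isdigit) expresses "contains a digit", and
swapping the predicate changes the test without touching the loop.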

>The other half then
>follows as special-case in the otherwise useless loop body.

That's a way to look at it, but I submit that's an outlier point
of view.

>It looks like a copy of my code with each individual bit redesigned
>under the guiding principle of "Can we make this more complicated?", eg,

Uh, no.

>char **argv
>
>declares an array of pointers

No, it declares a pointer to a pointer to char.

>(as each pointer in C points to an array)

That's absolutely not true. A pointer in C may refer to
an array, or a scalar. Consider,

char c;
char *p = &c;
char **pp = &p;

For a concrete example of how this works in a real function,
consider the second argument to `strtol` et al in the standard
library.
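
A small sketch of that pattern, in which the caller passes the address
of one char * object rather than an array. The wrapper name parse_long
is hypothetical; strtol is the only library call used, and range errors
(errno set to ERANGE) are deliberately not checked here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Parses a base-10 long from s and stores it in *out on success. */
bool parse_long(const char *s, long *out)
{
    char *end;                     /* a single pointer object, not an array */
    long v = strtol(s, &end, 10);  /* &end has type char ** */

    if (end == s || *end != '\0')  /* nothing parsed, or trailing junk */
        return false;
    *out = v;
    return true;
}
```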

>and
>
>char *argv[]
>
>accomplishes exactly the same but uses both more characters and more
>different kinds of characters.

"more characters" is a poor metric.

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 19:15 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:15:07 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhql7r$7dv$2@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <sS30P.4663$YSkc.427@fx40.iad> <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 19:15:07 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="7615"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>scott@slp53.sl.home (Scott Lurndal) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> writes:
>
>[...]
>
>>>Something which would match [0-9]+ in its first argument (if any) would
>>>be:
>>>
>>>#include "string.h"
>>>#include "stdlib.h"
>>>
>>>int main(int argc, char **argv)
>>>{
>>> char *p;
>>> unsigned c;
>>>
>>> p = argv[1];
>>> if (!p) exit(1);
>>> while (c = *p, c && c - '0' > 10) ++p;
>>> if (!c) exit(1);
>>> return 0;
>>>}
>>>
>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>the problem of recognizing a digit.
>>
>> Personally, I'd use:
>>
>> $ cat /tmp/a.c
>> #include <stdint.h>
>> #include <string.h>
>>
>> int
>> main(int argc, const char **argv)
>> {
>> char *cp;
>> uint64_t value;
>>
>> if (argc < 2) return 1;
>>
>> value = strtoull(argv[1], &cp, 10);
>> if ((cp == argv[1])
>> || (*cp != '\0')) {
>> return 1;
>> }
>> return 0;
>> }
>
>This will accept a string of digits whose numerical value is <=
>ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
>content limits.

He acknowledged this already.

>return !strstr(argv[1], "0123456789");
>
>would be a better approximation,

No it wouldn't. That's not even close. `strstr` looks for an
instance of its second argument in its first, not an instance of
any character in its second argument in its first. Perhaps you
meant something with `strspn` or similar. E.g.,

const char *p = argv[1] + strspn(argv[1], "0123456789");
return *p != '\0';
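
To make the difference concrete, here is a minimal sketch. The function
names all_digits and any_digit are hypothetical; strspn, strcspn and
strstr are the standard <string.h> functions.

```c
#include <stdbool.h>
#include <string.h>

#define DIGITS "0123456789"

/* Matches ^[0-9]+$: strspn returns the length of the leading run made
 * up only of characters from DIGITS, so the whole string must be that
 * run and the run must not be empty. */
bool all_digits(const char *s)
{
    size_t n = strspn(s, DIGITS);
    return n > 0 && s[n] == '\0';
}

/* Matches [0-9]+ anywhere: strcspn returns the length of the leading
 * run containing no character from DIGITS, so it stops at the first
 * digit or at the terminating NUL. */
bool any_digit(const char *s)
{
    return s[strcspn(s, DIGITS)] != '\0';
}
```

By contrast, strstr(s, DIGITS) is non-null only when the ten-character
sequence "0123456789" occurs literally in s.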

>just a much more complicated algorithm
>than necessary. Even in strictly conforming ISO-C "digitness" of a
>character can be determined by a simple calculation instead of some kind
>of search loop.

Yes, one can do that, but why bother?

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Janis Papanagnou
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: A noiseless patient Spider
Date: Fri, 22 Nov 2024 19:20 UTC
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 20:20:06 +0100
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <vhqlh7$1a8kb$1@dont-email.me>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org>
<0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk>
<uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me>
<87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com>
<ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me>
<vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com>
<vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me>
<20241121110710.49@kylheku.com> <vhpp96$15bjl$1@dont-email.me>
<20241122101819.161@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Nov 2024 20:20:07 +0100 (CET)
Injection-Info: dont-email.me; posting-host="05d615c761813a70ef328ddc0419c718";
logging-data="1385099"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/HM1/2EYCQMv/LsCmgBTfM"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:i3bTUmNdbrzGzyB24mko7TvtTVk=
In-Reply-To: <20241122101819.161@kylheku.com>
X-Enigmail-Draft-Status: N1110

On 22.11.2024 19:19, Kaz Kylheku wrote:
> On 2024-11-22, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> On 21.11.2024 20:12, Kaz Kylheku wrote:
>>> [...]
>>>
>>> In the wild, you see regexes being used for all sorts of stupid stuff,
>>
>> No one can prevent folks using features for stupid things. Yes.
>
> But the thing is that "modern" regular expressions (Perl regex and its
> progeny) have features that are designed to exclusively cater to these
> folks.

Which ones are you specifically thinking of?

Since I'm not using Perl I don't know all the Perl RE details. Besides
the basic REs I'm aware of the abbreviations (like '\d') (that I like),
then extensions of Chomsky-3 (like back-references) (that I also like
to have in cases I need them; but one must know what we buy with them),
then the minimum-match (as opposed to matching the longest substring)
(which I think is useful to simplify some types of expressions), and
there was another one that escapes my memory, something like context
dependent patterns (also useful), and wasn't there also some syntax to
match subexpression-hierarchies (useful as well) (similar to GNU
Awk's gensub() (probably in a more primitive variant there), and also
existing in Kornshell patterns that also support some more of the above
[Perl-]features, like the abbreviations).

Janis

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 19:24 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:24:23 +0000
Lines: 38
Message-ID: <878qtbrs0o.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<87o727rwga.fsf@doppelsaurus.mobileactivedefense.com>
<vhqhii$d5e$1@reader2.panix.com>
<87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com>
<vhqkm6$7dv$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net UEJH8Ea+V6CMLPjq5cNdbQr145npXssFUWBNmxRcUThEF7zVo=
Cancel-Lock: sha1:a1+leBdFrlYTHnUZg1Ib2Lq3TZ4= sha1:KTTXPD4cOTzop+F8zu8yg9v8kOA= sha256:0XPdoWFHGOMuBJYm4trWjTcz39feQICJoHxWYN0F5P0=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>
>>[...]
>>
>>> In any event, this seems simpler than what you posted:
>>>
>>> #include <stddef.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int
>>> main(int argc, char *argv[])
>>> {
>>> if (argc != 2) {
>>> fprintf(stderr, "Usage: matchd <str>\n");
>>> return EXIT_FAILURE;
>>> }
>>>
>>> for (const char *p = argv[1]; *p != '\0'; p++)
>>> if ('0' <= *p && *p <= '9')
>>> return EXIT_SUCCESS;
>>>
>>> return EXIT_FAILURE;
>>> }
>>
>>It's not only 4 lines longer but in just about every individual aspect
>>syntactically more complicated and more messy and functionally more
>>clumsy.
>
> That's a lot of opinion, and not particularly well-founded
> opinion at that, given that your code was incorrect to begin
> with.

That's not at all an opinion but an observation. My opinion on this is
that this is either a poor man's attempt at winning an obfuscation
contest or - simpler - exemplary bad code.

Subject: Re: Command Languages Versus Programming Languages
From: Rainer Weikusat
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: Fri, 22 Nov 2024 19:26 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rweikusat@talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:26:07 +0000
Lines: 70
Message-ID: <874j3zrrxs.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
<sS30P.4663$YSkc.427@fx40.iad>
<87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>
<vhql7r$7dv$2@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net AvLdivSdaGhmXbrSatpxdwGXF3VXAETXGp3+g3aS8Ghsxbb6E=
Cancel-Lock: sha1:TRbBKA8cIQBeM3CihgOGC8HFc/I= sha1:I3Xbk/LjmPyejb/X/pmUVUeOzts= sha256:shAz6KU+RkcTJQEqnUqgpmFdlKTFQAHKqjpNVVM0wwE=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>scott@slp53.sl.home (Scott Lurndal) writes:
>>> Rainer Weikusat <rweikusat@talktalk.net> writes:
>>
>>[...]
>>
>>>>Something which would match [0-9]+ in its first argument (if any) would
>>>>be:
>>>>
>>>>#include "string.h"
>>>>#include "stdlib.h"
>>>>
>>>>int main(int argc, char **argv)
>>>>{
>>>> char *p;
>>>> unsigned c;
>>>>
>>>> p = argv[1];
>>>> if (!p) exit(1);
>>>> while (c = *p, c && c - '0' > 10) ++p;
>>>> if (!c) exit(1);
>>>> return 0;
>>>>}
>>>>
>>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>>the problem of recognizing a digit.
>>>
>>> Personally, I'd use:
>>>
>>> $ cat /tmp/a.c
>>> #include <stdint.h>
>>> #include <string.h>
>>>
>>> int
>>> main(int argc, const char **argv)
>>> {
>>> char *cp;
>>> uint64_t value;
>>>
>>> if (argc < 2) return 1;
>>>
>>> value = strtoull(argv[1], &cp, 10);
>>> if ((cp == argv[1])
>>> || (*cp != '\0')) {
>>> return 1;
>>> }
>>> return 0;
>>> }
>>
>>This will accept a string of digits whose numerical value is <=
>>ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
>>content limits.
>
> He acknowledged this already.
>
>>return !strstr(argv[1], "0123456789");
>>
>>would be a better approximation,
>
> No it wouldn't. That's not even close. `strstr` looks for an
> instance of its second argument in its first, not an instance of
> any character in its second argument in its first. Perhaps you
> meant something with `strspn` or similar. E.g.,
>
> const char *p = argv[1] + strspn(argv[1], "0123456789");
> return *p != '\0';

My bad.

Subject: Re: Command Languages Versus Programming Languages
From: Janis Papanagnou
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: A noiseless patient Spider
Date: Fri, 22 Nov 2024 19:33 UTC
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 20:33:24 +0100
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <vhqma5$1adbr$1@dont-email.me>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org>
<0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk>
<uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me>
<87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com>
<ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me>
<vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com>
<vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me>
<vhkh94$2oi3$1@dont-email.me> <vhkvpi$5h8v$1@dont-email.me>
<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
<vhpp2q$15aen$1@dont-email.me>
<87wmgvzdlh.fsf@doppelsaurus.mobileactivedefense.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 22 Nov 2024 20:33:25 +0100 (CET)
Injection-Info: dont-email.me; posting-host="a15370d329738e752ba59c760f877a0e";
logging-data="1389947"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pFwURTbMTeciZpoWWvCj/"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:juEqifc6lyiqSRWM505ya3hpEOY=
In-Reply-To: <87wmgvzdlh.fsf@doppelsaurus.mobileactivedefense.com>
X-Enigmail-Draft-Status: N1110

On 22.11.2024 12:56, Rainer Weikusat wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>> On 20.11.2024 18:50, Rainer Weikusat wrote:
>>>[...]
>>> while (p < e && *p - '0' < 10) ++p;
>>>
>>> That's not too bad. And it's really a hell lot faster than a
>>> general-purpose automaton programmed to recognize the same pattern
>>> (which might not matter most of the time, but sometimes, it does).
>>
>> Okay, I see where you're coming from (and especially in that simple
>> case).
>>
>> Personally (and YMMV), even here in this simple case I think that
>> using pointers is not better but worse - and anyway isn't [in this
>> form] available in most languages;
>
> That's a question of using the proper tool for the job. In C, that's
> pointer and pointer arithmetic because it's the simplest way to express
> something like this.

Yes, in "C" you'd use that primitive (error-prone) pointer feature.
That's what I said. And that in other languages it's less terse than
in "C" but equally error-prone if you have to create all the parsing
code yourself (without an existing engine and in a non-standard way).
And if you extend the expression to parse it's IME much simpler done
in Regex than adjusting the algorithm of the ad hoc procedural code.

>
>> in other cases (and languages)
>> such constructs get yet more clumsy, and for my not very complex
>> example - /[0-9]+(ABC)?x*foo/ - even a "catastrophe" concerning
>> readability, error-proneness, and maintainability.
>
> Procedural code for matching strings constructed in this way is
> certainly much simpler¹ than the equally procedural code for a
> programmable automaton capable of interpreting regexes.

The point is that Regexps and their equivalence to FSA (with guaranteed
runtime complexity) are an [efficient] abstraction with a formalized
syntax; those are huge advantages compared to ad hoc parsing code in C
(or in any other language).

> Your statement
> is basically "If we assume that the code interpreting regexes doesn't
> exist, regexes need much less code than something equivalent which does
> exist." Without this assumption, the picture becomes a different one
> altogether.

I don't speak of assumptions. I speak about the fact that there's a
well-understood model with existing [parsing-]implementations already
available to handle a huge class of algorithms in a standardized way
with a guaranteed runtime-efficiency and in an error-resilient way.
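
As an illustration of relying on an existing implementation, here is a
minimal sketch that hands the example pattern to the POSIX <regex.h>
interface. It assumes a POSIX system; the wrapper name ere_search is
hypothetical, while regcomp, regexec and regfree are the POSIX calls.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if text contains a match for the POSIX extended regular
 * expression in pattern. A pattern that fails to compile is treated as
 * "no match" to keep the sketch short. */
bool ere_search(const char *pattern, const char *text)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Extending the expression then means editing the pattern string, e.g.
ere_search("[0-9]+(ABC)?x*foo", argv[1]), rather than the matching code.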

Janis

> [...]

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 19:46 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:46:31 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqn2n$7dc$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <87h67zrtns.fsf@doppelsaurus.mobileactivedefense.com> <vhqkm6$7dv$1@reader2.panix.com> <878qtbrs0o.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 19:46:31 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="7596"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <878qtbrs0o.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>
>>>[...]
>>>
>>>> In any event, this seems simpler than what you posted:
>>>>
>>>> #include <stddef.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> int
>>>> main(int argc, char *argv[])
>>>> {
>>>> if (argc != 2) {
>>>> fprintf(stderr, "Usage: matchd <str>\n");
>>>> return EXIT_FAILURE;
>>>> }
>>>>
>>>> for (const char *p = argv[1]; *p != '\0'; p++)
>>>> if ('0' <= *p && *p <= '9')
>>>> return EXIT_SUCCESS;
>>>>
>>>> return EXIT_FAILURE;
>>>> }
>>>
>>>It's not only 4 lines longer but in just about every individual aspect
>>>syntactically more complicated and more messy and functionally more
>>>clumsy.
>>
>> That's a lot of opinion, and not particularly well-founded
>> opinion at that, given that your code was incorrect to begin
>> with.
>
>That's not at all an opinion but an observation. My opinion on this is
>that this is either a poor man's attempt at winning an obfuscation
>contest or - simpler - exemplary bad code.

Opinion (noun)
a view or judgment formed about something, not necessarily based on
fact or knowledge. "I'm writing to voice my opinion on an issue of
little importance"

You mentioned snark earlier. Physician, heal thyself.

- Dan C.

Subject: Re: Command Languages Versus Programming Languages
From: Dan Cross
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Organization: PANIX Public Access Internet and UNIX, NYC
Date: Fri, 22 Nov 2024 19:51 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 19:51:18 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqnbm$7dc$2@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com> <vhql7r$7dv$2@reader2.panix.com> <874j3zrrxs.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 19:51:18 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
logging-data="7596"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <874j3zrrxs.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> In article <87cyinrt5s.fsf@doppelsaurus.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>scott@slp53.sl.home (Scott Lurndal) writes:
>>>> Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>
>>>[...]
>>>
>>>>>Something which would match [0-9]+ in its first argument (if any) would
>>>>>be:
>>>>>
>>>>>#include "string.h"
>>>>>#include "stdlib.h"
>>>>>
>>>>>int main(int argc, char **argv)
>>>>>{
>>>>> char *p;
>>>>> unsigned c;
>>>>>
>>>>> p = argv[1];
>>>>> if (!p) exit(1);
>>>>> while (c = *p, c && c - '0' > 10) ++p;
>>>>> if (!c) exit(1);
>>>>> return 0;
>>>>>}
>>>>>
>>>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>>>the problem of recognizing a digit.
>>>>
>>>> Personally, I'd use:
>>>>
>>>> $ cat /tmp/a.c
>>>> #include <stdint.h>
>>>> #include <string.h>
>>>>
>>>> int
>>>> main(int argc, const char **argv)
>>>> {
>>>> char *cp;
>>>> uint64_t value;
>>>>
>>>> if (argc < 2) return 1;
>>>>
>>>> value = strtoull(argv[1], &cp, 10);
>>>> if ((cp == argv[1])
>>>> || (*cp != '\0')) {
>>>> return 1;
>>>> }
>>>> return 0;
>>>> }
>>>
>>>This will accept a string of digits whose numerical value is <=
>>>ULLONG_MAX, ie, it's basically ^[0-9]+$ with unobvious length and
>>>content limits.
>>
>> He acknowledged this already.
>>
>>>return !strstr(argv[1], "0123456789");
>>>
>>>would be a better approximation,
>>
>> No it wouldn't. That's not even close. `strstr` looks for an
>> instance of its second argument in its first, not an instance of
>> any character in its second argument in its first. Perhaps you
>> meant something with `strspn` or similar. E.g.,
>>
>> const char *p = argv[1] + strspn(argv[1], "0123456789");
>> return *p != '\0';
>
>My bad.

You've made a lot of "bad"s in this thread, and been rude about
it to boot, crying foul when someone's pointed out ways that
your code is deficient; claiming offense at what you perceive as
"snark" while dishing the same out in kind, making basic errors
that show you haven't done the barest minimum of testing, and
making statements that show you have, at best, a limited grasp
on the language you're choosing to use.

I'm done being polite. My conclusion is that perhaps you are
not as up on these things as you seem to think that you are.

- Dan C.
