Subject: Correct syntax for pathological re.search()
From: Michael F. Stemper
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Mon, 7 Oct 2024 13:35 UTC

I'm trying to discard lines that include the string "\sout{" (which is TeX, for
those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):

But the lines with that string keep coming through. What is the right syntax to
properly escape the backslash and the left curly bracket?

--
Michael F. Stemper
No animals were harmed in the composition of this message.

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Mon, 7 Oct 2024 13:56 UTC
References: 1

"Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
> if not re.search("\\sout\{", line):

So, if you're not down to slap an "r" before your string literals,
you're going to end up doubling down on every backslash.

Long story short, those double backslashes in your regex?
They'll be quadrupling up in your Python string literal!

main.py

import re

lines = r'''
abcdef
\sout{abcdef
abcdef
abc\sout{def
abcdef
abcdef\sout{
abcdef
'''.strip().split( '\n' )

for line in lines:
    product = re.search( "\\\\sout\\{", line )
    if not product:
        print( line )

stdout

abcdef
abcdef
abcdef
abcdef
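
For comparison, here is a minimal sketch of the same filter written with the "r" prefix mentioned above. The shortened sample data and the names `pattern` and `kept` are illustrative assumptions, not part of the original post.

```python
import re

# Sample lines in the spirit of the listing above; the raw triple-quoted
# string keeps every backslash as a single literal character.
lines = r'''
abcdef
\sout{abcdef
abc\sout{def
abcdef\sout{
'''.strip().split('\n')

# Raw-string pattern: the regex \\ matches one backslash, \{ matches one brace.
pattern = re.compile(r'\\sout\{')

# Keep only the lines that do not contain the literal text \sout{
kept = [line for line in lines if not pattern.search(line)]
print(kept)   # ['abcdef']
```

The raw literal r'\\sout\{' and the plain literal "\\\\sout\\{" denote the same pattern string; only the amount of typing differs.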

Subject: Re: Correct syntax for pathological re.search()
From: Michael F. Stemper
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Mon, 7 Oct 2024 14:14 UTC
References: 1 2

On 07/10/2024 08.56, Stefan Ram wrote:
> "Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
>> if not re.search("\\sout\{", line):
>
> So, if you're not down to slap an "r" before your string literals,
> you're going to end up doubling down on every backslash.

Never heard of that before, but it did the trick.

> Long story short, those double backslashes in your regex?
> They'll be quadrupling up in your Python string literal!

> for line in lines:
> product = re.search( "\\\\sout\\{", line )

This also worked.

For now, I'll use the "r" in a cargo-cult fashion, until I decide which
syntax I prefer. (Is there any reason that one or the other is preferable?)

Thanks for your help,
Mike
--
Michael F. Stemper
Economists have correctly predicted seven of the last three recessions.

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Mon, 7 Oct 2024 14:32 UTC
References: 1 2 3

"Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
>For now, I'll use the "r" in a cargo-cult fashion, until I decide which
>syntax I prefer. (Is there any reason that one or the other is preferable?)

I'd totally go with the r-style notation!

It's got one bummer though - you can't end such a string literal with
a backslash. But hey, no biggie, you could use one of those notations:

main.py

path = r'C:\Windows\example' + '\\'

print( path )

path = r'''
C:\Windows\example\
'''.strip()

print( path )

stdout

C:\Windows\example\
C:\Windows\example\

.

Subject: Re: Correct syntax for pathological re.search()
From: Jon Ribbens
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Mon, 7 Oct 2024 15:43 UTC
References: 1 2 3 4

On 2024-10-07, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
> "Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
>>For now, I'll use the "r" in a cargo-cult fashion, until I decide which
>>syntax I prefer. (Is there any reason that one or the other is preferable?)
>
> I'd totally go with the r-style notation!
>
> It's got one bummer though - you can't end such a string literal with
> a backslash. But hey, no biggie, you could use one of those notations:
>
> main.py
>
> path = r'C:\Windows\example' + '\\'
>
> print( path )
>
> path = r'''
> C:\Windows\example\
> '''.strip()
>
> print( path )
>
> stdout
>
> C:\Windows\example\
> C:\Windows\example\
>
> .

... although of course in this example you should probably do neither of
those things, and instead do:

from pathlib import Path
path = Path(r'C:\Windows\example')

since in a Path the trailing '\' or '/' is unnecessary. Which leaves
very few remaining uses for a raw-string with a trailing '\'...
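
A small sketch of that point, assuming a made-up file name 'notes.txt'. PureWindowsPath is used instead of Path only so that the example prints the same thing on any operating system.

```python
from pathlib import PureWindowsPath

# PureWindowsPath gives Windows path semantics on every OS; plain Path would
# pick the flavour of whatever system the script runs on.
base = PureWindowsPath(r'C:\Windows\example')

# The "/" operator supplies the separator, so no trailing backslash is needed.
target = base / 'notes.txt'   # 'notes.txt' is a hypothetical file name

print(target)   # C:\Windows\example\notes.txt
```

A trailing separator in the input is dropped during normalisation, so building the path from a string that ends in a backslash should compare equal to `base`.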

Subject: Re: Correct syntax for pathological re.search()
From: Pieter van Oostrum
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 17:50 UTC
References: 1 2 3 4

ram@zedat.fu-berlin.de (Stefan Ram) writes:

> "Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
>
> path = r'C:\Windows\example' + '\\'
>
You could even omit the '+'. Then the concatenation is done at parsing time instead of run time.
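
A minimal sketch of that juxtaposition; the variable names are illustrative only. Each adjacent literal keeps its own prefix, so a raw piece and a non-raw piece can sit side by side.

```python
# Adjacent string literals are joined into one string, and each piece keeps
# its own prefix: the first is raw, the second is a normal escaped backslash.
joined_by_juxtaposition = r'C:\Windows\example' '\\'

# The explicit "+" form from the quoted post, for comparison.
joined_with_plus = r'C:\Windows\example' + '\\'

print(joined_by_juxtaposition)                       # C:\Windows\example\
print(joined_by_juxtaposition == joined_with_plus)   # True
```
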
--
Pieter van Oostrum <pieter@vanoostrum.org>
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]

Subject: Re: Correct syntax for pathological re.search()
From: Karsten Hilbert
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 18:30 UTC
References: 1 2

On Mon, Oct 07, 2024 at 08:35:32AM -0500, Michael F. Stemper via Python-list wrote:

> I'm trying to discard lines that include the string "\sout{" (which is TeX, for
> those who are curious. I have tried:
> if not re.search("\sout{", line):
> if not re.search("\sout\{", line):
> if not re.search("\\sout{", line):
> if not re.search("\\sout\{", line):

unwanted_tex = '\sout{'
if unwanted_tex not in line: do_something_with_libreoffice()

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

Subject: Re: Correct syntax for pathological re.search()
From: MRAB
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 19:07 UTC
References: 1 2 3

On 2024-10-08 19:30, Karsten Hilbert via Python-list wrote:
> Am Mon, Oct 07, 2024 at 08:35:32AM -0500 schrieb Michael F. Stemper via Python-list:
>
>> I'm trying to discard lines that include the string "\sout{" (which is TeX, for
>> those who are curious. I have tried:
>> if not re.search("\sout{", line):
>> if not re.search("\sout\{", line):
>> if not re.search("\\sout{", line):
>> if not re.search("\\sout\{", line):
>
> unwanted_tex = '\sout{'
> if unwanted_tex not in line: do_something_with_libreoffice()
>
That should be:

unwanted_tex = r'\sout{'

or:

unwanted_tex = '\\sout{'
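
Put together, the regex-free test might look like the sketch below. The list `sample_lines` is invented for illustration and stands in for whatever lines are actually being filtered.

```python
unwanted_tex = r'\sout{'   # same six characters as the plain literal '\\sout{'

sample_lines = [           # hypothetical input, for illustration only
    r'plain text',
    r'struck \sout{text}',
]

# A plain substring test involves no regex, so only the string-literal layer
# of escaping matters here.
kept = [line for line in sample_lines if unwanted_tex not in line]
print(kept)   # ['plain text']
```
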

Subject: Re: Correct syntax for pathological re.search()
From: MRAB
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 19:11 UTC
References: 1 2

On 2024-10-07 14:35, Michael F. Stemper via Python-list wrote:
> I'm trying to discard lines that include the string "\sout{" (which is TeX, for
> those who are curious. I have tried:
> if not re.search("\sout{", line):
> if not re.search("\sout\{", line):
> if not re.search("\\sout{", line):
> if not re.search("\\sout\{", line):
>
> But the lines with that string keep coming through. What is the right syntax to
> properly escape the backslash and the left curly bracket?
>
String literals use backslash as an escape character, so it needs to be
escaped, or you need to use a "raw" string.

However, regex also uses backslash as an escape character.

That means that a literal backslash in a regex that's in a plain string
literal needs to be doubly-escaped, once for the string literal and
again for the regex.
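
The two layers can be seen separately in a short sketch; the names `plain`, `raw` and `too_few` are illustrative assumptions.

```python
import re

plain = "\\\\sout\\{"   # plain literal: each regex backslash is doubled again
raw = r"\\sout\{"       # raw literal: typed exactly as the regex should read

# Layer 1 (Python): both literals yield the same 8-character pattern string.
print(plain == raw, len(raw))                   # True 8

# Layer 2 (regex): \\ matches one backslash, \{ matches one left brace.
print(bool(re.search(raw, r"abc\sout{def")))    # True

# With only one backslash left after layer 1, the regex sees \s, the
# whitespace class, so this pattern matches " out{" but not "\sout{".
too_few = r"\sout\{"
print(bool(re.search(too_few, " out{")), bool(re.search(too_few, r"\sout{")))
```

The last line is a likely explanation for why the attempts in the original question let the unwanted lines through: after the string-literal layer, the pattern no longer contained an escaped backslash.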

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Tue, 8 Oct 2024 19:32 UTC
References: 1 2 3

MRAB <python@mrabarnett.plus.com> wrote or quoted:
>However, regex also uses backslash as an escape character.

TeX also uses the backslash as an escape character:

\chardef \\ = '\\

, the regular expression to search exactly this:

\\chardef \\\\ = '\\\\

, and the Python string literal for that regular expression:

"\\\\chardef \\\\\\\\ = '\\\\\\\\".

. Must be a reason Markdown started to use the backtick!

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Tue, 8 Oct 2024 19:57 UTC
References: 1 2 3 4

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>"\\\\chardef \\\\\\\\ = '\\\\\\\\".

However, one can rewrite this as follows:

"`chardef `` = '``".replace( "`", "\\"*2 )

. One can also use "repr" to find how to represent something:

main.py

text = input( "What do you want me to represent as a literal? " )
print( repr( text ))

transcript

What do you want me to represent as a literal? \\sout\{
'\\\\sout\\{'

. We can use "escape" and "repr" to find how to represent
a regular expression for a literal text:

main.py

import re

text = input( "Want the literal of an re for what text? " )
print( repr( re.escape( text )))

transcript

Want the literal of an re for what text? \sout{
'\\\\sout\\{'

.
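As a cross-check of the "escape" and "repr" recipe, here is a minimal sketch (not part of the original post; the sample line and variable names are made up for illustration). It confirms that the literal printed by "repr" evaluates back to the pattern, and that the pattern finds the text:

```python
import ast
import re

text = r"\sout{"              # the TeX fragment from the thread: backslash, "sout", "{"

pattern = re.escape(text)     # regular expression that matches the text literally
literal = repr(pattern)       # Python source literal for that regular expression

print(pattern)
print(literal)

# The literal evaluates back to the pattern ...
assert ast.literal_eval(literal) == pattern

# ... and the pattern finds the text in a (hypothetical) sample line.
sample = r"keep \sout{this} out"
match = re.search(pattern, sample)
assert match is not None and match.group() == text
```

The raw string in the first line is only a convenience for writing the sample text; the same check works with text read from "input" as in the transcripts above.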

Subject: Re: Correct syntax for pathological re.search()
From: Karsten Hilbert
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 20:17 UTC

On Tue, Oct 08, 2024 at 08:07:04PM +0100, MRAB via Python-list wrote:

> >unwanted_tex = '\sout{'
> >if unwanted_tex not in line: do_something_with_libreoffice()
> >
> That should be:
>
> unwanted_tex = r'\sout{'

Hm.

Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
>>> tex
'\\sout{'
>>>

Am I missing something ?

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

Subject: Re: Correct syntax for pathological re.search()
From: Alan Bawden
Newsgroups: comp.lang.python
Organization: ITS Preservation Society
Date: Tue, 8 Oct 2024 20:59 UTC

Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:

   Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> tex = '\sout{'
   >>> tex
   '\\sout{'
   >>>

   Am I missing something ?

You're missing the warning it generates:

> python -E -Wonce
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
<stdin>:1: DeprecationWarning: invalid escape sequence '\s'
>>>

Subject: Re: Correct syntax for pathological re.search()
From: MRAB
Newsgroups: comp.lang.python
Date: Tue, 8 Oct 2024 22:10 UTC

On 2024-10-08 21:59, Alan Bawden via Python-list wrote:
> Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
>
> Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> tex = '\sout{'
> >>> tex
> '\\sout{'
> >>>
>
> Am I missing something ?
>
> You're missing the warning it generates:
>
> > python -E -Wonce
> Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> tex = '\sout{'
> <stdin>:1: DeprecationWarning: invalid escape sequence '\s'
> >>>

You got lucky that \s is invalid. If it had been \t you would've got a
tab character.

Historically, Python treated invalid escape sequences as literals, but
it's deprecated now and will become an outright error in the future
(probably) because it often hides a mistake, such as the aforementioned
\t being treated as a tab character when the user expected it to be a
literal backslash followed by letter t. (This can occur within Windows
file paths written in plain string literals.)
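The difference can be seen without touching the interpreter's command-line flags. The following is a minimal sketch, assuming a CPython version where the invalid escape is still only a warning (DeprecationWarning from 3.6, SyntaxWarning from 3.12); the helper name escape_warnings is made up for illustration:

```python
import warnings

def escape_warnings(source):
    """Compile source text and return the messages of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")      # make sure nothing is filtered out
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught]

# The source snippets are written as raw strings so this file itself stays warning-free.
plain = r"tex = '\sout{'"     # \s is not a recognised escape: a warning is expected
raw = r"tex = r'\sout{'"      # raw string literal: no escape processing, no warning

print(escape_warnings(plain))
print(escape_warnings(raw))

# A recognised escape is converted silently, which is the real trap:
print(len("C:\temp"), len(r"C:\temp"))
```

The last line prints two different lengths because "\t" in the plain literal collapses to a single tab character, while the raw literal keeps the backslash and the "t" as two characters.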

Subject: Re: Correct syntax for re.search() (Posting On Python-List Prohibited)
From: Lawrence D'Oliveiro
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Tue, 8 Oct 2024 23:15 UTC

On Tue, 08 Oct 2024 19:50:14 +0200, Pieter van Oostrum wrote:

> You could even omit the '+'. Then the concatenation is done at parsing
> time instead of run time.

Surprising how few people know about this. It’s a feature copied from C,
and present in C++. But for some reason Java, JavaScript and PHP dropped
it. Python is one of the few other languages that has it.
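A small sketch of what that looks like in Python (an illustration, not from the thread; the variable names are made up). It assumes Python 3.8 or later, where string literals appear as ast.Constant nodes:

```python
import ast

# Adjacent literals are joined by the parser; raw and ordinary literals can be mixed.
pattern = r"\\sout" "{"
assert pattern == r"\\sout" + "{"

# The parsed tree of the juxtaposed form holds a single constant,
# while the "+" form is still a binary operation at this stage.
joined = ast.parse('"ab" "cd"', mode="eval").body
added = ast.parse('"ab" + "cd"', mode="eval").body

print(type(joined).__name__, repr(joined.value))
print(type(added).__name__)
```

In practice CPython also folds "ab" + "cd" into one constant when it compiles the code, so the run-time difference between the two spellings is usually nil for plain literals.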

Subject: Re: Correct syntax for pathological re.search()
From: Karsten Hilbert
Newsgroups: comp.lang.python
Date: Wed, 9 Oct 2024 18:06 UTC

On Tue, Oct 08, 2024 at 04:59:48PM -0400, Alan Bawden via Python-list wrote:

> Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
>
> Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> tex = '\sout{'
> >>> tex
> '\\sout{'
> >>>
>
> Am I missing something ?
>
> You're missing the warning it generates:
>
> <stdin>:1: DeprecationWarning: invalid escape sequence '\s'

I knew it'd be good to ask :-D

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

Subject: Re: Correct syntax for pathological re.search()
From: Gilmeh Serda
Newsgroups: comp.lang.python
Organization: Easynews - www.easynews.com
Date: Fri, 11 Oct 2024 14:43 UTC

On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:

> I'm trying to discard lines that include the string "\sout{" (which is
> TeX, for those who are curious. I have tried:
>   if not re.search("\sout{", line):
>   if not re.search("\sout\{", line):
>   if not re.search("\\sout{", line):
>   if not re.search("\\sout\{", line):
>
> But the lines with that string keep coming through. What is the right
> syntax to properly escape the backslash and the left curly bracket?

$ python
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = r"testing \sout{WHADDEVVA}"
>>> re.search(r"\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>

You want a literal backslash, hence, you need to escape everything.

It is not enough to write "\s" as "\\s", because that only satisfies
Python's string-literal escaping. You also need to escape the "\" for the
RegEx itself, or the RegEx engine will read it as "\s", which matches any
whitespace character. Your search then fails because it is looking for
whitespace followed by "out{".

Therefore, you need to escape it either as per my example, or by using
four "\" and no "r" in front of the first quote, which also works:

>>> re.search("\\\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>

You don't need to escape the curly braces. We call them "seagull wings"
where I live.
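To make the two layers visible side by side, here is a minimal sketch (the sample line and the dictionary keys are made up for illustration). Each pattern is written as a raw string, so what you see is exactly what the RegEx engine receives:

```python
import re

line = r"keep \sout{struck text} here"   # contains the literal text \sout{

# Pattern values as the regex engine receives them.
candidates = {
    "one backslash": r"\sout{",             # what "\sout{" and "\\sout{" evaluate to
    "two backslashes": r"\\sout{",          # what r"\\sout{" and "\\\\sout{" evaluate to
    "re.escape": re.escape(r"\sout{"),      # lets the library do the escaping
}

results = {name: bool(re.search(pattern, line))
           for name, pattern in candidates.items()}
for name, found in results.items():
    print(name, found)

# For a fixed string, no regular expression is needed at all:
print(r"\sout{" in line)
```

Only the one-backslash pattern fails to match, because the engine reads it as whitespace followed by "out{". The plain "in" test on the last line is the simplest option when the search text is a fixed string.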

--
Gilmeh

Sometimes I simply feel that the whole world is a cigarette and I'm the
only ashtray.

Subject: RE: Correct syntax for pathological re.search()
From: <avi.e.gross@gmail.com>
Newsgroups: comp.lang.python
Date: Fri, 11 Oct 2024 21:13 UTC

Is there some utility function out there that can be called to show what the
regular expression you typed in will look like by the time it is ready to be
used?

Obviously, life is not that simple, as a pattern can pass through multiple
layers, each dealing with its own layer of backslashes.

But for simple cases, ...


Subject: Re: Correct syntax for pathological re.search()
From: MRAB
Newsgroups: comp.lang.python
Date: Sat, 12 Oct 2024 00:37 UTC
References: 1 2 3 4

On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?
>
> Obviously, life is not that simple as it can go through multiple layers with
> each dealing with a layer of backslashes.
>
> But for simple cases, ...
>
Yes. It's called 'print'. :-)
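
A small sketch of that suggestion, assuming the three variable names below purely for illustration. print() shows the characters that survive Python's string-literal processing, which is what the re module is handed; repr() shows how you would have to type them back in as a normal literal:

```python
# Three ways of typing a pattern (names are illustrative only).
raw_form = r"\\sout{"       # raw string: both backslashes reach re untouched
escaped_form = "\\\\sout{"  # normal string: four typed, two survive
too_few = "\\sout{"         # normal string: two typed, one survives

for name, pattern in [
    ("raw_form", raw_form),
    ("escaped_form", escaped_form),
    ("too_few", too_few),
]:
    print(f"{name:13} print -> {pattern}   repr -> {pattern!r}   len={len(pattern)}")
```

The first two are the same seven-character string; the third is one backslash short, so the regex engine would read it as "\s" followed by "out{".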

Subject: Re: Correct syntax for pathological re.search()
From: Peter J. Holzer
Newsgroups: comp.lang.python
Date: Sat, 12 Oct 2024 10:59 UTC
References: 1 2 3 4
Attachments: signature.asc (application/pgp-signature)

On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?

I assume that by "ready to be used" you mean the compiled form?

No, there doesn't seem to be a way to dump that. You can

    p = re.compile("\\\\sout{")
    print(p.pattern)

but that just prints the input string, which you could do without
compiling it first.

But - without having looked at the implementation - it's far from clear
that the compiled form would be useful to the user. It's probably some
kind of state machine, and a large table of state transitions isn't very
readable.
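
For what it is worth, the re module does document a re.DEBUG flag that prints debug information about an expression while it is compiled. A minimal sketch, assuming only that flag; the helper name dump_parse_tree is hypothetical, and the exact text of the dump is an implementation detail that differs between Python versions:

```python
import contextlib
import io
import re


def dump_parse_tree(pattern: str) -> str:
    """Return whatever re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# Literal backslash followed by "sout{".
print(dump_parse_tree(r"\\sout{"))

# Whitespace class followed by "out{" (the accidental reading).
print(dump_parse_tree(r"\sout{"))
```

In the first dump the leading item is a literal character (code 92, the backslash); in the second it is a whitespace category, which makes the mis-escaped case easy to spot.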

There are a number of websites which visualize regular expressions.
Those are probably better for debugging a regular expression than
anything the re module could reasonably produce (although with the
caveat that such a web site would use a different implementation and
therefore might produce different results).

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Sat, 12 Oct 2024 11:59 UTC
References: 1 2 3 4 5
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.

"Peter J. Holzer" <hjp-python@hjp.at> wrote or quoted:
>But - without having looked at the implementation - it's far from clear
>that the compiled form would be useful to the user.

So, what he might be getting at with "compiled form" is a
representation that's easy on the eyes for us mere mortals.

You could, for instance, use colors to show the difference between
object and meta characters. In that case, the regex "\**" would
come out as "**", but the first "*" might be navy blue (on a white
background), so just your run-of-the-mill object character, while
the second one would be burgundy, flagging it as a meta character.

So, simplified, that would be something like:

import tkinter as tk

def tokenize_regex( pattern ):
    # Split the pattern into ( kind, text ) pairs.
    # Only '\' and '*' are treated specially in this simplified version.
    tokens = []
    i = 0
    while i < len( pattern ):
        if pattern[ i ] == '\\':
            if i + 1 < len( pattern ):
                # backslash plus next character: one escaped object character
                tokens.append( ( 'escaped', pattern[ i+1: i+2 ]))
                i += 2
            else:
                tokens.append( ( 'error', 'Incomplete escape sequence' ))
                i += 1
        elif pattern[ i ] == '*':
            tokens.append( ( 'repetition', '*' ))
            i += 1
        else:
            tokens.append( ( 'plain', pattern[ i ]))
            i += 1

    return tokens

root = tk.Tk()
root.configure( bg='white' )

regex = r'\**'
result = tokenize_regex( regex )

for token_type, token_value in result:
    if token_type == 'plain' or token_type == 'escaped':
        # object characters in blue
        tk.Label( root, text=token_value, font=( 'Arial', 40 ), fg='#4070FF', bg='white' ).pack( side='left' )
    elif token_type == 'repetition':
        # meta characters in red
        tk.Label( root, text=token_value, font=( 'Arial', 40 ), fg='#C02000', bg='white' ).pack( side='left' )

root.mainloop()

.
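
The same idea can be tried without a display. This is a rough terminal sketch, assuming an ANSI-capable terminal; the tokenizer is restated so the block runs on its own, and the colorize helper and the color table are illustrative choices rather than anything from the post above:

```python
def tokenize_regex(pattern):
    """Split a pattern into (kind, text) pairs; only backslash and '*' are special."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash plus the next character: one escaped object character.
            tokens.append(("escaped", pattern[i + 1]))
            i += 2
        elif ch == "\\":
            tokens.append(("error", "Incomplete escape sequence"))
            i += 1
        elif ch == "*":
            tokens.append(("repetition", "*"))
            i += 1
        else:
            tokens.append(("plain", ch))
            i += 1
    return tokens


# ANSI escape codes: blue for object characters, red for meta characters,
# reverse video for errors.
ANSI = {
    "plain": "\033[34m",
    "escaped": "\033[34m",
    "repetition": "\033[31m",
    "error": "\033[7m",
}
RESET = "\033[0m"


def colorize(pattern):
    """Return the pattern's tokens wrapped in ANSI color codes."""
    return "".join(
        f"{ANSI[kind]}{text}{RESET}" for kind, text in tokenize_regex(pattern)
    )


print(tokenize_regex(r"\**"))
print(colorize(r"\**"))
```

For the pattern \** the tokenizer yields one escaped "*" followed by one repetition "*", so the terminal shows two asterisks in different colors.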

Subject: Re: Correct syntax for pathological re.search()
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Sat, 12 Oct 2024 12:51 UTC
References: 1 2 3 4 5

On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote:
> On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what the
>> regular expression you typed in will look like by the time it is ready to be
>> used?
>
> I assume that by "ready to be used" you mean the compiled form?
>
> No, there doesn't seem to be a way to dump that. You can
>
> p = re.compile("\\\\sout{")
> print(p.pattern)
>
> but that just prints the input string, which you could do without
> compiling it first.

It prints the string after Python has processed the escapes, so you can see
whether you escaped it as you intended. In this case, the print will display
\\sout{ (two backslashes), which is what the regex engine receives. That's
worth something.


Subject: RE: Correct syntax for pathological re.search()
From: <avi.e.gross@gmail.com>
Newsgroups: comp.lang.python
Date: Sat, 12 Oct 2024 14:10 UTC
References: 1 2 3 4 5

Peter,

Matthew understood what I was hinting at in one way and you in another.

The question asked how to add some power of two backslashes or make other
changes, so the RE functionality sees what you want. The goal is to see what
happens when one or more intermediate evaluations may change the string.

So, a simple print may suffice as a parallel way to force the same
evaluations.

Thomas made his point. And, I am starting to feel like I need to change my
name to something like Luke since this discussion must be gospel.

FYI, I was not planning on posting at all. Time to detach.


Subject: Re: Correct syntax for pathological re.search()
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Sat, 12 Oct 2024 13:06 UTC

On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show
>> what the
>> regular expression you typed in will look like by the time it is ready
>> to be
>> used?
>>
>> Obviously, life is not that simple as it can go through multiple
>> layers with
>> each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)

There is a section in the Python docs about this backslash subject. It's
titled "The Backslash Plague" in

https://docs.python.org/3/howto/regex.html

You can also inspect the compiled expression to see what string it
received after all the escaping:

>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
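
Along the same lines, here is a minimal sketch that lets re.escape() add the regex-level backslashes and then prints what each layer sees. The names `literal`, `pattern`, and `line` are only for illustration, and the sample text is borrowed from the example quoted below.

```python
import re

# The literal text we want to find: backslash, "sout", left brace.
literal = r"\sout{"

# re.escape() adds the backslashes the regex layer needs.
pattern = re.escape(literal)

print(literal)   # what Python stores after string-literal processing
print(pattern)   # what the regex engine will receive

compiled = re.compile(pattern)
line = r"testing \sout{WHADDEVVA}"
match = compiled.search(line)
print(match.span() if match else None)
```

Printing both strings side by side makes it easier to see which layer consumed which backslash.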

>> -----Original Message-----
>> From: Python-list <python-list-
>> bounces+avi.e.gross=gmail.com@python.org> On
>> Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list@python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>    if not re.search("\sout{", line):
>>>    if not re.search("\sout\{", line):
>>>    if not re.search("\\sout{", line):
>>>    if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on
>> linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import re
>>>>> s = r"testing \sout{WHADDEVVA}"
>>>>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes
>> care
>> of Python's demands for escaping "\". You also need to escape the "\" for
>> the RegEx as well, or it will read it like it means "\s", which is the
>> RegEx for any whitespace character, and therefore your search doesn't
>> match, because it reads it like you want to search for " out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>>>>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
>
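
For the original task of discarding lines, a minimal sketch might look like the following. The sample lines and the names `strike`, `kept`, and `kept_plain` are made up for illustration. Since the target is a fixed string, a plain `in` test would probably do just as well as a regex.

```python
import re

# Hypothetical sample input; only the second line contains \sout{
lines = [
    r"plain text",
    r"some \sout{struck out} text",
    r"about {braces} only",
]

# Raw string: the regex engine receives \\sout{ , i.e. a literal
# backslash, then "sout", then a "{" that is taken literally because
# it does not start a valid repeat count.
strike = re.compile(r"\\sout{")

kept = [line for line in lines if not strike.search(line)]
print(kept)

# Same filtering without a regex, since the target is a fixed string.
kept_plain = [line for line in lines if r"\sout{" not in line]
print(kept_plain)
```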

Subject: Re: Correct syntax for pathological re.search()
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Sun, 13 Oct 2024 10:45 UTC

Gilmeh Serda <gilmeh.serda@nothing.here.invalid> wrote or quoted:
>You don't need to escape the curly braces.

Here's the 411 on some gnarly regex characters:

. matches any single character, except when it hits a new line
^ kicks things off at the start of the sequence
$ wraps it up at the end
* goes zero to infinity
+ one or more times
? maybe once, maybe not
{ starts a specific count, like {2} or {2,3}
} ends such a count
| either this or that
\ flips the script on the next character's meaning
( drops in on a group
) bails out of the group
[ paddles out to a character class
] rides the character class to shore
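
A small sketch of how the characters above behave in Python's re module, under the assumption that a current Python 3 is used. The names `pairs`, `lone_brace`, and `specials` are only for illustration.

```python
import re

# "{" only acts as a repeat count when it forms something like {2} or {2,3}.
pairs = re.findall(r"a{2}", "a aa aaa")
print(pairs)

# A lone "{" that does not start a valid count is taken literally.
lone_brace = re.search(r"\\sout{", r"x \sout{y}")
print(lone_brace)

# re.escape() backslash-escapes each of the characters listed above,
# so the escaped form matches exactly that one character.
specials = ".^$*+?{}|\\()[]"
for ch in specials:
    print(ch, "->", re.escape(ch))
```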
