Subject: re.DOTALL
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Wed, 17 Jul 2024 18:09 UTC

Below, I use [\s\S] to match each and every character.
I can't seem to get the same effect using "re.DOTALL"!

Yet the Python Library Reference says:

|(Dot.) In the default mode, this matches any character except
|a newline. If the DOTALL flag has been specified, this
|matches any character including a newline.

main.py

import re

text = '''
alpha
<hr>
gamma
<hr>
epsilon
'''[ 1: -1 ]

pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

output = re.sub( pattern.replace( r'.', r'[\s\S]' ), r'\1', text )
print( output )

print( '--' )

output = re.sub( pattern, r'\1', text, re.DOTALL )
print( output )

stdout

gamma
--
alpha
<hr>
gamma
<hr>
epsilon

Subject: Re: re.DOTALL
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Wed, 17 Jul 2024 18:21 UTC
References: 1

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>I can't seem to get the same effect using "re.DOTALL"!

PS: But (?s) works.
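
The inline form presumably works because the flag travels inside the pattern string itself, so it does not depend on which argument slot of re.sub it lands in. A minimal sketch, reusing the sample text from the first post (the variable names are only for illustration):

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"

# (?s) at the very start of the pattern switches on DOTALL for the
# whole expression, so no flags argument is needed.
pattern = r'(?s)^.*?\n<hr.*?\n(.*)\n<hr.*$'

output = re.sub(pattern, r'\1', text)
print(output)  # gamma
```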

Subject: Re: re.DOTALL (Posting On Python-List Prohibited)
From: Lawrence D'Oliveiro
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Wed, 17 Jul 2024 23:54 UTC
References: 1

On 17 Jul 2024 18:09:51 GMT, Stefan Ram wrote:

> Below, I use [\s\S] to match each and every character.
> I can't seem to get the same effect using "re.DOTALL"!

This might help clarify things:

import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^(.*?)\n(<hr.*?)\n(.*)\n(<hr.*)$'

re.search(pattern, text, re.DOTALL).groups()

('alpha', '<hr>', 'gamma', '<hr>\nepsilon')
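
The difference between this call and the one in the first post appears to come down to argument order. re.search takes flags as its third positional parameter, but the signature of re.sub is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is read as count. A minimal sketch of that reading, using the pattern from the first post (the names as_posted and with_flag are made up for this example):

```python
import re

text = "alpha\n<hr>\ngamma\n<hr>\nepsilon"
pattern = r'^.*?\n<hr.*?\n(.*)\n<hr.*$'

# What the call in the first post amounts to: re.DOTALL (integer value 16)
# fills the count slot, so it only limits the number of replacements and
# no flag is set. Without DOTALL the pattern cannot match across lines.
as_posted = re.sub(pattern, r'\1', text, count=int(re.DOTALL))

# Passing the flag by keyword applies DOTALL as intended.
with_flag = re.sub(pattern, r'\1', text, flags=re.DOTALL)

print(repr(as_posted))  # the text comes back unchanged
print(repr(with_flag))  # 'gamma'
```

Passing flags=... by keyword sidesteps the problem for re.sub, re.subn and re.split alike. As far as I know, Python 3.13 and later also emit a DeprecationWarning when count or flags is passed positionally to these functions.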
