
Subject: Decoding bytes to text strings in Python 2
From: Rayner Lucas
Newsgroups: comp.lang.python
Organization: The Lumber Cartel (TINLC)
Date: Fri, 21 Jun 2024 15:49 UTC

I'm curious about something I've encountered while updating a very old
Tk app (originally written in Python 1, but I've ported it to Python 2
as a first step towards getting it running on modern systems). The app
downloads emails from a POP server and displays them. At the moment, the
code is completely unaware of character encodings (which is something I
plan to fix), and I have found that I don't understand what Python is
doing when no character encoding is specified.
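For the planned fix, one option is to let each message declare its own
encoding. The sketch below is Python 3 (not the Python 2 the app currently
runs on) and uses only the standard email package. The function name
body_text and the latin-1 fallback are assumptions for illustration, not
anything taken from the app.

```python
import email


def body_text(raw_bytes, fallback="latin-1"):
    """Return the first text/plain body of a raw message as a text string."""
    msg = email.message_from_bytes(raw_bytes)
    part = msg
    if msg.is_multipart():
        # Pick the first text/plain part, if there is one.
        for candidate in msg.walk():
            if candidate.get_content_type() == "text/plain":
                part = candidate
                break
    # decode=True undoes base64 / quoted-printable and returns bytes.
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    # Use the charset declared in Content-Type, else the assumed fallback.
    charset = part.get_content_charset() or fallback
    return payload.decode(charset, errors="replace")
```

The point of the design is that the bytes are decoded exactly once, at the
boundary where the message arrives, using the charset the message itself
declares.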

To demonstrate, I have written this short example program that displays
a variety of UTF-8 characters to check whether they are decoded
properly:

---- Example Code ----
import Tkinter as tk

window = tk.Tk()

mytext = """
\xc3\xa9 LATIN SMALL LETTER E WITH ACUTE
\xc5\x99 LATIN SMALL LETTER R WITH CARON
\xc4\xb1 LATIN SMALL LETTER DOTLESS I
\xef\xac\x84 LATIN SMALL LIGATURE FFL
\xe2\x84\x9a DOUBLE-STRUCK CAPITAL Q
\xc2\xbd VULGAR FRACTION ONE HALF
\xe2\x82\xac EURO SIGN
\xc2\xa5 YEN SIGN
\xd0\x96 CYRILLIC CAPITAL LETTER ZHE
\xea\xb8\x80 HANGUL SYLLABLE GEUL
\xe0\xa4\x93 DEVANAGARI LETTER O
\xe5\xad\x97 CJK UNIFIED IDEOGRAPH-5B57
\xe2\x99\xa9 QUARTER NOTE
\xf0\x9f\x90\x8d SNAKE
\xf0\x9f\x92\x96 SPARKLING HEART
"""

mytext = mytext.decode(encoding="utf-8")
greeting = tk.Label(text=mytext)
greeting.pack()

window.mainloop()
---- End Example Code ----

This works exactly as expected, with all the characters displaying
correctly.

However, if I comment out the line
'mytext = mytext.decode(encoding="utf-8")', the program still displays
*almost* everything correctly. All of the characters appear correctly
apart from the two four-byte emoji characters at the end, each of which
instead displays as four characters. For example, the "SNAKE" character
actually displays as:
U+00F0 LATIN SMALL LETTER ETH
U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
U+FF90 HALFWIDTH KATAKANA LETTER MI
U+FF8D HALFWIDTH KATAKANA LETTER HE

What's Python 2 doing here? sys.getdefaultencoding() returns 'ascii',
but it's clearly not attempting to display the bytes as ASCII (or
cp1252, or ISO-8859-1). How is it deciding on some sort of almost-but-
not-quite UTF-8 decoding?
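
One way to read the four code points listed above (a Python 3 sketch,
offered as an observation rather than an explanation of what Tk does
internally): the low byte of each displayed character equals one raw byte
of the UTF-8 sequence for SNAKE. That suggests the four bytes were rendered
one at a time instead of being decoded together.

```python
# UTF-8 bytes for SNAKE, copied from the example program.
raw = b"\xf0\x9f\x90\x8d"
# Code points reported in the post for the undecoded case.
shown = [0x00F0, 0xFF9F, 0xFF90, 0xFF8D]

# Decoded properly, the four bytes are a single code point.
decoded = raw.decode("utf-8")
print("U+%04X" % ord(decoded))

# The low byte of each displayed code point matches one raw byte.
low_bytes = [cp & 0xFF for cp in shown]
print(low_bytes == list(raw))
```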

I am using Python 2.7.18 on a Windows 10 system. If there's any other
relevant information I should provide please let me know.

Many thanks,
Rayner

Subject: Re: Decoding bytes to text strings in Python 2
From: Chris Angelico
Newsgroups: comp.lang.python
Date: Fri, 21 Jun 2024 17:42 UTC

On Sat, 22 Jun 2024 at 03:28, Rayner Lucas via Python-list
<python-list@python.org> wrote:
> I'm curious about something I've encountered while updating a very old
> Tk app (originally written in Python 1, but I've ported it to Python 2
> as a first step towards getting it running on modern systems).
>
> I am using Python 2.7.18 on a Windows 10 system. If there's any other
> relevant information I should provide please let me know.

Unfortunately, you're running into one of the most annoying problems
from Python 2 and Windows: "narrow builds". You don't actually have
proper Unicode support. You have a broken implementation that works
for UCS-2 but doesn't actually support astral characters.
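
To make "astral characters" concrete: a code point above U+FFFF does not
fit in one 16-bit unit, so UTF-16 stores it as a surrogate pair, and a
narrow build exposes those two units as separate string elements. Below is
a Python 3 sketch of the arithmetic. The helper name to_surrogates is made
up for illustration.

```python
import sys


def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not an astral code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogates(0x1F40D)  # SNAKE
print("U+%04X U+%04X" % (high, low))

# Cross-check the arithmetic against the standard UTF-16 codec.
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
print("\U0001F40D".encode("utf-16-be") == expected)

# 0xFFFF on a narrow Python 2 build, 0x10FFFF on Python 3.3 and later.
print(hex(sys.maxunicode))
```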

If you switch to a Linux system, it should work correctly, and you'll
be able to migrate the rest of the way onto Python 3. Once you achieve
that, you'll be able to operate on Windows or Linux equivalently,
since Python 3 solved this problem. At least, I *think* it will; my
current system has a Python 2 installed, but doesn't have tkinter
(because I never bothered to install it), and it's no longer available
from the upstream Debian repos, so I only tested it in the console.
But the decoding certainly worked.

ChrisA

Subject: Re: Decoding bytes to text strings in Python 2
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Fri, 21 Jun 2024 17:43 UTC

Rayner Lucas <usenet202101@magic-cookie.co.ukNOSPAMPLEASE> wrote or quoted:
>What's Python 2 doing here? sys.getdefaultencoding() returns 'ascii',
>but it's clearly not attempting to display the bytes as ASCII (or
>cp1252, or ISO-8859-1). How is it deciding on some sort of almost-but-
>not-quite UTF-8 decoding?

I haven't done a thorough deep dive on this. I'm just giving
an initial impression without actually being familiar with
Tkinter under Python 2, so I might be wrong!

The Text widget typically expects text in Tcl encoding,
which is usually UTF-8.

This is independent of the result returned by
sys.getdefaultencoding()!

If a UTF-8 string is inserted directly as a bytes object,
its code points will be displayed correctly by the Text
widget as long as they are in the BMP (Basic Multilingual
Plane), as you already found out yourself.

Subject: Re: Decoding bytes to text strings in Python 2
From: Rayner Lucas
Newsgroups: comp.lang.python
Organization: The Lumber Cartel (TINLC)
Date: Sat, 22 Jun 2024 12:13 UTC

In article <mailman.159.1718991773.2909.python-list@python.org>,
rosuav@gmail.com says...
>
> If you switch to a Linux system, it should work correctly, and you'll
> be able to migrate the rest of the way onto Python 3. Once you achieve
> that, you'll be able to operate on Windows or Linux equivalently,
> since Python 3 solved this problem. At least, I *think* it will; my
> current system has a Python 2 installed, but doesn't have tkinter
> (because I never bothered to install it), and it's no longer available
> from the upstream Debian repos, so I only tested it in the console.
> But the decoding certainly worked.

Thank you for the idea of trying it on a Linux system. I did so, and my
example code generated the error:

_tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
allowed by Tcl

So it looks like the problem is ultimately due to a limitation of
Tcl/Tk. I'm still not sure why it doesn't give an error on Windows and
instead either works (when UTF-8 encoding is specified) or converts the
out-of-range characters to ones it can display (when the encoding isn't
specified). But now I know what the root of the problem is, I can deal
with it appropriately (and my curiosity is at least partly satisfied).

This has given me a much better understanding of what I need to do in
order to migrate to Python 3 and add proper support for non-ASCII
characters, so I'm very grateful for your help!

Thanks,
Rayner

Subject: Re: Decoding bytes to text strings in Python 2
From: Rayner Lucas
Newsgroups: comp.lang.python
Organization: The Lumber Cartel (TINLC)
Date: Sat, 22 Jun 2024 12:26 UTC

In article <Text-20240621184010@ram.dialup.fu-berlin.de>,
ram@zedat.fu-berlin.de says...
>
> I haven't done a thorough deep dive on this. I'm just giving
> an initial impression without actually being familiar with
> Tkinter under Python 2, so I might be wrong!
>
> The Text widget typically expects text in Tcl encoding,
> which is usually UTF-8.
>
> This is independent of the result returned by
> sys.getdefaultencoding()!
>
> If a UTF-8 string is inserted directly as a bytes object,
> its code points will be displayed correctly by the Text
> widget as long as they are in the BMP (Basic Multilingual
> Plane), as you already found out yourself.

Many thanks, you've helped me greatly in understanding what's happening.
When I tried running my example code on a different system (Python
2.7.18 on Linux, with Tcl/Tk 8.5), I got the error:

_tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
allowed by Tcl

So, as your reply suggests, the problem is ultimately a limitation of
Tcl/Tk itself. Perhaps I should have spent more time studying the docs
for that instead of puzzling over the details of character encodings in
Python! I'm not sure why it doesn't give the same error on Windows, but
at least now I know where the root of the issue is.

I am now much better informed about how to migrate the code I'm working
on, so I am very grateful for your help.

Thanks,
Rayner

Subject: Re: Decoding bytes to text strings in Python 2 (Posting On Python-List Prohibited)
From: Lawrence D'Oliveiro
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Sat, 22 Jun 2024 23:19 UTC

On Sat, 22 Jun 2024 13:13:28 +0100, Rayner Lucas wrote:

> I'm still not sure why it doesn't give an error on Windows and
> instead either works (when UTF-8 encoding is specified) or converts the
> out-of-range characters to ones it can display ...

Windows can be so helpful, can’t it ...
<https://arstechnica.com/security/2024/06/php-vulnerability-allows-attackers-to-run-malicious-code-on-windows-servers/>

Subject: Re: Decoding bytes to text strings in Python 2
From: Chris Angelico
Newsgroups: comp.lang.python
Date: Sun, 23 Jun 2024 23:30 UTC

On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
<python-list@python.org> wrote:
>
> In article <mailman.159.1718991773.2909.python-list@python.org>,
> rosuav@gmail.com says...
> >
> > If you switch to a Linux system, it should work correctly, and you'll
> > be able to migrate the rest of the way onto Python 3. Once you achieve
> > that, you'll be able to operate on Windows or Linux equivalently,
> > since Python 3 solved this problem. At least, I *think* it will; my
> > current system has a Python 2 installed, but doesn't have tkinter
> > (because I never bothered to install it), and it's no longer available
> > from the upstream Debian repos, so I only tested it in the console.
> > But the decoding certainly worked.
>
> Thank you for the idea of trying it on a Linux system. I did so, and my
> example code generated the error:
>
> _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
> allowed by Tcl
>
> So it looks like the problem is ultimately due to a limitation of
> Tcl/Tk.
Yep, that seems to be the case. Not sure if that's still true on a
more recent Python, but it does look like you won't get astral
characters in tkinter on the one you're using.

> I'm still not sure why it doesn't give an error on Windows and

Because of the aforementioned weirdness of old (that is: pre-3.3)
Python versions on Windows. They were built to use a messy, buggy
hybrid of UCS-2 and UTF-16. Sometimes this got you around problems, or
at least masked them; but it wouldn't be reliable. That's why, in
Python 3.3, all that was fixed :)

> instead either works (when UTF-8 encoding is specified) or converts the
> out-of-range characters to ones it can display (when the encoding isn't
> specified). But now I know what the root of the problem is, I can deal
> with it appropriately (and my curiosity is at least partly satisfied).

Converting out-of-range characters is fairly straightforward, at least
as long as your Python interpreter is correctly built (so, Python 3,
or a Linux build of Python 2).

"".join(c if ord(c) < 65536 else "?" for c in text)
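
A usage sketch of the expression above, run on Python 3. The function name
replace_astral and the sample string are hypothetical, chosen only to show
the effect.

```python
def replace_astral(text, placeholder="?"):
    """Replace every code point above U+FFFF with a placeholder."""
    return "".join(c if ord(c) < 65536 else placeholder for c in text)


# LATIN SMALL LETTER E WITH ACUTE, then SNAKE, decoded from UTF-8 bytes.
sample = b"\xc3\xa9 \xf0\x9f\x90\x8d SNAKE".decode("utf-8")
print(replace_astral(sample))
```

On a narrow build the emoji arrives as two surrogate code units, each below
65536, so the test never fires. That is why the expression needs a
correctly built interpreter.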

> This has given me a much better understanding of what I need to do in
> order to migrate to Python 3 and add proper support for non-ASCII
> characters, so I'm very grateful for your help!
>

Excellent. Hopefully all this mess is just a transitional state and
you'll get to something that REALLY works, soon!

ChrisA

Subject: Re: Decoding bytes to text strings in Python 2
From: MRAB
Newsgroups: comp.lang.python
Date: Mon, 24 Jun 2024 00:14 UTC
<MPG.40e0d04661dcc7cf9896e0@news.eternal-september.org>
<CAPTjJmrOfrz0RoYO9nhB+d+u7m0QLqYqfO2d4nSa4HG+LeLUdg@mail.gmail.com>
View all headers

On 2024-06-24 00:30, Chris Angelico via Python-list wrote:
> On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
> <python-list@python.org> wrote:
>>
>> In article <mailman.159.1718991773.2909.python-list@python.org>,
>> rosuav@gmail.com says...
>> >
>> > If you switch to a Linux system, it should work correctly, and you'll
>> > be able to migrate the rest of the way onto Python 3. Once you achieve
>> > that, you'll be able to operate on Windows or Linux equivalently,
>> > since Python 3 solved this problem. At least, I *think* it will; my
>> > current system has a Python 2 installed, but doesn't have tkinter
>> > (because I never bothered to install it), and it's no longer available
>> > from the upstream Debian repos, so I only tested it in the console.
>> > But the decoding certainly worked.
>>
>> Thank you for the idea of trying it on a Linux system. I did so, and my
>> example code generated the error:
>>
>> _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
>> allowed by Tcl
>>
>> So it looks like the problem is ultimately due to a limitation of
>> Tcl/Tk.
> Yep, that seems to be the case. Not sure if that's still true on a
> more recent Python, but it does look like you won't get astral
> characters in tkinter on the one you're using.
>
[snip]
Tkinter in recent versions of Python can handle astral characters, at
least back to Python 3.8, the oldest I have on my Windows PC.
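
For anyone stuck on a Tcl/Tk build that raises the TclError quoted above, one possible workaround is to check for (or substitute) characters above U+FFFF before handing text to a widget. The following is only a minimal sketch for Python 3; the helper names has_astral and replace_astral are made up for illustration and are not part of tkinter. On a narrow Python 2 build, astral characters are stored as surrogate pairs, so this per-character check would not apply there as written.

```python
# Sketch (Python 3): find or replace characters outside the Basic
# Multilingual Plane (code points above U+FFFF), which older Tcl/Tk
# builds reject. Helper names are hypothetical, not a tkinter API.
BMP_MAX = 0xFFFF


def has_astral(text):
    """Return True if any character in text is above U+FFFF."""
    return any(ord(ch) > BMP_MAX for ch in text)


def replace_astral(text, placeholder="\N{REPLACEMENT CHARACTER}"):
    """Return text with every character above U+FFFF replaced by placeholder."""
    return "".join(ch if ord(ch) <= BMP_MAX else placeholder for ch in text)
```

For example, has_astral("snake \N{SNAKE}") is true because U+1F40D lies above U+FFFF, and replace_astral would turn that character into U+FFFD so that the rest of the string can still be displayed.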

Subject: Re: Decoding bytes to text strings in Python 2
From: Chris Angelico
Newsgroups: comp.lang.python
Date: Mon, 24 Jun 2024 01:43 UTC
References: 1 2 3 4 5 6 7

On Mon, 24 Jun 2024 at 10:18, MRAB via Python-list
<python-list@python.org> wrote:
> Tkinter in recent versions of Python can handle astral characters, at
> least back to Python 3.8, the oldest I have on my Windows PC.

Good to know, thanks! I was hoping that would be the case, but I don't
have a Windows system to check on, so I didn't want to speak without
facts.

ChrisA

Subject: Tkinter and astral characters (was: Decoding bytes to text strings in Python 2)
From: Peter J. Holzer
Newsgroups: comp.lang.python
Date: Mon, 24 Jun 2024 11:03 UTC
References: 1 2 3 4 5 6 7
Attachments: signature.asc (application/pgp-signature)

On 2024-06-24 01:14:22 +0100, MRAB via Python-list wrote:
> Tkinter in recent versions of Python can handle astral characters, at least
> back to Python 3.8, the oldest I have on my Windows PC.

I just tried modifying
https://docs.python.org/3/library/tkinter.html#a-hello-world-program
to display "Hello World \N{ROCKET}" instead (Python 3.10.12 as included
with Ubuntu 22.04). I don't get a warning or error, but the emoji isn't
displayed either.

I suspect that the default font doesn't include emojis and Tk isn't
smart enough to fall back to a different font (unlike xfce4-terminal
which shows the emoji just fine).
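
A rough reconstruction of that experiment, for anyone who wants to repeat it: this is a sketch adapted from the hello-world program in the tkinter documentation, not the exact code that was run. The function names greeting and show are invented for illustration, and show needs a machine with a display.

```python
# Sketch: the tkinter docs' hello-world program with an astral character
# (U+1F680 ROCKET) in the label text. greeting() and show() are
# hypothetical names used only for this example.


def greeting():
    """Build the label text, ending in a character above U+FFFF."""
    return "Hello World \N{ROCKET}"


def show():
    """Open a small window displaying the greeting (requires a display)."""
    # Imported here so that greeting() can be used without Tk installed.
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    frm = ttk.Frame(root, padding=10)
    frm.grid()
    ttk.Label(frm, text=greeting()).grid(column=0, row=0)
    ttk.Button(frm, text="Quit", command=root.destroy).grid(column=1, row=0)
    root.mainloop()
```

Calling show() should open the window. Whether the rocket is drawn, shown as a placeholder box, or left blank will presumably depend on the Tk version and on which fonts are installed.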

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
