
Subject: Script stops running with no error
From: Daniel
Newsgroups: comp.lang.python
Organization: Newshosting.com - Highest quality at a great price! www.newshosting.com
Date: Wed, 28 Aug 2024 21:09 UTC

As you all have seen in my intro post, I am working on a project in Python
(which I'm learning as I go) that uses the Wikimedia API to pull data from
wiktionary.org. I want to parse the JSON and output, for now, just the
definition of the word.

Wiktionary is Wikimedia's dictionary.

My requirements for v1:

Query the API for the definition of "table" (set in the Python script).
Pull the proper JSON.
Parse the JSON.
Output the definition only.

What's happening?

I run the script and, maybe I don't know shit from shinola, but it
appears I composed it properly. I wrote the script to do the above.
The wiktionary json file denotes a list with this character # and
sublists as ## but numbers them

On Wiktionary, the definitions are denoted like:

1. blablabla
   1. blablabla
   2. blablablablabla
2. balbalbla
3. blablabla
   1. blablabla

I wrote my script to alter it so that the sublists are lettered:

1. blablabla
   a. blablabla
   b. blablabla
2. blablabla and so on
/snip

At this point, the script stops after it assesses the first line_counter
and sub_counter. The code is below, please tell me which stupid mistake
I made (I'm sure it's simple).

Am I taking a bad approach? Is there an easier method of parsing JSON
than the way I'm doing it? I'm all ears.

Be kind, I'm really new at Python. Environment is Emacs.

import requests
import re

search_url = 'https://api.wikimedia.org/core/v1/wiktionary/en/search/page'
search_query = 'table'
parameters = {'q': search_query}

response = requests.get(search_url, params=parameters)
data = response.json()

page_id = None

if 'pages' in data:
    for page in data['pages']:
        title = page.get('title', '').lower()
        if title == search_query.lower():
            page_id = page.get('id')
            break

if page_id:
    content_url = f'https://api.wikimedia.org/core/v1/wiktionary/en/page/{search_query}'
    response = requests.get(content_url)
    page_data = response.json()
    if 'source' in page_data:
        content = page_data['source']
        cases = {'noun': r'\{en-noun\}(.*?)(?=\{|\Z)',
                 'verb': r'\{en-verb\}(.*?)(?=\{|\Z)',
                 'adjective': r'\{en-adj\}(.*?)(?=\{|\Z)',
                 'adverb': r'\{en-adv\}(.*?)(?=\{|\Z)',
                 'preposition': r'\{en-prep\}(.*?)(?=\{|\Z)',
                 'conjunction': r'\{en-con\}(.*?)(?=\{|\Z)',
                 'interjection': r'\{en-intj\}(.*?)(?=\{|\Z)',
                 'determiner': r'\{en-det\}(.*?)(?=\{|\Z)',
                 'pronoun': r'\{en-pron\}(.*?)(?=\{|\Z)'
                 #make sure there aren't more word types
                 }

        def clean_definition(text):
            text = re.sub(r'\[\[(.*?)\]\]', r'\1', text)
            text = text.lstrip('#').strip()
            return text

        print(f"\n*** Definition for {search_query} ***")
        for word_type, pattern in cases.items():
            match = re.search(pattern, content, re.DOTALL)
            if match:
                lines = [line.strip() for line in
                         match.group(1).split('\n')
                         if line.strip()]
                definition = []
                main_counter = 0
                sub_counter = 'a'

                for line in lines:
                    if line.startswith('##*') or line.startswith('##:'):
                        continue

                    if line.startswith('# ') or line.startswith('#\t'):
                        main_counter += 1
                        sub_counter = 'a'
                        cleaned_line = clean_definition(line)
                        definition.append(f"{main_counter}. {cleaned_line}")
                    elif line.startswith('##'):
                        cleaned_line = clean_definition(line)
                        definition.append(f"   {sub_counter}. {cleaned_line}")
                        sub_counter = chr(ord(sub_counter) + 1)

                if definition:
                    print(f"\n{word_type.capitalize()}\n")
                    print("\n".join(definition))
                    break
else:
    print("try again beotch")

Thanks,

Daniel

Subject: Re: Script stops running with no error
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Wed, 28 Aug 2024 22:32 UTC
References: 1 2


You need to check at each part of the code to see if you are getting or
producing what you think you are. You also should create a text
constant containing the JSON input you expect to get. Make sure you can
process that. Start simple - one main item. Then two main items. Then
two main items with one sub item. And so on.

I'm not sure what you want to produce in the end but this seems awfully
complex to be starting with. Also you aren't taking advantage of the
structure inherent in the JSON. If the data response isn't too big, you
can probably take it as is and use the Python JSON reader to produce a
Python data structure. It should be much easier (and faster) to process
the data structure than to repeatedly scan all those lines of data with
regexes.
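
A minimal sketch of that suggestion, assuming the page endpoint returns a JSON object with a 'source' key, as the original script expects. SAMPLE_JSON is a made-up stand-in for the real response, not actual Wiktionary output:

```python
import json

# Hypothetical stand-in for the API response; real responses are much larger.
SAMPLE_JSON = '''
{
  "title": "table",
  "source": "# First main item\\n## A sub item\\n# Second main item"
}
'''

page_data = json.loads(SAMPLE_JSON)      # JSON text -> Python dict
source = page_data.get("source", "")     # the raw wikitext, if present

# Check what was actually received before processing it any further.
definition_lines = [line for line in source.split("\n") if line.startswith("#")]
print(definition_lines)
```

Once this works for one main item, the constant can grow to two main items, then to main items with sub items, and so on.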

Subject: Re: Script stops running with no error
From: dn
Newsgroups: comp.lang.python
Organization: DWM
Date: Thu, 29 Aug 2024 00:07 UTC
References: 1 2 3

On 29/08/24 10:32, Thomas Passin via Python-list wrote:
> On 8/28/2024 5:09 PM, Daniel via Python-list wrote:
>> As you all have seen on my intro post, I am in a project using Python
>> (which I'm learning as I go) using the wikimedia API to pull data from
>> wiktionary.org. I want to parse the json and output, for now, just the
>> definition of the word.
>>
>> Wiktionary is wikimedia's dictionary.
>>
>> My requirements for v1
>>
>> Query the api for the definition for table (in the python script).
>> Pull the proper json
>> Parse the json
>> output the definition only

> You need to check at each part of the code to see if you are getting or
> producing what you think you are.  You also should create a text
> constant containing the JSON input you expect to get.  Make sure you can
> process that.  Start simple - one main item.  Then two main items.  Then
> two main items with one sub item.  And so on.
>
> I'm not sure what you want to produce in the end but this seems awfully
> complex to be starting with.  Also you aren't taking advantage of the
> structure inherent in the JSON.  If the data response isn't too big, you
> can probably take it as is and use the Python JSON reader to produce a
> Python data structure.  It should be much easier (and faster) to process
> the data structure than to repeatedly scan all those lines of data with
> regexes.

Good effort so far!

Further to @Thomas: the code does seem to be taking the long way around!
How can we illustrate that, and improve life?

The Wiktionary docs at https://developer.wikimedia.org/use-content/
discuss how to use their "Developer Portal". Worth reading!

As part of the above, we find the "API:Data formats" page
(https://www.mediawiki.org/wiki/API:Data_formats) which offers a simple
example (more simple than your objectives):

api.php?action=query&titles=Main%20page&format=json

which produces:

{
    "query": {
        "pages": {
            "217225": {
                "pageid": 217225,
                "ns": 0,
                "title": "Main page"
            }
        }
    }
}

Does this look like a Python dict[ionary's] output to you?

It is (more discussion at the web.ref),
but it is wrapped into a JSON payload.

There are various ways of dealing with JSON-formatted data. You're
already using requests. Perhaps leave such research until later.

So, as soon as "page_data" is realised from "response", print() it (per
above: make sure you're actually seeing what you're expecting to see).
Computers have this literal habit of doing what we ask, not what we want!

PS the pprint/pretty printer library offers a neater way of outputting a
"nested" data-structure (https://docs.python.org/3/library/pprint.html).
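
As a rough illustration, here is the example response from the "API:Data formats" page written out as a Python dict and passed through pprint. The variable names are only for illustration:

```python
from pprint import pprint

# The example response from the "API:Data formats" page, as a Python dict.
data = {
    "query": {
        "pages": {
            "217225": {"pageid": 217225, "ns": 0, "title": "Main page"}
        }
    }
}

pprint(data, width=40)   # a narrow width forces the nesting onto separate lines

# Drill down one layer at a time, printing at each stage.
pages = data["query"]["pages"]
for page_key, page in pages.items():
    print(page_key, page["title"])
```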

Thereafter, make as much use of the returned dict/list structure as you can.
At each stage of the 'drilling-down' process, again, print() it (to make
sure ...)

In this way the code will step-through the various 'layers' of
data-organisation. That observation and stepping-through of 'layers' is
a hint that the code should (probably) also be organised by 'layer'! For
example, the first for-loop finds a page which matches the search-key.
This could be abstracted into a (well-named) function.

Thus, you can write a test-harness which provides the function with some
sample input (which you know from earlier print-outs!) and can ensure
(with yet another print()) that the returned-result is as-expected!

NB the test-data and check-print() should be outside the function.
Please take these steps as-read or as 'rules'. Once your skills expand,
you will likely become ready to learn about unit-testing, pytest, etc.
At which time, such ideas will 'fall into place'.
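
One way the first for-loop might look as a function with a tiny test-harness. The name find_page_id and the SAMPLE_PAGES data are hypothetical; the sample is only shaped like the data['pages'] list the original script iterates over:

```python
def find_page_id(pages, wanted_title):
    """Return the id of the first page whose title matches, else None."""
    wanted = wanted_title.lower()
    for page in pages:
        if page.get("title", "").lower() == wanted:
            return page.get("id")
    return None


# Test data and check-prints live outside the function.
# SAMPLE_PAGES is made up for illustration.
SAMPLE_PAGES = [
    {"id": 101, "title": "Tables"},
    {"id": 102, "title": "table"},
]

print(find_page_id(SAMPLE_PAGES, "Table"))    # expect 102
print(find_page_id(SAMPLE_PAGES, "chair"))    # expect None
```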

BTW/whilst that 'unit' is in-focus: how many times will the current code
compute search_query.lower()? How many times (per function call) will
"search_query" be any different from previous calls? So, should that
computation be elsewhere?
(won't make much difference to execution time, but a coding-skill:
consider whether to leave computation until the result is actually
needed (lazy-evaluation), or if early-computation will save unnecessary
repeated-computation)

Similarly, 'lift' constants such as "cases" out of (what will become)
functions and put them towards the top of the script. This means that
all such 'definition' and 'configuration' settings will be found
together in one easy-to-find location AND makes the functional code
easier to read.
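
A sketch of what that 'lifting' could look like. The upper-case names and the find_sections function are assumptions for illustration; the regex shape is copied from the original "cases" dict, and only three of the word types are shown:

```python
import re

# Configuration gathered at the top of the script.
SEARCH_QUERY = "table"

# Template names copied from the original 'cases' dict (shortened here).
WORD_TYPE_TEMPLATES = {
    "noun": "en-noun",
    "verb": "en-verb",
    "adjective": "en-adj",
}

# Same regex shape as the original, built once and precompiled.
CASES = {
    word_type: re.compile(r"\{" + template + r"\}(.*?)(?=\{|\Z)", re.DOTALL)
    for word_type, template in WORD_TYPE_TEMPLATES.items()
}


def find_sections(content):
    """Return {word_type: captured text} for each pattern that matches."""
    found = {}
    for word_type, pattern in CASES.items():
        match = pattern.search(content)
        if match:
            found[word_type] = match.group(1)
    return found
```

The function body now contains only logic, and adding a word type means touching one dict near the top of the file.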

Now, back to the question: where is the problem arising? Do you know or
do you only know that what comes-out at the end is
unattractive/unacceptable?

The idea of splitting the code into functions (or "units") is not only
that you could test each and thereby narrow-down the location of the
problem (and so that we don't have to read so much code in a bid to
help) but that when you do ask for assistance you will be able to
provide only the pertinent code AND some sample input-data with
expected-results!
(although, if all our dreams come true, you will answer your own question!)

OK, is that enough by way of coding-tactics (not to mention the
web-research) to keep you on-track for a while?

--
Regards,
=dn

Subject: Re: Script stops running with no error
From: rbowman
Newsgroups: comp.lang.python
Date: Thu, 29 Aug 2024 01:33 UTC
References: 1

On Wed, 28 Aug 2024 22:09:56 +0100, Daniel wrote:

>     if definition:
>         print(f"\n{word_type.capitalize()}\n")
>         print("\n".join(definition))
>         break

I don't know if that was intended but the 'break' kicks you out of

for word_type, pattern in cases.items():

I added a little debugging to show the cases iteration and commented out
the break. 'noun' has five lines and appears to be correct. 'verb' has
two lines, neither of which matches the if/elif. The others aren't in the
return from https://api.wikimedia.org/core/v1/wiktionary/en/page/table.
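
A small stand-alone illustration of the effect of that break, using made-up data in place of the parsed definitions:

```python
# Hypothetical per-type results standing in for the parsed definitions.
definitions = {"noun": ["1. a"], "verb": ["1. b"], "adverb": []}

printed_with_break = []
for word_type, lines in definitions.items():
    if lines:
        printed_with_break.append(word_type)
        break                      # leaves the whole for loop

printed_without_break = []
for word_type, lines in definitions.items():
    if lines:
        printed_without_break.append(word_type)

print(printed_with_break)      # ['noun']
print(printed_without_break)   # ['noun', 'verb']
```

With the break in place, only the first word type that yields any definitions is ever printed.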

I have to admit I sometimes miss C where I can bounce between curlies.

Output:

python wiki.py

*** Definition for table ***

word_type noun pattern: \{en-noun\}(.*?)(?=\{|\Z)
line }
line # Furniture with a top surface to accommodate a variety of uses.
line ## An item of [[furniture]] with a [[flat]] [[top]] [[surface]]
raised above the ground, usually on one or more legs.
line ##: ''Set that dish on the '''table''' over there, please.''
line ##*

Noun

1. Furniture with a top surface to accommodate a variety of uses.
   a. An item of furniture with a flat top surface raised above the
ground, usually on one or more legs.

word_type verb pattern: \{en-verb\}(.*?)(?=\{|\Z)
line }
line #

word_type adjective pattern: \{en-adj\}(.*?)(?=\{|\Z)

word_type adverb pattern: \{en-adv\}(.*?)(?=\{|\Z)

word_type preposition pattern: \{en-prep\}(.*?)(?=\{|\Z)

word_type conjunction pattern: \{en-con\}(.*?)(?=\{|\Z)

word_type interjection pattern: \{en-intj\}(.*?)(?=\{|\Z)

word_type determiner pattern: \{en-det\}(.*?)(?=\{|\Z)

word_type pronoun pattern: \{en-pron\}(.*?)(?=\{|\Z)
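
One possible reading of the output above: the lazy group followed by the lookahead (?=\{|\Z) stops at the first '{' after the template name, so anything past the first nested template is never captured. A sketch with made-up wikitext that is only shaped like the noun section (the quote-book line is a hypothetical example of a nested template):

```python
import re

# Made-up wikitext shaped like the noun section in the output above.
content = (
    "{{en-noun}}\n"
    "# Furniture with a top surface.\n"
    "## An item of [[furniture]].\n"
    "##* {{quote-book|en|title=Example}}\n"
    "# A second sense that is never reached.\n"
)

pattern = r'\{en-noun\}(.*?)(?=\{|\Z)'      # pattern from the original post
match = re.search(pattern, content, re.DOTALL)
captured = match.group(1)

# Same line filtering as the original script.
lines = [line.strip() for line in captured.split('\n') if line.strip()]
for line in lines:
    print("line", line)
```

The leading '}' is the second closing brace of the template, and the capture ends just before the '{{' that follows '##*', which would account for the second sense never being seen.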

Subject: Re: Script stops running with no error
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Thu, 29 Aug 2024 02:58 UTC
References: 1 2 3 4
BwbipZbenIEU3T92YuyqgV7EZFwuZp4MkrChxGwXkAkaaxvLlShYTCvh7gEcxRfzf7xvVc
N6DFkPMMiw2Tnia+v1nOEjPNIKrIjbS1rEFNc8/TnCv3OmZFpAc7jBShMAI+UA==
ARC-Authentication-Results: i=1; rspamd-cf944896d-txt8p;
auth=pass smtp.auth=dreamhost smtp.mailfrom=list1@tompassin.net
X-Sender-Id: dreamhost|x-authsender|tpassin@tompassin.net
X-MC-Relay: Neutral
X-MailChannels-SenderId: dreamhost|x-authsender|tpassin@tompassin.net
X-MailChannels-Auth-Id: dreamhost
X-Befitting-Spot: 29ad267c2d418b2c_1724900282752_922394888
X-MC-Loop-Signature: 1724900282752:1414773516
X-MC-Ingress-Time: 1724900282752
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tompassin.net;
s=dreamhost; t=1724900282;
bh=dpCse2wF+wG9jMdOvjYQymOdh4woKY4bmeSqP5PJ7fY=;
h=Date:Subject:To:From:Content-Type:Content-Transfer-Encoding;
b=kHKaRNIQ5d58qZEJlOaBgJOIY4Xj/eS8UgotXG+gGmjn36/kUrKSM65JgWOpt1SPy
LEM7wFc+eC5X40n9utZHMAlEXO7Q6LVActJHR/7oSove05ZnNQ0nFvx+Mh2znboind
L9Gv1kK1mTKzXyvt45Hts6AXywNmz3XwNGdskjDmV3gzGgs4RJ3nM1bqzluKz3H/2a
POpciGhNksoFpVJlRU7cZMddoXdX/AcxBaSzmEtUJoVOZgQgvpgNXSATW+mWhQrz2D
2dv+dm7Vspl8tqYC6KpYT0QtcyjUooS0nmYNwZ/E3qNTNFeFHhQ12yCTEYBa95nWQ5
YM1RYXKR09mbA==

On 8/28/2024 8:07 PM, dn via Python-list wrote:
> On 29/08/24 10:32, Thomas Passin via Python-list wrote:
>> On 8/28/2024 5:09 PM, Daniel via Python-list wrote:
>>> As you all have seen on my intro post, I am in a project using Python
>>> (which I'm learning as I go) using the wikimedia API to pull data from
>>> wiktionary.org. I want to parse the json and output, for now, just the
>>> definition of the word.
>>>
>>> Wiktionary is wikimedia's dictionary.
>>>
>>> My requirements for v1
>>>
>>> Query the api for the definition for table (in the python script).
>>> Pull the proper json
>>> Parse the json
>>> output the definition only
>
>
>> You need to check at each part of the code to see if you are getting
>> or producing what you think you are.  You also should create a text
>> constant containing the JSON input you expect to get.  Make sure you
>> can process that.  Start simple - one main item.  Then two main
>> items.  Then two main items with one sub item.  And so on.
>>
>> I'm not sure what you want to produce in the end but this seems
>> awfully complex to be starting with.  Also you aren't taking advantage
>> of the structure inherent in the JSON.  If the data response isn't too
>> big, you can probably take it as is and use the Python JSON reader to
>> produce a Python data structure.  It should be much easier (and
>> faster) to process the data structure than to repeatedly scan all
>> those lines of data with regexes.
>
>
> Good effort so far!
>
>
> Further to @Thomas: the code does seem to be taking the long way around!
> How can we illustrate that, and improve life?
>
>
> The Wiktionary docs at https://developer.wikimedia.org/use-content/
> discuss how to use their "Developer Portal". Worth reading!
>
> As part of the above, we find the "API:Data formats" page
> (https://www.mediawiki.org/wiki/API:Data_formats), which offers a simple
> example (more simple than your objectives):
>
> api.php?action=query&titles=Main%20page&format=json
>
> which produces:
>
> {
>   "query": {
>     "pages": {
>       "217225": {
>         "pageid": 217225,
>         "ns": 0,
>         "title": "Main page"
>       }
>     }
>   }
> }
>
> Does this look like a Python dict[ionary's] output to you?
>
> It is (more discussion at the web.ref), but it is wrapped into a JSON
> payload.

To give more detail:

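import json
from pprint import pprint

# The sample response from the API docs, held as a text constant.
DATA = """{
  "query": {
    "pages": {
      "217225": {
        "pageid": 217225,
        "ns": 0,
        "title": "Main page"
      }
    }
  }
}"""

# json.loads() turns the JSON text into nested Python dicts.
data_dict = json.loads(DATA)
pprint(data_dict)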

Easy. Note that json.loads() reads the whole document into memory at
once, so with a really big file it can be fearfully slow; it may or may
not be a good approach for this problem.
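
Once the text has been parsed, the rest is ordinary dict work. Here is a minimal sketch reusing the sample above. The page id ("217225") will be different for every page, so the loop walks the values of "pages" rather than hard-coding the key. The names query, pages and titles are just mine for illustration.

```python
import json
from pprint import pprint

DATA = """{
  "query": {
    "pages": {
      "217225": {
        "pageid": 217225,
        "ns": 0,
        "title": "Main page"
      }
    }
  }
}"""

data_dict = json.loads(DATA)

# Drill down one layer at a time, looking at each layer as you go.
query = data_dict["query"]
pprint(query)

pages = query["pages"]
pprint(pages)

# The keys of "pages" are page ids, which you won't know in advance,
# so iterate over the values instead of indexing by id.
titles = [page["title"] for page in pages.values()]
print(titles)
```

If a layer turns out not to be what you expected, the pprint() just before it should show you where things went wrong.
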

Or you could pull out the data with JSONPath (which I have never used,
but it's the right kind of approach):

https://pypi.org/project/jsonpath-ng/

Another possibility: JMESPath:

https://python.land/data-processing/working-with-json/jmespath

These kinds of approaches also handle the parsing for you and help in
constructing queries.

> There are various ways of dealing with JSON-formatted data. You're
> already using requests. Perhaps leave such research until later.
>
>
> So, as soon as "page_data" is realised from "response", print() it (per
> above: make sure you're actually seeing what you're expecting to see).
> Computers have this literal habit of doing what we ask, not what we want!
>
> PS the pprint/pretty printer library offers a neater way of outputting a
> "nested" data-structure (https://docs.python.org/3/library/pprint.html).
>
>
> Thereafter, make as much use of the returned dict/list structure as you can.
> At each stage of the 'drilling-down' process, again, print() it (to make
> sure ...)
>
>
> In this way the code will step-through the various 'layers' of data-
> organisation. That observation and stepping-through of 'layers' is a
> hint that the code should (probably) also be organised by 'layer'! For
> example, the first for-loop finds a page which matches the search-key.
> This could be abstracted into a (well-named) function.
>
> Thus, you can write a test-harness which provides the function with some
> sample input (which you know from earlier print-outs!) and can ensure
> (with yet another print()) that the returned-result is as-expected!
>
> NB the test-data and check-print() should be outside the function.
> Please take these steps as-read or as 'rules'. Once your skills expand,
> you will likely become ready to learn about unit-testing, pytest, etc.
> At which time, such ideas will 'fall into place'.
>
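
Here is roughly what that might look like. It is only a sketch: find_page(), SAMPLE_PAGES and the matching rule (a case-insensitive comparison against the title) are my assumptions, not your code. The sample data has the same shape as the "pages" part of the API example quoted earlier.

```python
def find_page(pages, search_key):
    """Return the first page dict whose title matches search_key, else None.

    Assumes pages looks like data["query"]["pages"] in the API example:
    a dict mapping page ids to page dicts.
    """
    wanted = search_key.lower()
    for page in pages.values():
        if page.get("title", "").lower() == wanted:
            return page
    return None


# Test data and the check-print live outside the function.
SAMPLE_PAGES = {
    "217225": {"pageid": 217225, "ns": 0, "title": "Main page"},
}

found = find_page(SAMPLE_PAGES, "main page")
print(found)

missing = find_page(SAMPLE_PAGES, "table")
print(missing)
```

Because the function takes its input as a parameter and returns its result, you can feed it sample data from an earlier print-out without touching the network at all.
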
>
> BTW/whilst that 'unit' is in-focus: how many times will the current code
> compute search_query.lower()? How many times (per function call) will
> "search_query" be any different from previous calls? So, should that
> computation be elsewhere?
> (won't make much difference to execution time, but a coding-skill:
> consider whether to leave computation until the result is actually
> needed (lazy-evaluation), or if early-computation will save unnecessary
> repeated-computation)
>
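
To make that concrete, here is a sketch with made-up names (count_matches() and the word list are hypothetical, not taken from your script). search_query.lower() gives the same answer every time round the loop, so it can be computed once before the loop starts.

```python
def count_matches(words, search_query):
    """Count the words equal to search_query, ignoring case."""
    # Computed once per call, not once per word.
    wanted = search_query.lower()
    count = 0
    for word in words:
        if word.lower() == wanted:
            count += 1
    return count


result = count_matches(["Table", "chair", "TABLE", "desk"], "table")
print(result)
```
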
>
> Similarly, 'lift' constants such as "cases" out of (what will become)
> functions and put them towards the top of the script. This means that
> all such 'definition' and 'configuration' settings will be found
> together in one easy-to-find location AND makes the functional code
> easier to read.
>
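
A small sketch of that layout follows. The contents of CASES are placeholders, since I don't know what your real values are, and is_known_case() is a hypothetical helper. SEARCH_QUERY is "table" only because that is the word in your v1 requirements.

```python
# Definition/configuration settings, together at the top of the script.
SEARCH_QUERY = "table"
CASES = ("noun", "verb", "adjective")  # placeholder values


def is_known_case(label):
    """Return True if label is one of the configured CASES."""
    return label.lower() in CASES


print(is_known_case("Noun"))
print(is_known_case(SEARCH_QUERY))
```

Changing a setting then means editing one line near the top, not hunting through the function bodies.
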
>
> Now, back to the question: where is the problem arising? Do you know or
> do you only know that what comes-out at the end is unattractive/
> unacceptable?
>
> The idea of splitting the code into functions (or "units") is not only
> that you could test each and thereby narrow-down the location of the
> problem (and so that we don't have to read so much code in a bid to
> help) but that when you do ask for assistance you will be able to
> provide only the pertinent code AND some sample input-data with
> expected-results!
> (although, if all our dreams come true, you will answer your own question!)
>
>
> OK, is that enough by way of coding-tactics (not to mention the web-
> research) to keep you on-track for a while?
>
