Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: 2QdxY4RzWzUUiLuE@potatochowder.com
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 22:16 UTC
Message-ID: <mailman.15.1727734568.3018.python-list@python.org>

On 2024-10-01 at 04:46:35 +1000,
Chris Angelico via Python-list <python-list@python.org> wrote:

> On Tue, 1 Oct 2024 at 04:30, Dan Sommers via Python-list
> <python-list@python.org> wrote:
> >
> > But why do I need to start with the least
> > significant digit?
>
> If you start from the most significant, you don't know anything about
> the number until you finish parsing it. There's almost nothing you can
> say about a number given that it starts with a particular sequence
> (since you don't know how MANY digits there are). However, if you know
> the LAST digits, you can make certain statements about it (trivial
> examples being whether it's odd or even).
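A minimal Python sketch of the last-digit observation in the quoted text. It assumes the number arrives as a plain decimal integer string (optional leading minus sign, no exponent, no decimal point), and the function name parity_from_last_digit is made up for illustration. Only the final character is examined, so the number of digits in front of it does not matter:

```python
def parity_from_last_digit(literal: str) -> str:
    """Return "even" or "odd" for a base-10 integer written as a string.

    Assumes `literal` is a plain decimal integer (optional leading "-",
    no exponent, no decimal point). Only the final character is examined,
    so the cost does not depend on how many digits precede it.
    """
    if not literal or literal[-1] not in "0123456789":
        raise ValueError(f"not a plain decimal integer: {literal!r}")
    return "even" if literal[-1] in "02468" else "odd"


big = "7" * 5000 + "4"  # 5001 digits; the leading 7s say nothing about parity
print(parity_from_last_digit(big))     # even
print(parity_from_last_digit("-123"))  # odd
```

The whole literal is never converted with int(); the prefix is simply ignored.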

But that wasn't the question. Sure, under certain circumstances and for
specific use cases and/or requirements, there might be arguments to read
potential numbers as strings and possibly not have to parse them
completely before accepting or rejecting them.

And if I start with the least significant digit and the number happens
to be written in scientific notation and/or has a decimal point, then I
can't tell whether it's odd or even until I've processed the whole thing
anyway.
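The counterpoint can be checked the same way. This sketch (the name parity is hypothetical) assumes the input is a JSON-style number literal and uses decimal.Decimal so that exponents and fractional parts are evaluated exactly rather than through a binary float. It compares a naive last-character guess with the value of the whole literal:

```python
from decimal import Decimal


def parity(literal: str) -> str:
    """Classify a JSON-style number literal as "even", "odd" or "not an integer".

    Decimal evaluates exponents and fractional parts exactly, so "1e3" is
    exactly 1000 rather than a rounded binary float.
    """
    value = Decimal(literal)
    if value != value.to_integral_value():
        return "not an integer"
    # Exact conversion; assumes the exponent is modest enough to materialize.
    return "even" if int(value) % 2 == 0 else "odd"


for lit in ["13", "1e3", "1.4e1", "2.5", "3.0"]:
    naive = "even" if lit[-1] in "02468" else "odd"  # last-character guess
    print(f"{lit:>6}: last digit says {naive}, whole literal says {parity(lit)}")
```

For "1e3" and "1.4e1" the last character suggests odd while the value is even, "3.0" ends in 0 but is odd, and "2.5" is not an integer at all. The int() call is exact, but a literal like "1e1000000000" would build an enormous integer, so a real parser would want a limit there.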

> It's not very, well, significant. But there's something to it. And it
> extends nicely to p-adic numbers, which can have an infinite number of
> nonzero digits to the left of the decimal:
>
> https://en.wikipedia.org/wiki/P-adic_number
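For the p-adic aside, a minimal sketch (padic_digits is a hypothetical helper, not a library function) that computes the k least significant digits of an ordinary integer's p-adic expansion. It relies on the fact that the first k p-adic digits of n are the base-p digits of n mod p**k. p is meant to be prime, as in the linked article, though this particular arithmetic does not depend on that:

```python
def padic_digits(n: int, p: int, k: int) -> list[int]:
    """Return the k least significant digits of the p-adic expansion of n.

    Works for negative n too: Python's % with a positive modulus is
    non-negative, and the first k p-adic digits of n are the base-p
    digits of n mod p**k. Digits are listed least significant first.
    """
    r = n % p**k
    digits = []
    for _ in range(k):
        r, d = divmod(r, p)
        digits.append(d)
    return digits


print(padic_digits(-1, 5, 8))  # [4, 4, 4, 4, 4, 4, 4, 4]
print(padic_digits(12, 5, 8))  # [2, 2, 0, 0, 0, 0, 0, 0]
```

Here -1 comes out as an unending run of 4s in the 5-adic numbers, the "infinite number of nonzero digits to the left", while a non-negative integer trails off into zeros. Either way the low-order digits are known exactly.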

In Common Lisp, integers can be written in any integer base from two to
thirty-six, inclusive. So knowing the last digit doesn't tell you
whether an integer is even or odd until you know the base anyway.

Curiously, we agree: if you move the goalposts arbitrarily, then
some algorithms that parse JSON numbers will fail.
