Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 17:57 UTC
Message-ID: <mailman.12.1727722015.3018.python-list@python.org>

On 9/30/2024 1:00 PM, Chris Angelico via Python-list wrote:
> On Tue, 1 Oct 2024 at 02:20, Thomas Passin via Python-list
> <python-list@python.org> wrote:
>>
>> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>>>
>>>
>>>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
>>>>
>>>>
>>>> import polars as pl
>>>> pl.read_json("file.json")
>>>>
>>>>
>>>
>>> This is not going to work unless the computer has a lot more than 60 GiB of RAM.
>>>
>>> As later suggested, a streaming parser is required.
>>
>> Streaming won't work because the file is gzipped. You have to receive
>> the whole thing before you can unzip it. Once unzipped it will be even
>> larger, and all in memory.
>
> Streaming gzip is perfectly possible. You may be thinking of PKZip
> which has its EOCD at the end of the file (although it may still be
> possible to stream-decompress if you work at it).
>
> ChrisA

You're right, that's what I was thinking of.
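
As an illustration of the point about streaming gzip, here is a minimal sketch using only the Python standard library. It feeds compressed chunks to zlib one at a time and yields decompressed bytes as they become available, so the whole archive never has to be held in memory. The names stream_gunzip and fake_download, the sample payload, and the 64-byte chunk size are all made up for this example; a real client would pass in the chunks read from the HTTP response instead. It assumes a single-member gzip stream.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def stream_gunzip(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes as compressed chunks arrive (hypothetical helper)."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def fake_download(payload: bytes, size: int = 64) -> Iterator[bytes]:
    """Stand-in for a network response read in small pieces (assumption)."""
    for i in range(0, len(payload), size):
        yield payload[i:i + size]


# Sample data invented for the demonstration: 1000 small JSON lines.
original = b"".join(b'{"id": %d}\n' % i for i in range(1000))
compressed = gzip.compress(original)

# Joining the pieces here is only to check the round trip; real code would
# hand each piece to a streaming JSON parser instead of collecting them.
restored = b"".join(stream_gunzip(fake_download(compressed)))
assert restored == original
```

The decompressed pieces do not line up with JSON record boundaries, so they still need to go through an incremental parser rather than json.loads.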
