Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 16:11 UTC
Message-ID: <mailman.7.1727713114.3018.python-list@python.org>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
<082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org>
<9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>

On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>
>
>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
>>
>>
>> import polars as pl
>> pl.read_json("file.json")
>>
>>
>
> This is not going to work unless the computer has a lot more than 60 GiB of RAM.
>
> As later suggested, a streaming parser is required.

Streaming won't work because the file is gzipped. You have to receive
the whole thing before you can unzip it. Once unzipped it will be even
larger, and all in memory.
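
For what it's worth, gzip is a stream format, so in general it can be decompressed incrementally as the bytes arrive rather than only after the whole file is received. Below is a minimal sketch of that idea using the standard library's zlib module. The helper names `iter_gunzip` and `split_bytes` are hypothetical, and the payload is made-up stand-in data, not the real Kenna API output. The decompressed chunks would still have to be fed to an incremental JSON parser to keep memory use bounded.

```python
import gzip
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def split_bytes(data, size):
    """Stand-in for a network download: hand out `data` in small pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Made-up sample document, compressed and then fed back 16 bytes at a time.
payload = b"[" + b", ".join(b'{"id": %d}' % i for i in range(1000)) + b"]"
compressed = gzip.compress(payload)

# Only one decompressed chunk is held at a time; here we just count bytes.
total = sum(len(part) for part in iter_gunzip(split_bytes(compressed, 16)))
print(total == len(payload))
```

With a real download, the iterable of chunks would come from whatever HTTP client is in use instead of `split_bytes`.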
