Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Thomas Passin
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 18:05 UTC

On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>
>
>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
>>
>>
>> import polars as pl
>> pl.read_json("file.json")
>>
>>
>
> This is not going to work unless the computer has a lot more than 60 GiB of RAM.
>
> As suggested later in the thread, a streaming parser is required.

There is also the json-stream library, on PyPI at

https://pypi.org/project/json-stream/
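
For anyone wondering what "streaming" buys you here, below is a rough sketch of the idea using only the standard library. To be clear, this is not json-stream's API, just an illustration of the technique. It assumes the file is one top-level JSON array whose elements are objects (or arrays); the function name iter_array_items and the field names in the sample data are made up for the example.

```python
import io
import json


def iter_array_items(fp, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    Assumption: every element is a JSON object or array. Bare numbers are
    rejected because a number cut off at a chunk boundary would still
    decode "successfully" as a shorter number.
    """
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        # Drop whitespace and the commas that separate elements.
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return
        if buf and buf[0] not in "{[":
            raise ValueError("this sketch only handles object or array elements")
        try:
            item, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # The element is probably incomplete: read more and retry.
            more = fp.read(chunk_size)
            if not more:
                raise  # input ended in the middle of an element
            buf += more
            continue
        yield item
        buf = buf[end:]


# Tiny in-memory stand-in for the real file; chunk_size is set very small
# here only to force elements to straddle chunk boundaries.
sample = io.StringIO('[{"id": 1, "score": 10}, {"id": 2, "score": 32}]')
total = sum(item["score"] for item in iter_array_items(sample, chunk_size=8))
print(total)
```

With a real file you would pass an open text-mode file object instead of the StringIO. Memory use then depends on the chunk size and the largest single element rather than on the size of the whole file. Note that a malformed element in the middle of the input makes this sketch keep reading to the end before it gives up, so a proper streaming library is the better choice for production use.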
