Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Left Right
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 19:30 UTC

> Streaming won't work because the file is gzipped. You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

GZip is specifically designed to be streamed, so that's not a
problem (in principle). You would need a streaming GZip
decompressor, though. A quick search on PyPI turned up this package:
https://pypi.org/project/gzip-stream/ .
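
To illustrate the "in principle" part without any third-party package, here is a minimal sketch using only the standard library's zlib module. The names iter_gunzip and fake_download are made up for this example, and fake_download only stands in for whatever HTTP client hands you the response body in pieces. It assumes a single-member gzip stream, and I have not tried it against the Kenna API.

```python
import gzip
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks.

    Only one compressed chunk is fed to zlib at a time, so the compressed
    payload is never held in memory as a whole.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def fake_download(payload, size):
    """Hypothetical stand-in for a network response: yields fixed-size pieces."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


if __name__ == "__main__":
    original = b'{"id": 1, "status": "open"}\n' * 10_000
    compressed = gzip.compress(original)
    restored = b"".join(iter_gunzip(fake_download(compressed, 64)))
    print(restored == original)
```

The joined result in the demo is only there to check the round trip. For the real 60 GB case you would pass each decompressed piece straight to an incremental JSON parser instead of collecting them.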

On Mon, Sep 30, 2024 at 6:20 PM Thomas Passin via Python-list
<python-list@python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
> >>
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >>
> >
> > This is not going to work unless the computer has a lot more than 60 GiB of RAM.
> >
> > As suggested later, a streaming parser is required.
>
> Streaming won't work because the file is gzipped. You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.