Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Asif Ali Hirekumbi
Newsgroups: comp.lang.python
Date: Fri, 27 Sep 2024 06:17 UTC

Dear Python Experts,

I am working with the Kenna Application's API to retrieve vulnerability
data. The API endpoint provides a single, massive JSON file in gzip format,
approximately 60 GB in size. Handling such a large dataset in one go is
proving to be quite challenging, especially in terms of memory management.

I am looking for guidance on how to efficiently stream this data and
process it in chunks using Python. Specifically, I am wondering if there’s
a way to use the requests library or any other libraries that would allow
us to pull data from the API endpoint in a memory-efficient manner.
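
A minimal sketch of what such a pipeline could look like, offered as an illustration and not as a tested solution for Kenna. It assumes, without confirmation from the Kenna documentation, that the download is a gzip file whose content is a single top-level JSON array. The names `iter_array_items` and `stream_export`, and any URL or headers passed to them, are hypothetical. Decompression is incremental via `zlib.decompressobj`, and items are extracted one at a time with `json.JSONDecoder.raw_decode`, so only the current buffer is held in memory.

```python
import codecs
import gzip
import json
import re
import zlib

_WS = re.compile(r"\s*")      # whitespace only
_SEP = re.compile(r"[\s,]*")  # whitespace and item separators (lenient)


def _text_pieces(compressed_chunks):
    """Incrementally gunzip and UTF-8-decode an iterable of byte chunks."""
    gunzip = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + ... expects a gzip header
    utf8 = codecs.getincrementaldecoder("utf-8")()    # copes with characters split across chunks
    for chunk in compressed_chunks:
        yield utf8.decode(gunzip.decompress(chunk))
    yield utf8.decode(gunzip.flush(), final=True)


def iter_array_items(compressed_chunks):
    """Yield the items of a top-level JSON array, one at a time.

    ASSUMPTION: the decompressed payload is a single JSON array.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False  # becomes True once the opening '[' has been consumed
    for text in _text_pieces(compressed_chunks):
        buf += text
        pos = 0
        if not started:
            pos = _WS.match(buf).end()
            if pos >= len(buf):
                continue  # nothing but whitespace so far
            if buf[pos] != "[":
                raise ValueError("expected a top-level JSON array")
            pos += 1
            started = True
        while True:
            pos = _SEP.match(buf, pos).end()
            if pos >= len(buf) or buf[pos] == "]":
                break
            try:
                item, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                break  # item is incomplete; wait for the next chunk
            nxt = _WS.match(buf, end).end()
            if nxt >= len(buf) or buf[nxt] not in ",]":
                break  # value may continue in the next chunk (e.g. a split number)
            yield item
            pos = end
        buf = buf[pos:]  # drop everything already consumed
    if not started or buf.strip() != "]":
        raise ValueError("stream ended before the array was closed")


def stream_export(url, headers, chunk_size=1 << 20):
    """Yield the response body in chunks (hypothetical helper).

    ASSUMPTION: the body is the gzip file itself, not JSON sent with
    transport-level Content-Encoding, which requests would decode for us.
    """
    import requests  # third-party; imported lazily so the parser works without it

    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        yield from resp.iter_content(chunk_size=chunk_size)


if __name__ == "__main__":
    # Offline demonstration with made-up records, fed in tiny chunks.
    sample = gzip.compress(json.dumps([{"id": 1}, {"id": 2}]).encode("utf-8"))
    pieces = [sample[i:i + 5] for i in range(0, len(sample), 5)]
    for record in iter_array_items(pieces):
        print(record["id"])
```

Against a real endpoint the two parts would be combined as `iter_array_items(stream_export(url, headers))`. If the export instead nests the records under a key inside an object, a dedicated streaming parser such as the third-party `ijson` package is likely a better fit than this hand-rolled loop.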

Here are the relevant API endpoints from Kenna:

- Kenna API Documentation
<https://apidocs.kennasecurity.com/reference/welcome>
- Kenna Vulnerabilities Export
<https://apidocs.kennasecurity.com/reference/retrieve-data-export>

If anyone has experience with similar use cases or can offer any advice, it
would be greatly appreciated.

Thank you in advance for your help!

Best regards
Asif Ali
