Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
From: Asif Ali Hirekumbi
Newsgroups: comp.lang.python
Date: Mon, 30 Sep 2024 06:41 UTC
Message-ID: <mailman.2.1727678506.3018.python-list@python.org>

Thanks, Abdur-Rahmaan.
I will give it a try!

Thanks
Asif

On Mon, Sep 30, 2024 at 11:19 AM Abdur-Rahmaan Janhangeer <
arj.python@gmail.com> wrote:

> I don't know if you've tried Polars, but it seems to work well with JSON data:
>
> import polars as pl
> pl.read_json("file.json")
>
> Kind Regards,
>
> Abdur-Rahmaan Janhangeer
> about <https://compileralchemy.github.io/> | blog
> <https://www.pythonkitchen.com>
> github <https://github.com/Abdur-RahmaanJ>
> Mauritius
>
>
> On Mon, Sep 30, 2024 at 8:00 AM Asif Ali Hirekumbi via Python-list <
> python-list@python.org> wrote:
>
>> Dear Python Experts,
>>
>> I am working with the Kenna Application's API to retrieve vulnerability
>> data. The API endpoint provides a single, massive JSON file in gzip format,
>> approximately 60 GB in size. Handling such a large dataset in one go is
>> proving to be quite challenging, especially in terms of memory management.
>>
>> I am looking for guidance on how to efficiently stream this data and
>> process it in chunks using Python. Specifically, I am wondering if there’s
>> a way to use the requests library or any other libraries that would allow
>> us to pull data from the API endpoint in a memory-efficient manner.
>>
>> Here are the relevant API endpoints from Kenna:
>>
>> - Kenna API Documentation
>> <https://apidocs.kennasecurity.com/reference/welcome>
>> - Kenna Vulnerabilities Export
>> <https://apidocs.kennasecurity.com/reference/retrieve-data-export>
>>
>> If anyone has experience with similar use cases or can offer any advice, it
>> would be greatly appreciated.
>>
>> Thank you in advance for your help!
>>
>> Best regards
>> Asif Ali
>
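
For the streaming question in the original post, here is a minimal sketch of one possible approach, not a tested Kenna client. It makes several assumptions that are not confirmed anywhere in this thread: the decompressed payload is a single top-level JSON array, the download is a single-member gzip file sent as the response body (not via Content-Encoding), and the helper names (gunzip_chunks, iter_array_items, iter_records, iter_export) are hypothetical. Only the last helper needs the third-party requests library.

```python
import codecs
import json
import zlib
from typing import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a gzip stream piece by piece instead of all at once."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    tail = decomp.flush()
    if tail:
        yield tail


def iter_array_items(text_chunks: Iterable[str]) -> Iterator[object]:
    """Yield elements of one top-level JSON array as soon as each is complete.

    Assumption: the document is an array such as [{...}, {...}, ...].
    Malformed JSON is not diagnosed here; the buffer would simply keep growing.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    for chunk in text_chunks:
        buf += chunk
        while True:
            if not started:
                buf = buf.lstrip()
                if not buf:
                    break
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, started = buf[1:], True
            # Drop whitespace and the comma separating two elements.
            buf = buf.lstrip(" \t\r\n,")
            if not buf or buf[0] == "]":
                break
            try:
                item, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # element is incomplete; wait for the next chunk
            if end == len(buf):
                break  # a bare number could still continue in the next chunk
            yield item
            buf = buf[end:]


def iter_records(gzip_chunks: Iterable[bytes]) -> Iterator[object]:
    """Chain the two steps: gzip bytes -> UTF-8 text -> parsed array elements."""
    # The incremental decoder copes with multi-byte characters split across chunks.
    utf8 = codecs.getincrementaldecoder("utf-8")()
    text = (utf8.decode(raw) for raw in gunzip_chunks(gzip_chunks))
    yield from iter_array_items(text)


def iter_export(url: str, headers: dict) -> Iterator[object]:
    """Hypothetical wrapper: stream a download URL through iter_records."""
    import requests  # third-party; imported lazily so the helpers above are stdlib-only

    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # stream=True plus iter_content keeps only about one chunk in memory.
        yield from iter_records(resp.iter_content(chunk_size=1 << 20))
```

Memory use stays roughly proportional to the largest single array element, not to the 60 GB total. If the export instead nests the records under a key, an event-based parser such as the third-party ijson package (for example ijson.items(file_obj, "some_key.item"), with the key name depending on the real payload) is likely a better fit than this hand-rolled splitter.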
