
Subject: help: pandas and 2d table
From: jak
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Fri, 12 Apr 2024 18:40 UTC

Hi everyone.
I should say up front that I don't know anything about pandas, but my
guess is that it can do what I want. Using the "read_excel" function, I
get a table similar to this:

obj  | foo1 foo2 foo3 foo4 foo5 foo6
-----+------------------------------
foo1 | aa   ab   zz   ad   ae   af
foo2 | ba   bb   bc   bd   zz   bf
foo3 | ca   zz   cc   cd   ce   zz
foo4 | da   db   dc   dd   de   df
foo5 | ea   eb   ec   zz   ee   ef
foo6 | fa   fb   fc   fd   fe   ff

And I would like to get a result similar to this:

{
    'zz': [('foo1', 'foo3'),
           ('foo2', 'foo5'),
           ('foo3', 'foo2'),
           ('foo3', 'foo6'),
           ('foo5', 'foo4')]
}

Would you show me the path, please?
Thank you in advance.

Subject: Re: help: pandas and 2d table
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Fri, 12 Apr 2024 19:22 UTC
References: 1

jak <nospam@please.ty> wrote or quoted:
>Would you show me the path, please?

I was not able to read xls here, so I used csv instead. Warning:
the script will overwrite the file "file_20240412201813_tmp_DML.csv"!

import pandas as pd

# Write the sample table to a temporary CSV file.
with open( 'file_20240412201813_tmp_DML.csv', 'w' ) as out:
    print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff''', file=out )

df = pd.read_csv( 'file_20240412201813_tmp_DML.csv' )

result = {}

# Map every cell value to the list of (row name, column name)
# pairs where it occurs.
for rownum, row in df.iterrows():
    iterator = row.items()
    _, rowname = next( iterator )  # the first item is the 'obj' column
    for colname, value in iterator:
        if value not in result: result[ value ]= []
        result[ value ].append( ( rowname, colname ))

print( result )
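
A variant of the loop above, as a minimal sketch under a few assumptions: the table is small enough to embed as a string (the name CSV_TEXT is made up for this example) and is read through io.StringIO, so no file gets overwritten. With "index_col=0" the row names come straight from iterrows(), and collections.defaultdict stands in for the "if value not in result" test.

```python
import io
from collections import defaultdict

import pandas as pd

# Assumption: CSV_TEXT is a hypothetical name; it holds the sample table
# from the thread so that no file has to be written.
CSV_TEXT = """obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff
"""

# index_col=0 turns the 'obj' column into the row labels.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# Map every cell value to the (row name, column name) pairs where it occurs.
result = defaultdict(list)
for rowname, row in df.iterrows():
    for colname, value in row.items():
        result[value].append((rowname, colname))

print(result["zz"])
```

Looking up result["zz"] afterwards gives only the pairs for that one value.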

Subject: Re: help: pandas and 2d table
From: jak
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Sat, 13 Apr 2024 13:00 UTC
References: 1 2

Stefan Ram wrote:
> jak <nospam@please.ty> wrote or quoted:
>> Would you show me the path, please?
>
> I was not able to read xls here, so I used csv instead; Warning:
> the script will overwrite file "file_20240412201813_tmp_DML.csv"!
>
> import pandas as pd
>
> with open( 'file_20240412201813_tmp_DML.csv', 'w' )as out:
>     print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
> foo1,aa,ab,zz,ad,ae,af
> foo2,ba,bb,bc,bd,zz,bf
> foo3,ca,zz,cc,cd,ce,zz
> foo4,da,db,dc,dd,de,df
> foo5,ea,eb,ec,zz,ee,ef
> foo6,fa,fb,fc,fd,fe,ff''', file=out )
>
> df = pd.read_csv( 'file_20240412201813_tmp_DML.csv' )
>
> result = {}
>
> for rownum, row in df.iterrows():
>     iterator = row.items()
>     _, rowname = next( iterator )
>     for colname, value in iterator:
>         if value not in result: result[ value ]= []
>         result[ value ].append( ( rowname, colname ))
>
> print( result )
>

In reality what I wanted to achieve was this:

    what = 'zz'
    result = {what: []}

    for rownum, row in df.iterrows():
        iterator = row.items()
        _, rowname = next(iterator)
        for colname, value in iterator:
            if value == what:
                result[what] += [(rowname, colname)]
    print(result)

In any case, thank you again for pointing me in the right direction. I
had gotten lost looking for a pandas method that would do this in a
single shot, or nearly so.

Subject: Re: help: pandas and 2d table
From: Mats Wichmann
Newsgroups: comp.lang.python
Date: Sat, 13 Apr 2024 17:07 UTC
References: 1 2 3 4

On 4/13/24 07:00, jak via Python-list wrote:
> Stefan Ram wrote:
>> jak <nospam@please.ty> wrote or quoted:
>>> Would you show me the path, please?
>>
>>    I was not able to read xls here, so I used csv instead; Warning:
>>    the script will overwrite file "file_20240412201813_tmp_DML.csv"!
>>
>> import pandas as pd
>>
>> with open( 'file_20240412201813_tmp_DML.csv', 'w' )as out:
>>      print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
>> foo1,aa,ab,zz,ad,ae,af
>> foo2,ba,bb,bc,bd,zz,bf
>> foo3,ca,zz,cc,cd,ce,zz
>> foo4,da,db,dc,dd,de,df
>> foo5,ea,eb,ec,zz,ee,ef
>> foo6,fa,fb,fc,fd,fe,ff''', file=out )
>>
>> df = pd.read_csv( 'file_20240412201813_tmp_DML.csv' )
>>
>> result = {}
>>
>> for rownum, row in df.iterrows():
>>      iterator = row.items()
>>      _, rowname = next( iterator )
>>      for colname, value in iterator:
>>          if value not in result: result[ value ]= []
>>          result[ value ].append( ( rowname, colname ))
>>
>> print( result )
>>
>
> In reality what I wanted to achieve was this:
>
>     what = 'zz'
>     result = {what: []}
>
>     for rownum, row in df.iterrows():
>         iterator = row.items()
>         _, rowname = next(iterator)
>         for colname, value in iterator:
>             if value == what:
>                 result[what] += [(rowname, colname)]
>     print(result)
>
> In any case, thank you again for pointing me in the right direction. I
> had lost myself looking for a pandas method that would do this in a
> single shot or almost.
>
>

Doesn't pandas have a "where" method that can do this kind of thing? Or
does it not match what you are looking for? Pretty sure numpy does, but
that's a lot to bring in if you don't need the rest of numpy.
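
For what the numpy route might look like, here is a minimal sketch. pandas itself is built on NumPy, so NumPy is already installed wherever pandas is. The sketch assumes the sample table from the thread, embedded as a string under the made-up name CSV_TEXT, and uses the single-argument form of numpy.where, which returns the row and column positions of the True cells of a boolean array.

```python
import io

import numpy as np
import pandas as pd

# Assumption: CSV_TEXT is a hypothetical name holding the thread's sample table.
CSV_TEXT = """obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

what = "zz"
# Boolean 2-D array: True where the cell equals `what`.
mask = (df == what).to_numpy()
# Positions of the True cells, in row-major order.
row_pos, col_pos = np.where(mask)
# Translate the integer positions back into row and column labels.
result = {
    what: [(df.index[r], df.columns[c]) for r, c in zip(row_pos, col_pos)]
}
print(result)
```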

Subject: Re: help: pandas and 2d table
From: Tim Williams
Newsgroups: comp.lang.python
Date: Sat, 13 Apr 2024 17:14 UTC
References: 1 2 3 4 5

On Sat, Apr 13, 2024 at 1:10 PM Mats Wichmann via Python-list
<python-list@python.org> wrote:

> doesn't Pandas have a "where" method that can do this kind of thing? Or
> doesn't it match what you are looking for? Pretty sure numpy does, but
> that's a lot to bring in if you don't need the rest of numpy.

pandas.DataFrame.where (pandas 2.2.2 documentation, pydata.org)
<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html#pandas.DataFrame.where>

Subject: Re: help: pandas and 2d table
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Sat, 13 Apr 2024 18:39 UTC
References: 1 2 3 4 5 6

Tim Williams <tjandacw@gmail.com> wrote or quoted:
>e.where.html#pandas.DataFrame.where>

I threw together a quick example using "where"; see how it looks.

import pandas as pd

# Warning! Will overwrite the file 'file_20240412201813_tmp_DML.csv'!
with open( 'file_20240412201813_tmp_DML.csv', 'w' ) as out:
    print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff''', file=out )

# Note the "index_col=0" below, which is important here!
df = pd.read_csv( 'file_20240412201813_tmp_DML.csv', index_col=0 )

df = df.where( df == 'zz' ).stack().reset_index()
result ={ 'zz': list( zip( df.iloc[ :, 0 ], df.iloc[ :, 1 ]))}

print( result )

Prints here:

{'zz': [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]}


Subject: Re: help: pandas and 2d table
From: jak
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Sat, 13 Apr 2024 21:35 UTC
References: 1 2 3 4 5 6 7

Stefan Ram wrote:
> df = df.where( df == 'zz' ).stack().reset_index()
> result ={ 'zz': list( zip( df.iloc[ :, 0 ], df.iloc[ :, 1 ]))}

Since I don't know Pandas, I will need a month at least to understand
these 2 lines of code. Thanks again.

Subject: Re: help: pandas and 2d table
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Sun, 14 Apr 2024 08:58 UTC
References: 1 2 3 4 5 6 7 8

jak <nospam@please.ty> wrote or quoted:
>Stefan Ram wrote:
>>df = df.where( df == 'zz' ).stack().reset_index()
>>result ={ 'zz': list( zip( df.iloc[ :, 0 ], df.iloc[ :, 1 ]))}
>Since I don't know Pandas, I will need a month at least to understand
>these 2 lines of code. Thanks again.

Here's a technique to better understand such code:

Transform it into a program with small statements and small
expressions with no more than one call per statement if possible.
(After each little change, check that the output stays the same.)

import pandas as pd

# Warning! Will overwrite the file 'file_20240412201813_tmp_DML.csv'!
with open( 'file_20240412201813_tmp_DML.csv', 'w' ) as out:
    print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff''', file=out )
# Note the "index_col=0" below, which is important here!
df = pd.read_csv( 'file_20240412201813_tmp_DML.csv', index_col=0 )

selection = df.where( df == 'zz' )
selection_stack = selection.stack()
df = selection_stack.reset_index()
df0 = df.iloc[ :, 0 ]
df1 = df.iloc[ :, 1 ]
z = zip( df0, df1 )
l = list( z )
result ={ 'zz': l }
print( result )

Next, I suggest inserting print statements to show each intermediate
value:

# Note the "index_col=0" below, which is important here!
df = pd.read_csv( 'file_20240412201813_tmp_DML.csv', index_col=0 )
print( 'df = \n', type( df ), ':\n"', df, '"\n' )

selection = df.where( df == 'zz' )
print( "result of where( df == 'zz' ) = \n", type( selection ), ':\n"',
       selection, '"\n' )

selection_stack = selection.stack()
print( 'result of stack() = \n', type( selection_stack ), ':\n"',
       selection_stack, '"\n' )

df = selection_stack.reset_index()
print( 'result of reset_index() = \n', type( df ), ':\n"', df, '"\n' )

df0 = df.iloc[ :, 0 ]
print( 'value of .iloc[ :, 0 ]= \n', type( df0 ), ':\n"', df0, '"\n' )

df1 = df.iloc[ :, 1 ]
print( 'value of .iloc[ :, 1 ] = \n', type( df1 ), ':\n"', df1, '"\n' )

z = zip( df0, df1 )
print( 'result of zip( df0, df1 )= \n', type( z ), ':\n"', z, '"\n' )

l = list( z )
print( 'result of list( z )= \n', type( l ), ':\n"', l, '"\n' )

result ={ 'zz': l }
print( "value of { 'zz': l }= \n", type( result ), ':\n"',
       result, '"\n' )

print( result )

Now you can see what each single step does!

df =
<class 'pandas.core.frame.DataFrame'> :
"      foo1 foo2 foo3 foo4 foo5 foo6
obj
foo1   aa   ab   zz   ad   ae   af
foo2   ba   bb   bc   bd   zz   bf
foo3   ca   zz   cc   cd   ce   zz
foo4   da   db   dc   dd   de   df
foo5   ea   eb   ec   zz   ee   ef
foo6   fa   fb   fc   fd   fe   ff "

result of where( df == 'zz' ) =
<class 'pandas.core.frame.DataFrame'> :
"      foo1 foo2 foo3 foo4 foo5 foo6
obj
foo1  NaN  NaN   zz  NaN  NaN  NaN
foo2  NaN  NaN  NaN  NaN   zz  NaN
foo3  NaN   zz  NaN  NaN  NaN   zz
foo4  NaN  NaN  NaN  NaN  NaN  NaN
foo5  NaN  NaN  NaN   zz  NaN  NaN
foo6  NaN  NaN  NaN  NaN  NaN  NaN "

result of stack() =
<class 'pandas.core.series.Series'> :
" obj
foo1  foo3    zz
foo2  foo5    zz
foo3  foo2    zz
      foo6    zz
foo5  foo4    zz
dtype: object "

result of reset_index() =
<class 'pandas.core.frame.DataFrame'> :
"     obj level_1   0
0  foo1    foo3  zz
1  foo2    foo5  zz
2  foo3    foo2  zz
3  foo3    foo6  zz
4  foo5    foo4  zz "

value of .iloc[ :, 0 ]=
<class 'pandas.core.series.Series'> :
" 0    foo1
1    foo2
2    foo3
3    foo3
4    foo5
Name: obj, dtype: object "

value of .iloc[ :, 1 ] =
<class 'pandas.core.series.Series'> :
" 0    foo3
1    foo5
2    foo2
3    foo6
4    foo4
Name: level_1, dtype: object "

result of zip( df0, df1 )=
<class 'zip'> :
" <zip object at 0x000000000B3B9548> "

result of list( z )=
<class 'list'> :
" [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')] "

value of { 'zz': l }=
<class 'dict'> :
" {'zz': [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]} "

{'zz': [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]}

The script reads a CSV file and stores the data in a pandas
DataFrame object named "df". The "index_col=0" parameter tells
pandas to use the first column as the index of the DataFrame,
that is, as the row labels, which play the same role for rows
that the column headers play for columns.
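
To see what "index_col=0" changes, one can read the same text both ways. This is a minimal sketch on a cut-down two-row table; the names CSV_TEXT, plain and labelled are made up for the example.

```python
import io

import pandas as pd

# Assumption: a shortened, hypothetical version of the thread's table.
CSV_TEXT = "obj,foo1,foo2\nfoo1,aa,zz\nfoo2,zz,bb\n"

# Without index_col, 'obj' is an ordinary column and rows are numbered 0, 1, ...
plain = pd.read_csv(io.StringIO(CSV_TEXT))

# With index_col=0, the first column becomes the row labels.
labelled = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

print(list(plain.index), list(plain.columns))
print(list(labelled.index), list(labelled.columns))
```

In the second frame 'obj' no longer counts as a data column, which is why the comparison with 'zz' only looks at the actual cells.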

The "where" creates a new DataFrame selection that contains
the same data as df, but with all values replaced by NaN (Not
a Number) except for the values that are equal to 'zz'.
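
The "where" step can be tried in isolation. A minimal sketch on a hypothetical two-by-two frame: "df == 'zz'" builds a boolean mask of the same shape, and "where" keeps the cells where the mask is True and puts a missing value everywhere else.

```python
import pandas as pd

# Assumption: a tiny made-up frame, just to show the mechanics.
df = pd.DataFrame(
    {"foo1": ["aa", "zz"], "foo2": ["zz", "bb"]},
    index=["foo1", "foo2"],
)

# Element-wise comparison gives a DataFrame of booleans.
mask = df == "zz"

# Cells where the mask is False are replaced by a missing value (NaN).
selection = df.where(mask)

print(mask)
print(selection)
```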

"stack" returns a Series with a multi-level index created
by pivoting the columns. Here it gives a Series with the
row-column addresses of all the non-NaN values. In its general
form, "stack" is probably the most complex operation in this
script; it is explained in the pandas manual.
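
A small sketch of the "stack" step on its own, using a hypothetical two-by-two frame that already contains missing values. The explicit dropna() is an assumption added for safety: the output shown above had the NaN rows removed by stack() itself, but other pandas versions may keep them, and dropna() gives the same result either way.

```python
import pandas as pd

# Assumption: a made-up frame shaped like the result of the "where" step.
selection = pd.DataFrame(
    {"foo1": [None, "zz"], "foo2": ["zz", None]},
    index=pd.Index(["foo1", "foo2"], name="obj"),
)

# stack() moves the column labels into a second index level, so each
# remaining value is addressed by a (row label, column label) pair.
stacked = selection.stack().dropna()

print(stacked)
print(list(stacked.index))
```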

"reset_index" then just transforms this Series back into a
DataFrame, and ".iloc[ :, 0 ]" and ".iloc[ :, 1 ]" are the
first and second column, respectively, of that DataFrame. These
then are zipped to get the desired form as a list of pairs.
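
Since the index of the stacked Series already consists of (row, column) pairs, the reset_index/iloc/zip steps can probably be skipped. A minimal sketch, assuming the sample table from this thread; the names CSV_TEXT and find_cells are made up for the example, and dropna() is added in case stack() keeps NaN rows in the pandas version at hand.

```python
import io

import pandas as pd

# Assumption: CSV_TEXT is a hypothetical name holding the thread's sample table.
CSV_TEXT = """obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff
"""


def find_cells(df, what):
    """Return {what: [(row label, column label), ...]} for matching cells."""
    hits = df.where(df == what).stack().dropna()
    # The MultiIndex of the stacked Series holds the (row, column) pairs.
    return {what: hits.index.tolist()}


df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
result = find_cells(df, "zz")
print(result)
```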

Subject: Re: help: pandas and 2d table
From: jak
Newsgroups: comp.lang.python
Organization: A noiseless patient Spider
Date: Mon, 15 Apr 2024 06:05 UTC
References: 1 2 3 4 5 6 7 8 9

Stefan Ram wrote:
> jak <nospam@please.ty> wrote or quoted:
>> Stefan Ram wrote:
>>> df = df.where( df == 'zz' ).stack().reset_index()
>>> result ={ 'zz': list( zip( df.iloc[ :, 0 ], df.iloc[ :, 1 ]))}
>> Since I don't know Pandas, I will need a month at least to understand
>> these 2 lines of code. Thanks again.
>
> Here's a technique to better understand such code:
>
> Transform it into a program with small statements and small
> expressions with no more than one call per statement if possible.
> (After each little change check that the output stays the same.)
>
> import pandas as pd
>
> # Warning! Will overwrite the file 'file_20240412201813_tmp_DML.csv'!
> with open( 'file_20240412201813_tmp_DML.csv', 'w' ) as out:
>     print( '''obj,foo1,foo2,foo3,foo4,foo5,foo6
> foo1,aa,ab,zz,ad,ae,af
> foo2,ba,bb,bc,bd,zz,bf
> foo3,ca,zz,cc,cd,ce,zz
> foo4,da,db,dc,dd,de,df
> foo5,ea,eb,ec,zz,ee,ef
> foo6,fa,fb,fc,fd,fe,ff''', file=out )
> # Note the "index_col=0" below, which is important here!
> df = pd.read_csv( 'file_20240412201813_tmp_DML.csv', index_col=0 )
>
> selection = df.where( df == 'zz' )
> selection_stack = selection.stack()
> df = selection_stack.reset_index()
> df0 = df.iloc[ :, 0 ]
> df1 = df.iloc[ :, 1 ]
> z = zip( df0, df1 )
> l = list( z )
> result ={ 'zz': l }
> print( result )
>
> I suggest next inserting print statements to print each intermediate
> value:
>
> # Note the "index_col=0" below, which is important here!
> df = pd.read_csv( 'file_20240412201813_tmp_DML.csv', index_col=0 )
> print( 'df = \n', type( df ), ':\n"', df, '"\n' )
>
> selection = df.where( df == 'zz' )
> print( "result of where( df == 'zz' ) = \n", type( selection ), ':\n"',
> selection, '"\n' )
>
> selection_stack = selection.stack()
> print( 'result of stack() = \n', type( selection_stack ), ':\n"',
> selection_stack, '"\n' )
>
> df = selection_stack.reset_index()
> print( 'result of reset_index() = \n', type( df ), ':\n"', df, '"\n' )
>
> df0 = df.iloc[ :, 0 ]
> print( 'value of .iloc[ :, 0 ]= \n', type( df0 ), ':\n"', df0, '"\n' )
>
> df1 = df.iloc[ :, 1 ]
> print( 'value of .iloc[ :, 1 ] = \n', type( df1 ), ':\n"', df1, '"\n' )
>
> z = zip( df0, df1 )
> print( 'result of zip( df0, df1 )= \n', type( z ), ':\n"', z, '"\n' )
>
> l = list( z )
> print( 'result of list( z )= \n', type( l ), ':\n"', l, '"\n' )
>
> result ={ 'zz': l }
> print( "value of { 'zz': l }= \n", type( result ), ':\n"',
> result, '"\n' )
>
> print( result )
>
> Now you can see what each single step does!
>
> df =
> <class 'pandas.core.frame.DataFrame'> :
> " foo1 foo2 foo3 foo4 foo5 foo6
> obj
> foo1 aa ab zz ad ae af
> foo2 ba bb bc bd zz bf
> foo3 ca zz cc cd ce zz
> foo4 da db dc dd de df
> foo5 ea eb ec zz ee ef
> foo6 fa fb fc fd fe ff "
>
> result of where( df == 'zz' ) =
> <class 'pandas.core.frame.DataFrame'> :
> " foo1 foo2 foo3 foo4 foo5 foo6
> obj
> foo1 NaN NaN zz NaN NaN NaN
> foo2 NaN NaN NaN NaN zz NaN
> foo3 NaN zz NaN NaN NaN zz
> foo4 NaN NaN NaN NaN NaN NaN
> foo5 NaN NaN NaN zz NaN NaN
> foo6 NaN NaN NaN NaN NaN NaN "
>
> result of stack() =
> <class 'pandas.core.series.Series'> :
> " obj
> foo1  foo3    zz
> foo2  foo5    zz
> foo3  foo2    zz
>       foo6    zz
> foo5  foo4    zz
> dtype: object "
>
> result of reset_index() =
> <class 'pandas.core.frame.DataFrame'> :
> " obj level_1 0
> 0 foo1 foo3 zz
> 1 foo2 foo5 zz
> 2 foo3 foo2 zz
> 3 foo3 foo6 zz
> 4 foo5 foo4 zz "
>
> value of .iloc[ :, 0 ]=
> <class 'pandas.core.series.Series'> :
> " 0 foo1
> 1 foo2
> 2 foo3
> 3 foo3
> 4 foo5
> Name: obj, dtype: object "
>
> value of .iloc[ :, 1 ] =
> <class 'pandas.core.series.Series'> :
> " 0 foo3
> 1 foo5
> 2 foo2
> 3 foo6
> 4 foo4
> Name: level_1, dtype: object "
>
> result of zip( df0, df1 )=
> <class 'zip'> :
> " <zip object at 0x000000000B3B9548>"
>
> result of list( z )=
> <class 'list'> :
> " [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]"
>
> value of { 'zz': l }=
> <class 'dict'> :
> " {'zz': [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]}"
>
> {'zz': [('foo1', 'foo3'), ('foo2', 'foo5'), ('foo3', 'foo2'), ('foo3', 'foo6'), ('foo5', 'foo4')]}
>
> The script reads a CSV file and stores the data in a Pandas
> DataFrame object named "df". The "index_col=0" parameter tells
> Pandas to use the first column as the index for the DataFrame,
> which is kinda like column headers, but for the rows.
>
> The "where" creates a new DataFrame selection that contains
> the same data as df, but with all values replaced by NaN (Not
> a Number) except for the values that are equal to 'zz'.
>
> "stack" returns a Series with a multi-level index created
> by pivoting the columns. Here it gives a Series with the
> row-col-addresses of all the non-NaN values. In general,
> "stack" is probably the most complex operation in this
> script. It's explained in the pandas manual (see there).
>
> "reset_index" then just transforms this Series back into a
> DataFrame, and ".iloc[ :, 0 ]" and ".iloc[ :, 1 ]" are the
> first and second column, respectively, of that DataFrame. These
> then are zipped to get the desired form as a list of pairs.
>

And this is a technique very similar to reverse engineering. Thanks for
the explanation and examples. All this is really clear and I was able to
follow it easily because I have already written a version of this code
in C without any kind of external library that uses the .CSV version of
the table as data ( 234 code lines :^/ ).
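
For comparison, here is a rough sketch of the same search in Python
using only the standard library. It is not a translation of the C
program (which isn't shown here), and the function name find_cells is
made up for this example. It assumes the CSV layout from the example
above, with the row labels in the first column.

```python
import csv
import io

# Same table as in the example above, kept in memory instead of a file.
csv_text = """obj,foo1,foo2,foo3,foo4,foo5,foo6
foo1,aa,ab,zz,ad,ae,af
foo2,ba,bb,bc,bd,zz,bf
foo3,ca,zz,cc,cd,ce,zz
foo4,da,db,dc,dd,de,df
foo5,ea,eb,ec,zz,ee,ef
foo6,fa,fb,fc,fd,fe,ff
"""

def find_cells(text, wanted):
    """Return (row label, column label) pairs of all cells equal to wanted."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)        # first line: 'obj' followed by column labels
    columns = header[1:]       # skip the label of the row-label column
    hits = []
    for row in rows:
        if not row:            # tolerate blank lines
            continue
        label, cells = row[0], row[1:]
        for column, cell in zip(columns, cells):
            if cell == wanted:
                hits.append((label, column))
    return hits

result = {'zz': find_cells(csv_text, 'zz')}
print(result)
```

The pairs come out in row-major order, the same order the pandas
version prints.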

Subject: Re: help: pandas and 2d table
From: Stefan Ram
Newsgroups: comp.lang.python
Organization: Stefan Ram
Date: Sun, 19 May 2024 16:32 UTC
References: 1 2 3 4 5 6 7 8 9
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>row-col-addresses of all the non-NaN values. In general,
>"stack" is probably the most complex operation in this
>script. It's explained in the pandas manual (see there).

Jay Alammar knocked it out of the park with a killer website
called "Visualizing Pandas' Pivoting and Reshaping" that takes
a deep dive into "pivot", "melt", "stack" and "unstack".
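
To give a rough idea of three of those operations, here is a minimal
sketch on a made-up 2x2 table. It is not taken from that website, and
all names in it (row, col, value, x, y, a, b) are assumptions for
illustration.

```python
import pandas as pd

# Made-up table for illustration.
df = pd.DataFrame(
    {'x': [1, 2], 'y': [3, 4]},
    index=pd.Index(['a', 'b'], name='row'),
)

# stack: wide -> long; the column labels become an inner index level.
long_form = df.stack()
print(long_form)

# unstack: long -> wide again; the inner index level returns to the columns.
wide = long_form.unstack()
print(wide)

# melt: also wide -> long, but the result is a DataFrame with plain columns
# instead of a Series with a multi-level index.
melted = df.reset_index().melt(id_vars='row', var_name='col', value_name='value')
print(melted)
```

"pivot" would go the other way from the melted form, turning the
values of one column back into column labels.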
