Subject: How to weight terms based on semantic importance
From: marc nicole
Newsgroups: comp.lang.python
Date: Wed, 15 Jan 2025 17:40 UTC
Message-ID: <mailman.80.1736963341.2912.python-list@python.org>

Hello,

I want to weight the terms of a large text by their semantic importance
rather than by their frequency (as TF-IDF does).
Is there a way to do that using NLTK or some other means, for example
through a vectorizer?

For example, a certain term should weigh more than the others because of
what it means, not because of how often it occurs.
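
To make the question concrete, here is a minimal sketch of the kind of weighting meant, under stated assumptions: each term has a vector, and a term's weight is the cosine similarity between its vector and the mean vector of the whole text. The names TOY_VECTORS, cosine and semantic_weights are hypothetical, and the three-dimensional vectors are made up for illustration; real vectors would have to come from a pretrained embedding model. This is only one possible reading of "semantic importance", not an established NLTK feature.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real vectors would come from a pretrained word-embedding model.
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "code": [0.8, 0.2, 0.1],
    "function": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_weights(terms, vectors):
    """Weight each known term by its similarity to the text's mean vector."""
    known = [t for t in terms if t in vectors]  # skip terms without a vector
    if not known:
        return {}
    dim = len(vectors[known[0]])
    # Mean vector over all known tokens (repeated tokens count each time).
    centroid = [sum(vectors[t][i] for t in known) / len(known) for i in range(dim)]
    return {t: cosine(vectors[t], centroid) for t in set(known)}


weights = semantic_weights(["python", "code", "function", "banana"], TOY_VECTORS)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {w:.3f}")
```

With these toy vectors the off-topic term "banana" gets the lowest weight, even though every term occurs exactly once, so frequency plays no role in the ranking.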

Thanks
