Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
From: Jeff Barnett
Newsgroups: comp.lang.lisp, comp.lang.scheme
Organization: A noiseless patient Spider
Date: Thu, 30 May 2024 22:33 UTC

On 5/30/2024 2:09 PM, HenHanna wrote:
>
> i'd not use Gauche for this, but maybe someone can change my mind.
>
>
> _______________________
> From JoyceUlysses.txt -- words occurring exactly once
>
>
> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) program that'd give me
> a list of all words occurring exactly once?
>
>               -- Also, a list of words occurring once, twice or 3 times
>
>
>
> re: hyphenated words        (you can treat it anyway you like)
>
>        ideally, i'd treat  [editor-in-chief]
>                            [go-ahead]  [pen-knife]
>                            [know-how]  [far-fetched] ...
>        as one unit.
Make a list (or array) of the individual words (as strings or symbols in
a special package) of the original document, then sort the list using the
Lisp-supplied sort function. You then write a loop using your favorite
tools and look for runs of identical adjacent words of the required
length. This gives you a program that is asymptotically efficient, as the
theoretical run-time will look something like (* c N (log N)), where N is
the length of the list produced by the first step and c is some constant.

Note, any solution resembling this one is not really what you want. For
example it would think "Snark" and "Snarks" are different words. Some
differences such as capitalization can be suppressed by choosing a sort
predicate that is case insensitive. You can, of course, write your own
sort predicate. The thing to note is that the predicate (the <= operator
used by sort) should not do extra work beyond comparing the two words or
maintain state between invocations; otherwise, the complexity can become
arbitrarily large.
--
Jeff Barnett
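
The sort-then-scan approach Jeff Barnett describes can be sketched in Python roughly as follows. This is only an illustration, not code from the thread: the function name `words_with_count`, the letters-only tokenizer and the sample sentence are assumptions, and case is folded before sorting instead of being handled in the sort predicate.

```python
import re

def words_with_count(words, lo=1, hi=1):
    """Return the words that occur between lo and hi times (inclusive)."""
    # Fold case up front so the sort itself stays a plain O(N log N) sort.
    ordered = sorted(w.casefold() for w in words)
    result = []
    i = 0
    while i < len(ordered):
        j = i
        # Advance j to the end of the run of equal words starting at i.
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        if lo <= j - i <= hi:
            result.append(ordered[i])
        i = j
    return result

# Hypothetical sample text; a real run would read the novel from a file.
text = "the Snark saw the snarks and The boojum"
words = re.findall(r"[A-Za-z]+", text)

print(words_with_count(words))        # words occurring exactly once
print(words_with_count(words, 1, 3))  # words occurring one to three times
```

As the post warns, "snark" and "snarks" still count as different words here; only capitalization is normalized.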

Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
From: Stefan Monnier
Newsgroups: comp.lang.lisp, comp.lang.scheme
Organization: A noiseless patient Spider
Date: Thu, 30 May 2024 22:45 UTC

> Given a text file of a novel (JoyceUlysses.txt) ...
> could someone give me a pretty fast (and simple) program that'd give me
> a list of all words occurring exactly once?

tr ' .;:,?!' '\n' | sort | uniq -u

?

- Stefan

Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
From: B. Pym
Newsgroups: comp.lang.lisp, comp.lang.scheme
Organization: A noiseless patient Spider
Date: Fri, 31 May 2024 10:13 UTC

On 5/30/2024, HenHanna wrote:

>
> i'd not use Gauche for this, but maybe someone can change my mind.
>
>
> _______________________
> From JoyceUlysses.txt -- words occurring exactly once
>
>
> Given a text file of a novel (JoyceUlysses.txt) ...
>
> could someone give me a pretty fast (and simple) program that'd give me a list of all words occurring exactly once?
>
> -- Also, a list of words occurring once, twice or 3 times
>
>
>
> re: hyphenated words (you can treat it anyway you like)
>
> ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.

Gauche Scheme

(use file.util)  ;; file->string
(use srfi-13)    ;; string-tokenize
(use srfi-14)    ;; character sets

(define h (make-hash-table 'string=?))

;; Tokenize on letters and hyphens, strip leading/trailing hyphens,
;; upcase, and count each word.
(dolist
    (s
     (string-tokenize (file->string "Alice.txt")
                      (char-set-adjoin char-set:letter #\-)))
  (hash-table-update! h
    (regexp-replace* (string-upcase s) #/^-+/ "" #/-+$/ "")
    (pa$ + 1) 0))

;; Keep the words that occur fewer than 3 times.
(filter (lambda (kv) (< (cdr kv) 3))
        (hash-table->alist h))

===>

(("LASTED" . 2) ("WAY--NEVER" . 1) ("VISIT" . 1) ("CHANCED" . 1)
("WILDLY" . 2) ("BEHEAD" . 1) ("PROMISE" . 1) ("MEANWHILE" . 1)
("ENGAGED" . 1) ("KNIFE" . 2) ("ROARED" . 1) ("RETIRE" . 1)
("BLACKING" . 1) ("HATED" . 1) ("BRIGHT-EYED" . 1)
("SHEEP-BELLS" . 1) ("PROTECTION" . 1) ("CRIES" . 1) ("ADA" . 1)
("ENJOY" . 1) ("WRITHING" . 1) ("RAW" . 1) ("APPEALED" . 1)
("RELIEVED" . 1) ("CHILDHOOD" . 1) ("WEPT" . 1) ("RACE-COURSE" . 1)
("THEIRS" . 1) ("MAD--AT" . 1) ("SPOKEN" . 1) ("PENCILS" . 1)
("CLEAR" . 2) ("TREADING" . 2) ("RETURNED" . 2) ("CHERRY-TART" . 1)
("UNEASY" . 1) ("LOW-SPIRITED" . 1) ("BONE" . 1) ("PROMISED" . 1)
("HAPPENING" . 1) ("OYSTER" . 1) ("PATIENTLY" . 2) ("NEEDS" . 1)
("LESSON-BOOK" . 1) ("PITIED" . 1) ("UNCOMFORTABLY" . 1)
("ANTIPATHIES" . 1) ("PICTURED" . 1) ("DESPERATE" . 1)
("ENGRAVED" . 1)
...
)
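
For comparison, the same hash-table counting idea might look like this in Python. It is a sketch under stated assumptions, not a translation of the Gauche code above: the name `rare_words`, the regular expression and the sample sentence are made up for illustration. The pattern joins letter runs only across single hyphens, so a dash written as "--" splits words instead of producing entries like "WAY--NEVER".

```python
import re
from collections import Counter

# Letter runs optionally joined by single hyphens: "editor-in-chief" is one word.
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def rare_words(text, max_count=1):
    """Map each upcased word occurring at most max_count times to its count."""
    counts = Counter(m.group(0).upper() for m in WORD.finditer(text))
    return {w: n for w, n in counts.items() if n <= max_count}

# Hypothetical sample; a real run would pass the contents of the novel's file.
sample = "The editor-in-chief gave the go-ahead. Way--never! The way was clear."

print(rare_words(sample))     # words occurring exactly once
print(rare_words(sample, 2))  # words occurring once or twice
```

Passing `max_count=3` would give the "once, twice or 3 times" list from the original question.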

Subject: (lambda (x) (list (car x) (length x))) using Cut or Cute?
From: HenHanna
Newsgroups: comp.lang.scheme
Organization: A noiseless patient Spider
Date: Sun, 9 Jun 2024 23:25 UTC

There's no way to write (lambda (x) (list (car x) (length x)))
using Cut or Cute???

Subject: Re: in Python? -- Chunk -- (ChunkC '(a a b b b)), ==> ((a 2) (b 3))
From: HenHanna
Newsgroups: comp.lang.python, comp.lang.scheme
Organization: A noiseless patient Spider
Date: Mon, 10 Jun 2024 02:36 UTC

On 6/9/2024 7:05 PM, avi.e.gross@gmail.com wrote:
> I remembered that HenHanna had been hard to deal with in the past and when
> my reply to him/her/them bounced as a bad/fake address it came back to me
> that I am better off not participating in this latest attempt to get us to
> perform then probably shoot whatever we say down.
>
> A considerate person would ask questions more clearly and perhaps explain
> what language they are showing us code from and so on.
>
> Life is too short to waste.
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
> Behalf Of HenHanna via Python-list
> Sent: Sunday, June 9, 2024 5:20 PM
> To: python-list@python.org
> Subject: in Python? -- Chunk -- (ChunkC '(a a b b b)), ==> ((a 2) (b 3))
>
> Chunk, ChunkC -- nice simple way(s) to write these in Python?
>
>
> (Chunk '(a a b a a a b b))
> ==> ((a a) (b) (a a a) (b b))
>
>
> (Chunk '(a a a a b c c a a d e e e e))
> ==> ((a a a a) (b) (c c) (a a) (d) (e e e e))
>
>
> (Chunk '(2 2 foo bar bar j j j k baz baz))
> ==> ((2 2) (foo) (bar bar) (j j j) (k) (baz baz))
>
> _________________
>
> (ChunkC '(a a b b b))
> ==> ((a 2) (b 3))
>
> (ChunkC '(a a b a a a b b))
> ==> ((a 2) (b 1) (a 3) (b 2))

i was just curious about a simple, clever way to write it in Python

in Scheme (Gauche)

(use srfi-1) ;; span

(define (gp x)
  (if (null? x) '()
      (let-values (((F L) (span (cut equal? (car x) <>) x)))
        (cons F (gp L)))))

(print (gp '(a b b a a a b b b b)))
(print (gp '(c c c a d d d d a e e e e e)))

(define (gpC x) (map (lambda (x) (list (car x) (length x))) (gp x)))

(print (gpC '(a b b a a a b b b b)))
(print (gpC '(c c c a d d d d a e e e e e)))
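
One plausible Python answer to the question uses `itertools.groupby` from the standard library, which groups consecutive equal items. The names `chunk` and `chunk_c` are chosen here to mirror Chunk and ChunkC; they are not from any library.

```python
from itertools import groupby

def chunk(items):
    """Split items into lists of consecutive equal elements."""
    return [list(group) for _, group in groupby(items)]

def chunk_c(items):
    """Pair each run's element with the length of the run."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(items)]

print(chunk(list("aabaaabb")))
# [['a', 'a'], ['b'], ['a', 'a', 'a'], ['b', 'b']]
print(chunk_c(list("aabaaabb")))
# [('a', 2), ('b', 1), ('a', 3), ('b', 2)]
```

Like `span` in the Scheme version, `groupby` only merges adjacent equal items, so the second run of `a` stays separate from the first.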

Subject: Re: in Python? -- Chunk -- (ChunkC '(a a b b b)), ==> ((a 2) (b 3))
From: B. Pym
Newsgroups: comp.lang.python, comp.lang.scheme
Organization: A noiseless patient Spider
Date: Sat, 15 Jun 2024 02:53 UTC

On 6/9/2024, HenHanna wrote:

> > (Chunk '(a a b a a a b b))
> > ==> ((a a) (b) (a a a) (b b))
> >

Gauche Scheme:

(use gauche.sequence)

(group-sequence '(2 4 4 3 0 5 8 6 6 6))
===>
((2) (4 4) (3) (0) (5) (8) (6 6 6))
