Subject: slow fileutil::foreachLine
From: Mark Summerfield
Newsgroups: comp.lang.tcl
Date: Mon, 17 Jun 2024 07:02 UTC

I have this function:

proc ws::get_words {wordfile} {
    set in [open $wordfile r]
    try {
        while {[gets $in line] >= 0} {
            if {[regexp {^[a-z]+$} $line matched]} {
                lappend ::ws::Words [string tolower $matched]
            }
        }
    } finally {
        close $in
    }
}

It reads about 100,000 lines from /usr/share/dict/words and ends up
keeping about 65,000 of them.

I tried replacing it with:

proc ws::get_words {wordfile} {
    ::fileutil::foreachLine line $wordfile {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::ws::Words [string tolower $matched]
        }
    }
}

The first version loads "instantly"; but the second version (with
foreachLine) takes seconds.

I'm using Tcl/Tk 9.0b2

Subject: Re: slow fileutil::foreachLine
From: Rich
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Mon, 17 Jun 2024 15:40 UTC
References: 1

Mark Summerfield <mark@qtrac.eu> wrote:
> I have this function:
>
> proc ws::get_words {wordfile} {
>     set in [open $wordfile r]
>     try {
>         while {[gets $in line] >= 0} {
>             if {[regexp {^[a-z]+$} $line matched]} {
>                 lappend ::ws::Words [string tolower $matched]
>             }
>         }
>     } finally {
>         close $in
>     }
> }
>
> It reads about 100_000 lines and ends up keeping about 65_000 of them
> (from /usr/share/dict/words)
>
> I tried replacing it with:
>
> proc ws::get_words {wordfile} {
>     ::fileutil::foreachLine line $wordfile {
>         if {[regexp {^[a-z]+$} $line matched]} {
>             lappend ::ws::Words [string tolower $matched]
>         }
>     }
> }
>
> The first version loads "instantly"; but the second version (with
> foreachLine) takes seconds.

If you check the implementation of fileutil::foreachLine, you find:

set code [catch {uplevel 1 $cmd} result options]

Here "$cmd" is a variable holding, as a string, the "command" (the loop
body) passed to foreachLine.

Your original copy is all in a single procedure, so it will be bytecode
compiled, and for all but the first execution will run that compiled
bytecode.

In the foreachLine version, the "cmd" is a string, so it will receive
little to no bytecode compiling. The difference in time is the overhead
of not being able to bytecode compile the "command" string passed to
foreachLine.
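
One way to check this on your own machine is to time the two styles side by side. The following is only a rough sketch, not the tcllib code: each_item is a hypothetical stand-in that evaluates its body with "uplevel 1" once per item, the way the line quoted above does, and the word list is synthetic rather than read from /usr/share/dict/words. The timings will vary with the Tcl version and the machine.

```tcl
# Hypothetical stand-in for the foreachLine pattern (not the tcllib code):
# the body arrives as a string and is evaluated in the caller once per item.
proc each_item {varName items body} {
    upvar 1 $varName item
    foreach item $items {
        uplevel 1 $body
    }
}

# Loop written directly inside the procedure body.
proc count_inline {items} {
    set n 0
    foreach item $items {
        if {[regexp {^[a-z]+$} $item]} { incr n }
    }
    return $n
}

# Same work, but the loop body is passed as a string to each_item.
proc count_uplevel {items} {
    set n 0
    each_item item $items {
        if {[regexp {^[a-z]+$} $item]} { incr n }
    }
    return $n
}

# Synthetic input (an assumption): 100000 copies of one lowercase word.
set items {}
for {set i 0} {$i < 100000} {incr i} {
    lappend items "word"
}

puts "inline:  [time {count_inline $items} 5]"
puts "uplevel: [time {count_uplevel $items} 5]"
```

If the uplevel variant turns out to be much slower for you as well, keeping the gets loop inside the procedure, as in the original version, avoids that per-line cost.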
