Rocksolid Light

News from da outaworlds

mail  files  register  groups  login

Message-ID:  

Bank error in your favor. Collect $200.


comp / comp.lang.tcl / Re: Problem with tdbc

SubjectAuthor
* Problem with tdbcAlan Grunwald
+- Re: Problem with tdbcsaito
`* Re: Problem with tdbcgreg
 `* Re: Problem with tdbcAlan Grunwald
  `* Re: Problem with tdbcgreg
   `- Re: Problem with tdbcAlan Grunwald

1
Subject: Problem with tdbc
From: Alan Grunwald
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Mon, 29 Apr 2024 13:03 UTC
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam.nurdglaw@gmail.com (Alan Grunwald)
Newsgroups: comp.lang.tcl
Subject: Problem with tdbc
Date: Mon, 29 Apr 2024 14:03:13 +0100
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <v0o5ru$1omi0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 29 Apr 2024 15:03:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="560740ec1b7ba16d94810aada2114789";
logging-data="1858112"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/6VFinX70ypWN7iZPZW83QvmxrZ6xU/2M="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:5tl0gaqNl72N0QBdKbXjmQE+r0E=
Content-Language: en-US
View all headers

I'm writing a script to scrape data from a website and save it in a
database.

The web site offers games and supports multiple users, who play against
each other in multiple tournaments every day. The first time the script
is run, the idea is to capture data, including their opponents, for a
specified user over the past 24 hours. On subsequent runs, we capture
data for all users that we know about.

So, I expect that I'll be scraping more and more data as time goes by.
I've run it for a few days writing data to a SQLite database but after a
couple of weeks it now takes more than 24 hours to run to completion.
I'd like to see how long it takes if I use a MySQL database.
Unfortunately, on the second run the script reports the error:

"Can't create more that max_prepared_stmt_count statements (current
value 16382)"

Now, I could simply increase the value of the MySQL variable, maybe to
its maximum allowed value of 4194304. However, when it failed, it was
trying to handle 1000 users. I know from the SQLite prototype that I
need to be able to hand more than 80000 users. Upping the limit might
work and might not kill the machine hosting the database, but I'd really
like to find a better solution.

As far as I'm aware I'm only creating about a dozen (12) prepared
statements. I have noticed, however, that I have a large number of
tdbc::mysql::resultset objects - 16375 when the server complained, and I
wonder whether these objects create a prepared statement that they
release when they are destroyed. (I saw the same growth - of
tdbc::sqlite3::resultset objects - when saving to a SQLite database.)

I am nearly certain that I'm not creating these resultset objects
directly myself, I think all my interaction with tdbc is via db prepare
and statement foreach/allrows calls but I imagine that freach and/or
allrows will create a temporary resultset that it (should) manage itself.

Does the above description ring any bells with anyone here? If not, any
suggestions as to how I should proceed?

Alan

Subject: Re: Problem with tdbc
From: saito
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Mon, 29 Apr 2024 16:36 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: saitology9@gmail.com (saito)
Newsgroups: comp.lang.tcl
Subject: Re: Problem with tdbc
Date: Mon, 29 Apr 2024 12:36:43 -0400
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <v0oiar$1rj07$1@dont-email.me>
References: <v0o5ru$1omi0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 29 Apr 2024 18:36:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="dd0f19e839f54eee62c9245d48674694";
logging-data="1952775"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18iEDh+grflrPHgvUR97jJ0"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:adfBRCbP5yY8dtcytpWRsurWbeU=
In-Reply-To: <v0o5ru$1omi0$1@dont-email.me>
Content-Language: en-US
View all headers

On 4/29/2024 9:03 AM, Alan Grunwald wrote:
>
> So, I expect that I'll be scraping more and more data as time goes by.
> I've run it for a few days writing data to a SQLite database but after a
> couple of weeks it now takes more than 24 hours to run to completion.
> I'd like to see how long it takes if I use a MySQL database.
> Unfortunately, on the second run the script reports the error:
>
> "Can't create more that max_prepared_stmt_count statements (current
> value 16382)"
>
....
>
> Does the above description ring any bells with anyone here? If not, any
> suggestions as to how I should proceed?
>
> Alan

I have used oratcl but without any tdbc wrapper so my experience with
tdbc is basically nil. But it looks like you may not be releasing some
of the objects that are created for you during normal operation. This
includes result sets, even though in your case, they may be empty.

Alternatively, have you tried using sqlite directly without tdbc around
it? Because sqlite already comes with a pretty good interface
especially from tcl.

Subject: Re: Problem with tdbc
From: greg
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Mon, 29 Apr 2024 19:28 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: gregor.ebbing@gmx.de (greg)
Newsgroups: comp.lang.tcl
Subject: Re: Problem with tdbc
Date: Mon, 29 Apr 2024 21:28:32 +0200
Organization: A noiseless patient Spider
Lines: 86
Message-ID: <v0osd1$1u46d$1@dont-email.me>
References: <v0o5ru$1omi0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 29 Apr 2024 21:28:33 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2f3970681de8a577714720ddfaf943ed";
logging-data="2035917"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+RYkJwkHVz86Z/c2GR/b+7+MFnG0Sh6GU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:t7QlL0gr/D5KwxlFpF5SFkBdqyo=
Content-Language: de-DE
In-Reply-To: <v0o5ru$1omi0$1@dont-email.me>
View all headers

Am 29.04.24 um 15:03 schrieb Alan Grunwald:
> I'm writing a script to scrape data from a website and save it in a
> database.
>
> The web site offers games and supports multiple users, who play against
> each other in multiple tournaments every day. The first time the script
> is run, the idea is to capture data, including their opponents, for a
> specified user over the past 24 hours. On subsequent runs, we capture
> data for all users that we know about.
>
> So, I expect that I'll be scraping more and more data as time goes by.
> I've run it for a few days writing data to a SQLite database but after a
> couple of weeks it now takes more than 24 hours to run to completion.
> I'd like to see how long it takes if I use a MySQL database.
> Unfortunately, on the second run the script reports the error:
>
> "Can't create more that max_prepared_stmt_count statements (current
> value 16382)"
>
> Now, I could simply increase the value of the MySQL variable, maybe to
> its maximum allowed value of 4194304. However, when it failed, it was
> trying to handle 1000 users. I know from the SQLite prototype that I
> need to be able to hand more than 80000 users. Upping the limit might
> work and might not kill the machine hosting the database, but I'd really
> like to find a better solution.
>
> As far as I'm aware I'm only creating about a dozen (12) prepared
> statements. I have noticed, however, that I have a large number of
> tdbc::mysql::resultset objects - 16375 when the server complained, and I
> wonder whether these objects create a prepared statement that they
> release when they are destroyed. (I saw the same growth - of
> tdbc::sqlite3::resultset objects - when saving to a SQLite database.)
>
> I am nearly certain that I'm not creating these resultset objects
> directly myself, I think all my interaction with tdbc is via db prepare
> and statement foreach/allrows calls but I imagine that freach and/or
> allrows will create a temporary resultset that it (should) manage itself.
>
> Does the above description ring any bells with anyone here? If not, any
> suggestions as to how I should proceed?
>
> Alan

Are you using close?

https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_statement.htm
The close object command removes a statement and any result sets that
it has created. All system resources associated with the objects are freed.

https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_resultset.htm
The close object command deletes the result set and frees any associated
system resources.

> I've run it for a few days writing data to a SQLite database but after a
> couple of weeks it now takes more than 24 hours to run to completion.

Are you using transaction and commit?
https://www.tcl-lang.org/man/tcl/TdbcCmd/tdbc_connection.htm
db begintransaction
db commit

https://www.magicsplat.com/articles/tdbc.html#_transactions

a example:
proc sqlInsertCsv {dbconn datalist} {
$dbconn begintransaction
set stmt [$dbconn prepare {
INSERT INTO csvdaten (one,two,three) VALUES (:one,:two,:three)}]

foreach dataline $datalist {
foreach {one two three} $dataline {
$stmt execute
}
}
$dbconn commit
$stmt close
}

Gregor

Subject: Re: Problem with tdbc
From: Alan Grunwald
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Mon, 29 Apr 2024 22:00 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam.nurdglaw@gmail.com (Alan Grunwald)
Newsgroups: comp.lang.tcl
Subject: Re: Problem with tdbc
Date: Mon, 29 Apr 2024 23:00:15 +0100
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <v0p5aa$206kv$1@dont-email.me>
References: <v0o5ru$1omi0$1@dont-email.me> <v0osd1$1u46d$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 30 Apr 2024 00:00:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="fa6d8c85d83e18f5ca0688420acb95e0";
logging-data="2103967"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/sGWcXR3sFhQH8vjdncxJV0KN+9VrHo/k="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:kPy4q98D5GTqRcKpaB6vpNdd+ig=
In-Reply-To: <v0osd1$1u46d$1@dont-email.me>
Content-Language: en-US
View all headers

On 29/04/2024 20:28, greg wrote:

<snipped problem statement>

> Are you using close?
>
> https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_statement.htm
> The close object command removes a statement and any result sets that
> it has created. All system resources associated with the objects are
freed.
>
> https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_resultset.htm
> The close object command deletes the result set and frees any associated
> system resources.
>
I'm not consciously creating any resultset objects, so am not closing
any either.

Everything is wrapped up in a 'database' object, including around a
dozen long-lived statement objects. The statement objects are destroyed
and the database connection is closed when my 'database' object is
destroyed.

>
> > I've run it for a few days writing data to a SQLite database but
after a
> > couple of weeks it now takes more than 24 hours to run to completion.
>
> Are you using transaction and commit?
The whole web-scraping activity is wrapped in a try statement; the first
statement within the try is begintransation; if execution reaches the
end of the try the transaction is committed; if if there is an error,
the trasaction is rolled back.

I'm hoping to debug where and why the resultset leak is happening. If I
manage to sort that out and it doesn't fix the prepared statement leak,
I'll have to go and search for that.

> https://www.tcl-lang.org/man/tcl/TdbcCmd/tdbc_connection.htm
> db begintransaction
> db commit
>
> https://www.magicsplat.com/articles/tdbc.html#_transactions
>
>
> a example:
> proc sqlInsertCsv {dbconn datalist} {
> $dbconn begintransaction
> set stmt [$dbconn prepare {
> INSERT INTO csvdaten (one,two,three) VALUES (:one,:two,:three)}]
>
> foreach dataline $datalist {
> foreach {one two three} $dataline {
> $stmt execute
> }
> }
> $dbconn commit
> $stmt close
> }
>
>
>
> Gregor
>

I've just had another look at the tdbc_statement manual page and I see
that this may well be designed behaviour; I see that [statement execute]
returns a resultset object. I had not noticed that before
and am very sure that I haven't closed these objects.

It's far too late for me to start looking through my code to see whether
I do use the execute object command, and whether if I delete the
returned objects I can prevent the buildup - I'd be amazed if it
doesn't, but it's just about possible that I'm using the allrows object
command everywhere (which is what I usually do, and which manages the
resultset object internally.

I'll post to this thread again tomorrow with results and news on whether
this fixes the prepared statement buildup on the server side.

Thanks for pointing me at the documentation!

Subject: Re: Problem with tdbc
From: greg
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 30 Apr 2024 04:15 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: gregor.ebbing@gmx.de (greg)
Newsgroups: comp.lang.tcl
Subject: Re: Problem with tdbc
Date: Tue, 30 Apr 2024 06:15:03 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <v0pr88$28jqr$1@dont-email.me>
References: <v0o5ru$1omi0$1@dont-email.me> <v0osd1$1u46d$1@dont-email.me>
<v0p5aa$206kv$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 30 Apr 2024 06:15:04 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3f3e08e5b4137efa9498a6294b3e7452";
logging-data="2379611"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1lX73e3Ly434Huu1HtT1rZJCY62tMtbI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:+CwTykhpT2J3VzxnbRFgmedqTE8=
In-Reply-To: <v0p5aa$206kv$1@dont-email.me>
Content-Language: de-DE
View all headers

Am 30.04.24 um 00:00 schrieb Alan Grunwald:

> The whole web-scraping activity is wrapped in a try statement; the first
> statement within the try is begintransation; if execution reaches the
> end of the try the transaction is committed; if if there is an error,
> the trasaction is rolled back.

Yes, the better way is to use try.

> I've just had another look at the tdbc_statement manual page and I see
> that this may well be designed behaviour; I see that [statement execute]
> returns a resultset object. I had not noticed that before
> and am very sure that I haven't closed these objects.
>

According to the manual it should already happen with close of the tdbc
statement.

https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_statement.htm
The close object command removes a statement and any result sets that
it has created. All system resources associated with the objects are freed.
!

I only got involved with introspection here after your question. I found
something new about tdbc for myself.

https://www.magicsplat.com/articles/tdbc.html#_introspection

and

https://www.tcl-lang.org/man/tcl/TdbcCmd/tdbc_connection.htm
db statements
The statements object command returns a list of statements that have
been created by prepare and preparecall statements against the given
connection and have not yet been closed.

db resultsets
The resultsets object command returns a list of result sets that have
been obtained by executing statements prepared using the given
connection and not yet closed.

Subject: Re: Problem with tdbc
From: Alan Grunwald
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 30 Apr 2024 14:23 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nospam.nurdglaw@gmail.com (Alan Grunwald)
Newsgroups: comp.lang.tcl
Subject: Re: Problem with tdbc
Date: Tue, 30 Apr 2024 15:23:55 +0100
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <v0quuq$2gm06$1@dont-email.me>
References: <v0o5ru$1omi0$1@dont-email.me> <v0osd1$1u46d$1@dont-email.me>
<v0p5aa$206kv$1@dont-email.me> <v0pr88$28jqr$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 30 Apr 2024 16:24:27 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="6ac23b4b0718647390ef3529080f1117";
logging-data="2643974"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+fWHKmiqCDIL1ovQ98L1fEdI1HQp0y+dk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:xdVIqXIvQvaBXsayaJhuE86MvrU=
Content-Language: en-US
In-Reply-To: <v0pr88$28jqr$1@dont-email.me>
View all headers

On 30/04/2024 05:15, greg wrote:
> Am 30.04.24 um 00:00 schrieb Alan Grunwald:
>
>
> > The whole web-scraping activity is wrapped in a try statement; the first
> > statement within the try is begintransation; if execution reaches the
> > end of the try the transaction is committed; if if there is an error,
> > the trasaction is rolled back.
>
> Yes, the better way is to use try.
>
>
>> I've just had another look at the tdbc_statement manual page and I see
>> that this may well be designed behaviour; I see that [statement
>> execute] returns a resultset object. I had not noticed that before
>> and am very sure that I haven't closed these objects.
>>
>
> According to the manual it should already happen with close of the tdbc
> statement.
>
> https://www.tcl.tk/man/tcl/TdbcCmd/tdbc_statement.htm
>  The close object command removes a statement and any result sets that
> it has created. All system resources associated with the objects are freed.
> !
>
>
> I only got involved with introspection here after your question. I found
> something new about tdbc for myself.
>
> https://www.magicsplat.com/articles/tdbc.html#_introspection
>
> and
>
> https://www.tcl-lang.org/man/tcl/TdbcCmd/tdbc_connection.htm
> db statements
>  The statements object command returns a list of statements that have
> been created by prepare and preparecall statements against the given
> connection and have not yet been closed.
>
> db resultsets
> The resultsets object command returns a list of result sets that have
> been obtained by executing statements prepared using the given
> connection and not yet closed.
>
>
>
Executive summary

I had a look through my code and found that I _was_ using execute object
commands in several places and wasn't closing the resultset objects that
they returned. I changed my code so that I _did_ close the returned
resultset objects and that not only prevented the buildup of resultset
objects in my script, but also fixed the server-side prepared statements
problem.

Thank you

tl;dr

My database has a a few columns that refer to rows in other tables, so
the script has lots of code that goes

Execute a SELECT statement find the row in the reference table
if no such row
Execute an INSERT statement to add a reference row
endif

I was using a foreach object command to execute the SELECT statements,
but since they didn't return any database information I used execute for
the INSERT statements. I now understand why that is a VERY BAD idea.

1

rocksolid light 0.9.8
clearnet tor