Subject                    Author
* tdom encoding            saito
+* Re: tdom encoding       greg
|+* Re: tdom encoding      saito
||`- Re: tdom encoding     Rich
|`- Re: tdom encoding      Alan Grunwald
+- Re: tdom encoding       Rich
`* Re: tdom encoding       Rolf Ade
 `* Re: tdom encoding      saito
  `* Re: tdom encoding     Harald Oehlmann
   `- Re: tdom encoding    saito

Subject: tdom encoding
From: saito
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 17 Dec 2024 00:01 UTC
Message-ID: <vjqf00$1c0s9$1@dont-email.me>

I am trying to see why tdom is failing on this json snippet.

package req tdom
set x {{"name":"Jeremi"}}
dom parse -json $x

==> error "JSON syntax error" at position 15
"{"name":"Jeremi <--Error-- "}"

If it doesn't get removed by the newsgroup editors, there is a weird
character at the very end of x. It looks almost like "[]" but it is
not. When you edit it, it acts as if it has multiple characters in it.

Another problem is that the tdom man page mentions a command "dom
setResultEncoding ?encodingName?", but trying it results in an unknown
command error.

Subject: Re: tdom encoding
From: greg
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 17 Dec 2024 02:13 UTC
References: 1
Message-ID: <vjqmnq$1dblc$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me>

On 17.12.24 at 01:01, saito wrote:
> I am trying to see why tdom is failing on this json snippet.
>
> package req tdom
> set x {{"name":"Jeremi"}}
> dom parse -json $x
>
> ==> error "JSON syntax error" at position 15
> "{"name":"Jeremi <--Error-- "}"
>
>
> If it doesn't get removed by the newsgroup editors, there is a weird
> character at the very end of x.  It looks almost like "[]" but it is
> not.  When you edit it, it acts as if it has multiple characters in it.
>
>
> Another problem is that tdom man page talks about a command "dom
> setResultEncoding ?encodingName?" but trying it results in an unknown
> command error.
>
Hello,

The unknown character is code 007, i.e. BEL (the ASCII bell).
It is probably not allowed as a raw character in a JSON string.
Use the escape instead: \u0007

Gregor

package req tdom

proc chr c {
  if {[string length $c] > 1 } {
    error "chr: arg should be a single char"
  }
  set v 0
  scan $c %c v
  return $v
}

# Check character types and provide additional information
proc charInfo char {
  if {[string is control $char]} {
    return "control character"
  } elseif {[string is space $char]} {
    return "space character"
  } elseif {[string is digit $char]} {
    return "digit character"
  } elseif {[string is lower $char]} {
    return "lowercase alphabetic character"
  } elseif {[string is upper $char]} {
    return "uppercase alphabetic character"
  } elseif {[string is punct $char]} {
    return "punctuation character"
  } elseif {[string is graph $char]} {
    return "graphical character"
  } elseif {[string is print $char]} {
    return "printable character"
  } else {
    return "unknown character type"
  }
}

proc infochar {x} {
  puts $x
  set i 0
  while {$i<[string length $x]} {
    set c [string index $x $i]
    puts "$i is $c [charInfo $c] [chr $c] "
    incr i
  }
}

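# The raw BEL (U+0007) does not survive posting, so rebuild it here with
# Tcl's backslash substitution inside double quotes.
set x "{\"name\":\"Jeremi\u0007\"}"
infochar $x
catch {dom parse -json $x} mess
puts "mess: $mess"

# Inside braces Tcl leaves \u0007 untouched, so the JSON parser sees a
# valid six-character escape instead of a raw control character.
set x {{"name":"Jeremi\u0007"}}
set doc [dom parse -json $x]
puts [$doc asXML]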

Subject: Re: tdom encoding
From: Rich
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 17 Dec 2024 04:20 UTC
References: 1
Message-ID: <vjqu76$1i1a9$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me>

saito <saitology9@gmail.com> wrote:
> I am trying to see why tdom is failing on this json snippet.
>
> package req tdom
> set x {{"name":"Jeremi^G"}}
> dom parse -json $x
>
> ==> error "JSON syntax error" at position 15
> "{"name":"Jeremi^G <--Error-- "}"

Assuming the ^G that came through properly represents the
character, then greg is right: it is an ASCII bell character, and per
the JSON spec [1] raw control characters are not allowed inside
a JSON string.

Which is why tDOM reports the error at the ^G in its output.

Are you on Linux? If so, the hexdump, objdump, or xxd (xxd is easiest
to use) commands will show you exactly what raw byte values exist in
the file.

[1] https://www.json.org/json-en.html
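
For readers who would rather stay inside Tcl than reach for xxd, the
same inspection can be sketched in a few lines. The proc name hexBytes
is hypothetical, and the sketch assumes the text should be examined as
UTF-8 bytes; adjust the encoding if the data arrives in something else.

```tcl
# Hypothetical helper: return the UTF-8 bytes of a string as a list of
# two-digit hex values, roughly what xxd would show for the same data.
proc hexBytes {text} {
    set hex ""
    # Convert the Tcl string to its UTF-8 byte sequence, then to hex digits.
    binary scan [encoding convertto utf-8 $text] H* hex
    # Split the hex digit run into one element per byte.
    return [regexp -all -inline {..} $hex]
}

# "i", a raw BEL, and a double quote: the tail of the failing string.
puts [hexBytes "i\u0007\""]   ;# prints: 69 07 22
```

Any value below 20 (hex) that shows up between the quotes of a JSON
string marks a raw control character of the kind discussed above.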

Subject: Re: tdom encoding
From: saito
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 17 Dec 2024 04:51 UTC
References: 1 2
Message-ID: <vjr001$1ibvi$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <vjqmnq$1dblc$1@dont-email.me>

On 12/16/2024 9:13 PM, greg wrote:

> Hello,
>
> The unknown character is 007 or BELL.
> Probably not allowed as a char in  string.
> Instead: \u0007
>
> Gregor
>

Thank you and Rich for the wonderful info and the code.

The JSON data is what I receive from an API. I first thought it had to
do with encoding issues. It happens frequently, so maybe I will ask
them to be more careful with their JSON data generation.

Subject: Re: tdom encoding
From: Rich
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Tue, 17 Dec 2024 04:59 UTC
References: 1 2 3
Message-ID: <vjr0fa$1i1a9$3@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <vjqmnq$1dblc$1@dont-email.me> <vjr001$1ibvi$1@dont-email.me>

saito <saitology9@gmail.com> wrote:
> On 12/16/2024 9:13 PM, greg wrote:
>
>> Hello,
>>
>> The unknown character is 007 or BELL.
>> Probably not allowed as a char in  string.
>> Instead: \u0007
>>
>> Gregor
>>
>
> Thank you and Rich for the wonderful info and the code.
>
> The JSON data is what I receive from an API. I first thought it had
> to do with encoding issues. It happens frequently, so maybe I will
> ask them to be more careful with their JSON data generation.

If you are getting it from an API then you've found a bug if the API
is /really/ sending raw control characters as part of a JSON string.
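
Until the sender fixes its output, one possible stopgap is to escape
the offending characters before parsing. The following is only a sketch
under stated assumptions: the proc name escapeJsonControls is made up,
it is not part of tDOM, and it deliberately leaves tab, LF and CR alone
because those are legal whitespace between JSON tokens (a raw one
inside a string would still be rejected).

```tcl
# Hypothetical helper: rewrite raw control characters (below U+0020) as
# JSON \uXXXX escapes. Tab (9), LF (10) and CR (13) are passed through
# unchanged because they are valid whitespace between JSON tokens.
proc escapeJsonControls {text} {
    set out ""
    foreach ch [split $text ""] {
        scan $ch %c code
        if {$code < 0x20 && $code ni {9 10 13}} {
            # Emit a literal backslash-u escape with four hex digits.
            append out [format {\u%04x} $code]
        } else {
            append out $ch
        }
    }
    return $out
}

# A raw BEL inside the string value, as in the original report.
set raw "{\"name\":\"Jeremi\u0007\"}"
puts [escapeJsonControls $raw]   ;# prints: {"name":"Jeremi\u0007"}
```

The cleaned text could then be handed to the parser, for example
dom parse -json [escapeJsonControls $raw], which is the escaped form
greg's script parses successfully. Whether silently repairing the input
is acceptable depends on the application.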

Subject: Re: tdom encoding
From: Rolf Ade
Newsgroups: comp.lang.tcl
Organization: Me
Date: Wed, 18 Dec 2024 14:04 UTC
References: 1
Message-ID: <87y10drstk.fsf@pointsman.de>
References: <vjqf00$1c0s9$1@dont-email.me>

saito <saitology9@gmail.com> writes:
> I am trying to see why tdom is failing on this json snippet.
>
> package req tdom
> set x {{"name":"Jeremi"}}
> dom parse -json $x
>
> ==> error "JSON syntax error" at position 15
> "{"name":"Jeremi <--Error-- "}"

Rich already pointed out, rightly, that control characters are not
allowed literally in JSON strings. As tDOM rightly complains, your
input is not valid JSON.

[snip]
> Another problem is that tdom man page talks about a command "dom
> setResultEncoding ?encodingName?" but trying it results in an unknown
> command error.

You obviously use a (very) old tDOM version. The dom method
setResultEncoding is a relic from the days when tDOM still supported
Tcl 8.0 (and the functionality was only needed or useful when built and
used with Tcl 8.0).

The documentation and implementation of this method were removed with
tDOM 0.9.1 (more than six years ago). The most recent version is 0.9.5.

rolf
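
A quick way to tell whether a given installation predates that removal
is to compare its version number against 0.9.1. This is a minimal
sketch: the proc name hadSetResultEncoding is hypothetical, and the
0.9.1 cutoff is taken from the statement above rather than checked
against the tDOM sources.

```tcl
# Hypothetical helper: true if the given tDOM version is older than
# 0.9.1, the release in which setResultEncoding is said to be removed.
proc hadSetResultEncoding {version} {
    # package vcompare returns -1, 0 or 1, like a three-way comparison.
    return [expr {[package vcompare $version 0.9.1] < 0}]
}

puts [hadSetResultEncoding 0.9.5]   ;# prints: 0
```

In a live session the version of the loaded package is what
package require tdom returns, so
hadSetResultEncoding [package require tdom] answers the question for
the running interpreter.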

Subject: Re: tdom encoding
From: saito
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Wed, 18 Dec 2024 19:57 UTC
References: 1 2
Message-ID: <vjv9en$2ehb6$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <87y10drstk.fsf@pointsman.de>

On 12/18/2024 9:04 AM, Rolf Ade wrote:
>
> You obviously use a (very) old tDOM version. The dom method
> setResultEncoding is a relic from the days when tDOM still supported
> Tcl 8.0 (and the functionality was only needed or useful when built and
> used with Tcl 8.0).
>
> The documentation and implementation of this method were removed with
> tDOM 0.9.1 (more than six years ago). The most recent version is 0.9.5.
>

Thanks for the info. I am using version 0.9.5, which I downloaded from its
official site some time ago. It comes with no documentation, so I did an
internet search. That piece of info evidently came from an outdated web
page, as I had suspected.

Subject: Re: tdom encoding
From: Harald Oehlmann
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Wed, 18 Dec 2024 20:49 UTC
References: 1 2 3
Message-ID: <vjvcga$2cou9$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <87y10drstk.fsf@pointsman.de>
<vjv9en$2ehb6$1@dont-email.me>

On 18.12.2024 at 20:57, saito wrote:
> Thanks for the info. I am using version 0.9.5 I downloaded from its
> official site some time ago.  It comes with no documentation so I did an
> internet search.  I guess that piece of info is from an outdated web
> page obviously, which I kind of guessed.

http://tdom.org/index.html/doc/trunk/doc/index.html

Subject: Re: tdom encoding
From: saito
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Wed, 18 Dec 2024 22:29 UTC
References: 1 2 3 4
Message-ID: <vjvid2$2g573$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <87y10drstk.fsf@pointsman.de>
<vjv9en$2ehb6$1@dont-email.me> <vjvcga$2cou9$1@dont-email.me>

On 12/18/2024 3:49 PM, Harald Oehlmann wrote:
> On 18.12.2024 at 20:57, saito wrote:
>> Thanks for the info. I am using version 0.9.5 I downloaded from its
>> official site some time ago.  It comes with no documentation so I did
>> an internet search.  I guess that piece of info is from an outdated
>> web page obviously, which I kind of guessed.
>
> http://tdom.org/index.html/doc/trunk/doc/index.html

Thanks, good to know.

Subject: Re: tdom encoding
From: Alan Grunwald
Newsgroups: comp.lang.tcl
Organization: A noiseless patient Spider
Date: Thu, 19 Dec 2024 16:36 UTC
References: 1 2
Message-ID: <vk1i4j$2ug1o$1@dont-email.me>
References: <vjqf00$1c0s9$1@dont-email.me> <vjqmnq$1dblc$1@dont-email.me>

On 17/12/2024 02:13, greg wrote:

<snip>

> proc chr c {
>   if {[string length $c] > 1 } {
>     error "chr: arg should be a single char"
>   }
>   set v 0
>   scan $c %c v
>   return $v
> }
>
> # Check character types and provide additional information
> proc charInfo char {
>   if {[string is control $char]} {
>     return "control character"
>   } elseif {[string is space $char]} {
>     return "space character"
>   } elseif {[string is digit $char]} {
>     return "digit character"
>   } elseif {[string is lower $char]} {
>     return "lowercase alphabetic character"
>   } elseif {[string is upper $char]} {
>     return "uppercase alphabetic character"
>   } elseif {[string is punct $char]} {
>     return "punctuation character"
>   } elseif {[string is graph $char]} {
>     return "graphical character"
>   } elseif {[string is print $char]} {
>     return "printable character"
>   } else {
>     return "unknown character type"
>   }
> }

<snip>

Many thanks from me too for the above procs, which have made their way
(with acknowledgement) into my personal library of utility routines.

Alan
