Subject: How to remove a link in a PDF that is found in a thousand pages
From: Andrew
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Date: Fri, 24 May 2024 00:04 UTC

I have a PDF with a link in it of the form:
http://domain.com
in a million places (usually at the top, bottom or middle of a page that is
mostly empty), and all I want to do is delete it completely.

I want to delete those links, and the only PDF editor I know of that will
delete them easily is Adobe Acrobat (the writer), but it deletes them one by
one. Yuck. I'm doing that, but is there a better way?

Googling, I find that Calibre will delete them, but oh my god, is that a
complicated action, where you have to do CSS rules and crazy stuff like that.

You can't just search and replace for some godforsaken reason.

Hence I implore you for help... The PDF can easily be converted to
an epub format if there's a way other than a PDF editor to do it.

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Paul
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: A noiseless patient Spider
Date: Fri, 24 May 2024 03:00 UTC
References: 1

On 5/23/2024 8:04 PM, Andrew wrote:
> I have a PDF with a link in it of the form:
> http://domain.com
> in a million places (usually at the top, bottom or middle of a page that is
> mostly empty - where all I want to do is delete it completely.
>
> I want to delete those links, and the only PDF editor I know of that will
> delete them easily is the Adobe Acrobat (writer) but it deletes them one by
> one. Yuck. I'm doing that, but is there a better way?
>
> Googling, I find that Calibre will delete them but oh my god, is that a
> complicated action, where you have do css rules and crazy stuff like that.
>
> You can't just search and replace for some godforsaken reason.
>
> Hence I implore you for help... where the PDF can be easily converted to
> any epub format if there's another way other than a PDF editor to do it.
>

PDF files are normally "binary" in appearance, but they can be
translated to "ASCII". Notice there is a gubbin near the top (the comment
line right after %PDF-1.1) which is not ASCII, so the file still counts
as binary. For example, some scripting you might do could have an issue
with those few binary characters. (The binary bytes could be different
in a different PDF file.)

I don't know if this file has integrity or not. It's just
intended to show how simple the format could have been. (Normal files
will NOT be simple, so you can forget that right now.)

*********************** PDF in Text Mode ***********************
%PDF-1.1
%¥±ë

1 0 obj
<< /Type /Catalog
/Pages 2 0 R
>>
endobj

2 0 obj
<< /Type /Pages
/Kids [3 0 R]
/Count 1
/MediaBox [0 0 300 144]
>>
endobj

3 0 obj
<< /Type /Page
/Parent 2 0 R
/Resources
<< /Font
<< /F1
<< /Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
>>
>>
/Contents 4 0 R
>>
endobj

4 0 obj
<< /Length 55 >>
stream
BT
/F1 18 Tf
0 0 Td
(Hello World) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000018 00000 n
0000000077 00000 n
0000000178 00000 n
0000000457 00000 n
trailer
<< /Root 1 0 R
/Size 5
>>
startxref
565
%%EOF
*********************** PDF in Text Mode ***********************

If you just delete the string in question, the reader is likely to say
"this file is damaged".

The document records byte offsets (the xref table and startxref) and
stream lengths (/Length), and that's how a reader can tell the file
has been edited.
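
To make the "damaged" complaint concrete, here is a minimal Python sketch (not from the original post). It assembles a two-object toy PDF, records the byte offset of each object the way the xref table does, and then checks what a naive deletion does to those offsets. The names OBJECTS, build_pdf and offsets_valid are made up for illustration, and the toy file is only an assumption of what a minimal layout looks like, not a tested document.

```python
# Illustrative objects only; a real PDF has many more and usually compressed streams.
OBJECTS = [
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
]


def build_pdf(objects):
    """Assemble a toy PDF and return (file bytes, byte offset of each object)."""
    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for obj in objects:
        offsets.append(len(out))  # the xref table stores exactly this position
        out += obj
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is 20 bytes long
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


def offsets_valid(pdf, offsets):
    """True if every recorded offset still lands on the start of 'N 0 obj'."""
    return all(
        pdf[off:].startswith(b"%d 0 obj" % (i + 1))
        for i, off in enumerate(offsets)
    )


if __name__ == "__main__":
    pdf, offsets = build_pdf(OBJECTS)
    # Naive deletion: every byte after the removed text moves forward.
    edited = pdf.replace(b" /Type /Catalog", b"", 1)
    print("intact file offsets valid:", offsets_valid(pdf, offsets))
    print("edited file offsets valid:", offsets_valid(edited, offsets))
```

Tools that rewrite the file (such as the mutool command mentioned in this post) regenerate the offsets when they save, which is why going through one is usually safer than deleting bytes by hand.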

You can tell from this, they were just screwing with us. The
format before this, PostScript, didn't have counters. When you
found a section in PostScript that said "Do not delete this section",
you just deleted it :-) Well, when they invented PDF, they messed
with it a bit, in the bomb-squad sense.

Adobe makes a "book" available about the PDF standard, and
you could use that. But that's a learning experience.

The only command of note, in my Notes file, is this, and I have
not placed any comments to tell me what it does :-) This makes
the ASCII-like flavor of file.

mutool.exe convert -F pdf -O decompress,clean -o output.pdf input.pdf

And when we talk of "binary to ascii", there is DEFINITELY binary
still in there. The commercial fonts can be encoded somehow, and they are
still transferred as a binary blob. If not handled properly, you will break
the fonts. This puts some constraints on how you work on the file, for sure.
I could use HxD (a hex editor) for example, while keeping another tool open
to read the ASCII portion of the file more easily.

There are various ways to obscure text in the document. Even in
"ASCII mode", nothing says you will see "https://www.something.com".
You might see bunches of numbers instead. If this string of yours
is intended as a watermark, then of course the file will be augmented
for maximum annoyance. A lot of the watermarks we played with as kids,
they were not hardened. You might have concluded nobody cared to do
a good job. I can assure you that some commercial tools, definitely
take their watermark design seriously.
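
One concrete case of "bunches of numbers": PDF allows the same string to be written either literally in parentheses or as hex digits in angle brackets. A small Python sketch of searching for both spellings follows. It assumes the text is stored as plain single-byte character codes; with subset or re-encoded fonts neither form will match. The helper names pdf_string_forms and find_forms are hypothetical.

```python
def pdf_string_forms(text):
    """Return the literal and hex-string spellings of the same ASCII text."""
    raw = text.encode("ascii")
    literal = b"(" + raw + b")"  # fine here: the URL has no parens or backslashes
    hexform = b"<" + raw.hex().upper().encode("ascii") + b">"
    return literal, hexform


def find_forms(data, text):
    """Report which spellings of `text` occur in decompressed PDF bytes."""
    literal, hexform = pdf_string_forms(text)
    return {
        "literal": literal in data,
        # Hex digits may be written in either case, so compare lowercased.
        "hex": hexform.lower() in data.lower(),
    }


if __name__ == "__main__":
    literal, hexform = pdf_string_forms("http://domain.com")
    print(literal.decode("ascii"))
    print(hexform.decode("ascii"))
```

If neither spelling turns up in the decompressed file, the string is probably encoded through the font, and a plain search and replace will not find it.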

Paul

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Lawrence D'Oliveiro
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: A noiseless patient Spider
Date: Fri, 24 May 2024 03:45 UTC
References: 1

On Fri, 24 May 2024 00:04:52 -0000 (UTC), Andrew wrote:

> ... is there a better way?

Write a program using a PDF-manipulation toolkit.

I have had good results writing Python code using pikepdf
<https://github.com/pikepdf/pikepdf>.

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Kingfisher
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: The Random Precision Radio Network
Date: Fri, 24 May 2024 06:00 UTC
References: 1

On 5/23/24 17:04, Andrew wrote:
> I have a PDF with a link in it of the form:
> http://domain.com
> in a million places (usually at the top, bottom or middle of a page that is
> mostly empty - where all I want to do is delete it completely.
>
> I want to delete those links, and the only PDF editor I know of that will
> delete them easily is the Adobe Acrobat (writer) but it deletes them one by
> one. Yuck. I'm doing that, but is there a better way?
>
> Googling, I find that Calibre will delete them but oh my god, is that a
> complicated action, where you have do css rules and crazy stuff like that.
>
> You can't just search and replace for some godforsaken reason.
>
> Hence I implore you for help... where the PDF can be easily converted to
> any epub format if there's another way other than a PDF editor to do it.

LibreOffice will open a PDF (in Draw by default), let you edit it, and
export it as PDF again. It has a Find and Replace function that can get
all the links in one shot.

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Herbert Kleebauer
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: A noiseless patient Spider
Date: Fri, 24 May 2024 06:40 UTC
References: 1

On 24.05.2024 02:04, Andrew wrote:
> I have a PDF with a link in it of the form:
> http://domain.com
> in a million places (usually at the top, bottom or middle of a page that is
> mostly empty - where all I want to do is delete it completely.
>
> I want to delete those links, and the only PDF editor I know of that will
> delete them easily is the Adobe Acrobat (writer) but it deletes them one by
> one. Yuck. I'm doing that, but is there a better way?

If you have Acrobat, save the file as uncompressed PDF. If you are
lucky, you will find "http://domain.com" as simple text in the file.
Replace every occurrence with exactly the same number of blanks. But
you have to use an editor which preserves the few binary bytes at
the beginning of the file.
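
If no suitable editor is at hand, the same-length replacement can be sketched in a few lines of Python. This is only an illustration under the assumptions above (the file is already uncompressed and the URL appears as plain text); blank_out and patch_file are hypothetical names, and the file paths are whatever you pass in.

```python
def blank_out(data: bytes, needle: bytes) -> bytes:
    """Replace every occurrence of `needle` with the same number of spaces."""
    patched = data.replace(needle, b" " * len(needle))
    # Same length in, same length out, so byte offsets elsewhere stay valid.
    assert len(patched) == len(data)
    return patched


def patch_file(src, dst, needle=b"http://domain.com"):
    """Copy src to dst with `needle` blanked; return how many were found."""
    # Binary mode matters: text mode could mangle the non-ASCII bytes
    # near the top of the file.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(blank_out(data, needle))
    return data.count(needle)
```

Called as patch_file("in.pdf", "out.pdf"), it returns the number of occurrences it blanked. Note that this only empties the text; if the link is also a clickable annotation, its rectangle stays on the page pointing at a blank target.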

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Peter Johnson
Newsgroups: alt.comp.os.windows-10, comp.text.pdf, comp.editors
Organization: A noiseless patient Spider
Date: Fri, 24 May 2024 15:04 UTC
References: 1

On Fri, 24 May 2024 00:04:52 -0000 (UTC), Andrew <andrew@spam.net>
wrote:

>I have a PDF with a link in it of the form:
> http://domain.com
>in a million places (usually at the top, bottom or middle of a page that is
>mostly empty - where all I want to do is delete it completely.
>
>I want to delete those links, and the only PDF editor I know of that will
>delete them easily is the Adobe Acrobat (writer) but it deletes them one by
>one. Yuck. I'm doing that, but is there a better way?
>
>Googling, I find that Calibre will delete them but oh my god, is that a
>complicated action, where you have do css rules and crazy stuff like that.
>
>You can't just search and replace for some godforsaken reason.
>
>Hence I implore you for help... where the PDF can be easily converted to
>any epub format if there's another way other than a PDF editor to do it.

How important is the formatting?
You could extract the text into a Word (or similar) file, run
find/exchange on it and then create a new PDF. Which might or might
not change the formatting, but you could probably fix that before you
created the new PDF.

Subject: Re: How to remove a link in a PDF that is found in a thousand pages
From: Peter Flynn
Newsgroups: comp.text.pdf, comp.editors
Organization: Usenet Labs Bozon Detector Facility
Date: Sun, 26 May 2024 21:04 UTC
References: 1

On 24/05/2024 01:04, Andrew wrote:
> I have a PDF with a link in it of the form:
> http://domain.com
> in a million places (usually at the top, bottom or middle of a page
> that is mostly empty - where all I want to do is delete it
> completely.
>
> I want to delete those links, and the only PDF editor I know of that
> will delete them easily is the Adobe Acrobat (writer) but it deletes
> them one by one. Yuck. I'm doing that, but is there a better way?

I have in the past had good success by converting the document to
PostScript, finding the pattern of the offending links, and running a
stream editor, then converting back to PDF, e.g.

pdf2ps foo.pdf - | sed -e "s+http://domain\.com++g" | ps2pdf - foo2.pdf

Peter
