Subject  Author
* OCR on Windows  Bill Powell
+* Re: OCR on Windows  micky
|+- Re: OCR on Windows  Bill Powell
|`* Re: OCR on Windows  Peter Flynn
| `- Re: OCR on Windows  Paul
+* Re: OCR on Windows  Newyana2
|+* Re: OCR on Windows  micky
||`* Re: OCR on Windows  Jeff Barnett
|| `* Re: OCR on Windows  Newyana2
||  `* Re: OCR on Windows  micky
||   `- Re: OCR on Windows  Wolf Greenblatt
|`* Re: OCR on Windows  Paul in Houston TX
| `* Re: OCR on Windows  Nick Cine
|  `- Re: OCR on Windows  Bill Powell
+* Re: OCR on Windows  cable shill
|`* Re: OCR on Windows  Stan Brown
| `* Re: OCR on Windows  Jan K.
|  `- Re: OCR on Windows  Big Al
+* Re: OCR on Windows  Stan Brown
|+* Re: OCR on Windows  Newyana2
||`- Re: OCR on Windows  david
|`* Re: OCR on Windows  Enrico Papaloma
| `* Re: OCR on Windows  Joerg Walther
|  `- Re: OCR on Windows  Stan Brown
+* Re: OCR on Windows  Herbert Kleebauer
|+* Re: OCR on Windows  knuttle
||+- Re: OCR on Windows  Isaac Montara
||+- Re: OCR on Windows  Stan Brown
||`* Re: Irfanview on Windows  wasbit
|| `* Re: Irfanview on Windows  Steve Hayes
||  `- Re: Irfanview on Windows  Andrew
|`* Re: OCR on Windows  Stan Brown
| +* Re: OCR on Windows  Jørgen Nielsen
| |`- Re: OCR on Windows  Stan Brown
| `* Re: OCR on Windows  Herbert Kleebauer
|  +* Re: OCR on Windows  Paul
|  |`- Re: OCR on Windows  Herbert Kleebauer
|  +* Re: OCR on Windows  Stan Brown
|  |`- Re: OCR on Windows  Paul
|  `- Re: OCR on Windows  wasbit
+- Re: OCR on Windows  Jim the Geordie
`- Re: OCR on Windows  Mr. Man-wai Chang

Subject: OCR on Windows
From: Bill Powell
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Hispagatos.org
Date: Sun, 14 Jul 2024 00:46 UTC
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.hispagatos.org!.POSTED!not-for-mail
From: bill@anarchists.org (Bill Powell)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: OCR on Windows
Date: Sun, 14 Jul 2024 02:46:04 +0200
Organization: Hispagatos.org
Message-ID: <v6v74c$80bq$1@matrix.hispagatos.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 00:46:05 -0000 (UTC)
Injection-Info: matrix.hispagatos.org;
logging-data="262522"; mail-complaints-to="abuse@hispagatos.org"
User-Agent: XanaNews/1.19.1.372 (x86; Portable ISpell)

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Subject: Re: OCR on Windows
From: micky
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Tweaknews
Date: Sun, 14 Jul 2024 01:57 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!border-1.nntp.ord.giganews.com!border-3.nntp.ord.giganews.com!nntp.giganews.com!news-out.netnews.com!s1-1.netnews.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!feeder.cambriumusenet.nl!feed.tweaknews.nl!posting.tweaknews.nl!fx12.ams1.POSTED!not-for-mail
From: NONONOmisc07@fmguy.com (micky)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Message-ID: <k1c69jdmh1hj59pc5nbdaiefs8aak31u84@4ax.com>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
X-Newsreader: Forte Agent 5.00/32.1171
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 240713-4, 7/13/2024), Outbound message
X-Antivirus-Status: Clean
Lines: 11
X-Complaints-To: abuse@tweaknews.nl
NNTP-Posting-Date: Sun, 14 Jul 2024 01:59:41 UTC
Organization: Tweaknews
Date: Sat, 13 Jul 2024 21:57:19 -0400
X-Received-Bytes: 1330
X-Original-Bytes: 1143

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
Powell <bill@anarchists.org> wrote:

>I have a series of one-page PDFs that are really images and not text even
>though they look like they're just a page of simple text in the same font.
>
>Is there a way to easily OCR a PDF to actual text on Windows for free?

Aren't there lots of websites that do this, but you have to upload the
file? I've resisted that but would be really happy if I could do it
inside my computer.

Subject: Re: OCR on Windows
From: Newyana2
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 02:22 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: newyana@invalid.nospam (Newyana2)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sat, 13 Jul 2024 22:22:11 -0400
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <v6vco6$3v9nu$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 04:21:58 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a0598b6a1b4577e9a5ee2d9fd1cc1424";
logging-data="4171518"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/sRz7IcYJ5Zc+Fv25rZbAC7k3CDYG7wQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.3.1
Cancel-Lock: sha1:G03vuQUhy1VTRWQa1f+Sg+LoKk4=
Content-Language: en-US
In-Reply-To: <v6v74c$80bq$1@matrix.hispagatos.org>

On 7/13/2024 8:46 PM, Bill Powell wrote:
> I have a series of one-page PDFs that are really images and not text even
> though they look like they're just a page of simple text in the same font.
>
> Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc, but generally
it comes out well.

Subject: Re: OCR on Windows
From: micky
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Tweaknews
Date: Sun, 14 Jul 2024 02:52 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!feeder.cambriumusenet.nl!feed.tweaknews.nl!posting.tweaknews.nl!fx12.ams1.POSTED!not-for-mail
From: NONONOmisc07@fmguy.com (micky)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Message-ID: <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vco6$3v9nu$1@dont-email.me>
X-Newsreader: Forte Agent 5.00/32.1171
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 240713-4, 7/13/2024), Outbound message
X-Antivirus-Status: Clean
Lines: 16
X-Complaints-To: abuse@tweaknews.nl
NNTP-Posting-Date: Sun, 14 Jul 2024 02:55:14 UTC
Organization: Tweaknews
Date: Sat, 13 Jul 2024 22:52:52 -0400
X-Received-Bytes: 1535

In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2
<newyana@invalid.nospam> wrote:

>On 7/13/2024 8:46 PM, Bill Powell wrote:
>> I have a series of one-page PDFs that are really images and not text even
>> though they look like they're just a page of simple text in the same font.
>>
>> Is there a way to easily OCR a PDF to actual text on Windows for free?
>
> I have a program called FreeOCR that will do it without having to scan
>or extract the pages. Quality depends on fonts, words, etc, but generally
>it comes out well.

http://www.freeocr.net/
http://www.paperfile.net/
https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR

Subject: Re: OCR on Windows
From: Bill Powell
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Hispagatos.org
Date: Sun, 14 Jul 2024 03:02 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.hispagatos.org!.POSTED!not-for-mail
From: bill@anarchists.org (Bill Powell)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 05:02:26 +0200
Organization: Hispagatos.org
Message-ID: <v6vf42$87lj$1@matrix.hispagatos.org>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <k1c69jdmh1hj59pc5nbdaiefs8aak31u84@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 03:02:27 -0000 (UTC)
Injection-Info: matrix.hispagatos.org;
logging-data="270003"; mail-complaints-to="abuse@hispagatos.org"
User-Agent: XanaNews/1.19.1.372 (x86; Portable ISpell)

On Sat, 13 Jul 2024 21:57:19 -0400, micky wrote:

>>Is there a way to easily OCR a PDF to actual text on Windows for free?
>
> Aren't there lots of websites that do this, but you have to upload the
> file? I've resisted that but would be really happy if I could do it
> inside my computer.

These are scanned medical records.
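
For records like these, one way to keep everything on the local machine is to
rasterize each PDF and run a local OCR engine on the resulting image. The sketch
below is only an illustration under stated assumptions: it assumes Poppler's
pdftoppm and the Tesseract command-line program are installed and on PATH, and
the function names plan_commands and ocr_folder and the folder names are made up
for the example. Nobody in the thread has tested it.

```python
from pathlib import Path
import shutil
import subprocess


def plan_commands(pdf, out_dir, dpi=300, lang="eng"):
    """Return the two commands that turn one single-page PDF into a .txt file.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    base = Path(out_dir) / Path(pdf).stem
    # pdftoppm -singlefile renders the first page to <base>.png with no page suffix.
    rasterize = ["pdftoppm", "-r", str(dpi), "-png", "-singlefile",
                 str(pdf), str(base)]
    # tesseract <image> <output base> writes the recognized text to <base>.txt.
    ocr = ["tesseract", str(base) + ".png", str(base), "-l", lang]
    return [rasterize, ocr]


def ocr_folder(src, out_dir):
    """Run both steps for every PDF in src; nothing leaves the machine."""
    if not (shutil.which("pdftoppm") and shutil.which("tesseract")):
        raise RuntimeError("pdftoppm and tesseract must both be on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src).glob("*.pdf")):
        for cmd in plan_commands(pdf, out_dir):
            subprocess.run(cmd, check=True)
```

Calling ocr_folder("scans", "text") would then leave one .png and one .txt per
PDF in the "text" folder. The 300 dpi default is a common starting point for
scanned text, not a tuned value, and the output still needs proofreading.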

Subject: Re: OCR on Windows
From: cable_shill@comcast.net
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Committee to Wire The Planet
Date: Sun, 14 Jul 2024 04:06 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!border-1.nntp.ord.giganews.com!border-3.nntp.ord.giganews.com!nntp.giganews.com!news-out.netnews.com!s1-1.netnews.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
From: cable_shill@comcast.net
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Organization: Committee to Wire The Planet
Message-ID: <jlj69j558op4ftd36g1fjj8b1507f0av38@4ax.com>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
X-Newsreader: Forte Agent 4.2/32.1118
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 10
X-Complaints-To: abuse@easynews.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Sat, 13 Jul 2024 21:06:54 -0700
X-Received-Bytes: 1046
X-Original-Bytes: 914

Windows PowerToys - Text Extractor.

On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
wrote:

>I have a series of one-page PDFs that are really images and not text even
>though they look like they're just a page of simple text in the same font.
>
>Is there a way to easily OCR a PDF to actual text on Windows for free?

Subject: Re: OCR on Windows
From: Paul in Houston TX
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 04:23 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Paul@Houston.Texas (Paul in Houston TX)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sat, 13 Jul 2024 23:23:55 -0500
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <v6vk3o$blt$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<v6vco6$3v9nu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 06:27:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="89bc86740fc0b582edd8c76dee995efd";
logging-data="11965"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+K8jVl9p/kEl+4fBhPAO95"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101
Firefox/127.0 SeaMonkey/2.53.8
Cancel-Lock: sha1:5PQ4FvjuY76ctVK9gES9o69t/FA=
In-Reply-To: <v6vco6$3v9nu$1@dont-email.me>

Newyana2 wrote:
> On 7/13/2024 8:46 PM, Bill Powell wrote:
>> I have a series of one-page PDFs that are really images and not text even
>> though they look like they're just a page of simple text in the same
>> font.
>>
>> Is there a way to easily OCR a PDF to actual text on Windows for free?
>
>   I have a program called FreeOCR that will do it without having to scan
> or extract the pages. Quality depends on fonts, words, etc, but generally
> it comes out well.

+1

Subject: Re: OCR on Windows
From: Stan Brown
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Oak Road Systems
Date: Sun, 14 Jul 2024 05:45 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: the_stan_brown@fastmail.fm (Stan Brown)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sat, 13 Jul 2024 22:45:38 -0700
Organization: Oak Road Systems
Lines: 33
Message-ID: <MPG.40fd05da559d2e4b99030b@news.individual.net>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 6F3Vrb/6KSTtSbMmqXflRwrgZt0NzUq3YAsYkNnS6wYSJjHR5a
Cancel-Lock: sha1:i4kMce+6RvRpYWR0Hwlhq7tnlR4= sha256:nQJ/VgxGbfGCOIy25N5RCH8rDRqv8sJC6NGFYlNJWxU=
User-Agent: MicroPlanet-Gravity/3.0.11 (GRC)

On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell wrote:
>
> I have a series of one-page PDFs that are really images and not text even
> though they look like they're just a page of simple text in the same font.
>
> Is there a way to easily OCR a PDF to actual text on Windows for free?

OPTION A (if you have OneNote, which is part of MS Office):

1. Paste the image into OneNote.
2. Right-click the pasted image and select "Copy text from
picture".
3. In your favorite text editor, press Ctrl+V to paste the text.
4. Proofread and make any needed corrections.

I have Office 2010, not Office 365, but I believe OneNote is included
in Office 365.

OPTION B:

Which PDF reader are you using? PDF-XChange (free) has a menu
selection to perform OCR, putting the text as an extra layer in the
PDF. You can then copy the text from the PDF and paste it into your
editor.

And I'm sure there are other free PDF viewers that have OCR
capability, though PDF-XChange is the only one I use.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

Subject: Re: OCR on Windows
From: Stan Brown
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Oak Road Systems
Date: Sun, 14 Jul 2024 05:58 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: the_stan_brown@fastmail.fm (Stan Brown)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sat, 13 Jul 2024 22:58:17 -0700
Organization: Oak Road Systems
Lines: 24
Message-ID: <MPG.40fd08d652bb667199030c@news.individual.net>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <jlj69j558op4ftd36g1fjj8b1507f0av38@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net N+S2JrKH70vIT+wZWiIZAQSthFv9jGT1CqW8J+j7ISnXiUs1sL
Cancel-Lock: sha1:ya0VReq9tiBV52N18NUCDrtOB3E= sha256:LeKqcQitiH+Ql3D9w1tJOEmcSudUwhN74Bxjb6XuG9M=
User-Agent: MicroPlanet-Gravity/3.0.11 (GRC)

On Sat, 13 Jul 2024 21:06:54 -0700, cable_shill@comcast.net wrote:

> On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
> wrote:
>
> >I have a series of one-page PDFs that are really images and not text even
> >though they look like they're just a page of simple text in the same font.
> >
> >Is there a way to easily OCR a PDF to actual text on Windows for free?
>
> Windows PowerToys - Text Extractor.

You forgot to give the URL:
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

That one says it's "based on Joe Finney's TextGrab", and links to
https://github.com/TheJoeFin/Text-Grab

Has anyone tried both, and can speak to whether one does a better job
of text extraction than the other?

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

Subject: Re: OCR on Windows
From: Jeff Barnett
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 06:35 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jbb@notatt.com (Jeff Barnett)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 00:35:44 -0600
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <v6vrk6$1clv$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<v6vco6$3v9nu$1@dont-email.me> <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 08:35:50 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="42e0fd07f0799fba91e29e856728bfcc";
logging-data="45759"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/qIY2XDB/knZPjmoStRiBUt2+EfYqDwYc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:/WVJ3Upw2SbIFf851iqhmOOhblo=
In-Reply-To: <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com>
X-Antivirus: AVG (VPS 240713-4, 7/13/2024), Outbound message
Content-Language: en-US
X-Antivirus-Status: Clean

On 7/13/2024 8:52 PM, micky wrote:
> In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2
> <newyana@invalid.nospam> wrote:
>
>> On 7/13/2024 8:46 PM, Bill Powell wrote:
>>> I have a series of one-page PDFs that are really images and not text even
>>> though they look like they're just a page of simple text in the same font.
>>>
>>> Is there a way to easily OCR a PDF to actual text on Windows for free?
>>
>> I have a program called FreeOCR that will do it without having to scan
>> or extract the pages. Quality depends on fonts, words, etc, but generally
>> it comes out well.
>
> http://www.freeocr.net/

Several pointers embedded at the URL above elicit "blacklisted site"
messages from AVG.

> http://www.paperfile.net/
> https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR
--
Jeff Barnett

Subject: Re: OCR on Windows
From: Herbert Kleebauer
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 07:25 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: klee@unibwm.de (Herbert Kleebauer)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 09:25:09 +0200
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <v6vugl$1lsq$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 09:25:10 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="cc6708673bed4727d3e78fa3d490b4e4";
logging-data="55194"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ThHVOu24o6EmLxqZYBuGMlxlwoTN+CpI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:W+b7qRMUm35z9IWFgyoaOzHUIkU=
In-Reply-To: <v6v74c$80bq$1@matrix.hispagatos.org>
Content-Language: en-US

On 14.07.2024 02:46, Bill Powell wrote:

> I have a series of one-page PDFs that are really images and not text even
> though they look like they're just a page of simple text in the same font.
>
> Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

Or you can use Firefox to display the PDF and use an OCR
plug-in.

Subject: Re: OCR on Windows
From: knuttle
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 10:54 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: keith_nuttle@yahoo.com (knuttle)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 06:54:16 -0400
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <v70aoo$3pl7$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<v6vugl$1lsq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 12:54:17 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="fd7728d3cbf33ad0b5c2aa13e4e8aa83";
logging-data="124583"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19uiSXrCwuNJhmZpIFsZcko"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:KSSBHGmO0J03MIuUhZlHrFZLlvA=
In-Reply-To: <v6vugl$1lsq$1@dont-email.me>
Content-Language: en-US

On 07/14/2024 3:25 AM, Herbert Kleebauer wrote:
> On 14.07.2024 02:46, Bill Powell wrote:
>
>> I have a series of one-page PDFs that are really images and not text even
>> though they look like they're just a page of simple text in the same
>> font.
>>
>> Is there a way to easily OCR a PDF to actual text on Windows for free?
>
> For only a few lines of text you can use the Snipping Tool: press
> <WIN><SHIFT>S and select the part of the screen with the text.
> When the Snipping Tool opens, select the OCR function.
>
> Or you can use Firefox to display the PDF and use an OCR
> plug-in.
>
I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.

Subject: Re: OCR on Windows
From: Newyana2
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 12:45 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: newyana@invalid.nospam (Newyana2)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 08:45:02 -0400
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <v70h7v$5566$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<v6vco6$3v9nu$1@dont-email.me> <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com>
<v6vrk6$1clv$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 14:44:48 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a0598b6a1b4577e9a5ee2d9fd1cc1424";
logging-data="169158"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/x7228Uo/aiksPMcUuulPdh56ibfu3oEY="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.3.1
Cancel-Lock: sha1:akC638F/D2Nk2I6n7bHo+QJgzqM=
In-Reply-To: <v6vrk6$1clv$1@dont-email.me>
Content-Language: en-US

On 7/14/2024 2:35 AM, Jeff Barnett wrote:

> Several pointers embedded at the URL above elicit "blacklisted site"
> messages from AVG.

I should have posted the URL. freeocr.net is just a listing site.
paperfile.net is the host of FreeOCR.

I researched this a while back. I'd been using something that I'd got
from a magazine CD in the late 90s and it actually worked pretty well:
TextBridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
served me well.)

But I decided to look around for something more up-to-date because
I sometimes want to convert things like photo-PDFs to plain text.

FreeOCR seems to be simple, quick and no-nonsense. It saves the step
of having to extract images from PDFs. The only downside is that it
came out in early Win10 days and it has a kiddie interface with a silly
fading window at close, with no option to change that.
However... it might be Fisher-Price, but it works. :)

There's an explanation at the site. If I remember correctly, the system
it uses is OSS and while there are newer versions, I didn't find anything
else that was all put together. What I mean is that you can find more recent
updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
but it's OSS that's hard to find as finished software.

The program seems to be a fairly simple .Net wrapper around a compiled
EXE version of Tesseract, but it's well designed, making Tesseract usable
and convenient.
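
To make the "wrapper around a compiled EXE" idea concrete, here is a minimal
sketch in Python of what such a wrapper boils down to. It is not FreeOCR's
code (that program is described above as .Net), and the function names
tesseract_args and image_to_text are invented for this example. It assumes a
Tesseract executable is installed and on PATH, and that the input is already
an image file such as a PNG.

```python
import shutil
import subprocess


def tesseract_args(image_path, lang="eng"):
    """Build the command line for one OCR run.

    Passing "stdout" as the output base asks Tesseract to print the
    recognized text instead of writing <base>.txt.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def image_to_text(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_args(image_path, lang),
        capture_output=True,  # collect the printed text rather than showing it
        text=True,            # decode the output to str
        check=True,           # raise if Tesseract exits with an error
    )
    return result.stdout
```

A GUI program adds file pickers, PDF page extraction and an editable text pane
on top, but the OCR step itself would be roughly a call like
image_to_text("page.png").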

Subject: Re: OCR on Windows
From: Newyana2
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 13:04 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: newyana@invalid.nospam (Newyana2)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 09:04:33 -0400
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <v70icj$5c3b$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<MPG.40fd05da559d2e4b99030b@news.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 15:04:19 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a0598b6a1b4577e9a5ee2d9fd1cc1424";
logging-data="176235"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18J42+6vBX0LuhZVDXl41LMxXHJIL4cbCo="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.3.1
Cancel-Lock: sha1:uvYnperDVfTadTsktDVoKjY8rJ0=
Content-Language: en-US
In-Reply-To: <MPG.40fd05da559d2e4b99030b@news.individual.net>

On 7/14/2024 1:45 AM, Stan Brown wrote:

> And I'm sure there are other free PDF viewers that have OCR
> capability, though PDF-XChange is the only one I use.
>

I also use PDFXV free and love it. I had to get a new version
for Win10. Build 322.10. Lucky it was still available free. My older
version on XP didn't work right on 10.

PDFXV is quick, does search well, allows me to edit PDFs by
extracting pages as images and pasting them in that way...
I've done my taxes that way -- both fillable forms and non-fillable.
And the whole thing is about 25 MB.

I think Adobe's monstrosity
Reader is something like 300+ MB these days. I went to take a
look, but their version has become even more creepy than before.
First, Adobe wouldn't load a webpage without script, which I didn't
want to enable. Then I found through Major Geeks that the current
version is ad-supported. So I'm guessing they want people to sign
up so they can target the ads... Just when I thought Adobe couldn't
get any more creepy.

I'd never noticed the OCR function in PDFXV. It's not very intuitive,
but it seems to work. I finally figured out that I needed to pick the
selection tool, select all, then copy, to get the converted text.

Subject: Re: OCR on Windows
From: micky
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Tweaknews
Date: Sun, 14 Jul 2024 14:09 UTC
References: 1 2 3 4 5
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!feeder.cambriumusenet.nl!feed.tweaknews.nl!posting.tweaknews.nl!fx14.ams1.POSTED!not-for-mail
From: NONONOmisc07@fmguy.com (micky)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Message-ID: <5sm79jhl6kr5urcqkrapkfmla2a5mfo659@4ax.com>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vco6$3v9nu$1@dont-email.me> <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com> <v6vrk6$1clv$1@dont-email.me> <v70h7v$5566$1@dont-email.me>
X-Newsreader: Forte Agent 5.00/32.1171
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 240714-4, 7/14/2024), Outbound message
X-Antivirus-Status: Clean
Lines: 40
X-Complaints-To: abuse@tweaknews.nl
NNTP-Posting-Date: Sun, 14 Jul 2024 14:11:49 UTC
Organization: Tweaknews
Date: Sun, 14 Jul 2024 10:09:26 -0400
X-Received-Bytes: 2823

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 08:45:02 -0400, Newyana2
<newyana@invalid.nospam> wrote:

>On 7/14/2024 2:35 AM, Jeff Barnett wrote:
>
>> Several pointers embedded at the URL above elicit "blacklisted site"
>> messages from AVG.
>
> I should have posted the URL. freeocr.net is just a listing site.
>paperfile.net is the host of FreeOCR.

And it doesn't mention Win10 or 11. I assume you've been using it
with one of those two.

I thought of just installing it to see if it works, but who knows, maybe
installing old, no-longer-compatible software could mess up my OS?

> I researched this a while back. I'd been using something that I'd got
>from a magazine CD in the late 90s and it actually worked pretty well:
>TextBridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
>served me well.)
>
> But I decided to look around for something more up-to-date because
>I sometimes want to convert things like photo-PDFs to plain text.
>
> FreeOCR seems to be simple, quick and no-nonsense. It saves the step
>of having to extract images from PDFs. The only downside is that it
>came out in early Win10 days and it has a kiddie interface with a silly
>fading window at close, with no option to change that.
>However... it might be Fisher-Price, but it works. :)
>
> There's an explanation at the site. If I remember correctly, the system
>it uses is OSS and while there are newer versions, I didn't find anything
>else that was all put together. What I mean is that you can find more recent
>updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
>but it's OSS that's hard to find as finished software.
>
> The program seems to be a fairly simple .Net wrapper around a compiled
>EXE version of Tesseract, but it's well designed, making Tesseract usable
>and convenient.

Subject: Re: OCR on Windows
From: Enrico Papaloma
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Gegeweb News Server
Date: Sun, 14 Jul 2024 19:57 UTC
References: 1 2
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.gegeweb.eu!gegeweb.org!.POSTED.public-nat-06.vpngate.v4.open.ad.jp!not-for-mail
From: enrico@papaloma.net (Enrico Papaloma)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 21:57:02 +0200
Organization: Gegeweb News Server
Message-ID: <v71aie$2j22$1@news.gegeweb.eu>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <MPG.40fd05da559d2e4b99030b@news.individual.net>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 19:57:03 -0000 (UTC)
Injection-Info: news.gegeweb.eu; posting-account="adibella@usenet.local"; posting-host="public-nat-06.vpngate.v4.open.ad.jp:219.100.37.238";
logging-data="85058"; mail-complaints-to="abuse@gegeweb.eu"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.6.1
Content-Type: text/plain; charset=UTF-8; format=flowed
Cancel-Lock: sha256:qoSEl2QXXApXrv6Pv7QHiNNo19PAwJK5T0n6woI3SeM=
Content-Language: en-US

On 7/14/2024 7:45 AM, Stan Brown wrote:
> And I'm sure there are other free PDF viewers that have OCR
> capability, though PDF-Xchange is the only one I use.

Which of these three files is the one with the OCR?
https://pdf-xchange.eu/DL/pdf-xchange-editor.htm

Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

It says "ohne OCR". What does "ohne" mean anyway?
Also, it says it puts a watermark in all files - does it do that for OCR?

Subject: Re: OCR on Windows
From: david
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: i2pn2 (i2pn.org)
Date: Sun, 14 Jul 2024 20:01 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: this@is.invalid (david)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 14:01:01 -0600
Organization: i2pn2 (i2pn.org)
Message-ID: <bf7b7ec8d39404aa0972f434ffecb03459b91047@i2pn2.org>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <MPG.40fd05da559d2e4b99030b@news.individual.net> <v70icj$5c3b$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 20:01:02 -0000 (UTC)
Injection-Info: i2pn2.org;
logging-data="3290117"; mail-complaints-to="usenet@i2pn2.org";
posting-account="CaHBDtkhV1D5Bt+NHXWn2/AL80wOBYc5Yj9RDiDOZCs";
User-Agent: Unison/2.1.10
X-Spam-Checker-Version: SpamAssassin 4.0.0

Using <news:v70icj$5c3b$1@dont-email.me>, Newyana2 wrote:

> I also use PDFXV free and love it. I had to get a new version
> for Win10. Build 322.10. Lucky it was still available free. My older
> version on XP didn't work right on 10.

I can't find any download for PDFXV.
https://www.google.com/search?q=windows+%2Bpdfxv+download

Subject: Re: OCR on Windows
From: Isaac Montara
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 20:11 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: IsaacMontara@nospam.com (Isaac Montara)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 16:11:53 -0400
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <v71be9$9sig$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vugl$1lsq$1@dont-email.me> <v70aoo$3pl7$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 22:11:55 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f7a2df8ef12f7f66726689b228b9f0e8";
logging-data="324176"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18p/lU/RAuQa64sPBZcUVal"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:102.0) Gecko/20100101 Thunderbird/102.10.0
Cancel-Lock: sha1:xh/0zMPuemm8NWVzW/ncp+Umz4A=
Content-Language: en-US

On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

> I use Irfanview for all my image and OCR projects.
>
> You need Irfanview and the OCR plugin.
>
> Open the PDF file in Irfanview, highlight the text and activate the
> OCR function.

Nice! Once you figure it out, Irfanview with the plugin is great!

I opened a scanned-page bitmap PDF image in Irfanview.
Irfanview:File > Open > scan.jpg
Irfanview:Options > Start OCR...(Plugin)
This opened up the page of bitmap text in yellow highlight at the left.
At the right of the full-size display was a bunch of buttons.
None of them was a copy command.

The plugin appears to be a KADMOS Recognition Engine, version 4.4y but all
I want is a way to copy the highlighted text inside the bitmap image.

The text is yellow. But you can't copy it to your clipboard. Or save it.

It took a good couple of minutes of futzing around before I realized that
you have to use your left mouse button as if you're going to crop
something and drag a box from the top left of the text to the bottom right.

The instant you "crop" out that text, you get a "KADMOS recognition
results" window popping up, with the OCR results in now-selectable text.

The results looked accurate in the one test I just ran.

Subject: Re: OCR on Windows
From: Nick Cine
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Date: Sun, 14 Jul 2024 20:26 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!weretis.net!feeder8.news.weretis.net!reader5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: nickcine@is.invalid (Nick Cine)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 14:26:47 -0600
Message-ID: <v71ca7$jqp2$1@solani.org>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vco6$3v9nu$1@dont-email.me> <v6vk3o$blt$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 20:26:48 -0000 (UTC)
Injection-Info: solani.org;
logging-data="650018"; mail-complaints-to="abuse@news.solani.org"
User-Agent: Usenapp/0.93/l for MacOS - Full License
Cancel-Lock: sha1:yHG+v7DTqAN5WvpAEHDOloTsUJc=
X-User-ID: eJwFwQkBwDAIA0BL5QkUOQwa/xJ2BwuJSQ+Eg+Bk0Ev6M7BXqznQ8oSYrB7LutPO7fPy3ZEfHocRNA==

On Sat, 13 Jul 2024 23:23:55 -0500, Paul in Houston TX wrote:

>> I have a program called FreeOCR that will do it without having to scan
>> or extract the pages. Quality depends on fonts, words, etc, but generally
>> it comes out well.
>
> +1

There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
https://jocr.sourceforge.net/
There's no mention it uses the modern Tesseract scan engine though.
Which may be why it makes so many errors that it's not really useful.

What you want is to invoke the Tesseract scan engine directly somehow.

There is a way to invoke the Tesseract scan engine directly, but I don't
know how to do it. Much like most of the YouTube downloading GUIs run the
yt-dlp command-line tool under the covers, most of the OCR tools run the
Tesseract command line under the hood.

The question then would be how to run the Tesseract OCR engine directly?
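
One way to answer that, as a minimal sketch rather than a definitive recipe: Tesseract ships a command-line program whose basic form is "tesseract IMAGE OUTPUTBASE", and giving "stdout" as the output base prints the recognized text instead of writing a file. The Python below only wraps that call. It assumes Tesseract is installed with its "tesseract" executable on PATH; the function names and the file name page.png are made up for illustration. Tesseract reads image files, not PDFs, so a scanned PDF needs its page images extracted first.

```python
import shutil
import subprocess


def tesseract_command(image, lang="eng", psm=None):
    """Build the argument list for: tesseract IMAGE stdout -l LANG [--psm N]."""
    cmd = ["tesseract", str(image), "stdout", "-l", lang]
    if psm is not None:
        # --psm selects Tesseract's page segmentation mode.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_image(image, lang="eng", psm=None):
    """Run Tesseract on one image file and return the recognized text."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image, lang, psm),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

With a page image saved as page.png (hypothetical name), calling ocr_image("page.png") would return the text as a string. Passing psm=6 asks Tesseract to treat the page as a single uniform block of text, which may suit a plain page in one font.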

Subject: Re: OCR on Windows
From: Bill Powell
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Hispagatos.org
Date: Sun, 14 Jul 2024 20:37 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.hispagatos.org!.POSTED!not-for-mail
From: bill@anarchists.org (Bill Powell)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 22:37:25 +0200
Organization: Hispagatos.org
Message-ID: <v71cu5$9i71$1@matrix.hispagatos.org>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vco6$3v9nu$1@dont-email.me> <v6vk3o$blt$1@dont-email.me> <v71ca7$jqp2$1@solani.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Jul 2024 20:37:26 -0000 (UTC)
Injection-Info: matrix.hispagatos.org;
logging-data="313569"; mail-complaints-to="abuse@hispagatos.org"
User-Agent: XanaNews/1.19.1.372 (x86; Portable ISpell)

On Sun, 14 Jul 2024 14:26:47 -0600, Nick Cine wrote:

> There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
> https://jocr.sourceforge.net/
> There's no mention it uses the modern Tesseract scan engine though.

I had tried the GNU OCR command line before opening the thread.
http://www-e.uni-magdeburg.de/jschulen/ocr/gocr049.exe
Name: gocr049.exe
Size: 153600 bytes (150 KiB)
SHA256: 1FFC4CD29A5B275F40FBC5F6F9194ED72B8D2BCCBD46019F088C9E5DE2923F59

It makes so many spelling errors that it would be easier to type the text
out by hand - which is why I opened this thread to find an OCR that worked.

Looking up the hints you gave me, I think there are many potential Linux,
Mac, Windows, Android & iOS OCR scanning candidates in this github table.
https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html

What is a bit disconcerting is that none of the tools mentioned so far
in this thread show up in that table, even though the table lists
dozens of tools that do OCR. I'm not sure why none of the mentioned
tools are there.

Subject: Re: OCR on Windows
From: Jan K.
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: news.chmurka.net
Date: Sun, 14 Jul 2024 20:44 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.chmurka.net!.POSTED.public-nat-08.vpngate.v4.open.ad.jp!not-for-mail
From: janicekoziol@nie.ma.spamu.prosze.com (Jan K.)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 22:44:51 +0200
Organization: news.chmurka.net
Message-ID: <v71dc3$ili$1@news.chmurka.net>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <jlj69j558op4ftd36g1fjj8b1507f0av38@4ax.com> <MPG.40fd08d652bb667199030c@news.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 20:44:52 -0000 (UTC)
Injection-Info: news.chmurka.net; posting-account="koziolja"; posting-host="public-nat-08.vpngate.v4.open.ad.jp:219.100.37.240";
logging-data="19122"; mail-complaints-to="abuse-news.(at).chmurka.net"
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
Cancel-Lock: sha1:4dO7FGcS5wRlgiDnEcHOCHmsawI= sha256:9PNST+LGkHZLz26SNCGusE9PyArtMwijuHzxX3aEb4M=
sha1:hshMON//O7pssITpe1A96yh42Vw= sha256:uk6xCypod10glNNHA2sSBx/9RHbvKuX8kVgzR/gS2v8=

On Sat, 13 Jul 2024 22:58:17 -0700, Stan Brown wrote:

>> Windows Power Toys - Text extractor.
>
> You forgot to give the URL:
> https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
>
> That one says it's "based on Joe Finney's TextGrab", and links to
> https://github.com/TheJoeFin/Text-Grab
>
> Has anyone tried both, and can speak to whether one does a better job
> of text extraction than the other?

I've tried something similar to Microsoft Office for OCR on Windows.
What I tried was a MS Office clone called WPS Office, which I found here.
https://www.wps.com/office/pdf/

The company appears to be "Kingsoft" and their web stub installer is here.
https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/onlinesetup/distsrc/600.1022/wpsinst/wps_office_inst.exe

Name: wps_lid.lid-u8MZl7zT7a0C.exe
Size: 5864848 bytes (5727 KiB)
SHA256: 81E09F93F6B1C7F9488D912CFD82560D978262CB75ECF7B7953403A8A706259B

Since that looks scary, I ran it through VirusTotal, which reported it clean.
https://www.virustotal.com/gui/file/81e09f93f6b1c7f9488d912cfd82560d978262cb75ecf7b7953403a8a706259b
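
For anyone who wants to repeat that check on their own download, here is a rough sketch using only Python's standard hashlib module. It computes a file's SHA-256, compares it with a posted value, and builds a lookup link in the same form as the VirusTotal URL above. The function names are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the lowercase hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large installers need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_hash(path, posted_hex):
    """Compare a file's SHA-256 with a posted hash, ignoring letter case."""
    return sha256_of_file(path) == posted_hex.strip().lower()


def virustotal_url(sha256_hex):
    """Build a lookup URL in the same form as the link posted above."""
    return "https://www.virustotal.com/gui/file/" + sha256_hex.strip().lower()
```

A matching hash only shows that the download is the same file that was posted about. It says nothing by itself about whether that file is safe.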

You have to be careful as it will change your PDF defaults.
Select "Custom Settings" (not "Install Now").
Change from:
[x] Use WPS Office to open pdf files by default
[x] Use WPS Office as the default program for documents
[x] Use WPS Photos to open JPG, PNG, and other image formats by default

Change to:
[_] Use WPS Office to open pdf files by default
[_] Use WPS Office as the default program for documents
[_] Use WPS Photos to open JPG, PNG, and other image formats by default

Then hit the big blue "Install Now" button.
It will say "Downloading WPS Office" so you know it was just a stub.

It will create a wps_download directory containing:
Name: 132ca6c802422ed94a59d10cbcc9f47b-15_setup_XA_mui_Free.exe.600.1022.exe
Size: 244193632 bytes (232 MiB)
SHA256: B6B462DCDA4578D716E207D9747D391597110EC8F4A22C9AC29417E68A86A525

After taking forever downloading & installing WPS Office,
WPS Office will try to trick you into installing "360 Total Security".
Do not select the box [_]Yes, I agree to install 360 Total Security...
Click the big blue box "Get Started with WPS".

Start WPS Office and click away the sell-up advertising.
Tools > PDF OCR > Select File > filename.pdf > Perform OCR > Sign in

You have to sign in to an account in order to run OCR on a PDF with WPS.
I guess in the end it's maybe an online converter - but it's hard to tell.
I didn't create an account so I never was able to find out how it works.

All I know is it's a Microsoft Office clone that says it does OCR for free.

Subject: Re: OCR on Windows
From: Wolf Greenblatt
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Private News Server
Date: Sun, 14 Jul 2024 20:50 UTC
References: 1 2 3 4 5 6
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.hispagatos.org!news.samoylyk.net!.POSTED.public-nat-14.vpngate.v4.open.ad.jp!not-for-mail
From: wolf@greenblatt.net (Wolf Greenblatt)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 16:50:37 -0400
Organization: Private News Server
Message-ID: <v71dmt$2l17d$1@news.samoylyk.net>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <v6vco6$3v9nu$1@dont-email.me> <6bf69j98eeh9ra8pj8ftqv1hlaeqjikf9k@4ax.com> <v6vrk6$1clv$1@dont-email.me> <v70h7v$5566$1@dont-email.me> <5sm79jhl6kr5urcqkrapkfmla2a5mfo659@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 20:50:38 -0000 (UTC)
Injection-Info: news.samoylyk.net; posting-host="public-nat-14.vpngate.v4.open.ad.jp:219.100.37.246";
logging-data="2786541"; mail-complaints-to="abuse@samoylyk.net"

On Sun, 14 Jul 2024 10:09:26 -0400, micky wrote:

>> I should have posted the URL. freeocr.net is just a listing site.
>>paperfile.net is the host of FreeOCR.
>
> And it doesn't mention win10 or 11. I can assume you've been using it
> with one of those two.
>
> I thought of just installing it to see if it works, but who knows, maybe
> installing old, no longer compatible software could mess up my OS??

There's something called Simple OCR https://www.simpleocr.com/download/
which says it's free but I've never tried it so I can't vouch for it.

Subject: Re: OCR on Windows
From: Big Al
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: A noiseless patient Spider
Date: Sun, 14 Jul 2024 20:54 UTC
References: 1 2 3 4
Path: eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: alan@invalid.com (Big Al)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Sun, 14 Jul 2024 16:54:22 -0400
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <v71dtu$99ls$1@dont-email.me>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
<jlj69j558op4ftd36g1fjj8b1507f0av38@4ax.com>
<MPG.40fd08d652bb667199030c@news.individual.net>
<v71dc3$ili$1@news.chmurka.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 14 Jul 2024 22:54:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5c0294ceb01db16ee3b53259107412a3";
logging-data="304828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+hgZ50TDb5yfjdWMm5D6yHCugVDRFDHko="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:iabrXo7h+z9hXofj9eOMIGafuj4=
Content-Language: en-US
In-Reply-To: <v71dc3$ili$1@news.chmurka.net>

On 7/14/24 04:44 PM, Jan K. wrote:
> On Sat, 13 Jul 2024 22:58:17 -0700, Stan Brown wrote:
>
>>> Windows Power Toys - Text extractor.
>>
>> You forgot to give the URL:
>> https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
>>
>> That one says it's "based on Joe Finney's TextGrab", and links to
>> https://github.com/TheJoeFin/Text-Grab
>>
>> Has anyone tried both, and can speak to whether one does a better job
>> of text extraction than the other?
>
> I've tried something similar to Microsoft Office for OCR on Windows.
> What I tried was a MS Office clone called WPS Office, which I found here.
> https://www.wps.com/office/pdf/
>
> The company appears to be "Kingsoft" and their web stub installer is here.
> https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/onlinesetup/distsrc/600.1022/wpsinst/wps_office_inst.exe
>
> Name: wps_lid.lid-u8MZl7zT7a0C.exe
> Size: 5864848 bytes (5727 KiB)
> SHA256: 81E09F93F6B1C7F9488D912CFD82560D978262CB75ECF7B7953403A8A706259B
>
> Since that looks scary, I ran it through VirusTotal, which reported it clean.
> https://www.virustotal.com/gui/file/81e09f93f6b1c7f9488d912cfd82560d978262cb75ecf7b7953403a8a706259b
>
> You have to be careful as it will change your PDF defaults.
> Select "Custom Settings" (not "Install Now").
> Change from:
> [x] Use WPS Office to open pdf files by default
> [x] Use WPS Office as the default program for documents
> [x] Use WPS Photos to open JPG, PNG, and other image formats by default
>
> Change to:
> [_] Use WPS Office to open pdf files by default
> [_] Use WPS Office as the default program for documents
> [_] Use WPS Photos to open JPG, PNG, and other image formats by default
>
> Then hit the big blue "Install Now" button.
> It will say "Downloading WPS Office" so you know it was just a stub.
>
> It will create a wps_download directory containing:
> Name: 132ca6c802422ed94a59d10cbcc9f47b-15_setup_XA_mui_Free.exe.600.1022.exe
> Size: 244193632 bytes (232 MiB)
> SHA256: B6B462DCDA4578D716E207D9747D391597110EC8F4A22C9AC29417E68A86A525
>
> After taking forever downloading & installing WPS Office,
> WPS Office will try to trick you into installing "360 Total Security".
> Do not select the box [_]Yes, I agree to install 360 Total Security...
> Click the big blue box "Get Started with WPS".
>
> Start WPS Office and click away the sell-up advertising.
> Tools > PDF OCR > Select File > filename.pdf > Perform OCR > Sign in
>
> You have to sign in to an account in order to run OCR on a PDF with WPS.
> I guess in the end it's maybe an online converter - but it's hard to tell.
> I didn't create an account so I never was able to find out how it works.
>
> All I know is it's a Microsoft Office clone that says it does OCR for free.

Years ago I used and really liked Kingsoft. Then LibreOffice got better
and I switched. But Kingsoft did a good (even great) job reading and
writing MS Word files.
--
Linux Mint 21.3, Cinnamon 6.0.4, Kernel 5.15.0-113-generic
Al

Subject: Re: OCR on Windows
From: Joerg Walther
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: Easynews - www.easynews.com
Date: Mon, 15 Jul 2024 08:10 UTC
References: 1 2 3
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx16.iad.POSTED!not-for-mail
From: joerg.walther@magenta.de (Joerg Walther)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Message-ID: <0am99j948eg65l5khojo9dnkhocb4uf59o@joergwalther.my-fqdn.de>
References: <v6v74c$80bq$1@matrix.hispagatos.org> <MPG.40fd05da559d2e4b99030b@news.individual.net> <v71aie$2j22$1@news.gegeweb.eu>
X-Newsreader: Forte Agent 6.00/32.1186
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 13
X-Complaints-To: abuse@easynews.com
Organization: Easynews - www.easynews.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Mon, 15 Jul 2024 10:10:05 +0200
X-Received-Bytes: 1210

Enrico Papaloma wrote:

>Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
>Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
>Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)
>
>It says "ohne OCR". What does "ohne" mean anyway?

"Ohne" is German, meaning "without".

-jw-
--
And now for something completely different...

Subject: Re: OCR on Windows
From: Jim the Geordie
Newsgroups: alt.comp.os.windows-10, alt.comp.os.windows-10, comp.text.pdf
Organization: To protect and to server
Date: Mon, 15 Jul 2024 18:16 UTC
References: 1
Path: eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!newsfeed.bofh.team!paganini.bofh.team!not-for-mail
From: jim@jimXscott.co.uk (Jim the Geordie)
Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-10,comp.text.pdf
Subject: Re: OCR on Windows
Date: Mon, 15 Jul 2024 19:16:58 +0100
Organization: To protect and to server
Message-ID: <v73p2r$28m8q$2@paganini.bofh.team>
References: <v6v74c$80bq$1@matrix.hispagatos.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Jul 2024 18:16:59 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="2382106"; posting-host="6K5//5YBz49XwRKkX71Erg.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
User-Agent: MicroPlanet-Gravity/3.0.4
X-Notice: Filtered by postfilter v. 0.9.3

In article <v6v74c$80bq$1@matrix.hispagatos.org>, bill@anarchists.org
says...
>
> I have a series of one-page PDFs that are really images and not text even
> though they look like they're just a page of simple text in the same font.
>
> Is there a way to easily OCR a PDF to actual text on Windows for free?

Just came across this post.
Has anyone mentioned ABBYY FineReader?
I use it all the time.
Saves to Word and PDF with no problems.

--
Jim the Geordie
