• Searches show no results

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #44387  by xtrailrunner
 
I have installed OpenKM 6.3.4 on top of Linux Mint 17.3.
As the next step I uploaded several documents into different folders with keywords, categories and notes (as user admin).
But I cannot see a preview of the documents and searches show no results.
I added the German lexer to the OpenKM.cfg file and rebuilt the indexes, but searches still return no results.
Am I missing something, or should I use a user other than admin to upload the documents?
Any advice is welcome.
Regards Juergen
 #44395  by jllort
 
If you are trying to search by content, keep in mind that documents go into a queue ( Administration / stats / pending extraction queue ) and are processed every 5 minutes ( the crontab task named "text extraction" takes care of it ).

Focus on a single document. With this procedure you can check the text extraction process ( https://docs.openkm.com/kcenter/view/ok ... ction.html -> copy the document UUID, paste it there, and execute. It shows both whether the extraction works and the text that was extracted ).

With a database query you can check whether a document has already been extracted ( https://docs.openkm.com/kcenter/view/ok ... query.html ). The query for it is:
Code:
SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = 'HERE THE UUID OF YOUR DOCUMENT';
The field NDC_TEXT_EXTRACTED is 'T' or 'F' and indicates whether the document has been processed or not ( true / false ).
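
If you prefer to script this check, here is a minimal Python sketch. It is only an illustration: it uses an in-memory SQLite table as a stand-in for the real OpenKM database, and the stand-in contains just the two columns mentioned above (NBS_UUID and NDC_TEXT_EXTRACTED). The function names and the sample UUIDs are hypothetical. Against a real installation you would connect with the driver for whatever database your OpenKM is configured to use.

```python
import sqlite3


def is_text_extracted(conn, uuid):
    """Return True/False for the extraction flag, or None if the UUID is unknown."""
    row = conn.execute(
        # Parameterized query: the UUID is passed as a value, not pasted into the SQL.
        "SELECT NDC_TEXT_EXTRACTED FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",
        (uuid,),
    ).fetchone()
    if row is None:
        return None
    return row[0] == "T"


def demo_connection():
    """Build a stand-in table (assumption: NOT the real OpenKM schema, only two columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?)",
        [("uuid-done", "T"), ("uuid-pending", "F")],  # hypothetical sample rows
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for uuid in ("uuid-done", "uuid-pending", "uuid-missing"):
        print(uuid, is_text_extracted(conn, uuid))
```

A result of None means the UUID was not found at all, which is a different problem from a document that is still waiting in the extraction queue ( 'F' ).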

Regarding the preview, please do not merge several topics in the same post; open a new topic for it, thanks.
 #45780  by xtrailrunner
 
Thanks a lot!
So far I have uploaded scanned documents in PDF format, but apparently each PDF contains only one image per page (in which image format I don't know).

After adding the Tesseract OCR engine to my configuration I could extract keywords, but I still cannot find all documents using a specific keyword. So I used the "check extraction" function on a document I could not find. Because of the poor quality of the text extraction (German), the keyword was not recognized in the text.
What should I do:
- replace the engine by another one
- add an additional engine.
Will the quality of extraction improve if I scan my documents to PNG or JPG?
A disadvantage would be that a multi-page document would end up as multiple files to upload.
What would you recommend in such a situation?
Regards Juergen
 #45783  by jllort
 
I suggest running some tests.
1- Scan a document at 300-600 dpi as PNG ( then check the results ), and also as multipage TIFF
2- When you succeed in the previous step you have two options -> try scanning directly to PDF, or first to images and then to PDF
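
The comparison in step 1 can be scripted. The following Python sketch is only a rough illustration, under these assumptions: the tesseract command-line tool is installed with the German language data ( deu ), and the file names in the usage part are hypothetical. It runs Tesseract on each test scan and reports whether a known keyword appears in the recognized text. The matching ignores case, extra whitespace and diacritics, which is a deliberate choice here because OCR output often loses umlauts; it is not how OpenKM itself indexes text.

```python
import subprocess
import unicodedata


def tesseract_command(image_path, lang="deu"):
    """Build the Tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def normalize(text):
    """Case-fold, drop combining marks (umlaut dots etc.) and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def keyword_found(text, keyword):
    """True if the keyword occurs in the text after tolerant normalization."""
    return normalize(keyword) in normalize(text)


def ocr_image(image_path, lang="deu"):
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout


def compare_scans(image_paths, keyword, lang="deu"):
    """Map each scan to True/False depending on whether the keyword was recognized."""
    return {path: keyword_found(ocr_image(path, lang), keyword) for path in image_paths}


if __name__ == "__main__":
    # Hypothetical test scans of the same page at different settings.
    scans = ["scan_300dpi.png", "scan_600dpi.png", "scan_600dpi.tif"]
    for path, found in compare_scans(scans, "Rechnung").items():
        print(path, "keyword found" if found else "keyword NOT found")
```

Scanning the same page with several settings and using a keyword you know is on it makes it easy to see which configuration the OCR engine handles best.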

You can also try our scanner tool for this; take a look at the OpenKM download section https://www.openkm.com/en/download.html

Once you discover the right format you can go ahead with all your documents. About changing the OCR engine: Tesseract is the only one that really works in the open source world; any other OCR engine will add some cost. We made some tests in the past with ocr4linux ( https://www.ocr4linux.com/en:start ), which worked really well, but it restricts the number of pages that can be processed ( it depends on your license ).
 #45785  by xtrailrunner
 
Thanks. The OpenKM scanner seems to be available only for Windows. Because I'm using OpenKM on Linux, I'm not sure whether I could integrate it by running the scanner in a virtual machine with Windows.
Regards Juergen
 #45791  by jllort
 
You can get OpenKM working in a VM, although it is not the best scenario. Configuring the scanner there is another story :)

One clarification: the tool is not as important as discovering the right configuration for scanning. That's why I suggested testing several image formats etc., to isolate the best configuration for you, if one exists. You can begin from the top, at 600 dpi with PNG format. If the OCR engine does not work with this configuration, forget the open source option; you will have to go for a commercial one.
