• Automatic Upload & Indexing

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #41684  by KAD
 
Hello,

I'm new to OpenKM but I'm very impressed so far.

Before I go to the trouble of downloading & installing the software, I have a quick question to see if it has the functionality I require.

I want to be able to place my PDF files (which have already been OCR'd) into a folder that the OpenKM installation watches. As soon as a file lands there, I want the software to import it into the database, index the text it contains, and place it in a folder that matches a keyword found in the text.

For example, if I scanned a bank statement, the software would find the new file and import it into the database (deleting the original in the process). It would then index all the text in the file so it can be searched later. It would also look for a keyword (the name of the bank, in this example) and place the file in the Bank folder in the software.

This means, to find the file, I could either search for any word in that document or I could just drill down to the folder and find it that way.

I would really appreciate it if you could tell me whether this is possible with OpenKM and, if so, give me some instructions on how best to achieve it.

Thanks in advance for your help.

Cheers, Keith.
 #41703  by jllort
 
There are several ways to do it. You can do everything in a crontab task, or combine a crontab task with an automation action.

You can get some inspiration from http://wiki.openkm.com/index.php/Utilities (especially the crontab importers). For automation actions, you can start reading here: http://wiki.openkm.com/index.php/Automation

Also take a look at the OpenKM portable edition, which comes with some samples: https://sourceforge.net/projects/openkmportabledev/
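As a rough illustration of the pick-up step that a crontab importer performs, the sketch below lists the PDF files waiting in an input folder. It is plain Java, not OpenKM code: the class name `InboxScanner` and the inbox location are hypothetical, and the actual upload (and deletion of the original) is left as a comment because it depends on which importer or API you use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of OpenKM): finds the PDFs a scheduled importer would pick up.
public class InboxScanner {

    // True for names ending in ".pdf", ignoring case (e.g. "scan.PDF").
    static boolean isPdf(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Lists regular PDF files directly inside the inbox folder, sorted by path.
    static List<Path> pendingPdfs(Path inbox) {
        try (Stream<Path> entries = Files.list(inbox)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isPdf(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo inbox in a temporary directory; a real job would use a fixed folder.
        Path inbox = Files.createTempDirectory("inbox");
        Files.createFile(inbox.resolve("statement.pdf"));
        Files.createFile(inbox.resolve("invoice.PDF"));
        Files.createFile(inbox.resolve("notes.txt"));

        for (Path pdf : pendingPdfs(inbox)) {
            // Here a real importer would upload the file to OpenKM and,
            // only if the upload succeeded, delete the original.
            System.out.println("would import: " + pdf.getFileName());
        }
    }
}
```

Sorting only makes the processing order predictable; the non-PDF file is ignored.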
 #41718  by KAD
 
Hello!

Thank you very much for your response!

I have now set up an OpenKM instance and I'm really impressed so far.

I have already set up the automated import using the Cron job and a scheduled backup of the entire system using a simple batch file.

The one thing I am unable to do is move the imported files around based on words in the document's contents. I have set up an Automation job that triggers after the Text Extraction job completes, but these automated tasks only seem able to look up the Keywords associated with the document, not the contents of the document itself.

Can you please explain how I could move the files into a different folder based on keywords found in the extracted text, after the extraction job has run?

Thanks again for your help.

Regards, Keith.
 #41727  by jllort
 
If the text has already been extracted, you can use something like this:
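Code:
import com.openkm.dao.NodeDocumentDAO;
import com.openkm.dao.bean.NodeDocument;

// Get the extracted text of the document identified by uuid
NodeDocument docNode = NodeDocumentDAO.getInstance().findByPk(uuid);
String text = docNode.getText();

// getText() may return null (e.g. nothing extracted yet); lower-case the text for matching
if (text == null) {
    text = "";
} else {
    text = text.toLowerCase();
}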
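Or force the extraction:
Code:
import com.openkm.dao.NodeDocumentDAO;
import com.openkm.extractor.TextExtractorWork;

// Describe the document to extract (uuid, docPath and docVerUuid identify it)
TextExtractorWork tew = new TextExtractorWork();
tew.setDocUuid(uuid);
tew.setDocPath(docPath);
tew.setDocVerUuid(docVerUuid);

// Execute the extractor
NodeDocumentDAO.getInstance().textExtractorHelper(tew);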

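To move the document to another location, set the final path according to your own logic, then create the destination folder and move the document there:
Code:
import com.openkm.api.OKMDocument;
import com.openkm.api.OKMFolder;

// Create any missing folders along finalPath, then move the document into it
OKMFolder.getInstance().createMissingFolders(null, finalPath);
OKMDocument.getInstance().move(null, uuid, finalPath);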
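Putting these snippets together, the decision part can be tried in plain Java before wiring it into OpenKM. In this sketch the OpenKM calls appear only as comments; the class name `MoveDecision`, the root `/okm:root` and the single `bank`/`Bank` pair are illustrative assumptions, not a prescribed layout.

```java
import java.util.Locale;

// Hypothetical sketch: the null check and lower-casing from the first snippet,
// followed by building the finalPath used by createMissingFolders/move.
public class MoveDecision {

    // Same normalisation as the snippet: null becomes "", otherwise lower case.
    static String normalize(String text) {
        return text == null ? "" : text.toLowerCase(Locale.ROOT);
    }

    // Joins a base folder and a subfolder name with exactly one "/".
    static String finalPath(String baseFolder, String folderName) {
        String base = baseFolder.endsWith("/")
                ? baseFolder.substring(0, baseFolder.length() - 1)
                : baseFolder;
        return base + "/" + folderName;
    }

    public static void main(String[] args) {
        // In OpenKM this would come from docNode.getText().
        String extracted = "Statement from Example BANK plc";
        String text = normalize(extracted);

        if (text.contains("bank")) {
            String target = finalPath("/okm:root/", "Bank");
            // In OpenKM: OKMFolder.getInstance().createMissingFolders(null, target);
            //            OKMDocument.getInstance().move(null, uuid, target);
            System.out.println("move to " + target);
        } else {
            System.out.println("leave in place");
        }
    }
}
```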
 #41735  by KAD
 
Hello,

Thanks again for your help, but I'm afraid I'm still struggling to understand.

To move the documents, I had hoped that I could set up a list of Automation Tasks with these properties:
Code:
Name : Move to folder xxx
Event : Text Extraction
Validation : Body text contains value yyy
Action : Run script to move the file to folder xxx
I would need an automation task for every folder I want to move documents into. Every time a document has its text extracted, the list of rules would run and move the document to the correct folder. This means the text has already been extracted from the PDF before the tasks run.

The problem is that there is no validation that checks the body text, only HasKeyWord, and none of my documents have keywords.

I assume I need to remove the validation so the rule always triggers, and write some code that searches the document contents for the keywords and, if they are found, moves the document to the specified folder. This is where I'm stuck.

Can you please tell me how to code this lookup of the body text for a list of keywords, so I can insert it into the Execute Scripting action? I understand the move command you showed me previously, but I don't know how to check for the specified words before moving.
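One way to do the body-text lookup is sketched below in plain Java. It checks the extracted text against a list of keywords as whole words, so that `bank` matches "Example Bank" but not "embankment" (plain `String.contains` would match both). The class and method names are hypothetical; in OpenKM the text would come from the document's extracted text rather than a literal, and the sketch assumes keywords start and end with letters or digits.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: finds which of several keywords occur in the extracted text.
public class KeywordLookup {

    // Whole-word, case-insensitive match; Pattern.quote escapes regex characters in the keyword.
    static boolean containsWord(String text, String keyword) {
        if (text == null || keyword.isEmpty()) {
            return false;
        }
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).find();
    }

    // Returns the first keyword (in list order) found in the text, if any.
    static Optional<String> firstKeywordFound(String text, List<String> keywords) {
        for (String keyword : keywords) {
            if (containsWord(text, keyword)) {
                return Optional.of(keyword);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("barclays", "hsbc", "bank");
        String text = "Your HSBC account statement";
        System.out.println(firstKeywordFound(text, keywords).orElse("no keyword"));
    }
}
```

Returning the matched keyword, rather than just true/false, lets the caller map it to a destination folder.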

Also, if you think I am going about this the wrong way and there is a more efficient way to achieve the same results, I'd love to know.

I'm really impressed with your software, so if I can get this last stage working I will be delighted and I can show all my friends how good it is! :)

Thanks again, Keith.
 #41749  by jllort
 
You should create one action. Inside this action, once the text has been extracted, look for your keywords and then decide the destination path. The logic for the destination path can be implemented in a single class for all documents; it is not necessary to create one for each case.
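A minimal sketch of that single class, assuming the rules fit in an ordered keyword-to-folder map: the first keyword found decides the destination, and documents with no match go to a fallback folder. `DestinationResolver`, the folder paths and the fallback are illustrative, not part of OpenKM; inside an action, the returned path would be passed to the `createMissingFolders` and `move` calls shown earlier in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical single class holding the routing logic for every document.
public class DestinationResolver {

    private final Map<String, String> folderByKeyword; // checked in insertion order
    private final String fallbackFolder;

    DestinationResolver(Map<String, String> folderByKeyword, String fallbackFolder) {
        this.folderByKeyword = new LinkedHashMap<>(folderByKeyword);
        this.fallbackFolder = fallbackFolder;
    }

    // Returns the folder of the first keyword contained in the text, else the fallback.
    String resolve(String extractedText) {
        String text = extractedText == null ? "" : extractedText.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : folderByKeyword.entrySet()) {
            if (text.contains(rule.getKey().toLowerCase(Locale.ROOT))) {
                return rule.getValue();
            }
        }
        return fallbackFolder;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("barclays", "/okm:root/Bank/Barclays");
        rules.put("electricity", "/okm:root/Utilities");

        DestinationResolver resolver = new DestinationResolver(rules, "/okm:root/Unsorted");
        System.out.println(resolver.resolve("Barclays Bank statement, March"));
        System.out.println(resolver.resolve("Something else entirely"));
    }
}
```

A LinkedHashMap keeps the rules in the order they were added, so more specific keywords can be placed before general ones.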

You can also pass parameters to the action if you want the action class to behave dynamically. But the easiest approach for you is to implement the logic in code in a single class (I suppose you are not changing your cataloguing logic every day? If you do change it every day, you could use an external table to control it, which is another way to make the logic in the class more dynamic).
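To illustrate the external-table idea, the sketch below turns simple `keyword;folder` lines into an ordered rule map, so the rules can change without recompiling the action class. The line format, the class name `RuleTable` and the use of a plain text source are all assumptions made for the example; a database table could feed the same map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader: turns "keyword;folder" lines into an ordered rule map.
public class RuleTable {

    static Map<String, String> parseRules(String tableText) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String line : tableText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int sep = trimmed.indexOf(';');
            if (sep <= 0 || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException("Bad rule line: " + line);
            }
            String keyword = trimmed.substring(0, sep).trim();
            String folder = trimmed.substring(sep + 1).trim();
            rules.put(keyword, folder);
        }
        return rules;
    }

    public static void main(String[] args) {
        // In practice this text could be read from a file, e.g. with
        // java.nio.file.Files.readString(path), or built from a database table.
        String table = String.join("\n",
                "# keyword;folder",
                "barclays;/okm:root/Bank/Barclays",
                "",
                "electricity;/okm:root/Utilities");
        parseRules(table).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Malformed lines fail loudly instead of being skipped, so a typo in the table does not silently send documents to the wrong place.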
