• Jackrabbit repository datastore with Oracle db

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
OpenKM has many interesting features, but requires some configuration process to show its full potential.
Forum rules: Please, before asking something see the documentation wiki or use the search feature of the forum. And remember we don't have a crystal ball or mental readers, so if you post about an issue tell us which OpenKM are you using and also the browser and operating system version. For more info read How to Report Bugs Effectively.
 #30512  by mernst0
 
Hello,

I'm evaluating OpenKM 6.4.14 Professional Trial for possible use in my company. A special requirement is: for safety / security reasons everything has to be stored in oracle db. As far as I can see this means to openKM: Document Metadata and Jackrabbit datastore (document data). Q.: Are there any more? I already fiddled about how to store metadata in oracle, this seems to work fine (even with openKM 6.3 CE...). Then I created jackrabbit repository.xml, wich, I can see in catalina.log, is recognized by openKM. Creation of datastore tables failed, Idon't know why, so I created them manually with 4 oracle.ddl which I found in jackrabbit distribution. On catalina start, there are no errors in catalina.log, But when I try to log on, openKM gives me java.lang.NullPointerException and in tomcat console window I see

Exception in thread "Update Info" java.lang.NullPointerException
at com.openkm.api.OKMStats.getMailsByContext(Unknown Source)
at com.openkm.core.RepositoryInfo.runAs(Unknown Source)
at com.openkm.core.RepositoryInfo.run(Unknown Source)
at com.openkm.core.x.run(Unknown Source)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

This is select table_name from user_tables:
Code: Select all
TABLE_NAME                   
------------------------------
FSENTRY                        
JOURNAL                        
GLOBAL_REVISION                
LOCAL_REVISIONS                
BUNDLE                         
REFS                           
BINVAL                         
NAMES                          
NODE                           
PROP                           
DATASTORE                      
This is repository.xml (masked connect info):
Code: Select all
<?xml version="1.0"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.6//EN"
                            "http://jackrabbit.apache.org/dtd/repository-1.6.dtd">
<Repository>
    <!-- virtual file system where the repository stores global state
        (e.g. registered namespaces, custom node types, etc.) -->
    <!-- <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem> -->

	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
       <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
       <param name="user" value="***"/>
       <param name="password" value="***"/>
       <param name="schemaObjectPrefix" value=""/>
       <param name="tableSpace" value=""/>
  </FileSystem>

    <!-- Security configuration -->
    <Security appName="OpenKM">
        <!-- Access manager: FQN of class implementing the AccessManager interface -->
        <AccessManager class="com.openkm.core.OKMAccessManager"/>
        <!-- <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/> -->
        <!-- <AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager"> -->
            <!-- <param name="config" value="${rep.home}/access.xml"/> -->
        <!-- </AccessManager> -->
    </Security>

    <!-- Location of workspaces root directory and name of default workspace -->
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

    <!-- Workspace configuration template:
         used to create the initial workspace if there's no workspace yet -->
    <Workspace name="${wsp.name}">
        <!-- Virtual file system of the workspace:
             class: FQN of class implementing the FileSystem interface -->
        <!-- <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem> -->

		<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
	       <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
	       <param name="user" value="***"/>
	       <param name="password" value="***"/>
	       <param name="schemaObjectPrefix" value=""/>
	       <param name="tableSpace" value=""/>
	  </FileSystem>

        <!-- Persistence manager of the workspace:
             class: FQN of class implementing the PersistenceManager interface -->
        <!-- <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
          <param name="driver" value="com.mysql.jdbc.Driver"/>
          <param name="url" value="jdbc:mysql://localhost:3306/okm_repo?autoReconnect=true&useUnicode=true&characterEncoding=UTF8"/>
          <param name="schema" value="mysql"/>
          <param name="user" value="openkm"/>
          <param name="password" value="*****"/>
          <param name="schemaObjectPrefix" value=""/>
          <param name="externalBLOBs" value="false"/>
        </PersistenceManager> -->

        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
		    <param name="consistencyFix" value="false"/>
          <param name="driver" value="oracle.jdbc.driver.OracleDriver"/>
          <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
          <param name="user" value="***"/>
          <param name="password" value="***"/>
		    <param name="schema" value=""/>
          <param name="schemaObjectPrefix" value=""/>
		    <param name="blockOnConnectionLoss" value="false"/>
		    <param name="externalBLOBs" value="false"/>
		    <param name="bundleCacheSize" value="8"/>
		    <param name="consistencyCheck" value="false"/>
		    <param name="minBlobSize" value="16384"/>
		    <param name="tableSpace" value=""/>
		    <param name="errorHandling" value=""/>
        </PersistenceManager>

        <!-- Search index and the file system it uses.
             class: FQN of class implementing the QueryHandler interface -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="textFilterClasses" value="
            org.apache.jackrabbit.extractor.PlainTextExtractor,
            org.apache.jackrabbit.extractor.MsWordTextExtractor,
            org.apache.jackrabbit.extractor.MsExcelTextExtractor,
            org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
            org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
            org.apache.jackrabbit.extractor.RTFTextExtractor,
            org.apache.jackrabbit.extractor.HTMLTextExtractor,
            org.apache.jackrabbit.extractor.XMLTextExtractor,
            org.apache.jackrabbit.extractor.PngTextExtractor,
            org.apache.jackrabbit.extractor.MsOutlookTextExtractor,
            com.openkm.extractor.PdfTextExtractor,
            com.openkm.extractor.AudioTextExtractor,
            com.openkm.extractor.ExifTextExtractor,
            com.openkm.extractor.TiffTextExtractor,
            com.openkm.extractor.SourceCodeTextExtractor,
            com.openkm.extractor.MsOffice2007TextExtractor"/>
            <param name="extractorPoolSize" value="2"/>
            <param name="supportHighlighting" value="false"/>
            <param name="indexingConfiguration" value="${wsp.home}/../../../indexing_configuration.xml"/>
        </SearchIndex>
    </Workspace>

    <!-- Configures the versioning -->
    <Versioning rootPath="${rep.home}/version">
        <!-- Configures the filesystem to use for versioning for the respective
             persistence manager -->
        <!-- <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version" />
        </FileSystem> -->

        	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
	       <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
	       <param name="user" value="***"/>
	       <param name="password" value="***"/>
	       <param name="schemaObjectPrefix" value=""/>
	       <param name="tableSpace" value=""/>
  			</FileSystem>

        <!-- Configures the persistence manager to be used for persisting version state.
             Please note that the current versioning implementation is based on
             a 'normal' persistence manager, but this could change in future
             implementations. -->
        <!-- <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
          <param name="driver" value="com.mysql.jdbc.Driver"/>
          <param name="url" value="jdbc:mysql://localhost:3306/okm_repo?autoReconnect=true&useUnicode=true&characterEncoding=UTF8"/>
          <param name="schema" value="mysql"/>
          <param name="user" value="openkm"/>
          <param name="password" value="*****"/>
          <param name="schemaObjectPrefix" value=""/>
          <param name="externalBLOBs" value="false"/>
        </PersistenceManager> -->

        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
		    <param name="consistencyFix" value="false"/>
          <param name="driver" value="oracle.jdbc.driver.OracleDriver"/>
          <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
          <param name="user" value="***"/>
          <param name="password" value="***"/>
		    <param name="schema" value=""/>
          <param name="schemaObjectPrefix" value=""/>
		    <param name="blockOnConnectionLoss" value="false"/>
		    <param name="externalBLOBs" value="false"/>
		    <param name="bundleCacheSize" value="8"/>
		    <param name="consistencyCheck" value="false"/>
		    <param name="minBlobSize" value="16384"/>
		    <param name="tableSpace" value=""/>
		    <param name="errorHandling" value=""/>
        </PersistenceManager>
    </Versioning>

    <!-- Search index for content that is shared repository wide
         (/jcr:system tree, contains mainly versions) -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
        <param name="textFilterClasses" value=""/>
        <param name="extractorPoolSize" value="2"/>
        <param name="supportHighlighting" value="false"/>
    </SearchIndex>

    <!-- DataStore improve file handling performance -->
    <!-- <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="100"/>
    </DataStore> -->

	<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
        <param name="url" value="jdbc:oracle:thin:@server:port:sid"/>
        <param name="user" value="***"/>
        <param name="password" value="***"/>
        <param name="databaseType" value="oracle"/>
        <param name="driver" value="oracle.jdbc.driver.OracleDriver"/>
        <param name="minRecordLength" value="1024"/>
        <param name="maxConnections" value="3"/>
        <param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value=""/>
    </DataStore>
</Repository>
I suppose, jackrabbit still does not find database table. Can you guide me through configuration?

Thanks,
 #30535  by jllort
 
You got some confusion what I will try to solve. Until OpenKM version 5.1.11 we've been using jackrabbit, from version 6.2 we've create our own repository. OpenKM metadata by default is stored in database, and binary information depending the datastore configuration class can be stored into database or file system. Normally all dms suggest store information in file system, the reason is perfomance and backup. I do not know the reason why you want to store binary information to Oracle and I would like to understand. Imagine one milion of docs ( normally 1mb per doc ) than means 1Tera byte of information in your oracle for only one milion rows ( do you follow me ?) a huge backup and complex if you want to doing incremental etc...
Is possible store in database ? yes. I suggest it ? no in openkm and other dms for storing binary data.

Think in the scenario of retrieving a document:
1- user does a UI call
2- arrive to server
3- connect to database ( database query latency )
4- create tmp file to store binary data from database to file system ( you can not doing it on memory, this is done transparently to you by any application ). Extra time for creation and more memory and resources used.
5- open file system send across server
6- delete tmp file

In file system scenario:
1- user does a UI call
2- arrive to server
3- open file system send across server

Other advantage
- A lot of tools for doing file system incremental backup ( in database scenario is more complex )

Hope at least you thing on it. Select openkm or other product, the scenario is exactly the same. We got some customer in this scenario for high level security reason. But sincerally I continue thinking the security you got from server could be similar than you got on database, or better. I do not like much, take as is only an opinion.
 #30546  by mernst0
 
Hello jllort , thanks for your answer!

no jackrabbit anymore? I fell into outdated wiki, thanks for pointing that out.
I do not know the reason why you want to store binary information to Oracle and I would like to understand.
We must use Oracle archivelog and recovery. Without that, there will be data loss. It's not the question if there will be data loss - there will be. It's only the question when data loss will happen. See https://asktom.oracle.com/pls/asktom/f? ... 9003975499 where "noarchivelog" and "file system" ist pretty much the same. I already paid for these Oracle features - so why not use them. This is ko criterion.
Imagine one milion of docs ( normally 1mb per doc ) than means 1Tera byte of information in your oracle for only one milion rows ( do you follow me ?)
Yes, actually there is an oracle based dms in use, with about 6 million docs (counted docs only, not versions). We'd like to switch over to a new dms.
a huge backup and complex if you want to doing incremental etc...
No problem, this is already there.

Refering to your db-scenario-list:
3- connect to database ( database query latency )
This is already solved by connection pooling.
4- create tmp file to store binary data from database to file system
Why that? LOB data arrives from database as array of byte, which can directly be dumped into response.
6- delete tmp file
no tmp file, no delete

Only 1, 2, 5 remaining...
- A lot of tools for doing file system incremental backup ( in database scenario is more complex )
I cannot imagine something easier than copying oracle archive files to a backup destination. It's not much more difficult to put a tablespace temporarily into backup mode. Again: without something like oracle's archivelog there is possible data loss up from last cold backup.

Can you give me a hint how to configure OpenKM datastore with Oracle?
 #30555  by jllort
 
OpenKM community edition does not support Oracle, there's some optimizations only present in professional to get OpenKM running with Oracle. You should use trial. If you need information about how configure it, contact us across website contact form and our sales dept marketing will help you on it. http://www.openkm.com/en/contact.html

About "Why that? LOB data arrives from database as array of byte, which can directly be dumped into response." I think you're not considering your Oracle is behind tomcat application, when file stream goes from Oracle to tomcat response, althought you configure a lot of memory in your tomcat, you can not take control about what happens there. In almost cases will be created tmp file by tomcat and then flush across response stream, in other will be going across tomcat assigned ram. But consider almost cases will be across tmp file, and you can not take control on it. Must add latency time between database and tomcat etc...

This is not OpenKM issue, is something general in all applications ( webserver application, etc... including Oracle tools) what flush stream binary info from some database.

About Us

OpenKM is part of the management software. A management software is a program that facilitates the accomplishment of administrative tasks. OpenKM is a document management system that allows you to manage business content and workflow in a more efficient way. Document managers guarantee data protection by establishing information security for business content.