
mtime spoils incremental backup of repository

Posted: Sun Oct 02, 2011 8:51 am
by snowman
Hello,

I am referring to a thread from 2010 which has died.
I automatically back up the repository and database (OpenKM 5.1.7). I tried to implement incremental backup of the repository, but the full repository is always backed up.
I found out that every file in the repository gets an updated atime and mtime. I do not understand what modifies mtime, or why, since all files in the repo are PDFs.

Does anybody know what touches mtime in the repo every night and maybe why?
More importantly, how can I turn it off?

Best regards,
Snowman

Re: mtime spoils incremental backup of repository

Posted: Mon Oct 03, 2011 11:03 am
by pavila
This needs to be studied in depth because it is related to Jackrabbit, which is the repository used by OpenKM to store the documents. In any case, we perform incremental backups with rdiff-backup and rsync under Linux with no problems.

Re: mtime spoils incremental backup of repository

Posted: Wed Oct 05, 2011 8:46 pm
by snowman
How can I help investigate? I have no idea how Jackrabbit works or how I could raise this in their forum.

The service is running but has not been touched for several days now. No user interaction, no import.
I have 495 files in the datastore, counted with "find datastore -type f | wc -w", and all 495 of them were modified less than 24 hours ago: "find datastore -type f -mtime 0 | wc -w".

When I run "stat" on an example file, it returns:
Code:
  File: `03/fa/b4/03fab4f1ad8268d2921b743b8b90562dfb4908a9'
  Size: 69761           Blocks: 144        IO Block: 4096   regular file
Device: fd11h/64785d    Inode: 399189      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-10-05 01:05:31.419387817 +0200
Modify: 2011-10-05 00:00:07.030000000 +0200
Change: 2011-10-05 00:00:05.028873604 +0200
 Birth: -
The access time was changed by my backup system, which is Bacula.
The file is originally a PDF.
I have not knowingly changed the default configuration of the repo.
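
For anyone who wants to repeat the count without find, here is a minimal Python sketch of the same check. It assumes the directory is called "datastore", as in the commands above; the function name count_recent_files is made up for illustration. It treats "modified less than 24 hours ago" the same way "find -mtime 0" does, and it skips symbolic links like "find -type f".

```python
import os
import time


def count_recent_files(root, window_seconds=24 * 60 * 60, now=None):
    """Return (total, recent): the number of regular files under root,
    and how many of them have an mtime less than window_seconds old."""
    if now is None:
        now = time.time()
    total = 0
    recent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only; skip symbolic links.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            total += 1
            if now - os.path.getmtime(path) < window_seconds:
                recent += 1
    return total, recent


if __name__ == "__main__":
    # "datastore" is an assumed path, taken from the find commands in this post.
    total, recent = count_recent_files("datastore")
    print(f"{total} files, {recent} modified in the last 24 hours")
```

If both numbers are equal on a system nobody has used for days, something other than the users is rewriting mtime.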

Re: mtime spoils incremental backup of repository

Posted: Thu Oct 06, 2011 5:45 am
by snowman
Today I did another stat on the same file:
Code:
  File: `03/fa/b4/03fab4f1ad8268d2921b743b8b90562dfb4908a9'
  Size: 69761           Blocks: 144        IO Block: 4096   regular file
Device: fd11h/64785d    Inode: 399189      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-10-06 01:29:40.506123084 +0200
Modify: 2011-10-06 00:00:08.507000000 +0200
Change: 2011-10-06 00:00:06.505535372 +0200
 Birth: -
The access time comes from my backup. There was no other interaction.
I also tried rsync. It behaves as one would expect given the changed mtime: every file is synced again the next day.

Can anyone tell me where the Jackrabbit configuration files are?

Re: mtime spoils incremental backup of repository

Posted: Wed Oct 12, 2011 7:23 am
by pavila
This seems to be a side effect of the DataGarbageCollector daemon, which removes orphan files from the DataStore. I will try to find the reason. I created this issue: http://issues.openkm.com/view.php?id=1831

In rsync you can bypass the mod-time check by using the --checksum parameter. Also take a look at the --size-only parameter.

More info on these parameters in "Rsync difference between --checksum and --ignore-times options".
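
To illustrate the idea behind comparing by checksum rather than by mod-time, here is a rough Python sketch. It is not how rsync implements --checksum; the helper names (file_digest, build_manifest, changed_files) and the choice of SHA-256 are assumptions made for this example only. It records a content digest per file and reports only files whose content differs from the previous run, so a file whose mtime was merely touched is not reported.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its content digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Regular files only; symbolic links are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest


def changed_files(previous, current):
    """Return the sorted paths that are new or whose content digest differs.

    Timestamps are never consulted, so a changed mtime alone is ignored."""
    return sorted(
        rel for rel, digest in current.items() if previous.get(rel) != digest
    )
```

A backup script could store the manifest from the last run (for example as JSON) and copy only what changed_files returns. The price is that every file has to be read in full on every run, which is the same trade-off --checksum makes compared with the default mod-time and size check.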