• Enabling XML text indexing on Windows Server

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #25555  by kknd
 
Hi !!!

My XML files are not being indexed. How do I enable indexing for them?

I noticed this line in the settings:

"org.apache.jackrabbit.extractor.XMLTextExtractor"

but it does not work. I am running OpenKM on Windows.

See:
Image

Part of the original XML file:
Code:
    <transp>
      <modFrete>0</modFrete>
      <transporta>
        <CNPJ>9917123400191</CNPJ>
        <xNome>Distribuidora.</xNome>
        <IE>171999999119</IE>
        <xEnder>rua do centro</xEnder>
        <xMun>SAO PAULO</xMun>
        <UF>SP</UF>
      </transporta>
      <veicTransp>
        <placa>BXI1717</placa>
        <UF>SP</UF>
        <RNTC>123123789</RNTC>
      </veicTransp>
      <reboque>
        <placa>BXI112318</placa>
        <UF>SP</UF>
        <RNTC>12123789</RNTC>
      </reboque>
      <vol>
        <qVol>10000</qVol>
        <esp>CASDA</esp>
        <marca>LIASYA</marca>
        <nVol>500</nVol>
        <pesoL>1000000000.000</pesoL>
        <pesoB>1200000000.000</pesoB>
        <lacres>
          <nLacre>XYZ23423486</nLacre>
        </lacres>
      </vol>
    </transp>
    <infAdic>
      <infAdFisco>de exemplo</infAdFisco>
    </infAdic>
 #25574  by kknd
 
This is my registered.text.extractors setting:
Code:
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.PngTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.CuneiformTextExtractor
com.openkm.extractor.SourceCodeTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor 
 #25658  by jllort
 
Can you test whether the problem also happens in our online demo? If it does, tell me the file path. I would like to look at the contents to see whether there is some reason the parser does not like the file.
 #25672  by jllort
 
I've tested with another XML file and it seems to work; you can see it at http://demo.openkm.com/OpenKM/index.jsp ... 59da2612f3

I took a look at your XML and it seems to be a signed document. I suspect it contains an error that makes the parser throw an exception while it walks through the XML. Basically, the XML extractor goes through all the nodes and extracts only the values and attributes, not the XML tags. If it finds an error, it stops. I think that is what is happening.
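
To illustrate the behaviour described above, here is a minimal sketch in Python. It is not the Jackrabbit XMLTextExtractor code, only an assumed approximation of the idea: collect element values and attribute values, skip the tag names, and give up on the first parse error. The names TextCollector and extract_text, and the tipo attribute in the usage lines, are hypothetical.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects element text and attribute values, ignoring tag names."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        # Attribute values are kept; the tag name itself is not.
        self.parts.extend(attrs.values())

    def characters(self, content):
        # Skip whitespace-only chunks (indentation between tags).
        text = content.strip()
        if text:
            self.parts.append(text)


def extract_text(xml_bytes):
    """Return the indexable text, or "" if the parser rejects the input."""
    handler = TextCollector()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except xml.sax.SAXParseException:
        # A single parse error discards everything, so nothing is indexed.
        return ""
    return " ".join(handler.parts)


# Hypothetical inputs: one well-formed fragment, one with a missing closing tag.
good = b'<veicTransp tipo="proprio"><placa>BXI1717</placa><UF>SP</UF></veicTransp>'
broken = b'<NFe><placa>BXI1717</placa>'
print(repr(extract_text(good)))
print(repr(extract_text(broken)))
```

In this sketch a malformed document yields no text at all, which would match the symptom of a file that is stored but never found by search.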
 #25682  by pavila
 
I have found a couple of errors in the NFe_falhaSchema.xml document:

- It has several spaces before "<?xml version="1.0" encoding="utf-8"?>" and the XML parser does not like that.
- The ending "</NFe>" is missing and the XML parser can't validate it.
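
Both problems can be reproduced with any standard XML parser before uploading a file. The following is a minimal sketch using Python's built-in parser, not OpenKM's own extractor; the function name parse_error and the tiny sample documents are made up for illustration and are not the contents of NFe_falhaSchema.xml.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_bytes):
    """Return the parser's error message, or None if the input is well-formed."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


DECLARATION = b'<?xml version="1.0" encoding="utf-8"?>'

# Case 1: spaces before the XML declaration.
leading_spaces = b"   " + DECLARATION + b"<NFe><UF>SP</UF></NFe>"
# Case 2: the closing </NFe> tag is missing.
missing_close = DECLARATION + b"<NFe><UF>SP</UF>"
# Repaired document: declaration at the very start, root element closed.
repaired = DECLARATION + b"<NFe><UF>SP</UF></NFe>"

for label, doc in [("leading spaces", leading_spaces),
                   ("missing close", missing_close),
                   ("repaired", repaired)]:
    print(label, "->", parse_error(doc))
```

A file that passes this kind of check is at least well-formed, so a remaining indexing failure would then have to come from somewhere else.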
