Discussion:
xercesc dom input structure size limit, RAM limit?
thomak
2006-08-24 10:39:26 UTC
Hi,

I tried to read large XML files using DOM on HP, using version .27.
Files up to 20 MB work fine, but there is a limit that I hit
when the structure gets bigger, which mostly means the file is bigger
too.
Problem:
During reading, the reader (see below) stops and throws an exception.
This happens when the process reaches about 674 MB of used RAM.
I tried it on different machines and it is always the same.
Is there a memory limit for Xerces?

Thanks and greetings

Thomas

// Configure validation and DTD handling
parser->setValidationScheme(XercesDOMParser::Val_Auto);
parser->setDoNamespaces(false);
parser->setDoValidation(true);
parser->setLoadExternalDTD(false);

// Install an error handler
ErrorHandler* errHandler = (ErrorHandler*) new HandlerBase();
parser->setErrorHandler(errHandler);

try {
    INFO("XML file to parse[" << xmlFile << "]\n");
    parser->parse(xmlFile.c_str());
--
View this message in context: http://www.nabble.com/xercesc-dom-input-structure-size-limit%2C-RAM-limit--tf2157920.html#a5961473
Sent from the Xerces - C - Users forum at Nabble.com.
Ciaran McHale
2006-08-24 10:45:24 UTC
Hi Thomas,
Post by thomak
I tried to read large XML files using DOM on HP, using version .27.
Files up to 20 MB work fine, but there is a limit that I hit
when the structure gets bigger, which mostly means the file is bigger
too.
During reading, the reader (see below) stops and throws an exception.
This happens when the process reaches about 674 MB of used RAM.
I tried it on different machines and it is always the same.
Is there a memory limit for Xerces?
I don't know whether or not Xerces has any limits in the size of files
that it can process. However...

Perhaps the problem is not in Xerces but rather in the amount of virtual
memory available to your process. Type "man ulimit" to look at the manual
page for the "ulimit" command. In particular, your system administrator
might have placed a limit on the amount of virtual memory available to
applications. You can check for this with "ulimit -v". From memory, I
think the command to unset this limit is "ulimit -v unlimited". Another
possibility might be that the entire computer is running out of virtual
memory. If the "top" command is available on your computer then run it in
one shell window while running your application in another window, and
look at the details in "top" regarding the amount of virtual memory or
swap space that is available. If this shrinks close to zero then lack of
virtual memory is your problem.
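
The same limits that "ulimit" reports can also be read from inside the process, which helps when the program is started from a different shell or by a wrapper script. The following is only a sketch, assuming a POSIX system that provides getrlimit(); the helper names formatLimit, describeLimit and describeMemoryLimits are made up for this example, and RLIMIT_AS is guarded because not every platform defines it.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_* (POSIX)
#include <sstream>
#include <string>

// Format one limit value; RLIM_INFINITY is what "ulimit" shows as "unlimited".
std::string formatLimit(rlim_t value) {
    if (value == RLIM_INFINITY) {
        return "unlimited";
    }
    std::ostringstream out;
    out << static_cast<unsigned long long>(value / 1024) << " kB";
    return out.str();
}

// Describe the soft and hard limit of one resource, e.g. RLIMIT_DATA.
std::string describeLimit(const char* name, int resource) {
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0) {
        return std::string(name) + ": not available";
    }
    return std::string(name) + ": soft=" + formatLimit(lim.rlim_cur) +
           ", hard=" + formatLimit(lim.rlim_max);
}

// Collect the limits most relevant to building a large DOM tree.
std::string describeMemoryLimits() {
    std::string report;
    report += describeLimit("data", RLIMIT_DATA) + "\n";
    report += describeLimit("stack", RLIMIT_STACK) + "\n";
#ifdef RLIMIT_AS
    report += describeLimit("address space", RLIMIT_AS) + "\n";
#endif
    return report;
}
```

Writing the result of describeMemoryLimits() to the log just before the parse call shows the limits the parser process really runs under.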


Regards,
Ciaran.
--
Ciaran McHale, Ph.D. Email: ***@iona.com
Principal Consultant Tel: +44-(0)7866-416-134 (mobile)
IONA Technologies, UK Tel: +44-(0)118-954-6632 (home office)
Fax: +44-(0)118-954-6767
thomak
2006-08-24 11:20:15 UTC
Hi Ciaran,

Thank you for your hints.
I ran "ulimit -a" and got:

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 4293918720
stack(kbytes) 392192
memory(kbytes) unlimited
coredump(blocks) 4194303

I used "top" to check the RAM usage. The problem seems to depend on xercesc,
because there is enough virtual memory, and a background process
using 100 MB doesn't change the result: an unknown exception
when reading the XML once 672-674 MB are used.
Could it be the stack? I am not sure.

Greetings

Thomas
Martin Harm
2006-08-24 12:10:20 UTC
Hi Thomas,

I've got the same problem, and it seems to me that it is *not* a problem
of ulimit. Xerces just fails in my task with XML files > 50-60 MB and
a ulimit of ~2 GB.
Another effect is that the memory consumed by the DOM parser was approximately 7 * the XML file size.

Well, and if XML files grow and grow ...
My solution was to use the SAX API for XML processing of these large files
(with some, hm, nice? helper classes to have more convenient access to parts
of the XML document).
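
The ratio mentioned above can be turned into a quick check before a file is handed to the DOM parser. This is only a rough sketch: the factor of 7 is the value observed in this post, not a Xerces guarantee, and it will vary with the document structure; estimateDomBytes and chooseParser are hypothetical helper names, and the memory budget is whatever the caller decides.

```cpp
#include <cstdint>
#include <string>

// Rough DOM footprint: file size times an empirical expansion factor.
// The default of 7 is the ratio reported in this thread, not a fixed rule.
std::uint64_t estimateDomBytes(std::uint64_t fileBytes, unsigned factor = 7) {
    return fileBytes * factor;
}

// Pick DOM when the estimate fits the caller's memory budget,
// otherwise fall back to a streaming (SAX) path.
std::string chooseParser(std::uint64_t fileBytes, std::uint64_t budgetBytes) {
    return estimateDomBytes(fileBytes) <= budgetBytes ? "DOM" : "SAX";
}
```

For example, with a budget of 674 MB a 100 MB file would be routed to SAX, because the estimate is 700 MB. Measuring the real factor on representative files is more reliable than trusting the default.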

ciao
martin


Vitaly Prapirny
2006-08-25 07:37:53 UTC
Hi,
Post by thomak
During reading, the Reader (see below) quit reading returning exception.
What kind of exception was thrown?

Good luck!
Vitaly
thomak
2006-08-28 08:21:28 UTC
Hi,
These exception handlers are defined (see below),
and the catch(...) branch is the one that is hit.
Thomas

} catch (xercesc::XMLException& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, buf.c_str());
} catch (const xercesc::DOMException& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
} catch (const DcError& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
} catch (...) {
    delete errHandler;
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
}
Vitaly Prapirny
2006-08-28 10:02:03 UTC
Hi,
Post by thomak
These exception handlers are defined (see below),
and the catch(...) branch is the one that is hit.
Thomas
} catch (xercesc::XMLException& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, buf.c_str());
} catch (const xercesc::DOMException& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
} catch (const DcError& e) {
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
} catch (...) {
    delete errHandler;
    throw DcError(EMPTY_OR_NOT_VALID_XML_DOC, message.c_str());
}
Could you try adding xercesc::SAXException and std::exception to your
catch list, or use a debugger to narrow down exactly which exception was
thrown?
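
The idea of widening the catch list can be sketched with standard exceptions only. In the snippet below, classifyFailure and the throw* functions are hypothetical stand-ins for the real parse call; the point is just the ordering, with specific handlers first and catch(...) last. In the real code, a handler for xercesc::SAXException would go in the same list, and, as far as I recall, Xerces-C also has an xercesc::OutOfMemoryException that is not derived from XMLException, so it may be worth checking the headers of the installed version for that one too.

```cpp
#include <exception>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::runtime_error
#include <string>

// Run a callable and report which handler caught the failure.
// Specific handlers come first; catch (...) is the last resort.
template <typename Fn>
std::string classifyFailure(Fn parseStep) {
    try {
        parseStep();
        return "ok";
    } catch (const std::bad_alloc&) {
        return "std::bad_alloc";
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        return "unknown exception";
    }
}

// Stand-ins for a parse call (hypothetical, for illustration only).
void throwBadAlloc() { throw std::bad_alloc(); }
void throwRuntime() { throw std::runtime_error("parse failed"); }
void throwInt() { throw 42; }
void succeed() {}
```

Logging the returned string instead of rethrowing a generic error would show whether the failure at about 674 MB is an allocation failure or something else.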

Good luck!
Vitaly
thomak
2006-08-30 10:30:31 UTC
Hi,

I tested the parser on several machines (HP 10.20 and 11.11)
and came to the following conclusion.
The parser works:
if compiled on 10.20 and executed on 10.20
if compiled on 11.11 as a 64-bit version and executed on 11.11 (64-bit
machine)

It doesn't work if compiled on 10.20, or on 11.11 as a 32-bit version, and started
on 11.11
(unexpected exception when the used RAM reaches about 674 MB).

So I compiled the parser as a 64-bit version and ran it on 11.11, which
solved my problem.
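
Since the outcome depended on whether the binary was built as 32-bit or 64-bit, it can help to let the program report this itself. A minimal sketch using only standard C++; pointerBits and buildKind are made-up names, and the pointer width only tells which kind of build is running, not how much of the address space the operating system actually allows the process to use.

```cpp
#include <climits>  // CHAR_BIT
#include <string>

// Width of a data pointer in bits for this build (32 or 64 on common platforms).
unsigned pointerBits() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// Human-readable label for log output.
std::string buildKind() {
    return pointerBits() >= 64 ? "64-bit build" : "32-bit build";
}
```

Printing buildKind() at startup makes it easy to confirm which variant is deployed on a given machine.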

Greetings

Thomas