OHS Open Harvester Doesn't Harvest arXiv.org and CERN Document Server

Hello.
OHS doesn’t harvest these two archives, even though both base URLs are working.
The problem doesn’t arise with other archives.
Could the cause be the output format of CERN’s and arXiv’s servers?

ccjourna@node01 [~/public_html/lor/tools]# php harvest.php 7 verbose
Selected archive: arXiv
Fetching records...
Harvest URL: http://export.arxiv.org/oai2?verb=ListRecords&metadataPrefix=oai_dc
Finished:
    0 records indexed
    3 seconds elapsed
    0.00 records per second
    0 records kept from past harvests
    0 records total.

ccjourna@node01 [~/public_html/lor/tools]# php harvest.php 9 verbose
Selected archive: CERN Document Server
Fetching records...
Harvest URL: http://cdsweb.cern.ch/oai2d?verb=ListRecords&metadataPrefix=oai_dc
Finished:
    0 records indexed
    2 seconds elapsed
    0.00 records per second
    0 records kept from past harvests
    0 records total.

Any assistance would be appreciated - thanks, Alex

Hi @AVM,

The arXiv OAI-PMH endpoint seems to use pretty heavy connection throttling; you might want to set a throttling_delay in your config.inc.php configuration file, and avoid making too many requests via the web interface in a short interval.
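
To check whether throttling is what you are hitting, independently of OHS, a small script along these lines might help. It is only a sketch, and it assumes the server signals throttling with an HTTP 503 response and a Retry-After header, which the OAI-PMH specification allows for flow control. The names fetch_oai and retry_after_seconds and the 20-second fallback are made up for illustration; this is not how OHS itself performs the harvest.

```python
import time
import urllib.error
import urllib.request


def retry_after_seconds(headers, default=20):
    """Return the wait from a Retry-After header, or `default` (an assumed
    fallback) if the header is missing or is not a plain number of seconds."""
    value = headers.get("Retry-After")
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return default


def fetch_oai(url, max_attempts=5, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch an OAI-PMH URL, waiting and retrying while the server answers 503."""
    for _ in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # some other HTTP error: not throttling, so do not retry
            sleep(retry_after_seconds(err.headers))
    raise RuntimeError(f"still throttled after {max_attempts} attempts: {url}")


if __name__ == "__main__":
    body = fetch_oai(
        "http://export.arxiv.org/oai2?verb=ListRecords&metadataPrefix=oai_dc"
    )
    print(len(body), "bytes received")
```

If the script has to wait one or more times before it gets a response, that would be consistent with the harvester giving up on a throttled reply and reporting 0 records.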

Regards,
Alec Smecher
Public Knowledge Project Team