Indexing with error


We are in the process of indexing one of our journals in the national library of Spain (Fundación Dialnet). However, they got back to us about an error in our OJS, specifically a “source problem: it contains an illegal Unicode character, 0x4”:

> We have received authorization for the accommodation of the full text of the magazine digital object.

> However, we cannot begin the process of evaluating your journal for inclusion, since we have detected errors in your OJS that prevent us from consulting the article metadata.

> It is a problem with the source. It contains an illegal Unicode character, 0x4 (End-of-Transmission).

> Specifically it is a title for this record

How can I detect this character?

PS: the article, for example, is

Hi @Mauricio_Adriano,

Do you know how they’re harvesting the content? Is it via OAI, or web scraping, or something else?

Alec Smecher
Public Knowledge Project Team

Hi Alec, I think they are harvesting via OAI, because they attached the text below to their email:


<dc:title xml:lang="pt-BR">Comparação do fator de empilhamento sob diferentes condições para madeira de Eucalyptus grandisA comparison of wood piling factor under different conditions for  Eucalyptus grandis wood</dc:title>
<dc:creator>Lisboa, Gerson dos Santos; Universidade Estadual do Centro-Oeste-UNICENTRO</dc:creator>
<dc:creator>Dias, Andrea Nogueira; Universidade Estadual do Centro-Oeste-UNICENTRO</dc:creator>
<dc:creator>Valerio, Alvaro Felipe; Universidade Estadual do Centro-Oeste-UNICENTRO</dc:creator>
<dc:creator>Silvestre, Raul; Universidade Estadual do Centro Oeste</dc:creator>
<dc:subject xml:lang="pt-BR"></dc:subject>
<dc:subject xml:lang="pt-BR">Eucalyptus grandis; volume empilhado; teste de Tukey</dc:subject>
<dc:subject xml:lang="pt-BR"></dc:subject>
<dc:description xml:lang="pt-BR">O presente trabalho tem por objetivo comparar os fatores de empilhamento obtidos em três métodos distintos de empilhamento: 1) empilhamento em cima do caminhão; 2) empilhamento mecânico no pátio da fábrica e 3) empilhamento manual no pátio da fábrica. Os dados empregados são originários de um plantio de Eucalyptus grandis, visando à produção de celulose e pertencente ao GRUPO LWART, situado no município de Lençóis Paulista, SP, cortado aos sete anos de idade através do sistema semi-mecanizado. Foram analisados dados provenientes de vinte pilhas de madeira, formadas por toras de 2,80 m de comprimento e diâmetro mínimo de 6,0 cm. O fator de empilhamento de cada pilha (FE) foi obtido pela relação entre o volume da pilha em metros estéreos (st) e o correspondente volume sólido em metros cúbicos (m³). O volume sólido (m³) foi obtido pelo método de Smalian. Para calcular o volume estéreo (st) para três diferentes métodos de empilhamento foi utilizada uma régua graduada para medir a altura, a largura e o comprimento da pilha. Os três métodos de empilhamento foram comparados estatisticamente a partir de uma análise de variância, considerando um delineamento inteiramente casualizado, onde os métodos foram considerados tratamentos e as pilhas repetições. Na ocorrência de diferenças significativas entre tratamentos, o teste Tukey foi utilizado para comparar suas médias, considerando um nível de 5% de significância. A análise estatística indicou diferenças significativas entre o método de empilhamento manual ou tratamento 3, dos demais métodos, ou seja, método do empilhamento em cima do caminhão e método de empilhamento mecânico. Conclui-se então, que é errôneo aplicar um fator de empilhamento médio, se houver diferentes formas de empilhar a madeira. 
Abstract The research objective has been to compare the wood  piling factors  obtained from three distinct methods: 1 ) piling up on the truck; 2)  mechanical piling up of the wood at the factory patio; 3 ) manual piling  up at the factory patio. The examined data are from a Eucalyptus grandis  forest for cellulose production, which is owned by the GRUPO LWART,  located at the city of Lençóis Paulistas – São Paulo. The seven-yearold trees were cut through the semi-mechanized system. The data was  collected from twenty stacks of wood, made up with logs of 2 ,80 m  of length and a minimum diameter of 6,0 cm. The piling factor (PF)  of each stack was obtained from the relation between the stereometric  volume of the stack (st) and the correspondent volume in cubic meters  (m³). The solid volume (m³) was figured through the Smalian method.  In order to calculate the stereometric volume (st) of the three different  methods of piling up, a marked ruler was used to measure the height,  width and length of the stack. The three piling up methods were compared  statistically on the basis of variance analysis, considering a distribution  entirely at random, with the methods taken as treatments and the stacks as  repetitions. When significant differences between treatments were found,  the Tukey test was used to compare their averages, considering a mean  of 5%. The statistical analysis indicated significant differences between   the method of manual piling up (treatment 3 ) and the other methods,  namely, the piling up on the truck method and the mechanical piling up  method. It may be concluded that it is wrong to apply an average factor  of piling up when there are different forms of piling up the wood.</dc:description>
<dc:publisher xml:lang="pt-BR">Universidade Estadual do Centro-Oeste do Paraná, UNICENTRO</dc:publisher>
<dc:contributor xml:lang="pt-BR"></dc:contributor>
<dc:type xml:lang="pt-BR"></dc:type>
<dc:type xml:lang="pt-BR"></dc:type>
<dc:source xml:lang="pt-BR">AMBIÊNCIA; v. 5, n. 1 (2009): Ambiência; 81-91</dc:source>
<dc:coverage xml:lang="pt-BR"></dc:coverage>
<dc:coverage xml:lang="pt-BR"></dc:coverage>
<dc:coverage xml:lang="pt-BR"></dc:coverage>


Hi @Mauricio_Adriano,

It looks like there are some invalid characters in the abstract of that article. I’ve confirmed this via OAI:

For example, the abstract text contains:

three distinct methods: 1^D) piling up on the truck

The ^D is the end-of-transmission character.
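If it helps to locate such characters in a harvested record, here is a minimal Python sketch. It assumes the text is already available as a string (for example, pasted from the OAI response); the function name `find_illegal_xml_chars` is made up for illustration. Among the low control characters, XML 1.0 only allows tab, line feed and carriage return, so anything else below 0x20 is reported.

```python
import re

# Characters XML 1.0 does not allow: C0 controls other than tab, LF and CR,
# plus the noncharacters U+FFFE and U+FFFF.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_illegal_xml_chars(text, context=20):
    """Return (offset, code point, surrounding text) for each illegal character."""
    hits = []
    for match in ILLEGAL_XML_CHARS.finditer(text):
        start = max(0, match.start() - context)
        end = match.end() + context
        # repr() makes the otherwise invisible character show up as an escape.
        hits.append((match.start(), hex(ord(match.group())), repr(text[start:end])))
    return hits


# Sample built from the abstract fragment quoted above, with ^D written as \x04.
sample = "three distinct methods: 1\x04) piling up on the truck"
hits = find_illegal_xml_chars(sample)
for offset, codepoint, snippet in hits:
    print(offset, codepoint, snippet)
```

The same function can be run over each title and abstract in turn to see which field is affected.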

I’m not sure how this content would have gotten in – via XML import? Or copy/paste from a word processor?

One option would be to look for this character in the article_settings table in the database, for setting_name='abstract'. But I’d suggest investigating the source before doing that.
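As a rough sketch of that database check, the Python below filters rows and proposes cleaned values. It is an assumption-laden illustration, not a ready-made fix: it assumes the rows were fetched with whatever database driver you use, from a query along the lines of `SELECT article_id, locale, setting_value FROM article_settings WHERE setting_name = 'abstract'` (column names may differ between OJS versions), and the helper name `affected_rows` is hypothetical.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def affected_rows(rows):
    """Yield (article_id, locale, cleaned_value) for rows whose value needs cleaning."""
    for article_id, locale, value in rows:
        if value and ILLEGAL_XML_CHARS.search(value):
            yield article_id, locale, ILLEGAL_XML_CHARS.sub("", value)


# Hypothetical rows standing in for a real query result.
rows = [
    (101, "pt_BR", "methods: 1\x04) piling up"),
    (102, "pt_BR", "a clean abstract"),
]
fixes = list(affected_rows(rows))
for article_id, locale, cleaned in fixes:
    print(article_id, locale, repr(cleaned))
```

It would be wise to review the reported rows and back up the database before writing any cleaned values back.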

Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
We are getting a similar validation error while trying to register our content in WorldCat through OAI. For every record, the Resource Type field (info:eu-repo/semantics/article) produces the following error: “Record contains validation errors: Too short”.

Hi @pcansf,

I’m afraid I’m not sure which data it considers too short. You’d probably need to ask WorldCat what their length check is being applied to.

Alec Smecher
Public Knowledge Project Team

Thank you. We have opened a support ticket with them to resolve this. I will report the findings as soon as I have them.