Description of "Transform a CSV file whose separator is a semicolon into a SKOS-XML file" service
This transformation generates a SKOS file from a spreadsheet (Excel, LibreOffice, etc.) saved as a CSV file that uses the semicolon as separator.
The input file must:
- use the indicated labels for the different fields (see below),
- have the semicolon as field separator,
- use the separator «§§» for multi-valued fields (example: hormone§§drug),
- use double quotation marks (") as the text delimiter for fields that contain semicolons as punctuation. Put quotation marks around such fields so that the text is not split at the semicolon. If the text itself contains quotation marks, they must be doubled.
If the file is generated from LibreOffice, use the "Save As" menu and choose the semicolon as separator.
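The input rules above can be checked locally before submitting a file. The following is a minimal sketch using Python's standard csv module; the sample data and the helper names read_rows and split_values are illustrative assumptions and are not part of the service.

```python
import csv
import io

MULTI_VALUE_SEPARATOR = "§§"

# Hypothetical sample input, written for this sketch only.
SAMPLE = (
    'prefLabel_en;altLabel_en;definition_en\n'
    'insulin;hormone§§drug;"A peptide hormone; it regulates ""blood sugar""."\n'
)


def read_rows(text):
    """Parse semicolon-separated CSV text into a list of dicts keyed by header label."""
    # Doubled quotes inside a quoted field are unescaped by the csv module by default.
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    return list(reader)


def split_values(cell):
    """Split a multi-valued cell on the §§ separator, dropping empty values."""
    return [value.strip() for value in cell.split(MULTI_VALUE_SEPARATOR) if value.strip()]


rows = read_rows(SAMPLE)
print(split_values(rows[0]["altLabel_en"]))
print(rows[0]["definition_en"])
```

The quoted definition keeps its semicolon inside a single field, and the doubled quotation marks are read back as single ones.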
Example of CSV file:
The data is transformed as follows:
- An XML file is created to hold the entire terminological resource.
- Each line except the first becomes a "skos:Concept". If an identifier is present, it is assigned to the concept; otherwise, a temporary URI is assigned to it in the "rdf:about" attribute.
- The labels in the first line are converted to their SKOS counterparts; for example, "prefLabel_en" becomes "skos:prefLabel" with an "xml:lang="en"" attribute.
- The content of each cell is put into the appropriate SKOS property. If the content is multi-valued, it is split into as many properties as values separated by the "§§" separator.
- The related and broader relationships are processed in two stages: first, a "skos:related" or "skos:broader" property is generated for each related or broader term; then, the URI of the concept corresponding to each term is placed in the "rdf:resource" attribute.
- If the file has groups, a "skos:Collection" is created for each group.
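The two-stage handling of broader and related terms described above can be sketched as follows. This is an illustration only, not the service's implementation: it assumes that a term is matched to a concept through its English preferred label, that unmatched terms are simply dropped, and that temporary URIs are built from the row number. The function names are hypothetical.

```python
BASE_URI = "http://www.mysite/vocabs/ABC"  # default root quoted in this documentation


def build_concepts(rows):
    """Stage 1: one concept per row; relations still hold terms, not URIs."""
    concepts = []
    for index, row in enumerate(rows, start=1):
        concepts.append({
            "uri": f"{BASE_URI}/{index}",  # assumed URI pattern, for illustration only
            "prefLabel": row["prefLabel_en"],
            "broader": [t for t in row.get("broader_en", "").split("§§") if t],
            "related": [t for t in row.get("related_en", "").split("§§") if t],
        })
    return concepts


def resolve_relations(concepts):
    """Stage 2: replace each term with the URI of the concept that carries it."""
    uri_by_label = {c["prefLabel"]: c["uri"] for c in concepts}
    for concept in concepts:
        for relation in ("broader", "related"):
            # Assumption: terms without a matching concept are dropped.
            concept[relation] = [
                uri_by_label[term] for term in concept[relation] if term in uri_by_label
            ]
    return concepts


rows = [
    {"prefLabel_en": "hormone", "broader_en": "", "related_en": ""},
    {"prefLabel_en": "insulin", "broader_en": "hormone", "related_en": "glucagon"},
    {"prefLabel_en": "glucagon", "broader_en": "hormone", "related_en": "insulin"},
]
concepts = resolve_relations(build_concepts(rows))
for concept in concepts:
    print(concept["prefLabel"], concept["broader"], concept["related"])
```

The second pass is needed because a row may refer to a term that is defined further down in the file, so the full label-to-URI index can only be built once every row has been read.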
In addition, the transformation inserts two blocks at the beginning of the XML file:
- A "cc:License" block with the default Creative Commons CC-BY 4.0 license, which should be changed if the resource is released under a different license.
- A "skos:ConceptScheme" block with:
  - a URI derived from the concept identifiers;
  - metadata properties to be completed or modified by the user in the output file:
    - English, French and Spanish titles (dc:title),
    - English, French and Spanish descriptions (dc:description),
    - English, French and Spanish subjects (dc:subject),
    - creator name (dc:creator),
    - license name (cc:license),
    - English, French and Spanish names of the organization or institution to which the resource must be attributed (cc:attributionName),
    - website of the organization or institution to which the resource must be attributed (cc:attributionURL),
    - top concepts (skos:hasTopConcept) if the resource is highly structured,
    - resource languages as calculated from the language tags of the preferred labels of the concepts (dcterms:language with lexvo/ISO 639-3 code attribute),
    - creation date (dcterms:created),
    - last modification date (dcterms:modified),
    - version (owl:versionInfo).
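As an illustration of how the resource languages could be derived from the preferred-label columns, here is a small sketch. It assumes that the languages are read from the column labels, that the two-letter tags are mapped to ISO 639-3 codes through a lookup table (only three entries are shown), and that the values are Lexvo URIs of the form shown in LEXVO_PREFIX. The exact output of the service may differ.

```python
# Partial lookup table, limited to the three languages named in this documentation.
ISO_639_1_TO_3 = {"en": "eng", "fr": "fra", "es": "spa"}
LEXVO_PREFIX = "http://lexvo.org/id/iso639-3/"  # assumed form of the attribute value


def resource_languages(header):
    """Collect the language tags of the prefLabel_xx columns as Lexvo URIs."""
    tags = sorted(
        {label.split("_", 1)[1] for label in header if label.startswith("prefLabel_")}
    )
    # Tags missing from the partial lookup table are skipped in this sketch.
    return [LEXVO_PREFIX + ISO_639_1_TO_3[tag] for tag in tags if tag in ISO_639_1_TO_3]


print(resource_languages(["prefLabel_en", "altLabel_en", "prefLabel_fr", "broader_en"]))
```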
If the concepts do not have identifiers, the default URI of the resource is "http://www.mysite/vocabs/ABC". It is also the root of the URIs of the concepts, relationships and any collections.
It must be replaced as follows:
- Replace "http://www.mysite/" with the correct URL.
- Keep "/vocabs/".
- Replace "ABC" with a short alphanumeric code that will identify the resource.
At the concept level, the URI is the concatenation of the resource's URI and a unique identifier; at the collection level, the URI is the concatenation of the resource's URI and the group name, with spaces replaced by "_".
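The URI rules above can be summarized in a short sketch. The function names are hypothetical, and the "/" placed between the resource URI and the identifier or group name is an assumption, since the text only speaks of a concatenation.

```python
def resource_uri(site_url, code):
    """Build the resource URI: site URL, then "/vocabs/", then the short code."""
    return f"{site_url.rstrip('/')}/vocabs/{code}"


def concept_uri(resource, identifier):
    """Concatenate the resource URI and a unique identifier ("/" separator assumed)."""
    return f"{resource}/{identifier}"


def collection_uri(resource, group_name):
    """Concatenate the resource URI and the group name, with spaces replaced by "_"."""
    return f"{resource}/{group_name.replace(' ', '_')}"


# "www.example.org" and "MED" are placeholder values chosen for this sketch.
root = resource_uri("http://www.example.org/", "MED")
print(root)
print(concept_uri(root, "42"))
print(collection_uri(root, "peptide hormones"))
```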
Example of a concept record:
Examples of collections:
To switch to ARK identifiers, use the transformation "Assign ARK identifiers to a valid SKOS/RDF-XML file".
List of labels to be used ("prefLabel" is mandatory):
Terminological data | Label to use (xx = 2-letter ISO 639-1 language code) | Comment |
Preferred label | prefLabel_xx | A "prefLabel_en" is expected. |
Alternative label | altLabel_xx | |
Hidden label | hiddenLabel_xx | |
Definition | definition_xx | |
Note | note_xx | |
Scope note | scopeNote_xx | |
Editorial note | editorialNote_xx | |
History note | historyNote_xx | |
Change note | changeNote_xx | |
Example | example_xx | |
Broader term | broader_xx | A "broader_en" is expected. |
Related term | related_xx | A "related_en" is expected. |
Group (collection) | group_xx | A "group_en" is expected. |
Exact match | exactMatch | |
Close match | closeMatch | |
Broad match | broadMatch | |
Narrow match | narrowMatch | |
Related match | relatedMatch | |
Replace "xx" with the two-letter ISO 639-1 language code, for example "prefLabel_en" for the English preferred label. See the list of ISO 639-1 codes.
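A header line can be checked against the list of labels before the file is submitted. The sketch below is an assumption about how such a check might look, not the service's own validation; parse_label and check_header are hypothetical names, and the language check only tests for two lowercase letters rather than for a registered ISO 639-1 code.

```python
# Labels that take a language suffix ("_xx") and labels that do not.
LANGUAGE_LABELS = {
    "prefLabel", "altLabel", "hiddenLabel", "definition", "note", "scopeNote",
    "editorialNote", "historyNote", "changeNote", "example", "broader",
    "related", "group",
}
MATCH_LABELS = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def parse_label(label):
    """Return (name, language) for a column label; language is None for match labels."""
    if label in MATCH_LABELS:
        return label, None
    name, separator, language = label.partition("_")
    if (
        name in LANGUAGE_LABELS
        and separator
        and len(language) == 2
        and language.isalpha()
        and language.islower()
    ):
        return name, language
    raise ValueError(f"Unknown column label: {label!r}")


def check_header(header):
    """Parse every label and make sure a preferred-label column is present."""
    parsed = [parse_label(label) for label in header]
    if not any(name == "prefLabel" for name, _ in parsed):
        raise ValueError("A prefLabel_xx column is mandatory")
    return parsed


print(check_header(["prefLabel_en", "altLabel_fr", "exactMatch"]))
```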