Your Data in EROMM Web Search

EROMM Web Search serves as a fallback for data that does not fit into EROMM Classic because the descriptions of the reformatted items are not detailed enough (see Your Data in EROMM Classic for details).

EROMM Web Search acquires data in two ways:

harvesting OAI-sources
extracting text from webpages (the classic search engine approach)

We prefer to get the data via OAI-PMH if possible, because it allows better display and retrieval. Extracting text from webpages should be seen as a last resort for inclusion.

OAI-PMH

EROMM Web Search harvests metadata only in Dublin Core (“oai_dc”), a format that every OAI-PMH source is required to support (for other formats see Your Data in EROMM Classic). It uses a selection of the DC-elements for search and display and the dc:identifier element for linking. Except for dc:title, all listed elements are optional, and their inclusion depends on the data in the OAI-source.
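To illustrate what an oai_dc record offers a harvester, here is a minimal sketch that reads the Dublin Core elements out of one record. The sample record, the function name parse_oai_dc and the dictionary layout are assumptions made for this example; they do not describe EROMM's actual indexer.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; real records come from the OAI-source.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>1899</dc:date>
  <dc:identifier>https://example.org/item/1</dc:identifier>
</oai_dc:dc>
"""


def parse_oai_dc(xml_text):
    """Return a dict mapping names such as 'dc:title' to lists of values."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        text = (child.text or "").strip()
        # Keep only non-empty elements from the Dublin Core namespace.
        if child.tag.startswith("{" + DC_NS + "}") and text:
            name = "dc:" + child.tag.split("}", 1)[1]
            record.setdefault(name, []).append(text)
    return record


record = parse_oai_dc(SAMPLE_RECORD)
print(record["dc:title"])
```

Every element maps to a list because Dublin Core elements are repeatable; optional elements that a source does not deliver are simply absent from the dictionary.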

Search

The following elements may be used for the search-index:

dc:title
dc:creator
dc:contributor
dc:date
dc:publisher
dc:relation
dc:description
dc:source
dc:subject
dc:identifier

EROMM Web Search combines these elements into one index-field which is used for retrieval. A query on a single element is not possible.
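The effect of a single combined index-field can be sketched as follows. The record layout (a dict of element names to value lists) and the function name build_index_field are assumptions for illustration only.

```python
# Elements listed above as candidates for the search-index.
SEARCH_ELEMENTS = [
    "dc:title", "dc:creator", "dc:contributor", "dc:date", "dc:publisher",
    "dc:relation", "dc:description", "dc:source", "dc:subject",
    "dc:identifier",
]


def build_index_field(record):
    """Join the values of all search elements into one text field."""
    parts = []
    for name in SEARCH_ELEMENTS:
        parts.extend(record.get(name, []))
    return " ".join(parts)


example = {
    "dc:title": ["Example title"],
    "dc:creator": ["Doe, Jane"],
    "dc:format": ["image/jpeg"],  # not a search element, so it is left out
}
print(build_index_field(example))
```

Because all values end up in one field, a query term matches regardless of which element it came from, which is why a search restricted to, say, dc:creator is not possible.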

Display

A smaller selection of the above elements can also be used to create the display of the results:

dc:title
dc:creator
dc:contributor
dc:date
dc:publisher
dc:identifier

If the dc:format element holds a MIME type, it will be used to create an icon indicating the format of the described reformatted item. Other content of dc:format is ignored.
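One possible way to decide whether a dc:format value is a MIME type is sketched below. The check against the registered top-level media types and the use of the top-level type as an icon name are assumptions for this example, not EROMM's documented behaviour.

```python
# Top-level media types registered with IANA.
TOP_LEVEL_TYPES = {
    "application", "audio", "example", "font", "image",
    "message", "model", "multipart", "text", "video",
}


def icon_for_format(values):
    """Return a hypothetical icon name for the first MIME type found, else None."""
    for value in values:
        candidate = value.strip().lower()
        top_level, slash, subtype = candidate.partition("/")
        # A MIME type has the shape type/subtype with a known top-level type.
        if slash and subtype and " " not in subtype and top_level in TOP_LEVEL_TYPES:
            return top_level
    return None


print(icon_for_format(["24 pages", "image/jpeg"]))
print(icon_for_format(["24 x 30 cm"]))
```

Values such as page counts or dimensions fail the check and are skipped, which mirrors the rule that other content of dc:format is ignored.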

Linking

EROMM Web Search's indexer evaluates the dc:identifier element to create links. Usually OAI-sources hold a direct link or a persistent identifier (e.g. DOI, URN, Handle) in this element. EROMM Web Search can handle both, but will always prefer a persistent identifier (if an OAI-source offers both, only the persistent identifier will be used).

If a source does not offer any link in the dc:identifier field, or a record for some reason doesn't have a link, the result in EROMM Web Search will point to the “start / search page” of the source.

EROMM Web Search can also handle multiple links for a record. They will all be presented to the user after clicking on the result.
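The linking rules above can be sketched as a small selection function. The prefixes and resolver host names used to recognise persistent identifiers are assumptions chosen for this example; the actual heuristics of the indexer are not documented here.

```python
# Assumed markers of persistent identifiers (DOI, URN, Handle).
PERSISTENT_PREFIXES = ("doi:", "urn:", "hdl:")
RESOLVER_HOSTS = ("doi.org/", "hdl.handle.net/", "nbn-resolving.")


def is_persistent(identifier):
    """Heuristically decide whether an identifier is a persistent one."""
    value = identifier.strip().lower()
    return value.startswith(PERSISTENT_PREFIXES) or any(
        host in value for host in RESOLVER_HOSTS
    )


def select_links(identifiers, start_page):
    """Prefer persistent identifiers, then direct links, then the start page."""
    persistent = [i for i in identifiers if is_persistent(i)]
    if persistent:
        return persistent
    direct = [
        i for i in identifiers
        if i.strip().lower().startswith(("http://", "https://"))
    ]
    return direct or [start_page]


links = select_links(
    ["https://example.org/item/1", "urn:nbn:de:example-123"],
    "https://example.org/",
)
print(links)
```

The function returns a list, since a record may carry several links that are all offered to the user.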

OAI-Sets

EROMM Web Search harvests records on a set basis. Ideally there is one set (or the whole source) which includes all items in the scope of EROMM. Please tell us which sets you want to have included in EROMM Web Search when you submit your source.
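A set-based harvest corresponds to an OAI-PMH ListRecords request with the optional set argument. The sketch below builds such a request URL; the base URL and the set name are placeholders, and the helper list_records_url is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        # Restrict the harvest to one set; omit to harvest the whole source.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url("https://example.org/oai", "eromm"))
print(list_records_url("https://example.org/oai"))
```

You can use requests of this form to check which records a given set of your source would deliver.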

Websites

EROMM Web Search extracts text from a website and uses it for search and display. It can crawl through a website and follow only links that match certain patterns. These patterns are set manually for each source.
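Pattern-based link following might look like the sketch below. It assumes the patterns are regular expressions applied to absolute URLs; the pattern syntax actually used for a source is not specified here.

```python
import re
from urllib.parse import urljoin


def links_to_follow(page_url, hrefs, patterns):
    """Resolve hrefs against the page URL and keep those matching a pattern."""
    compiled = [re.compile(p) for p in patterns]
    selected = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        matches = any(p.search(absolute) for p in compiled)
        if matches and absolute not in selected:
            selected.append(absolute)
    return selected


followed = links_to_follow(
    "https://example.org/list/",
    ["../item/1", "/about", "https://example.org/item/2", "../item/1"],
    [r"/item/\d+$"],
)
print(followed)
```

Restricting the crawl in this way keeps navigation, imprint and similar pages out of the index.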

All the text of a webpage goes into the search index. In the result display EROMM Web Search presents a snippet from the text around the search term(s).
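A snippet of this kind can be approximated as shown below. The window width and the function name snippet are assumptions; the real result display may choose its excerpt differently.

```python
def snippet(text, term, width=40):
    """Return the text surrounding the first occurrence of term, or None."""
    position = text.lower().find(term.lower())
    if position == -1:
        return None
    start = max(0, position - width)
    end = min(len(text), position + len(term) + width)
    return text[start:end]


page_text = "a" * 100 + "microfilm" + "b" * 100
print(snippet(page_text, "Microfilm", width=5))
```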

Please note that EROMM Web Search can only extract text from HTML code; text generated by scripts or embedded in Flash, images and similar “objects” cannot be used. Furthermore, EROMM Web Search respects the Robots exclusion standard and will not crawl excluded pages.
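Both restrictions can be demonstrated with Python's standard library. In this sketch, TextExtractor, extract_text and allowed are hypothetical names, and the user agent "ExampleBot" is a placeholder rather than the name the EROMM crawler actually sends.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside images or plug-in objects never reaches this method.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the plain text found in the HTML markup itself."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def allowed(robots_txt, user_agent, url):
    """Check a URL against the rules of a robots.txt file given as text."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


html = ('<html><body><h1>Title</h1><script>document.write("x")</script>'
        '<p>Body <img src="a.png" alt="pic"> text</p></body></html>')
robots = "User-agent: *\nDisallow: /private/"
print(extract_text(html))
print(allowed(robots, "ExampleBot", "https://example.org/private/a"))
```

If important descriptions on your pages only appear through scripts or inside images, they will be missing from the extracted text in the same way.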

Last modified: 2018-01-04, 16:33
