Golden Rules for Repository Managers

We index all kinds of academically relevant resources (journals, institutional repositories, digital collections etc.) that provide an OAI interface and expose their contents via OAI-PMH (learn more about OAI at the Open Archives Initiative or Wikipedia). If your source does not provide an OAI interface, upload your documents to aggregators like DataCite or Zenodo or to subject repositories like RePEc, or add your open access journal to DOAJ. We index these sources regularly.

However, the best way to get your documents indexed by BASE is to provide an OAI interface. We have compiled some golden rules that may help you optimize it. If your OAI interface complies with these rules, we can index your source quickly and smoothly, and data from your source will be presented completely and in the best possible way.

You can check some of the following items using our OAI-PMH validator OVAL.

  • OAI interface working
    Your OAI interface is stable and responds to requests. ListRecords delivers results without timeout or other issues, e.g. an XML error.
    Otherwise, it is not possible to index your source.

  • Comprehensive metadata
    Each item exposed via your OAI interface provides metadata as comprehensive as possible (title, author, abstract, publication date) using the info-eu-repo vocabulary.
    If important metadata is missing, documents from your source will be difficult to find in BASE. Using the info-eu-repo vocabulary makes sure that we can process and display hits from your source in the best way.

  • Identifier (URLs) working
    Each item provides a URL in the <dc:identifier> field beginning with http or https. This identifier leads to the frontdoor of the document or directly to the fulltext (PDF). If the fulltext is not provided in a common file format like HTML or PDF, the identifier should lead to the frontdoor. When using DSpace, the handle must be configured properly; otherwise the identifier will lead to a dummy URL (123456789) that won't resolve.
    Only documents whose identifiers begin with http(s) and do not contain a dummy URL will be indexed.

  • Providing access information (Open Access)
    Access information for the fulltext should be provided in the field <dc:rights> of each item according to the info-eu-repo vocabulary. Alternatively, open access documents are provided in a separate set (OA set). The name of this set is listed in each metadata record in the field setSpec.
    If correct access information is missing, this information cannot be found in BASE, and search and refinement on "access" will not work properly for your source.

  • Providing information concerning re-use / licence (Creative Commons)
    Authors can publish their work under a Creative Commons licence in your repository. You expose the chosen licence in your OAI interface within an additional <dc:rights> field, e.g. <dc:rights></dc:rights>.
    If this information is missing, search and refinement on "re-use" will not work properly in BASE for your source.

  • Character encoding
    All content exposed via your OAI interface (title, creator, abstracts) is encoded in UTF-8.
    Other encodings or double encodings may cause hits from your source to be displayed incorrectly.

  • Publication date
    The publication year or publication date is provided in the field <dc:date> in ISO 8601 (YYYY-MM-DD, e.g. 2016-04-01 for the 1st of April, 2016) according to the Gregorian (western) calendar. The field <dc:date> should only be used once.
    If you do not provide correct publication dates, refining or sorting by publication year will not work properly for your source in BASE.

  • Document language
    The language of a document is provided in the field <dc:language> in ISO 639 (2 or 3 letter code, e.g. en or eng for English).
    If you do not provide correct language information, this information cannot be found in BASE, and search and refinement on "language" will not work properly for your source.

  • Source / Suggested citation
    The source or suggested citation of an item (e.g. the journal's name, volume and issue for a journal article) is provided in <dc:source>.
    These details allow a better retrieval of your documents.

  • Items per page
    Every ListRecords response includes between 50 and 1000 items. The resumptionToken works and delivers the next 50-1000 items.
    Fewer than 50 items per ListRecords response will increase the number of calls while we are harvesting your source. More than 1000 items per ListRecords response will produce large files and increase the risk that the harvesting process terminates. If the resumptionToken is not working properly, indexing is impossible.

  • Contact person
    The response to the Identify request of your OAI interface includes the field adminEmail, which contains the active e-mail address of a technical admin. The homepage of your source gives the e-mail address of the content provider.
    Providing this information makes it possible to contact you in case of questions or issues concerning harvesting and indexing your source.

  • Changes / Updates
    Changes to the base URL of your OAI interface, to the repository software or to the name of your repository should be reported via our contact form.
    We are checking and correcting all sources from time to time. If you report changes directly, you can ensure that your source will be completely and correctly indexed by BASE. We will pass on this information to the global community via our OAI PMH blog.

  • Spread the word!
    Register your source in OAI registries like OpenDOAR, ROAR or Openarchives and update any changes in the registries.
    Make your source and your interfaces known to the community and consider allowing other search engines to index documents from your source.
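
The record-level rules above (identifier, access information, publication date, language) can be checked mechanically before you expose a record. The following is only a minimal sketch in Python and not the check that BASE or OVAL performs. The function name check_record, the sample record and its example.org URL are hypothetical. Accepting YYYY and YYYY-MM alongside YYYY-MM-DD is an assumption, the date and language tests only look at the shape of the value, and the dummy-handle test is a simple heuristic.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of unqualified Dublin Core elements as used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Access terms of the info-eu-repo vocabulary.
ACCESS_TERMS = {
    "info:eu-repo/semantics/openAccess",
    "info:eu-repo/semantics/embargoedAccess",
    "info:eu-repo/semantics/restrictedAccess",
    "info:eu-repo/semantics/closedAccess",
}

# Shape checks only: YYYY, YYYY-MM or YYYY-MM-DD (assumption), 2-3 letter code.
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
LANG_RE = re.compile(r"^[a-z]{2,3}$")


def check_record(oai_dc_xml):
    """Return a list of problems found in one oai_dc record (hypothetical helper)."""
    root = ET.fromstring(oai_dc_xml)

    def values(field):
        # All non-null text values of a given dc:* element, whitespace stripped.
        return [(el.text or "").strip() for el in root.iter(DC + field)]

    problems = []

    urls = [v for v in values("identifier")
            if v.startswith(("http://", "https://"))]
    if not urls:
        problems.append("no dc:identifier starting with http(s)")
    elif any("/123456789/" in u for u in urls):
        # Heuristic for an unconfigured DSpace handle prefix.
        problems.append("dc:identifier contains the dummy handle 123456789")

    if not any(v in ACCESS_TERMS for v in values("rights")):
        problems.append("no info-eu-repo access term in dc:rights")

    dates = values("date")
    if len(dates) != 1:
        problems.append("expected exactly one dc:date, found %d" % len(dates))
    elif not DATE_RE.match(dates[0]):
        problems.append("dc:date is not in ISO 8601 form")

    langs = values("language")
    if not langs or not all(LANG_RE.match(v) for v in langs):
        problems.append("dc:language is missing or not a 2 or 3 letter code")

    return problems


# Hypothetical sample record, for illustration only.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <dc:identifier>https://repository.example.org/record/42</dc:identifier>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:date>2016-04-01</dc:date>
  <dc:language>eng</dc:language>
</oai_dc:dc>"""

# An empty list means that none of the checks above found a problem.
print(check_record(SAMPLE))
```

A check like this could be run over the records returned by your own ListRecords responses. It does not cover the interface-level rules (stability, items per page, resumptionToken), which need requests against the live OAI interface.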