Metadata guidelines for the UK RDTF

Consultation summary

(See consultation document: http://rdtfmetadata.jiscpress.org)

About

(See consultation document section: http://rdtfmetadata.jiscpress.org/about/)

NOTE: THE COMMENT PERIOD HAS CLOSED - YOU CAN STILL LEAVE COMMENTS HERE IF YOU LIKE BUT WE MAY NOT BE ABLE TO TAKE ACCOUNT OF THEM IN THE FINAL GUIDELINES DOCUMENT.

This DRAFT document provides a set of guidelines for how metadata associated with library, museum and archival collections should be made available for the purposes of supporting resource discovery in line with the JISC/RLUK Resource Discovery Taskforce (RDTF) Vision.

This draft has been prepared by Andy Powell and Pete Johnston of Eduserv, with funding from the JISC.

It is being made available for comment prior to submission as a final deliverable to the JISC in early March 2011. The comment period runs from Thursday 3rd February until Friday 18th February.

Comments on all aspects of the guidelines are welcome. Comments can be made directly on JISCPress or by emailing the authors at andy.powell@eduserv.org.uk. Note that we have added a number of 'questions from the authors' to the JISCPress system to encourage debate.

The guidelines themselves suggest that RDTF metadata should be made openly available using one (or more) of three approaches, referred to as 1) the community formats approach, 2) the RDF data approach and 3) the Linked Data approach. The guidelines do not consider issues of compliance. In line with the vision, these guidelines primarily concern scenarios in which metadata is aggregated as the basis of resource discovery initiatives, though no assumptions are made about the scope or scale of such activities.

For a printable version of this draft, a single document version is available from Google Docs.

Introduction

(See consultation document section: http://rdtfmetadata.jiscpress.org/introduction/)

This document provides a set of guidelines about how metadata associated with library, museum and archival collections should be made available for the purposes of supporting resource discovery in line with the Resource Discovery Taskforce (RDTF) Vision [1].

Such a vision presents a number of significant metadata challenges because it spans the library, museum and archives sectors, each of which has multiple different metadata standards in current use. The RDTF Vision is about making the metadata from organisations within these sectors available in ways that are compatible with the web and that support the development of services that are useful to researchers, teachers and students.

This set of metadata guidelines is intended to ensure that a pragmatic approach is taken, to ease the burden of metadata reuse, and to help break down any sectoral silos that currently exist. The guidelines are high level and in line with emerging practice elsewhere on the web. Whilst we are not able to recommend the adoption of a single metadata standard, we do believe that by putting this guidance in place it will be possible to create significantly more coherence in the way that metadata is created, managed and used across the library, archives and museum sectors than is currently the case.

Comments (24)

Comment from the authors: It would be good to see some general statements of support or otherwise ('overall, this looks like a reasonable approach', 'this approach is not workable for the following reasons...', etc.) as well as more detailed comments on individual sections. If you want to leave such a statement, please do so here.
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 08:47:36)

In the "About" section, which does not have provision for comments, Andy Powell says: "Important: If you are making your comments available here using JISCPress, please do not use the 'Reply' facility to respond to other people's comments. Instead, leave a new comment every time." But there does not seem to be provision for "new comments" - the box into which I am typing this is headed, in red, "LEAVE A REPLY". Is this where new comments are supposed to go? And _please_ can you make the input box for comments a bit bigger - do you really want to restrict comments to "tweets" of 140 characters? (I know that it scrolls, but it's nice to be able to see all that you have written!)
Leonard Will (L.Will@willpowerinfo.co.uk - 2011-02-04 12:23:03)

Leonard - just discovered that if you click the 'REPLY' option at the bottom of someone's existing comment, your reply doesn't show. On the other hand, if you fill in the form headed 'LEAVE A REPLY' this works - I think this is the meaning of Andy's note.
Owen Stephens (owen@ostephens.com - 2011-02-04 13:12:42)

Leonard, I've tried to improve my wording (possibly not very successfully!). I can't modify the size of the text boxes - it's outside of my sphere of influence :-)
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 15:41:00)

It may reflect the vision, but I'm disappointed that there is no stated aspiration to produce a resource which melds library, museum and archive metadata into a single space representing, say, human history. We still have three separate silos.
Richard Light (richard@light.demon.co.uk - 2011-02-05 13:57:40)

First of all, I'd like to point out that I do like the approach! We've made some different choices with the EDM (I'm coming back to these in context), but the general approach seems viable and pragmatic.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 12:51:57)

However, I feel that three issues have been left out (maybe on purpose), which we haven't solved yet in Europeana either: how do we make statements on provenance, license status and the version of aggregations? These issues are easily solved in the three star approach but become urgent with RDF and Linked Data approaches. On what level of granularity do we make such statements? Triples? Aggregations? And if it is aggregation level - what happens when a triple is added or modified or linked to a new external resource? To my knowledge, none of the standards mentioned solves this set of problems. It may be a good idea to exclude them explicitly from the current draft and to work on them jointly!
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:00:01)

Might be worth mentioning that there are various mappings in existence or under development which help different metadata standards to interoperate.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 09:37:58)

Perhaps the range of people commenting on these guidelines (as of 8th Feb) may be a little narrow and not as representative of the different groups as it could be?
Dominic Oldman (DOldman@BritishMuseum.org - 2011-02-08 13:20:01)

Dominic, I agree, but we have done our best to reach people (thru mailing lists, etc.). Will remind people again next week. Any suggestions for how we can get more input from museums and archives in particular?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-09 11:51:56)

Richard, the vision proposes that data will be aggregated in different ways - one of the examples being 'subject' which can be seen (IMO) as cross-sectoral and would fulfil your suggestion of representing 'human history'. However, the vision of aggregations is built on institutions (GLAMs) publishing their data in such a way that aggregations can be flexibly and easily built - and this document is about how that should happen. I think that is why the aspiration for producing a resource which melds the metadata is not explicitly stated here - there is more work going on around the vision that will explore these issues - the business case for aggregations, and how they are created and used.
Owen Stephens (owen@ostephens.com - 2011-02-10 12:00:04)

One of the phrases used in relation to the RDTF vision was 'build better websites' (http://www.ukoln.ac.uk/jisc-ie/blog/2010/08/19/aggregation-and-the-resource-discovery-taskforce-vision/). While this wasn't a literal instruction, the essence of the phrase for me is about ensuring GLAM metadata is part of the larger environment of the WWW. I have a concern that these guidelines move straight past the web as it exists now. While I'm keen on the 'web of data' concept, and broadly support the suggestions contained in this document, I think for many it may seem too remote from the 'day-to-day' and feel technically challenging. I wonder if there is room for more basic, incremental, improvements that could be made - alongside the three approaches outlined in this document. Taking a typical GLAM online catalogue as a starting point, I'm thinking of things such as:
  1. An html document per record published
  2. Each metadata record published should have a simple, persistent, URI (i.e. the URL of the html document at least)
  3. Each html document representing a metadata record should include a structured representation of the metadata
  4. These documents/records should be crawlable by web bots/spiders such as the Googlebot
These are basic steps that would make GLAM content more discoverable by existing web search engines (which while not universally embraced as desirable feels like a relatively easy sell, and is a concrete outcome), while moving (very slightly) towards the use of URIs as identifiers and the integration of data into the WWW. Having reflected on this over the past week or so, I'm worried that the guidelines as currently laid out risk being (or being perceived by stakeholders as being) a sector specific approach that is not in line with the majority of the web.
Using 20/20 hindsight on how libraries have pushed content to the web to date, the two things I would pick out are:
  1. Lack of simple persistent URIs for each catalogue record
  2. Lack of exposure of catalogue records to Google
These are the first two things I'd put right if I had the chance - and I'm arguing these guidelines and the RDTF vision provide this opportunity.
Owen Stephens (owen@ostephens.com - 2011-02-15 09:54:56)

I agree with Owen about the possibility of providing a more basic approach that may be seen as more achievable. Providing an html document per collection record with a persistent URI would be an improvement on how things are now. I also wonder whether it would be acceptable for institutions to provide their descriptive metadata to aggregators who could fulfill one of these options for them. This reflects the current reality, at least as far as the Archives Hub is concerned, for many archive repositories. I believe that for some it is much more practical and they benefit from being part of a cross-searching service and opening up their data. You might say that I would say this, being the manager of the Hub, but it is the reality, especially for many small archives, that even learning about how to use a form-based template to create descriptions and click a button to send them to the Hub is a hurdle.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 14:36:13)

Dear friends, please note that the link in the first paragraph is not redirecting to the repository document...
Nicolaie Constantinescu (kossonon@gmail.com - 2011-02-15 20:13:30)

The graduated approach offers a relatively easy first step to encourage participation and offers a route for progression. However we share the concerns raised by other respondents that the leap from community formats to RDF/Linked data may be a significant barrier to access.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 10:27:31)

I don't see any reference to scope of metadata. This may be deliberate, but it is not clear whether guidelines are limited to metadata that describe publications, cultural objects, etc. or also inclusive of metadata for agents or concepts; i.e. authority data?
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 10:30:02)

Thanks Nicolaie. I've fixed that now.
Andy McGregor (a.mcgregor@jisc.ac.uk - 2011-02-16 10:40:33)

I do agree that the step between community formats and RDF/Linked data is a hard one. But I can't think of something in-between... So in general I do like the general approach. And yes, my commenting does not broaden much the range of commenters...
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 12:06:21)

+1 to all of Owen's comment about "ensuring GLAM metadata is part of the larger environment of the WWW"..."exposure of catalogue records to Google". I think this also relates to the chicken and egg situation that there isn't much motive to expose and aggregate metadata in a common format until someone builds a service that uses it, but you can't build the service until the metadata is there... So it helps if exposing the metadata can be shown to be useful in enhancing services that people already use.
Phil Barker (phil.barker@hw.ac.uk - 2011-02-18 09:07:42)

I have come to this very late, and my comment may well be covered elsewhere, or indeed somewhere in the draft text where I did not spot it. I noted that there was a section that was oriented to metadata approaches specific to different sectors of GLAM (eg FRBR, CIDOC etc). My concern is that any and all GLAM sectors will be dealing with data in forms that are not comfortable for their sector. University libraries, for example, are likely to be dealing with research data with extremely high variability of form and content in their collections, and the FRBR approach may not be relevant. Now I don't know what this means in detail, but it seems to me that the guidelines need to have a degree of open-endedness to them, so that they will deal with novel and non-traditional stuff as well as the traditional.
Chris Rusbridge (c.rusbridge@gmail.com - 2011-02-18 10:23:07)

Firstly, apologies if what I say has been covered by others below, as I haven't back scanned through all the comments. In general, I do like the Web-centric approach, and this does fit well with the feedback we got from the IE Technical Review project. I won't re-emphasise that general point, but it's worth keeping in mind if my comments below seem disproportionately negative. I hope they don't. First up, I think it's really important to make it clear who this document is aimed at. I fear the terminology in the doc and its abstract nature could be very off-putting to your average archivist, cataloguer, manager for example. I'm presuming the reason why the community will be interested in engaging with the RDTF vision and following these guidelines will be expanded upon elsewhere? Unfortunately I can't seem to get access to the vision document today. The pay-off for the effort needs to be really strong. Maybe an out of scope comment, but I also think there will need to be significant effort 'on the ground' (i.e. preferably in person in the Record Office) to publicise the guidelines, the benefits, and help people implement them, if this effort is to have a decent chance of being successful. That ain't cheap. Final general point. Some worked examples, either embedded with this doc, or linked from it would help people get it. Even terms like 'attribute/field/property' can be off-putting in the abstract. If you can show real world examples, people can then relate it to their own data, and see what's involved in changing it to the various approaches.
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:04:39)

Just because the metadata is born in an MLA environment, that doesn't mean that its consumption will be limited to users in that environment. Also, I wonder about how possible consumers can raise requests for additional sorts of metadata, such as usage data around a resource?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:08:39)

+1 "An html document per record published Each metadata record published should have a simple, persistent, URI (i.e. the URL of the html document at least) Each html document representing a metadata record should include a structured representation of the metadata These documents/records should be crawlable by web bots/spiders such as the Googlebot"
John Robertson (kavubob@gmail.com - 2011-02-18 15:26:36)

It may be to do with your brief, or it may be for some other reason, but I am uncomfortable with language that suggests this is just about library, archive, museum metadata. I think there's a distinction between the broader vision of RDTF, which is about finding 'stuff', and some of the short-term pragmatic steps. One of those is getting better linkup between archive, library and museum metadata. So, if this document is about one particular short-term pragmatic step, say so, and explain, or link to, the broader context. If it's meant to reflect the whole vision, then it's too narrow a view.
Kevin Ashley (kevin@kevinashley.net - 2011-02-18 16:48:29)

Guiding principles

(See consultation document section: http://rdtfmetadata.jiscpress.org/guiding-principles/)

These guidelines have been developed such that they:

  1. support the RDTF Vision [1];
  2. are compatible with the outcomes of the JISC IE Technical Review meeting in London, Aug 2010 [2];
  3. are in line with Linked Data principles as far as possible [3];
  4. are compatible with the W3C Linked Open Data Star Scheme [4][5];
  5. are in line with Designing URI Sets for the UK Public Sector [6];
  6. take into account the Europeana Data Model [7] and ESE [8];
  7. are informed by mainstream web practice and search engine behaviour and are broadly in line with the notion of 'making better websites' across the library, museum and archives sectors.

The guidelines are intended to help libraries, museums and archives expose existing metadata (and any new metadata that is created using existing practices) in ways that 1) support the development of aggregator services and 2) integrate well with the web of data. The intention is not to change existing cataloguing practice in libraries, museums and archives. Note that no assumptions have been made about the nature of any resulting aggregator services, which may include everything from a simple collaboration between two museums through to a full-blown national 'cultural heritage' discovery service.

Comments (11)

Question from the authors: Note that these guidelines are compatible with the Europeana EDM in a rather loose way (i.e. in the sense that any Europeana metadata will be compliant with these guidelines, but not in the sense that all metadata that is compliant with these guidelines will conform to EDM). How aligned with Europeana do we want to be?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:43:18)

Well, it would be very helpful if the guidelines were as aligned with Europeana as possible. However, the community formats approach (which we originally chose with the ESE) is now obsolete with the EDM (ESE can be used as an application format of the EDM in the future): that limits the options for alignment.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:05:45)

"take into account" is a bit vague ... would be good to be a bit more explicit about the relationship here?
Julie Allinson (julie.allinson@york.ac.uk - 2011-02-06 20:58:40)

I agree that "take into account" doesn't really say much but, in any event, I think that both the benefits and the limitations of this should be explored.
Dominic Oldman (doldman@britishmuseum.org - 2011-02-07 14:04:17)

Currently following the new draft ICOM URI standard
Dominic Oldman (doldman@britishmuseum.org - 2011-02-07 14:07:12)

I don't think we should be too committed to Europeana. Whilst I can see its value, and I have no doubt it should be taken into account, it is for digital cultural heritage, and therefore not currently useful for the vast majority of the holdings of many archive repositories.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 13:51:19)

I think the current alignment is more than ok for this document: I don't feel this is too much or too little. Plus, EDM is still in implementation phase, you can't really ask more than "taking into account".
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 09:53:50)

Seems right to me that (a) those who have committed to providing Europeana metadata shouldn't have to do any more work but (b) those who have no need to conform to the Europeana specifications don't have it imposed on them.
Phil Barker (phil.barker@hw.ac.uk - 2011-02-18 09:11:43)

'Take into account' is the right phrase. I would also hope that you can take into account experience from other projects that have tried to integrate metadata from more than one source, some using complex crosswalks and others using more laissez-faire approaches. For instance, the AIM25/M25lib linkup didn't expose metadata (at least, not any more than they exposed anyway) but did think a lot about what users expected as a result of a search and what you needed to do to deliver that.
Kevin Ashley (kevin@kevinashley.net - 2011-02-18 18:32:27)

'... to a full-blown national cultural heritage discovery service'. If that's the limit of the ambition, it's way too small. The Resources that RDTF considers are much more than cultural heritage ones.
Kevin Ashley (kevin@kevinashley.net - 2011-02-18 18:34:29)

I'd agree with Kevin, but the main point here is that these guidelines can scale up. We *should* be able to replace the term 'cultural heritage' with 'resource discovery' or 'scholarly information' service.
Joy Palmer (joy.palmer@manchester.ac.uk - 2011-02-19 08:41:59)

RDTF metadata guidelines

(See consultation document section: http://rdtfmetadata.jiscpress.org/rdtf-metadata-guidelines/)

RDTF metadata should be made openly available using one or more of three approaches, referred to below as the community formats approach, the RDF data approach and the Linked Data approach.

All three approaches suggest that all RDTF metadata be made available using non-proprietary formats and under an open licence. For the purposes of these guidelines, a non-proprietary format is considered to be a format for which there is a published specification, usually maintained by a standards organization, which can be used and implemented by anyone [9]. The meaning of 'open' is the focus of other work being undertaken by the taskforce. For the time being open is assumed to mean that the metadata is free to use, reuse, and redistribute (subject only, at most, to the requirement to attribute and share-alike) by anyone [10].

This means that for all metadata made available according to these guidelines, software developers who are building aggregations of metadata will be able to:

Similarly, end-users will be able to:

Making metadata available using the community formats approach is reasonably low cost and easy to do for the data provider, whilst encouraging openness and re-use. However, software developers may have to work quite hard to aggregate metadata across multiple providers. Exposing RDF data and Linked Data brings increasing value, in terms of the potential services offered to end-users, and lower barriers to re-use for the developers of aggregated services. However, these approaches are likely to cost data providers more (in terms of time and effort in preparing the metadata for exposure on the web) and software developers more (in terms of handling the complexity of the metadata). See below for an outline of the key benefits and costs for each approach.

The following sections describe each of the approaches.

Comments (21)

Question from the authors: The 3 approaches offered here correspond broadly to the 3-star, 4-star and 5-star ratings of the W3C Linked Data Star Rating Scheme. Is this 3-pronged approach a help or a hindrance in terms of encouraging widespread uptake of these guidelines by libraries, museums and archives?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:32:09)

Why the note "though this may have to be done manually"?
Owen Stephens (owen@ostephens.com - 2011-02-04 15:36:04)

Dunno... it felt right at the time but now you mention it, I'm not sure I can justify it! :-)
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 15:37:41)

Do you want to include a statement on commercial reuse here? This turns out to be a major issue in Europeana.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:09:20)

I'm not convinced that the distinction between "RDF data" and "Linked Data" is going to add practical value to implementers. You characterise "RDF data" as being about RDF dumps, and "Linked Data" as being about external links and dereferenceable URLs, but in practice Linked Data can be validly delivered using the "hash" convention, i.e. as an RDF dump, and RDF dumps can validly contain dereferenceable URLs (as indeed can CSV, as noted by Owen). The fact that the two sections start with almost identical descriptions is maybe telling you something ...
Richard Light (richard@light.demon.co.uk - 2011-02-07 12:34:47)

I think commercial reuse is a crucial issue to address - but I'm assuming that this will be addressed elsewhere by the RDTF.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 13:53:26)

BL experimented with different Creative Commons licenses for our open BNB data. In practice any restriction (such as non-commercial re-use) made the metadata unattractive for re-use by any open data initiative, as they could not redistribute it under CC0.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 10:37:44)

We recommend that institutions are advised to ensure that they have the right to expose their metadata. BL maintains a metadata asset register so that we can quickly ascertain which metadata we are entitled to release and which we are not.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 10:40:04)

There is an implicit assumption made here that all service providers will want to work with RDF and Linked Data. What facts are there to back this up? Very few services are provided off the back of linked data at this point in time and we are unsure as to why linked data/RDF would have a lower barrier to re-use for developers than, say, simple DC (especially if they are proposing to use OAI-PMH as one of the only mechanisms for distribution).
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:01:47)

Although we say here "one or more of three approaches", we probably need to be clearer that the three approaches aren't mutually exclusive, e.g. for providers offering linked data, there may be value to consumers in also providing "community formats".
Pete Johnston (pete.johnston@eduserv.org.uk - 2011-02-18 11:55:43)

I'm very struck by the fact that no one has commented on this paragraph. It may be that people just simply agree and therefore don't feel the need to comment. I happen to think this is one of the most important paragraphs in the whole document. I'd like to see this expanded to become a core part of the guidelines: *This* is what we mean by 'open', this is what we are trying to facilitate, this is how we suggest it can be achieved. Developers are a key stakeholder of the RDTF vision with its emphasis on aggregation
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 12:05:30)

Agree - suggest removing that caveat :-)
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 12:06:15)

Even for community formats, I suspect 'easy' isn't all that easy unless it's funded or it's very clear (preferably proven) that making the effort will bring in more users, bring in further funding etc.
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:15:02)

Would it be helpful to have an explicit statement that distinguishes between the openness of the metadata, and the licensing around the resource that the metadata describes? For the purposes of RDTF, to what extent should the (open) metadata necessarily contain information about the usage restrictions around the resource described.
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:15:25)

In practical terms, I think confusion about the extent to which, or way in which, attribution/share-alike licensed data has to be attributed can act as a barrier to reuse, particularly in an institutional setting with resident rights police...
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:17:40)

Using the RDF approach also makes it easier for the *publishing* institution to use the same API/query endpoint for data aggregated from across separate collections maintained by that institution, and can accommodate separate data models, if required, across those collections?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:21:32)

Thinks... rather than RDF vs relational db, is the issue NoSQL vs relational databases, and that some of the publisher benefits we ascribe to RDF would apply equally to other NoSQL approaches?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:24:46)

I've said it elsewhere, but it bears repeating. I think there's two concerns about the focus on 'open' here. One is that open metadata simply isn't a worry for many of those outside the library world, so harping on about it makes people worry about things that they weren't even worried about before. (They are worried about rights to the resources that the metadata describes, though, and the distinction must be clearer than it is.) The other concern is that many would be concerned about re-uses of metadata which lost links back to the originators - their motivations typically involve drawing people into their content or services. Open licences that only require attribution aren't strong enough - they need to be variants (which I understand do exist) which place requirements on how the attribution is made. That is, I want you to link to my site/my record if you expose my metadata record, not simply provide a plain-text acknowledgement.
Kevin Ashley (kevin@kevinashley.net - 2011-02-18 18:42:52)

The mention of 'community formats' here precedes its definition. This is confusing. Some reordering is required.
Kevin Ashley (kevin@kevinashley.net - 2011-02-18 18:45:03)

Kevin made this point very well at the RDTF AG meeting. The RDTF project has made appropriate assumptions that some of the 'quick wins' are to be gained in the library context first, as a more 'mature' domain with a larger and more active development community. But through this focus we run the risk of transposing assumptions about metadata licensing concerns onto the other domains -- and even creating anxiety where none existed before. We can't ignore the quick wins to be had in the A & M sector where licensing is far less a barrier.
joy palmer (joy.palmer@manchester.ac.uk - 2011-02-19 08:46:23)

Just a point to note, we're noticing that organisations like OCLC are thinking less in terms of licensing or claiming copyright over individual records, but more in terms of the entire aggregation. The value comes in the aggregation as a whole, not individual records. This is a very interesting distinction -- many of us might be quite happy to open up records and share them individually or in batches; but what happens when that demand is at scale? (For example -- what do we do when a vendor approaches Copac and says 'can we have the lot please?') There are larger issues here than just technical concerns or worries over duplication -- it comes down to where the value lies and who 'owns' it. This is picked up by the OBDG, to a degree. And again -- this is perhaps more a concern for bib data than A&M metadata.
joy palmer (joy.palmer@manchester.ac.uk - 2011-02-19 08:53:56)

The community formats approach

(See consultation document section: http://rdtfmetadata.jiscpress.org/the-community-formats-approach/)

Guidance

RDTF metadata that is exposed using the community formats approach must be made available under an open licence, using a non-proprietary file format (such as one of those listed in the examples section below).

The metadata must be made available using simple HTTP GET requests or the OAI-PMH [11].
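
As a rough illustration of the OAI-PMH option, the sketch below builds a ListRecords request that can be issued as a plain HTTP GET. The verb, metadataPrefix and set arguments and the oai_dc prefix come from the OAI-PMH specification; the repository address and the helper name list_records_url are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL suitable for an HTTP GET."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict harvesting to a single OAI-PMH set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL, for illustration only.
print(list_records_url("http://example.org/oai"))
```

An aggregator would fetch the resulting URL and follow any resumption tokens in the response; that part is left out here to keep the sketch short.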

Where HTTP is used, one or more sitemaps (conforming to the Sitemap protocol [12]) should also be made available, listing the available files. The sitemaps should be listed in a robots.txt file. Sitemaps should use the following RDTF extension to differentiate RDTF files from other content:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rdtf="http://purl.org/rdtf"> <!-- namespace extension -->
  <url>
    <rdtf:loc>http://example.org/rdtf/catalogrecords.marc</rdtf:loc>
    ...
  </url>
</urlset>
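A sitemap of this shape could be generated rather than written by hand. The following is a minimal sketch using Python's standard library; the function name `build_rdtf_sitemap` is made up for this example, it omits the optional `xsi:schemaLocation` attribute, and it assumes each file is listed only through the `rdtf:loc` element shown above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RDTF_NS = "http://purl.org/rdtf"  # namespace taken from the example above


def build_rdtf_sitemap(file_urls):
    """Return sitemap XML listing each metadata file in an rdtf:loc element."""
    # Serialise the sitemap namespace as the default and the extension as 'rdtf'.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rdtf", RDTF_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for file_url in file_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{RDTF_NS}}}loc")
        loc.text = file_url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_rdtf_sitemap(["http://example.org/rdtf/catalogrecords.marc"])
print(sitemap_xml)
```

The resulting file would then be advertised from robots.txt with a line such as `Sitemap: http://example.org/sitemap.xml` (a hypothetical location).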

Where HTTP is used, GZip compression [13] may be used to reduce file sizes.
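For instance, a metadata file can be compressed with Python's standard `gzip` module before it is published. This is only a sketch: the sample CSV content is invented, and a real deployment might instead let the web server apply compression.

```python
import gzip

# Invented sample data standing in for a metadata dump.
records = b"label,identifier\n" + b"Example title,urn:isbn:0000000000\n" * 1000

compressed = gzip.compress(records)

# Round trip: decompressing must give back exactly the original bytes.
restored = gzip.decompress(compressed)
print(len(records), "bytes before,", len(compressed), "bytes after")
```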

Where possible, all significant resources associated with the collection of interest should be described using separate records. For example, there should be separate records describing a physical museum artifact and any digital surrogates of that artifact (e.g. images). For the purposes of these guidelines, a significant resource is one that is likely to be of interest to end-users, differs from other resources in terms of format or other attributes, and may have different ownership and/or usage restrictions to other resources. Note that 'resources' (as used here) may include conceptual entities (e.g. a FRBR 'work'), people and organisations as well as both physical and digital objects. Where metadata is encoded using CSV [14] (or similar), a record corresponds to a row in the table. Where metadata is encoded using the OAI-PMH, a record corresponds to an OAI-PMH record.

All metadata records should contain an attribute/field/property that can be used as a label or title for the resource. Where metadata is encoded using CSV (or similar), this label should appear in a column called 'label' or 'title'. Where metadata is encoded using CSV (or similar), any identifier for the resource (e.g. an ISBN) should appear in a column called 'identifier'. In addition, for any metadata encoded using CSV, the first row should contain the column headings, there should be no use of footnotes and all rows should be of the same length.
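The CSV guidance above lends itself to a simple automated check. The sketch below is one possible reading of it, not part of the guidelines: the function name `check_rdtf_csv` is invented, column names are matched case-insensitively (an assumption), and a missing 'identifier' column is reported as a problem even though the guidance only asks for it where identifiers exist.

```python
import csv
import io


def check_rdtf_csv(text):
    """Return a list of problems found when checking CSV text against the guidance."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    # The first row is expected to hold the column headings.
    header = [name.strip().lower() for name in rows[0]]
    if "label" not in header and "title" not in header:
        problems.append("no 'label' or 'title' column")
    if "identifier" not in header:
        problems.append("no 'identifier' column")
    # Every data row should have the same number of fields as the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


good = "title,identifier\nA letter,urn:isbn:123\n"
bad = "name,identifier\nA letter,1,extra\n"
print(check_rdtf_csv(good))
print(check_rdtf_csv(bad))
```

Footnotes cannot be detected mechanically in any reliable way, so the sketch does not attempt it.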

Metadata about the resources associated with multiple collections may be made available. In general, where HTTP is used, there should be one file per collection; where the OAI-PMH is used, collections should be partitioned into separate repositories or separate sets within a single repository. Note that 'collection' (as used here) simply means any grouping of resources for curatorial, discovery or some other purpose.

Examples

For libraries, typical file format examples include library catalogue records encoded using MARC21 [15] or MODS [16], BibTeX [17], RIS [18], the CrossRef output schema [19], Dublin Core records encoded using XML [20], the Europeana Semantic Elements (ESE) format and formats based on JSON [21], Atom [22] or RSS [23].

For museums, typical file format examples include museum catalogue records encoded using SPECTRUM [24], database tables dumped as CSV files, Dublin Core records encoded using XML, the ESE format and formats based on Atom or RSS.

For archives, typical file format examples include archival descriptions encoded using EAD [25], database tables dumped as CSV files, Dublin Core records encoded using XML, the ESE format and formats based on Atom or RSS.

Benefits

As an end-user:

As a provider:

Costs/issues

As an end-user:

As an aggregator:

Comments (66)

Question from the authors: Have we got the list of example community formats right? Are things missing?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 14:21:41)

Question from the authors: Is the 'provide a "label" or "title" and "identifier" column' stuff useful? What about the other CSV guidance? Should we say anything else?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 14:22:10)

Question from the authors: Is the guidance to include a column called 'identifier' sufficient? Should we also (or instead) ask for a column called 'uri'?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 14:22:41)

Question from the authors: This is currently the only reference to JSON, Atom and RSS. Are JSON and Atom sufficiently important that they should get a mention? If so, how?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 14:32:33)

Question from the authors: The use of an RDTF extension here is to prevent accidental harvesting of, say, large files of MARC records by Google's crawlers. Does this make sense?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 14:37:04)

You'd likely want to include LIDO in this list. http://bit.ly/gAkco0 - it's the backbone of Athena http://www.athenaeurope.org/.
Gunter Waibel (waibelg@si.edu - 2011-02-03 18:00:19)

I'd prefer an RSS or Atom feed to OAI-PMH. They certainly have broader acceptance outside the library community.
Ralph LeVan (levan@oclc.org - 2011-02-03 20:31:24)

Should you add ISAD(G), ISAAR(CPF) and other standards from the International Council on Archives? These do not give formal file formats, but they do list the elements and structure that should be included.
Leonard Will (L.Will@willpowerinfo.co.uk - 2011-02-04 12:15:25)

Is this mixing two types of mechanism?
Owen Stephens (owen@ostephens.com - 2011-02-04 12:22:33)

It feels wrong to be doing something to deliberately hide stuff from Google - isn't this up to Google?
Owen Stephens (owen@ostephens.com - 2011-02-04 12:25:06)

It doesn't seem unreasonable - but what is it for? Just as an example, if you had a MARC record for a letter it might not have a 'title' (in a 2XX field), although it would have something in the record somewhere that could be used as a substitute for a title in an index or human readable display. To think about CSV specifically, if we get a CSV file of records describing letters then it would be pointless if the 'label' or 'title' column contained just the word 'Correspondence' - although that would be within these guidelines. What I'm getting at is the perceived purpose of the label/title field. Why not just say 'should include at least one field which clearly describes the item in human readable format' or some such?
Owen Stephens (owen@ostephens.com - 2011-02-04 12:59:50)

You say 'any identifier for the record' and then 'e.g. an ISBN'. Risking getting into semantics... but of course an ISBN isn't an identifier for the record, it's an identifier for something else (probably something close to the manifestation in FRBR terms). So - is the identifier expected here to identify the resource described, or the record describing the resource?
Owen Stephens (owen@ostephens.com - 2011-02-04 13:03:21)

Identifiers such as ISBN are useful - wouldn't want to see these excluded. However, I do think the opportunity to ask for a URI if one is available is too good to miss. My own preference is that 'community formats with URIs' is an additional option alongside 'community format' (in the same way you have RDF vs Linked Data)
Owen Stephens (owen@ostephens.com - 2011-02-04 13:06:59)

JSON sufficiently important to section of the developer community I think
Owen Stephens (owen@ostephens.com - 2011-02-04 13:09:25)

It doesn't seem unreasonable to ask for label/title - but what is it for? Just as an example, if you had a MARC record for a letter it might not have a 'title' (in a 2XX field), although it would have something in the record somewhere that could be used as a substitute for a title in an index or human readable display. To think about CSV specifically, if we get a CSV file of records describing letters then it would be pointless if the 'label' or 'title' column contained just the word 'Correspondence' - although that would be within these guidelines. What I'm getting at is the perceived purpose of the label/title field. Why not just say 'should include at least one field which clearly describes the item in human readable format' or some such?
Owen Stephens (owen@ostephens.com - 2011-02-04 13:13:54)

You say 'any identifier for the record' and then 'e.g. an ISBN'. Risking getting into semantics... but of course an ISBN isn't an identifier for the record, it's an identifier for something else (probably something close to the manifestation in FRBR terms). So - is the identifier expected here to identify the resource described, or the record describing the resource?
Owen Stephens (owen@ostephens.com - 2011-02-04 13:14:22)

Identifiers such as ISBN are useful - wouldn't want to see these excluded. However, I do think the opportunity to ask for a URI if one is available is too good to miss. My own preference is that 'community formats with URIs' is an additional option alongside 'community format' (in the same way you have RDF vs Linked Data)
Owen Stephens (owen@ostephens.com - 2011-02-04 13:14:42)

JSON sufficiently important to section of the developer community I think...
Owen Stephens (owen@ostephens.com - 2011-02-04 13:15:31)

Perhaps the main guidance that needs to be given for those providing CSV or other generic formats is to have accompanying documentation that clearly documents the structure and nature of the data. Interpreting formats like JSON and CSV without this will be impossible
Owen Stephens (owen@ostephens.com - 2011-02-04 13:18:56)

Owen, Re: identifier for the record - this is a simple typo - should be identifier for the resource - will correct in the text to prevent further confusion. Thanks.
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 13:48:17)

In my view the provenance context is easiest to preserve in this approach: one statement per record is sufficient. This gets really tricky with aggregations of RDF triples.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:15:59)

not sure it's entirely useful to divide into three paragraphs, some formats crossover and there is some duplication here, maybe better to have a list of examples?
Julie Allinson (julie.allinson@york.ac.uk - 2011-02-06 20:33:13)

... oh, and might be worth mentioning VRA, widely used for art collections.
Julie Allinson (julie.allinson@york.ac.uk - 2011-02-06 20:50:22)

I wouldn't call OAI-PMH an "encoding": it's a delivery mechanism.
Richard Light (richard@light.demon.co.uk - 2011-02-07 12:23:33)

In response to Ralph: That would have to be Atom with several Atom extensions to achieve a harvesting framework with a functionality that is similar to that of OAI-PMH. As a matter of fact, several Atom extensions that could make this possible have been proposed. And, with some colleagues I have briefly investigated the possibility of defining a profile of Atom that provides PMH-like functionality. Given there is increased interest in that direction, an effort to this account may be launched some time soon.
Herbert Van de Sompel (hvdsomp@gmail.com - 2011-02-07 19:52:43)

Despite the "Where possible", this might be off-putting to libraries with legacy records that embed a print original and digital surrogates in a single record, and confuse libraries adopting FRBR, where a "record" typically consists of linked records for Work, Expression, Manifestation and Item. Perhaps a specific note about legacy metadata records would clarify.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 09:46:51)

Some libraries in the UK use UNIMARC; some are still using the obsolete UKMARC format. And MADS goes with MODS. To list or not to list ... is a perennial problem, but standards anxiety may lead to "typical" being overlooked.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 09:57:18)

'Collection' for archives generally means one grouping of items created/accumulated by the same person/organisation, which may easily consist of several thousand items. I think the guidelines may have to be more explicit about what constitutes a 'significant resource' for archives. Don't want to disappear into semantics too much, but I think it's a fundamental difference with museums and libraries.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 13:59:01)

I agree with Owen that a clear description is what is needed. I'm not sure how this will work if you have an archive that includes a series of letters - they may have the title 'letters' because the collection has the full descriptive title. You'd need both.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 14:05:03)

I can see what Leonard means, and thought of this too. But ISAD(G) is not a file format - EAD complies with ISAD(G). You could add EAC-CPF though, as that's the format that complies with ISAAR(CPF).
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 14:07:04)

What do you see as the purpose of the sitemap? Just wondering if we should be recommending use of any of the 'optional' aspects of the sitemap spec? Thinking specifically of 'lastmod' (and possibly 'changefreq')
Owen Stephens (owen@ostephens.com - 2011-02-15 15:52:36)

After some twitter discussion around this with Jane Stevenson I think there is a danger that the very basic nature of what is required here is obscured by other detail that may not apply. My reading is: The 'Community Format' approach would allow a (MARC/EAD/other sector specific format) dump that could be posted to a website. For those taking this approach the only other requirement would be for a sitemap file. I think this might come as a relief to many (especially, but not only, small organisations). It feels like some of the other issues (e.g. raised in 7 & 8) could be ignored by those following this approach. If I've read this correctly it may be worth spelling out - make this entry point sound as simple as possible.
Owen Stephens (owen@ostephens.com - 2011-02-15 16:12:36)

'Where possible all significant resources ... should be described using separate records.' This is aspirational from a library perspective. In FRBR terms, most MARC records are composites of Work, Expression and Manifestation. Some records may aggregate description of the original work with details of a surrogate manifestation. Legacy bibliographic records may describe multiple resources, for example: manifestations issued with different bindings. It is unlikely that the modeling necessary to achieve a 1:1 relationship between records and resources could be justified in relation to the community format approach. It is an issue that we will give much more thought to in future as we develop our open data into open linked data.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:07:35)

While no longer supported by BL, a significant number of libraries in UK and Ireland still use UKMARC.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:09:51)

A tabulated display might be clearer and would highlight those standards which are used across GLAM.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:11:24)

CSV could also be of interest to libraries. Not all systems have the capability to output MARC. A csv output may also be more accessible to potential users than MARC.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:13:00)

Does this mean that duplication of metadata descriptions is expected/acceptable?
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:24:43)

Should there be a recommendation to explain the scope of the metadata? Although MARC is an open format there is provision for locally defined metadata. Suggest a recommendation that if locally defined metadata is made available, definitions are provided to save aggregator effort - see para 23.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:27:04)

I don't understand why provenance is thought to be such an issue for this option. The integrity of the metadata seems least at risk in its native format.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 11:28:15)

It's true that complete OAI-PMH duplication would require significant enhancement of Atom feeds. But a simple feed of records can be accomplished with Atom as it is now.
Ralph (levan@oclc.org - 2011-02-16 14:52:25)

You're assuming we need the extra functionality of PMH over web feeds. Largely, nobody does any more, which is why it isn't used outside its original community, and why use is shrinking rather than growing.
Scott Wilson (scott.bradley.wilson@gmail.com - 2011-02-17 09:22:57)

Wonder if it's worth mooting as a default format - you can expose MARC with OAI-PMH for the six people in the world who give a toss, but you should also provide RSS with basic details for wider impact
Scott Wilson (scott.bradley.wilson@gmail.com - 2011-02-17 09:26:27)

There's nothing at http://purl.org/rdtf/. Isn't that sending a conflicting message with the "should use the following RDTF extension"?
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:06:24)

Btw it may be worth mentioning that ESE re-uses Dublin Core a lot.
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:21:09)

As the others, I think the 'separate records' principle here is good per se, but that won't make things easier with the data. Perhaps that should be acknowledged in the costs...
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:24:42)

Re. HTTP GET, might it be possible to say something like "... HTTP GET requests, such as when included in standard web pages", to give a sense of what this might mean. If the guidelines are only aimed at technical implementors then maybe this isn't a problem, but otherwise it could be.
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:22:48)

Must this be the case? It's bound to put people off
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:24:23)

There are lots of things that could be weakly classed as an "open license" that conflict with other "open licenses". Should the statement be phrased in terms of "licensed with an open license that is compatible with X, such as Y, Z"?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:28:38)

So HTTP POST would not be okay?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:29:29)

Couldn't this also be recommended, though not required, as a nudge that might also lead to increased use of compression across a site? (Assuming that we think compression is a Good Thing?!;-)
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:32:09)

I'm not sure the fact that something is easy to do is really one of the benefits of doing it. It sort of feels like even community formats isn't all that easy. It could be just the mention of lots of different formats starts to seem off-putting and overwhelming. I'd almost want to keep GZip, OAI-PMH etc. out of the 'first impression', maybe by re-structuring the document somehow, so people know that getting CSVs out there is good enough.
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:32:56)

Another example would be one record per item in an Atom/GData feed.
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:33:13)

I think the newly published Dataset Publishing Language (DSPL http://code.google.com/apis/publicdata/docs/developer_guide.html ) adds a lot to the discussion of this para. The DSPL allows for the publication of one or more CSV files supported by an XML file containing metadata that describes the data in the CSV files. The DSPL can be regarded in a couple of ways: 1) as a candidate representation; 2) at a more abstract level as an implementation of an idea about how best to represent and model data, with an associated vocabulary for describing that idea. So for example, terms are described for identifying data as "dimensions" (categorical) or as "metrics" (non-categorical, time-varying, numeric values), with a top-tip/handy rule of thumb recommendation that "your dataset will be more flexible if you keep metrics to a minimum, and instead create meaningful dimensions".
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:39:52)

"formats based on Atom or RSS" Add? 'such as GData [ http://code.google.com/apis/gdata/ ] or OData [ http://www.odata.org/ ]'
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:42:27)

Given all the discussion going about mechanisms for sharing metadata/ finding resources in the scholarly works community and other communities (such as Open Educational Resources) which is using RSS/ Atom quite a lot for this purpose I'm a bit surprised not to see it in this list. Yes there are limits to the use of RSS/Atom in this way but it's not stopping several discovery services using it.
John Robertson (kavubob@gmail.com - 2011-02-18 14:45:00)

Agreed, how about 'made available' ?
John Robertson (kavubob@gmail.com - 2011-02-18 14:57:35)

I still feel slightly unsure about this. The idea that a sitemap might list available 'files' doesn't seem to quite fit with the demands of the next but one paragraph (7) which calls for a fine granularity of record. We're going to quite quickly get to the point of unanticipated, just-in-time, serialisation of records based on dynamic criteria for some collections aren't we? Not sure how a sitemap works in this scenario
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:07:28)

I hope this comment is appearing in the right place - I have deliberately used the 'reply' function despite being warned not to... I agree with Owen - I would like to see a 'community formats with unique URLs as identifiers option'. In fact this would be my preferred path frankly. And yes, I think I do mean 'URL'.... ;-)
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:11:22)

Many systems will export CSV with an internal ID - typically in the first column, often an integer (e.g. 'rownum', 'recID'). Is this something we might want to actively discourage from being included in 'RDTF CSVs'? As in actually say, "please mint and include a URL *instead* of any internal, system-level ID".
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:14:11)

Re. JSON and Atom: If this document is aspirational, then yes - I think these need to be mentioned. JSON is becoming a default already for certain important domains of functionality - for example for developers building mobile interfaces.
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:17:17)

In this section (and benefits) there seems to be a mixture of end users (e.g. academic wanting resource and cataloguer taking record for local use) perhaps being more specific would be helpful. As it stands I'm unclear why citing a resource would be problematic - if I want to cite the resource I'm going to go to the resource and cite it (rather than the aggregated record) [or are we using cite differently?].
John Robertson (kavubob@gmail.com - 2011-02-18 15:19:17)

I agree with both previous comments
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:19:52)

I agree with the assertions in these paragraphs - but I think it could be pointed out that much of the programmatic join up on the Web uses this approach and works to some degree. It's not necessarily a dead end. The development of tools which are "both format- and provider-specific" could, in the end, be the most pragmatic approach, as such tools become easier to create and adapt.
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:23:21)

I agree that there is a lot of detail in this section that starts to overwhelm - see my comment elsewhere. The only thing I'd say is I'm not convinced that in GLAM csv is going to be the 'go to' format it might be in other areas. If you are a library, you'll have a MARC export. If you are an archive, EAD etc.
Owen Stephens (owen@ostephens.com - 2011-02-18 15:47:54)

Yes I think this should say something like 'ISAD(G) compatible archival descriptions encoded in EAD' - this may help constrain some of the wilder excesses possible with EAD as well! While I think this would be the minimum here, we would want to encourage 'related' EAC-CPF files (compatible with ISAAR(CPF)) as well!
Bill Stockting (william.stockting@bl.uk - 2011-02-18 17:07:15)

As a non-technical person - the need for the RDTF extension strikes an odd note - suggesting a lack of openness!
Bill Stockting (william.stockting@bl.uk - 2011-02-18 17:11:21)

The RDF data approach

(See consultation document section: http://rdtfmetadata.jiscpress.org/the-rdf-data-approach/)

Guidance

RDTF metadata that is exposed using the RDF data approach must be made available under an open licence as RDF datasets [26].

The RDF datasets must be made available using simple HTTP GET requests as one or more RDF dumps (e.g. files containing RDF/XML [27], N-Triples [28], N-Quads [29] or RDF/JSON [30]).
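To make the idea of an RDF dump concrete, here is a minimal sketch that writes N-Triples lines by hand, one triple per line. The helper names and the example item URI are hypothetical; the predicate is the Dublin Core Terms `title` property. The literal escaping covers only backslashes, double quotes, newlines and carriage returns, which is an assumption that suits simple text values; a production system would more likely use an RDF library.

```python
def nt_literal(value):
    """Escape a plain string so it can be written as an N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def nt_triple(subject_uri, predicate_uri, obj, obj_is_uri=False):
    """Format one triple as a single N-Triples line."""
    obj_term = f"<{obj}>" if obj_is_uri else nt_literal(obj)
    return f"<{subject_uri}> <{predicate_uri}> {obj_term} ."


# Hypothetical item URI, for illustration only.
line = nt_triple(
    "http://example.org/id/item/1",
    "http://purl.org/dc/terms/title",
    'A "quoted" title',
)
print(line)
```

A dump file is then simply these lines written out one after another, optionally compressed.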

GZip compression may be used to reduce the file size of the RDF dumps.

The location of all RDF dumps must be disclosed in accordance with the Semantic Web Crawling Sitemap Extension [31].

Metadata about multiple collections may be made available. If so, the dataset corresponding to each collection should be made available using separate RDF dumps.

The dataset in each RDF dump should be described using the Vocabulary of Interlinked Datasets (VoID) [32]. VoID files should be made available over HTTP and should be listed in the sitemap(s) above.
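A VoID description can be very small. The sketch below emits one as N-Triples, typing the dataset as `void:Dataset` and pointing to its dump with `void:dataDump`; the title and licence statements use Dublin Core Terms. The function name, dataset URI, dump location and choice of licence are all placeholders for illustration, and only quotes and backslashes in the title are escaped.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"


def void_description(dataset_uri, title, dump_url, licence_uri):
    """Return a minimal VoID description of one dataset as N-Triples text."""
    title_escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"<{dataset_uri}> <{RDF_TYPE}> <{VOID}Dataset> .",
        f'<{dataset_uri}> <{DCTERMS}title> "{title_escaped}" .',
        f"<{dataset_uri}> <{VOID}dataDump> <{dump_url}> .",
        f"<{dataset_uri}> <{DCTERMS}license> <{licence_uri}> .",
    ]
    return "\n".join(lines) + "\n"


# All four values below are placeholders.
void_text = void_description(
    "http://example.org/dataset/maps",
    "Map collection metadata",
    "http://example.org/dumps/maps.nt.gz",
    "http://creativecommons.org/publicdomain/zero/1.0/",
)
print(void_text)
```

With one such description per collection, an aggregator can discover each dump and its licence without downloading the data first.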

All significant resources (as defined above) associated with the collection of interest must be assigned a unique URI. Such URIs should be 'http' URIs.

Examples

For libraries, the use of accepted open ontologies (FOAF [33], BIBO [34], DC [35], ORE [36], etc) and other uses of RDF data modeled according to FRBR [37] are acceptable. Examples include the Europeana Data Model (EDM) and the British Library Catalogue Dataset in RDF/XML [38].

For museums, the use of RDF data modeled according to the CIDOC CRM [39] is acceptable.

For archives, the use of RDF data modeled according to the principles underpinning EAD is acceptable.

Benefits

As an end-user:

As an aggregator:

Costs/issues

As a provider:

Comments (28)

Question from the authors: We currently allow for RDF to be made available in any format (using RDF/XML, N-Triples, N-Quads and RDF/JSON as examples). Should we be more prescriptive?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:35:23)

This feels like it is baking in a uniform approach to 'library' data which I think is both worrying and ignores one of the key advantages we might see (for library data) in adopting RDF. While I agree for printed monographs the ontologies mentioned are likely to be appropriate, for other material types (e.g. recorded music) there are going to be much better ontologies available. I'm not sure how to express this in the guidelines without getting bogged down in lots of detail, but I think it is important :)
Owen Stephens (owen@ostephens.com - 2011-02-04 12:16:52)

I suggest to add LIDO here.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:20:05)

As I read this, it sounds like FRBR is being required for library data. There are major problems with this: 1) even within libraries we do not have agreement on the meaning of the FRBR entities 2) FRBR is about to undergo major changes now that FRAD and FRSAD have been issued 3) the way that FRBR defines itself it is unlikely that ontologies like FOAF, BIBO, etc. could be used to create FRBR-valid data. I think that FRBR at best should be optional, and we need to recognize that as currently defined it does not mix well with other ontologies.
Karen Coyle (kcoyle@kcoyle.net - 2011-02-06 18:15:15)

I can't really see a reason to limit this (the result would be that some RDF serialisations became just a 'community format' approach in the current guidelines, which wouldn't make much sense to me)
Owen Stephens (owen@ostephens.com - 2011-02-07 12:29:00)

In the cultural heritage sector the need to consume as well as provide is very important. Research is a collaborative and sharing activity. As such, although it may be slightly easier not to use a framework, the long term costs of not doing so will at some point hit the provider, and could otherwise have been substantially reduced. For people trying to make use of the data robustly, a framework that helps with harmonisation must have positive cost benefits and providers should bear this in mind. This might also be very important for non-technical people using general tools as well as their own staff and researchers. The use of the CRM in the ResearchSpace project www.researchspace.org is to ensure an initial level of consistency and harmonisation to reduce costs for projects of trying to use the data effectively once it becomes available.
Dominic Oldman (doldman@britishmuseum.org - 2011-02-07 14:22:02)

No. I agree with Owen.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 09:59:01)

FRBR will undergo some change when it is consolidated with FRAD and FRSAD, but at the moment there is no reason to think it will be major.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:06:23)

ISBD should be included. It, and other ontologies bar FRBR, take the monolithic record approach; FRBR splits the "record" into 3 or 4 functional components. The distinction is important; it is why FRBR and the other ontologies appear to be incompatible.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:09:45)

It would be worth repeating the note about the definition of collection in all three approaches. Also, "should" could be usefully reinforced with a note about being pragmatic about one collection-one file/dump; e.g. not having to split a multi-collection file, or combining collections with small numbers of resources.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:29:27)

Again, ISBD needs to be included.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:30:07)

It would be worth mentioning that CIDOC CRM includes a FRBR extension, FRBRoo.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:32:16)

For libraries you talk about open ontologies, such as FOAF and DC, and I would have thought these apply equally to archives, and the use of widely used ontologies should be encouraged for all MLAs (museums, libraries, archives). I think it might be worth referencing ISAD(G) here because the data model is really more about what you are describing, not how it's encoded.
Jane Stevenson (jane.stevenson@manchester.ac.uk - 2011-02-15 14:25:49)

"...unique "URI or "...unique and persistent URI"?
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:26:46)

The benefit will be dependent on the persistence of the URIs.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:27:53)

This is quite a big assumption. I think the caveats may need to be given more weight. There is work going on in the context of CIDOC CRM to align the models, but even if this succeeds legacy issues and different implementation will remain.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:34:15)

Costs and issues for aggregators can't be ignored. Won't aggregators have a role in resolving duplication, or resolution of conflicts between literals and resources?
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:38:43)

Modelling is costly. Assigning URIs is non-trivial. Maintaining URIs persistently will be costly.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:48:40)

You could perhaps mention Turtle (http://www.w3.org/TeamSubmission/turtle/), which is getting more popular than N-Triples, and will be standardized as part of the recently launched RDF2 working group. Btw perhaps it's useful to ack that N-Quads and RDF/JSON are not standards yet either (but something will be done on this as well in the RDF group).
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:31:36)

Is this a widely used tool, outside the DERI context (Sindice)? This is a part of the SW stack I'm less expert on, but in case my intuition is right, perhaps the "must" could be downplayed a bit. Unless of course you've played with it before and are happy with it...
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:43:07)

What does "to the principles underpinning EAD" mean for implementers, concretely?
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:49:06)

I guess a benefit to the provider here would help.
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:36:51)

Or advocate this? "Publishers are encouraged to use GZip compression to reduce the file size of the data dumps". Also, you say RDF here, so this implies use of RDF, rather than saying e.g. "howsoever represented/formatted"
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:44:50)

Doh! re: "RDF only comment" I forgot the heading of this section... {Meta comment - would be handy to be able to edit comments...Maybe I should have logged in...}
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:46:09)

erm, I'd hope that this section could be strengthened. Being able to cite and bookmark a resource is not really a compelling benefit of rdf. It is a good reason that end users might want resources to have their own web page or otherwise resolvable uri but you don't need rdf for that... How about enabling personal digital collections (by reference), automating/ one click embedding of a citation for an object, and more generally richer mashups which allow the end user to consume and discover and interact with content in new ways.
John Robertson and Lorna Campbell (kavubob@gmail.com - 2011-02-18 13:34:27)

No advantage in being prescriptive about serialisation formats here - at least not at this stage in the development in linked data. The hurdles are significant enough already!
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:25:37)

I agree that this is not an advantage of the rdf approach - and i and others have commented in the community format approach section that we would like to see this benefit delivered by all three approaches. Or, to put it another way, persistent identifiers and actionable URIs are more important and fundamental than the rdf model.
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:31:18)

I don't think that the phrase 'principles underpinning EAD' is very clear. I think I know what you mean but could be wrong and most of the UK archival community won't understand what is meant! Is this just about 'hierarchy'? If so I agree with Jane that the reference here is to the multi-level description rules in ISAD(G). I think though that this is too limited. It needs to relate to the wider data model taking into account CPF entities (as do FRBR and CIDOC). In which case the reference is to ISAD(G) and ISAAR(CPF) as encoded in EAD/EAC-CPF. I agree with Jane re the ontologies as well - don't some at least cover all 3 domains?
Bill Stockting (william.stockting@bl.uk - 2011-02-18 16:59:19)

The Linked Data approach

(See consultation document section: http://rdtfmetadata.jiscpress.org/the-linked-data-approach/)

Guidance

RDTF metadata that is exposed using the Linked Data approach must be made available under an open licence as RDF datasets.

The RDF datasets must be made available using simple HTTP GET requests as one or more RDF dumps (e.g. files containing RDF/XML, N-Triples, N-Quads or RDF/JSON).

GZip compression may be used to reduce the file size of the RDF dumps.
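
As a rough illustration of the dump and compression guidance above, the following Python sketch writes a handful of triples as a GZip-compressed N-Triples file and reads it back. The function names, the file name and the example URIs are hypothetical, and the writer only handles triples whose subject, predicate and object are all URIs. It is a sketch under those assumptions, not a recommended tool.

```python
import gzip


def write_ntriples_dump(path, triples):
    """Write (subject, predicate, object) URI triples as gzipped N-Triples.

    Assumption: every term is a URI, so no literal escaping is needed.
    """
    with gzip.open(path, "wt", encoding="utf-8") as dump:
        for subject, predicate, obj in triples:
            dump.write(f"<{subject}> <{predicate}> <{obj}> .\n")


def read_ntriples_dump(path):
    """Return the N-Triples statements in a gzipped dump, one per list item."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        return [line.rstrip("\n") for line in dump]


if __name__ == "__main__":
    # Hypothetical URIs, used only to show the shape of the output.
    write_ntriples_dump(
        "collection.nt.gz",
        [(
            "http://data.example.org/id/object/1",
            "http://purl.org/dc/terms/creator",
            "http://data.example.org/id/person/2",
        )],
    )
    print(read_ntriples_dump("collection.nt.gz"))
```

A dump written this way could then be placed at any URL that answers a plain HTTP GET request.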

The location of all RDF dumps must be disclosed in accordance with the Semantic Web Crawling Sitemap Extension.
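
To make the disclosure requirement a little more concrete, the sketch below builds a sitemap that lists dump locations. The namespace URI and the element names (sc:dataset, sc:datasetLabel, sc:dataDumpLocation) reflect our reading of the extension document and should be checked against it before use. The function name, the label and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace as given in the Semantic Web Crawling Sitemap Extension.
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_sitemap(datasets):
    """Return sitemap XML for a list of (label, [dump URLs]) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for label, dump_urls in datasets:
        dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
        ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
        # One dataDumpLocation element per dump file.
        for url in dump_urls:
            ET.SubElement(dataset, f"{{{SC_NS}}}dataDumpLocation").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("Example collection", ["http://data.example.org/dumps/collection.nt.gz"]),
    ]))
```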

Metadata about multiple collections may be made available. If so, the dataset corresponding to each collection should be made available using separate RDF dumps.

The dataset in each RDF dump must include links to other (external) RDF datasets, for example those describing people, organisations, topics or places.

The dataset in each RDF dump must be described using the Vocabulary of Interlinked Datasets (VoID). VoID files must be made available over HTTP and must be listed in the sitemap(s) above.
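
A minimal VoID description might look something like the output of the sketch below, which emits N-Triples stating that a resource is a void:Dataset with a title, a licence and one or more dump locations. The void:Dataset and void:dataDump terms come from the VoID vocabulary and dcterms:title and dcterms:license from DCMI Metadata Terms. The function name, the dataset URI, the licence URI and the title are hypothetical, and the literal escaping assumes short ASCII strings.

```python
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_description(dataset_uri, title, license_uri, dump_urls):
    """Return an N-Triples VoID description of a single dataset."""

    def literal(text):
        # Assumption: ASCII text; only backslash, quote and newline are escaped.
        escaped = (
            text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        return f'"{escaped}"'

    lines = [
        f"<{dataset_uri}> <{RDF_TYPE}> <{VOID}Dataset> .",
        f"<{dataset_uri}> <{DCTERMS}title> {literal(title)} .",
        f"<{dataset_uri}> <{DCTERMS}license> <{license_uri}> .",
    ]
    lines += [f"<{dataset_uri}> <{VOID}dataDump> <{url}> ." for url in dump_urls]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(void_description(
        "http://data.example.org/void#collection",
        "Example collection",
        "http://example.org/licences/open",
        ["http://data.example.org/dumps/collection.nt.gz"],
    ))
```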

All significant resources associated with the collection of interest must be assigned a unique 'http' URI.

All URIs must dereference to a human-readable HTML description and an RDF description (e.g. using RDF/XML, N-Triples, RDF/JSON or RDFa [40]) of the resource, either by using one of the patterns described in Cool URIs for the Semantic Web [41] or by combining the HTML and RDF descriptions using embedded RDFa.
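
One of the patterns in Cool URIs for the Semantic Web answers a request for the resource URI with a 303 redirect to either an HTML or an RDF document, chosen from the request's Accept header. The sketch below shows only that selection logic. The '/id/' and '/doc/' path layout, the file extensions and the function names are assumptions made for illustration, and the Accept parsing is deliberately simplified (no wildcard handling, HTML as the fallback).

```python
# Media types this hypothetical server can offer, with document extensions.
EXTENSIONS = {"text/html": "html", "application/rdf+xml": "rdf"}


def parse_accept(header):
    """Return [(media_type, q)] from an Accept header, highest q first."""
    items = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        items.append((media, quality))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(items, key=lambda item: -item[1])


def see_other(id_path, accept_header):
    """Map a resource path under /id/ to a (status, Location) redirect."""
    chosen = "text/html"  # fallback when nothing offered is acceptable
    for media, quality in parse_accept(accept_header):
        if quality > 0 and media in EXTENSIONS:
            chosen = media
            break
    doc_path = id_path.replace("/id/", "/doc/", 1)
    return 303, f"{doc_path}.{EXTENSIONS[chosen]}"


if __name__ == "__main__":
    print(see_other("/id/object/123", "application/rdf+xml"))
    print(see_other("/id/object/123", "text/html,application/xhtml+xml;q=0.9"))
```

The alternative named in the guidance, a single HTML page with embedded RDFa, needs no redirect logic of this kind.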

Examples

For libraries, the use of RDF data modeled according to FRBR and including links to other RDF sources (such as people, organisations, topics and places) is acceptable. The JISC OpenBib project [42] provides an example of this.

For museums, the use of RDF data modeled according to the CIDOC CRM and including links to other RDF sources (such as people, organisations, topics and places) is acceptable. The CLAROS project [43] provides an example of this.

For archives, the use of RDF data modeled according to EAD and including links to other RDF sources (such as people, organisations, topics and places) is acceptable. The JISC LOCAH project [44] provides an example of this.

Benefits

As an end-user:

  1. it will be possible to cite and bookmark resources of interest using their URIs, knowing that dereferencing the URI will offer a description or a representation of the resource;
  2. browsing the resulting 'web of data' will encourage the discovery of new resources in other collections.

As an aggregator:

Costs/issues

As a provider:

Comments (19)

Question from the authors: Do we want to be more prescriptive here about how URIs should be dereferenced?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:47:27)

Question from the authors: Are we happy allowing both RDF/XML and RDFa to be served from resource URIs?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:48:30)

Question from the authors: Do we want to recommend particular RDF datasets/vocabularies as the target for external links?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:49:38)

I suggest to add the EDM as an example spanning over the three communities here.
Stefan Gradmann (stefan.gradmann@ibi.hu-berlin.de - 2011-02-06 13:22:19)

Is there a reason why this is stated as "must"? What if the data itself does not require those links? Or the institution isn't interested in using them (e.g. when using simple Dublin Core)?
Karen Coyle (kcoyle@kcoyle.net - 2011-02-06 18:17:00)

I think this requirement as a 'must' is justifiable in the guidelines as they currently stand. All it means is that if you don't do this you are adopting the 'RDF data approach' not the 'Linked Data' approach - which would seem fair if you don't incorporate links to external resources? (although I can see the terminology creating objections - after all with the RDF approach you are still making your data 'linkable' even if you aren't linking out)
Owen Stephens (owen@ostephens.com - 2011-02-07 12:33:19)

I think examples rather than anything stronger? Dbpedia obvious leading contender, and id.loc.gov for library data - but all this bound to evolve and change - not least if institutions start following these guidelines!
Owen Stephens (owen@ostephens.com - 2011-02-07 12:36:08)

Obvious, but might need stating: Must include links to RDF datasets that are intrinsic to the model/ontology/format being used; e.g. RDA and ISBD vocabularies for content and carrier types.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:36:11)

Probably not. There is wide variation in what is returned from dereferencing; is there an agreed standard or recommendation for good practice?
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:39:17)

This doesn't use the must/should terminology. The implication is that RDF data modeled on anything other than FRBR is unacceptable, which is unacceptable. ISBD needs to be included. And there will eventually be MODS/MADS in RDF ...
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:45:07)

I think it would be worth mentioning cross-domain links here, and elsewhere. "Other RDF sources" include (for museums) books and other library resources, and some archive resources; (for libraries) museum objects and archive resources; (for archives) library resources.
Gordon Dunsire (gordon@gordondunsire.com - 2011-02-08 10:49:33)

Assuming we all use the same URI, or that appropriate relationships are established between URIs representing the same object.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:06:18)

It's still work-in-progress, but if you're searching for examples in the library domain (may work as well in a wider LAM context!) there's http://www.w3.org/2005/Incubator/lld/wiki/Vocabularies#Reference_value_vocabularies
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:57:18)

Yes, there's really nothing against RDF served by both channels I think. Of course they should be aligned, ideally.
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 10:59:03)

+1. much more flexibility should be allowed, even though this part is just about examples. In fact I really don't see why the paragraph here should be different from the corresponding paragraph in the "RDF data" section!
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 11:55:33)

In terms of: a) testing; b) providing useful usage examples, would it also make sense to *require* that publishers give an example showing: 1) how to call their API; 2) show how their data can be enriched with data from a third party via the LD approach; 3) show how third party data can be enriched with data from the first party/publisher via the LD approach;
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:49:32)

FRBRised records for libraries and museums may be setting the bar rather high, at this time at least. And the FRBR approach doesn't represent a wholly coherent and consistent approach to structuring metadata records (i.e., it is possible to represent some relationships both horizontally and vertically).
Philip Hunter (philip.hunter@ed.ac.uk - 2011-02-18 14:18:30)

I can certainly see a role for indicating preferred vocabularies for example. Preferred datasets in some cases, although cautiously perhaps....
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:36:13)

Comment posted on behalf of Markus Enders, "I think it is difficult for anybody who is not involved in the LinkedData development to get a good overview over the various datasets and vocabularies. If the guidelines could provide a list, this would be helpful. But I wouldn't consider this list to be a recommendation."
Alan Danskin (alan.danskin@bl.uk - 2011-02-22 16:40:18)

Designing 'http' URIs

(See consultation document section: http://rdtfmetadata.jiscpress.org/designing-http-uris/)

For metadata that is exposed using the RDF data or Linked Data approaches, all significant resources associated with the collection of interest must be assigned a unique URI. Where the Linked Data approach is used, such URIs must be 'http' URIs.

When assigned, all 'http' URIs should conform to the Designing URI Sets for the UK Public Sector guidelines.
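
As we understand the Designing URI Sets guidance, identifier URIs follow a pattern along the lines of http://{domain}/id/{concept}/{reference}, with a matching /doc/ URI for the document that describes the thing identified. The sketch below builds URIs of that shape. The exact patterns should be confirmed against the guidance itself, and the domain, concept names and function names used here are hypothetical.

```python
from urllib.parse import quote


def identifier_uri(domain, concept, reference):
    """Build an identifier URI of the assumed form http://{domain}/id/{concept}/{reference}."""
    return f"http://{domain}/id/{quote(concept, safe='')}/{quote(reference, safe='')}"


def document_uri(domain, concept, reference):
    """Build the matching document URI under /doc/ (same assumption)."""
    return f"http://{domain}/doc/{quote(concept, safe='')}/{quote(reference, safe='')}"


if __name__ == "__main__":
    # Local references such as inventory numbers may need percent-encoding.
    print(identifier_uri("data.example.ac.uk", "object", "EA 1234"))
    print(document_uri("data.example.ac.uk", "object", "EA 1234"))
```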

Comments (5)

Question from the authors: Is the reference to the data.gov.uk URI patterns sufficient here? Do we want to refine or modify them in any way for our purposes, e.g. by recommending the use of institutional sub-domain names starting with 'data.'?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:46:01)

My colleague Markus Enders comments, "I think there may be issues around persistency of URIs and versioning. If I remember correctly the guidelines suggest to concatenate a date/time to the URI. While "http://documents.bl.uk/document/identifier" gets the latest information about the document; http://documents.bl.uk/document/identifier/2011-02-01 would get the information which was valid on the 1st of February. I am not sure if this is feasible; and I am not even sure if I understand how well these guidelines work with ORE. A practical example would be good to see if there are problems with the guidelines"
Alan Danskin (alan.danskin@bl.uk - 2011-02-22 16:37:53)

The British Museum URI is likely to be http://collection.britishmuseum.org/object/[inventory ID]. No 'data'.
Dominic Oldman (DOldman@britishmuseum.org - 2011-02-07 20:54:33)

@andy this concerns me a little too... Should there be an "executive summary" here of the key principles from that doc interpreted for MLA context? Also, worth bearing in mind the current HE URI design principles that are currently JISC funded (dff proj manager? Lincoln one of the project sites?) - if these guidelines were unsympathetic in any way to the COI guidelines, it could flag issues for the recommendation made here? Given the BBC has done a lot on this in context eg of programme URIs, are there any recommendations that eg they have made in this respect to Strategic Content Alliance, for example? Or is that too joined up?!;-)
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 12:57:51)

I think this is a premature requirement to be honest. While I do think the document is good - I would like to be reassured that it has had a positive impact in its domain of origin before recommending it, let alone requiring it. Or, in other words, it looks good on paper but is it effective? Perhaps there is some research into the effectiveness of these guidelines? Perhaps this is something the RDTF could commission?
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:43:15)

Data model guidelines

(See consultation document section: http://rdtfmetadata.jiscpress.org/data-model-guidelines/)

Metadata that is exposed using the RDF data or Linked Data approaches should be modeled according to FRBR, the CIDOC CRM or the principles that underpin EAD. Where structural 'containment' relationships are required, ORE should be used.
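
One possible reading of the ORE recommendation is that a containing resource is typed as an ore:Aggregation and linked to each contained resource with ore:aggregates. The sketch below emits N-Triples on that assumption. The function name and the example URIs are hypothetical, and how this maps onto a particular archival or bibliographic hierarchy is left open.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def containment_triples(parent_uri, child_uris):
    """Return N-Triples lines stating that parent_uri aggregates each child."""
    lines = [f"<{parent_uri}> <{RDF_TYPE}> <{ORE}Aggregation> ."]
    lines += [f"<{parent_uri}> <{ORE}aggregates> <{child}> ." for child in child_uris]
    return lines


if __name__ == "__main__":
    for line in containment_triples(
        "http://data.example.org/id/series/1",
        ["http://data.example.org/id/item/1", "http://data.example.org/id/item/2"],
    ):
        print(line)
```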

Comments (22)

Question from the authors: Is the strong encouragement to use FRBR here problematic for the library community? Ditto CIDOC CRM for the museum community?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:38:25)

Question from the authors: The current recommendation to model using FRBR, CIDOC CRM and EAD is quite open. Do we want to recommend a particular way of modelling these standards in RDF?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-03 13:39:54)

I've not seen a consensus model for FRBR as RDF, so if you have one, please publicize it and recommend its use.
Ralph LeVan (levan@oclc.org - 2011-02-03 20:36:57)

From a library perspective I think asking for FRBR data sets the bar pretty high. Library management systems don't typically offer FRBRized data (although some of the more recent 'vertical search' systems do offer some level of FRBRization).
Owen Stephens (owen@ostephens.com - 2011-02-04 09:06:25)

I haven't, and I agree this is an issue (one which probably also applies to CIDOC CRM and EAD incidentally).
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 09:11:05)

Ralph, Owen, we wrote "should" as an explicit alternative to "must" - i.e. we intended this as a recommendation but accepted it was optional. We could easily soften to "should, where possible, ...". Or perhaps go even further, "should consider the use of ..."? Or we could drop any recommendation about underlying models and see what happens? Or we could recommend modeling only the "items in hand" (or whatever the digital equivalent of that phrase is)?
Andy Powell (andy.powell@eduserv.org.uk - 2011-02-04 09:15:43)

Thanks Andy - appreciate the clarification. I think part of the problem is that throughout the document FRBR is repeatedly referenced - I think it would be pretty hard at the moment not to read the document and not assume that FRBR was what was expected for library data. For printed monograph data I guess there are two likely approaches by those publishing data: a. Publish the equivalent of the MARC record (I guess this is what you mean by 'item in hand'); b. Publish a FRBRised version. It's hard to imagine further alternatives. So I'd probably just leave modelling out of it tbh (although I might point at examples of various approaches). The other factor playing on my mind is that while FRBR would probably work relatively well for printed monographs, its application to other types (e.g. video) is less clear and more debated. Libraries currently tend to drop everything into MARC records because that's what the systems can handle, but in terms of publishing the data it may be there are more domain specific ontologies that prove appropriate - e.g. the music ontology for musical recordings etc.
Owen Stephens (owen@ostephens.com - 2011-02-04 12:12:38)

Library colleagues at Edinburgh have been wondering about the absence of the ISBD, International Standard for Bibliographic Description, on which there's been some Linked Data work. http://www.w3.org/2005/Incubator/lld/wiki/Library_Data_Resources#International_Standard_Bibliographic_Description_.28ISBD.29
Jo Walsh (jo.walsh@ed.ac.uk - 2011-02-04 14:44:08)

Shouldn't metadata guidelines somewhere reference guidance rules? AACR2, ISBD, RDA, International Cataloging Principles? All of the standards mentioned so far seem to be data models, but the data themselves need to be created based on standards if sharing is to be plausible. Note that it isn't necessary to pick only one cataloging standard, but it is generally desirable to make clear what standard you are using.
Karen Coyle (kcoyle@kcoyle.net - 2011-02-06 18:23:06)

The main issue with ISBD (and the RDA ontology) which are both relevant for libraries (although I'm not a huge fan personally) is that they are both draft at the moment.
Owen Stephens (owen@ostephens.com - 2011-02-07 12:38:55)

Re: CIDOC CRM and RDF modelling see http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs
Dominic Oldman (doldman@britishmuseum.org - 2011-02-08 17:26:46)

Comment regarding costs of CIDOC CRM RDF metadata: leaving aside the advantages for semantic interoperability in an RDF CIDOC CRM approach, there will be overheads in CRM RDF semantic metadata, but in practice, as developments take place, they may be less than anticipated. It is NOT necessary to use the full extent of the CRM - common patterns of a subset of CRM entities will probably emerge in different domain areas. Tools for generating CRM (say) metadata may emerge for particular domains. This has been the aim of the STELLAR project in the archaeology domain - developing tools for a subset of the CRM as relevant to inter-site cross search building on STAR project experience. http://hypermedia.research.glam.ac.uk/kos/stellar/
Doug Tudhope (dstudhope@glam.ac.uk - 2011-03-01 13:38:38)

The guidelines don't prescribe models, but I think that SKOS deserves special consideration. Many of the entities relevant to these use cases have controlled names in various "vocabularies". These are often published in Linked Data using SKOS despite the fact that the entities themselves are often not concepts. For example, 'Alaska' is being published as a skos:Concept in the MARC codes list despite the fact that it would probably be modeled as a 'Place' in FRBR: http://id.loc.gov/vocabulary/geographicAreas/n-us-ak. Other examples abound. Using SKOS like this makes it easy to coin Linked Data URIs without committing to any preordained model, but raise issues for actually using them in RDF statements where the domain/range expect something with a modeled type. (See foaf:focus for a solution.)
Jeff Young (jyoung@oclc.org - 2011-02-15 14:02:29)

There are certainly those in the library community who do not subscribe to the FRBR model. There are also practical issues to be overcome in terms of transforming MARC data. A further issue is the extent to which the linked data community is prepared to accept the constraints imposed by FRBR or indeed any data model.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:01:24)

Should the model make an explicit distinction between information that relates to the resource, e.g.: title, edition, publisher, ISBN; and information that relates to the metadata description e.g.: source, record identifier, version/timestamp.
Alan Danskin (alan.danskin@bl.uk - 2011-02-16 14:02:09)

Caveat on Andy's: "The current recommendation to model using FRBR, CIDOC CRM and EAD is quite open." I really don't read it like that, with the "should" in the first sentence.
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 12:00:22)

I think some guidance is useful so would rather see the status of the recommendation clarified than dropped entirely.
Mia (openobjects@miaridge.com - 2011-02-17 18:19:24)

Some links to data modelling resources, tutorials would probably help here
Adrian Stevenson (a.stevenson@ukoln.ac.uk - 2011-02-18 12:39:10)

Should there be a para here about recommendations (or at least example) of data model guidelines for the community approach too? Wherever possible, given the guidelines are guidelines and not mandatory approaches, don't we want to take every opportunity we can to nudge folk in a direction that makes reuse possible via conventional approaches, and maybe help people adopt particular conventions when they are otherwise uncertain about what approach to take?
Tony Hirst (a.j.hirst@open.ac.uk - 2011-02-18 13:02:37)

We were surprised that there was no mention at all of the ISBD, (International Standard for Bibliographic Description). This may be just an oversight. Is it just an oversight? There is a danger that we aren't making the most coherent use of what is available to us.
Philip Hunter (philip.hunter@ed.ac.uk - 2011-02-18 14:25:13)

I think this section could be re-worked to encourage the use of: - a formal model where possible/appropriate - the use of a community-accepted norm for modelling where such exists rather than introducing particular, required meta-models. Perhaps some examples, such as the ones you list, can be offered (as examples only).
Paul Walk (p.walk@ukoln.ac.uk - 2011-02-18 15:49:15)

I'd support this approach suggested by Paul
Owen Stephens (owen@ostephens.com - 2011-02-18 15:57:03)

References

(See consultation document section: http://rdtfmetadata.jiscpress.org/references/)

  1. RDTF Vision - http://ie-repository.jisc.ac.uk/475/1/JISC%26RLUK_VISION_FINAL.pdf
  2. eFoundations: Resource discovery revisited - http://efoundations.typepad.com/efoundations/2010/08/resource-discovery-revisited.html
  3. Linked Data principles - http://linkeddata.org/
  4. Linked Data Design Issues - http://www.w3.org/DesignIssues/LinkedData.html
  5. Linked Open Data Star Scheme by example - http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
  6. Designing URI Sets for the UK Public Sector - http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
  7. Europeana Data Model - http://version1.europeana.eu/c/document_library/get_file?uuid=aff89c92-b6ff-4373-a279-fc47b9af3af2&groupId=10605
  8. Europeana Semantic Elements - http://group.europeana.eu/c/document_library/get_file?uuid=a830cb84-9e71-41d6-9ca3-cc36415d16f8&groupId=10602
  9. Wikipedia: Open Format - http://en.wikipedia.org/wiki/Open_format
  10. Open Definition - http://www.opendefinition.org/
  11. OAI-PMH - http://www.openarchives.org/OAI/openarchivesprotocol.html
  12. Sitemaps XML format - http://www.sitemaps.org/protocol.php
  13. Wikipedia: gzip - http://en.wikipedia.org/wiki/Gzip
  14. Wikipedia: Comma-separated values - http://en.wikipedia.org/wiki/Comma-separated_values
  15. MARC21 - http://www.loc.gov/marc/bibliographic/
  16. MODS - http://www.loc.gov/standards/mods/
  17. BibTeX - http://www.bibtex.org/Format/
  18. Wikipedia: RIS - http://en.wikipedia.org/wiki/RIS_(file_format)
  19. CrossRef output schema - http://www.crossref.org/help/Content/CrossRef%20Schema/CrossRef%20Schema.htm
  20. Guidelines for implementing Dublin Core in XML - http://www.dublincore.org/documents/dc-xml-guidelines/
  21. JSON - http://www.json.org/
  22. Wikipedia: Atom (standard) - http://en.wikipedia.org/wiki/Atom_(standard)
  23. Wikipedia: RSS - http://en.wikipedia.org/wiki/RSS
  24. SPECTRUM - http://www.collectionstrust.org.uk/index.cfm/collection-management/spectrum/
  25. Encoded Archival Description - http://www.loc.gov/ead/
  26. Resource Description Framework - http://www.w3.org/RDF/
  27. RDF/XML - http://www.w3.org/TR/rdf-syntax-grammar/
  28. N-Triples - http://www.w3.org/TR/rdf-testcases/#ntriples
  29. N-Quads - http://sw.deri.org/2008/07/n-quads/
  30. RDF/JSON - http://n2.talis.com/wiki/RDF_JSON_Specification
  31. Semantic Web Crawling: A Sitemap Extension - http://sw.deri.org/2007/07/sitemapextension/
  32. voiD Guide - Using the Vocabulary of Interlinked Datasets - http://vocab.deri.ie/void/guide
  33. FOAF - http://www.foaf-project.org/
  34. BIBO - http://bibliontology.com/
  35. DCMI Metadata Terms - http://dublincore.org/documents/dcmi-terms/
  36. OAI Object Re-use and Exchange - http://www.openarchives.org/ore/
  37. FRBR - http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
  38. British Library Catalogue Dataset in RDF/XML - http://www.archive.org/details/BritishLibraryRdf
  39. CIDOC CRM - http://www.cidoc-crm.org/
  40. RDFa in XHTML: Syntax and Processing - http://www.w3.org/TR/rdfa-syntax/
  41. Cool URIs for the Semantic Web - http://www.w3.org/TR/cooluris/
  42. JISC OpenBib Project - http://bibliography.okfn.org/p/jiscopenbib/
  43. CLAROS Project wiki - http://www.clarosnet.org/wiki/index.php?title=Main_Page
  44. Linked Open COPAC Archives Hub - http://blogs.ukoln.ac.uk/locah/

Comments (1)

EDM reference should be http://www.version1.europeana.eu/web/europeana-project/technicaldocuments/ and ESE be http://www.version1.europeana.eu/web/guest/technical-requirements/ . These two will probably break one day, but it will be later than the two links in your current list!
Antoine Isaac (aisaac@few.vu.nl - 2011-02-17 12:03:29)