Concerning the deal between LAC and Canadiana: We ask for transparency

I thought I would take this opportunity to weigh in on the deal between Library and Archives Canada and Canadiana, which calls for the transfer and digitization of the largest collection of Canadian archival records in history. Let me be clear: in the grand scheme of things I think this project is a very good thing for archives in Canada, and it is long overdue. What worries me is that the details surrounding the deal remain largely unclear, and I think it is important for us, as Canadian archivists and librarians, to ask specific questions to ensure that this heritage collection is safe and will ultimately be freely available to all Canadians who want to view it.

Canadiana has already tried to quell some of the hysteria surrounding the deal with its recently published FAQ, but to be honest, many of my questions are still left unanswered. I even asked Canadiana on Twitter the other day to clarify the 'Premium' payment that would be required for access to the search and discovery features they will be developing, but I have yet to hear a reply. I think this line from the FAQ deserves a more detailed explanation:

Until the completion of the project, this searchable, full-text data will be one of the premium services.

Does this mean that once the project is completed everyone will have free access to these features? If this is only one of the premium features, what else will we be missing out on if we don't pay? These are just some of the questions I have about the deal, but more importantly, I think it is crucial that we start asking those involved (CRKN, CARL, LAC, Canadiana) how they plan to manage, describe and preserve this enormous amount of information, and to make sure it will be available to Canadians for years to come. Many of these questions have been discussed in Bibliocracy's blog posts on the issue, but I would like to reiterate them and ask that the library and archives community put their own questions to Canadiana and LAC, in the hope of drawing out more details about the project. To start things off, I have outlined below the questions I would like to have answered:

How will this information be stored, and subsequently transferred back to LAC once the full digitization process is complete?

Information architecture is obviously a crucial component of this project, as the collection will need to be stored somewhere it can be accessed by all. Even more important, I think, is an answer about how all of this content will be transferred back to LAC. There are many methods and avenues this project could take in terms of the repository or content management system that will hold all of this material, and I think both parties owe us an explanation of how this work will be completed. Will Canadiana use something like CLOCKSS to ensure that the material is preserved and made freely available forever? Or will this be the responsibility of LAC once the project is done? I would like to know that the digital documents can be easily migrated back to LAC once this is over. Which brings me to my next question:

What measures will be taken regarding the digital preservation of the finalized, newly described content?

I'm hoping that the responsibility of managing Canada's largest archival collection will spur Canadiana to take measures to ensure the preservation not only of the physical content, but of the newly digitized content as well. I would like to know where they plan to store all of this information. Will copies be held in a dark archive to ensure long-term preservation? Will they follow the Open Archival Information System (OAIS) reference model? Will they use the Trusted Digital Repository model? It would be nice to see something akin to a Trustworthy Repositories Audit and Certification (TRAC) so that Canadian information professionals can feel confident that the proper steps are being taken to preserve this digital content.
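
I have no idea what tooling Canadiana or LAC will actually adopt, but as one illustration of the kind of measure I would like to hear about: packaging each digitized batch as a BagIt bag (a simple, widely used transfer format in OAIS-style workflows) would make completeness and fixity verifiable at every hand-off. Here is a minimal sketch using the bagit Python library, with a made-up directory name and metadata:

```python
import bagit

# Package a (hypothetical) directory of digitized files as a BagIt bag.
# make_bag() writes checksum manifests alongside the payload files.
bag = bagit.make_bag(
    "digitized_batch_001",
    {
        "Source-Organization": "Library and Archives Canada",
        "External-Description": "Example digitized archival batch (illustration only)",
    },
    checksums=["sha256"],
)

# Before (or after) any transfer, re-open the bag and verify that every file
# is still present and still matches its recorded checksum.
bag = bagit.Bag("digitized_batch_001")
print("Bag is complete and unaltered" if bag.is_valid() else "Bag failed validation")
```

Whether they use BagIt, CLOCKSS, or something else entirely, it is this kind of concrete, auditable detail that would put my mind at ease.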

What type of metadata schemas will be used?

This one is pretty self-explanatory, but seeing as this is a Canadian initiative, one would assume that Canada's Rules for Archival Description (RAD) will be used. And with linked data becoming so prominent of late, does Canadiana have plans to use RDF to encourage and support linked data within this collection? Because one of the main goals of this project is to make this content more discoverable and searchable, I think it would be helpful for us to understand how all of this transcription and metadata tagging will take place.
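
I don't know which vocabularies Canadiana has in mind, and RAD is a content standard rather than an RDF vocabulary, so some mapping would be needed. Purely as an illustration of what linked-data-friendly description could look like, here is a small sketch using Python's rdflib and Dublin Core terms; the URIs, titles and relationships are all made up:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical URIs for an item and the fonds it belongs to (illustration only).
item = URIRef("http://example.org/heritage/item/0001")
fonds = URIRef("http://example.org/heritage/fonds/42")

g = Graph()
g.bind("dcterms", DCTERMS)

g.add((item, DCTERMS.title, Literal("Correspondence, 1867-1871", lang="en")))
g.add((item, DCTERMS.creator, Literal("Example fonds creator")))
# Linking the item to its fonds is a rough analogue of RAD's multi-level description.
g.add((item, DCTERMS.isPartOf, fonds))

print(g.serialize(format="turtle"))
```

Described this way, every item gets a stable URI that other datasets can link to, which is ultimately what makes a collection more discoverable.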

What do you really mean when you say that all of the content will be open access?

When I hear the term open access used to describe information content, I always get excited. If this effort is truly going to make all of this digitized archival material open access, then that is fantastic. With this deal, however, the way open access is being described has me scratching my head. I like to use SPARC's definition of open access, which (in a nutshell) describes material that has:

immediate, free availability on the public internet, permitting any users to read, download, copy, distribute, print, search or link to the full text of collections, crawl them for indexing, pass them as data to software or use them for any other lawful purpose

There have been a lot of discussions around Canadiana's statement that they will be making the digital content available for free under a Creative Commons license. What I don't understand is why accessing certain features of this content will require a premium fee. That doesn't sound very open access to me, but a simple clarification would go a long way. Which leads me to:

Can you please elaborate on the fees involved with premium access, and how this will work with the release of 10% of the digitized material per year over 10 years?

This question has been on my mind since I first heard about the deal (as I described above). What I would like to know is how this premium fee will work: What will it cost? What features are involved? Will the premium features become freely available as each 10% increment of the digitization is completed?

I understand that creating high-quality descriptive metadata for digitization costs money, and I don't have much of a problem with that. What worries me is that these details have not been provided to us. By not answering this one glaring question, Canadiana has made me nervous that I, or my institution, will have to pay for content over the long term. How do I know that these charges won't continue once the project is finished?

What experts are going to be consulted for this project?

I know that CRKN and CARL have both supplied money for this project, but it would be very comforting to know that highly skilled, expert personnel will be working on it. As a librarian and archivist, I want this effort to succeed at the highest level. In order to feel confident that this will be the case, I think it would be wise to inform the library and archival community in Canada as to who will be advising the effort. I always like specifics, and knowing that the best people are involved will go a long way towards easing my mind.

In the end, all I'm asking for is a little bit of transparency. This project will affect a huge number of information professionals, researchers, and members of the general public. I think it shows a lot of promise and should be a cause for excitement in the Canadian information community. However, until Canadiana or LAC provide specifics about the deal, I will be holding back my excitement. The lack of explanation and the vagueness surrounding this project should be a cause for concern for everyone. Ultimately, I don't think an open and transparent explanation of a project that affects so many Canadians is too much to ask for.

I encourage other Canadian archivists and librarians to ask their own questions about this deal through blogs, social media, or email, in the hope of generating enough demand that Canadiana and LAC will have to respond. I am only a small voice in this, and it would be great to see others get involved. Using #heritagedeal on Twitter could help gather all of this discussion in one place.

Thanks for reading.


This is an excellent collection of ideas about reinventing archival methods from the Recordkeeping Roundtable blog. I wish we had more discussions like this when I was completing my Master's. The sections on Access and Description and Professional Identity, and the discussion of how archivists bring value to the management of records, are particularly worth reading. I hope you enjoy it!

Originally posted on Recordkeeping Roundtable:

On November 29 and 30 the Recordkeeping Roundtable, in partnership with the Australian Society of Archivists, held a two-day workshop in Sydney: 'Reinventing Archival Methods'. Attended by almost 70 people from around Australia, and even a couple of visitors from New Zealand, the workshop was stimulating, inspiring and energised many of us to look to what we can do next to examine and test the many great ideas that emerged over the two days. A report on the event, including copies of presentations where available, is provided below, along with a plan for continuing the conversation.

Why the need to reinvent archival methods?

This workshop came about following discussions amongst some of us in the…


Recent Digitization Projects from Librarians and Archivists: A Great Example of our Expertise

I haven't written a post in a few weeks, so I thought I would return with a short one highlighting some of the great work being done by librarians and archivists to preserve both print and born-digital material. Enjoy!

The Salman Rushdie Archive (Emory University)

The Salman Rushdie Archive is an excellent example of what librarians and archivists can do with born-digital material. In this case, Emory has chosen to preserve all of Rushdie's manuscripts, drawings, journals, letters and photographs. What is amazing about this collection is that they have also recreated the actual digital environments (several computers) that Rushdie used to produce his work. This project is the most complete set of born-digital records to date (according to Emory), and it sets a gold standard for how libraries and archives can provide important historical information to patrons. Take a look at the video below to get a glimpse of what the digital environments look like. His computers even replicate crashes, just as they did when he was using them!

I love seeing the old Mac OS recreated with all of Rushdie's records on it. As artists, authors and researchers continue to work electronically, those who follow strong recordkeeping practices will hopefully be able to have their material preserved as they used it. The Emory project is a first step in the right direction.

National Library of Medicine Exhibition Programs

The National Library of Medicine’s History of Medicine Division has done an excellent job promoting the vast amount of material they have within their library. Through the exhibition programs webpage, users have an opportunity to browse through historical images on everything from Shakespeare and the Four Humors to Forensic Views of the Human Body. By providing beautifully scanned images and comprehensive historical information, these exhibitions provide an opportunity for the general public to observe and learn about important material related to the history of medicine.

Balinese Digital Library Collection

Available through the Internet Archive, the Balinese Digital Library provides access to manuscripts covering everything from religion, holy formulae, rituals, family genealogies, law codes, treatises on medicine, arts and architecture, calendars, prose and poems to magic! What I have found most interesting about this collection is that it contains information on practical matters such as medicines and village regulations that are used in daily life. It is also important to note that this is the first complete collection of the literature of the Balinese.

The Internet Archive is now home to 10 petabytes of data and is an excellent resource if you’re interested in historical material.

Wellcome Collection

Supported by the Wellcome Trust, the Wellcome Collection provides the public with an opportunity to read articles, view images and watch videos on a variety of subjects related to the human body. The collection is separated into the following categories: Life, Genes & You; Mind & Body; Sickness & Health; Time & Place; Science & Art; and Education.

This collection has so much excellent material I don't even know where to begin. As a visitor to the site, you have the opportunity to look at everything from Crick's preliminary sketch of DNA to fabulous satirical medical images. Take the time to explore this site; I've spent hours on it already while writing this post.

I've tried to highlight my favourite examples of library and archival projects here to give a glimpse of the great work we do to provide access to historical material. I am excited to see what projects take inspiration from the Emory Salman Rushdie Archive, as I think recreating the digital environment is an excellent idea for future collections. Science and medical researchers increasingly store, share and interpret their data in digital formats. It is vital that as librarians and archivists we work with these groups to manage and preserve their data in ways that will allow it to be presented coherently in the future. How else can we be sure that this data will be available to people down the road? More to come on this in my next post!

Web archiving: The importance of collecting born-digital materials

Recently I had the privilege of sitting in on the Board of Regents meeting at the National Library of Medicine (NLM). At this meeting the History of Medicine and Technical Services Division presented a report on an initiative to expand the NLM's collection to born-digital web materials. The presentation described a preliminary trial in which the team collected twelve specific doctor and patient blogs for preservation. I thought this was an incredible idea and naturally ran up to the presenters immediately afterward to ask if I could participate in the project as part of my Associate Fellowship. What I liked most about their presentation was their methodology and the tools they used to collect this content, so I thought it was a good opportunity to write a blog post giving an overview of the tools used in this process.

Strategy & Guidelines

What first caught my eye was that the NLM Web Collecting and Archiving Working Group has recommended that the NLM follow the ARL Code of Best Practices in Fair Use for Academic and Research Libraries. This code was published in January 2012, and Section 8 is devoted entirely to Collecting Material Posted on the World Wide Web and Making It Available. According to the code, collecting web material is valuable because it creates an accessible archive of what is available on the web, an environment that contains an enormous amount of important historical and research-related content. The code states that:

Selecting and collecting material from the Internet in this way is highly transformative. The collecting library takes a historical snapshot of a dynamic and ephemeral object and places the collected impression of the site into a new context: a curated historical archive.

The ARL also places certain limitations on how such collections should be created. This is important because it sets a standard for other libraries and archives to follow, and it provides guidance to institutions on how to approach the creators of this content. In accordance with fair use, the ARL states that:

Captured material should be represented as it was captured, with appropriate information on mode of harvesting and date.

To the extent reasonably possible, the legal proprietors of the sites in question should be identified according to the prevailing conventions of attribution.

Libraries should provide copyright owners with a simple tool for registering objections to making items from such a collection available online, and respond to such objections promptly.

These limitations support traditional archival theory in the sense that the goal is to preserve the integrity and authenticity of the website that is captured. They also reflect the importance of acknowledging the creator of the content and giving copyright owners a way to object before the material is made freely available to the public.

What I love about the NLM's consideration of the ARL Code is that it shows one organization drawing on another's expertise to access and preserve content for the benefit of others. The ARL is an excellent resource for academic and research libraries and should be used more often. I would also love to see more libraries collaborating with one another on this topic. Because collecting born-digital material should reflect an institution's own collection development policies, it is important that the library community communicate to avoid duplication of captured content.

Now that I have gone over the guidelines and standards that were used to approach the content, I would like to speak briefly about the technology the NLM used to gather this material: Archive-it. 

Archive-it

Archive-it is a subscription web archiving service developed by the Internet Archive that helps institutions build, harvest and preserve digital content on the web. The service allows users to collect and manage this content in a way that preserves the original qualities of a web page, keeping its integrity intact. It can essentially run 24 hours a day to harvest and capture the material on a web page, and once the content is captured it is stored in the Internet Archive's data centres. The NLM staff who presented the report praised Archive-it for its ease of use and outstanding institutional support. Many other institutions use Archive-it as well, including the Library of Congress, the University of Michigan and Tufts University. Although I have not tried the program yet, when I browse through some of the collections it supports, it appears to do an excellent job of maintaining the look and feel of the web pages. Take a look for yourself; they have a large number of collections available.
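
Archive-it itself is a hosted service built on the Internet Archive's crawling infrastructure, so there is no code of theirs to show here. But to give a sense of the underlying idea: captures are stored in the standard WARC format, and even GNU wget can write one. A tiny, self-contained sketch (the seed URL is just a placeholder):

```python
import subprocess

# Capture a single page and its requisites (images, CSS, JS) into a WARC file.
# This is not how Archive-it works internally; it is only a small illustration
# of the WARC capture idea using a tool most people already have installed.
seed_url = "https://example.org/"  # placeholder seed URL

subprocess.run(
    [
        "wget",
        "--page-requisites",            # also fetch the resources the page needs
        "--warc-file=example-capture",  # writes example-capture.warc.gz
        "--no-verbose",
        seed_url,
    ],
    check=True,
)
```

The resulting WARC file can then be replayed, indexed and preserved like any other archival object, which is essentially what a service like Archive-it manages at scale.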

Why is this important?

The staggering amount of research available on the web is the most obvious reason for collecting this material. Every day we surf the web, gather information and use it as evidence for solving problems, answering questions and enhancing research. What we don't consider is that if that material were ever to disappear, we would no longer have this amazing research to refer to. Grey literature, social media and at-risk content are just a few types of content that would be very useful to have a historical record of. Many websites that publish valuable grey-literature reports last only a limited amount of time; capturing this content increases its exposure and safeguards against the loss of valuable research. Similarly, social media offers a wealth of information about the collaborative and interactive relationships between doctors and patients, and preserving it can help us understand trends and issues among these groups. Finally, capturing at-risk content can help government agencies track and gather web content that provides information about threats to public health, websites of early responders to disasters, and social media (blogs, Twitter, Facebook) that documents individuals' responses to health crises. All of this information is valuable and can provide a historical record for those interested in researching it in the future.

Not enough has been done to preserve the valuable information born on the web. So many of us use the Internet for our daily tasks, yet we don't think about how, ten years from now, we may never be able to access that material again. I believe that each library needs to think about gathering born-digital content that aligns with its own collection development policy. The fact that the NLM has launched an initiative to preserve biomedical born-digital material demonstrates that this work is deemed important at the national level; smaller health science libraries could start archiving regional web research, as well as their institutions' own webpages that provide unique research to their patron base.

What do you think? Should libraries be trying to do more work in this area? Should this be a part of their strategic plan? 

It is important to note that these opinions are mine and not those of the National Library of Medicine. I wrote this post out of appreciation for the project and the opportunity to share my beliefs on the important role libraries can play in collecting born-digital materials.

References

Archive-it: A web archiving service to harvest and preserve digital collections. 2012. Retrieved from http://www.Archive-it.org on September 16, 2012.

Association of Research Libraries. Code of Best Practices in Fair Use for Academic and Research Libraries. Jan 19, 2012. Retrieved from: http://www.arl.org/pp/ppcopyright/codefairuse/index.shtml on September 16, 2012.

Why Medical Librarians Should Learn Archival Theory

Every morning when I turn on my computer and browse through my news feeds, I am inundated with stories about data management, big data and data preservation. It is clear that these issues are a hot topic in information management, and that librarians need to start paying attention. Because librarians sit at the hub of where research and development activities take place (hospitals, academia, private institutions), we have the ability and the opportunity to stake a claim in the management of the data our patrons create. I've already discussed data curation in a previous post, but today I want to introduce some very basic principles of archival theory. These principles will help medical librarians (and anyone else involved in information management) understand the importance of trustworthiness and how it can be applied to data management.

Trustworthiness

In the simplest terms, for a record or element of data to be considered trustworthy it must be reliable, authentic and accurate. If data or a record is missing one of these components, it cannot be trusted.

Reliability

Reliability is the trustworthiness of a record/data element as a statement of fact. Reliability exists when a record can stand for the fact it is about (e.g., why was it created?), and it is established by paying close attention to the completeness of a record/data element's form and to how much control was exercised over the process of its creation.

Completeness

Completeness is the characteristic of a record/data element that refers to the presence of all the elements required by the creator for it to be capable of generating consequences (i.e., it serves a purpose and can be used to enact that purpose). Completeness also means that the record/data element is the first of its kind (it is not derived from something else) and is effective, in the sense that it is capable of carrying out the consequences the creator intended. These two facets of completeness comprise what can be defined as an original record/data element.

Process of Creation

The process of creation is the procedure that governs the formation of a record/data element and the participation of those involved in the act of creating it. This phase is very important because the process of creation is an indicator of whether the record will be complete and reliable.

Authenticity

Authenticity represents the trustworthiness of a record as a record. What this means is that a record is what it purports to be and has not been tampered with or corrupted. Authenticity is made up of two components: integrity and identity.

Integrity

Integrity refers to the quality of a record/data element being complete and unaltered in any essential way. If a record/data element has been altered or is missing something essential, it can no longer be considered authentic, and therefore is no longer trustworthy.
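
In digital collections, integrity is usually operationalized as fixity checking: record a checksum for each file when it is accessioned, then periodically recompute and compare. A minimal sketch in Python (the file names and manifest values are hypothetical):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 fixity value for one file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def failed_fixity(manifest: dict[str, str], base_dir: Path) -> list[str]:
    """Return the files whose current checksum no longer matches the manifest."""
    return [
        name
        for name, expected in manifest.items()
        if sha256_of(base_dir / name) != expected
    ]

# Hypothetical manifest recorded when the data set was first accessioned:
# manifest = {"trial_results.csv": "9f86d081884c7d65..."}
# print(failed_fixity(manifest, Path("accessioned_data")))
```

Any file that turns up in that list has lost its integrity and, by the definition above, can no longer be presented as an authentic copy.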

Identity

Identity represents all the characteristics of a record/data element that uniquely identify it and distinguish it from any other record/data element. 

Accuracy

Accuracy is present when a record/data element is precise, correct, truthful, free of error or distortion, and pertinent to the matter at hand. To clarify pertinence, archival diplomatic theory defines a record/data element as pertinent if its content is relevant to the purpose for which it is created and/or used. For a record/data element to be accurate, all of these components must be present.

This post only scratches the surface of one small aspect of archival theory. The facets of trustworthiness I have addressed here were originally developed to address issues surrounding paper records. However, these principles are universal in the sense that they can be applied to the new issues surrounding the creation, management and preservation of data. Maintaining data trustworthiness is essential for medical researchers, especially when original data is required to provide evidence for their findings. I believe medical librarians have a duty to understand these principles in order to better serve their patron base as the presence of data in medical research continues to grow. With a solid grounding in archival theory, medical librarians can apply this new knowledge to the expertise they already have in managing medical information. This combination of skills will help create a smooth transition for librarians as we venture (and stake a claim!) into the new world of managing medical research data.

References

Duranti, Luciana (1998). Diplomatics: New Uses for an Old Science. Scarecrow Press, in association with the Society of American Archivists and the Association of Canadian Archivists.

InterPARES 2 Project: International Research on Permanent Authentic Records in Electronic Systems. Retrieved from http://www.interpares.org/ip2/ip2_terminology_db.cfm on August 20, 2012.