Librarians working in data management: How to avoid a data management nightmare

I thought I would quickly share a video that I created in collaboration with two of my excellent colleagues — Karen Hanson and Alisa Surkis. We are developing short data management modules for a clinical department at our institution that will cover everything from selecting the right data collection tool to file naming conventions. This first video was developed to serve as a teaser for the more focused modules.

The first module has already been well received within the department we are working with, and we hope that this will catch on with other departments as we move forward. As always, I’m happy to hear feedback or answer questions about how we developed the module or what we’re using it for in more detail. In the meantime, I hope you enjoy:


Why librarians can’t ignore data anymore

It’s here, and I have to say it came quicker than I expected – the first big stick for researchers – the PLoS data sharing policy. What this policy means for researchers is that if they refuse to share the data accompanying their publication, they can’t publish in PLoS. It also means that if they get published but then hide their data after the fact, they can have their publication retracted. This is a really firm hand in an area where there hasn’t been one before. My first thought when I read this: what an amazing opportunity for librarians! It’s no secret that researchers have mixed feelings about the policy; some are angry and frustrated, while others see the light and understand that this has been a long time coming. What librarians can do is ease the pain a little bit, reduce the burden on these researchers as best they can, and provide them with options that will make this transition as easy as possible. While PLoS is currently the only publisher wielding this big stick for data sharing, I expect Nature, Science and others will follow suit before long – not to mention the various federal policies from the NSF, the NIH, and now, finally, Canada, with the Tri-Council looking to capitalize on Big Data.

So what can you do, even if you aren’t that familiar with data management, or data sharing policies?

Familiarize yourself with data repositories and their policies

As part of its data policy, PLoS strongly recommends that researchers deposit their data in a public repository, so that the data can receive a DOI, accession number, or other unique identifier. This is an area where librarians can provide researchers with valuable guidance. There are many options out there, and researchers may only know of a few – if any. Learning what is available with respect to subject-specific or even general data repositories (as well as any fees that may apply) can go a long way towards steering them in the right direction. Here are some options:

A recent blog post by John Kratz and Natsuko Nicholls on DataPub also provides valuable information about finding a suitable repository, and they do a very good job of outlining the differences between Databib and re3data.

Find out what your own institution offers

Our institution spoke with PLoS and found out that they also accept handles as a form of unique identifier. What this means is that if you have an institutional repository that supports handles, DOIs, or any other type of unique identifier, you may already have a solution for your researchers. Check with those responsible for your institutional repositories to see if they can support researchers’ data. Questions to ask would include: Does the metadata schema accommodate datasets? What is the maximum file size you accept? Can you link multiple records in the repository together?
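To make the distinction concrete: DOIs are themselves handles registered under prefixes that begin with “10.”, while institutional handles use a locally assigned numeric prefix. Here is a rough Python sketch of how you might tell them apart when auditing a repository’s identifiers; the patterns are illustrative, not a full validator, and the sample handle prefix is invented.

```python
import re

# A handle is roughly "numeric-prefix/suffix"; this pattern is a
# simplification for illustration, not the full Handle System grammar.
HANDLE_PATTERN = re.compile(r"^[0-9.]+/\S+$")

def identifier_type(identifier):
    """Classify a persistent identifier as a DOI, a handle, or unknown."""
    if not HANDLE_PATTERN.match(identifier):
        return "unknown"
    # DOIs are handles whose prefix starts with "10."
    return "doi" if identifier.startswith("10.") else "handle"

print(identifier_type("10.1371/journal.pone.0018657"))  # a DOI
print(identifier_type("2027.42/12345"))                 # a plain handle
```

Either kind of identifier should satisfy PLoS, per the conversation described above; what matters is that the identifier is stable and resolvable.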

Let researchers know you are aware of the policy, and that the library is there to support them

At our institution, our first instinct was to see how many of our researchers have published in PLoS over the years. We found that in total we had over 800 – but when we narrowed it down to first and last authors, we got a number closer to 130. We made an active decision to reach out and let these authors know about the policy, and made an effort to find out what our institution has to offer, as well as other options they could pursue. The goal here was to let everyone know that the library was on top of it, and that if they needed support in this effort, we would be there.

Additionally, the library decided to send out a broadcast email to the entire institution to let them know about PLoS’s new policy, and that we were on top of it. We wanted to do this quickly to make sure everyone knew that the library was the place to go for these types of questions.

Go out and talk to your researchers

If nothing else, it can’t hurt to reach out to the various areas you serve – either as subject librarians or liaison librarians. We’ve just finished an exercise where we met and interviewed 30+ researchers with active grants at our institution (results to be published later this year) to learn more about how they manage, organize, store, preserve, reuse and share data. This exercise was invaluable, as it provided us with multiple scenarios where researchers could be supported by the library. Even starting a conversation around how they feel about the PLoS data sharing policy is a good idea. More of these policies are going to emerge, so it’s best to start now.

What if I don’t feel comfortable with the content yet?

That’s fine, but you could start by reading the plethora of literature out there on the topics of data management, sharing, storage, preservation, reuse – the list goes on and on. It’s also great to speak with other librarians who have been active in this area – I’m always open for a talk about research data! I’ve included some resources below that are a good start:

For librarians concerned about their role in the library, or looking for new opportunities to branch out and stake a claim in another area of the information profession – this is your chance. Talk to your researchers, learn to provide them with the support they need, and stay active in this area because research data management and sharing is only going to grow, and I know I am one librarian who does not want to be left behind. 

I’d love to hear in the comments about how your library is tackling this issue – if at all. I would also be keen to know the reasons why you won’t be pursuing this issue. Thanks for reading!

PLOS’s open data fever dream

I wanted to bring attention to this post on the fears surrounding PLOS’s new open data policy, from the blog of a neuroscience researcher. It addresses many of the scientific research community’s concerns about sharing data, and also highlights several ways that libraries can contribute. I encourage you to read through the comments section to learn about additional (and innovative) ways researchers are working towards meeting this requirement. The PLOS policy is only the beginning, as many other requirements will emerge in the near future – including government mandates.

The publisher of the largest scientific journal in the world, PLOS, recently announced that all data relevant to every paper must be accessible in a stable repository, with a DOI and everything. Some discussion of this is going on over at Drugmonkey, and this is a comment that got out of hand, so I posted it here instead.

What is the purpose of this policy? I don’t see how anyone could be fooled into thinking this could somehow help eliminate fraud. Fraud is about intent to deceive, and one can deceive with a selective dataset as easily (or, actually, much more easily) than with Photoshop.

What else? Well, you could comb through the data of that pesky competitor or some other closely related work, looking for mistakes or things they missed that you could take advantage of. Frankly, I can’t imagine bothering. I mean, how could you not have…


DataCite Releases Metadata Generator Tool – Here’s how it works

A post I wrote this morning for the Canadian Community of Practice for Research Data Management in Libraries about the new DataCite Metadata Generator Tool.

Canadian Community of Practice for Research Data Management in Libraries

Last week DataCite – the international registry of data citations – released a new tool designed to allow users to create metadata through a quick and easy HTML form. What’s great about this tool is that it doesn’t require any software installation whatsoever, and it reflects the most recent version of DataCite’s metadata schema – version 3. I tried out the tool myself and found it to be quite useful. It is also very easy to set up: DataCite’s description page for the metadata generator provides a link to a GitHub page. From there, you simply find the download option and save the file with a .html extension. Then you can open the HTML file and start generating metadata. I’ve included some screenshots of the tool below to give a clearer picture:

DataCite Mandatory Metadata


These elements represent DataCite’s most minimal metadata…

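For a sense of what a record built from those mandatory elements looks like, here is a minimal Python sketch that assembles the five required properties of the DataCite Metadata Schema v3 (Identifier, Creator, Title, Publisher, PublicationYear). The namespace follows the published v3 schema; the sample values are invented, and this is my own illustration rather than the generator’s actual output code.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite Metadata Schema, version 3.
NS = "http://datacite.org/schema/kernel-3"

def minimal_datacite_record(doi, creator, title, publisher, year):
    """Build a DataCite v3 record containing only the mandatory elements."""
    ET.register_namespace("", NS)  # serialize without a prefix
    resource = ET.Element(f"{{{NS}}}resource")

    identifier = ET.SubElement(resource, f"{{{NS}}}identifier",
                               identifierType="DOI")
    identifier.text = doi

    creators = ET.SubElement(resource, f"{{{NS}}}creators")
    creator_el = ET.SubElement(creators, f"{{{NS}}}creator")
    name = ET.SubElement(creator_el, f"{{{NS}}}creatorName")
    name.text = creator

    titles = ET.SubElement(resource, f"{{{NS}}}titles")
    title_el = ET.SubElement(titles, f"{{{NS}}}title")
    title_el.text = title

    publisher_el = ET.SubElement(resource, f"{{{NS}}}publisher")
    publisher_el.text = publisher

    pub_year = ET.SubElement(resource, f"{{{NS}}}publicationYear")
    pub_year.text = str(year)

    return ET.tostring(resource, encoding="unicode")

print(minimal_datacite_record(
    "10.1234/example.dataset",           # hypothetical DOI
    "Doe, Jane",
    "Survey of Research Data Practices",
    "Example University Library",
    2014,
))
```

Everything beyond these five properties (subjects, contributors, related identifiers, and so on) is optional or recommended in the v3 schema, which is why a form-based generator can get a usable record out of so few inputs.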

Practicing what we preach: Data sharing & results reporting in library research

Recently I’ve been working on a survey of studies that focus on how libraries are reaching out to their institutions’ faculty and researchers about how they produce, share and store their data. Where I’m currently working we are trying to implement the same type of research, but wanted to see what other libraries have done before launching into a project. I was even optimistic that some of the research I turned up might give me the answers to our questions:

What type of data are biomedical researchers creating in a variety of disciplines?

Where do they stand in terms of sharing data?

How are they currently storing their data?

While I was pleased to find a number of articles that were excellent and exactly the type of research I was looking for (see the end of the post), I was ultimately disappointed in the content that I found. Let me explain the good, however, before I start with the bad.

The Good.

The methodology used in many of the articles I found was comprehensive, highly detailed, and provided me with a wealth of information about how I could go about finding the answers to my data-related questions amongst my institution’s researchers and faculty. For example, many of the research studies described and provided (in detail) the interview questions that they used (Bardyn et al.; Westra), focus group strategies (Adamick et al.; Jones et al.), and bibliographic analyses (Williams et al.; Xia et al.) – this was excellent material that I could reuse to structure my own institution’s approach to developing data-related services.

The Bad.

Where everything came apart for me was in several of the authors’ approaches to the results sections of their research. Very few articles (the exceptions being Lage et al.; Scaramozzino et al.; Walters; Westra; Xia et al.) provided full results from their interviews or focus groups, and quantitative data was scarce. The reason I chose to survey existing research in the first place was to find answers to my questions, and when I turn to research in my field, I expect to read concrete findings that will inform my own research.

For example, if I am reading an article whose methodology states that the authors surveyed their school of medicine researchers about their data-related habits, I am hoping to find data pertaining to the types and sizes of data their institution creates. This would be especially helpful if my institution serves similar biomedical disciplines, and could spare a number of different libraries across the globe a lot of duplicated work. Why wasn’t all of the data included in the article? Is there an underlying understanding that if I actually want to see full results I need to contact the author(s) directly? This has to change.

The lack of results reporting is also a concern of mine because it leaves no evidence that these studies were actually completed. Sure, you can say that your study interviewed X number of people and that, based on their responses, you started a data management service. But what does that tell other people in our field about the behaviour and work practices of researchers and faculty? Why omit the most interesting and useful data from the article?

The Promising.

Fortunately, I was able to find some excellent information in a select number of articles; Walters and Westra both provided articles that gave me a full indication of the types, sizes and departments from which their data came. Furthermore, their descriptions of their interviews were comprehensive, and strong quantitative data about the responses was collected and presented in each paper. This is what I have come to expect from strong library-related research. We need to start presenting our data more clearly, and presenting all of it to our fellow information professionals.

Let it be known that I am not trying to condemn a large portion of library research because it does not provide the comprehensive level of data and results one comes to expect from quality research. Instead, I am hoping to encourage us all (myself included) to be more thorough in our data collection and results reporting, and think about who our research can be useful for. Is the purpose of publishing research just to publish? Or is it to help others advance the profession and implement products and services that have been proven to be effective?  We are a profession that prides itself on our encouragement and passion for information sharing; by following this mantra in our research more effectively I believe we have the capacity to produce outstanding research that will be of direct benefit to librarians in their work, and ultimately to the institutions that we serve. Thanks for reading – I’m happy to discuss this further in the comments if anyone is interested.


Adamick, Jessica, MJ Canavan, Steven McGinty, Rebecca Reznik-Zellen, Maxine Schmidt, and Robert Stevens. 2011. Building as We Climb: The Data Working Group at the University of Massachusetts Amherst. University of Massachusetts and New England Area Librarian e-Science Symposium.

Bardyn, Tania P., Taryn Resnick, and Susan K. Camina. 2012. “Translational Researchers’ Perceptions of Data Management Practices and Data Curation Needs: Findings from a Focus Group in an Academic Health Sciences Library.” Journal of Web Librarianship 6 (4) (October): 274–287.

Carlson, Jacob, Michael Fosmire, C.C. Miller, and Megan Sapp Nelson. 2011. “Determining Data Information Literacy Needs: A Study of Students and Research Faculty.” Portal: Libraries and the Academy 11 (2): 629 – 657.

Delserone, Leslie M. 2008. “At the Watershed: Preparing for Research Data Management and Stewardship at the University of Minnesota Libraries.” In Library Trends, 57:202–210. Urbana-Champaign, Illinois: Johns Hopkins University Press and the Graduate School of Library and Information Science.

Harrison, Andrew, and Sam Searle. 2010. “Not Drowning, Ingesting: Dealing with the Research Data Deluge at an Institutional Level.” In VALA2010 Proceedings.

Hruby, Gregory William, James McKiernan, Suzanne Bakken, and Chunhua Weng. 2013. “A Centralized Research Data Repository Enhances Retrospective Outcomes Research Capacity: A Case Report.” Journal of the American Medical Informatics Association: JAMIA (January 15): 1–5. doi:10.1136/amiajnl-2012-001302.

Johnson, Layne M., John T. Butler, and Lisa R. Johnston. 2012. “Developing E-Science and Research Services and Support at the University of Minnesota Health Sciences Libraries.” Journal of Library Administration 52 (8) (November): 754–769.

Jones, Sarah, Seamus Ross, and Raivo Ruusalepp. 2009. “Data Audit Framework Methodology”. Glasgow.

Lage, Kathryn, Barbara Losoff, and Jack Maness. 2011. “Receptivity to Library Involvement in Scientific Data Curation: A Case Study at the University of Colorado Boulder.” Portal: Libraries and the Academy 11 (4): 915–937.

Newton, Mark P, C C Miller, and Marianne Stowell Bracke. 2011. “Librarian Roles in Institutional Repository Data Set Collecting: Outcomes of a Research Library Task Force.” Collection Management 36 (1): 53–67.

Peters, Christie, and Anita Riley Dryden. 2011. “Assessing the Academic Library’s Role in Campus-Wide Research Data Management: A First Step at the University of Houston.” Science & Technology Libraries 30 (4) (September): 387–403.

Piwowar, Heather A. 2011. “Who Shares? Who Doesn’t? Factors Associated with Openly Archiving Raw Research Data.” PLoS ONE 6 (7) (January): e18657. doi:10.1371/journal.pone.0018657.

Raboin, Regina, Rebecca C. Reznik-Zellen, and Dorothea Salo. 2012. “Forging New Service Paths: Institutional Approaches to Providing Research Data Management Services.” Journal of eScience Librarianship 1 (3).

Reznik-Zellen, Rebecca, Jessica Adamick, and Stephen McGinty. 2012. “Tiers of Research Data Support Services.” Journal of eScience Librarianship 1 (1): 27–35. doi:10.7191/jeslib.2012.1002.

Scaramozzino, Jeanine Marie, Marisa L. Ramirez, and Karen J. McGaughey. 2012. “A Study of Faculty Data Curation Behaviors and Attitudes at a Teaching-Centered University.” College & Research Libraries 73 (4) (July 1): 349–365.

Soehner, Catherine, Catherine Steeves, and Jennifer Ward. 2010. “E-Science and Data Support Services” (August).

Trinidad, Susan Brown, Stephanie M Fullerton, Julie M Bares, Gail P Jarvik, Eric B Larson, and Wylie Burke. 2010. “Genomic Research and Wide Data Sharing: Views of Prospective Participants.” Genetics in Medicine: Official Journal of the American College of Medical Genetics 12 (8) (August): 486–95. doi:10.1097/GIM.0b013e3181e38f9e.

Walters, Tyler O. 2009. “Data Curation Program Development in U.S. Universities: The Georgia Institute of Technology Example.” International Journal of Digital Curation 4 (3): 83–92.

Westra, Brian. 2010. “Data Services for the Sciences: A Needs Assessment.” Ariadne (64).

Williams, Sarah C. 2013. “Using a Bibliographic Study to Identify Faculty Candidates for Data Services.” Science & Technology Libraries (May 9): 1–8.

Xia, Jingfeng, and Ying Liu. 2013. “Usage Patterns of Open Genomic Data.” College & Research Libraries 74 (2) (March 1): 195–207.

Guest Post from Diana Almader-Douglas: Raising Awareness about the Importance of Culture on Health Literacy for Librarians

This isn’t something I’ve done before, but one of my colleagues, Diana Almader-Douglas, has spent the last six-plus months updating some excellent resources on culture and health literacy at the National Library of Medicine. Diana is incredibly knowledgeable about these issues, and asked if I would be willing to let her write a short post on my blog. You can read the post in its entirety below; it is full of useful information about this issue – especially for health sciences librarians. I will make the disclaimer that the post is focused on issues in the US, but I think the issues surrounding culture and health literacy presented here are applicable to Canada as well. Enjoy!

Diana Almader-Douglas:

Through a National Library of Medicine Associate Fellowship Project, I evaluated and enhanced the National Network of Libraries of Medicine’s (NN/LM) Health Literacy resource by adding content and resources related to culture in the context of health literacy.

By providing information about the relationship between culture and health literacy, the highly-utilized resource has the ability to impact a wider audience by encouraging the dissemination of culturally relevant health information by librarians and information professionals.

Through this project, I aimed to raise awareness about vulnerable and special populations while highlighting the connection to health disparities and health literacy.

Culture is just one component of the complex topic of health literacy, but it is a critical one. Culture shapes communication, beliefs, and the comprehension of health information. By enhancing the NN/LM Health Literacy Web page with content about health literacy in a cultural context, users of the page – and, in turn, the people they serve – will be better able to meet the health information needs of vulnerable and diverse population groups.

For more information about culture and health literacy, visit:

Benjamin RM. Improving Health by Improving Health Literacy. Public Health Rep. 2010, Nov-Dec; 125(6):784-785. Available from:

United States Department of Health & Human Services. Health Resources and Services Administration (HSRA). Culture, Language and Health Literacy. Available from:

United States Department of Health & Human Services. National Library of Medicine Specialized Information Services Outreach Activities & Resources. Multi-cultural Resources for Health Information. Available from:

Thanks for reading. I hope health sciences librarians will find this information useful. Just to add a bit of Canadian content, I have included some Canadian health literacy resources below – many of which could use the cultural focus that Diana has implemented for the NN/LM:

Canadian Public Health Association Health Literacy Portal:

Canadian Council on Learning. Health Literacy in Canada: A Healthy Understanding:

Health Literacy Council of Canada:

Public Health Agency of Canada:

Podcast on Health Literacy and Cultural Competence. Centre for Literacy: 


Concerning the deal between LAC and Canadiana: We ask for transparency

I thought I would take this opportunity to weigh in on the deal between Library and Archives Canada (LAC) and Canadiana, which calls for the transfer and digitization of the largest collection of Canadian archival records in history. I want to make it clear that, in the grand scheme of things, I think this project is all in all a very good thing for archives in Canada, and is long overdue. What worries me is that the details surrounding the deal are largely unclear, and I think it is important for us, as Canadian archivists and librarians, to ask specific questions to ensure that this heritage collection is safe and will ultimately be freely available to all Canadians who want to view it.

Canadiana has already tried to quell some of the hysteria surrounding the deal with their recently published FAQ, but if I’m honest, a lot of my questions are still largely unanswered. I even asked Canadiana on Twitter the other day to clarify the ‘Premium’ payment that would be required for access to the search and discovery features they will be developing, but I have yet to hear a reply. I think this line from the FAQ deserves a more detailed explanation:

Until the completion of the project, this searchable, full-text data will be one of the premium services.

Does this mean that once the project is completed everyone will have free access to these features? If this is only one of the premium features, what else will we be missing out on if we don’t pay? These are just some of the questions I have about the deal, but more importantly, I think it is crucial that we start asking those involved (CRKN, CARL, LAC, Canadiana) how they plan to manage, describe and preserve this enormous amount of information and make sure that it will be available to Canadians for years to come. A lot of these questions have been discussed in Bibliocracy’s blog posts on the issue, but I would like to reiterate and request that the library and archives community start asking Canadiana and LAC their own questions to hopefully spur on more details about the project. To start it off, I have outlined below the questions that I would like to have answered:

How will this information be stored, and consequently transferred back to LAC once the full digitization process is complete?

Information architecture is obviously a crucial component of this project, as the collection will need to be stored someplace where it can be accessed by all. Just as important, I think, is receiving an answer about how all of this content will be transferred back to LAC. There are many methods and avenues this project can take in terms of the repository or content management system that will hold all of this material, and I think that both parties owe it to us to explain how this work will be completed. Will Canadiana use something like CLOCKSS to ensure that this material is preserved and made freely available forever? Or will this be the responsibility of LAC once the project is done? I would like to know that the digital documents can be easily migrated back to LAC once this is over. Which brings me to my next question:

What measures will be taken regarding the digital preservation of the finalized, newly described content?

I’m hoping that the responsibility of managing Canada’s largest archival collection will spur Canadiana to take measures to ensure the preservation not only of the physical content, but of the newly digitized content as well. I would like to know where they plan on storing all of this information – will copies be held in a dark archive to ensure long-term preservation? Will they follow the Open Archival Information System (OAIS) reference model? Will they use the Trusted Digital Repository model? It would be nice to see something akin to a Trustworthy Repositories Audit and Certification (TRAC) so that Canadian information professionals can feel confident that the proper steps are being taken to preserve this digital content.
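One concrete practice that underpins frameworks like OAIS and TRAC is fixity checking: record a cryptographic checksum for every file at ingest, then periodically recompute and compare to detect silent corruption. The sketch below is a generic illustration of the idea, not Canadiana’s actual workflow (which has not been disclosed).

```python
import hashlib

def checksum(path, algorithm="sha256", chunk_size=8192):
    """Compute a file's checksum without loading it all into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(manifest):
    """Compare stored checksums against freshly computed ones.

    `manifest` maps file paths to the checksums recorded at ingest;
    returns the list of paths whose content has changed since then.
    """
    return [path for path, recorded in manifest.items()
            if checksum(path) != recorded]
```

A repository would run something like `verify_fixity` on a schedule and flag any returned paths for restoration from a second copy – which is exactly why a dark archive or geographically separate replica matters.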

What type of metadata schemas will be used?

This one is pretty self-explanatory, but seeing as this is a Canadian initiative, one would have to assume that Canada’s RAD archival description standard will be used. And seeing as linked data has become so prominent of late, does Canadiana have plans to use RDF to encourage and support linked data within this collection? Because one of the main goals of this project is to make this content more discoverable and searchable, I think it would be helpful for us to understand how all of this transcription and metadata tagging will take place.
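As a toy illustration of what linked data could mean here: each digitized item gets a URI and is described with RDF triples (subject, predicate, object) that other collections can link to. The sketch below serializes a few statements as N-Triples; the item URI is invented, and the use of Dublin Core terms (rather than an RAD-based vocabulary) is purely illustrative.

```python
# Dublin Core terms namespace, standing in for whatever vocabulary
# the project actually adopts.
DCT = "http://purl.org/dc/terms/"

def to_ntriples(subject, properties):
    """Serialize a dict of predicate -> literal value as N-Triples lines."""
    lines = []
    for predicate, value in properties.items():
        # Each N-Triples statement: <subject> <predicate> "literal" .
        lines.append(f'<{subject}> <{DCT}{predicate}> "{value}" .')
    return "\n".join(lines)

print(to_ntriples(
    "http://example.org/heritage/item/42",   # hypothetical item URI
    {"title": "Fonds correspondence, 1867",
     "creator": "Unknown",
     "type": "Text"},
))
```

The point of the exercise is that once records exist as triples with stable URIs, search engines and other repositories can crawl and interlink them, which is precisely the discoverability goal the project claims.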

What do you really mean when you say that all of the content will be open access?

When I hear the term open access used to describe information content, I always get excited. If this effort is truly going to make all of this digitized archival material open access, then that is fantastic. However, the details of how open access is being described in this context have me scratching my head. For a definition of open access, I like to use SPARC’s, which describes (in a nutshell) material that has:

immediate, free availability on the public internet, permitting any users to read, download, copy, distribute, print, search or link to the full text of collections, crawl them for indexing, pass them as data to software or use them for any other lawful purpose

There have been a lot of discussions around Canadiana’s statement that they will be making the digital content available for free via a Creative Commons license. What I don’t understand is that, in order to access certain features of this content, you will have to pay a premium fee. That doesn’t sound very open access to me, but a simple clarification would help. Which leads me to:

Can you please elaborate on the fees that are involved with premium access, and how this will work with the 10% of digital material released per year for 10 years?

This question has been on my mind since I heard about this deal (as I described above). What I would like to know is how this premium fee will work: What will it cost? What features are involved? Will premium features become freely available as each 10% of the digitization process is completed?

I understand that creating high-quality descriptive metadata for digitization costs money. I don’t have as much of a problem with that; what worries me is that these details have not been provided to us. By not answering this one glaring question, Canadiana has made me nervous that I, or my institution, will have to pay for content over the long term. How do I know that these charges won’t continue once the project is finished?

What experts are going to be consulted for this project?

I know that CRKN and CARL have both supplied money for this project, but it would be very comforting to know that highly skilled, expert personnel will be working on this project. As a librarian and archivist, I want this effort to succeed at the highest level. In order to feel confident that this will be the case, I think it would be wise to inform the library and archival community in Canada as to who will be advising this effort. I always like specifics, and knowing that the best people are working on this effort will go a long way towards easing my mind.

In the end, all I’m asking for is a little bit of transparency. This project will affect a huge number of information professionals, researchers, and members of the general public. I think the project shows a lot of promise and should be a cause for excitement amongst the Canadian information community. However, until Canadiana or LAC provide specifics about the deal, I will be holding off on my excitement. The lack of explanation and the vagueness surrounding this project should be a cause for concern for everyone. Ultimately, I don’t think an open and transparent explanation of a project that affects so many Canadians is too much to ask for.

I encourage other Canadian archivists and librarians to ask their own questions about this deal through blogs, social media, or email in hopes that it will generate enough demand that Canadiana and LAC will have to respond. I am only a small voice in this, and it would be great to see others get involved. Using #heritagedeal on Twitter could help synthesize all of this information in one place.

Thanks for reading.