Librarians working in data management: How to avoid a data management nightmare

I thought I would quickly share a video that I created in collaboration with two of my excellent colleagues — Karen Hanson and Alisa Surkis. We are developing short data management modules for a clinical department at our institution that will cover everything from selecting the right data collection tool to file naming conventions. This first video was developed to serve as a teaser for the more focused modules.

The first module has already been well received within the department we are working with, and we hope that it will catch on with other departments as we move forward. As always, I’m happy to hear feedback or answer questions about how we developed the module or what we’re using it for. In the meantime, I hope you enjoy the video.

Why librarians can’t ignore data anymore

It’s here, and I have to say it came quicker than I expected: the first big stick for researchers, the PLoS data sharing policy. What this policy means for researchers is that if they refuse to share the data accompanying their publication, they can’t publish in PLoS. It also means that if they get published but then withhold their data after the fact, their publication can be retracted. This is a really firm hand in an area where there hasn’t been one before. My first thought when I read this: what an amazing opportunity for librarians! It’s no secret that researchers have mixed feelings about the policy; some are angry and frustrated, while others see the light and understand that this has been a long time coming. What librarians can do is ease the pain a little, reduce the burden on these researchers as best they can, and provide them with options that will make this transition as easy as possible. While PLoS is currently the only publisher wielding this big stick for data sharing, I expect Nature, Science and others to follow suit before long. That is not to mention the various federal policies from the NSF and NIH, and now, finally, Canada, with the Tri-Council looking to capitalize on Big Data.

So what can you do, even if you aren’t that familiar with data management, or data sharing policies?

Familiarize yourself with data repositories and their policies

One element of the PLoS data policy is a strong recommendation that researchers deposit their data in a public repository, so that their data can receive a DOI, accession number, or other unique identifier. Knowing the repository landscape is a simple way to provide researchers with valuable information. There are many repositories out there, and researchers may know of only a few, if any. Learning what is available to researchers in terms of subject-specific or general data repositories (as well as any fees that may apply) can go a long way towards steering them in the right direction. Here are some options:

A recent blog post by John Kratz and Natsuko Nicholls on DataPub also provides valuable information about finding a suitable repository, and they do a very good job of outlining the differences between Databib and re3data.

Find out what your own institution offers

Our institution spoke with PLoS and found out that they also accept handles as a form of unique identifier. This means that if you have an institutional repository that supports handles, DOIs, or any other type of unique identifier, you may already have a solution for your researchers. Check with those responsible for your institutional repository to see whether it can support researchers’ data. Questions to ask include: Does the metadata schema support describing data? What is the maximum file size you accept? Can you link multiple records in the repository together?
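If you want to check what kind of identifier a repository record carries, the sketch below turns a DOI or a handle into a resolvable link. It is a minimal illustration that assumes the general DOI syntax ("10.", a registrant code, a slash, then a suffix) and the public resolvers at doi.org and hdl.handle.net. The function name `resolver_url` is hypothetical, and so is the handle `1902.1/12345` used in the example.

```python
import re

# Assumed DOI syntax: "10." + registrant code (digits, optionally dotted), "/", then a suffix.
DOI_RE = re.compile(r"^10\.\d+(?:\.\d+)*/\S+$")
# Assumed handle syntax: "prefix/local-name". Every DOI is also a handle, so DOIs are checked first.
HANDLE_RE = re.compile(r"^[^/\s]+/\S+$")


def resolver_url(identifier: str) -> str:
    """Return a resolvable URL for a DOI or handle (hypothetical helper)."""
    ident = identifier.strip()
    # Drop an optional "doi:" or "hdl:" scheme prefix, case-insensitively.
    if ident[:4].lower() in ("doi:", "hdl:"):
        ident = ident[4:].strip()
    if DOI_RE.match(ident):
        return "https://doi.org/" + ident
    if HANDLE_RE.match(ident):
        return "https://hdl.handle.net/" + ident
    raise ValueError(f"not a recognizable DOI or handle: {identifier!r}")


if __name__ == "__main__":
    print(resolver_url("doi:10.1371/journal.pone.0000308"))
    print(resolver_url("hdl:1902.1/12345"))  # hypothetical institutional handle
```

A record in a handle-based institutional repository resolves through hdl.handle.net in the same way that a DOI resolves through doi.org. That is why asking your repository team which scheme they use is a useful first question.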

Let researchers know you are aware of the policy, and that the library is there to support them

At our institution, our first instinct was to see how many of our researchers had published in PLoS over the years. We found over 800 in total, but when we narrowed it down to first and last authors, the number was closer to 130. We made an active decision to reach out to these authors to let them know about the policy, and made an effort to find out what our institution has to offer, as well as other options they could pursue. The goal was to let everyone know that the library was on top of it, and that if they needed support in this effort, we would be there.
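The narrowing step, from everyone with a PLoS paper down to first and last authors, is easy to reproduce once you have exported bylines from a bibliographic database. The sketch below is hypothetical. It assumes each article's byline is a list of (name, is_local) pairs in author order, and the sample names are invented.

```python
from typing import Iterable

# Hypothetical input: one byline per article, as (author_name, is_at_our_institution)
# pairs in the order the authors appear on the paper.
Byline = list[tuple[str, bool]]


def local_authors(articles: Iterable[Byline]) -> set[str]:
    """Every local author, in any byline position."""
    return {name for byline in articles for name, local in byline if local}


def local_first_or_last_authors(articles: Iterable[Byline]) -> set[str]:
    """Local authors who appear first or last on at least one article."""
    contacts: set[str] = set()
    for byline in articles:
        if not byline:
            continue
        # For a single-author paper, first and last are the same person.
        for name, local in (byline[0], byline[-1]):
            if local:
                contacts.add(name)
    return contacts


if __name__ == "__main__":
    sample = [
        [("Ahmed", True), ("Mills", True), ("Chen", False)],
        [("Diaz", False), ("Brown", True)],
        [("Evans", True)],
    ]
    print(len(local_authors(sample)), "local authors")
    print(sorted(local_first_or_last_authors(sample)), "to contact")
```

The sketch returns distinct people rather than articles, so a researcher who led several PLoS papers gets only one email.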

Additionally, the library decided to send out a broadcast email to the entire institution to let them know about PLoS’s new policy, and that we were on top of it. We wanted to do this quickly to make sure everyone knew that the library was the place to go for these types of questions.

Go out and talk to your researchers

At the very least, it can’t hurt to reach out to the areas you serve, either as a subject librarian or as a liaison librarian. We’ve just finished an exercise where we met and interviewed 30+ researchers with active grants at our institution (results to be published later this year) to learn more about how they manage, organize, store, preserve, reuse and share data. This exercise was invaluable, as it gave us multiple scenarios where researchers could be supported by the library. Even starting a conversation about how they feel about the PLoS data sharing policy is a good idea. More of these policies are going to emerge, so it’s best to start now.

What if I don’t feel comfortable with the content yet?

That’s fine. You could start by reading the plethora of literature out there on data management, sharing, storage, preservation, reuse, and more. It’s also great to speak with other librarians who have been active in this area; I’m always open to a talk about research data! I’ve included some resources below that are a good start:

For librarians concerned about their role in the library, or looking for new opportunities to branch out and stake a claim in another area of the information profession, this is your chance. Talk to your researchers, learn to provide them with the support they need, and stay active in this area, because research data management and sharing is only going to grow, and I am one librarian who does not want to be left behind.

I’d love to hear in the comments how your library is tackling this issue, if at all. If you won’t be pursuing it, I would also be keen to know why. Thanks for reading!

Data Publishing: Who is meeting this need?

I realize I haven’t written a post in over a month, and I feel horribly guilty about it. The one good thing about not having the time to write blog posts frequently is that I now have a stockpile of ideas, and plenty of material to write more frequent posts.

What I would like to address in today’s post are some of the ongoing efforts that journals, government agencies, and open source communities have made to address the need to publish data, in all of its messy and intricate formats. As in my previous posts, I will describe each of the efforts that I find promising in terms of their ability to tackle this massive and complicated task. In case readers are unfamiliar with the concept of a data publication, I define it based on a hybrid of viewpoints from papers by Borgman, Lynch, Reilly et al., Smith, and Whyte:

A data publication takes data that has been used for research and expands on the ‘why, when and how’ of its collection and processing, leaving an account of the analysis and conclusions to a conventional article. A data publication should include metadata describing the data in detail, such as who created the data, a description of the type of data, the version of the data, and, most importantly, where the data can be accessed (if it can be accessed at all). The main purpose of a data publication is to provide adequate information about the data so that another researcher can reuse it in the future, and to provide a way to attribute data to its creator. Knowing who created the data adds a layer of transparency, as researchers will be held accountable for how they collect and present their data. Ideally, a data publication would be linked with its associated journal article to provide more information about the research.
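To make the definition concrete, here is one way the elements it lists could be captured as a structured record. This is a hypothetical sketch: the class and field names are my own and are not taken from DataCite or any other metadata standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataPublication:
    """Hypothetical record holding the elements of a data publication."""
    title: str
    creators: list[str]          # who created the data (supports attribution)
    data_type: str               # description of the type of data
    version: str                 # versioning of the data
    methods: str                 # the 'why, when and how' of collection and processing
    access: Optional[str] = None  # where the data can be accessed, if at all
    related_articles: list[str] = field(default_factory=list)  # linked journal articles

    def missing_fields(self) -> list[str]:
        """Names of required descriptive fields that are still empty."""
        required = ("title", "creators", "data_type", "version", "methods")
        return [name for name in required if not getattr(self, name)]


if __name__ == "__main__":
    record = DataPublication(
        title="Clinic wait-time survey",
        creators=[],
        data_type="Tabular survey responses (CSV)",
        version="1.0",
        methods="",
    )
    print(record.missing_fields())
```

In this sketch `access` defaults to `None` because the definition allows for data that cannot be accessed at all. The fields that make the data reusable and attributable are the ones it flags as required.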

With all that being said, let’s take a look at some of the efforts that currently exist in the data publishing realm.

Nature Publishing Group – Scientific Data


Scientific Data is the first of its kind in that it is an open access, online-only publication specifically designed to describe scientific data sets. Because describing scientific data can be complicated and exhaustive, the publication does an excellent job of addressing all of the questions that need to be asked of researchers before they even think of submitting their data. Scientific Data just released its criteria for publication today, and the questions it asks are exactly what is needed to ensure that a data publication is described well enough to be reused.

Then comes the next great component: the metadata. Scientific Data uses a ‘Data Descriptor’ model that requires narrative content about a data set, including the traditional descriptors librarians are familiar with, such as Title, Abstract and Methodology. What is excellent about the Data Descriptor model is that it also requires structured content about the data. This structured content uses the open source ‘Investigation’, ‘Study’ and ‘Assay’ (ISA) metadata format to describe aspects of the data in detail. These major categories are apparently designed to be ‘generic and extensible’, and serve to address all scientific data types and technologies. You can learn more about ISA on the project’s website.
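The ISA nesting is easier to see as data structures. An Investigation holds one or more Studies, and each Study holds the Assays run on its samples. The sketch below is a simplified, hypothetical model for illustration only. It is not the ISA-Tab file format or any official ISA software library, and the example values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    measurement_type: str            # what was measured
    technology_type: str             # how it was measured
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    design: str                      # summary of the study design
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str                       # overall research context
    studies: list[Study] = field(default_factory=list)


def data_files(investigation: Investigation) -> list[str]:
    """Walk Investigation -> Study -> Assay and collect every data file."""
    return [
        path
        for study in investigation.studies
        for assay in study.assays
        for path in assay.data_files
    ]


if __name__ == "__main__":
    inv = Investigation(
        title="Example investigation",
        studies=[
            Study(
                title="Example study",
                design="case-control",
                assays=[
                    Assay("gene expression", "microarray", ["expr_batch1.txt"]),
                    Assay("metabolite profiling", "mass spectrometry", ["ms_run1.mzML"]),
                ],
            )
        ],
    )
    print(data_files(inv))
```

Keeping the levels separate means the investigation-level context is recorded once, rather than repeated for every assay.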

Overall I think that Scientific Data is the beginning of a new trend in publishing where major journals will begin to publish data publications more frequently on top of traditional research articles. This publication is the first step towards making research data available, reusable and transparent within the scientific research community.

F1000Research – Making Data Inclusion a Requirement


F1000Research is an excellent new open science journal that has caught my attention for its foray into systematic reviews and meta-analyses, and for its recent ‘grace period’ to encourage researchers to submit their negative results for publication. I think this is a publication that medical librarians should be aware of, and potentially encourage researchers to submit to if they are looking for a more frugal option. What really impresses me about F1000Research, though, is its commitment to ensuring that data associated with research articles is made readily available.

Currently, F1000Research reviews data submitted in conjunction with an article, and then offers to deposit the data on the authors’ behalf in an appropriate data repository. The journal is open to placing data in any repository, but it works mainly with figshare, a popular platform for sharing data. Together, figshare and F1000Research have created a ‘data widget’ that allows figshare to link data files with the associated article in F1000Research, which is excellent! A recent post on the F1000Research blog gives this widget the attention it deserves (http://blog.f1000research.com/2013/05/23/new-f1000research-figshare-portal-and-widget-design/). F1000Research is also apparently working on a similar project with Dryad. I think that moving forward we will see more efforts from journals like F1000Research to seamlessly connect their publications with associated data. This is a crucial component of publishing data, as the journal article provides the context for how the data was used.

Dryad – Integrated Journals


Dryad is a data repository and service that offers journals the option of integrating their submission process with Dryad’s system. The service is completely free and is designed to simplify the process of submitting data and to ensure bidirectional links between the article and the data. Dryad currently provides an option for data to be opened up to peer review, but I would like to see that become more of a requirement going forward. Here is a link to Dryad’s journal integration page: http://datadryad.org/pages/journalIntegration

A number of journals are currently participating in this effort, and a complete list can be found on the Dryad website. Carly Strasser also did a great job of outlining other journals that require data sharing in her post about data sharing on the excellent blog Data Pub. I think Dryad is a perfect example of the other side of traditional publishing. We need data repositories like Dryad and figshare to continue supporting data publication and storage, as they represent half of the picture that will allow articles and data to be connected.

The Dataverse Network

The Dataverse Network is a data repository designed for sharing, citing and archiving research data. Developed by Harvard and the Data Science team at the Institute for Quantitative Social Science, Dataverse is open to researchers in all scientific fields. As a service, Dataverse organizes its data sets into studies; each study contains cataloguing information along with the data, and provides a persistent way to cite the data that has been deposited.

Dataverse also uses Zelig, an R statistical package, to provide statistical modeling of the data that is submitted. Finally, institutions can install the Dataverse software to run their own institutional data repositories. I see the ability to download Dataverse for institutional purposes as an excellent strategy going forward: as more academic institutions begin to add data storage capabilities to their institutional repositories, Dataverse will provide some much-needed assistance in this arena.

GitHub: Git for Data Publishing


Although I would not call myself an expert in the GitHub world, I recognize a fruitful initiative to publish data when I see one. James Smith recently wrote a blog post about how the tools of open source could revolutionize open data publishing. The post is great, and you can read it here (http://theodi.org/blog/gitdatapublishing). James’ idea is to upload data to GitHub repositories and use a DataPackage to attach metadata that sufficiently describes the data. Ultimately, using GitHub for data publication would enable sharing and reuse of data within a supportive and collaborative community. While some of this can get complicated, working through the links in his post gives you a real sense of how an open source community is coming together to address the need to publish data.
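To give a feel for what James describes, here is a minimal sketch that writes a datapackage.json descriptor for a single CSV file using only Python's standard library. It assumes the basic layout of the Data Package specification: a top-level `name` and `title`, plus a `resources` list in which each resource has a `path` and a `schema` of `fields`. Typing every column as "string" is a placeholder you would refine by hand. The function name and file names are hypothetical.

```python
import csv
import io
import json
from pathlib import PurePosixPath


def build_datapackage(name: str, title: str, csv_path: str, csv_text: str) -> dict:
    """Build a minimal Data Package descriptor for one CSV resource (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))  # first row = column names
    return {
        "name": name,  # short, lowercase identifier for the package
        "title": title,
        "resources": [
            {
                "name": PurePosixPath(csv_path).stem.lower(),
                "path": csv_path,  # relative to datapackage.json in the repository
                "format": "csv",
                "schema": {
                    # Placeholder: every column typed as "string" until described properly.
                    "fields": [{"name": col, "type": "string"} for col in header]
                },
            }
        ],
    }


if __name__ == "__main__":
    sample_csv = "site,date,count\nA,2013-06-01,12\n"
    package = build_datapackage(
        "bird-counts", "Bird counts (example)", "data/bird_counts.csv", sample_csv
    )
    # In a GitHub repository this would be saved as datapackage.json next to data/.
    print(json.dumps(package, indent=2))
```

Committing the descriptor alongside the data means git versions changes to either one together, and that shared history is part of the appeal of the GitHub approach.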

Biositemaps


Biositemaps is a working group within the NIH that is designed to address the challenges of:

(i) locating, (ii) querying, (iii) composing or combining, and (iv) mining biomedical resources

‘Biomedical resources’ in this case can be anything from data sets to software packages to computer models. What is most interesting about Biositemaps is that it provides an Information Model that outlines a set of metadata that can be used to describe data. Building on the Information Model, it then uses the Biomedical Resource Ontology (BRO), a controlled terminology for ‘resource_type’, ‘area of research’, and ‘activity’, to provide more information about how data is used and how it can be described in detail using biomedical terminology. I will admit this resource is still pretty raw, but I think it has a lot of potential. The basic idea behind Biositemaps is that a researcher fills in a lengthy auto-complete form describing themselves, their data, and the methodology used to create the data. Once the form is complete, it produces an RDF file that is uploaded to a registry, where it can be linked to from anywhere. If you are a medical librarian and you have researchers interested in publishing data, I encourage you to take a look at this resource.
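A few lines of code can illustrate the final step, in which a completed form becomes an RDF file. This sketch writes N-Triples, one of the simplest RDF serializations, using only the standard library. The namespace `http://example.org/biositemap#`, the property names, and the resource URI are placeholders I made up. They are not the real Biositemaps Information Model or BRO terms.

```python
# Placeholder vocabulary: NOT the real Biositemaps/BRO URIs.
NS = "http://example.org/biositemap#"


def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def form_to_ntriples(resource_uri: str, form: dict[str, str]) -> str:
    """Turn filled-in form fields into one N-Triples statement per field."""
    lines = [
        f'<{resource_uri}> <{NS}{key}> "{escape_literal(value)}" .'
        for key, value in form.items()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    form = {
        "resource_type": "Data set",
        "area_of_research": "Genomics",
        "activity": "Data sharing",
    }
    print(form_to_ntriples("http://example.org/resources/my-dataset", form), end="")
```

Each form field becomes one subject, predicate and object statement about the resource. That structure is what lets a registry link to the description from anywhere.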

SHARE Program – Association of Research Libraries (ARL), Association of American Universities (AAU), and the Association of Public and Land-grant Universities (APLU)

This effort was announced just last week: the ARL, AAU and APLU are joining together to create a shared vision of universities collaborating with the federal government and others to host institutional repositories across their memberships, providing public access to research, including data. While it is not entirely clear how this will be achieved, especially in the realm of data, I think this is the type of collaboration that will provide a well-researched, evidence-based solution moving forward. I hope that SHARE continues to expand beyond the response to the OSTP memo, as I think Canadian academic institutions could benefit greatly from this effort. Here is a link to the development draft for SHARE: http://www.arl.org/storage/documents/publications/share-proposal-07june13.pdf

For Medical Librarians

My goal in presenting these data publication efforts is to get medical librarians to think more about the options available for data publication. Journals, government agencies and open source communities are all trying to address the issues surrounding data publication, and I think it is our duty as medical librarians to familiarize ourselves with journal policies on data sharing; data publication initiatives like DataCite, Dryad, and figshare; and new government efforts like Biositemaps. These are becoming more heavily used every day and will be relevant to our liaison and research areas of practice moving forward. I have tried to provide a lot of links within this post, and I’ve included some more reading below that may be useful. This is by no means an exhaustive list, but rather some of the interesting efforts I’ve seen throughout my work with data. Please feel free to add to it in the comments section.

Readings/References

1. Borgman CL, Wallis JC, Enyedy N. Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries [Internet]. 2007;7:17–30. Available from: http://escholarship.org/uc/item/6fs4559s

2. Lynch C. The shape of the scientific article in the developing cyberinfrastructure. CT Watch Quarterly [Internet]. 2007;3(3):5–10. Available from: http://www.ctwatch.org/quarterly/articles/2007/08/the-shape-of-the-scientific-article-in-the-developing-cyberinfrastructure/  

3. Piwowar H, Chapman W. A review of journal policies for sharing research data. Nature Precedings [Internet]. 2008. Available from: http://www.academia.edu/904922/A_review_of_journal_policies_for_sharing_research_data

4. Reilly S, Schallier W, Schrimpf S, Smit E, Wilkinson M. Report on Integration of Data and Publications [Internet]. 2011: p. 1–7. Available from: http://www.alliancepermanentaccess.org/wp-content/uploads/downloads/2011/10/ODE-ReportOnIntegrationOfDataAndPublications-exesummary.pdf  

5. Smith VS. Data publication: towards a database of everything. BMC Research Notes [Internet]. 2009 Jan [cited 2013 Mar 3];2:113. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2702265&tool=pmcentrez&rendertype=abstract

6. Whyte A. IDCC13 Data Publication: generating trust around data sharing. Digital Curation Centre [Internet]. 2013 Jan 23; Available from: http://www.dcc.ac.uk/blog/idcc13-data-publication-generating-trust-around-data-sharing

A Data Management and Data Sharing Bibliography for Librarians

It has been a while since I last posted. December was a pretty crazy month, and I’ve been working on some excellent projects (more to come on the blog in a few weeks). In the meantime, a colleague of mine, the talented @fsayre, and I have been working hard to compile the literature on data management that we thought would be useful for librarians. Since we are both medical librarians, quite a few of the articles are health-focused, but the majority should be useful for any librarian.

The two of us are hoping to start a Mendeley group where more librarians can join and share their experiences and ideas about working with data management. We would love to have the input of more librarians, so please let us know via this blog or on Twitter if you would be interested in joining our Mendeley group.

As for this bibliography, while we’ve tried to make it as comprehensive as possible, we encourage readers to suggest additional material in case we’ve missed some resources. Also, if you’re interested in other resources, check out my posts on the Data Curation Lifecycle and data management resources for librarians. Happy reading!

**Update** The Mendeley Group is now up and running and you can request to join it here: http://www.mendeley.com/groups/2956801/data-management-for-librarians/. We encourage all of those who are interested to sign up, and you are not required to contribute if you do not want to. That said, we hope that librarians will share resources as well as their experiences working with data.

1. Hames I. Report on the International Workshop on Contributorship and Scholarly Attribution. 2012 May. p. 1–29.

2. Allard S. DataONE: Facilitating eScience through Collaboration. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):4–17. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/3/

3. Auckland M. Re-skilling for Research. RLUK Research Libraries UK. 2012. Available from: http://www.rluk.ac.uk/files/RLUK%20Re-skilling.pdf

4. Baker M. Gene data to hit milestone. Nature [Internet]. 2012 Jul 19 [cited 2012 Nov 1];487(7407):282–3. Available from: http://www.nature.com/news/gene-data-to-hit-milestone-1.11019

5. Bloom T. Dealing with data. PLOS Biologue [Internet]. 2012 [cited 2012 Nov 9]; Available from: http://blogs.plos.org/biologue/2012/07/13/dealing-with-data/

6. National Science Board. Digital Research Data Sharing and Management. National Science Foundation. Arlington, VA; 2011. Available from: http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf

7. Borgman CL. Research Data: Who will share what, with whom, when, and why? China-North American Library Conference. Beijing; 2010. p. 21. Available from: http://works.bepress.com/borgman/238/

8. Bailey CW Jr. Research Data Curation Bibliography [Internet]. Houston: Charles W. Bailey, Jr.; 2012 [cited 2012 Nov 9]. Available from: http://digital-scholarship.org/rdcb/rdcb.htm

9. Christensen-Dalsgaard B. Ten recommendations for libraries to get started with research data management. Wirtschaftsforschung, Berlin; 2012 p. 3. Available from: http://www.libereurope.eu/sites/default/files/The%20research%20data%20group%202012%20v7%20final.pdf

10. Creamer A. Creating an Online Research Data Management Course: A Conversation with Data Librarians Robin Rice and Stuart Macdonald. Worcester, MA; 2011. Available from: http://esciencecommunity.umassmed.edu/2012/10/09/creating-an-online-research-data-management-course-a-conversation-with-data-librarians-robin-rice-and-stuart-macdonald/

11. Creamer A, Morales M, Crespo J, Kafel D, Martin E. An Assessment of Needed Competencies to Promote the Data Curation and Management Librarianship of Health Sciences and Science and Technology Librarians in New England. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):18–26. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/4/

12. Creamer A, Morales M, Crespo J, Kafel D, Martin E. Data Curation and Management Competencies of New England Region Health Sciences and Science and Technology Librarians [Internet]. University of Massachusetts and New England Area Librarian e-Science Symposium 2011. Available from: http://escholarship.umassmed.edu/escience_symposium/2011/posters/8

13. Crosas M. The Dataverse Network. The Institute for Quantitative Social Science; 2012. Available from: http://thedata.org/

14. D’Ignazio J, Qin J, Kitlas J. Using internship experience to evaluate a new program in eScience librarianship. Proceedings of the 2012 iConference (iConference ’12) [Internet]. New York, NY, USA: ACM Press; 2012. p. 601–2. Available from: http://dl.acm.org/citation.cfm?doid=2132176.2132304

15. Dukes P. Maximising value of population health sciences data: the role for Data Management Plans. MRC data strategy. 2012 Nov. Available from: http://blogs.lshtm.ac.uk/rdmss/files/2012/11/4-Dukes-MRC1.pdf

16. Van den Eynden V, Corti L, Bishop L, Horton L. Managing and Sharing Data: Best Practices for Researchers. UK Data Archive; 2011. Available from: http://data-archive.ac.uk/media/2894/managingsharing.pdf

17. Ferguson J. Description and Annotation of Biomedical Data Sets. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):51–6. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/9/

18. Godlee F. Clinical trial data for all drugs in current use. BMJ [Internet]. 2012 Oct 29 [cited 2012 Nov 2];345:e7304. Available from: http://www.bmj.com/content/345/bmj.e7304

19. Gore SA. e-Science and data management resources on the Web. Medical Reference Services Quarterly [Internet]. 2011 Jan;30(2):167–77. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21534116

20. Hackett Y. A National Research Data Management Strategy for Canada: The Work of the National Data Archive Consultation Working Group. 2001. Available from: http://www.interpares.org/display_file.cfm?doc=ip1_dissemination_janr_hackett_iassist_quarterly_25_2001.pdf

21. Heidorn PB. The Emerging Role of Libraries in Data Curation and E-science. Journal of Library Administration [Internet]. Routledge; 2011 Oct [cited 2012 Nov 9];51(7-8):662–72. Available from: http://dx.doi.org/10.1080/01930826.2011.601269

22. Hey T, Tansley S, Tolle K. The fourth paradigm: data-intensive scientific discovery [Internet]. Microsoft Research; 2009 [cited 2012 Nov 9]. Available from: http://iw.fh-potsdam.de/fileadmin/FB5/Dokumente/forschung/tagungen/i-science/TonyHey_-__eScience_Potsdam__Mar2010____complete_.pdf

23. Hswe P, Holt A. Guide for Research Libraries: The NSF Data Sharing Policy [Internet]. Association of Research Libraries. 2011 [cited 2012 Oct 11]. Available from: http://www.arl.org/rtl/eresearch/escien/nsf/index.shtml

24. Inouye D, Scheiner S. Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America. 2009;2:1–10. Available from: http://www.nceas.ucsb.edu/files/computing/EffectiveDataMgmt.pdf

25. Interview with Svetla Baykoucheva and James Mullins: What Do Libraries Have to Do with e-Science? ACS Division of Chemical Information (CINF). 2011;1–2. Available from: http://drum.lib.umd.edu/bitstream/1903/11843/1/Baykoucheva_Mullins_eScience.pdf

26. Jahnke L, Asher A, Keralis SDC. The Problem of Data. Washington, DC: Council on Library and Information Resources; 2012. Available from: http://www.clir.org/pubs/reports/pub154/pub154.pdf

27. Johnston L, Lafferty M, Petsan B. Training Researchers on Data Management: A Scalable, Cross-Disciplinary Approach. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 8];1(2). Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss2/2/

28. Kafel D, Morales M, Vander Hart R, Gore S, Creamer A, Crespo J, et al. Building an e-Science Portal for Librarians: A Model of Collaboration. Journal of eScience Librarianship [Internet]. 2012;1(1):41–5. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/7/

29. LeFurgy B. Data-Intensive Librarians for Data-Intensive Research [Internet]. The Signal: Digital Preservation. 2012 [cited 2012 Nov 9]. Available from: http://blogs.loc.gov/digitalpreservation/2012/07/data-intensive-librarians-for-data-intensive-research/

30. Lamar Soutter Library, University of Massachusetts Medical School and the George C. Gordon Library, Worcester Polytechnic Institute. Frameworks for a Data Management Curriculum [Internet]. Worcester; 2011 p. 1–67. Available from: http://library.umassmed.edu/data_management_frameworks.pdf

31. Lesk M. Data curation: just in time, or just in case? International Association of Scientific and Technological University Libraries, 31st Annual Conference. West Lafayette, IN; 2010. Available from: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1021&context=iatul2010

32. Mayernik MS. The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation [Internet]. D-Lib Magazine. 2012. Available from: http://www.dlib.org/dlib/september12/mayernik/09mayernik.html

33. University of Minnesota. Data Management 101: Planning Checklist.

34. Keeping Research Data Safe: Cost issues in digital preservation of research data. Available from: http://www.beagrie.com/KRDS_Factsheet_0910.pdf

35. NISO. Linked Data for Libraries, Archives and Museums. Information Standards Quarterly. 2012;24(2/3). Available from: http://www.niso.org/apps/group_public/download.php/9422/isqv24no2-3.pdf

36. Pathak J, Wang J, Kashyap S, Basford M, Li R, Masys DR, et al. Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. Journal of the American Medical Informatics Association : JAMIA [Internet]. [cited 2012 Oct 29];18(4):376–86. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3128396&tool=pmcentrez&rendertype=abstract

37. Piorun M, Kafel D, Leger-Hornby T, Najafi S, Martin E, Colombo P, et al. Teaching Research Data Management: An Undergraduate/Graduate Curriculum. Journal of eScience Librarianship [Internet]. 2012;1(1):46–50. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/8/

38. Piwowar HA, Vision TJ, Whitlock MC. Data archiving is a good investment. Nature [Internet]. 2011 May 19 [cited 2012 Nov 9];473(7347):285. Available from: http://dx.doi.org/10.1038/473285a

39. Piwowar HA, Day RS, Fridsma DB. Sharing Detailed Research Data Is Associated with Increased Citation Rate. Ioannidis J, editor. PLoS ONE [Internet]. 2007 Mar 21 [cited 2012 Oct 25];2(3):e308. Available from: http://dx.plos.org/10.1371/journal.pone.0000308

40. Pryor G. Managing Research Data [Internet]. Facet Publishing; 2012 [cited 2012 Nov 9]. p. 224. Available from: http://www.amazon.com/Managing-Research-Data-Graham-Pryor/dp/1856047563

41. Rajaraman A, Ullman JD. Mining of Massive Datasets. Cambridge: Cambridge University Press; 2011; Available from: http://ebooks.cambridge.org/ref/id/CBO9781139058452

42. Reznik-Zellen R, Adamick J, McGinty S. Tiers of Research Data Support Services. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):27–35. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/5/

43. Rosenthal DSH, Vargas DL. LOCKSS Boxes in the Cloud. 2012. Available from: http://www.lockss.org/locksswp/wp-content/uploads/2012/09/LC-final-2012.pdf

44. Rosenthal D, Rosenthal D, Miller E. The Economics of Long-Term Digital Storage. fsl.cs.sunysb.edu [Internet]. [cited 2012 Dec 2];1–8. Available from: http://www.fsl.cs.sunysb.edu/docs/unesco12/UNESCO2012-storage-econ.pdf

45. Salo D. Retooling Libraries for the Data Challenge [Internet]. Web Magazine for Information Professionals. 2010 [cited 2012 Nov 9]. Available from: http://www.ariadne.ac.uk/issue64/salo

46. National Information Standards Organization. Understanding Metadata. Bethesda, MD: NISO Press; 2004. Available from: http://www.niso.org/publications/press/UnderstandingMetadata.pdf

47. The Royal Society. Science as an open enterprise. London: The Royal Society; 2012. Available from: http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf

48. Starr J, Willett P, Federer L, Horning C, Bergstrom M. A Collaborative Framework for Data Management Services: The Experience of the University of California. Journal of eScience Librarianship [Internet]. 2012 Oct 3 [cited 2012 Nov 10];1(2):109–14. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss2/7

49. Strasser C, Cook R, Michener W, Budden A. Primer on Data Management: What you always wanted to know [Internet]. 2012. p. 1–11. Available from: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

50. Tenopir C, Birch B, Allard S. Academic Libraries and Research Data Services: Current Practices and Plans for the Future [Internet]. 2012. Available from: http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/whitepapers/Tenopir_Birch_Allard.pdf

51. Thibodeau K. Certificate of Advanced Study in Digital Preservation. Proceedings of the 1st International Digital Preservation Interoperability Framework Symposium on – INTL-DPIF ’10 [Internet]. New York, New York, USA: ACM Press; 2010;1–9. Available from: http://dl.acm.org/citation.cfm?doid=2039263.2039264

52. Trinidad SB, Fullerton SM, Bares JM, Jarvik GP, Larson EB, Burke W. Genomic research and wide data sharing: views of prospective participants. Genetics in medicine : official journal of the American College of Medical Genetics [Internet]. 2010 Aug [cited 2012 Oct 29];12(8):486–95. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3045967&tool=pmcentrez&rendertype=abstract

Librarians & Data Management: Useful Resources for Learning About & Implementing a Data Management Resource Guide

After writing my post on the data curation lifecycle a while back, I have been slowly accumulating a list of academic libraries that offer data management guides. Because data management has come to the forefront of our profession, I felt it was worth exploring a number of these university libraries to understand what types of information their guides provide to patrons. For this post, I’ve included a brief list of what I believe are good examples of data management guides, in the hope that it will help librarians prepare data management guides for their own institutions. I’ve also included some other resources on this topic that have helped me learn more about this area.

If any readers know of other useful resources in this area, please send them along to me, as this is in no way meant to be a comprehensive list. I think this is an exciting area for librarians (especially medical ones like me), and the libraries below have already done an excellent job of establishing their role within the discipline.

Data Management Resources

University of Minnesota — Managing Your Data

The University of Minnesota has an excellent resource guide on data management: it provides examples of the requirements that must be met when working with grants from funders such as the NSF, includes a step-by-step approach to developing a plan, offers examples of existing data management plans to view, and explains why data management is important. Minnesota’s library is very active in the data management realm and offers workshops that you can watch online.

MIT — Data Management and Publishing

Not surprisingly, MIT offers an impressive resource guide for data management. The guide includes everything from a data planning checklist to a page on how to share data in various repositories. What I appreciate about the data sharing component of this guide is that MIT is encouraging its researchers to venture into a new area of scholarly publishing. This recommendation is a great way to attract university researchers to MIT’s DSpace repository and to other open data sharing repositories.

California Digital Library — Manage Your Data

Like the other data management guides in this list, the California Digital Library (CDL) takes a clean, straightforward approach to teaching its users about data management. I particularly liked the citing data page because it provides easy-to-follow steps and gives many different examples. What sets CDL apart from the others is the DMPTool, an unbelievable resource that allows researchers to create ready-to-use data management plans for specific funding agencies; meet requirements for data management plans; get step-by-step instructions and guidance for data management plans; and learn about the resources and services available at their institution to fulfill the data management requirements of their grants.

University of Washington – Data Management Guide

The University of Washington (UW) provides a traditional LibGuide on the subject of data management. What I like about this particular resource is that it gives very clear instructions on the importance of organization and structure, and takes researchers through the process step by step. Like MIT, UW has a page on the importance of data sharing and provides links to a variety of file sharing and storage services.

Purdue University – Data Management Hub

Purdue has been a major player in the adoption of data management roles for librarians through the creation of its excellent Data Curation Profiles Toolkit, which provides a guide for librarians to interview researchers about their data and help build a data management plan. The Data Management Hub at Purdue is a stellar example of a library implementing data management practices: it offers everything from examples of data management plans to a Data Plan Self Assessment Tool that you can download in PDF format.

Other Useful Tools

Digital Curation Centre Checklist for a Data Management Plan

This document contains the 118 headings and questions that make up the DCC’s Checklist for a Data Management Plan.

Emerging Technologies: Some Simple Guidelines for Effective Data Management

This document provides some simple guidelines for effective data management, which, if put into practice, will benefit the original data owner as well as enhance prospects for the long-term preservation and re-use of the data by other researchers.

E-Science Talking Points for ARL Deans and Directors

This document provides answers to some of the questions the ARL is asking about the most relevant areas for library involvement in e-science projects. It also provides examples of library involvement in the data arena.

University of Edinburgh MANTRA Research Data Management Training

While MANTRA is a course designed for PhD students and others who are planning a research project using digital data, it is also an excellent introduction for librarians who have no experience working with data management. The course includes videos of researchers and mini-exams after each unit to help you test your knowledge. I highly recommend it.

Recent Digitization Projects from Librarians and Archivists: A Great Example of our Expertise

I haven’t written a post in a few weeks, so I thought I would return with a short post highlighting some of the great work being done by librarians and archivists to preserve both print and born-digital material. Enjoy!

The Salman Rushdie Archive (Emory University)

The Salman Rushdie Archive is an excellent example of what librarians and archivists can do with born-digital material. In this case, Emory has chosen to preserve all of Rushdie’s manuscripts, drawings, journals, letters and photographs. What is amazing about this collection is that Emory has also recreated the actual digital environments (several computers) that Rushdie used to produce his work. According to Emory, this project is the most complete set of born-digital records to date, and it provides a gold standard for how libraries and archives can raise the bar in providing important historical information to patrons. Take a look at the video below to get a glimpse of what the digital environments look like. The recreated computers even crash the way his did when he was using them!

I love seeing the old Mac OS recreated with all of Rushdie’s records on it. As artists, authors and researchers continue to create material electronically, those who follow strong record-keeping practices will hopefully be able to have their material preserved as they used it. The Emory project is a first step in the right direction.

National Library of Medicine Exhibition Programs

The National Library of Medicine’s History of Medicine Division has done an excellent job of promoting the vast amount of material in its collections. Through the exhibition programs webpage, users have an opportunity to browse through historical images on everything from Shakespeare and the Four Humors to Forensic Views of the Human Body. By providing beautifully scanned images and comprehensive historical information, these exhibitions give the general public an opportunity to observe and learn about important material related to the history of medicine.

Balinese Digital Library Collection

Available through the Internet Archive, the Balinese Digital Library provides access to manuscripts covering everything from religion, holy formulae, rituals, family genealogies, law codes, treatises on medicine, arts and architecture, calendars, prose, poems and magic! What I find most interesting about this collection is that it contains information on practical matters, such as medicines and village regulations, that are used in daily life. It is also worth noting that this collection is the first complete collection of Balinese literature.

The Internet Archive is now home to 10 petabytes of data and is an excellent resource if you’re interested in historical material.

Wellcome Collection

Supported by the Wellcome Trust, the Wellcome Collection provides the public with an opportunity to read articles, view images and watch videos on a variety of subjects related to the human body. The collection is separated into the following categories: Life, Genes & You; Mind & Body; Sickness & Health; Time & Place; Science & Art; and Education.

This collection has so much excellent material I don’t even know where to begin. As a visitor to the site, you have an opportunity to look at everything from Crick’s preliminary sketch of DNA to fabulous satirical medical images. Take the time to explore this site; I’ve spent hours on it already while writing this post.

I’ve tried to highlight my favourite examples of library and archival projects here to provide a glimpse of the great work we do to provide access to historical material. I am excited to see what projects the Emory Salman Rushdie Archive inspires, as I think recreating the digital environment is an excellent idea for future collections. Science and medical researchers increasingly rely on digital formats to store, share and interpret data. It is vital that, as librarians and archivists, we work with these groups to manage their data and preserve it in ways that will allow us to present it coherently in the future. How else can we be sure that this data will remain available? More to come on this in my next post!

Librarians: It’s time to stand up for ourselves!

I recently attended a library association meeting where I heard a familiar concern among my colleagues: “Our products and services are used widely by many within our institution, yet nobody knows that we (librarians) developed them.” Similarly, there seemed to be a lot of uncertainty within the group about how to transform our roles and provide services that will benefit our patron base. The remainder of the discussion involved the doom and gloom of libraries… I personally found this discussion incredibly frustrating because I have heard it time and time again. Librarians seem to love sitting around and talking about how we are suffering and losing patrons left, right and center; how does this help us improve? I know librarians are capable of doing amazing things with information and developing services that benefit users; we just need to actively go out and do it. In this post, I want to address a few things that I think librarians need to do on a regular basis to remain relevant and stand up for themselves.

Develop Relationships

Liaison and subject librarians already have a head start on this one, but I think developing and fostering relationships with our patron groups is essential. With the emergence of embedded librarianship, I believe that librarians need to take a more hands-on approach to their work. To find out what will benefit a patron base, we need to go out and ask them directly. In an academic library, librarians need to ask faculty and students how they can better serve their needs. This approach accomplishes two things: first, it gives us an opportunity to explain and demonstrate our skills and expertise, and second, it allows us to hear firsthand how our patrons work and evaluate how we can help them.

I think that fostering relationships is the first step towards changing the way librarians do their work. The stronger the relationships, the better the opportunity to create information resources and services that will benefit our patrons. Furthermore, when it comes time for annual review, we will be able to provide concrete evidence that our work has been beneficial.

Expand our role

We have to continually explore new ways to help our users — simple as that. Many librarians are now seeking out new services, ranging from helping researchers with data management to becoming more embedded within an institution. Here are a few specific roles that have really impressed me:

NIH Library’s Informationists

The NIH Library’s Informationists are really pushing the envelope in terms of the services they provide to patrons. Being completely embedded within clinical teams, these librarians have specialized knowledge and clinical expertise that allow them to be contributing members of a clinical research team. These librarians take multiple continuing education (CE) courses in information technology and biomedical sciences in order to keep up with their patrons. I think librarians in general should follow this model, in the sense that taking CE courses should be a regular part of our jobs. As a medical librarian, I believe it is important to learn how my patrons work within their research environment. Moreover, I believe learning about new technology and information practices will help me discover new ways to help users.

Purdue University’s Data Curation Profiles

Purdue’s Data Curation Profiles (DCPs) provide an opportunity for academic librarians to enter a new role by helping academic researchers manage and understand their data. A DCP is designed to be “essentially an outline of the ‘story’ of a data set or collection, describing its origin and lifecycle within a research project.” DCPs allow librarians to meet with their patrons and understand how they perform their research and manage their data. From this evaluation, librarians can help refine researchers’ data management methods so that the data can be preserved or deposited in an information repository. Moreover, this exercise gives the librarian an opportunity to learn how their users actually perform their research. DCPs, and data management in general, are areas where I believe librarians should get more involved, because they 1) foster a relationship with a specific patron base, 2) provide a new role for librarians, and 3) allow librarians to provide tangible evidence, in the form of DCPs, of the services they provide.

University of Massachusetts’ Journal of eScience Librarianship

UMass has received a grant from the National Library of Medicine to create a journal where eScience is the topic of discussion. The journal is designed to “advance the theory and practice of librarianship with a special focus on services related to data-driven research in the physical, biological, and medical sciences. The journal explores the many roles of librarians in supporting eScience and welcomes articles related to education, outreach, collaborations, and current practices, by contributors from all areas of the globe.” I think this is an excellent initiative, as it fosters collaboration among librarians and provides a new way to educate librarians about an emerging topic. I have discussed the collaborative nature of librarians before, but we need to support the interpersonal nature of the field as we move into new areas. The Journal of eScience Librarianship is just one example of this, but an excellent one.

Assessment! Assessment! Assessment!

This one is a no-brainer: librarians need to perform more evaluations and assessments of their services. If you’re an instruction librarian, it is vital that you survey your audience so that you can demonstrate that students are learning something. These evaluative measures also provide insight into how we can improve our services. Similarly, if you’re a subject librarian, allow your patrons to evaluate you and the services you provide: How can your services be improved? What do these patrons expect from their librarian? If they don’t know, explain to them how you can better serve their needs!

One initiative I am impressed with is the ACRL’s “Assessment in Action: Academic Libraries and Student Success.” This initiative sets out to build a professional development program to strengthen the competencies of librarians in campus leadership and data-informed advocacy. Individual libraries can follow this model to ensure that if they ever come under scrutiny (and surely they will), they can prove how their efforts have been successful within their institution.

Final Thoughts

The basic point I am trying to get across is that librarians, experienced and inexperienced, cannot sit idly by and hope that they will remain relevant within their institution. As information specialists, we need to keep up with emerging technologies in order to provide the services our patrons need. After all, isn’t learning new things one of the main reasons we all love this profession?

We also need to get out of our comfort zones and head into the environments where our patrons spend their time. How are our patrons going to know how we can help them if we don’t tell them?! It may seem daunting to some, but we need to prove to our institutions that we are essential and that the expertise we possess can improve the work our patrons do. The relationships we build will help us prove our worth, as well as make our jobs more dynamic and interesting.

Finally, evaluating our services is incredibly important. How so many librarians have gotten by without doing this until now is baffling to me. As with anything we do, it is important to know whether we’re doing it well and whether it has value. With every instructional session we give, every research guide we create, and every hands-on information service we provide, we need to know that it is effective and meeting the needs of our users.

Using these measures can help us improve as librarians and give us an opportunity to stand up for ourselves. Maybe then we won’t have to attend any more library meetings where the topic of discussion is the doom and gloom of libraries.

As a librarian, how do you stand up for yourself within your institution?