Librarians working in data management: How to avoid a data management nightmare

I thought I would quickly share a video that I created in collaboration with two of my excellent colleagues — Karen Hanson and Alisa Surkis. We are developing short data management modules for a clinical department at our institution that will cover everything from selecting the right data collection tool to file naming conventions. This first video was developed to serve as a teaser for the more focused modules.

The first module has already been well received within the department we are working with, and we hope that this will catch on with other departments as we move forward. As always, I’m happy to hear feedback or answer questions about how we developed the module or what we’re using it for in more detail. In the meantime, I hope you enjoy:

Why librarians can’t ignore data anymore

It’s here, and I have to say it came quicker than I expected: the first big stick for researchers, the PLoS data sharing policy. For researchers, this policy means that if they refuse to share the data accompanying their publication, they can’t publish in PLoS. It also means that if they publish and then withhold their data after the fact, their publication can be retracted. This is a really firm hand in an area where there hasn’t been one before. My first thought when I read this: what an amazing opportunity for librarians! It’s no secret that researchers have mixed feelings about the policy; some are angry and frustrated, while others see the light and understand that this has been a long time coming. What librarians can do is ease the pain a little, reduce the burden on these researchers as best they can, and provide them with options that make this transition as easy as possible. While PLoS is currently the only publisher wielding this big stick for data sharing, I expect Nature, Science and others to follow suit before long. There are also the various federal policies from the NSF and NIH, and now, finally, from Canada’s Tri-Council, which is looking to capitalize on Big Data.

So what can you do, even if you aren’t that familiar with data management, or data sharing policies?

Familiarize yourself with data repositories and their policies

The PLoS policy strongly recommends that researchers deposit their data in a public repository, so that the data can receive a DOI, accession number, or other unique identifier. Pointing researchers to suitable repositories is a simple way to provide them with valuable information. There are many different options out there, and researchers may only know of a few, if any. Learning what is available to researchers in terms of subject-specific and general data repositories (as well as any fees that may apply) can go a long way towards steering them in the right direction. Here are some options:

A recent blog post by John Kratz and Natsuko Nicholls on DataPub also provides valuable information about finding a suitable repository, and they do a very good job of outlining the differences between Databib and re3data.

Find out what your own institution offers

Our institution spoke with PLoS and found out that they also accept handles as a form of unique identifier. This means that if you have an institutional repository that supports handles, DOIs, or any other type of unique identifier, you may already have a solution for your researchers. Check with those responsible for your institutional repository to see whether it can support researchers’ data. Questions to ask include: Does the metadata schema support describing data? What is the maximum file size you accept? Can you link multiple records in the repository together?
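When checking what your repository issues, it can help to tell at a glance whether an identifier is a DOI or a handle. The sketch below is a rough, hypothetical helper (the function name, prefix list and patterns are my own assumptions, not part of any PLoS or repository API). Because DOIs are themselves handles with a `10.` prefix, it checks for a DOI first; the handle pattern assumes a numeric, dot-separated prefix such as `2027.42`, which covers common cases but not every valid handle.

```python
import re

# Common ways identifiers are written; stripped before pattern matching.
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://dx.doi.org/",
    "doi:",
    "https://hdl.handle.net/",
    "http://hdl.handle.net/",
    "hdl:",
)

# DOIs start with "10." followed by a registrant code, a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Rough handle pattern (an assumption): numeric dotted prefix, slash, suffix.
HANDLE_PATTERN = re.compile(r"^\d+(\.\d+)*/\S+$")


def classify_identifier(raw: str) -> str:
    """Return 'doi', 'handle', or 'unknown' for an identifier string."""
    value = raw.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # DOIs are a kind of handle (prefix "10."), so test for them first.
    if DOI_PATTERN.match(value):
        return "doi"
    if HANDLE_PATTERN.match(value):
        return "handle"
    return "unknown"


for example in ["https://doi.org/10.5555/12345678",
                "hdl:2027.42/12345",
                "ISBN 978-0-00-000000-0"]:
    print(example, "->", classify_identifier(example))
```

The sample identifiers are placeholders; run your repository’s own record identifiers through a check like this before promising researchers a particular kind of identifier.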

Let researchers know you are aware of the policy, and that the library is there to support them

At our institution, our first instinct was to see how many of our researchers had published in PLoS over the years. We found over 800 in total, but when we narrowed it down to first and last authors, the number was closer to 130. We made an active decision to reach out to these authors to let them know about the policy, and made an effort to find out what our institution has to offer, as well as other options they could pursue. The goal was to let everyone know that the library was on top of it, and that if they needed support, we would be there.
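The narrowing step described above (all PLoS papers down to those with one of our researchers as first or last author) can be sketched in a few lines of Python. This is a hypothetical illustration, not the process we actually used: it assumes you have already exported each article’s author list in byline order and have a set of name strings for your institution’s researchers, and the `Article` class, function name and sample names are my own. In practice, matching names across databases needs more care (initials, name variants, affiliations).

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    authors: list[str]  # names in byline order, as exported from a database


def first_or_last_author_articles(articles, our_researchers):
    """Keep articles whose first or last author is one of our researchers."""
    selected = []
    for article in articles:
        if not article.authors:
            continue  # skip records with no author data
        first, last = article.authors[0], article.authors[-1]
        if first in our_researchers or last in our_researchers:
            selected.append(article)
    return selected


# Hypothetical sample data; real name matching is rarely this clean.
articles = [
    Article("Article A", ["Lee J", "Smith K", "Patel R"]),
    Article("Article B", ["Smith K", "Lee J", "Wong T"]),
    Article("Article C", ["Wong T", "Patel R"]),
]
our_researchers = {"Lee J", "Patel R"}

for article in first_or_last_author_articles(articles, our_researchers):
    print(article.title)
```

Here Article B is dropped because our researcher appears only in the middle of the byline.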

Additionally, the library decided to send out a broadcast email to the entire institution to let them know about PLoS’s new policy, and that we were on top of it. We wanted to do this quickly to make sure everyone knew that the library was the place to go for these types of questions.

Go out and talk to your researchers

At the very least, it can’t hurt to reach out to the departments you serve as a subject or liaison librarian. We’ve just finished an exercise where we met and interviewed 30+ researchers with active grants at our institution (results to be published later this year) to learn more about how they manage, organize, store, preserve, reuse and share data. This exercise was invaluable, as it revealed multiple scenarios where the library could support them. Even starting a conversation about how they feel about the PLoS data sharing policy is a good idea. More of these policies are going to emerge, so it’s best to start now.

What if I don’t feel comfortable with the content yet?

That’s fine, but you could start by reading the plethora of literature out there on the topics of data management, sharing, storage, preservation, reuse – the list goes on and on. It’s also great to speak with other librarians who have been active in this area – I’m always open for a talk about research data! I’ve included some resources below that are a good start:

For librarians concerned about their role in the library, or looking for new opportunities to branch out and stake a claim in another area of the information profession, this is your chance. Talk to your researchers, learn to provide them with the support they need, and stay active in this area, because research data management and sharing are only going to grow. I know I am one librarian who does not want to be left behind.

I’d love to hear in the comments about how your library is tackling this issue – if at all. I would also be keen to know the reasons why you won’t be pursuing this issue. Thanks for reading!

Data Publishing: Who is meeting this need?

I realize I haven’t written a post in over a month, and I feel horribly guilty about it. The one good thing about not having the time to write blog posts frequently is that I now have a stockpile of ideas, and plenty of material to write more frequent posts.

In today’s post I would like to address some of the ongoing efforts that journals, government agencies, and open source communities have made to meet the need to publish data in all of its messy and intricate formats. As in my previous posts, I will describe the efforts that I find most promising in terms of their ability to tackle this massive and complicated task. In case readers are unfamiliar with the concept of a data publication, I define it based on a hybrid of viewpoints from papers by Borgman, Lynch, Reilly et al., Smith, and Whyte:

A data publication takes data that has been used for research and expands on the ‘why, when and how’ of its collection and processing, leaving an account of the analysis and conclusions to a conventional article. A data publication should include metadata describing the data in detail, such as who created the data, what type of data it is, which version it is, and, most importantly, where the data can be accessed (if it can be accessed at all). The main purpose of a data publication is to provide enough information about the data that another researcher can reuse it in the future, as well as to provide a way to attribute data to its creator. Knowing who created data adds a layer of transparency, as researchers will be held accountable for how they collect and present their data. Ideally, a data publication would be linked with its associated journal article to provide more information about the research.
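To make the definition above concrete, here is a minimal sketch of a data publication record with a check for missing elements. The field names are hypothetical and loosely follow the elements listed in the definition (creator, type, version, access, linked article); they do not correspond to any particular metadata standard such as DataCite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPublication:
    title: str
    creators: list[str]              # who created the data (for attribution)
    data_type: str                   # description of the type of data
    version: str                     # versioning of the data
    methods: str                     # the 'why, when and how' of collection
    access_url: Optional[str] = None       # where the data can be accessed
    access_statement: str = ""             # why access is restricted, if it is
    related_article: Optional[str] = None  # identifier of the linked article

    def missing_elements(self) -> list[str]:
        """List required elements that are still empty."""
        missing = [name for name in ("title", "data_type", "version", "methods")
                   if not getattr(self, name).strip()]
        if not self.creators:
            missing.append("creators")
        # Access needs either a location or an explanation of why there is none.
        if self.access_url is None and not self.access_statement.strip():
            missing.append("access")
        return missing


draft = DataPublication(
    title="Stream temperature sensor readings",
    creators=["Doe J"],
    data_type="Time series (CSV)",
    version="1.0",
    methods="",
)
print(draft.missing_elements())
```

Allowing an access statement in place of a URL reflects the “if it can be accessed at all” caveat: restricted data can still be described and attributed.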

With all that being said, let’s take a look at some of the efforts that currently exist in the data publishing realm.

Nature Publishing Group – Scientific Data


Scientific Data is the first of its kind in that it is an open access, online-only publication specifically designed to describe scientific data sets. Because describing scientific data can be complicated and exhaustive, this publication does an excellent job of addressing all of the questions that need to be asked of researchers before they even think of submitting their data. Scientific Data just released its criteria for publication today, and the questions it asks are exactly what is needed to ensure that a data publication can be reused thanks to appropriate description.

Then comes the next great component: the metadata. Scientific Data uses a ‘Data Descriptor’ model that requires narrative content about a data set, including the traditional descriptors librarians are familiar with, such as Title, Abstract and Methodology. What is excellent about the Data Descriptor model is that it also requires structured content about the data. This structured content uses the open source ‘Investigation’, ‘Study’ and ‘Assay’ (ISA) metadata format to describe aspects of the data in detail. These major categories are apparently designed to be ‘generic and extensible’, and to address all scientific data types and technologies. You can find out more on the ISA project’s website.
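The nesting behind ISA can be illustrated with a small Python model: an Investigation groups related Studies, and each Study groups the Assays (measurements) performed in it. This is a simplified, hypothetical sketch for orientation only; the class and field names and the sample content are my own and are not the official ISA-Tab or ISA-JSON schema.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    measurement_type: str   # what was measured
    technology_type: str    # how it was measured
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)


def describe(investigation: Investigation) -> list[str]:
    """Flatten the hierarchy into one readable line per assay."""
    lines = []
    for study in investigation.studies:
        for assay in study.assays:
            lines.append(f"{investigation.title} > {study.title} > "
                         f"{assay.measurement_type} via {assay.technology_type}")
    return lines


inv = Investigation("Diet and gut microbiome", [
    Study("Mouse feeding trial", [
        Assay("transcription profiling", "DNA microarray", ["arrays.txt"]),
        Assay("metabolite profiling", "mass spectrometry"),
    ]),
])
for line in describe(inv):
    print(line)
```

The point of the hierarchy is that the same three generic levels can describe very different experiments; only the measurement and technology terms change.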

Overall I think that Scientific Data is the beginning of a new trend in publishing where major journals will begin to publish data publications more frequently on top of traditional research articles. This publication is the first step towards making research data available, reusable and transparent within the scientific research community.

F1000Research – Making Data Inclusion a Requirement


F1000Research is an excellent new open science journal that has caught my attention for its foray into systematic reviews and meta-analyses, and for its recent ‘grace period’ encouraging researchers to submit their negative results for publication. I think this is a publication that medical librarians should be aware of, and one they could encourage researchers to submit to if they are looking for a more frugal option. What really impresses me about F1000Research, though, is its commitment to ensuring that data associated with research articles is made readily available.

Currently, F1000Research reviews data that is submitted in conjunction with an article, and then offers to deposit the data on the authors’ behalf in an appropriate data repository. The journal is open to placing data in any repository, but it works mainly with figshare, a popular platform for sharing data. Together, figshare and F1000Research have created a ‘data widget’ that links data files in figshare with their associated article in F1000Research, which is excellent! A recent post on the F1000Research blog gives this widget the attention it deserves (http://blog.f1000research.com/2013/05/23/new-f1000research-figshare-portal-and-widget-design/). F1000Research is also apparently working on a similar project with Dryad. I think that moving forward we will see more efforts from journals like F1000Research to seamlessly connect their publications with associated data. This is a crucial component of publishing data, as the journal article provides the context for how the data was used.

Dryad – Integrated Journals


Dryad is a data repository and service that offers journals the option of integrating their submission process with its system. The service is completely free and is designed to simplify the process of submitting data and to ensure bidirectional links between the article and the data. Currently Dryad provides an option for data to be opened up to peer review, but I would like to see that become more of a requirement going forward. Here is a link to Dryad’s journal integration page: http://datadryad.org/pages/journalIntegration

A number of journals are currently participating in this effort, and a complete list is available on Dryad’s website. Carly Strasser also did a great job of outlining other journals that require data sharing in her post about data sharing on the excellent blog Data Pub. I think Dryad is a perfect example of the other side of traditional publishing. We need data repositories like Dryad and figshare to continue supporting data publication and storage, as they represent half of the picture that will allow articles and data to be connected.

The Dataverse Network

The Dataverse Network is a data repository designed for sharing, citing and archiving research data. Developed by Harvard and the Data Science team at the Institute for Quantitative Social Science, Dataverse is open to researchers in all scientific fields. As a service, Dataverse organizes its data sets into studies; each study contains cataloguing information along with the data, and provides a persistent way to cite the deposited data.

Dataverse also uses Zelig, an R statistical package, to provide statistical modeling of submitted data. Finally, institutions can install Dataverse as software to run their own institutional data repositories. I see the ability to download Dataverse for institutional purposes as an excellent prospective strategy; as more academic institutions begin to add data storage capabilities to their institutional repositories, Dataverse will provide some much-needed assistance in this arena.

GitHub: Git for Data Publishing


Although I would not call myself an expert in the GitHub world, I recognize a fruitful initiative to publish data when I see one. In a recent blog post, James Smith discusses how the tools of open source could revolutionize open data publishing. The post is great and you can read it here: http://theodi.org/blog/gitdatapublishing. James’ idea is to upload data to GitHub repositories and use a DataPackage to attach metadata that sufficiently describes the data. Ultimately, using GitHub for data publication would enable sharing and reuse of data within a supportive and collaborative community. While some of this can get complicated, working through the links from his post really gives you a sense of how an open source community is coming together to address the need to publish data.
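As a rough illustration of the idea, the snippet below builds a minimal `datapackage.json` descriptor for a CSV file that would sit alongside it in a GitHub repository. It follows the general shape of the Data Package format (a `resources` list whose entries have a `path` and a `schema` of `fields`), but the package name, file path, columns and the `build_datapackage` helper are hypothetical, and the current Data Package specification should be checked for required and recommended properties.

```python
import json


def build_datapackage(name, title, csv_path, columns):
    """Return a dict shaped like a minimal Data Package descriptor."""
    return {
        "name": name,        # lowercase identifier for the package
        "title": title,      # human-readable title
        "resources": [
            {
                "name": name + "-data",   # hypothetical naming convention
                "path": csv_path,         # file path relative to the repository
                "format": "csv",
                "schema": {
                    "fields": [{"name": col, "type": typ} for col, typ in columns]
                },
            }
        ],
    }


descriptor = build_datapackage(
    "stream-temperatures",
    "Stream temperature readings",
    "data/temperatures.csv",
    [("site_id", "string"), ("recorded_at", "datetime"), ("temp_c", "number")],
)
text = json.dumps(descriptor, indent=2)
print(text)  # save this as datapackage.json at the root of the repository
```

Keeping the descriptor as a plain JSON file means it is versioned by Git alongside the data, so changes to the data and its description travel together.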

Biositemaps


Biositemaps is a working group within the NIH that is designed to address the problems of:

(i) locating, (ii) querying, (iii) composing or combining, and (iv) mining biomedical resources

In this case, ‘biomedical resources’ can be anything from data sets to software packages to computer models. What is most interesting about Biositemaps is that it provides an Information Model outlining a set of metadata that can be used to describe data. Using the Information Model as a base for data description, it then draws on the Biomedical Resource Ontology (BRO), a controlled terminology for the ‘resource_type’, ‘area of research’, and ‘activity’, to provide more information about how data is used and how it can be described in detail using biomedical terminology. I will admit this resource is still pretty raw, but I think it has a lot of potential for being an excellent resource moving forward. The basic idea behind Biositemaps is that a researcher fills in a lengthy auto-complete form describing themselves, their data, and the methodology used to create the data. Once the form is complete, it produces an RDF file that is uploaded to a registry, where it can be linked to from anywhere. If you are a medical librarian and you have researchers interested in publishing data, I encourage you to take a look at this resource.
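Biositemaps’ output is an RDF file. To show what that means in practice, here is a hedged sketch that writes a few statements about a resource in N-Triples, one of the simplest RDF serializations, using only the Python standard library. The `dcterms` title and creator properties are real Dublin Core terms, but the `example.org` resource URI and the `resourceType`, `areaOfResearch` and `activity` properties are hypothetical placeholders, not the actual Biositemaps Information Model or BRO URIs.

```python
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/biositemap#"  # hypothetical namespace


def escape_literal(text: str) -> str:
    """Escape characters that N-Triples string literals cannot contain raw."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject: str, statements: list[tuple[str, str]]) -> str:
    """Serialize (predicate URI, literal value) pairs about one subject."""
    lines = [f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
             for predicate, value in statements]
    return "\n".join(lines) + "\n"


resource = "http://example.org/resources/sleep-study-dataset"
statements = [
    (DCTERMS + "title", "Sleep study actigraphy data"),
    (DCTERMS + "creator", "Doe J"),
    (EX + "resourceType", "Data resource"),       # would be a BRO term
    (EX + "areaOfResearch", "Sleep medicine"),    # would be a BRO term
    (EX + "activity", "Data collection"),         # would be a BRO term
]
print(to_ntriples(resource, statements), end="")
```

A real Biositemaps file would use the project’s own vocabulary and ontology URIs (likely as URI objects rather than plain strings), which is what makes the records linkable from anywhere.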

SHARE Program – Association of Research Libraries (ARL), Association of American Universities (AAU), the Association of Public and Land-grant Universities (APLU)

This effort just came out last week: the ARL, AAU and APLU are joining together to create a shared vision of universities collaborating with the federal government and others to host institutional repositories across their memberships, providing public access to research, including data. While it is not entirely clear how this will be achieved, especially in the realm of data, I think that this is the type of collaboration that will provide a well-researched, evidence-based solution moving forward. I hope that SHARE continues to expand beyond the response to the OSTP memo, as I think Canadian academic institutions could benefit greatly from this effort. Here is a link to the development draft for SHARE: http://www.arl.org/storage/documents/publications/share-proposal-07june13.pdf

For Medical Librarians

My goal in presenting these data publication efforts is to get medical librarians thinking more about the options available for data publication. Journals, government agencies and open source communities are all trying to address the issues surrounding data publication, and I think it is our duty as medical librarians to familiarize ourselves with journal policies around data sharing; data publication initiatives like DataCite, Dryad, and figshare; and new government efforts like Biositemaps. These are becoming more heavily used every day and will be relevant to our liaison and research areas of practice moving forward. I have tried to provide a lot of links within this post, and I’ve included some more reading below that may be useful. This is by no means an exhaustive list, but rather some of the interesting efforts I’ve seen throughout my work with data. Please feel free to add to it in the comments section.

Readings/References

1. Borgman CL, Wallis JC, Enyedy N. Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries [Internet]. 2007;7:17–30. Available from: http://escholarship.org/uc/item/6fs4559s

2. Lynch C. The shape of the scientific article in the developing cyberinfrastructure. CT Watch Quarterly [Internet]. 2007;3(3):5–10. Available from: http://www.ctwatch.org/quarterly/articles/2007/08/the-shape-of-the-scientific-article-in-the-developing-cyberinfrastructure/  

3. Piwowar H, Chapman W. A review of journal policies for sharing research data. Nature Precedings [Internet]. 2008. Available from: http://www.academia.edu/904922/A_review_of_journal_policies_for_sharing_research_data

4. Reilly S, Schallier W, Schrimpf S, Smit E, Wilkinson M. Report on Integration of Data and Publications [Internet]. 2011: p. 1–7. Available from: http://www.alliancepermanentaccess.org/wp-content/uploads/downloads/2011/10/ODE-ReportOnIntegrationOfDataAndPublications-exesummary.pdf  

5. Smith VS. Data publication: towards a database of everything. BMC Research Notes [Internet]. 2009 Jan [cited 2013 Mar 3];2:113. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2702265&tool=pmcentrez&rendertype=abstract

6. Whyte A. IDCC13 Data Publication: generating trust around data sharing. Digital Curation Centre [Internet]. 2013 Jan 23; Available from: http://www.dcc.ac.uk/blog/idcc13-data-publication-generating-trust-around-data-sharing

Altmetrics and Evaluating Scholarly Impact: What’s out there and how can we participate?

Alternative metrics (altmetrics), or new ways to measure research impact, raise a lot of questions in the scientific community. What do these metrics actually mean? And, more importantly, what do they actually measure? It’s hard to measure the impact of a research article based on how many times it has been tweeted or posted to Facebook: how does that prove that the person posting it actually read the article, or used it in their own research?

Personally, I love the idea of altmetrics, but I don’t think they have quite reached the point where we can compare them to the impact factor of a journal or the h-index of a researcher (although these measures are ultimately flawed as well). Heather Piwowar does an excellent job of describing altmetrics in her article in Nature, and it aligns well with my own ideas of what altmetrics try to achieve:

“Altmetrics give a fuller picture of how research products have influenced conversation, thought and behaviour.”

I like to think of the “fuller picture” of altmetrics as the evolving story of a journal article. Altmetrics don’t necessarily tell us how influential or prominent a journal article has been, but they tell us how it has been used, shared and communicated over time via social media, the web and the scholarly community. I think the emergence of several prominent altmetric platforms will eventually lead to a more effective way to evaluate scholarly impact in the form of a hybrid system. In fact, an article written yesterday by Pat Loria on the LSE blogs states that “as more systems incorporate altmetrics into their platforms, institutions will benefit from creating an impact management system to interpret these metrics, pulling in information from research managers, ICT and systems staff, and those creating the research impact”. His post is definitely worth a read and would be a great follow-up to the content I present here. He even compares several of the altmetrics platforms that I outline in this post.

For this post, I thought it would be a good idea to introduce some of the most prominent altmetric platforms within the scholarly publication ecosystem. Below I will describe each altmetric platform and explain how it communicates the impact and metrics of scholarly research to hopefully provide a better understanding of how this type of measurement works.

ImpactStory


ImpactStory aligns well with my idea of altmetrics because its goal is to tell the story of how research and scholarly publications are shared and discussed. ImpactStory tracks metrics across a variety of commonly used services such as Delicious, Scopus, Mendeley, PubMed and even SlideShare (among many others). You can import your Google Scholar profile, or even your Dryad records. Once you have imported the service you want to measure, ImpactStory tells you how many times an article has been saved by scholars, how many times it has been cited by scholars, how many people have discussed it in public (via Twitter, Facebook, etc.) and how many times it has been cited by the public (e.g., in a Wikipedia article or blog post).

Anyone who has research material in any of the platforms that ImpactStory supports can view their metrics very easily by creating their own collection. Researchers can also embed a widget in their websites that attaches ImpactStory metrics to their citations, indicating whether an article is highly discussed or cited by scholars and the public. I think ImpactStory is an excellent model for altmetrics because it combines traditional metrics with new, social metrics suitable for discovering web impact.

Altmetric


Perhaps the best known of the altmetrics tools, Altmetric provides three main products that offer embeddable content about particular journal articles. The most prominent is Explorer, which is comprehensive in that it shows how many times an article has been viewed and how it ranks within its journal. Explorer also lists social components, such as how many times an article has been picked up by a news feed, how often it has been tweeted, and who has discussed it on Google+ and several other social media platforms. Using Explorer, a researcher can even see the demographics of who has seen their article. This is an excellent feature, as it gives people an idea of who is looking at the material. As a librarian, I would be interested to know who is looking at my research: librarians? doctors? the scientific research community?

Altmetric also provides services for publishers, who can embed Altmetric badges that provide additional information about their articles. Publishers can customize the pages that present the metrics to include their own branding.

Finally, Altmetric has a bookmarklet that provides altmetrics about an article you’re reading. I personally use this feature for fun, because it is interesting to learn a little more about how an article has been used. The only problem is that Altmetric does not have data for every single journal publication, so a large portion of the time I click on the bookmarklet for an article I’m reading and no data is available. This is especially the case with library literature, which could be an incentive to try to get the LISA and LISTA databases on board. Either way, if you’re interested, you can add the bookmarklet from Altmetric’s website.

Plum Analytics


Plum Analytics is the third power player in the altmetrics arena. The goal of Plum Analytics is to “give researchers and funders a data advantage when it comes to conveying a more comprehensive and time[ly] impact of their output”. Plum collects altmetrics and sorts them into five groups: usage, captures, mentions, social media, and citations.

For usage, Plum looks at downloads, views, book holdings, ILL, and document delivery. This is where the library component comes in. If altmetric platforms like Plum are tracking ILLs and document delivery requests for research literature, librarians should be aware of this and look to contribute to the effort.

The second category, captures, provides information about the favorites, bookmarks, saves, readers, groups, and watchers of an article.

Mentions cover the blog posts, news stories, Wikipedia articles, comments, and reviews of research articles.

Social media refers to the tweets, shares, +1’s and likes a research article receives. Finally, citations in Plum Analytics currently cover PubMed, Scopus and patent citations. You can look at their information page to see how they define all of their terminology.
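To make Plum’s five categories concrete, the sketch below tallies a list of raw events into those groups. The category lists come from the descriptions above; the event labels, the `tally` function and the “uncategorized” bucket are hypothetical, and this is not how Plum Analytics actually collects or computes its metrics.

```python
from collections import Counter

# Event types grouped under the five categories described above.
PLUM_STYLE_CATEGORIES = {
    "usage": ["download", "view", "book holding", "ill", "document delivery"],
    "captures": ["favorite", "bookmark", "save", "reader", "group", "watcher"],
    "mentions": ["blog post", "news story", "wikipedia article", "comment", "review"],
    "social media": ["tweet", "share", "+1", "like"],
    "citations": ["pubmed citation", "scopus citation", "patent citation"],
}

# Invert the mapping so each event type points to its category.
CATEGORY_OF = {event: category
               for category, events in PLUM_STYLE_CATEGORIES.items()
               for event in events}


def tally(events):
    """Count events per category; unknown event types go to 'uncategorized'."""
    counts = Counter()
    for event in events:
        counts[CATEGORY_OF.get(event.strip().lower(), "uncategorized")] += 1
    return counts


events = ["tweet", "tweet", "download", "ILL", "Wikipedia article",
          "Scopus citation", "podcast"]
print(dict(tally(events)))
```

Note that the ILL request lands in “usage”, which is exactly the kind of library-generated signal mentioned above.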

Peer Evaluation


Peer Evaluation is a different sort of altmetric platform in that it is designed as an open peer review service where researchers can curate their own peer review process for scholarly publications. The goal of Peer Evaluation is for researchers to make their work visible within their community and to track the impact and reuse of what they share. Researchers can submit their articles, data, working papers, books, etc. to Peer Evaluation and have other researchers review their work. Furthermore, because this is a community effort, researchers can in turn review other people’s work. Peer Evaluation provides qualitative and quantitative metrics that help researchers understand the impact of their work and share their feedback with others in their community. This idea is unique within the altmetrics realm, and there has been a considerable amount of participation from the scientific community.

Research Scorecard


Research Scorecard is a company devoted to “characterizing and quantifying scientific expertise to facilitate scientific collaborations”. Focusing primarily on the biotechnology and pharmaceutical domains, Research Scorecard builds reports and databases for researchers and academic institutions to evaluate the products that they use and how they are used, the people that they collaborate with, the metrics about a specific scientist or researcher, and the funding history of an individual or organization. Research Scorecard is slightly more commercialized than the other platforms that I’ve mentioned here, but I still think it provides valuable information about products, services and researchers within the scientific community.

Librarians! How can we participate?

Librarians should be thinking about how we can best incorporate altmetrics into our own work lives. Librarians working in research environments will need to keep up with altmetrics to evaluate the impact of literature needed for their collection, and to direct researchers to high impact journals for publishing. The shift towards open access publishing will also make altmetrics a valuable tool for librarians to evaluate the impact and quality of these publications. As an academic librarian, I would love to see tools like Altmetric Explorer embedded into a university’s discovery search system or institutional repository.

I think that as altmetrics start to develop a more comprehensive picture of scholarly impact, we will begin to see wider adoption from the scientific community. As Loria states in his blog post, the combination of several platforms in what he calls an Impact Management System (IMS) will be the turning point for altmetrics. If an IMS service can combine all of these research outputs and impacts into one system, it can facilitate the dissemination of a more complete set of research metrics including everything from community and academic impacts to social communication indicators.

Loria makes the point that: “Librarians can help, with their data management skills and aptitude for storytelling.” I have no doubt in my mind that librarians can help, but it is up to us to reach out to these altmetric communities early on so that we can contribute wherever possible. I think it is at least our duty to educate ourselves on the benefits of altmetrics and their potential significance for informing the patrons that we serve.

Other Altmetric Platforms

PaperCritic

ScienceCard

Symplectic

VIVO

References

1. Loria P. The new metrics cannot be ignored – we need to implement centralised impact management systems to understand what these numbers mean [Internet]. London School of Economics and Political Science Blog. 2013. Available from: http://blogs.lse.ac.uk/impactofsocialsciences/2013/03/05/the-new-metrics-cannot-be-ignored/

2. Piwowar H. Altmetrics: Value of all research products [Internet]. Nature. 2013 Jan;493(7431):159. Available from: http://www.nature.com/nature/journal/v493/n7431/full/493159a.html

Drupal Ladder: A great learning tool for librarians

Recently I attended Drupal4Gov, a workshop at the NIH Library on learning how to use Drupal. The workshop wasn’t designed for librarians, but I found it very useful and thought I would pass along the information. And even though this was a government workshop, the things I learned are applicable to any environment, especially a library-related one.

The great thing about Drupal is that once you get past the difficulty of installing it, it is very easy to use and there is a wealth of support on the web and within the Drupal community itself. So keep reading if you’re interested in learning a new skill, or are thinking about using Drupal as a content management system in your library. 

What is Drupal?

I thought it would be useful to explain Drupal before describing the tools I used to learn the software. Drupal is simply (from the website):

…an open source content management platform powering millions of websites and applications. It’s built, used, and supported by an active and diverse community of people around the world.

Basically, Drupal is an easy way to develop websites and other applications for your business or institution. From a library perspective, Drupal can run your library website, support your OPAC, and link out to your subscribed databases. Think of Drupal as similar to WordPress, but with many more features, and more intuitive ones at that.

What is Drupal Ladder?

Drupal Ladder is a website that contains (or links to) lessons and materials to help people learn about and contribute to Drupal. The site was created by the Boston Initiative to help Drupal user groups develop and share materials. The lessons are designed for everyone from the complete novice to the experienced software developer.

There are a variety of ladders to choose from, but the best one for learning how to use Drupal and how to apply some of its great features is the Drupal4Gov ladder:

Drupal Ladders

Once you’ve selected a ladder, you’ll be taken to a page listing all the steps you can learn, from installing Drupal to contributing your own project. I thought this was an excellent tool for learning something new because the directions are very clear and each step builds on the previous one, so you are never left feeling lost.

Drupal4Gov - Drupal Ladder

What’s great about this program is that the Drupal Ladder gives you the option of installing Drupal on your own server (if you have one) or using Dev Desktop, a tool that simulates a server and gives you all the same Drupal functionality. For librarians specifically, the first five rungs of the Drupal4Gov ladder are an excellent way to become familiar with the software and try a few of the more advanced functions.

Another handy tool is simplytest.me, which lets you run anybody’s Drupal site for 30 minutes to an hour and play around with it. This is a helpful way to see how different websites and applications are built and used. I could spend hours fiddling with website themes and installing cool modules.

I chose to write about this topic today because I see more and more libraries struggling to figure out how to quickly and easily build new websites or platforms for their patrons. With the emergence of new library roles like embedded librarians and informationists, I figured that knowing how to quickly build a website would be useful, and that is exactly what Drupal is designed for. Because Drupal is open source and has such a strong community supporting it, I kept thinking to myself during the workshop: Why can’t librarians be a part of this community too? I think that knowing Drupal is an excellent skill to have, as it gives libraries plenty of options if they are looking for a new content management system. Drupal’s ease of use and intuitive design also make it easier to train other staff to use it. If you have the time, I encourage any librarian reading this to give the Drupal Ladder a try. The more time you put into learning it and exploring what Drupal can do, the easier it becomes to use.

**I am not affiliated with Drupal in any way; the views expressed here are my own.**

Open Access & Open Data: Projects that librarians should know about (and share with others!)

Last week I had the opportunity to attend a presentation by Heather Joseph of SPARC (the Scholarly Publishing and Academic Resources Coalition) about some of the great open access journal publishing initiatives taking place. A variety of publishing platforms have emerged recently, each offering its own way of promoting open access and supporting research sharing. I thought I would share some of the initiatives that Heather highlighted in her talk.

To extend the discussion into the realm of open data, I also want to describe a few of the data sharing initiatives I have come across while working on my current projects. I believe these data sharing resources represent an ideal future for research and data publication: they offer platforms where investigators can share data, collaborate with other researchers on modifying it, and even use software to transform their datasets into educational materials.

Open Access Publishers

Public Library of Science (PLOS)


PLOS is the most obvious entry on this list, but I suspect I would hear about it from colleagues if I left it out. PLOS publishes a family of scientific journals that are completely open access. It is a strong advocate of research sharing and has nine core principles that promote sharing, community engagement, and scientific excellence. PLOS hosts many excellent journals, such as PLOS ONE, which publishes across the full range of life and health sciences; the community journals (PLOS Genetics, PLOS Computational Biology, PLOS Pathogens, and PLOS Neglected Tropical Diseases); and PLOS Medicine and PLOS Biology. PLOS Blogs and PLOS Currents also make for excellent reading. I read both regularly, as they focus on research sharing, open access, and many of the publication issues that librarians need to be aware of.

eLife


eLife is one of the newer players in open access publishing, and prides itself on being:

a researcher-led digital publication for outstanding work, a platform to maximise the reach and influence of new findings and a showcase for new approaches for the presentation and assessment of research.

Working with the Howard Hughes Medical Institute, the Max Planck Society, and the Wellcome Trust, among 200 others, eLife is focusing its attention on early-career researchers. Its goal is to make researchers’ first forays into publishing constructive by providing a fair, transparent, and supportive author experience. eLife is also interested in promoting data sharing, but I don’t think that goal has been fully realized yet. I look forward to seeing what comes out of eLife as it continues to grow.

PeerJ


PeerJ offers a different model from eLife and PLOS in that researchers pay to sign up, but for a small sum they are set up with a publication platform for life. $99 allows a researcher to publish one article per year for life; $199 allows two articles per year for life; and $299 allows as many articles per year as they want. Submissions still go through a rigorous peer review process, and paying does not guarantee that a paper will be accepted. It is also important to note that every author of an article must be a PeerJ member before the article can be submitted. PeerJ has a set list of criteria that submissions must meet, and an extensive roster of editors from various disciplines who review them. Furthermore, every PeerJ member is required to review at least one paper each year or participate in post-publication peer review.

A news article in Nature describes PeerJ as one of the cheapest options for this type of publishing. I highly encourage everyone to read it, as it offers some insight into the emerging landscape of open access publishing platforms. PeerJ seems like a good idea, but we’ll have to see whether it generates enough of a following to remain sustainable over time.

Open Humanities Alliance


For my humanities friends out there, I had to include the Open Humanities Alliance in this list. The Alliance is a community-building project of the Open Humanities Press. It aims to overcome some of the common technical barriers to open access in the humanities by linking students and faculty with resources such as open source software, hosting, and archiving. The Open Humanities Alliance is a way for like-minded people inside or outside the academy to work together to open humanities scholarship to the world.

The Alliance-sponsored project I want to highlight is the Open Access Journal Incubator ibiblio. This project is designed to give researchers a place to access a wide variety of research (music, art, literature, politics, etc.) as well as share their own. Contributors to ibiblio have to meet a set of criteria before they can share their research, but the requirements are clear and easy to follow. I had a lot of fun browsing the site’s 900+ collections.

Data Sharing Projects

Out of the scientific community’s discussions about research data sharing, projects such as HUBzero, Cytobank, and WebPAX have emerged to tackle the subject through online communities that encourage the sharing of research data, foster research collaboration, and promote collective data analysis. I discuss each one briefly below.

Cytobank


Cytobank is a data sharing repository designed to manage, share, and analyze flow cytometry data from any researcher. Cytobank prides itself on being a platform for researchers, collaborators, lab and core facility managers, developers and statisticians, educators and trainers, and vendors.

What is great about Cytobank is that it allows researchers to manage their own data and host it on a cloud server; share experiment data and details quickly and easily through the web with other Cytobank users; hold interactive discussions around particular experiments; and turn their cytometry data into educational materials. I believe we will see more repositories like Cytobank as data sharing becomes more common among researchers. This type of repository illustrates the potential benefits of data sharing: it gives researchers a place to store and manage their research and to collaborate with others toward new scientific discoveries.

HUBzero


HUBzero is an open source software platform for building powerful websites that support scientific discovery, learning, and collaboration. The scientific community has started to refer to websites like these as “collaboratories” supporting “team science.” HUBzero differs from Cytobank in that it provides a content management system built to support scientific activities. Using this system, researchers can work together on projects, publish datasets and computational tools with Digital Object Identifiers (DOIs), and make these publications available for others to use as live, interactive digital resources. HUBzero’s datasets and tools run on cloud computing resources, campus clusters, and national high-performance computing (HPC) facilities. You can browse some of the existing hubs on the HUBzero website.

These hubs represent new and exciting innovations in data sharing. The sites are dynamic, with options to build animations from data, download data, take courses to understand various datasets, view publications associated with the data, watch online presentations about the data, and even create online simulations based on the data.

WebPAX


WebPAX is exciting because it focuses primarily on sharing medical images. Researchers create an account, host and manage their medical images on the site, and have full control over who can view them: they can share images with a select group of colleagues for further analysis or post them where all members can see them. In case you were wondering about privacy, all images are anonymized and encrypted using Secure Sockets Layer (SSL) technology to make sure that third parties cannot access this sensitive information. Because so many physicians come into the library wanting to see images on a particular topic, I think WebPAX would be an excellent resource to point them to. Not only will it give them another option for viewing images, but it might even encourage them to share some of their own.

A Data Management and Data Sharing Bibliography for Librarians

It has been a while since I last posted. December was a pretty crazy month, and I’ve been working on some excellent projects (more on the blog in a few weeks). In the meantime, a colleague of mine (the talented @fsayre) and I have been working hard to compile the data management literature we thought would be most useful for librarians. Since we are both medical librarians, quite a few of the articles are health-focused, but the majority should be useful for any librarian.

The two of us are hoping to start a Mendeley group where more librarians can join and share their experiences and ideas about working with data management. We would love to have the input of more librarians, so please let us know via this blog or on Twitter if you would be interested in joining our Mendeley group.

As for this bibliography, while we’ve tried to make it as comprehensive as possible, we encourage readers to add material in case we’ve missed any resources. Also, if you’re interested in other resources, check out my posts on the Data Curation Lifecycle and data management resources for librarians. Happy reading!

**Update** The Mendeley group is now up and running, and you can request to join it here: http://www.mendeley.com/groups/2956801/data-management-for-librarians/. We encourage everyone who is interested to sign up; you are not required to contribute if you do not want to. That said, we hope that librarians will share resources as well as their experiences working with data.

1. Hames I. Report on the International Workshop on Contributorship and Scholarly Attribution. With input from the Workshop Planning Committee. 2012 May. p. 1–29.

2. Allard S. DataONE: Facilitating eScience through Collaboration. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):4–17. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/3/

3. Auckland M. Re-skilling for Research. RLUK Research Libraries UK. 2012. Available from: http://www.rluk.ac.uk/files/RLUK%20Re-skilling.pdf

4. Baker M. Gene data to hit milestone. Nature [Internet]. 2012 Jul 19 [cited 2012 Nov 1];487(7407):282–3. Available from: http://www.nature.com/news/gene-data-to-hit-milestone-1.11019

5. Bloom T. Dealing with data. PLOS Biologue [Internet]. 2012 [cited 2012 Nov 9]; Available from: http://blogs.plos.org/biologue/2012/07/13/dealing-with-data/

6. National Science Board. Digital Research Data Sharing and Management. National Science Foundation. Arlington, VA; 2011. Available from: http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf

7. Borgman CL. Research Data: Who will share what, with whom, when, and why? China-North American Library Conference. Beijing; 2010. p. 21. Available from: http://works.bepress.com/borgman/238/

8. Bailey CW Jr. Research Data Curation Bibliography [Internet]. Houston: Charles W. Bailey, Jr.; 2012 [cited 2012 Nov 9]. Available from: http://digital-scholarship.org/rdcb/rdcb.htm

9. Christensen-Dalsgaard B. Ten recommendations for libraries to get started with research data management. Wirtschaftsforschung, Berlin; 2012 p. 3. Available from: http://www.libereurope.eu/sites/default/files/The%20research%20data%20group%202012%20v7%20final.pdf

10. Creamer A. Creating an Online Research Data Management Course: A Conversation with Data Librarians Robin Rice and Stuart Macdonald. Worcester, MA; 2012. Available from: http://esciencecommunity.umassmed.edu/2012/10/09/creating-an-online-research-data-management-course-a-conversation-with-data-librarians-robin-rice-and-stuart-macdonald/

11. Creamer A, Morales M, Crespo J, Kafel D, Martin E. An Assessment of Needed Competencies to Promote the Data Curation and Management Librarianship of Health Sciences and Science and Technology Librarians in New England. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):18–26. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/4/

12. Creamer A, Morales M, Crespo J, Kafel D, Martin E. Data Curation and Management Competencies of New England Region Health Sciences and Science and Technology Librarians [Internet]. University of Massachusetts and New England Area Librarian e-Science Symposium 2011. Available from: http://escholarship.umassmed.edu/escience_symposium/2011/posters/8

13. Crosas M. The Dataverse Network. The Institute of Quantitative Social Science 2012. Available from: http://thedata.org/

14. D’Ignazio J, Qin J, Kitlas J. Using internship experience to evaluate a new program in eScience librarianship. Proceedings of the 2012 iConference (iConference ’12) [Internet]. New York, NY, USA: ACM Press; 2012. p. 601–2. Available from: http://dl.acm.org/citation.cfm?doid=2132176.2132304

15. Dukes P. Maximising value of population health sciences data: the role for Data Management Plans. MRC data strategy. 2012 Nov. Available from: http://blogs.lshtm.ac.uk/rdmss/files/2012/11/4-Dukes-MRC1.pdf

16. Van den Eynden V, Corti L, Bishop L, Horton L. Managing and Sharing Data: Best Practices for Researchers. UK Data Archive; 2011. Available from: http://data-archive.ac.uk/media/2894/managingsharing.pdf

17. Ferguson J. Description and Annotation of Biomedical Data Sets. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):51–6. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/9/

18. Godlee F. Clinical trial data for all drugs in current use. BMJ [Internet]. 2012 Oct 29 [cited 2012 Nov 2];345:e7304. Available from: http://www.bmj.com/content/345/bmj.e7304

19. Gore SA. e-Science and data management resources on the Web. Medical Reference Services Quarterly [Internet]. 2011 Jan;30(2):167–77. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21534116

20. Hackett Y. A National Research Data Management Strategy for Canada: The Work of the National Data Archive Consultation Working Group. 2001. Available from: http://www.interpares.org/display_file.cfm?doc=ip1_dissemination_janr_hackett_iassist_quarterly_25_2001.pdf

21. Heidorn PB. The Emerging Role of Libraries in Data Curation and E-science. Journal of Library Administration [Internet]. Routledge; 2011 Oct [cited 2012 Nov 9];51(7-8):662–72. Available from: http://dx.doi.org/10.1080/01930826.2011.601269

22. Hey A, Tansley S, Tolle K. The fourth paradigm: data-intensive scientific discovery [Internet]. Microsoft Research; 2009 [cited 2012 Nov 9]. Available from: http://iw.fh-potsdam.de/fileadmin/FB5/Dokumente/forschung/tagungen/i-science/TonyHey_-__eScience_Potsdam__Mar2010____complete_.pdf

23. Hswe P, Holt A. Guide for Research Libraries: The NSF Data Sharing Policy [Internet]. Association of Research Libraries. 2011 [cited 2012 Oct 11]. Available from: http://www.arl.org/rtl/eresearch/escien/nsf/index.shtml

24. Inouye D, Scheiner S. Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America. 2009;2:1–10. Available from: http://www.nceas.ucsb.edu/files/computing/EffectiveDataMgmt.pdf

25. Interview with Svetla Baykoucheva and James Mullins: What Do Libraries Have to Do with e-Science? ACS Division of Chemical Information (CINF). 2011;1–2. Available from: http://drum.lib.umd.edu/bitstream/1903/11843/1/Baykoucheva_Mullins_eScience.pdf

26. Jahnke L, Asher A, Keralis SDC. The Problem of Data. Washington, DC: Council on Library and Information Resources; 2012. Available from: http://www.clir.org/pubs/reports/pub154/pub154.pdf

27. Johnston L, Lafferty M, Petsan B. Training Researchers on Data Management: A Scalable, Cross-Disciplinary Approach. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 8];1(2). Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss2/2/

28. Kafel D, Morales M, Vander Hart R, Gore S, Creamer A, Crespo J, et al. Building an e-Science Portal for Librarians: A Model of Collaboration. Journal of eScience Librarianship [Internet]. 2012;1(1):41–5. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/7/

29. LeFurgy B. Data-Intensive Librarians for Data-Intensive Research [Internet]. The Signal: Digital Preservation. 2012 [cited 2012 Nov 9]. Available from: http://blogs.loc.gov/digitalpreservation/2012/07/data-intensive-librarians-for-data-intensive-research/

30. Lamar Soutter Library, University of Massachusetts Medical School and the George C. Gordon Library, Worcester Polytechnic Institute. Frameworks for a Data Management Curriculum [Internet]. Worcester; 2011 p. 1–67. Available from: http://library.umassmed.edu/data_management_frameworks.pdf

31. Lesk M. Data curation: just in time, or just in case? International Association of Scientific and Technological University Libraries, 31st Annual Conference. West Lafayette, IN; 2010. Available from: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1021&context=iatul2010

32. Mayernik MS. The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation [Internet]. D-Lib Magazine. 2012. Available from: http://www.dlib.org/dlib/september12/mayernik/09mayernik.html

33. University of Minnesota. Data Management 101 – Planning Checklist.

34. Most WC. Keeping Research Data Safe: Cost issues in digital preservation of research data. 2:5–6. Available from: http://www.beagrie.com/KRDS_Factsheet_0910.pdf

35. NISO. Linked Data for Libraries, Archives and Museums. Information Standards Quarterly. 2012;24(2/3). Available from: http://www.niso.org/apps/group_public/download.php/9422/isqv24no2-3.pdf

36. Pathak J, Wang J, Kashyap S, Basford M, Li R, Masys DR, et al. Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. Journal of the American Medical Informatics Association: JAMIA [Internet]. [cited 2012 Oct 29];18(4):376–86. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3128396&tool=pmcentrez&rendertype=abstract

37. Piorun M, Kafel D, Leger-Hornby T, Najafi S, Martin E, Colombo P, et al. Teaching Research Data Management: An Undergraduate/Graduate Curriculum. Journal of eScience Librarianship [Internet]. 2012;1(1):46–50. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/8/

38. Piwowar HA, Vision TJ, Whitlock MC. Data archiving is a good investment. Nature [Internet]. Nature Publishing Group; 2011 May 19 [cited 2012 Nov 9];473(7347):285. Available from: http://dx.doi.org/10.1038/473285a

39. Piwowar HA, Day RS, Fridsma DB. Sharing Detailed Research Data Is Associated with Increased Citation Rate. Ioannidis J, editor. PLoS ONE [Internet]. 2007 Mar 21 [cited 2012 Oct 25];2(3):e308. Available from: http://dx.plos.org/10.1371/journal.pone.0000308

40. Pryor G. Managing Research Data [Internet]. Facet Publishing; 2012 [cited 2012 Nov 9]. p. 224. Available from: http://www.amazon.com/Managing-Research-Data-Graham-Pryor/dp/1856047563

41. Rajaraman A, Ullman JD. Mining of Massive Datasets. Cambridge: Cambridge University Press; 2011. Available from: http://ebooks.cambridge.org/ref/id/CBO9781139058452

42. Reznik-Zellen R, Adamick J, McGinty S. Tiers of Research Data Support Services. Journal of eScience Librarianship [Internet]. 2012 [cited 2012 Nov 10];1(1):27–35. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss1/5/

43. Rosenthal DSH, Vargas DL. LOCKSS Boxes in the Cloud. 2012. Available from: http://www.lockss.org/locksswp/wp-content/uploads/2012/09/LC-final-2012.pdf

44. Rosenthal D, Rosenthal D, Miller E. The Economics of Long-Term Digital Storage. fsl.cs.sunysb.edu [Internet]. [cited 2012 Dec 2];1–8. Available from: http://www.fsl.cs.sunysb.edu/docs/unesco12/UNESCO2012-storage-econ.pdf

45. Salo D. Retooling Libraries for the Data Challenge [Internet]. Web Magazine for Information Professionals. 2010 [cited 2012 Nov 9]. Available from: http://www.ariadne.ac.uk/issue64/salo

46. NISO. Understanding Metadata. Bethesda, MD: NISO Press; 2004. Available from: http://www.niso.org/publications/press/UnderstandingMetadata.pdf

47. The Royal Society. Science as an open enterprise. London: The Royal Society; 2012. Available from: http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf

48. Starr J, Willett P, Federer L, Horning C, Bergstrom M. A Collaborative Framework for Data Management Services: The Experience of the University of California. Journal of eScience Librarianship [Internet]. 2012 Oct 3 [cited 2012 Nov 10];1(2):109–14. Available from: http://escholarship.umassmed.edu/jeslib/vol1/iss2/7

49. Strasser C, Cook R, Michener W, Budden A. Primer on Data Management: What you always wanted to know [Internet]. 2012. p. 1–11. Available from: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

50. Tenopir C, Birch B, Allard S. Academic Libraries and Research Data Services: Current Practices and Plans for the Future [Internet]. 2012. Available from: http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/whitepapers/Tenopir_Birch_Allard.pdf

51. Thibodeau K. Certificate of Advanced Study in Digital Preservation. Proceedings of the 1st International Digital Preservation Interoperability Framework Symposium (INTL-DPIF ’10) [Internet]. New York, NY, USA: ACM Press; 2010. p. 1–9. Available from: http://dl.acm.org/citation.cfm?doid=2039263.2039264

52. Trinidad SB, Fullerton SM, Bares JM, Jarvik GP, Larson EB, Burke W. Genomic research and wide data sharing: views of prospective participants. Genetics in Medicine: official journal of the American College of Medical Genetics [Internet]. 2010 Aug [cited 2012 Oct 29];12(8):486–95. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3045967&tool=pmcentrez&rendertype=abstract