How to Integrate Social Web Publications into Digital Libraries

As scholarly blogging is becoming more and more popular, the question of how libraries should handle it remains unanswered. Integrating social web resources into Digital Library collections is possible. But which challenges are linked to it and how could they be solved?

Social web (2.0) publications continue to take up important “publishing space” in our daily work. Beyond the daily consumption of news, entertainment, or commercial content, a new breed of social web publications is emerging in the form of scholarly blogs, and it is getting more and more attention in the scholarly community. Sooner or later, scholarly blogs could be considered a viable publishing channel as well as a significant research resource. Being domain-specific, short, and timely (timeliness being perhaps one of their key attributes), they promise to complement the experience of traditional (digital) libraries. Whereas users (researchers, students, librarians, and so forth) turn to Digital Library collections for the most authoritative selection on a topic, they can turn to scholarly blogs for the latest developments on it.

Scholarly blogging is gaining traction

As an interesting development in scientific publishing, commenting, peer reviewing, and sharing, scholarly blogging is steadily gaining traction. Writing scientific blogs departs from the mainstream research channels. Researchers find blogs to be appropriate channels for many reasons, among them: disseminating findings is faster and more convenient (there are no publication review “check points”); feedback from the community arrives faster (we are all used to commenting, or using other social network features, to respond to a publication); and authors can share information on both their successful and their unsuccessful experiments (the latter rarely get reported in a research paper or journal, for example).

Collaboration and different kinds of feedback

Scholarly blogs emphasize, among other aspects, collaboration in the lifecycle of a publication: users do not only consume a publication passively, they also provide feedback in different formats. The collaboration also works the other way round: the author of a blog post can choose to use terms that are common in the given community to better describe the post's content. These (re)used terms help the publication spread more widely in that community.

Quantity and quality: Do we have enough to make a difference?

Blogging has an increasing presence and popularity in today’s publishing ecosystem. A recent report on blogging statistics by Wolfgang Jaegel provides very encouraging numbers. Among its findings: almost 1 in 2 people read blogs more than once a day; most people read about 5-6 blogs; and there are 31% more bloggers than 3 years ago. To handle this growing stream of blog publications and help researchers stay up to date, many services now support users in searching and retrieving blog publications. The blog index provider ACI, for example, preselects authors to maintain quality, and makes their publications easily searchable by indexing all blog publications in its collection.

As a further segment of the scholarly blog publication stream, we consider conferences like the Science Blogging Conference (see “Science Blogging Conference – not just science and not just blogging”), which mostly consist of scholarly blog publications. This clearly conveys the importance of this publishing channel to the research community, and its acceptance by that community. The ever-increasing quantity and quality of scholarly blog publications makes them interesting for Digital Libraries, as an added value that users could capitalize on.

A use case – How can these resources be used in a Digital Library environment?

A researcher interested, for example, in the relationship between stock market index stability and the gold reserves of a country can search a Digital Library collection for economic theories modeling this relationship. Digital Library collections provide authoritative, curated, and potentially peer-reviewed publications; hence they usually represent the first stop for a user. The user can additionally explore scholarly blog publications on this topic and find one that covers the impact that repatriating state gold reserves can have on stock market index stability, thus gaining a complementary, up-to-date follow-up to the original (Digital Library) publication.

Furthermore, while reading a Digital Library publication, the user could benefit from something like a “Top 5 blogs to follow up with” category, which would list the 5 most relevant blog publications the user could be interested in and offer supplemental, up-to-date material. Having consulted the high-quality publications from a Digital Library collection, the researcher can then engage with a related, up-to-date blog post that complements their research process.

Bringing scholarly blog publications into the Digital Library: Challenges

So far we have focused on establishing the presence and value that scholarly blogs can bring to a Digital Library ecosystem. These publications, however, lack the metadata that one typically finds in a Digital Library publication, such as keywords or keyphrases, a topic description, or any categorization that can help users find publications of interest. Furthermore, even if there are categories describing particular publications, these are used and understood within the community revolving around that blog collection. For example, while the category “Accelerators” and the publication (tag) description “Investors raising / Capital” could be easily understood by, and helpful to, a community that regularly contributes to and reads that blog collection, they can be unknown or of little help to another (Digital Library) community.

Searching across heterogeneous collections

One approach I am experimenting with to remedy this situation is to (automatically) assign terms to blog publications, based on a controlled vocabulary adopted by a Digital Library. This makes the description of these publications compatible with Digital Library practices, and it also means that the same terminology is applied to otherwise heterogeneous publications from blog and Digital Library collections. The value of this approach is even higher if we consider that modern Digital Library environments provide different services that support users in their search. Describing blog publications with a Digital Library’s vocabulary enables reusing these services without any changes on the services’ side.
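To make the idea of vocabulary-based term assignment more concrete, here is a minimal sketch. It is not the method used in this research, only a simple label-matching illustration. The vocabulary excerpt, its labels, and the function name `assign_terms` are all assumptions made up for this example; a real controlled vocabulary would be far larger, and a real system would likely use more robust matching.

```python
import re

# Hypothetical excerpt of a controlled vocabulary (an assumption for this sketch):
# each preferred term maps to the labels that should trigger it.
VOCABULARY = {
    "Gold reserves": ["gold reserves", "gold reserve"],
    "Stock market": ["stock market", "stock exchange", "share index"],
    "Venture capital": ["venture capital", "raising capital"],
}


def assign_terms(text, vocabulary=VOCABULARY):
    """Return the sorted preferred terms whose labels occur in the text."""
    # Lowercase and collapse whitespace so multi-word labels match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    matched = set()
    for preferred, labels in vocabulary.items():
        for label in labels:
            # Match whole words only, so "stock market" does not hit "stock markets-like".
            if re.search(r"\b" + re.escape(label) + r"\b", normalized):
                matched.add(preferred)
                break
    return sorted(matched)


post = "Repatriating gold reserves may unsettle the stock exchange."
print(assign_terms(post))
```

Because the blog post ends up described with the same preferred terms as the Digital Library publications, any service that already works on those terms could, in principle, treat both kinds of publication alike.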

Similarity measures for blog and Digital Library publications

By bridging the gap between Digital Library and scholarly blog collections, we give the user a larger pool of publications to choose from. Although a richer set of publications is a good start, we also research similarity measures that use both the nature of blog publications (e.g., user-generated feedback, which we do not find in a Digital Library publication) and the Digital Library ecosystem’s metadata and services, to make sure we suggest the most relevant blog publications to Digital Library users. This part of the work should identify the scholarly blog publications most relevant to the publication that the user is currently reading in the Digital Library collection. The results of researching similarity measures in this context are what support the use cases presented earlier in this article.
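As one possible illustration of such a similarity measure, the sketch below ranks blog posts by the Jaccard overlap between their vocabulary terms and those of the Digital Library publication being read. Jaccard similarity is chosen here only because it is simple; the text does not say which measures the research actually uses. The function names and sample data are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity of two term collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two empty descriptions carry no evidence of similarity
    return len(a & b) / len(a | b)


def top_k_blog_posts(library_terms, blog_posts, k=5):
    """Return up to k blog post titles, most similar first.

    blog_posts maps a post title to the vocabulary terms assigned to it.
    Posts with no shared term are left out; ties are broken by title.
    """
    scored = [(jaccard(library_terms, terms), title)
              for title, terms in blog_posts.items()]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical sample data for illustration only.
library_terms = {"Gold reserves", "Stock market"}
blog_posts = {
    "Repatriating state gold": {"Gold reserves", "Stock market"},
    "Gold and inflation": {"Gold reserves", "Inflation"},
    "Startup funding": {"Venture capital"},
}
print(top_k_blog_posts(library_terms, blog_posts))
```

With `k=5`, a ranking like this is one way a “Top 5 blogs to follow up with” list could be filled.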

The major part of the research thus concentrates on identifying and implementing several similarity metrics which, depending on the situation, can provide the most relevant blog publication for the Digital Library user to follow up their reading. Whether the blog publication contains feedback in “social network” forms (“likes”, “shares”, “tweets”, comments, and so on), has only short content (automatic term assignment needs a certain content length in order to perform), or is similar in some other way to the Digital Library publication under consideration by the user, we need to be able to apply the best similarity measure for the task at hand.
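Choosing a measure “depending on the situation” could look roughly like the following sketch. Everything in it is an assumption made for illustration: the length threshold, the field names of the post, and the names of the measures are hypothetical, and the actual decision rules are the subject of the research described here.

```python
# Assumed threshold: below this many words, automatic term assignment
# is treated as unreliable. The real cut-off is not given in the text.
MIN_WORDS_FOR_TERMS = 50


def choose_measure(post):
    """Pick a (hypothetical) similarity measure name for a blog post.

    post is a dict with an optional "text" string and optional integer
    counts under "likes", "shares", and "comments".
    """
    word_count = len(post.get("text", "").split())
    has_feedback = any(post.get(key, 0) > 0
                       for key in ("likes", "shares", "comments"))

    if word_count >= MIN_WORDS_FOR_TERMS:
        # Enough content for term assignment; add feedback if present.
        if has_feedback:
            return "vocabulary_terms_with_feedback"
        return "vocabulary_terms"
    # Short post: fall back to social signals, or to plain text overlap.
    if has_feedback:
        return "social_feedback"
    return "raw_text_overlap"


print(choose_measure({"text": "A short note on gold.", "likes": 3}))
```

The point of the sketch is only the dispatch structure: each branch names a measure suited to what the blog post actually offers.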

Blog posts are ripe for the Digital Library ecosystem

Our initiative primarily focuses on integrating the content available from social web resources into Digital Library collections. Although Web 2.0 solutions usually imply user feedback and participation in content creation and completion (tags, comments, shares, etc.), we focus here on integrating this content into the Digital Library collection, so that Digital Library users can follow up their reading in a Digital Library with relevant scholarly blog publications.

There are initiatives that aim to better understand the goals and requirements of people who engage with scholarly blogs (see, for example, “My Science Blog – Who reads it?”), but we believe that the conditions for exploring scholarly blog publications are ripe for the Digital Library ecosystem.

References:

Wolfgang Jaegel: “Rise of the bloggers – blogging statistics 2014”

ACI: a Scholarly Blog Index

Andrea Novicki: “Science Blogging Conference – not just science and not just blogging”

Paige Brown Jarreau: “My Science Blog – Who reads it?”

Author: Fidan Limani (PhD Candidate in Semantic Technologies; current areas of work: Scholarly Web 2.0; Digital Libraries; Semantic technologies; ZBW – Leibniz Information Centre for Economics)
