📣 Uncertain research country rankings. Should we continue producing uncertain rankings?. (arXiv:2312.17560v2 [cs.DL] UPDATED) arxiv.org/abs/2312.17560

Citation-based country rankings, even those from the most reputable institutions, consistently categorize Japan as a developing country. This categorization challenges the credibility of such rankings, considering Japan's elevated scientific standing. In most cases, these rankings use percentile indicators and are accurate if country citations fit an ideal model of distribution, but they can be misleading in cases of deviations. The ideal model implies a lognormal citation distribution and a power-law citation-based double rank (in the global and country lists). This report conducts a systematic examination of deviations from the ideal model and their consequential impact on evaluations. The study evaluates six selected countries across three scientifically relevant topics and utilizes Leiden Ranking assessments of over 300 universities. The findings reveal three types of deviations from the lognormal citation distribution: (i) deviations in the extreme upper tail; (ii) inflated lower tails; and (iii) deflated lower part of the distributions. These deviations stem from structural differences among research systems that are prevalent and have the potential to mislead evaluations across all research levels. Consequently, reliable evaluations must consider these deviations. Otherwise, while some countries and institutions will be correctly evaluated, failure to identify deviations in each specific country or institution will render evaluations uncertain. For reliable assessments, future research evaluations of countries and institutions must identify deviations from the ideal model.

📣 Knowledge Navigation: Inferring the Interlocking Map of Knowledge from Research Trajectories. (arXiv:2401.11742v2 [cs.IR] UPDATED) arxiv.org/abs/2401.11742

"If I have seen further, it is by standing on the shoulders of giants." Isaac Newton's renowned statement hints that new knowledge builds upon existing foundations, which implies an interdependent relationship between pieces of knowledge that, although not yet uncovered, is embedded in the historical development of scientific systems over hundreds of years. By leveraging natural language processing techniques, this study introduces an innovative embedding scheme designed to infer the "knowledge interlocking map." This map, derived from the research trajectories of millions of scholars, reveals the intricate connections among knowledge. We validate that the inferred map effectively delineates disciplinary boundaries and captures the intricate relationships between diverse concepts. The utility of the interlocking map is showcased through multiple applications. Firstly, we demonstrate multi-step analogy inferences within the knowledge space and the functional connectivity between concepts in different disciplines. Secondly, we trace the evolution of knowledge across domains, observing trends such as shifts from "Theoretical" to "Applied" or "Chemistry" to "Biomedical" along predefined functional directions. Lastly, by analyzing the high-dimensional knowledge network structure, we find that pieces of knowledge are connected to each other through shorter global pathways, and that interdisciplinary knowledge plays a critical role in the accessibility of the global knowledge network. Our framework offers a novel approach to mining knowledge inheritance pathways in extensive scientific literature, which is of great significance for understanding scientific development patterns, tailoring scientific learning trajectories, and accelerating scientific progress.

📣 Organizing Scientific Knowledge From Energy System Research Using the Open Research Knowledge Graph. (arXiv:2401.13365v1 [cs.DL]) arxiv.org/abs/2401.13365

Engineering sciences, such as energy system research, play an important role in developing solutions to the technical, environmental, economic, and social challenges of our modern society. In this context, the transformation of energy systems into climate-neutral systems is one of the key strategies for mitigating climate change. For the transformation of energy systems, engineers model, simulate, and analyze scenarios and transformation pathways to initiate debates about possible transformation strategies. For these debates and research in general, all steps of the research process must be traceable to guarantee the trustworthiness of published results, avoid redundancies, and ensure their social acceptance. However, the analysis of energy systems is an interdisciplinary field, as investigations of large, complex energy systems often require the use of different software applications and large amounts of heterogeneous data. Engineers must therefore communicate, understand, and (re)use heterogeneous scientific knowledge and data. Although the importance of FAIR scientific knowledge and data in the engineering sciences and energy system research is increasing, little research has been conducted on this topic. When it comes to making scientific knowledge and data from publications, software, and datasets (such as models, scenarios, and simulations) openly available and transparent, energy system research lags behind other research domains. According to Schmitt et al. and Nieße et al., engineers need technical support in the form of infrastructures, services, and terminologies to improve communication, understanding, and (re)use of scientific knowledge and data.

📣 Visualization of rank-citation curves for fast detection of h-index anomalies in university metrics. (arXiv:2401.13490v1 [cs.DL]) arxiv.org/abs/2401.13490

University rankings, despite facing criticism, continue to maintain their popularity. In the 2023 Scopus Ranking of Ukrainian Universities, certain institutions stood out due to their high h-index, despite modest publication and citation numbers. This phenomenon can be attributed to influential research topics or involvement in international collaborative research. However, these results may also be due to the authors' own efforts to increase the number of citations of their publications in order to improve their h-index. To investigate this, the publications from the top 30 universities in the ranking were analysed, revealing humpback rank-citation curves for two universities. These humpbacks indicate unusual trends in the citation data, especially considering the high percentage of self-citations and the Field-Weighted Citation Impact (FWCI) of the analysed papers. While quantitative analysis has limitations, the combination of humped rank-citation curves, self-citations, FWCI, and previous research findings raises concerns about the possible causes of these anomalies in the citation data of the analysed universities. The method presented in this paper can aid ranking compilers and citation database managers in identifying potential instances of citation data anomalies, emphasizing the importance of expert assessment for accurate conclusions.
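
To make the link between a rank-citation curve and the h-index concrete, here is a minimal sketch in Python. It is not the paper's method: the two citation lists are invented toy data, and the plateau-shaped list is only an assumed stand-in for what the abstract calls a humpback curve. It illustrates how a flat plateau can yield a higher h-index with fewer total citations.

```python
def rank_citation_curve(citations):
    """Return (rank, citations) pairs with papers sorted by citations, descending."""
    ranked = sorted(citations, reverse=True)
    return list(enumerate(ranked, start=1))


def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    for rank, cites in rank_citation_curve(citations):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Toy data (hypothetical, for illustration only).
typical = [50, 30, 12, 8, 5, 3, 2, 1, 1, 0]     # steep head, long tail
plateau = [12, 11, 11, 10, 10, 10, 10, 9, 1, 0]  # flat plateau, then a sharp drop

for name, data in [("typical", typical), ("plateau", plateau)]:
    print(name, "total citations:", sum(data), "h-index:", h_index(data))
```

Plotting the pairs from `rank_citation_curve` for each institution would give the kind of visual comparison the abstract describes. Whether a given shape is actually anomalous still needs expert judgment.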

📣 Research Funding in the Middle East and North Africa: Analyses of Acknowledgments in Scientific Publications indexed in the Web of Science (2008-2021). (arXiv:2310.04426v2 [cs.DL] UPDATED) arxiv.org/abs/2310.04426

Funding acknowledgments are important objects of study in the context of science funding. This study uses a mixed-methods approach to analyze the funding acknowledgments found in 2.3 million scientific publications published between 2008 and 2021 by authors affiliated with research institutions located in the Middle East and North Africa (MENA). The aim is to identify the major funders, assess their contribution to national scientific publications, and gain insights into the funding mechanism in relation to collaboration and publication. Publication data from the Web of Science is examined to provide key insights about funding activities. Saudi Arabia and Qatar lead the region, as about half of their publications include acknowledgments to funding sources. Most MENA countries exhibit strong linkages with foreign agencies, mainly due to a high level of international collaboration. The distinction between domestic and international publications reveals some differences in terms of funding structures. For instance, Turkey and Iran are dominated by one or two major funders, whereas a few other countries like Saudi Arabia showcase multiple funders. Iran and Kuwait are examples of countries where research is mainly funded by domestic funders. The government and academic sectors mainly fund scientific research in MENA, whereas the industry sector plays little or no role in terms of research funding. Lastly, the qualitative analyses provide more context on the complex funding mechanism. The findings of this study contribute to a better understanding of the funding structure in MENA countries and provide insights to funders and research managers to evaluate the funding landscape.

📣 Decoding University Hierarchy and Prestige in China through Domestic Ph.D. Hiring Network. (arXiv:2401.12739v1 [cs.DL]) arxiv.org/abs/2401.12739

The academic job market in which fresh Ph.D. graduates pursue postdoctoral and junior faculty positions plays a crucial role in shaping the future orientations, developments, and status of the global academic system. In this work, we focus on the domestic Ph.D. hiring network among universities in China by exploring the doctoral education and academic employment of nearly 28,000 scientists across all Ph.D.-granting Chinese universities over three decades. We employ the minimum violation rankings algorithm to decode the rankings for universities based on the Ph.D. hiring network, which offers a deep understanding of the structure and dynamics within the network. Our results uncover a consistent, highly structured hierarchy within this hiring network, indicating imbalances wherein a limited number of universities serve as the main sources of fresh Ph.D. graduates across diverse disciplines. Furthermore, over time, it has become increasingly challenging for Chinese Ph.D. graduates to secure positions at institutions more prestigious than their alma maters. This study quantitatively captures the evolving structure of talent circulation in the domestic environment, providing valuable insights to enhance the organization, diversity, and talent distribution in China's academic enterprise.
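
The idea behind a minimum violation ranking can be sketched on a toy network. In the usual formulation, a "violation" is a hire in which a graduate moves to an institution ranked above the one that trained them, and the ranking sought is the ordering with the fewest such hires. The sketch below assumes that definition and uses exhaustive search, which only works for a handful of nodes. The paper's actual procedure is not described in the abstract, and published implementations typically rely on stochastic search. The institution names and hire counts are hypothetical.

```python
from itertools import permutations


def count_violations(order, hires):
    """Count hires that go 'up' the ranking.

    order: tuple of institutions, most prestigious first.
    hires: dict mapping (phd_institution, hiring_institution) -> number of hires.
    """
    position = {inst: i for i, inst in enumerate(order)}
    return sum(
        n for (src, dst), n in hires.items()
        if position[dst] < position[src]  # hired by a higher-ranked institution
    )


def minimum_violation_ranking(hires):
    """Brute-force search for an ordering with the fewest violations (toy sizes only)."""
    institutions = sorted({inst for edge in hires for inst in edge})
    return min(permutations(institutions),
               key=lambda order: count_violations(order, hires))


# Hypothetical hiring counts: (where the Ph.D. was earned, where the graduate was hired).
hires = {
    ("A", "B"): 5, ("A", "C"): 4, ("B", "C"): 3,  # moves down the hierarchy
    ("C", "B"): 1, ("B", "A"): 1,                 # moves up the hierarchy
}

best = minimum_violation_ranking(hires)
print("ranking:", best, "violations:", count_violations(best, hires))
```

In this toy network the ordering A, B, C leaves only the two upward hires as violations. A steep hierarchy of the kind the abstract reports corresponds to a small share of such upward moves.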

📣 Promotion of Scientific Publications on ArXiv and X Is on the Rise and Impacts Citations. (arXiv:2401.11116v1 [cs.DL]) arxiv.org/abs/2401.11116

In the evolving landscape of scientific publishing, it is important to understand the drivers of high-impact research, to equip scientists with actionable strategies to enhance the reach of their work, and to understand trends in the use of modern scientific publishing tools to inform their further development. Here, based on a large dataset of computer science publications, we study trends in the use of early preprint publications and revisions on ArXiv and the use of X (formerly Twitter) for promotion of such papers in the last 10 years. We find that early submission to ArXiv and promotion on X have soared in recent years. Estimating the effect that the use of each of these modern affordances has on the number of citations of scientific publications, we find that, in the first 5 years from initial publication, peer-reviewed conference papers submitted early to ArXiv gain on average 21.1 ± 17.4 more citations, those revised on ArXiv gain 18.4 ± 17.6 more citations, and those promoted on X gain 44.4 ± 8 more citations. Our results show that promoting one's work on ArXiv or X has a large impact on the number of citations, as well as the number of influential citations computed by Semantic Scholar, and thereby on the career of researchers. We discuss the far-reaching implications of these findings for future scientific publishing systems and measures of scientific impact.

📣 A multi-dimensional analysis of usage counts, Mendeley readership, and citations for journal and conference papers. (arXiv:2401.10504v1 [cs.DL]) arxiv.org/abs/2401.10504

This study analyzed 16,799 journal papers and 98,773 conference papers published by IEEE Xplore in 2016 to investigate the relationships among usage counts, Mendeley readership, and citations through descriptive, regression, and mediation analyses. Differences in the relationships among these metrics between journal and conference papers are also studied. First, results showed that there is no significant difference between journal and conference papers in the distribution patterns and accumulation rates of the three metrics. However, the correlation coefficients among the three metrics were lower for conference papers than for journal papers. Second, funding, international collaboration, and open access are positively associated with all three metrics, except for the case of funding on the usage metrics of conference papers. Furthermore, early Mendeley readership is a better predictor of citations than early usage counts and performs better for journal papers. Finally, we reveal that early Mendeley readership partially mediates between early usage counts and citation counts in both journal and conference papers. The main difference is that conference papers rely more on the direct effect of early usage counts on citations. This study contributes to expanding the existing knowledge on the relationships among usage counts, Mendeley readership, and citations in journal and conference papers, providing new insights into the relationship between the three metrics through mediation analysis.

📣 Intentional and serendipitous diffusion of ideas: Evidence from academic conferences. (arXiv:2209.01175v3 [cs.DL] UPDATED) arxiv.org/abs/2209.01175

This paper investigates the effects of seeing ideas presented in-person when they are easily accessible online. Presentations may increase the diffusion of ideas intentionally (when one attends the presentation of an idea of interest) and serendipitously (when one sees other ideas presented in the same session). We measure these effects in the context of 25 computer science conferences using data from the scheduling application Confer, which lets users browse papers, Like those of interest, and receive schedules of their presentations. We address endogeneity concerns in presentation attendance by exploiting scheduling conflicts: when a user Likes multiple papers that are presented at the same time, she cannot see them both, potentially affecting their diffusion. Estimates show that being able to see presentations increases citing of Liked papers within two years by 1.5 percentage points (62.5% boost over the baseline citation rate). Attention to Liked papers also spills over to non-Liked papers in the same session, increasing their citing by 0.5 percentage points (125% boost), and this serendipitous diffusion represents 30.5% of the total effect. Both diffusion types were concentrated among papers semantically close to an attendee's prior work, suggesting that there are inefficiencies in finding related research that conferences help overcome. Overall, even when ideas are easily accessible online, in-person presentations substantially increase diffusion, much of it serendipitous.
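
The percentage-point effects and relative boosts quoted above jointly imply baseline citing rates, which the abstract does not state. A back-of-the-envelope sketch, assuming the boost is defined as effect divided by baseline (the helper name is hypothetical):

```python
def implied_baseline(effect_pp, relative_boost):
    """Baseline rate in percentage points, assuming boost = effect / baseline."""
    return effect_pp / relative_boost


# Figures taken from the abstract: effect in percentage points, boost as a fraction.
liked_baseline = implied_baseline(1.5, 0.625)      # Liked papers: +1.5 pp, 62.5% boost
spillover_baseline = implied_baseline(0.5, 1.25)   # same-session papers: +0.5 pp, 125% boost

print(f"implied baseline, Liked papers: {liked_baseline:.1f}%")
print(f"implied baseline, non-Liked same-session papers: {spillover_baseline:.1f}%")
```

Under that assumption the baselines come out at roughly 2.4% and 0.4%, which shows why a smaller absolute effect can still be the larger relative boost.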

📣 Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure. (arXiv:2401.09685v1 [astro-ph.IM]) arxiv.org/abs/2401.09685

The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. This paper describes the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it. We give an overview of the ADS's original architecture, constructed primarily around simple database models. This bespoke system allowed for the efficient indexing of metadata and citations, the digitization and archival of full-text articles, and the rapid development of discipline-specific capabilities running on commodity hardware. The move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities, a modern API, higher uptime, more reliable data retrieval, and integration of advanced visualizations and analytics. Another crucial evolution came with the gradual and ongoing incorporation of Machine Learning and Natural Language Processing algorithms in our data pipelines. Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications, and recommendations. We describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped. Finally, we conclude by describing the future prospects of ADS and its ongoing expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant, technology is transformative, but their trustworthiness can be elusive.

📣 Towards a Quality Indicator for Research Data publications and Research Software publications -- A vision from the Helmholtz Association. (arXiv:2401.08804v1 [cs.DL]) arxiv.org/abs/2401.08804

Research data and software are widely accepted as an outcome of scientific work. However, in comparison to text-based publications, there is not yet an established process to assess and evaluate the quality of research data and research software publications. This paper presents an attempt to fill this gap. Initiated by the Working Group Open Science of the Helmholtz Association, the Task Group Helmholtz Quality Indicators for Data and Software Publications is currently developing a quality indicator for research data and research software publications to be used within the Association. This report summarizes the group's vision of everything that contributes to such an indicator. The proposed approach relies on generic, well-established concepts for quality criteria, such as the FAIR Principles and the COBIT Maturity Model. It deliberately does not limit itself to technical implementation possibilities, to avoid using an existing metric for a new purpose. The intention of this paper is to share the current state for further discussion with all stakeholders, particularly with other groups working on similar metrics, but also with entities that use the metrics.

📣 Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search. (arXiv:2401.08902v1 [cs.SD]) arxiv.org/abs/2401.08902

Audio embeddings enable large-scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., with respect to tempo, mood, or genre). Previous works have proposed disentangled embedding spaces where subspaces representing specific, yet possibly correlated, attributes can be weighted to emphasize those attributes in downstream tasks. However, no research has been conducted into the independence of these subspaces, nor their manipulation, in order to retrieve tracks that are similar but different in a specific way. Here, we explore the manipulation of tempo in embedding spaces as a case study towards this goal. We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. As this translation is specific to tempo, it enables retrieval of tracks that are similar but have specifically different tempi. We show that such a function can be used as an efficient data augmentation strategy for both training of downstream tempo predictors and improved nearest neighbor retrieval of properties largely independent of tempo.
