📣 Linguistic and Structural Basis of Engineering Design Knowledge. (arXiv:2312.06355v2 [cs.CL] UPDATED) arxiv.org/abs/2312.06355

Empirical Basis of Engineering Design Knowledge

Engineering design knowledge is embodied in natural-language text through an intricate arrangement of entities and relationships. Ontological constructs of design knowledge often limit the performance of NLP techniques for extracting design knowledge. Large language models may also be less useful for generating and explicating design knowledge, as they are trained predominantly on common-sense text. In this article, we present the constituents of design knowledge based on empirical observations from patent documents. We obtain a sample of 33,881 patents and populate over 24 million facts from their sentences. We conduct Zipf distribution analyses using the frequencies of the unique entities and relationships present in these facts. While the literal entities cannot be generalised from the sample of patents, the relationships largely capture attributes ('of'), structure ('in', 'with'), purpose ('to', 'for'), hierarchy ('include'), exemplification ('such as'), and behaviour ('to', 'from'). The analyses reveal that over half of the entities and relationships can be generalised to 64 and 24 linguistic syntaxes, respectively, while hierarchical relationships include 75 syntaxes. These syntaxes represent the linguistic basis of engineering design knowledge. We combine the facts within each patent into a knowledge graph, from which we discover motifs, i.e., statistically over-represented subgraph patterns. Across all patents in the sample, we identify eight patterns that can be simplified into sequence [->...->], aggregation [->...<-], and hierarchy [<-...->], which form the structural basis of engineering design knowledge. We propose regulatory precepts for concretising abstract entities and relationships within subgraphs, while also explicating hierarchical structures. These precepts could be useful for better constructing and managing knowledge in a design environment.
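
To make the three simplified structural patterns concrete, here is a minimal Python sketch that counts two-edge sequence, aggregation, and hierarchy shapes in a directed fact graph. It assumes facts have already been reduced to (head entity, tail entity) pairs with relationship labels dropped; the function name `count_two_edge_patterns` and the sample entities are hypothetical. It only counts shapes: a motif analysis as described above would additionally test whether each count is statistically over-represented relative to randomized graphs.

```python
def count_two_edge_patterns(facts):
    """Count two-edge shapes in a directed graph of (head, tail) facts.

    sequence:    a -> b -> c  (b receives one edge and emits another)
    aggregation: a -> b <- c  (two edges converge on b)
    hierarchy:   a <- b -> c  (two edges diverge from b)
    """
    # Deduplicate facts and ignore self-loops such as (a, a).
    edges = {(head, tail) for head, tail in facts if head != tail}
    preds, succs = {}, {}
    for head, tail in edges:
        succs.setdefault(head, set()).add(tail)
        preds.setdefault(tail, set()).add(head)

    counts = {"sequence": 0, "aggregation": 0, "hierarchy": 0}
    for b in set(preds) | set(succs):
        incoming = preds.get(b, set())
        outgoing = succs.get(b, set())
        # Sequence: any predecessor paired with a different successor (excludes 2-cycles).
        counts["sequence"] += sum(1 for a in incoming for c in outgoing if a != c)
        # Aggregation and hierarchy: unordered pairs of predecessors / successors.
        counts["aggregation"] += len(incoming) * (len(incoming) - 1) // 2
        counts["hierarchy"] += len(outgoing) * (len(outgoing) - 1) // 2
    return counts


# Hypothetical facts, e.g. "pump includes motor", "motor drives shaft".
facts = [("pump", "motor"), ("motor", "shaft"), ("shaft", "impeller"), ("pump", "impeller")]
print(count_two_edge_patterns(facts))
# {'sequence': 2, 'aggregation': 1, 'hierarchy': 1}
```

Counting at the middle node b keeps the pass linear in the number of nodes plus the pairs around each node, which is enough for per-patent graphs of modest size.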

📣 A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept. (arXiv:2401.03642v2 [cs.CL] UPDATED) arxiv.org/abs/2401.03642

Novelty, akin to gene mutation in evolution, opens possibilities for scholarly advancement. Although peer review remains the gold standard for evaluating novelty in scholarly communication and resource allocation, the vast volume of submissions necessitates an automated measure of scholarly novelty. Adopting a perspective that views novelty as the atypical combination of existing knowledge, we introduce an information-theoretic measure of novelty in scholarly publications. This measure quantifies the degree of 'surprise' perceived by a language model that represents the word distribution of scholarly discourse. The proposed measure is accompanied by evidence of face and construct validity: the former demonstrates correspondence to scientific common sense, and the latter is supported by alignment with novelty evaluations from a select panel of domain experts. Additionally, because it is interpretable, fine-grained, and accessible, the measure addresses gaps prevalent in existing methods. We believe this measure holds great potential to benefit editors, stakeholders, and policymakers, and that it provides a reliable lens for examining the relationship between novelty and academic dynamics such as creativity, interdisciplinarity, and scientific advances.
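
The 'surprise' idea can be illustrated with per-word surprisal, -log2 P(word | previous word), averaged over a text. The sketch below swaps in an add-one-smoothed bigram model as a stand-in for the language model the abstract mentions but does not specify; the function names and the toy corpus are hypothetical. Word pairs that are rare in the reference corpus, i.e. atypical combinations, raise the score.

```python
import math
from collections import Counter


def train_bigram_model(corpus_tokens):
    """Count unigrams and adjacent word pairs in a reference corpus."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams


def mean_surprisal(doc_tokens, unigrams, bigrams):
    """Average surprisal in bits per word pair under an add-one (Laplace) bigram model."""
    pairs = list(zip(doc_tokens, doc_tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    total_bits = 0.0
    for prev, word in pairs:
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total_bits += -math.log2(prob)
    return total_bits / len(pairs)


corpus = "the model predicts the data . the model fits the data .".split()
unigrams, bigrams = train_bigram_model(corpus)
typical = mean_surprisal("the model fits the data".split(), unigrams, bigrams)
atypical = mean_surprisal("the data predicts the model".split(), unigrams, bigrams)
print(typical < atypical)  # True: the unseen pair ("data", "predicts") is more surprising
```

A neural language model would replace the smoothed counts with learned conditional probabilities, but the averaging of -log2 probabilities stays the same.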

📣 Divergent Characteristics of Biomedical Research across Publication Types: A Quantitative Analysis on the Aging-related Research. (arXiv:2401.04323v1 [cs.DL]) arxiv.org/abs/2401.04323

This paper investigates differences in characteristics across publication types in aging-related genetic research. We used bibliometric data for five model species retrieved from authoritative databases, including PubMed, and classified publications into types according to PubMed. Results indicate substantial divergence across publication types in the attention paid to aging-related research, the scope of studied genes, and topical preferences. For instance, comparative studies and meta-analyses focus more on aging than validation studies do. Reviews concentrate more on cell biology, while clinical studies emphasize translational topics. Publication types also differ in their most-studied genes, such as APOE for reviews versus GH1 for clinical studies. Despite these differences, top genes such as insulin are emphasized universally. Publication types show similar levels of imbalance in how research effort is distributed across genes. Differences also exist in bibliometric measures such as the number of authors and citation counts. Publication types show distinct preferences for journals of certain topical specialties and readership scope. Overall, the findings show that publication types have distinct characteristics in studying aging-related genetics, owing to their unique nature and objectives. This study is the first attempt to systematically depict the inherent structure of a biomedical research field from the perspective of publication types, and it provides insights into patterns of knowledge production and evaluation across biomedical communities.
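
The abstract does not name the metric behind the similar levels of imbalance in research effort across genes. One common way to quantify such imbalance is the Gini coefficient of publication counts per gene, computed separately for each publication type; the sketch below assumes that choice, and the function name and per-type counts are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 = perfectly even, (n-1)/n = all in one item."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted formula on ascending values: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical publication counts per gene, grouped by publication type.
papers_per_gene = {
    "review": [120, 40, 15, 5, 2],
    "clinical study": [90, 35, 20, 4, 1],
}
for pub_type, counts in papers_per_gene.items():
    print(pub_type, round(gini(counts), 3))
```

Comparing the coefficients across types (rather than raw counts) keeps the comparison independent of how many papers each type contributes.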

📣 Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies. (arXiv:2401.02965v1 [cs.DL]) arxiv.org/abs/2401.02965

Metadata, often termed "data about data," is crucial for organizing, understanding, and managing vast omics datasets. It aids efficient data discovery, integration, and interpretation, enabling users to access, comprehend, and use data effectively. Its significance spans scientific research domains, facilitating data reproducibility, reusability, and secondary analysis. However, numerous perceptual and technical barriers hinder the sharing of metadata among researchers. These barriers compromise the reliability of research results and hinder integrative meta-analyses of omics studies. This study highlights the key barriers to metadata sharing, including the lack of uniform standards, privacy and legal concerns, limitations in study design, limited incentives, inadequate infrastructure, and a dearth of personnel trained in metadata management and reuse. Proposed solutions include promoting standardization, educational efforts, engaging journals and funding agencies, offering incentives and rewards, and improving infrastructure. If the scientific community addresses these barriers, more accurate, reliable, and impactful research outcomes are achievable.

📣 Who are the gatekeepers of economics? Geographic diversity, gender composition, and interlocking editorship of journal boards. (arXiv:2304.04242v2 [econ.GN] UPDATED) arxiv.org/abs/2304.04242

This study investigates the role of editorial board members as gatekeepers in science, creating and using a database of 1,516 economics journals active in 2019, which includes more than 44,000 scholars from over 6,000 institutions and 142 countries. The composition of these editorial boards is explored in terms of geographic affiliation, institutional affiliation, and gender. Results highlight that the academic publishing environment is primarily governed by men affiliated with elite universities in the United States. The study further explores social similarities among journals from a network-analysis perspective based on interlocking editorship. Comparing the networks generated by all scholars, editorial leaders, and non-editorial leaders reveals significant structural similarities and associations among clusters of journals. These results indicate that links between pairs of journals tend to be redundant, which can be interpreted in terms of social and intellectual homophily within each board and between boards of journals belonging to the same cluster. Finally, the analysis of the most central journals and scholars in the networks suggests that journals probably adopt 'strategic decisions' in selecting editorial board members. The documented high concentration of editorial power poses a serious risk to innovative research in economics.
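
An interlocking-editorship network of the kind described above can be built by linking two journals whenever they share an editorial board member, with the edge weight counting the shared members. Weights above 1 mark pairs linked by more than one shared member, which is one reading of the 'redundant' links the abstract mentions. This is a minimal sketch; the input layout (journal mapped to a set of scholar IDs) and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interlock_network(boards):
    """Project journal -> board-member memberships onto weighted journal-journal edges."""
    journals_by_scholar = defaultdict(set)
    for journal, scholars in boards.items():
        for scholar in scholars:
            journals_by_scholar[scholar].add(journal)

    weights = defaultdict(int)
    for journals in journals_by_scholar.values():
        # Every pair of journals sharing this scholar gains one unit of edge weight.
        for a, b in combinations(sorted(journals), 2):
            weights[(a, b)] += 1
    return dict(weights)


boards = {
    "Journal A": {"s1", "s2"},
    "Journal B": {"s2", "s3"},
    "Journal C": {"s1", "s2"},
}
print(interlock_network(boards))
```

Restricting `boards` to editorial leaders or to non-leaders before projecting yields the separate networks whose structures the study compares.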

📣 Dimensionality Reduced Clustered Data and Order Partition and Stepwise Dimensionality Increasing Indices. (arXiv:2401.02858v1 [cs.DB]) arxiv.org/abs/2401.02858

One of the goals of a NASA-funded project at the IBM T. J. Watson Research Center was to build an index for similarity searching of satellite images, which were characterized by high-dimensional image-texture feature vectors. We review our effort on data clustering, dimensionality reduction via Singular Value Decomposition (SVD), and indexing to build a smaller index and more efficient k-Nearest Neighbor (k-NN) query processing for similarity search. Processing k-NN queries by scanning the feature vectors of all images is obviously too costly for an ever-increasing number of images. The ubiquitous multidimensional R-tree index and its extensions were not an option, given their limited scalability with respect to dimensionality. The cost of processing k-NN queries was further reduced by building memory-resident Ordered Partition (OP) indices on dimensionality-reduced clusters. Further research in a university setting included the following: (1) Clustered SVD was extended to yield exact k-NN queries by issuing appropriate, less costly range queries; (2) the Stepwise Dimensionality Increasing (SDI) index outperformed other known indices; (3) the number of retained dimensions was optimized to reduce query-processing cost; and (4) two methods were devised to make the OP-trees persistent and loadable with a single file access.
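
One property that lets SVD-reduced vectors still answer exact k-NN queries is that projecting onto orthonormal singular vectors never increases Euclidean distance, so reduced-space distances are lower bounds on true distances. The sketch below uses that bound in a simple filter-and-refine loop over a single cluster with NumPy. It illustrates the principle only; it is not the clustered SVD, range-query, OP-tree, or SDI methods the abstract reviews, and the function names and parameters (`m`, `k`) are hypothetical.

```python
import heapq

import numpy as np


def build_reduced_index(data, m):
    """Project centered data onto its top-m right singular vectors."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:m].T  # d x m projection matrix with orthonormal columns
    reduced = (data - mean) @ basis
    return mean, basis, reduced


def exact_knn(query, k, data, mean, basis, reduced):
    """Exact k-NN: reduced-space distances are lower bounds used for pruning."""
    q_reduced = (query - mean) @ basis
    lower_bounds = np.linalg.norm(reduced - q_reduced, axis=1)
    best = []  # max-heap of the current k nearest, stored as (-distance, index)
    for idx in np.argsort(lower_bounds):
        if len(best) == k and lower_bounds[idx] >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        dist = float(np.linalg.norm(data[idx] - query))  # refine with the full vector
        if len(best) < k:
            heapq.heappush(best, (-dist, int(idx)))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, int(idx)))
    return [i for _, i in sorted(best, key=lambda t: -t[0])]


rng = np.random.default_rng(0)
data = rng.normal(size=(500, 32))  # stand-in for image-texture feature vectors
mean, basis, reduced = build_reduced_index(data, m=6)
query = rng.normal(size=32)
print(exact_knn(query, 5, data, mean, basis, reduced))
```

The fewer dimensions retained, the looser the lower bounds and the more full-vector refinements are needed, which is the trade-off behind choosing the number of dimensions.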

📣 Examining the Challenges in Archiving Instagram. (arXiv:2401.02029v1 [cs.DL]) arxiv.org/abs/2401.02029

To prevent the spread of disinformation on Instagram, we need to study the accounts and content of disinformation actors. However, because these accounts are malicious, Instagram often bans those responsible for spreading disinformation, making them inaccessible on the live web. The only way we can study the content of banned accounts is through public web archives such as the Internet Archive. However, archiving Instagram pages presents many issues. Specifically, we focused on the issue that many Wayback Machine Instagram mementos redirect to the Instagram login page. In this study, we determined that mementos of Instagram account pages on the Wayback Machine began redirecting to the Instagram login page in August 2019. We also found that Instagram mementos on Archive.today, Arquivo.pt, and Perma.cc are poorly archived in terms of both quantity and quality. Moreover, all our attempts to archive Katy Perry's Instagram account page on Archive.today, Arquivo.pt, and Conifer were unsuccessful. Although they are in the minority, replayable Instagram mementos do exist in public archives and contain valuable data for studying disinformation on Instagram. With that in mind, we developed a Python script to scrape Instagram mementos. As of August 2023, the script can scrape Wayback Machine archives of Instagram account pages from November 7, 2012 to June 8, 2018.
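
Two checks implied above can be sketched with the Python standard library: reading the capture time embedded in a Wayback Machine memento URL (a 14-digit YYYYMMDDhhmmss timestamp after /web/) and flagging mementos whose original URL is Instagram's login page. This is not the authors' scraper. The helper names are hypothetical, the /accounts/login path is an assumption about how Instagram's login URL appears in redirected mementos, and the date window is taken from the abstract and treated as inclusive. The functions only inspect URL strings and fetch nothing.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# Memento URLs embed a 14-digit capture timestamp, optionally followed by a
# replay modifier such as "id_", and then the original URL.
MEMENTO_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")

# Scrapable window reported in the abstract (assumed inclusive).
WINDOW_START = datetime(2012, 11, 7)
WINDOW_END = datetime(2018, 6, 8, 23, 59, 59)


def parse_memento(url):
    """Return (capture datetime, original URL) for a Wayback memento URL."""
    match = MEMENTO_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback memento URL: {url}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S"), match.group(2)


def looks_like_login_redirect(final_url):
    """True if the URL a memento ended on (after redirects) is the login page (assumed path)."""
    match = MEMENTO_RE.match(final_url)
    original = match.group(2) if match else final_url
    if "://" not in original:
        original = "http://" + original
    return urlparse(original).path.startswith("/accounts/login")


def in_scrapable_window(url):
    captured, _ = parse_memento(url)
    return WINDOW_START <= captured <= WINDOW_END


memento = "https://web.archive.org/web/20150612093000/https://www.instagram.com/katyperry/"
print(parse_memento(memento)[0], in_scrapable_window(memento))
```

Filtering candidate mementos by timestamp before fetching avoids requesting captures that are likely to replay only the login page.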

📣 Uncertain research country rankings. Should we continue producing uncertain rankings? (arXiv:2312.17560v1 [cs.DL]) arxiv.org/abs/2312.17560

Citation-based country rankings, even those from the most reputable institutions, consistently categorize Japan as a developing country. This categorization challenges the credibility of such rankings, considering Japan's elevated scientific standing. In most cases, these rankings use percentile indicators, which are accurate if country citations fit an ideal distribution model but can be misleading when they deviate from it. The ideal model implies a lognormal citation distribution and a power-law relationship between the citation-based double ranks of papers: their ranks in the global list and in the country list. This report systematically examines deviations from the ideal model and their consequent impact on evaluations. The study evaluates six selected countries across three scientifically relevant topics and uses Leiden Ranking assessments of over 300 universities. The findings reveal three types of deviations from the lognormal citation distribution: (i) deviations in the extreme upper tail; (ii) inflated lower tails; and (iii) deflated lower parts of the distributions. These deviations stem from structural differences among research systems; they are prevalent and have the potential to mislead evaluations at all research levels. Consequently, reliable evaluations must account for these deviations. Otherwise, while some countries and institutions will be evaluated correctly, failing to identify the deviations in each specific country or institution will lead to uncertain evaluations. For reliable assessments, future research evaluations of countries and institutions must identify deviations from the ideal model.
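
Percentile indicators of the kind discussed above can be sketched as the share of a country's papers that reach the global top-10% citation threshold, where a share of 0.10 corresponds to the world level. Fitting lognormal parameters and comparing the observed upper-tail share with the share the fitted lognormal predicts gives a crude check for the deviations the study describes. The function names, the tie-inclusive threshold, and the exclusion of uncited papers from the fit are all assumptions of this sketch, not the report's procedure; the citation counts are hypothetical.

```python
import math
import statistics


def top_threshold(global_citations, top_fraction=0.10):
    """Citation count needed to enter the global top `top_fraction` of papers."""
    ranked = sorted(global_citations, reverse=True)
    cutoff = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[cutoff - 1]  # papers tied with this value also count as "top"


def top_share(country_citations, threshold):
    """Observed share of a country's papers at or above the threshold."""
    return sum(c >= threshold for c in country_citations) / len(country_citations)


def fit_lognormal(citations):
    """Estimate lognormal mu and sigma from cited papers only (log of 0 is undefined)."""
    logs = [math.log(c) for c in citations if c > 0]
    return statistics.mean(logs), statistics.stdev(logs)


def expected_top_share(threshold, mu, sigma):
    """Share of cited papers a lognormal(mu, sigma) places at or above the threshold."""
    return 1 - statistics.NormalDist(mu, sigma).cdf(math.log(threshold))


global_citations = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8, 9, 12, 15, 20, 25, 31, 40, 55, 80, 150]
country_citations = [0, 2, 3, 5, 9, 12, 31, 80]
threshold = top_threshold(global_citations)  # 80 here: the top 2 of 20 papers
cited = [c for c in country_citations if c > 0]  # compare like with like: cited papers only
mu, sigma = fit_lognormal(cited)
print(top_share(cited, threshold), expected_top_share(threshold, mu, sigma))
```

A large gap between the two printed shares flags an upper tail that the lognormal assumption does not describe well, which is where a percentile indicator becomes unreliable.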

Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.