📣 PatSTEG: Modeling Formation Dynamics of Patent Citation Networks via The Semantic-Topological Evolutionary Graph arxiv.org/abs/2402.02158

PatSTEG: Modeling Formation Dynamics of Patent Citation Networks via The Semantic-Topological Evolutionary Graph

Patent documents in the patent database (PatDB) are crucial for research, development, and innovation because they contain valuable technical information. However, PatDB is harder to work with than publicly available preprocessed databases because of the intricate nature of patent text and the inherent sparsity of the patent citation network. Although patent text analysis and citation analysis open new opportunities for patent data mining, no existing work exploits their complementarity. To this end, we propose a joint semantic-topological evolutionary graph learning approach (PatSTEG) to model the formation dynamics of patent citation networks. More specifically, we first create a real-world dataset of Chinese patents named CNPat and leverage its patent texts and citations to construct a patent citation network. Then, PatSTEG models the evolutionary dynamics of patent citation formation by considering semantic and topological information jointly. Extensive experiments on CNPat and public datasets demonstrate the superiority of PatSTEG over other state-of-the-art methods. The results provide valuable references for patent literature research and technical exploration.

📣 A framework for improving the accessibility of research papers on arXiv.org rss.arxiv.org/abs/2212.07286

📣 Exploring the landscape of virtual academic conferences: A scoping review of the 1984-2021 literature rss.arxiv.org/abs/2402.00370

📣 HERITRACE: Tracing Evolution and Bridging Data for Streamlined Curatorial Work in the GLAM Domain rss.arxiv.org/abs/2402.00477

📣 University Students Motives and Challenges in Utilising Institutional Repository Resources arxiv.org/abs/2401.17959

University Students Motives and Challenges in Utilising Institutional Repository Resources

One of the core functions of an academic institution is to generate knowledge, disseminate it to the intended audiences, and preserve it for future use. Academic institutions are now establishing Institutional Repositories (IRs) to collect the intellectual materials produced within an institution and to facilitate their accessibility, dissemination, utilization, and management. This study aimed to assess postgraduate students' motives for utilizing IR resources and the challenges they encounter when doing so at the University of Dar es Salaam. The study used a descriptive design that combined qualitative and quantitative research approaches. The population comprised postgraduate students, librarians, and ICT personnel from the University of Dar es Salaam. A sample of 102 respondents was selected through convenience and purposive sampling. Data were collected through questionnaires, interviews, and a review of documentary sources. Quantitative data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 16, and qualitative data were analyzed using content analysis. The findings indicate that access to full-text documents, the relevance of IR resources, and easy searching of the materials in the repository system motivate the utilization of IR resources. However, several challenges impede the utilization of these resources; unreliable internet access, inaccessibility of full text, and the lack of a guiding policy were revealed as the major ones. The study recommends training postgraduate students on the general use of IRs. The University management should also develop an IR policy that will guide the utilization of IR resources.

📣 WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia arxiv.org/abs/2401.16786

WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia

MediaWiki and Wikipedia authors usually use LaTeX to define mathematical formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas were processed by a long cascade of web services and finally delivered to users' browsers rendered as SVG for visual display. With the recent support for MathML Core in Chromium-based browsers, MathML continues on its path to becoming a de facto standard markup language for mathematical notation on the web. Conveying formulas in MathML enables semantic annotation and machine readability for extended interpretation of mathematical content, for example by accessibility technologies. With this work, we present WikiTexVC, a novel method for validating LaTeX formulas from wiki texts and converting them to MathML, which is directly integrated into MediaWiki. This mitigates the shortcomings of previously used rendering methods in MediaWiki in terms of robustness, maintainability, and performance. In addition, there is no need for a multitude of web services running in the background; processing takes place directly within MediaWiki instances. We validated this method with an extended dataset of over 300k formulas, which have been incorporated as automated tests into the MediaWiki continuous integration instances. Furthermore, we conducted an evaluation with 423 formulas, comparing the tree edit distance of the produced parse trees to those of other MathML renderers. Our method has been released as open source, can be used on German Wikipedia, and is delivered with recent MediaWiki versions. As a practical example of enabling semantic annotations within our method, we present a new macro that adds content for formula disambiguation to facilitate accessibility for visually impaired people.

📣 Combining topic modelling and citation network analysis to study case law from the European Court on Human Rights on the right to respect for private and family life arxiv.org/abs/2401.16429

Combining topic modelling and citation network analysis to study case law from the European Court on Human Rights on the right to respect for private and family life

As legal case law databases such as HUDOC continue to grow rapidly, it has become essential for legal researchers to find efficient methods to handle such large-scale data sets. Such case law databases usually consist of the textual content of cases together with the citations between them. This paper focuses on case law from the European Court of Human Rights on Article 8 of the European Convention on Human Rights, the right to respect for private and family life, home and correspondence. In this study, we demonstrate and compare the potential of topic modelling and citation network analysis to find and organize case law on Article 8 based on general themes and citation patterns, respectively. Additionally, we explore whether combining these two techniques leads to better results than applying only one of them. We evaluate the effectiveness of the combined method on a unique, manually collected and annotated dataset of Article 8 case law on evictions. The results of our experiments show that our combined (text and citation-based) approach provides the best results in finding and grouping case law, providing scholars with an effective way to extract and analyse relevant cases on a specific issue.

📣 Are ChatGPT and Other Similar Systems the Modern Lernaean Hydras of AI? arxiv.org/abs/2306.09267

Are ChatGPT and Other Similar Systems the Modern Lernaean Hydras of AI?

The rise of Generative Artificial Intelligence systems ("AI systems") has created unprecedented social engagement. AI code generation systems provide responses (output) to questions or requests by accessing the vast library of open-source code created by developers over the past few decades. However, they do so by allegedly stealing the open-source code stored in virtual libraries, known as repositories. This Article focuses on how this happens and whether there is a solution that protects innovation and avoids years of litigation. We also touch upon the array of issues raised by the relationship between AI and copyright. Looking ahead, we propose the following: (a) immediate changes to the licenses for open-source code created by developers that will limit access and/or use of any open-source code to humans only; (b) revisions to the Massachusetts Institute of Technology ("MIT") license so that AI systems are required to procure appropriate licenses from open-source code developers, which we believe will harmonize standards and build social consensus for the benefit of all of humanity, rather than promote profit-driven centers of innovation; (c) urgent legislative action to protect the future of AI systems while also promoting innovation; and (d) a shift in the burden of proof to AI systems in obfuscation cases.

📣 Combining topic modelling and citation network analysis to study case law from the European Court on Human Rights on the right to respect for private and family life. (arXiv:2401.16429v1 [cs.IR]) arxiv.org/abs/2401.16429

📣 WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia. (arXiv:2401.16786v1 [cs.DL]) arxiv.org/abs/2401.16786

📣 Are ChatGPT and Other Similar Systems the Modern Lernaean Hydras of AI? (arXiv:2306.09267v3 [cs.CY] UPDATED) arxiv.org/abs/2306.09267
