Mapping the structure of science through clustering in citation networks : granularity, labeling and visualization
Author: Sjögårde, Peter
Date: 2023-06-09
Location: Inghesalen, Widerströmska huset, Karolinska Institutet, Solna
Time: 13.00
Department: Inst för lärande, informatik, management och etik / Dept of Learning, Informatics, Management and Ethics
Abstract
The science system is large, and millions of research publications are published each year. Within the field of scientometrics, the features and characteristics of this system are studied using quantitative methods. Research publications constitute a rich source of information about the science system and a means to model and study science on a large scale. The classification of research publications into fields is essential to answer many questions about the features and characteristics of the science system.
Comprehensive, hierarchical, and detailed classifications of large sets of research publications are not easy to obtain. One solution to this problem is to use network-based approaches to cluster research publications based on their citation relations. Clustering approaches have been applied to large sets of publications at the level of individual articles (in contrast to the journal level) for about a decade. Such approaches are addressed in this thesis. I call the resulting classifications “algorithmically constructed, publication-level classifications of research publications” (ACPLCs).
The aim of the thesis is to improve the interpretability and utility of ACPLCs. I focus on four issues that have received little attention in the previous literature: (1) Conceptual framework. Such a framework is elaborated throughout the thesis. Using the social science citation theory, I argue that citations contextualize and position publications in the science system. Citations may therefore be used to identify research fields, defined as focus areas of research at various granularity levels. (2) Granularity levels corresponding to the conceptual framework. In Articles I and II, a method is proposed for adjusting the granularity of ACPLCs in order to obtain clusters corresponding to research fields at two granularity levels: topics and specialties. (3) Cluster labeling. Article III addresses labeling of clusters at different semantic levels, from broad and large to narrow and small, and compares the use of data from various bibliographic fields and different term weighting approaches. (4) Visualization. The methods resulting from Articles I-III are applied in Article IV to obtain a classification of about 19 million biomedical articles. I propose a visualization methodology that provides an overview of the classification, using clusters at coarse levels, as well as the possibility to zoom into details, using clusters at a granular level.
In conclusion, I have improved the interpretability and utility of ACPLCs by providing a conceptual framework, adjusting the granularity of clusters, labeling clusters and, finally, visualizing an ACPLC in a way that provides both overview and detail. I have demonstrated how these methods can be applied to obtain ACPLCs that are useful, for example, for identifying and exploring focus areas of research.
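As an illustration of what term weighting for cluster labeling can look like, the following is a minimal sketch and not the method evaluated in Article III. It assumes a generic TF-IDF-style weight (term frequency within a cluster multiplied by the log of the inverse share of clusters containing the term) and uses only publication titles. The function name `label_clusters`, the tokenization by whitespace, and the example data are all hypothetical.

```python
import math
from collections import Counter


def label_clusters(cluster_titles, top_n=1):
    """Return the top_n highest-weighted terms per cluster (TF-IDF-style sketch)."""
    # Term frequency per cluster, counting lowercase whitespace-separated tokens.
    term_counts = {
        cluster: Counter(
            word for title in titles for word in title.lower().split()
        )
        for cluster, titles in cluster_titles.items()
    }
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter(
        term for counts in term_counts.values() for term in counts
    )
    n_clusters = len(cluster_titles)

    labels = {}
    for cluster, counts in term_counts.items():
        # Terms that occur in every cluster get weight 0 and are never preferred.
        weights = {
            term: tf * math.log(n_clusters / cluster_freq[term])
            for term, tf in counts.items()
        }
        # Sort by descending weight, then alphabetically so ties are deterministic.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        labels[cluster] = ranked[:top_n]
    return labels


# Hypothetical toy data: two clusters, each with two publication titles.
example = {
    "A": ["citation network clustering", "clustering of citation data"],
    "B": ["term weighting for labels", "labels from citation titles"],
}
labels = label_clusters(example)
print(labels)
```

In this toy example the term "citation" occurs in both clusters and therefore receives zero weight, so it is not chosen as a label for either. A realistic application would need stop-word handling, phrase extraction, and a choice of bibliographic fields, which are the kinds of design decisions the thesis compares.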
List of papers:
I. Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics. Journal of Informetrics. 12(1), 133–152.
II. Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of specialties. Quantitative Science Studies. 1(1), 207–238.
III. Sjögårde, P., Ahlgren, P., & Waltman, L. (2021). Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches. Journal of the Association for Information Science and Technology. 72(7), 853–869.
IV. Sjögårde, P. (2022). Improving overlay maps of science: Combining overview and detail. Quantitative Science Studies. 3(4), 1097–1118.
Institution: Karolinska Institutet
Supervisor: Koch, Sabine
Co-supervisor: Sundberg, Carl Johan; Ahlgren, Per; Waltman, Ludo
Issue date: 2023-05-09
Rights: CC BY 4.0
Publication year: 2023
ISBN: 978-91-8017-025-3