The role of arXiv, RePEc, ssrn and pmc in formal scholarly communication 1 Xuemei LI




Research Questions


The goal of this article is to assess trends in the uptake of the four major SRs and their interdisciplinary usage. The following questions drive the investigation:

  1. Has the level of use of arXiv, RePEc, SSRN and PMC increased over time, including in recent years?

  2. Have arXiv, RePEc, SSRN and PMC attracted use from other disciplines or are they essentially disciplinary silos?

The evidence used to address the above questions is taken from explicit mentions of the four SRs in academic literature citations. Although these citations are only very partial indicators of SR use, they can be used for comparisons over time and, to some extent, comparisons between SRs. They can also suggest the share of use of SRs from within different disciplines.

Methods


Scopus was chosen to count how many documents cite the four SRs because it covers more publications than WoS (journals: 21,000 vs 12,000; conference proceedings: 17,000 vs 14,800 at the time of writing) (Elsevier, 2014; Thomson Reuters, 2014) and because the overlap between Scopus and WoS is large (Gavel and Iselid, 2008). The difference in the total number of individual items, such as articles, may not be in the same proportion, however. More importantly, Scopus allows more comprehensive searches within the cited reference fields than does the WoS Cited Reference Search (Kousha et al., 2012). The following Scopus field codes were used.

  1. WEBSITE: To restrict the results to articles with a given URL in their cited references.

  2. REFSRCTITLE: To restrict the results to reference source titles.

  3. SUBJAREA: To limit the results to each of the four broad disciplinary areas.

    1. Social sciences (this encompasses the Scopus categories: Business, Management and Accounting; Social Sciences; Psychology; Economics, Econometrics and Finance; Decision Sciences): SUBJAREA(soci OR psyc OR busi OR econ OR deci)

    2. Natural sciences (this includes engineering, formal sciences and some life sciences and encompasses the Scopus categories: Agricultural and Biological Sciences; Chemistry; Mathematics; Physics; Materials Science; Engineering; Earth and Planetary Sciences; Multidisciplinary; Environmental Science; Computer Science; Biochemistry, Genetics and Molecular Biology; Veterinary; Chemical Engineering; Energy): SUBJAREA(chem OR math OR phys OR envi OR comp OR engi OR mate OR eart OR agri OR vete OR mult OR ceng OR ener OR bioc)

    3. Medical sciences (this excludes some life sciences and encompasses the Scopus categories: Health Professions; Dentistry; Pharmacology, Toxicology and Pharmaceutics; Nursing; Neuroscience; Medicine; Immunology and Microbiology): SUBJAREA(medi OR nurs OR heal OR phar OR immu OR neur OR dent)

    4. Arts and humanities (this is the Scopus Arts and Humanities category): SUBJAREA(arts)

  4. PUBYEAR: To limit the publication year, for example from 2000 to 2013: (PUBYEAR > 1999) AND (PUBYEAR < 2014)
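The field codes above combine mechanically, so the queries can be composed programmatically. The following is a minimal sketch, not the authors' procedure: the function names (`subjarea`, `year_range`, `citing_query`) and the dictionary layout are hypothetical, and only the field codes, subject-area abbreviations and Boolean structure come from the list above.

```python
# Scopus subject-area abbreviations grouped into the four broad areas
# exactly as listed in the text; the dictionary itself is illustrative.
BROAD_AREAS = {
    "social sciences": ["soci", "psyc", "busi", "econ", "deci"],
    "natural sciences": ["chem", "math", "phys", "envi", "comp", "engi", "mate",
                         "eart", "agri", "vete", "mult", "ceng", "ener", "bioc"],
    "medical sciences": ["medi", "nurs", "heal", "phar", "immu", "neur", "dent"],
    "arts and humanities": ["arts"],
}


def subjarea(area):
    """Build the SUBJAREA(...) clause for one broad disciplinary area."""
    return "SUBJAREA(" + " OR ".join(BROAD_AREAS[area]) + ")"


def year_range(first, last):
    """Express an inclusive year range with strict comparisons, as the text does
    (2000 to 2013 becomes PUBYEAR > 1999 and PUBYEAR < 2014)."""
    return f"(PUBYEAR > {first - 1}) AND (PUBYEAR < {last + 1})"


def citing_query(area, website, first=2000, last=2013):
    """Combine subject area, cited URL fragment and year range into one query."""
    return f"{subjarea(area)} AND WEBSITE({website}) AND {year_range(first, last)}"


print(citing_query("arts and humanities", "arxiv"))
```

Swapping the area name or the URL fragment (for example `ssrn` or `*repec.org*`) yields the corresponding query for another discipline or SR.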

To illustrate the above, the following query was used to identify documents published from 2000 to 2013 in the arts and humanities that cite arXiv URLs: SUBJAREA(arts) AND WEBSITE(arxiv) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014).

WEBSITE(“*ncbi.nlm.nih.gov/pmc*”) was used for PMC. EuropePMC and PMC Canada were not included in the PMC search because PMC is the original authoritative SR; although EuropePMC and PMC Canada are in partnership with PMC, they are closer to biomedical literature databases (with more abstracts than full texts) than to OA SRs. Moreover, (WEBSITE(ukpmc) OR WEBSITE(europepmc) OR WEBSITE(pubmedcentralcanada)) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014) returns only 68 results, so including them would have little impact on the findings. Searches for documents citing SSRN and RePEc were similar to those for arXiv, except using WEBSITE(ssrn) and WEBSITE(*repec.org*) respectively. RePEc tends to link to full-text documents on external servers, which may include a ‘repec’ string in their URLs.

For each SR citation query, 100 randomly selected citing documents were visited to check whether the matching documents cited the SR in question. Many arXiv-citing documents cited arXiv in a very casual way (e.g., arXiv: 1408.6543), with no hyperlink and no category. In addition, the mirror site (xxx.lanl.gov) was heavily cited. WEBSITE(arxiv) therefore misses many citing documents, whereas REF(arxiv) would include too many irrelevant results (e.g., citing documents with arXiv in document titles or anywhere else beyond the reference list). To capture as many relevant results as practical, query (a) was used.

(WEBSITE(*arxiv*) OR WEBSITE(*xxx.lanl.gov*) OR REFSRCTITLE(arxiv)) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014)

(a)

Random checks of 100 of the 62,164 citing documents returned by query (a) found one irrelevant citing document, which was matched through REFSRCTITLE(arxiv): Ivanov, P.P. (1940) Arxiv Xivinskix Xanov XIX V. Issledovanie i Opisanie Dokumentov s Istoričeskim Vvedeniem, p. 16. Leningrad: Izdanie Gosudarstvennoj Publičnoj Biblioteki. To check the prevalence of this problem, query (b) was run.

(REFSRCTITLE(arxiv) AND NOT WEBSITE(*arxiv*)) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014)

(b)

This returns 6,389 unique citing documents that match REFSRCTITLE(arxiv) alone. To check how many citing documents query (a) might miss, query (c) was run.

(REF(arxiv) AND NOT ( (WEBSITE(*arxiv*) OR WEBSITE(*xxx.lanl.gov*) OR REFSRCTITLE(arxiv)))) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014)


(c)

This returns 1,524 citing documents, which include a higher proportion of erroneous matches. Query (a) was therefore used, even though it misses a few results and returns a few incorrect ones.

All Scopus searches were conducted in August 2014 (see Appendix 1 and Appendix 2). Presumably, the majority of articles from 2013 had been indexed in Scopus by this time. Scopus counts citing documents rather than individual citations, however, so if an article cites a repository more than once, the additional citations are ignored.

To find out how often each citing document cited the SRs, each one must be visited and its citations counted. Given the number of citing documents involved for the four SRs, it is impractical to visit all of them. Although a random sample of 160 is reasonable (Thelwall, 2004, p. 37), a random sample of 384 is needed to limit the sampling error to +/- 5% (Neuendorf, 2002, p. 89). After exporting all the citing documents from Scopus to Excel, the RAND() function was used to generate a random sample of 384 citing documents for each of the four SRs. Duplicates were not checked for or removed because each sample should reflect the full spectrum of matching articles. Each sampled citing document was then visited to count its SR citations, to record how many documents cited the SR as a whole without pointing to any particular item archived by it, and to identify documents wrongly returned by the Scopus queries so that the effectiveness of the queries could be reported. In particular, the cited arXiv and RePEc abstract/full-text links were tracked, as well as SR-specific information (e.g., how many RePEc citations pointed to software components and how many PMC citations pointed to gold OA journal articles). These samples were used only for the citing checks; the main analyses were performed on the whole of Scopus.
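The sample size of 384 is consistent with the standard formula for estimating a proportion in a large population, n = z²p(1 − p)/e². The sketch below assumes a 95% confidence level (z = 1.96) and the most conservative proportion (p = 0.5); neither value is stated in the text. It also uses Python's `random.sample` as a stand-in for the Excel RAND() step, and the function names are hypothetical.

```python
import random


def sample_size(margin, z=1.96, p=0.5):
    """Sample size for estimating a proportion in a large population.

    Assumes z = 1.96 (95% confidence) and p = 0.5 (worst case); these are
    illustrative defaults, not values given in the article. The result is
    rounded to the nearest integer, which reproduces the commonly quoted 384.
    """
    return round(z * z * p * (1 - p) / (margin * margin))


def draw_sample(records, n, seed=None):
    """Draw n records uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)


n = sample_size(0.05)
print(n)

# 62,164 is the number of documents returned by query (a); the integers
# here merely stand in for the exported Scopus records.
exported = list(range(62164))
sample = draw_sample(exported, n, seed=1)
print(len(sample))
```

The unrounded value is about 384.16, so rounding up instead would give 385; the text uses 384.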

Results and Discussion

The number of documents within the whole of Scopus citing each SR has grown quickly over time (Figure 1). The differing volumes may be due to different SR usage rates or differing sizes of the supporting scholarly communities. In addition, RePEc tends to link to full-text articles archived elsewhere rather than hosting copies of articles within the repository (Lyons and Booth, 2011), and was probably cited less as a result. PMC citing documents increased exponentially after 2009, perhaps due to NIH OA mandates since 2006.


Figure 1. Citing documents per 1000 Scopus publications from 2000 to 2013.


Documents Citing SRs at the Broad Disciplinary Level

Unsurprisingly, arXiv attracted the most citing documents from natural sciences; both RePEc and SSRN attracted the most citing documents from the social sciences; and PMC attracted the most citing documents from the medical sciences (figures 2-5). Medicine is in last place in the three non-medical SRs and so PMC is by far the dominant SR for medical research. Arts and humanities research is in second place in RePEc and SSRN, presumably due to the overlap between social science and humanities research within individual disciplines (and WoS subject categories). Natural science research within RePEc and SSRN may stem from mathematics and physics research applied to economic modelling issues, for example in econophysics and mathematical economics.




Figure 2: Documents citing arXiv per 1000 Scopus documents from the four broad disciplinary areas.

Figure 3: Documents citing RePEc per 1000 Scopus documents from the four broad disciplinary areas



Figure 4: Documents citing SSRN per 1000 Scopus documents from the four broad disciplinary areas.

Figure 5: Documents citing PMC per 1000 Scopus documents from the four broad disciplinary areas.



Documents Citing SRs at the Individual Subject Level

The subjects most citing each SR give more detailed insights (figures 6-9). Unsurprisingly, arXiv is dominated by mathematics, physics and computer science, RePEc is dominated by economics, and PMC is mainly dominated by medical and health-related subjects. Both RePEc and SSRN are dominated by economics, but it is less dominant in SSRN. The profile of economics in SSRN is perhaps surprising, given the existence of a more specialist SR, although SSRN originated within financial economics. Within PMC, the wide range of subjects represented is perhaps surprising, although the non-medical subject areas have relevance to medicine. For example, biochemistry informs pharmaceutics, agriculture relates to the life sciences, and the environment can impact on health.



Figure 6: Top subjects citing arXiv per 1000 Scopus documents in the subject.

Figure 7: Top subjects citing RePEc per 1000 Scopus documents in the subject.







Figure 8: Top subjects citing SSRN per 1000 Scopus articles in the subject (journal and conference articles in English).

Figure 9: Top subjects citing PMC per 1000 Scopus articles in the subject (journal and conference articles in English).





Perhaps most surprisingly, arXiv attracts significantly more citations from mathematics than from any other subject area. The dominance of mathematics is not evident in Larivière et al.’s (2014) study, which found that similar proportions of 2010-2011 WoS physics (20%) and mathematics (21%) papers were in arXiv (Larivière et al., 2014, Figure 2) and that a much higher proportion of references were to arXiv in WoS physics papers than in WoS mathematics papers (1995-2010). In addition, 1.4% of references in WoS physics papers from 2011 and 1% of references in WoS mathematics papers from 2011 cited arXiv preprints (Larivière et al., 2014, Figure 6A). Given that these papers have multiple references each, this is likely to reflect a much higher proportion of papers in WoS citing arXiv. As illustrated in Figure 6, in 2011 the arXiv citing proportion is 2% for mathematics and 1% for physics. Both numbers are much lower than could be expected from Larivière et al.’s (2014) study and also reverse the difference between mathematics and physics. The difference may be because Larivière et al.’s (2014) method also identified mentions of arXiv that do not use URLs, such as references with arXiv identifiers, and so must have been more comprehensive than the combination of WEBSITE and REFSRCTITLE searches used here.

The physics/mathematics difference may also be due to classification and coverage differences between WoS and Scopus. Scopus covers more mathematics documents than does WoS. At the time of writing, the Scopus search SUBJAREA(math) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014) returned 1,447,750 documents, which is 47.2% of the number of Scopus physics articles, whereas the WoS search SU=(mathematics) AND PY=(2000-2013) returned 689,156 documents, which is 35.7% of the number of WoS-indexed physics articles. Hence there is a substantial content difference between Scopus and WoS. Scopus may tend to classify documents as mathematics that are not classified as mathematics in WoS, and the opposite for physics. Scopus may also index more computer science and classify some of it as mathematics (e.g., Information Processing Letters), as well as dual classifying some computer science as mathematics (e.g., Lecture Notes in Computer Science) and some physics as mathematics (e.g., Physica A: Statistical Mechanics and its Applications).

To illustrate this, query (d) returns all arXiv citing documents in Scopus-indexed mathematics publications. Out of the five journals most citing arXiv (see Table 1), only articles from Advances in Mathematics are overwhelmingly categorized as mathematics in both WoS and Scopus. Articles from the two most citing journals, Lecture Notes in Computer Science and Communications in Mathematical Physics, are dually classified in Scopus as mathematics together with computer science and physics, respectively. Neither Physical Review D Particles Fields Gravitation and Cosmology nor IEEE International Symposium on Information Theory Proceedings is indexed in WoS. In Scopus, articles from both are partly classified as mathematics; those from the former are also classified as physics, and those from the latter as computer science and engineering.



(WEBSITE(*arxiv*) OR WEBSITE(*xxx.lanl.gov*) OR REFSRCTITLE(arxiv)) AND SUBJAREA(math) AND (PUBYEAR > 1999) AND (PUBYEAR < 2014)

(d)

Table 1. The five Scopus mathematics journals most citing arXiv 2000-2013.

| Journal | Documents citing arXiv | WoS category and % of articles in journal classified by WoS in the category | Scopus category and % of articles in journal classified by Scopus in the category |
|---|---|---|---|
| Lecture Notes in Computer Science | 1742 | Maths: 5.4%; Computing: 99.8% | Maths: 90.6%; Computing: 99.8% |
| Communications in Mathematical Physics | 1205 | Maths: 0%; Physics: 100% | Maths: 100%; Physics: 100% |
| Physical Review D Particles Fields Gravitation and Cosmology | 732 | Not indexed | Maths: 44.6%; Physics: 100% |
| IEEE International Symposium on Information Theory Proceedings | 606 | Not indexed | Maths: 47.8%; Computing: 47.8%; Engineering: 52.2% |
| Advances in Mathematics | 474 | Maths: 100% | Maths: 100%; Computing: 6.9% |

Larivière et al.’s (2014) (probably better) method of relying upon the arXiv category that the article was uploaded to may also affect the results, but the two major causes of the difference are probably the greater coverage of Scopus and the large number of citations to arXiv’s mirror site that were not included in that study. This suggests, but does not prove, that arXiv is more important in formal scholarly communication to mathematics (at least in comparison to physics) than has previously been explicitly acknowledged.

SR Citation Frequencies Per Citing Document in the Random Samples

Based upon the four random samples of matching documents, the Scopus queries used to search SR-citing documents were reasonably effective at returning correct citing documents. Only one arXiv citing document pointed to an irrelevant URL:




And only two SSRN-citing documents pointed to irrelevant URLs:

<http://www.landesbioscience.com/journals/rnabiology/article/SuessRNA5-1.pdf>

<http://www.cisco.com/univercd/cc/td/doc/solution/esm/qossrnd.pdf>

On average, arXiv had the most citations per citing document (2.51), followed by SSRN (1.70), RePEc (1.27) and PMC (1.08) (tables 2 and 3). One article cited arXiv 37 times out of 52 references, which was more than double the maximum for the other SRs. Over 93% of the sampled citing documents cited PMC only once, compared with 87.8% for RePEc and 72.9% for SSRN, while less than 58% of the sampled citing documents cited arXiv only once (Table 3).

Table 2. Number of citations per paper for articles in the four random samples of 384 articles matching each repository query.

| | arXiv | RePEc | SSRN | PMC |
|---|---|---|---|---|
| Total citations | 965 | 487 | 652 | 414 |
| Mean | 2.51 | 1.27 | 1.70 | 1.08 |
| Median | 1 | 1 | 1 | 1 |
| Maximum | 37 | 7 | 15 | 7 |
| Minimum | 0 | 1 | 0 | 1 |

Table 3. Frequencies of 1 to 4 citations per citing document for articles in the four random samples of 384 articles matching each repository query.

| Citations | arXiv | RePEc | SSRN | PMC |
|---|---|---|---|---|
| 1 | 221 (57.6%) | 337 (87.8%) | 280 (72.9%) | 360 (93.8%) |
| 2 | 69 (18.0%) | 25 (6.5%) | 51 (13.3%) | 22 (5.7%) |
| 3 | 32 (8.3%) | 7 (1.8%) | 14 (3.6%) | 1 (0.3%) |
| 4 | 16 (4.2%) | 5 (1.3%) | 12 (3.1%) | 0 (0.0%) |
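The means in Table 2 and the percentages in the first row of Table 3 follow directly from the reported counts and the sample size of 384. The sketch below recomputes them as a consistency check; the counts are copied from the two tables, while the variable and function names are hypothetical.

```python
SAMPLE_SIZE = 384  # citing documents sampled per SR

# Total SR citations found in each sample (Table 2).
total_citations = {"arXiv": 965, "RePEc": 487, "SSRN": 652, "PMC": 414}

# Sampled documents citing the SR exactly once (Table 3, first row).
cited_once = {"arXiv": 221, "RePEc": 337, "SSRN": 280, "PMC": 360}


def mean_citations(sr):
    """Mean number of SR citations per sampled citing document."""
    return total_citations[sr] / SAMPLE_SIZE


def share_cited_once(sr):
    """Percentage of sampled documents that cite the SR exactly once."""
    return 100 * cited_once[sr] / SAMPLE_SIZE


for sr in total_citations:
    print(f"{sr}: mean {mean_citations(sr):.2f}, "
          f"cited once {share_cited_once(sr):.1f}%")
```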

Out of the 965 arXiv citations from the random sample of 384 articles matching the arXiv Scopus query, 70% were in arXiv physics categories and 17% were in arXiv mathematics categories. Out of the 384 random documents citing arXiv, 44% were categorized by Scopus at least once as physics while 36% were categorized at least once as mathematics, although the smaller difference for Scopus may be due to the way in which its journals are classified. Many of the arXiv citations are in a short format like arXiv:1011.3370 (arXiv e-print ID) rather than exact URLs. These were classified as pointing to arXiv abstracts, although the authors could assume that the link would also lead to the full-text OA versions. There were 162 (17%) citations with full-text arXiv article links. Eight arXiv citations pointed to arXiv articles without indicating the article ID.
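The manual classification described above could be approximated automatically. The following is only a sketch under stated assumptions, not the procedure used in the study: it assumes that abstract and full-text links follow the `/abs/` and `/pdf/` path patterns on arxiv.org and on the xxx.lanl.gov mirror, it recognises identifiers only in the two forms shown in the regular expressions, and the function name and category labels are hypothetical. As in the text, a bare identifier is treated as pointing to the abstract.

```python
import re

# Identifier such as arXiv:1011.3370 or arXiv: 1408.6543 (optional version suffix).
NEW_ID = re.compile(r"arxiv\s*:\s*\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE)
# Older identifier such as arXiv:hep-th/9901001.
OLD_ID = re.compile(r"arxiv\s*:\s*[a-z\-]+(?:\.[a-z]{2})?/\d{7}", re.IGNORECASE)
# Link to an abstract (/abs/) or full-text (/pdf/) page; path patterns are assumed.
LINK = re.compile(r"(?:arxiv\.org|xxx\.lanl\.gov)/(abs|pdf)/", re.IGNORECASE)


def classify_arxiv_reference(reference):
    """Label a reference string by the kind of arXiv target it points to."""
    link = LINK.search(reference)
    if link:
        return "full text" if link.group(1).lower() == "pdf" else "abstract"
    if NEW_ID.search(reference) or OLD_ID.search(reference):
        # Bare identifiers are counted as abstract links, following the text.
        return "abstract"
    if "arxiv" in reference.lower() or "xxx.lanl.gov" in reference.lower():
        return "no article ID"
    return "not arXiv"


for ref in ["arXiv:1011.3370", "http://arxiv.org/pdf/1011.3370", "http://arxiv.org"]:
    print(ref, "->", classify_arxiv_reference(ref))
```

A string match of this kind would still label false positives such as a source title containing the word "Arxiv" as "no article ID", so manual checks remain necessary.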

Almost all (97%) of the 487 RePEc citations in the random sample pointed to either IDEAS (393; 81%) or Econpapers (81; 17%). Most IDEAS and Econpapers citations pointed to abstract pages with external full-text download URLs, and only 13 pointed directly to full-text documents, 12 of which were outside IDEAS and Econpapers. Two thirds (321; 66%) of the RePEc citations pointed to working papers, a substantial minority (73; 16%) cited software components (uniquely amongst the SRs here), and a few (39; 8%) pointed to non-OA full-text documents such as subscription-based journal articles. A fifth (105; 22%) of the RePEc citations pointed to university archives through either IDEAS or Econpapers, and the rest pointed to working paper series from the World Bank, the IMF, the NBER Working Papers, EconWPA and others. Although working papers are clearly central to RePEc, economics researchers may also be notified of new working papers through NEP (the free New Economics Papers email notification service) and may cite the working paper series directly. For example, WEBSITE(“nber.org/papers”) OR REFSRCTITLE(“NBER working paper”) returns 18,981 citing documents (also from 2000 to 2013) for NBER working papers alone.

Only a few (48; 7%) of the 652 SSRN citations in the random sample pointed to SSRN articles at SSRN Working Papers or ssrn.com without article IDs or links. Twelve of the cited SSRN articles had disappeared, perhaps due to journal requests to remove them after submission, although faculty may also remove articles (Hahn and Wyatt, 2014).

Almost all (393; 95%) of the 414 PMC citations from the random sample pointed to full-text PDF links, although PMC provides different versions of full-text links, including HTML. Most (258; 62%) of the PMC citations pointed to gold OA journal articles (see: http://doaj.org/). The main journals were PLOS ONE (75 citations), the World Journal of Gastroenterology (69) and Environmental Health Perspectives (46). It is not clear why these authors cited the PMC archived articles rather than the OA journal sites.

Overall, arXiv was cited the most frequently in each citing document, followed by SSRN, RePEc and PMC, based on both the mean and frequency statistics from the four random samples. All SR citations overwhelmingly pointed to particular articles, either their abstracts or full texts, rather than citing an SR as a whole (exceptions: two articles cited RePEc and two cited PMC). The SRs differ in what they link to: arXiv allows links to either the abstracts or the full texts of its articles; RePEc hosts abstracts and points to full texts on external servers from a wide range of working paper series; SSRN sets the abstract page as the default link for an article, and readers need to view the abstract page before reaching the full-text download page, to ensure robust download counts; and PMC points to various versions of full-text articles. Not surprisingly in this context, RePEc and SSRN citations were dominated by abstract pages, a minority (17%) of arXiv citations pointed directly to full-text versions, and almost all (95%) PMC citations pointed directly to full-text PDFs. Despite the substantial differences in the type of document linked to, it seems possible that the links serve broadly similar purposes for most authors, who may read the title and abstract first and then decide whether to read the full text of a paper.

