How many and which databases to search?
The defining characteristic of a systematic literature review is its explicit, documented and reproducible methodology. A key aspect of this methodological rigor is a comprehensive, unbiased and reproducible search strategy: the extent and quality of the searches determine the body of evidence from which studies are selected, and thus influence the outcome of the analysis and evidence synthesis.
Searching online databases of scholarly publications forms the backbone of comprehensive literature reviews, making the selection of databases to search a crucial step in conducting a high-quality systematic review.
Characteristics of a high-quality search strategy
Requirement | Description |
---|---|
Comprehensive | Identify all relevant records (within reasonable time and resource constraints) |
Transparent | Well-documented queries in a defined list of databases |
Reproducible | Repeating searches must yield the same results |
Database selection considerations
An optimal selection of databases will need to take into account the size and coverage of the database. Depending on the domain you are searching in, or the geography you want to target, the databases of choice may differ. Database search capabilities and indexing properties, in terms of speed, method and whether or not the full text is indexed, also determine how broad or tailored your searches need to be. Lastly, access and cost can also be important considerations.
Property | Item | Considerations |
---|---|---|
Database size | The number of publications in the database | |
Coverage | Domain | Broad databases, e.g., PubMed, Embase, IEEE Xplore; specialized databases, e.g., CENTRAL (RCTs), CINAHL (nursing), PsycINFO (psychology and behavioral sciences), PEDro (physiotherapy) |
Coverage | Publication type | |
Coverage | Geographic focus | |
Search options (search syntax) | Controlled vocabulary | To map search terms to a defined topic, e.g., MeSH terms in PubMed, Emtree for Embase; automatic term mapping and expansion |
Search options (search syntax) | Boolean operators | To combine search terms into a structured query string, e.g., AND, OR, NOT |
Search options (search syntax) | Wildcard operators | To find alternative spellings and word variants, e.g., therap* to find therapy, therapies, therapeutic etc. |
Search options (search syntax) | Proximity operators | To find search terms that are mentioned within a specified distance (number of words) from each other. PubMed: [field:~N], e.g., "pacemaker outcomes"[tiab:~4] will retrieve any abstract where pacemaker and outcomes are within 4 words of each other, regardless of the order, as in "clinical outcomes of pacemaker implantation" or "pacemaker implantation outcomes". Embase: NEAR/n (or ADJn) and NEXT/n (or PRE/n). NEAR does not take the word order into account, whereas NEXT respects the word order specified. |
Access and price | Free vs paid | |
If you are uncertain where to start, Gusenbauer and Haddaway created an extensive compilation of the most widely used literature databases, categorizing them into principal and supplementary databases [1]. This landmark paper provides a solid starting point for learning about the selection of literature databases. For a more directly applicable approach, you can use the SearchSmart tool to find suitable literature databases to search [2]. After you specify a few criteria, such as the types of record, subject coverage and the functionality you need, the tool provides a list of potentially suitable databases, with a detailed description of the number and type of records as well as the search and export options available.
How many databases to search?
The number of databases to include in your SLR depends on the type of review you are doing and on the domain you are searching. In some domains, database coverage is fragmentary, so more databases are needed to ensure comprehensiveness.
While Medline (PubMed) is a solid, de facto first choice as a principal database for most biomedical domains, PubMed alone is usually not sufficient to retrieve all relevant publications.
In one specific context (depression research), an impressive relative recall of 94% has been reported for Medline searches alone [4]. However, a larger study covering 120 systematic reviews in diverse domains found the average relative recall of Medline to be only 72.6% [5]. The same study found that 92.3% of the included references were present in the Medline database, even though only 72.6% were retrieved by the searches. This highlights the importance of a solid search strategy to ensure that the relevant publications in a database are actually retrieved.
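The difference between being present in a database (coverage) and being retrieved by a search (relative recall) can be made concrete with a small sketch. The record identifiers below are made up for illustration, and `coverage_and_recall` is a hypothetical helper, not part of any cited study.

```python
def coverage_and_recall(included, indexed, retrieved):
    """Return (coverage, relative recall) as fractions of the included references."""
    included = set(included)
    if not included:
        raise ValueError("the review has no included references")
    # Coverage: included references that exist in the database at all.
    coverage = len(included & set(indexed)) / len(included)
    # Relative recall: included references that the search actually returned.
    recall = len(included & set(retrieved)) / len(included)
    return coverage, recall


# Hypothetical review with 10 included references (IDs are illustrative).
included = {f"ref{i}" for i in range(1, 11)}
indexed = included - {"ref10"}  # 9 of the 10 are present in the database
retrieved = {"ref1", "ref2", "ref3", "ref4", "ref5", "ref6", "ref7", "other1"}

coverage, recall = coverage_and_recall(included, indexed, retrieved)
print(f"coverage={coverage:.0%}, relative recall={recall:.0%}")
```

In this toy example, two references are indexed but missed by the search, which is the kind of gap a better search strategy can close. The one reference that is not indexed can only be found by adding another database.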
At least two databases are needed to reduce the risk of missing relevant information. Ewald et al. [3] compared the conclusions of 60 published Cochrane reviews that included a meta-analysis with the conclusions that would have been reached with a limited set of databases. They found that using just two databases sometimes retrieved fewer of the included references, but that the risk of changing the conclusions of the review was low. Even so, the combination of the three major databases (Medline, Embase and CENTRAL) could not guarantee total recall or rule out any change in conclusions.
Optimal database combinations
An extensive study by Bramer et al. covering the searches of 58 published systematic reviews found that the best recall was achieved by combining Medline (PubMed), Embase, Web of Science and the first 200 results of Google Scholar [6]. Based on these results, they estimate that about 60% of published systematic reviews do not reach the recommended 95% recall.
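To illustrate how such a comparison works in principle, the sketch below pools the results of several databases and computes the relative recall of each combination. The result sets are invented for illustration and do not reproduce the data of the cited study; `combined_recall` and `best_combination` are hypothetical helper names.

```python
from itertools import combinations


def combined_recall(included, results_by_db, dbs):
    """Relative recall achieved by pooling the results of the given databases."""
    pooled = set().union(*(results_by_db[db] for db in dbs))
    return len(set(included) & pooled) / len(included)


def best_combination(included, results_by_db, size):
    """Return the combination of `size` databases with the highest pooled recall."""
    return max(
        combinations(sorted(results_by_db), size),
        key=lambda dbs: combined_recall(included, results_by_db, dbs),
    )


# Hypothetical included references and per-database search results.
included = set(range(1, 11))
results_by_db = {
    "Medline": {1, 2, 3, 4, 5, 6, 7},
    "Embase": {1, 2, 3, 4, 5, 8, 9},
    "Web of Science": {2, 3, 10},
}

pair = best_combination(included, results_by_db, 2)
print(pair, combined_recall(included, results_by_db, pair))
```

With real data, the `included` set would come from the final list of included studies of a finished review, and each result set from re-running the review's searches in one database.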
The optimal combination of databases depends on the type of review you are conducting. For instance, for an umbrella review or review of reviews that intends to include only systematic reviews, Medline in combination with Epistemonikos and reference checking was shown to provide the best recall.
For Cochrane systematic reviews, the Cochrane Handbook recommends Medline and Cochrane CENTRAL as the bare minimum, preferably supplemented with Embase if available [7]. For systematic literature reviews to support CE marking of medical devices, MEDDEV 2.7/1 Revision 4 recommends the use of multiple databases, with PubMed/Medline preferably supplemented with one or more databases with more coverage of the European region.
Indexing
Indexing is the mechanism by which articles, books, and other scholarly content are organized and made searchable within a literature database. Indexing ensures that publications can be located accurately based on specific search terms present in title, abstract, authors, keywords, or bibliographic metadata. The scope and depth of indexing varies between databases: some index only titles, abstracts and metadata, while other databases also index the full text.
Impact of full text indexing
Using literature databases with full text indexing can be particularly valuable to ensure comprehensive retrieval of published evidence in niche topics. Searching a database that indexes the full text allows for more tailored searches and thus fewer results to screen, while protecting the recall of your search. In other words, you can make searches smaller and more specific without additional risk of missing important publications.
For example, when conducting a systematic literature review for a medical device, searching databases that index the full text such as Embase, PubMedCentral or EuropePMC will reduce the number of full text publications that need to be reviewed to verify the device name, which is often not mentioned in the title or abstract of the publication.
Databases with full text indexing:
- Embase
- Cochrane Library
- EuropePMC
- IEEE Xplore
- PubMedCentral
Impact of indexing on reproducibility
Database indexing methods also affect the reproducibility of search results: a change in a record's indexing in a bibliographic database can change whether or not it is retrieved by a search. This is illustrated by the phenomenon that the number of results retrieved by PubMed searches increases slightly over time, even when using a fixed date range [8].
This can be explained by manual corrections of the automatic indexing method for some records and the yearly update of the MeSH controlled vocabulary (Annual MeSH Processing), which can result in small changes in term translations leading to discrepancies in searches run before and after the update. PubMed has been gradually moving from manual indexing to automatic indexing since 2002. As of April 2022, all Medline journals are primarily indexed automatically, using the MTIX (Medical Text Indexer-NeXt Generation) algorithm, a machine-learning model based on neural networks, trained on millions of MEDLINE citations. While automated indexing significantly reduces the time from citation appearance to MeSH assignment (often within 24 hours), human review and curation are still applied to selected citations, particularly for complex or ambiguous topics, to maintain the quality and conceptual appropriateness of the MeSH terms assigned.
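One practical way to detect this kind of drift is to store the record identifiers of each search run together with the run date, and to compare runs. The sketch below is a minimal illustration with made-up identifiers; `compare_runs` is a hypothetical helper, and how the identifiers are exported from a database is left out.

```python
from datetime import date


def compare_runs(earlier_ids, later_ids):
    """Summarize how the result set of one search changed between two runs."""
    earlier, later = set(earlier_ids), set(later_ids)
    return {
        "added": sorted(later - earlier),
        "removed": sorted(earlier - later),
        "unchanged": len(earlier & later),
    }


# Two hypothetical runs of the same query with the same fixed date range.
run_2023 = {"run_date": date(2023, 6, 1), "ids": ["101", "102", "103"]}
run_2024 = {"run_date": date(2024, 6, 1), "ids": ["101", "102", "103", "104"]}

drift = compare_runs(run_2023["ids"], run_2024["ids"])
print(drift)
```

Records that appear under "added" despite an unchanged date range are candidates for re-indexing effects and can be screened separately when a search is updated.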
Search options
The quality of the results retrieved by a search is influenced not only by the coverage of the selected database and how it is indexed, but also by the search options the database provides. The tools a database offers for constructing query strings determine how well you can balance comprehensiveness (recall) against specificity (precision).
Important search options include:
- Controlled vocabulary (such as MeSH terms) to map search terms to specific topics, making searches less dependent on the specific terminology used by authors. The possibility to combine free-text keywords and controlled vocabulary in a search is essential, as newer articles may not yet be indexed with subject headings.
- Boolean operators (AND, OR, NOT) to define the logic for combining terms and keywords into structured queries.
- Wildcard operators (*) to include variants or spelling alternatives for search terms.
- Proximity operators to find terms appearing within a specified distance of each other, allowing targeted searching for conceptually linked terms.
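The search options above can be combined into a single query string. The sketch below assembles a PubMed-style query from synonyms, a MeSH heading, a wildcard term and a proximity clause. The helper functions are hypothetical, the terms are chosen only for illustration, and the MeSH heading should be verified in the MeSH browser before use.

```python
def concept(free_text, mesh=()):
    """OR together the MeSH headings and free-text synonyms for one concept."""
    parts = [f'"{heading}"[mh]' for heading in mesh]
    for term in free_text:
        # Multi-word phrases are quoted; single words (with or without *) are not.
        text = f'"{term}"' if " " in term else term
        parts.append(f"{text}[tiab]")
    return "(" + " OR ".join(parts) + ")"


def proximity(phrase, distance, field="tiab"):
    """PubMed proximity clause: all terms within `distance` words of each other."""
    return f'"{phrase}"[{field}:~{distance}]'


def and_all(*blocks):
    """AND together the blocks that represent different concepts."""
    return " AND ".join(blocks)


# Illustrative terms only; the MeSH heading is an assumption to be checked.
device = concept(["pacemaker*", "cardiac pacing"], mesh=["Pacemaker, Artificial"])
treatment = concept(["therap*"])
query = and_all(device, treatment)
nearby = proximity("pacemaker outcomes", 4)

print(query)
print(nearby)
```

Synonyms for one concept are combined with OR and different concepts with AND, so each added concept narrows the result set and each added synonym broadens it.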
What about Google Scholar?
Google Scholar is a search engine, not a database. Because its index is built by crawling the web, Google Scholar searches are poorly reproducible. It also offers only limited query and filtering options and cannot reach sufficient recall when used alone [5]. Additionally, it has no easy bulk export option and shows only the first 1000 results of any search. Have a look at this excellent blog post by Aaron Tay to see Google Scholar through the eyes of a scholarly librarian.
But if you need a search with full text indexing that also encompasses recent information and grey literature, Google Scholar will add value to your search strategy. For example, if you are doing a systematic literature review for a very innovative medical device, or for a low-risk device on which very little data is available, then Google Scholar might be just what you need.
Overview of the main literature databases
In the table below you will find an overview of frequently used literature databases for easy reference.
Database | Scope | Publication Types | Controlled Vocabulary | Geography | Key Strength | Indexes full text | Access |
---|---|---|---|---|---|---|---|
Embase | Biomedical | Journal articles, conference abstracts | Emtree | International | Comprehensive drug and pharmacology coverage | Yes | Paid |
Cochrane CENTRAL | Healthcare interventions | Randomized controlled trials, systematic reviews | MeSH | International | Focus on clinical trials | No | Free |
Livivo | Life sciences, health sciences | Journal articles, books, reports | MeSH, UMTHES, AGROVOC | Primarily German-speaking countries | Interdisciplinary approach | Partial | Free |
EuropePMC | Life sciences | Journal articles, preprints, clinical trials | MeSH | Europe-focused, but international | Open access content | Yes | Free |
CINAHL | Nursing and allied health | Journal articles, books, dissertations | CINAHL Subject Headings | International | Nursing and allied health focus | Partial | Paid |
PsycINFO | Psychology and behavioral sciences | Journal articles, books, dissertations | APA Thesaurus | International | Comprehensive psychology coverage | No | Paid |
PEDro | Physiotherapy | Randomized trials, reviews, guidelines | – | International | Physiotherapy-specific | No | Free |
EBSCO Host | Multidisciplinary | Varies by database | Varies by database | International | Wide range of databases | Partial | Paid |
Dimensions | Multidisciplinary | Articles, grants, patents, clinical trials | Fields of Research (FOR) | International | Linked research outputs | Partial | Free/Paid |
Epistemonikos | Health and social sciences | Systematic reviews, primary studies | – | International | Focus on evidence synthesis | No | Free |
Google Scholar | Multidisciplinary | Articles, theses, books, preprints | – | International | Broad coverage, easy accessibility | Yes | Free |
PubMed | Biomedical | Journal articles, books | MeSH | International | Comprehensive biomedical coverage | No | Free |
Scopus | Multidisciplinary | Journal articles, books, conference proceedings | Varies by database | International | Broad: biomedical, engineering, economy, humanities | No | Paid |
Web of Science | Multidisciplinary | Journal articles, conference proceedings, editorials | (Keywords Plus) | International | Historical depth, citation searching | No | Paid |
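For a quick shortlist, the table can also be treated as data and filtered on the properties that matter for a given review. The sketch below encodes a few rows from the table above and filters them on access and full text indexing. It is only an illustration of criteria-based selection with a hypothetical `shortlist` helper, not a description of how SearchSmart works.

```python
# A few rows copied from the overview table (subset, for illustration).
DATABASES = [
    {"name": "Embase", "full_text": "Yes", "access": "Paid"},
    {"name": "Cochrane CENTRAL", "full_text": "No", "access": "Free"},
    {"name": "EuropePMC", "full_text": "Yes", "access": "Free"},
    {"name": "Google Scholar", "full_text": "Yes", "access": "Free"},
    {"name": "PubMed", "full_text": "No", "access": "Free"},
    {"name": "Scopus", "full_text": "No", "access": "Paid"},
]


def shortlist(databases, access=None, full_text=None):
    """Return the names of databases matching every criterion that is set."""
    names = []
    for db in databases:
        if access is not None and db["access"] != access:
            continue
        if full_text is not None and db["full_text"] != full_text:
            continue
        names.append(db["name"])
    return names


free_full_text = shortlist(DATABASES, access="Free", full_text="Yes")
print(free_full_text)
```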
References and Reading
1. Gusenbauer M, Haddaway NR. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synth Methods. 2020;11(2):181-217. doi:10.1002/jrsm.1378
2. Gusenbauer M. Searchsmart.org: Guiding researchers to the best databases and search systems for systematic reviews and beyond. Res Synth Methods. Published online November 1, 2024. doi:10.1002/jrsm.1746
3. Ewald H, Klerings I, Wagner G, et al. Searching two or more databases decreased the risk of missing relevant studies: a metaresearch study. J Clin Epidemiol. 2022;149:154-164. doi:10.1016/j.jclinepi.2022.05.022
4. Rice DB, Kloda LA, Levis B, Qi B, Kingsland E, Thombs BD. Are MEDLINE searches sufficient for systematic reviews and meta-analyses of the diagnostic accuracy of depression screening tools? A review of meta-analyses. J Psychosom Res. 2016;87:7-13. doi:10.1016/j.jpsychores.2016.06.002
5. Bramer WM, Giustini D, Kramer BMR. Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: A prospective study. Syst Rev. 2016;5(1):1-7. doi:10.1186/s13643-016-0215-7
6. Bramer WM, Rethlefsen ML, Kleijnen J, Franco OH. Optimal database combinations for literature searches in systematic reviews: A prospective exploratory study. Syst Rev. 2017;6(1):1-12. doi:10.1186/s13643-017-0644-y
7. Higgins J, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions. 2nd edn. John Wiley & Sons; 2019.
8. Burns CS, Nix T, Shapiro RM, Huber JT. Methodological Issues with Search in MEDLINE: A Longitudinal Query Analysis. Published online May 22, 2020. doi:10.1101/2020.05.22.110403