
Semantic Scholar Open Research Corpus

Semantic Scholar's records for research papers published in all fields, provided as an easy-to-use JSON archive.

If you are interested in one-off, request-based data, please see our RESTful API.

Download full, sampled, and archived versions of the corpus.
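As a rough sketch of how a downloaded corpus file might be read: the snippet below assumes each file is gzip-compressed UTF-8 text with one JSON paper record per line. That layout, the function name `iter_papers`, and any file path passed to it are assumptions for illustration and are not specified on this page; adjust them to match the files you actually download.

```python
import gzip
import json
from typing import Iterator


def iter_papers(path: str) -> Iterator[dict]:
    """Yield one paper record (a dict) per non-empty line of a gzipped file.

    Assumption: the file is gzip-compressed UTF-8 text holding one JSON
    object per line. This layout is not confirmed by the page itself.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Reading line by line keeps memory use roughly constant regardless of file size, which matters if the full corpus is being processed rather than a sample.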

Example Paper Record

{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
  "entities": [...],
  "fieldsOfStudy": [
      "Computer Science"
  ],
  "s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
  "pdfUrls": [...],
  "s2PdfUrl": "",
  "authors": [
      {
          "name": "John Lee",
          "ids": [...]
      }
  ],
  "inCitations": [...],
  "outCitations": [...],
  "year": 2016,
  "venue": "DSH",
  "journalName": "DSH",
  "journalVolume": "31",
  "journalPages": "152-163",
  "sources": [...],
  "doi": "10.1093/llc/fqu052",
  "doiUrl": "https://doi.org/10.1093/llc/fqu052",
  "pmid": "",
  "magId": "2050850752"
}

Attribute Definitions

id  string

S2 generated research paper ID.

title  string

Research paper title.

paperAbstract  string

Extracted abstract of the paper.

entities  list

Extracted entities (deprecated on 2019-09-17).

s2Url  string

URL to S2 research paper details page.

pdfUrls  list

URLs related to this PDF scraped from the web.

s2PdfUrl  string

Usable PDF URL (deprecated on 2020-05-27).

authors  list

List of authors with an S2 generated author ID and name.

inCitations  list

List of S2 paper IDs which cited this paper.

outCitations  list

List of S2 paper IDs which this paper cited.

fieldsOfStudy  list

Zero or more fields of study this paper addresses.

year  int

Year this paper was published, as an integer.

venue  string

Extracted publication venue for this paper.

journalName  string

Name of the journal that published this paper.

journalVolume  string

The volume of the journal where this paper was published.

journalPages  string

The pages of the journal where this paper was published.

sources  list

Identifies papers sourced from DBLP or Medline.

doi  string

Digital Object Identifier registered at doi.org.

doiUrl  string

DOI link for registered objects.

pmid  string

Unique identifier used by PubMed.

magId  string

Unique identifier used by Microsoft Academic Graph.
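The definitions above suggest a small clean-up step that consumers may find useful: dropping the two deprecated attributes and deriving citation counts from the citation lists. The sketch below is one possible approach, not part of the corpus tooling. The function name `normalize_paper` and the derived keys `numCitedBy` and `numReferences` are hypothetical, and mapping empty identifier strings to `None` is an assumption.

```python
from typing import Any, Dict

# Attributes marked as deprecated in the definitions.
DEPRECATED_FIELDS = ("entities", "s2PdfUrl")

# String identifiers that may be empty (assumed to mean "not available").
IDENTIFIER_FIELDS = ("doi", "doiUrl", "pmid", "magId")


def normalize_paper(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned copy of a paper record; the input is not modified."""
    cleaned = {k: v for k, v in record.items() if k not in DEPRECATED_FIELDS}

    # Replace empty identifier strings with None so callers can test for them.
    for key in IDENTIFIER_FIELDS:
        if cleaned.get(key) == "":
            cleaned[key] = None

    # inCitations: papers citing this one; outCitations: papers it cites.
    cleaned["numCitedBy"] = len(record.get("inCitations", []))      # hypothetical key
    cleaned["numReferences"] = len(record.get("outCitations", []))  # hypothetical key
    return cleaned
```

Counting list lengths only reflects citations between papers that have S2 paper IDs in the record, so the derived numbers should be read as counts within the corpus.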


Semantic Scholar Open Research Corpus is licensed under ODC-BY.

When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper:

Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL.

This site is provided by The Allen Institute for Artificial Intelligence (“AI2”) as a service to the research community. The site is covered by AI2 Terms of Use and Privacy Policy. AI2 does not claim ownership of any materials on this site unless specifically identified. AI2 does not exercise editorial control over the contents of this site. AI2 respects the intellectual property rights of others. If you believe your copyright or trademark is being infringed by something on this site, please follow the "DMCA Notice" process set out in the Terms of Use.

BibTeX format:

    title={Construction of the Literature Graph in Semantic Scholar},
    author={Waleed Ammar and Dirk Groeneveld and Chandra Bhagavatula and Iz Beltagy and Miles Crawford and Doug Downey
      and Jason Dunkelberger and Ahmed Elgohary and Sergey Feldman and Vu Ha and Rodney Kinney
      and Sebastian Kohlmeier and Kyle Lo and Tyler Murray and Hsu-Han Ooi and Matthew Peters and Joanna Power
      and Sam Skjonsberg and Lucy Lu Wang and Chris Wilhelm and Zheng Yuan and Madeleine van Zuylen and Oren Etzioni},