11. Arxiv
arXiv is an open-access archive of 2 million academic papers in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
To access the Arxiv document loader, you need to install the arxiv, PyMuPDF, and langchain-community integration packages.
PyMuPDF converts PDF files downloaded from the arxiv.org site to text format.
# Installation
# !pip install -qU langchain-community arxiv pymupdf
Object creation
Now we can instantiate the loader object and load the documents:
from langchain_community.document_loaders import ArxivLoader

# query: the topic of the papers you want to search for
loader = ArxivLoader(
    query="Chain of thought",
    load_max_docs=2,  # maximum number of documents to load
    load_all_available_meta=True,  # whether to load all available metadata
)

# Load the documents and display the result
docs = loader.load()
docs
[Document(metadata={'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning demonstrations, to guide the model to\nreason step-by-step while reducing reasoning mistakes. To improve\ngeneralization, we introduce an automatic method to construct contrastive\ndemonstrations. Our experiments on reasoning benchmarks demonstrate that\ncontrastive chain of thought can serve as a general enhancement of\nchain-of-thought prompting.', 'entry_id': 'http://arxiv.org/abs/2311.09277v1', 'published_first_time': '2023-11-15', 'comment': None, 'journal_ref': None, 'doi': None, 'primary_category': 'cs.CL', 'categories': ['cs.CL'], 'links': ['http://arxiv.org/abs/2311.09277v1', 'http://arxiv.org/pdf/2311.09277v1']}, page_content='Contrastive Chain-of-Thought Prompting\nYew Ken Chia∗1,\nDeCLaRe\nGuizhen Chen∗1, 2\nLuu Anh Tuan2\nSoujanya Poria\nDeCLaRe\nLidong Bing† 1\n1DAMO Academy, Alibaba Group, Singapore
...
(omitted)
...
Least-to-most prompting enables com-\nplex reasoning in large language models. In The\nEleventh International Conference on Learning Rep-\nresentations.\n'),
Document(metadata={'Published': '2024-03-23', 'Title': 'Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models', 'Authors': 'Yao Yao, Zuchao Li, Hai Zhao', 'Summary': "With the widespread use of language models (LMs) in NLP tasks, researchers\nhave discovered the potential of Chain-of-thought (CoT) to assist LMs in\naccomplishing complex reasoning tasks by generating intermediate steps.\nHowever, human thought processes are often non-linear, rather than simply\nsequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT)\nreasoning, which models human thought processes not only as a chain but also as\na graph. By representing thought units as nodes and connections between them as\nedges, our approach captures the non-sequential nature of human thinking and\nallows for a more realistic modeling of thought processes. GoT adopts a\ntwo-stage framework with an additional GoT encoder for thought graph\nrepresentation and fuses the graph representation with the original input\nrepresentation through a gated fusion mechanism. We evaluate GoT's performance\non a text-only reasoning task (AQUA-RAT) and a multimodal reasoning task\n(ScienceQA). Our model achieves significant improvement over the strong CoT\nbaseline on the AQUA-RAT test set and boosts accuracy from 85.19% to 87.59%\nusing the T5-base model over the state-of-the-art Multimodal-CoT on the\nScienceQA test set.", 'entry_id': 'http://arxiv.org/abs/2305.16582v2', 'published_first_time': '2023-05-26', 'comment': None, 'journal_ref': None, 'doi': None, 'primary_category': 'cs.CL', 'categories': ['cs.CL'], 'links': ['http://arxiv.org/abs/2305.16582v2', 'http://arxiv.org/pdf/2305.16582v2']}, page_content='Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in\nLanguage Models\nYao Yao1,2, Zuchao Li3,∗and Hai Zhao1,2,∗
...
(omitted)
...
The answer is (B)\n(D) mix\nwrong rationales wrong answer\nwrong rationales wrong answer\nFigure 11: Examples of ScienceQA\nthree objects\nhave in\ncommon\nobject\nhas\ndifferent properties\nput objects into\ngroups\na hard object\ncan attach to\nother things\nis\ncolor\nblue\n49.56\n44.00\nFigure 12: Representation visualization\n')]
If load_all_available_meta=False, only a subset of the metadata is returned rather than all of it.
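To see which metadata fields depend on this flag, one option is to load the same query twice and compare the keys. The sketch below is illustrative only: the helper names make_loader, metadata_keys, and main are assumptions made for this example, not part of LangChain, and the exact set of keys returned may differ between library versions.

```python
def make_loader(query: str, full_meta: bool, max_docs: int = 2):
    """Build an ArxivLoader (hypothetical helper; imports the package lazily)."""
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(
        query=query,
        load_max_docs=max_docs,
        load_all_available_meta=full_meta,
    )


def metadata_keys(docs) -> set:
    """Return the union of metadata keys found across the given documents."""
    keys = set()
    for doc in docs:
        keys.update(doc.metadata)
    return keys


def main() -> None:
    # Requires network access to arxiv.org and the packages installed above.
    basic = make_loader("Chain of thought", full_meta=False).load()
    full = make_loader("Chain of thought", full_meta=True).load()
    print("basic keys:", sorted(metadata_keys(basic)))
    print("only with full metadata:",
          sorted(metadata_keys(full) - metadata_keys(basic)))


# Call main() to run the comparison against the live arXiv API.
```

Comparing key sets rather than printing whole documents keeps the output short, since each page_content holds the full text of a paper.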
Summary
To get each paper's summary instead of its full text, call the loader's get_summaries_as_docs() method.
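A minimal sketch of how this might look in practice follows. The format_summaries and main helpers are hypothetical names introduced for illustration, and the sketch assumes each returned document carries the abstract in page_content and a 'Title' entry in its metadata (a fallback is used if the key is missing).

```python
def format_summaries(docs, width: int = 80) -> list:
    """Return one 'Title: start of summary' line per document."""
    lines = []
    for doc in docs:
        title = doc.metadata.get("Title", "(untitled)")
        # Abstracts contain hard line breaks; collapse them into single spaces.
        text = " ".join(doc.page_content.split())
        lines.append(f"{title}: {text[:width]}")
    return lines


def main() -> None:
    # Requires network access to arxiv.org and the packages installed above.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(query="Chain of thought")
    summary_docs = loader.get_summaries_as_docs()
    for line in format_summaries(summary_docs):
        print(line)


# Call main() to fetch and print summaries from the live arXiv API.
```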
lazy_load()
When loading many documents, if your downstream operations only need a subset of them at a time, you can call lazy_load() to load the documents one at a time and minimize memory usage.
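As a rough illustration of this pattern, the sketch below consumes a document iterator one item at a time, keeps only the titles of matching papers, and stops early once enough matches are found. The titles_mentioning and main helpers, the keyword, and the limit are all assumptions made for this example.

```python
def titles_mentioning(doc_iter, keyword: str, limit: int) -> list:
    """Scan documents lazily and collect titles whose text mentions keyword."""
    if limit <= 0:
        return []
    needle = keyword.lower()
    titles = []
    for doc in doc_iter:
        # Only the current document's full text is held here; once we move on,
        # just its title is retained.
        if needle in doc.page_content.lower():
            titles.append(doc.metadata.get("Title", "(untitled)"))
            if len(titles) >= limit:
                break  # stop pulling further documents from the iterator
    return titles


def main() -> None:
    # Requires network access to arxiv.org and the packages installed above.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(query="Chain of thought", load_max_docs=2)
    print(titles_mentioning(loader.lazy_load(), keyword="graph", limit=1))


# Call main() to run the scan against the live arXiv API.
```

Because the helper accepts any iterator of documents, it never needs the full list that load() would build in memory.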