Large language models (LLMs) have made significant progress in NLP. However, their ability to memorize, represent, and leverage commonsense knowledge has been a well-known pain point. In this paper, we specifically focus on ChatGPT, a widely used and easily accessible LLM, and ask the following questions: (1) Can ChatGPT effectively answer commonsense questions? (2) Is ChatGPT aware of the underlying commonsense knowledge for answering a specific question? (3) Is ChatGPT knowledgeable in commonsense? (4) Can ChatGPT effectively leverage commonsense for answering questions? We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities, including answering commonsense questions, identifying necessary knowledge, generating knowledge descriptions, and using knowledge descriptions to answer questions again. Experimental results show that: (1) ChatGPT can achieve good QA accuracies in commonsense tasks, while still struggling with datasets from certain domains. (2) ChatGPT is knowledgeable and can accurately generate most of the commonsense knowledge using knowledge prompts. (3) Despite its knowledge, ChatGPT is an inexperienced commonsense problem solver: it cannot precisely identify the commonsense needed to answer a specific question. These findings raise the need to explore improved mechanisms for effectively incorporating commonsense into LLMs like ChatGPT, such as better instruction following and commonsense guidance.

## Overview

- Large language models (LLMs) have made significant progress in natural language processing (NLP), but their ability to effectively represent and use commonsense knowledge has been a challenge.
- This paper focuses on evaluating the commonsense abilities of the widely used LLM ChatGPT through a series of experiments.
- The key questions addressed are: Can ChatGPT answer commonsense questions effectively? Is it aware of the underlying commonsense knowledge required? How knowledgeable is it in commonsense? Can it leverage commonsense to answer questions?

## Plain English Explanation

Large language models (LLMs) like [ChatGPT](https://aimodels.fyi/papers/arxiv/if-machine-is-as-good-as-me) have shown impressive capabilities in natural language processing (NLP) tasks. However, a well-known limitation concerns how well they truly understand and use commonsense knowledge. Commonsense refers to the basic, intuitive understanding of the world that humans acquire through everyday experience.

In this paper, the researchers closely examined the commonsense abilities of ChatGPT, a widely used and accessible LLM. They wanted to answer several key questions: Can ChatGPT correctly answer commonsense questions? Is it aware of the underlying commonsense knowledge needed to answer those questions? How knowledgeable is ChatGPT in commonsense overall? And can it effectively leverage that knowledge to answer questions?

To find out, the researchers conducted a series of experiments using 11 different datasets that test various aspects of commonsense reasoning. They looked at ChatGPT's performance in answering commonsense questions, its ability to identify the necessary knowledge, and its capacity to generate and then use that knowledge to answer questions again.

## Technical Explanation

The researchers conducted a comprehensive evaluation of ChatGPT's commonsense abilities through a series of experiments on 11 different commonsense reasoning datasets. 

First, they assessed ChatGPT's performance in directly answering commonsense questions across various domains, such as physical reasoning, social understanding, and causal relations. The results showed that ChatGPT can achieve good question-answering (QA) accuracy on many commonsense tasks, but still struggles on datasets from certain domains.
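Splitting accuracy by domain is what surfaces a pattern like "good overall, weak in certain domains." The sketch below is a minimal illustration, assuming each model answer is stored as a record with its dataset, domain, gold answer, and predicted answer; the `Prediction` record layout, exact-match scoring, and the toy values are hypothetical, not the paper's data or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model answer on one question (hypothetical record layout)."""
    dataset: str
    domain: str
    gold: str
    predicted: str


def accuracy_by_domain(preds: list[Prediction]) -> dict[str, float]:
    """Return the fraction of correct answers for each commonsense domain."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for p in preds:
        total[p.domain] += 1
        # Exact match after normalizing case and surrounding whitespace.
        if p.predicted.strip().lower() == p.gold.strip().lower():
            correct[p.domain] += 1
    return {d: correct[d] / total[d] for d in total}


if __name__ == "__main__":
    # Toy records for illustration only; these are not results from the paper.
    toy = [
        Prediction("toy_physical", "physical", "A", "A"),
        Prediction("toy_physical", "physical", "B", "B"),
        Prediction("toy_social", "social", "A", "C"),
        Prediction("toy_social", "social", "B", "B"),
    ]
    print(accuracy_by_domain(toy))  # {'physical': 1.0, 'social': 0.5}
```

Grouping by domain rather than averaging over all questions keeps a strong domain from masking a weak one.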

Next, the researchers investigated whether ChatGPT is aware of the underlying commonsense knowledge needed to answer a specific question by asking it to identify that knowledge. ChatGPT often struggled here: it could not precisely pinpoint which commonsense a given question requires.

In the third experiment, the researchers tested how knowledgeable ChatGPT is by using knowledge prompts to have it generate descriptions of the relevant commonsense knowledge. ChatGPT performed well, accurately generating most of the commonsense knowledge. Taken together, these two experiments suggest that ChatGPT is knowledgeable but an inexperienced commonsense problem solver: it holds much of the needed knowledge yet does not reliably recognize which knowledge a question calls for.
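The two knowledge probes described above differ mainly in what the model is asked to produce: a list of the knowledge a given question requires, or a description of a given piece of knowledge. The sketch below is a minimal illustration assuming plain-text prompts; the function names and the wording of both templates are hypothetical and are not the prompts used in the paper. `parse_knowledge` shows one simple way to turn a line-per-item response into separate statements for later inspection.

```python
def identification_prompt(question: str, choices: list[str]) -> str:
    """Ask which commonsense knowledge a question needs (hypothetical wording)."""
    # Label answer options (A), (B), (C), ... in order.
    options = "\n".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))
    return (
        f"Question: {question}\n{options}\n"
        "What commonsense knowledge is needed to answer this question? "
        "Write each piece of knowledge on its own line."
    )


def generation_prompt(topic: str) -> str:
    """Ask the model to describe knowledge about a topic (hypothetical wording)."""
    return f"Describe the commonsense knowledge you have about: {topic}"


def parse_knowledge(response: str) -> list[str]:
    """Split a line-per-item model response into cleaned knowledge statements."""
    items = []
    for line in response.splitlines():
        # Drop leading bullet markers and surrounding whitespace; skip blank lines.
        cleaned = line.strip().lstrip("-*").strip()
        if cleaned:
            items.append(cleaned)
    return items
```

For example, `parse_knowledge("- Ice is cold.\n\n* Water is wet.")` yields the two statements without their bullet markers.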

Finally, the researchers had ChatGPT use the generated knowledge descriptions to answer the questions again. This showed that while ChatGPT has the necessary commonsense knowledge, it has difficulty effectively leveraging that knowledge to solve commonsense reasoning problems.
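The re-answering step amounts to conditioning the same question on the model's own generated knowledge and comparing accuracy with and without it. The sketch below is a minimal illustration, assuming a caller-supplied `ask` function that sends a prompt to the model and returns an answer string; `ask`, the prompt templates, and the stand-in model in the usage note are hypothetical, not the paper's setup.

```python
from typing import Callable


def knowledge_augmented_prompt(question: str, knowledge: list[str]) -> str:
    """Prepend generated knowledge statements to the question (hypothetical template)."""
    facts = "\n".join(f"- {k}" for k in knowledge)
    return f"Knowledge:\n{facts}\n\nQuestion: {question}\nAnswer:"


def accuracy_gain(
    items: list[tuple[str, list[str], str]],
    ask: Callable[[str], str],
) -> float:
    """Accuracy with generated knowledge minus accuracy without it.

    Each item is (question, generated_knowledge, gold_answer). `ask` is a
    hypothetical stand-in for a call to the model under test.
    """
    # Baseline: the bare question, no knowledge in the prompt.
    base = sum(ask(f"Question: {q}\nAnswer:") == gold for q, _, gold in items)
    # Augmented: the same question preceded by the model's own knowledge.
    aug = sum(ask(knowledge_augmented_prompt(q, k)) == gold for q, k, gold in items)
    return (aug - base) / len(items)
```

With a real model behind `ask`, a gain near zero or below zero would be consistent with the difficulty in leveraging knowledge described above; a toy `ask` that always returns the same letter is enough to check the bookkeeping.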

## Critical Analysis

The paper provides valuable insights into the commonsense capabilities of the widely used ChatGPT model. The researchers' systematic approach of evaluating different aspects of commonsense reasoning, such as question answering, knowledge identification, and knowledge application, offers a comprehensive view of ChatGPT's strengths and limitations in this area.

One limitation is that the experiments were conducted on a fixed set of commonsense datasets, which may not fully capture the breadth of commonsense knowledge required in real-world scenarios. Additionally, the paper does not delve into the potential reasons behind ChatGPT's struggles with certain commonsense domains or its difficulty in precisely identifying and leveraging relevant knowledge.

Further research could explore ways to enhance ChatGPT's commonsense reasoning abilities, such as through improved instruction following, commonsense-specific fine-tuning, or the integration of additional knowledge sources. Investigating how ChatGPT's commonsense performance compares to other LLMs or human benchmarks could also provide valuable insights.

## Conclusion

This research paper offers a detailed examination of the commonsense capabilities of the widely used ChatGPT model. The findings suggest that while ChatGPT can achieve good performance on many commonsense reasoning tasks, it still faces challenges in identifying and leveraging the commonsense knowledge it has in order to solve problems.

The study highlights the need for continued advancements in incorporating commonsense understanding into large language models like ChatGPT. Addressing these limitations could significantly improve the models' ability to reason about and interact with the world in a more natural and intuitive way, ultimately enhancing their real-world applicability across various domains.