Evaluating platforms

Whether a tool is good or bad depends on its use and the expectation of the result. On this page, we provide a number of questions that you should consider when using generative AI.

Five key points to evaluate your use

  1. The scope of sources the platform covers

    Depending on your specific use, it may be more or less important to have a correct, trustworthy, and complete answer. For example, if you are working on tasks where you need to know the exact scope of sources, general platforms such as ChatGPT may struggle to provide this information beyond the fact that they have 'scraped' and read large amounts of publicly available text from the internet. If you are simply using generative AI for inspiration, however, this may matter less.

    Other platforms will be able to provide more precise information about the data that underlies their language model, including considerations about how material is processed to end up in a language model.

  2. Content cut-off date

    Many platforms have a content cut-off date: a point in time after which no content has been included in their language model. This is often important for assessing how up-to-date a result is. It can also be relevant to consider whether you can get a sense of how a field has developed over time. Recent assumptions and results may be drowned out by older ones, because the older material dominates the language model's training data.

  3. Norms and values

    Platforms generate their responses (whether text or images) based on the training data that forms the basis of their language model. There may be several factors that make a given result biased. This can manifest itself in several ways:

    • The language model is trained on data that represents a particular point of view, culture, set of stereotypical perceptions, or value system, or the data has been pre-cleaned in accordance with ethical norms defined by the platform.
    • The platform is coded to suppress selected statements in the source material if they go against its ethical and moral norms. If you are investigating skepticism about climate change, for example, will a given platform actually allow these views to emerge?

    Read more about some of these issues in Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study (Cao et al., C3NLP 2023).

  4. Can the platform generate fictional content?

    Some platforms use language models and algorithms that fill gaps in the knowledge needed to generate an output with new, "creative" content that does not exist in reality. This phenomenon is often referred to as "hallucination". Not all platforms behave this way, however, so it is important to research whether the platform you want to use does. A Google search such as "How does xyz platform work", the service's FAQ, or similar resources will often provide the answer.

  5. Do you have enough knowledge to evaluate the result?

    You should always ask yourself: do I have enough knowledge to evaluate whether an output from a generative AI platform is correct? If you have limited knowledge of a subject, it can be tempting to ask generative AI for an introduction or overview. But given the challenges described above, this is not without risks. You should therefore always seek out other sources that can verify the output created by generative AI.