4  Prompt Engineering for Research: Techniques, Workflows, and Evaluation

Note: Contents

- Introduction to Prompt Engineering
- Few-shot and zero-shot prompting: inline examples and template selection
- Complex and reproducible prompt chaining and pipelines (e.g., bibliography → topic modeling → synthesis)
- Advanced prompts: metadata extraction, outline generation, stylistic paraphrasing
- Iterating and assessing the quality of responses
- Guidelines for creating academic prompts

4.1 Introduction to Prompt Engineering

4.1.1 What is a prompt?

In the context of Generative Artificial Intelligence (GAI), a prompt is a textual instruction given to a model to generate an output. It is not simply a question but a structured linguistic command that defines objectives, constraints, register, and output format.
In academic research, prompt design is a crucial methodological step, since the quality of the generated content largely depends on it.

Important

In SSH research (Social Sciences and Humanities), a well-crafted prompt typically:
- Clearly defines the task objective (e.g., “Analyze the sentiment of…”)
- Specifies the desired output format (e.g., “Return a bullet list of…”)
- Provides examples or constraints when needed (e.g., “Use formal Italian register”)
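For example, a prompt combining all three elements might read: “Analyze the sentiment of the interview excerpts below and return a bullet list of the dominant emotions, using formal Italian register.”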

4.1.2 Prompt Engineering

Large Language Models (LLMs) respond to textual input with varying degrees of linguistic articulation depending on their training, the context, and the structure of the prompt. The way a request is formulated directly influences the type of output: the same question can produce very different results depending on its form, its specificity, or the presence of examples.

Prompt Engineering is the discipline of designing and optimizing prompts to guide AI models, particularly LLMs, toward the desired output. It rests on methodological principles such as clarity, relevance, structure, logical progression, and iteration.

4.1.3 Hallucinations and how to prevent them

“Hallucinations” occur when the model generates unverified or entirely fabricated statements, often attributing fictitious details to nonexistent sources.
These errors stem from the statistical nature of Machine Learning and require strategies of cross-validation and guided reinforcement.

Tip: Anti-hallucination strategies

- Cross-source validation: ask the model for references, citations, or URLs and manually verify them.
- Confirmation queries: include a verification clause such as: “Are you sure? Please briefly validate this data.”
- Few-shot learning: provide 2–3 examples distinguishing between accurate and incorrect responses in order to guide its behavior and improve its accuracy.
- Iterative refinement: generate a first draft and request successive revisions to address inaccuracies, documenting each iteration.
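As an illustration, the confirmation-query and iterative-refinement strategies above can be scripted. The sketch below is a hypothetical minimum: `call_llm` stands in for whichever model client you actually use, and the prompt wording is an assumption, not a fixed recipe.

```python
# Minimal confirmation-query / iterative-refinement loop.
# `call_llm` is a placeholder: replace it with your actual model client.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    raise NotImplementedError

def draft_with_verification(task: str, max_rounds: int = 2) -> str:
    draft = call_llm(task)
    for _ in range(max_rounds):
        # Confirmation query: ask the model to audit its own output.
        audit = call_llm(
            "Are you sure? Please briefly validate the answer below, flagging "
            f"any unverified claims or invented references:\n\n{draft}"
        )
        # Iterative refinement: feed the audit back in and revise the draft,
        # keeping each round so the process stays documented.
        draft = call_llm(
            f"Revise the answer to address these issues:\n{audit}\n\nAnswer:\n{draft}"
        )
    return draft
```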

4.1.4 Foundational principles for effective and reliable prompts

4.1.4.1 Clarity and Specificity

- Frame the task unambiguously. Avoid vague requests like “write something about…”. Always indicate the desired result (e.g., “return a table with columns X, Y, and Z”).
- Explicitly define the output format (e.g., bullet list, 100-word paragraph, numbered list).

4.1.4.2 Full Context

- Provide all relevant information, such as text excerpts, definitions of technical terms, methodological constraints, or the target audience (e.g., “text aimed at SSH researchers”).
- Specify tone and register (formal, didactic, expository) to ensure stylistic consistency.

4.1.4.3 Guided Iteration

- Do not settle for the first draft. Test small variations to observe how each affects the output.
- Run A/B tests, saving each prompt version for comparative analysis and later benchmarking.

4.2 Few-shot and zero-shot prompting: inline examples and template selection

Prompt design plays a crucial role in output quality when using LLMs for academic purposes.
In particular, zero-shot and few-shot prompting strategies represent two distinct but complementary approaches, aimed at orienting the behavior of the model in the absence or presence of explicit examples.
The choice between zero-shot and few-shot prompting should be guided by the nature of the task, the expected degree of formalization, and the need to control output variability.
Both modes can be adopted within more complex pipelines, integrated with validation, review, and iterative-refinement tools.

4.2.1 Zero-shot Prompting

Zero-shot prompting uses explicit instructions without concrete examples. It assumes the model can infer the task solely from a clear command. This is useful for standard or generic tasks but more prone to ambiguity or result variability.

4.2.2 Few-shot Prompting

Few-shot prompting embeds one or more examples within the prompt, guiding the model to replicate a demonstrated pattern. In academic contexts, it is especially effective for standardizing formats (e.g., abstracts, bibliographic entries, methodological summaries) and maintaining stylistic coherence.
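By way of comparison, the two strategies differ only in whether worked examples are embedded in the prompt. A minimal sketch, where the placeholder fields in braces are illustrative:

```python
# Zero-shot: instruction only, no examples.
zero_shot = (
    "Summarize the following abstract in exactly three sentences, "
    "using a formal academic register:\n\n{abstract}"
)

# Few-shot: the same instruction preceded by demonstrations of the target
# format, which the model is implicitly asked to replicate.
few_shot = (
    "Summarize each abstract in exactly three sentences, "
    "using a formal academic register.\n\n"
    "Abstract: {example_abstract_1}\nSummary: {example_summary_1}\n\n"
    "Abstract: {example_abstract_2}\nSummary: {example_summary_2}\n\n"
    "Abstract: {abstract}\nSummary:"
)

# Fill the placeholders with real material before sending the prompt.
print(few_shot.format(
    example_abstract_1="…", example_summary_1="…",
    example_abstract_2="…", example_summary_2="…",
    abstract="…",
))
```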

4.2.3 Template Use

Template selection and adaptation are central to this strategy. Recurring structures (e.g., “Question → Extract → Summary” or “Title → Objective → Methodology → Results”) facilitate comparable outputs and enhance compatibility with archival and analytical systems.
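Recurring templates can be kept as reusable strings and filled per task. The sketch below assumes Python’s standard `string.Template`; the field labels mirror the “Title → Objective → Methodology → Results” structure mentioned above.

```python
from string import Template

# A recurring "Title → Objective → Methodology → Results" structure, kept as
# a reusable template so outputs stay comparable across documents and runs.
entry = Template(
    "Title: $title\nObjective: $objective\n"
    "Methodology: $methodology\nResults: $results"
)

prompt = (
    "Summarize the attached article using exactly this structure:\n\n"
    + entry.substitute(
        title="<article title>",
        objective="<one sentence>",
        methodology="<one sentence>",
        results="<two sentences>",
    )
)
print(prompt)
```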

4.3 Prompt chaining and complex pipelines: a modular approach to output construction

Academic tasks involving progressive information processing — such as literature reviews, thematic analysis or argument construction — benefit significantly from complex pipelines based on prompt chaining.
This approach entails sequential execution of multiple prompts, each serving a specific function within a structured workflow.
The output from each stage becomes the input for the next, following a modular, cumulative logic.

Note: Complex pipelines

Complex pipelines differ markedly from the isolated use of LLMs: they aim to structure a distributed cognitive process in which each step contributes to a coherent, documentable, and verifiable final result.
Their adoption makes it possible not only to divide cognitively dense tasks into more manageable units, but also to improve the methodological control and transparency of automated processing.

Application example: “Systematic review of the literature”
1. Retrieving relevant sources: a prompt to query databases or tools such as Elicit or Perplexity AI, in order to identify relevant articles on a given topic.
2. Structured metadata extraction: prompts aimed at extracting and organizing information such as title, authors, date, methodology, type of study, and subject area.
3. Theme recognition and clustering: prompts aimed at identifying recurring concepts, classifying them semantically, and building thematic maps.
4. Comparative synthesis of results: a synthesis command to produce an integrated view of the evidence, comparing approaches, results, and theoretical positions.
5. Generation of the final output: a last prompt to transform the collected material into a finished product, such as a thematic overview, a bibliographic annotation, or a structured abstract.

Benefits include modularity, reproducibility, methodological transparency, and output traceability.
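A minimal sketch of such a chain, where each stage’s output feeds the next. `call_llm` is a placeholder for whichever model client is used, and the stage prompts are illustrative assumptions mirroring the five steps above, not a fixed recipe.

```python
# Minimal prompt-chaining sketch. `call_llm` is a placeholder: swap in the
# client for whatever model or service you actually use.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    raise NotImplementedError

def review_pipeline(topic: str, sources: str) -> dict:
    results = {"sources": sources}
    # Stage 2: structured metadata extraction from the retrieved sources.
    results["metadata"] = call_llm(
        "For each source below, extract title, authors, date, methodology, "
        f"and type of study as a table:\n{results['sources']}"
    )
    # Stage 3: theme recognition and clustering.
    results["themes"] = call_llm(
        f"Identify recurring concepts and cluster them thematically:\n{results['metadata']}"
    )
    # Stage 4: comparative synthesis across themes.
    results["synthesis"] = call_llm(
        "Compare approaches, results, and theoretical positions across these "
        f"themes on '{topic}':\n{results['themes']}"
    )
    # Stage 5: final output; every intermediate result stays documented
    # in `results`, which supports traceability and later verification.
    results["final"] = call_llm(
        f"Turn this synthesis into a structured thematic overview:\n{results['synthesis']}"
    )
    return results
```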

4.4 Advanced prompts: metadata extraction, outline generation, stylistic paraphrasing

The use of LLMs in academia is not limited to the simple production of texts; it can be extended to more sophisticated functions through advanced prompts.
These guide the model in carrying out structured, analytical, or transformative tasks, which require a more precise prompt configuration and a greater awareness of the model’s semantic capabilities.

Among the most relevant applications are:
- Automatic extraction of metadata from scientific articles or other structured documents. Through targeted prompts, it is possible to isolate information such as author, year of publication, methodological context, reference discipline, or type of study. This operation is particularly useful for building bibliographic datasets, compiling structured repertoires, and automating literature analysis.
- Outline generation. When planning or drafting scientific contributions, the model can be asked to build logical schemes, argumentative structures, or section plans consistent with disciplinary standards. These outlines can then be integrated, modified, or expanded by the researcher, supporting the design of articles, reports, project proposals, or theses.
- Targeted stylistic paraphrasing. This means reformulating existing content in a specific style: formal, technical, popularizing, or compliant with particular disciplinary registers. It is useful when revising texts, adapting language for international publications, or producing multiple versions of the same content for teaching, editorial, or communication purposes.
Warning: Applications include:
- Structured metadata from PDFs (author, date, methods).
- Academic outlines following disciplinary logic.
- Stylistic rewriting (formal → plain, Italian → English, etc.).
Objective: write a thematic synthesis from a bibliographic corpus (function: technique)
1. Article input: prompt + PDF/DOI upload
2. Metadata extraction (author, year, method): structured prompt
3. Recognition of recurring themes: classification prompt
4. Comparative synthesis of results: synthesis prompt with stylistic constraint
5. Final output in academic format: APA template or Markdown report
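Step 2 of this workflow can be made machine-readable by constraining the output format. The sketch below is an illustrative assumption: `call_llm` again stands in for a real model client, and the JSON keys are a suggested, not standard, schema.

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    raise NotImplementedError

def extract_metadata(article_text: str) -> dict:
    prompt = (
        "Extract the following fields from the article below and return ONLY "
        "valid JSON with keys: title, authors, year, methodology, study_type.\n\n"
        + article_text
    )
    raw = call_llm(prompt)
    # Parse defensively: models sometimes wrap the JSON in extra prose.
    start, end = raw.find("{"), raw.rfind("}") + 1
    return json.loads(raw[start:end])
```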
Warning

Effective use of advanced prompts requires a solid understanding of the model’s capabilities and limitations, as well as design skills geared toward controlling the output.
This is a particularly promising area of experimentation for the world of research, in which AI is used not to replace writing, but to enhance its preparatory, analytical and stylistic phases.

See: Giray, L., “Prompt Engineering with ChatGPT: A Guide for Academic Writers”.
See: “Generative Artificial Intelligence Prompt Engineering Overview”.

4.5 Iteration and Evaluation of output quality

The effectiveness of a prompt is not a static datum but the result of an iterative optimization process.
Interaction with the model requires an experimental, progressive logic in which the answers obtained are constantly subjected to verification, reformulation, and comparison.
Iteration consists of the targeted repetition of prompting with incremental changes, such as variations in vocabulary, in the order of instructions, in the level of specificity, or in the structure of the expected format.
This process refines the quality of the output, reducing the model’s interpretative ambiguities and improving consistency with the researcher’s objectives.
More than simple linguistic refinement, it is a methodological mechanism for exploring the model’s sensitivity to different input parameters.

The evaluation of answer quality requires clear, shared criteria.
In the academic field, the analysis covers not only formal and grammatical correctness but also aspects such as the following (a minimal scoring sketch appears after the list):
- conceptual accuracy (absence of factual errors or unjustified inferences)
- relevance to the request
- argumentative cohesion
- adherence to stylistic or disciplinary standards
- transparency of sources and implicit assumptions (where relevant).
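These criteria can be operationalized as a simple scoring grid. The rubric below is an illustrative sketch, not a validated instrument; the criterion names and the 0–2 scale are assumptions.

```python
# Illustrative evaluation grid for LLM outputs (score each criterion 0-2).
CRITERIA = [
    "conceptual accuracy",        # no factual errors or unjustified inferences
    "relevance to the request",
    "argumentative cohesion",
    "adherence to stylistic/disciplinary standards",
    "transparency of sources and assumptions",
]

def score_output(scores: dict[str, int]) -> float:
    """Average score across all criteria; fails loudly if one is missing."""
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

print(score_output({c: 2 for c in CRITERIA}))  # -> 2.0
```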

Evaluation cannot be entrusted to generic indicators or intuitive judgments, but must be based on grids or reference models compatible with scientific research and communication practices.
In particular, when the results are used as preliminary materials for publications, systematic reviews or teaching support, it is advisable to document the choices made, justify any reformulations and point out the limits of the output generated.

Warning: In summary, employing LLMs in the academic field means applying a reflective and responsible logic to ensure the reliability and relevance of the content produced. Prompt Engineering in research must be treated as a methodological discipline, not a casual interaction. It supports transparency, reproducibility, and epistemic control in AI-assisted scholarship.

4.6 Guidelines for creating academic prompts

Below is a structured six-step procedure for obtaining the best results in academic research.

Note 1: Precise definition of the purpose of the research

You need to be explicit about what you want to achieve: a vague prompt leads to vague answers, so research questions must be turned into direct instructions for the model.
The goal must be clear:
(incorrect) "Give me an overview of X."
(correct) "Identify the most recent peer-reviewed articles on topic X."
The question must be transformed into an instruction.
For the question "What is the role of AI in qualitative research?", the prompt should be:
"List 5 peer-reviewed articles published from 2022 to 2024 on the role of AI in qualitative research, indicating title, authors, journal, and DOI."

Note 2: Provide context and selection criteria

Artificial intelligence needs the necessary context to focus the search and provide relevant results.
It is important to include the following parameters:
- Disciplinary scope: e.g., “in the context of social sciences” or “with a focus on bioinformatics”.
- Time range: e.g., “from the last three years” or “before 2020”.
- Type of output desired: e.g., "Return a CSV table with columns: Title, Authors, Year, DOI", "Generate a bulleted list", or "Provide a detailed explanation".

Note 3: Provide examples (few-shot prompting)

Showing the model one or two examples of the desired output, as a pattern to follow, produces more accurate and coherent responses.
Examples must be clear; include a few pre-filled rows in the desired format (Title, Authors, Year, DOI):
“AI in Social Sciences: A Review”, Rossi et al., 2023, 10.1234/ai.socsci.2023.01
“Qualitative Artificial Intelligence Methods”, Smith & Lee, 2022, 10.5678/qual.ai.2022.02
Then ask:
“Now continue with three more articles following the same format.”

Note 4: Break the task into multiple steps (Chain of Thought)

For complex or multidisciplinary tasks, break the workflow into sequential steps to help the model reason and reduce hallucinations.
How to guide the model's reasoning (a sketch that composes these steps into a single prompt follows below):
Step 1) "Search for articles using keywords X, Y, Z."
Step 2) "Filter for high-impact journals (Q1/Q2)."
Step 3) "Sort results in reverse chronological order."
Step 4) "Summarize the main findings of each article in 50 words."
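A minimal sketch of composing such numbered steps into one sequenced prompt; the step texts are taken from above, while the wrapper wording is an assumption.

```python
steps = [
    "Search for articles using keywords X, Y, Z.",
    "Filter for high-impact journals (Q1/Q2).",
    "Sort results in reverse chronological order.",
    "Summarize the main findings of each article in 50 words.",
]

# Number the steps and ask the model to show intermediate results,
# which makes hallucinations easier to spot at each stage.
prompt = (
    "Perform the following steps in order, showing the result of each:\n"
    + "\n".join(f"Step {i}) {s}" for i, s in enumerate(steps, start=1))
)
print(prompt)
```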

Note 5: Implement subsequent consistency checks

It is advised not to blindly trust the outputs received from AI. You can ask the model to verify its own results or justify its choices.
You may proceed as follows:
- Request for justification: “Briefly explain why each article is relevant to my research on X.”
- Fact-Check: “For each DOI, confirm its existence and validity on CrossRef.”
- Cross-Review: “Review the table and flag any inconsistencies or potential hallucinations.”

Note 6: Iterate and document

Building effective prompts is an iterative process; you are unlikely to achieve the desired result on the first try.
- Track versions: record each prompt variant with date and changes to identify best practices.
- Compare results: keep only the prompts that produce the most accurate and complete outputs.
- Analyze errors: when hallucinations occur (e.g., non-existent articles, incorrect DOIs), refine your prompt by adding specific constraints (e.g., “use only Scopus or PubMed databases” or “exclude predatory journals”).
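Version tracking can be as light as an append-only log. A sketch, where the file name and record fields are assumptions:

```python
import datetime
import json

def log_prompt_version(prompt: str, notes: str, path: str = "prompt_log.jsonl") -> None:
    """Append one prompt variant, with date and change notes, for later comparison."""
    entry = {
        "date": datetime.date.today().isoformat(),
        "prompt": prompt,
        "notes": notes,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

log_prompt_version(
    "List 5 peer-reviewed articles published from 2022 to 2024 on the role of "
    "AI in qualitative research, indicating title, authors, journal, and DOI.",
    notes="v2: added date range and required output columns",
)
```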