Overview

Generative AI (genAI for short) for software development is rapidly evolving. Everyone has their own take on what works, and almost everything is out of date within a few months.

These notes focus on genAI for rapidly prototyping experiments for testing new React app ideas. The goal is to get something that can be deployed to users to test that idea as quickly as possible. Experiments, not production code. There are other uses of genAI, e.g., for debugging or code completion, that are not discussed here.

genAI systems are based on large language models (LLMs). We will not get into how genAI and LLMs work, beyond the most basic idea that genAI uses an LLM to extend a given text with more text, using statistics derived from terabytes of texts. We will refer to "words" rather than the more accurate term "tokens".

Concepts and Concerns

Concerns, ethical and other

No one should use generative AI systems without being aware that there are valid ethical concerns about these tools. There are serious concerns about applications of genAI for policing, surveillance, hiring, mortgage lending, and so on. But even when the application area seems beneficial, such as assisting with coding, concerns remain.

The largest and most popular LLMs make use of bodies of work that are often under copyright. The vendors do this for profit, without compensation or even acknowledgement of the original authors. For coding, LLMs are trained on millions of lines of open-source software that is often under a license requiring attribution of use. No such attribution is given in AI-generated code. This could create legal problems not only for genAI vendors but also for companies releasing products containing genAI-generated code.

A critical phase in building LLMs is reinforcement learning, where hundreds of humans ask LLMs thousands of questions and rate the answers. This is tedious, very low-paid labor.

Both training and generation are computationally intensive, which means high energy use. None of the major vendors are transparent about this, but there has been a clear increase in electricity demand, with higher prices and greater pollution from energy generation as a result. The iterated complex queries that support the kind of agentic AI used in programming are especially expensive.

The economic future of genAI for coding is unclear. Most genAI vendors are investing a lot of money for potential future returns, but are losing money right now. For this to continue, someone -- developers or the companies -- will have to pay much more than the current fairly low $10 - $20 / month subscription rates.

Links

Some resources on the ethical issues related to Large Language Models:

Prompt Engineering

You give a genAI a text, called a prompt, as the starting point for text generation. The prompt can be a question, a command, or a long description of an app you want built. The genAI then iteratively and randomly selects words to follow the prompt, using word correlation data derived from the statistical analysis of terabytes of texts, fine-tuned to look more convincing by thousands of hours of human feedback on good versus bad generated texts.

Prompting is not like telling another human developer what you want, nor is it like writing code for a compiler, despite the natural language used. The results you get are far more random and variable. A genAI does not understand a prompt. It does not know what a request is. It does not think about the problem, develop an idea of the solution, and translate that into code. A genAI just takes the words in the prompt and context and adds more words based on stored word correlation statistics.

The term prompt engineering is used to describe the crafting of prompts to get better results. This is a very misleading term. There is no engineering here. In real engineering, diagnosis and design are based on well-defined principles from physics, chemistry, materials, computation, and so on. There are as yet no such causal principles explaining how generative AI systems will respond to different prompts. Prompt engineering to date is trial and error. Claims about the best way to write prompts are based on limited experiments that rarely replicate.

Agentic AI

Most people first experiment with using genAI for coding by asking a web app such as ChatGPT to generate an app that meets some specification. Then they copy the generated code into an integrated development environment (IDE) such as VS Code.

A more common approach for serious AI-based coding is to use genAI tools in the IDE directly. Such tools are available in VS Code with Github CoPilot, Cursor, Firebase Studio, and others. This is an example of what is called agentic AI, because the AI tool is allowed to create files, install libraries, run tests, and so on. This greatly speeds up the development and testing process, but is also risky. Mistakes can lead to corrupted or deleted files. If the AI forms an incorrect diagnosis of a compiler error, it can get into a loop of making increasingly bad coding decisions.

Choosing a GenAI Model

Underlying an agentic AI system is a model. A model is an LLM that has been trained on a large data set and tuned by humans to favor specific forms of responses. Because models cost millions of dollars to create, the better models are not free. Cost is usually based on the number of tokens sent, which includes the prompt and any context files, with rate limits on the number of requests per day.

Most development environments, like VS Code and Cursor, let you select from several models. More can be added if you have a subscription. As with prompt engineering, picking the best model is more trial and error than science. VS Code currently uses GPT 4.1 as the default. It's fast and capable of developing a small initial app. Many developers feel Claude Sonnet 2.5 produces better code, though it's slower.

Links

The many roles for AI in development

Software development is much more than writing the app code. Here are some of the many other development tasks that genAI can assist with, given appropriate prompts:

  • Designing the software architecture
  • Designing data schemas
  • Writing automated tests to verify the code does what is desired
  • Generating test cases for code
  • Generating dummy data for testing
  • Helping refactor the code for maintainability
  • Checking code for performance and security issues
  • Explaining how some existing code works
  • Understanding and fixing compile-time errors

Common GenAI Coding Failures

One thing is clear: you must be an expert in the kind of code you are asking genAI to create, especially when prototyping a small but complete app. GenAI will generate a lot of code, but it will always need debugging. Agentic AI will try to fix compile-time and linting errors, but sometimes it makes the problem worse, or even writes code to hide the error. If the code runs but does the wrong thing, agentic AI will not see the problem unless it shows up in a unit test -- and it's not very good at writing unit tests. You need to be skilled in reading and debugging long stretches of code to fix genAI output.

Below are some of the problems that occurred repeatedly when I asked GPT 4.1 and Claude Sonnet 2.5 in VS Code to develop the mob programming rotation timer example.

  • TypeScript violations:
    • Using any to avoid typing
    • Disabling tslint warnings with a comment
  • Broken imports:
    • Importing from a file that was moved, renamed, or never created
    • Incorrect handling of export default ...
    • Locally redefining an imported name
  • Broken state management:
    • Incorrect sharing of state between components
    • Inconsistent state type definition in contexts
  • Code smells:
    • Functions with repeated code
    • Very long components with many nested functions
  • Broken unit tests:
    • Incorrect mocking of modules
    • Incorrect selectors for elements on a page, e.g., assuming a label where none exists

Sometimes, GPT 4.1 would try to fix import errors by adding more imports or locally defining missing functions without deleting the prior code, leading to even more errors.
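
To make the first pattern on the list concrete, here is a small hypothetical sketch (not actual model output) of the any-plus-lint-disable shortcut, followed by a typed version that keeps the compiler's checks. The function and type names are invented.

    // Hypothetical sketch of the "use any to avoid typing" pattern; not actual model output.

    // Shortcut: any plus a lint-disable comment silences the type checker entirely.
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    export const badTick = (state: any) => ({ ...state, secondsLeft: state.secondsLeft - 1 });

    // Better: declare the state shape so TypeScript can check every update.
    export type TimerState = { secondsLeft: number; running: boolean };

    export const tick = (state: TimerState): TimerState => ({
      ...state,
      secondsLeft: Math.max(0, state.secondsLeft - 1),
    });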

Links

Using Github CoPilot

Github CoPilot Modes

Github CoPilot is the genAI coding tool we'll focus on here, as integrated into VS Code. VS Code comes with several ways to interact with Github CoPilot:

  • Ask mode is for asking questions about the code, brainstorming possible approaches, and so on.
  • Agent mode is for asking CoPilot to perform a major high-level task, such as generating an app from scratch.
  • Edit mode is for asking CoPilot to make changes to specific files, e.g., to refactor some repeated code into a shared utility file.
  • Inline chat is for making changes to specific lines of code.

We'll focus on Agent Mode in these notes.

CoPilot Instruction files

To get more consistent results, many AI coding tools let you create a file with the rules you want the AI to follow in all your projects.

In Github CoPilot, you do this by putting the file .github/copilot-instructions.md at the top level of your repository, i.e., outside src. In this Markdown file, you put your general coding preferences. Github CoPilot automatically includes this file in all your Agent mode prompts.

The August 2025 release of VS Code added support for AGENTS.md.

Here is an example set of instructions following the coding recommendations for my agile development and prototyping classes.

    # Coding
    
    - Use Typescript.
    - Use React 19 with React Compiler enabled.
    - Define the main app component in `App.tsx`.
    - Use React hooks for state and lifecycle management.
    - Use React Context for shared state.
    - Use functional components.
    - Use arrow functions.
    - Do not use `React.FC` to type component functions.
    - Indent in prettier format.
    - Fix all linting errors.
    - Put calls to network or database services in separate module files.
    - Import functions and types explicitly by name.
    - Use semicolons at the end of each statement.
    - Use single quotes for strings.
    - Use const variables when possible.
    - Use async/await for promises.
    - Use try/catch to catch errors in async code.
    - If authentication is needed, use Firebase Google authentication.
    - If a persistent data store is needed, use Firebase Realtime Database.
    - If an app has more than one screen, add a navigation bar with links to each screen.
    
    # Styling
    
    - Use Tailwind 4 for styling.
    - Use Tailwind class names for all styling.
    - Create a responsive design that works for mobile and desktop.
    
    # Testing
    
    - Use Vitest for testing.
    - Put test files in the same directory as the component being tested.
    - Use React Testing Library for testing React components.
    - Use `vi.mock()` to mock modules with network or database calls when testing code that imports those modules.
    - Import functions and types into test code explicitly by name.
    
    ## Folder Structure
    
    - `/src`: the source code for the frontend
    - `/src/components`: files that define React components
    - `/src/types`: files that define TypeScript types or Zod schemas
    - `/src/utilities`: files that define shared JavaScript, including modules that make network or database calls
    - `/docs`: documentation for the project, including API specifications and user guides
    

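To see what one of these rules looks like in practice, here is a minimal Vitest sketch following the vi.mock() guideline above; the module path and function name are hypothetical, not from a real project.

    // turnLog.test.ts -- hypothetical test following the vi.mock() rule; module and function names are invented.
    import { describe, it, expect, vi } from 'vitest';
    import { saveTurn } from './turnLog';

    // Vitest hoists vi.mock() calls, so the import above receives the mocked module.
    vi.mock('./turnLog', () => ({
      saveTurn: vi.fn().mockResolvedValue(undefined),
    }));

    describe('turn logging', () => {
      it('records the driver when a turn ends', async () => {
        await saveTurn({ driver: 'Dave', minutes: 10 });
        expect(saveTurn).toHaveBeenCalledWith({ driver: 'Dave', minutes: 10 });
      });
    });
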
As noted earlier, an LLM is not a human intelligence. LLMs are text expanders, not reasoners. Negative prompts like "do not use setTimeout" can be problematic because they add to the expander's context the very words that you don't want the LLM to use. It's like saying "don't think of an elephant". Try to use positive prompts like "use setInterval" instead.

Prompting for prototypes

When it comes to defining a prototype, remember what it's for.

The goal of a prototype is to rapidly test new ideas in real use.

Do not confuse prototypes with products. A product needs to be stable, scalable, secure, well-engineered, and so on. A prototype needs to be complete enough for users to experience your vision of the app. Key properties of a prototype are that it works, it's easy to deploy, it's easy to use, and it has the delighter that distinguishes your idea from what's already out there. genAI can help you build and iterate such prototypes quickly.

Focus on what you want the user experience to be, especially the part that makes your idea different. Don't focus on the architecture. Focus on the value and what it would be like to use the app. For example, here is a description of an app to manage rotations for software developers doing mob programming. It's a Markdown document, stored in the repository, for both humans and CoPilot to read and use.

      # Name
      
      - The app is called TeamTime.
      
      # Users
      
      - Users are software development teams who do mob programming with drivers and navigators.
      
      # Value proposition
      
      An easy to use rotation timer for managing and tracking mob programming sessions.
      
      # Key features
      
      Simple mobile-friendly one-screen design with the app name at the top, and below it:
        - a large countdown timer, defaulting to 10 minutes, but adjustable at the start of each session,
        - a single start/pause button,
        - the team members, shuffled at the start of each session, with the first name highlighted
      Simple operations:
        - Tap a name to skip or include that team member in the rotation.
        - Tap start to start the timer, tap again to pause it.
        - When one minute is left, timer beeps and starts flashing.
        - When time is up, timer sounds an alarm, resets time, rotates to the next team member, and waits for start.
      Record-keeping:
        - At the end of each turn, the app logs to the console the current time, the rotation duration setting, the driver, and the navigators.
      
      # Example scenario
      
      Here is an example session.
      
      - Alice, Bob, Cathy, and Dave are a team of developers.
      - Alice, Cathy, and Dave meet to do mob programming for 90 minutes.
      - Alice starts the app on her phone. 
      - It shows a countdown timer, set to 10 minutes, a start button, and a shuffled list of team member names with checkmarks.
      - The first name is highlighted. It happens to be Bob.
      - Alice taps Bob's name because he is not there. The highlight moves to Dave.
      - Dave sits at the keyboard and starts the timer. He begins entering code suggested by the other team members. 
      - Pizza arrives, so Dave stops the timer and grabs a slice. After a few minutes, he starts the timer to continue his turn.
      - A beep at 9 minutes warns the team that it is almost time to rotate.
      - When time goes to zero, an alarm sounds. Dave stops. The highlight moves to Cathy.
      - Cathy taps the start button to begin her turn.
      
      # Coding notes
      
      - Use setInterval() to implement the timer.
      - Use AudioContext to play sounds.
      - Define and import a MockAudioContext class for unit testing sounds. 
      
      # Testing notes
      - Define unit tests for skipping team members in the rotation.
      - Define unit tests for when Start and Stop should appear.
      - Define unit tests for when sounds should happen.
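
Coding notes like these are hints, not a full design. As a rough sketch, the setInterval() note might turn into a hook along these lines; the hook name and details are hypothetical, not generated output.

    // useCountdown.ts -- rough sketch of the setInterval() coding note; the hook name is invented.
    import { useEffect, useState } from 'react';

    export const useCountdown = (initialSeconds: number, running: boolean) => {
      const [secondsLeft, setSecondsLeft] = useState(initialSeconds);

      useEffect(() => {
        if (!running) return;
        // Tick once per second while the timer is running.
        const id = setInterval(() => {
          setSecondsLeft((s) => Math.max(0, s - 1));
        }, 1000);
        // Stop ticking when paused or unmounted.
        return () => clearInterval(id);
      }, [running]);

      return secondsLeft;
    };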

Create an app vision design, focused on the user experience. Run through it in your head. Brainstorm ways to remove as many steps and interface elements as possible.

To be effective in communicating to genAI exactly what you want, every step of the scenario should describe either something the user does or something the app shows. To be effective in communicating the point of your app, every step should be clear about why the user does what they do -- what's their goal -- and why the app shows what it does -- what's the value being delivered.

The scenario should include specific data. If the user is selecting from a list of items, specify those items in detail. The agentic AI will use that information to create test data. If there's a lot of data, you can try asking the genAI to generate realistic data first, then include it in your app vision. Be sure that the data includes any properties important to showing the value of your app.

Put the vision in your repository as one of your design documents. In Agent Mode in VS Code, add the document to the context. Then your prompt to Github CoPilot can be just "Implement the app described in app-vision.md."

Links

© 2025 Chris Riesbeck