
discover-important-function

@benchflow-ai/discover-important-function
benchflow-ai
230 stars
165 forks
Updated 1/18/2026
View on GitHub

When given a project codebase, this skill identifies the important functions in the codebase for future action.

Installation

$ skills install @benchflow-ai/discover-important-function
Supported assistants: Claude Code, Cursor, Copilot, Codex, Antigravity

Details

Path: tasks/setup-fuzzing-py/environment/skills/discover-important-function/SKILL.md
Branch: main
Scoped Name: @benchflow-ai/discover-important-function

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

$ skills list

Skill Instructions


name: discover-important-function
description: "When given a project codebase, this skill identifies the important functions in the codebase for future action."

Fuzz Target Localizer

Purpose

This skill helps an agent quickly narrow a Python repository down to a small set of high-value fuzz targets. It produces:

  • A ranked list of important files
  • A ranked list of important functions and methods
  • A summary of existing unit tests and inferred oracles
  • A final shortlist of functions-under-test (FUTs) for fuzzing, each with a structured "note to self" for follow-up actions

When to use

Use this skill when the user asks to:

  • Find the best functions to fuzz in a Python package/library
  • Identify parsers, decoders, validators, or boundary code that is fuzz-worthy
  • Decide what to fuzz based on existing test coverage and API surface
  • Produce a structured shortlist of fuzz targets with harness guidance
  • Automate testing pipeline setup for a new or existing Python project

Do not use this skill when the user primarily wants to fix a specific bug, refactor code, or implement a harness immediately (unless they explicitly ask for target selection first).

Inputs expected from the environment

The agent should assume access to:

  • Repository filesystem
  • Ability to read files
  • Ability to run local commands (optional but recommended)

Preferred repository listing command:

  • Use tree to get a fast, high-signal view of repository structure (limit depth if needed).

If tree is not available, fall back to a recursive listing via other standard shell tooling.
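A minimal sketch of this listing step in Python, assuming tree is on PATH and falling back to os.walk otherwise (the pruned directory names and depth limit are illustrative, not requirements):

import os
import shutil
import subprocess

SKIP = {".git", ".venv", "venv", "__pycache__", "build", "dist"}

def repo_listing(root: str, max_depth: int = 3) -> str:
    """Directory overview: prefer tree, fall back to os.walk."""
    if shutil.which("tree"):
        # -L limits depth; -I skips low-value directories.
        cmd = ["tree", "-L", str(max_depth), "-I", "|".join(SKIP), root]
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    lines = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= max_depth:
            dirnames[:] = []  # prune: do not descend further
            continue
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        pad = "  " * depth
        lines.append(pad + os.path.basename(dirpath) + "/")
        lines.extend(pad + "  " + f for f in filenames)
    return "\n".join(lines)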

Outputs

Produce a single report, in any format, covering:

  • Localized important files
  • Localized important functions
  • A summary of existing unit tests
  • The functions under test (FUTs) selected for fuzzing

Place this report in the root of the repository as APIs.txt; it serves as the guideline for future fuzzing harness implementation.
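For illustration only (since any format is acceptable), one possible APIs.txt layout:

APIs.txt
  1. Important files: ranked list with scores and rationale
  2. Important functions: ranked list with signatures and harnessability
  3. Existing unit tests: test map, inferred oracles, coverage gaps
  4. Selected FUTs: one "Fuzzing Target Note" per function
  5. Final JSON block: machine-readable summary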

Guardrails

  • Prefer analysis and reporting over code changes.
  • Do not modify source code unless the user explicitly requests changes.
  • If running commands could be disruptive, default to read-only analysis.
  • Avoid assumptions about runtime behavior; base conclusions on code and tests.
  • Explicitly consider existing tests in the repo when selecting targets and deciding harness shape.
  • Keep the final FUT shortlist small (typically 1–5).

Localize important files

Goal

Produce a ranked list of files that are most likely to contain fuzz-worthy logic: parsing, decoding, deserialization, validation, protocol handling, file/network boundaries, or native bindings.

Procedure

  1. Build a repository map

    • Start with a repository overview using tree to identify:
      • Root packages and layouts (src/ layout vs flat layout)
      • Test directories and configs
      • Bindings/native directories
      • Examples, docs, and tooling directories
    • Identify packaging and metadata files such as:
      • pyproject.toml
      • setup.cfg
      • setup.py
    • Identify test configuration and entry points:
      • tests/
      • conftest.py
      • pytest.ini
      • tox.ini
      • noxfile.py
  2. Exclude low-value areas

    • Skip: virtual environments, build outputs, vendored code, documentation-only directories, examples-only directories, generated files.
  3. Score files using explainable heuristics. Assign a file a higher score when it matches more of these indicators (a scoring sketch follows this list):

    • Public API exposure: __init__.py re-exports, __all__, api modules
    • Input boundary keywords in file path or symbols: parse, load, dump, decode, encode, deserialize, serialize, validate, normalize, schema, protocol, message
    • Format handlers: json, yaml, xml, csv, toml, protobuf, msgpack, pickle
    • Regex-heavy or templating-heavy code
    • Native boundaries: ctypes, cffi, cython, extension modules, bindings
    • Central modules: high import fan-in across the package
    • Test adjacency: directly imported or heavily referenced by tests
  4. Rank and select

    • Produce a Top-N list (default 10–30) with a short rationale per file.
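A minimal sketch of such a scorer; the indicator lists, skip set, and one-point-per-group scoring are illustrative assumptions, not part of the skill:

from pathlib import Path

# Illustrative indicator groups; extend per repository.
INDICATORS = {
    "boundary_keyword": ("parse", "decode", "deserialize", "validate",
                         "normalize", "schema", "protocol"),
    "format_handler": ("json", "yaml", "xml", "csv", "toml", "msgpack", "pickle"),
    "native_boundary": ("ctypes", "cffi", "cython", "binding"),
}
SKIP_PARTS = {".git", "venv", "build", "dist", "docs", "examples", "__pycache__"}

def score_file(path: Path) -> tuple[int, list[str]]:
    """Return (score, indicators_hit) for one Python file."""
    text = str(path).lower()
    try:
        text += path.read_text(errors="ignore").lower()
    except OSError:
        pass
    hits = [label for label, words in INDICATORS.items()
            if any(w in text for w in words)]
    return len(hits), hits

def rank_files(root: str, top_n: int = 20):
    scored = []
    for p in Path(root).rglob("*.py"):
        if SKIP_PARTS.intersection(p.parts):
            continue
        score, hits = score_file(p)
        if score:
            scored.append((score, str(p), hits))
    scored.sort(key=lambda t: (-t[0], t[1]))  # score desc, then path asc
    return scored[:top_n]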

Output format

For each file:

  • path
  • score (relative, not necessarily normalized)
  • rationale (2–5 bullets)
  • indicators_hit (list)

Localize important functions

Goal

From the important files, identify and rank functions or methods that are strong fuzz targets while minimizing full-body reading until needed.

Approach 1: AST-based header and docstring scan

For each localized Python file, parse it using Python’s ast module and extract only:

  • Module docstring
  • Function and async function headers (name, args, defaults, annotations, decorators)
  • Class headers and method headers
  • Docstrings for modules, classes, and functions/methods

Do not read full function bodies during the initial pass unless needed for disambiguation or final selection.
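A minimal sketch of this header-only pass using the standard library ast module (Python 3.9+ for ast.unparse):

import ast

def extract_headers(path: str) -> list[dict]:
    """Collect signatures and docstrings; function bodies are never read."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        tree = ast.parse(fh.read(), filename=path)
    records = [{"qualname": "<module>", "doc": ast.get_docstring(tree)}]

    def visit(node: ast.AST, prefix: str = "") -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                records.append({
                    "qualname": prefix + child.name,
                    "args": [a.arg for a in child.args.args],
                    "decorators": [ast.unparse(d) for d in child.decorator_list],
                    "doc": ast.get_docstring(child),
                    "lines": (child.lineno, child.end_lineno),
                })
            elif isinstance(child, ast.ClassDef):
                records.append({"qualname": prefix + child.name,
                                "doc": ast.get_docstring(child)})
                visit(child, prefix + child.name + ".")  # recurse into methods

    visit(tree)
    return records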

Procedure

  1. Build per-file declarations via AST

    • Parse the file with ast
    • Enumerate:
      • Top-level functions
      • Classes and their methods
      • Nested functions only if they are likely to be directly fuzzable via an exposed wrapper
    • For each symbol, collect:
      • Fully-qualified name
      • Signature details (as available from AST)
      • Decorators
      • Docstring (if present)
      • Location information (file, line range if available)
  2. Generate an initial candidate set using headers and docstrings. Prioritize functions/methods that:

    • Accept bytes, str, file-like objects, dicts, or user-controlled payloads
    • Convert between representations (raw ↔ structured)
    • Perform validation, normalization, parsing, decoding, deserialization
    • Touch filesystem/network/protocol boundaries
    • Call into native extensions or bindings
    • Clearly document strictness, schemas, formats, or error conditions
  3. Use tests to refine candidate selection early (a cross-referencing sketch follows this list)

    • Before reading full bodies, check if tests reference these functions/modules:
      • Direct imports in tests
      • Fixtures that exercise particular entry points
      • Parameterizations over formats and inputs
    • Down-rank candidates that are already well-covered unless they are high-risk boundaries (parsers/native bindings).
  4. Confirm with targeted reading only for top candidates. For the top candidates (typically 10–20), read the full function bodies and capture:

    • Preconditions and assumptions
    • Internal helpers called
    • Error handling style and exception types
    • Any obvious invariants and postconditions
    • Statefulness and global dependencies
  5. Rank and shortlist. Rank candidates using an explainable rubric:

    • Input surface and reachability
    • Boundary risk (parsing/decoding/native)
    • Structural complexity (from targeted reading only)
    • Existing test coverage strength and breadth
    • Ease of harnessing
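A crude but serviceable sketch of the test cross-reference from step 3; a stricter version would parse test imports with ast rather than matching substrings:

from pathlib import Path

def test_references(repo_root: str, candidates: set[str]) -> dict[str, list[str]]:
    """Map each candidate symbol name to the test files that mention it."""
    refs: dict[str, list[str]] = {name: [] for name in candidates}
    for test_file in Path(repo_root).rglob("test_*.py"):
        text = test_file.read_text(errors="ignore")
        for name in candidates:
            if name in text:  # substring match; adequate for triage
                refs[name].append(str(test_file))
    return refs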

Output format

For each function/method:

  • qualname
  • file
  • line_range (if available)
  • score (relative)
  • rationale (2–6 bullets)
  • dependencies (key helpers, modules, external state)
  • harnessability (low/medium/high)

Approach 2: Scanning all important files yourself

For each localized Python file, read the file contents directly to extract:

  • Module docstring and overall structure
  • Function and async function definitions (name, args, defaults, annotations, decorators)
  • Class definitions and their methods
  • Full function bodies and implementation details
  • Docstrings for modules, classes, and functions/methods

This approach reads complete file contents, allowing for deeper analysis at the cost of higher token usage.
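Where Approach 1 stops at headers, this pass also captures bodies. A minimal standard-library sketch using ast.get_source_segment to recover each function's full source text:

import ast

def extract_full(path: str) -> list[dict]:
    """Collect each function together with its complete source text."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        source = fh.read()
    out = []
    for node in ast.walk(ast.parse(source, filename=path)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append({
                "name": node.name,
                "doc": ast.get_docstring(node),
                "body": ast.get_source_segment(source, node),  # full text
                "lines": (node.lineno, node.end_lineno),
            })
    return out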

Procedure

  1. Read important files sequentially

    • For each file from the important files list, read the full contents.
    • Extract by direct inspection:
      • Top-level functions and their complete implementations
      • Classes and their methods with full bodies
      • Nested functions if they are exposed or called by public APIs
    • For each symbol, collect:
      • Fully-qualified name
      • Complete signature (from source text)
      • Decorators
      • Docstring (if present)
      • Full function body
      • Location information (file, approximate line range)
  2. Generate an initial candidate set using full source analysis. Prioritize functions/methods that:

    • Accept bytes, str, file-like objects, dicts, or user-controlled payloads
    • Convert between representations (raw ↔ structured)
    • Perform validation, normalization, parsing, decoding, deserialization
    • Touch filesystem/network/protocol boundaries
    • Call into native extensions or bindings
    • Contain complex control flow, loops, or recursive calls
    • Handle exceptions or edge cases
    • Clearly document strictness, schemas, formats, or error conditions
  3. Analyze implementation details from full bodies. For each candidate function, inspect the body for:

    • Preconditions and assumptions (explicit checks, assertions, early returns)
    • Internal helpers called and their purposes
    • Error handling style and exception types raised
    • Invariants and postconditions (explicit or implicit)
    • Statefulness and global dependencies
    • Input transformations and data flow
    • Native calls or external process invocations
    • Resource allocation and cleanup patterns
  4. Use tests to refine candidate selection

    • Check if tests reference these functions/modules:
      • Direct imports in tests
      • Fixtures that exercise particular entry points
      • Parameterizations over formats and inputs
    • Down-rank candidates that are already well-covered unless they are high-risk boundaries (parsers/native bindings).
    • Note which aspects of each function are tested vs untested.
  5. Rank and shortlist. Rank candidates using an explainable rubric (a weighting sketch follows this list):

    • Input surface and reachability
    • Boundary risk (parsing/decoding/native)
    • Structural complexity (from full body analysis)
    • Existing test coverage strength and breadth
    • Ease of harnessing
    • Observable implementation risks (unsafe operations, unchecked inputs, complex state)
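A weighting sketch for the rubric above; the weights and the 0–2 rating scale are illustrative assumptions:

# Illustrative weights for the six rubric axes.
WEIGHTS = {
    "input_surface": 3,
    "boundary_risk": 3,
    "complexity": 2,
    "coverage_gap": 2,
    "harnessability": 2,
    "impl_risk": 1,
}

def rubric_score(ratings: dict[str, int]) -> int:
    """ratings maps each axis to 0 (weak) .. 2 (strong); higher total = better FUT."""
    return sum(weight * ratings.get(axis, 0) for axis, weight in WEIGHTS.items())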

Output format

For each function/method:

  • qualname
  • file
  • line_range (if available)
  • score (relative)
  • rationale (2–6 bullets)
  • dependencies (key helpers, modules, external state)
  • harnessability (low/medium/high)
  • implementation_notes (key observations from body analysis)

Summarize existing unit tests

Goal

Summarize what is already tested, infer test oracles, and identify gaps that fuzzing can complement.

Hard requirement

Always inspect and incorporate existing tests in the repository when:

  • Ranking functions
  • Selecting FUTs
  • Designing input models and oracles
  • Proposing seed corpus sources

Procedure

  1. Inventory tests

    • Locate tests and their discovery configuration (pytest.ini, pyproject.toml, tox.ini, noxfile.py).
    • Enumerate test modules and map them to source modules via imports.
    • Identify shared fixtures and data factories (conftest.py, fixture files, test utilities).
  2. Summarize test intent. For each test module:

    • What behaviors are asserted
    • What inputs are used
    • What exceptions are expected
    • What invariants are implied
  3. Infer oracles and properties. Common fuzz-friendly oracles include (an example follows this list):

    • Round-trip properties
    • Idempotence of normalization
    • Parser consistency across equivalent inputs
    • Deterministic output given deterministic input
    • No-crash and no-hang for malformed inputs
  4. Identify coverage gaps

    • FUT candidates with no direct tests
    • Input classes not covered by tests (size extremes, malformed encodings, deep nesting, edge unicode, invalid schemas)
    • Code paths guarded by complex conditionals or exception handlers with no tests
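For illustration, a round-trip oracle for a hypothetical loads/dumps pair; both names are placeholders for whatever the library under test exposes:

def roundtrip_oracle(data: bytes) -> None:
    """Must hold for every accepted input: loads(dumps(x)) == x."""
    try:
        obj = loads(data)            # hypothetical parser under test
    except ValueError:
        return                       # rejecting malformed input is acceptable
    assert loads(dumps(obj)) == obj  # re-parse of the dump must be stable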

Output format

  • test_map: module_under_test → tests → asserted behaviors
  • inferred_oracles: list of reusable invariants with the functions they apply to
  • gaps: ranked list of untested or weakly-tested candidates

Decide function under test for fuzzing

Goal

Select a final set of FUTs (typically 1–5) and produce a structured "note to self" for each FUT so the agent can proceed to harness implementation, corpus seeding, and fuzz execution.

Selection criteria

Prefer FUTs that maximize:

  • Security/robustness payoff (parsing, decoding, validation, native boundary)
  • Reachability with minimal setup
  • Low existing test assurance or narrow test input coverage
  • High fuzzability (simple input channel, clear oracle or crash-only target)

Explicitly weigh:

  • What tests already cover (and what they do not)
  • What seeds can be extracted from tests, fixtures, and sample data

Required "note to self" template

For each selected FUT, produce exactly this structure (a minimal harness sketch follows the template):

Fuzzing Target Note

  • Target:
  • File / location:
  • Why this target:
    • Input surface:
    • Boundary or native considerations:
    • Complexity or path depth:
    • Current test gaps:
  • Callable contract:
    • Required imports or initialization:
    • Preconditions:
    • Determinism concerns:
    • External dependencies:
  • Input model:
    • Primary payload type:
    • Decoding or parsing steps:
    • Constraints to respect:
    • Edge classes to emphasize:
  • Oracles:
    • Must-hold properties:
    • Acceptable exceptions:
    • Suspicious exceptions:
    • Crash-only vs correctness-checking:
  • Harness plan:
    • Recommended approach:
    • Minimal harness signature:
    • Seed corpus ideas:
    • Timeouts and resource limits:
  • Risk flags:
    • Native extension involved:
    • Potential DoS paths:
    • External I/O:
  • Next actions:
    1.
    2.
    3.
    4.
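As a concrete illustration of the Harness plan fields, a minimal crash-only harness sketch using Atheris; mypkg.parse_payload, the 4096-byte cap, and the ValueError allowlist are placeholders to fill from the note above:

import sys
import atheris

with atheris.instrument_imports():
    from mypkg import parse_payload  # placeholder import for the chosen FUT

def TestOneInput(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    payload = fdp.ConsumeUnicodeNoSurrogates(4096)
    try:
        parse_payload(payload)
    except ValueError:
        pass  # documented, acceptable exception; anything else is a finding

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()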

Output format

  • selected_futs: list of chosen FUTs with brief justification
  • notes_to_self: one "Fuzzing Target Note" per FUT

Final JSON block

At the end of the report, include a JSON object with:

  • important_files: [{path, score, rationale, indicators_hit}]
  • important_functions: [{qualname, file, line_range, score, rationale, dependencies, harnessability}]
  • test_summary: {test_map, inferred_oracles, gaps}
  • selected_futs: [{qualname, file, line_range, justification}]
  • notes_to_self: [{target_qualname, note}]
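An abbreviated example of the expected shape (all values are placeholders):

{
  "important_files": [
    {"path": "src/pkg/parser.py", "score": 7,
     "rationale": ["public parse API", "regex-heavy"],
     "indicators_hit": ["boundary_keyword", "format_handler"]}
  ],
  "important_functions": [
    {"qualname": "pkg.parser.parse_payload", "file": "src/pkg/parser.py",
     "line_range": [42, 118], "score": 9,
     "rationale": ["accepts raw bytes", "no direct tests"],
     "dependencies": ["pkg.parser._tokenize"], "harnessability": "high"}
  ],
  "test_summary": {"test_map": {}, "inferred_oracles": [], "gaps": []},
  "selected_futs": [
    {"qualname": "pkg.parser.parse_payload", "file": "src/pkg/parser.py",
     "line_range": [42, 118], "justification": "untested input boundary"}
  ],
  "notes_to_self": [
    {"target_qualname": "pkg.parser.parse_payload", "note": "Fuzzing Target Note ..."}
  ]
}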