Create bibliography citations from web page URLs with automatic Wayback Machine archival and metadata extraction. Use when the user asks to cite a website, create a citation for a URL, archive and cite a web page, or generate a bibliography entry from a web address.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli listSkill Instructions
name: bib-cite-web description: Create bibliography citations from web page URLs with automatic Wayback Machine archival and metadata extraction. Use when the user asks to cite a website, create a citation for a URL, archive and cite a web page, or generate a bibliography entry from a web address.
Web Page Citation Creator
Create bibliography citations from web page URLs with automatic archival snapshot and metadata extraction.
Features
- Wayback Machine Integration: Automatically submits URLs to the Internet Archive for preservation
- Metadata Extraction: Extracts title, author, description, site name, and publish date from semantic HTML
- Multiple Formats: Outputs citations in BibTeX or CSL JSON format
- Smart Citation Keys: Generates citation keys from domain + author + year
Usage
npx tsx plugins/bib/scripts/cite-web.ts <url>
npx tsx plugins/bib/scripts/cite-web.ts <url> --format=bibtex
npx tsx plugins/bib/scripts/cite-web.ts <url> --no-wayback
npx tsx plugins/bib/scripts/cite-web.ts <url> --output=citations.bib
Metadata Extraction
The script extracts metadata from semantic HTML tags:
Title
<title>tag- Open Graph:
<meta property="og:title"> - Twitter Card:
<meta name="twitter:title"> - Standard:
<meta name="title">
Author
<meta name="author">- Open Graph:
<meta property="og:author">or<meta property="article:author"> - Twitter Card:
<meta name="twitter:creator">
Description
<meta name="description">- Open Graph:
<meta property="og:description"> - Twitter Card:
<meta name="twitter:description">
Site Name
- Open Graph:
<meta property="og:site_name"> <meta name="application-name">
Published Date
- Open Graph:
<meta property="article:published_time"> <meta name="publish-date">or<meta name="date">
Arguments
- Positional argument: URL to cite
--file <path>: Read URL from file (uses first line)--format <format>: Output format (default: bibtex)bibtexorbib: BibTeX formatcsl,json, orcsl-json: CSL JSON format
--no-wayback: Skip Wayback Machine submission (faster, but no archive)--output <file>: Write output to file (default: stdout)
Output Formats
BibTeX
@online{smithexample2024,
author = {John Smith},
title = {Example Article Title},
url = {https://example.com/article},
urldate = {2024-03-15},
year = {2024}
}
CSL JSON
[
{
"id": "smithexample2024",
"type": "webpage",
"title": "Example Article Title",
"author": [{"literal": "John Smith"}],
"URL": "https://example.com/article",
"accessed": {"date-parts": [[2024, 3, 15]]},
"archive-url": "https://web.archive.org/web/20240315123456/https://example.com/article"
}
]
Examples
Basic citation
npx tsx plugins/bib/scripts/cite-web.ts "https://example.com/article"
Output:
@online{example2024,
title = {Example Article Title},
url = {https://example.com/article},
urldate = {2024-03-15}
}
With Wayback archival
npx tsx plugins/bib/scripts/cite-web.ts "https://blog.example.com/post"
Output includes archive URL:
@online{example2024,
title = {Blog Post Title},
url = {https://blog.example.com/post},
urldate = {2024-03-15},
note = {Archived at https://web.archive.org/web/20240315123456/...}
}
CSL JSON format
npx tsx plugins/bib/scripts/cite-web.ts "https://docs.example.com" --format=csl
Skip archival (faster)
npx tsx plugins/bib/scripts/cite-web.ts "https://example.com" --no-wayback
Save to file
npx tsx plugins/bib/scripts/cite-web.ts "https://example.com" --output=citations.bib
Batch processing
# Create file with URLs (one per line)
echo "https://example.com/article1" > urls.txt
# Cite each URL
while read url; do
npx tsx plugins/bib/scripts/cite-web.ts "$url" >> citations.bib
done < urls.txt
Citation Key Generation
Citation keys are automatically generated from:
- Domain name:
example.comβexample - Author (if available):
John Smithβsmith - Year: Archive date or publish date or current year
Examples:
https://blog.example.com/postby John Smith (2024) βsmithexample2024https://example.com/article(no author, 2023) βexample2023
Wayback Machine Integration
By default, the script submits URLs to the Internet Archive's Wayback Machine for preservation:
- Submission: Sends URL to
https://web.archive.org/save/<url> - Archive URL: Extracts the permanent archive URL from response
- Archive Date: Records the snapshot timestamp
- Fallback: If submission fails, continues without archive
The archive URL is included in the citation:
- BibTeX: In
notefield or customarchiveurl/archivedatefields - CSL JSON: In
archive-urlfield
Skip archival with --no-wayback for faster execution when archiving isn't needed.
Error Handling
The script handles various error scenarios:
- Invalid URL: Validates URL format before processing
- Fetch failures: Reports HTTP errors with status codes
- Missing metadata: Falls back to "Untitled" for missing titles
- Wayback failures: Continues without archive if submission fails
- No author: Omits author field if not found
Errors are written to stderr, while citations are written to stdout (or file).
Limitations
- JavaScript-heavy sites: May not extract metadata from dynamically rendered content
- Paywalls: Cannot access content behind authentication
- Rate limiting: Wayback Machine may rate-limit submissions
- No PDF support: Only HTML pages (use separate tool for PDFs)
- Simple parsing: Uses regex matching, not full DOM parsing
For complex pages or JavaScript-rendered content, consider:
- Using
--no-waybackto skip archival - Manually editing the citation after generation
- Using browser developer tools to inspect metadata tags
Related Skills
- bib-create: Create bibliography entries interactively
- bib-read: View existing bibliography entries
- bib-convert: Convert between bibliography formats
- wayback-submit: Submit URLs to Wayback Machine without citation generation
More by Mearman
View allScan project dependencies for known vulnerabilities. Automatically detect and parse package files (package.json, requirements.txt, Gemfile, go.mod, pom.xml) and check all dependencies against the CVE database. Use when you want to audit a project for security vulnerabilities, check if dependencies have known CVEs, or generate a vulnerability report for compliance.
Convert LaTeX to Markdown format. Use when the user asks to convert, transform, or change LaTeX files to Markdown, or mentions converting .tex files to .md files.
Analyze npm package quality using NPMS.io scores for quality, popularity, and maintenance. Use when the user asks for package quality analysis, NPMS scores, or package evaluation metrics.
Search for Common Vulnerabilities and Exposures (CVEs) by ID (e.g., CVE-2024-1086) or by product name (e.g., OpenSSL, Apache Tomcat). Get detailed vulnerability information including severity scores, affected software versions, and references. Use when the user wants to look up CVE information, check if a product has known vulnerabilities, or research security issues.
