content-analysis

@liangdabiao/content-analysis

liangdabiao

109

20 forks

Updated 1/18/2026

Analyze text content using both traditional NLP and LLM-enhanced methods. Extract sentiment, topics, keywords, and insights from various content types including social media posts, articles, reviews, and video content. Use when working with text analysis, sentiment detection, topic modeling, or content optimization.

Installation

$ skills install @liangdabiao/content-analysis

Details

Repository: liangdabiao/claude-data-analysis-ultra-main

Path: .claude/skills/content-analysis/SKILL.md

Branch: main

Scoped Name: @liangdabiao/content-analysis

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions

name: content-analysis
description: Analyze text content using both traditional NLP and LLM-enhanced methods. Extract sentiment, topics, keywords, and insights from various content types including social media posts, articles, reviews, and video content. Use when working with text analysis, sentiment detection, topic modeling, or content optimization.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob

Content Analysis Skill

Analyze text content using advanced NLP techniques and LLM-powered insights to extract sentiment, topics, and actionable intelligence from various content sources.

Quick Start

This skill helps you:

Analyze sentiment using both traditional NLP and LLM methods
Extract topics and keywords from large text datasets
Classify and cluster content automatically
Identify viral content patterns and characteristics
Generate content insights and recommendations
Support multiple languages and content formats

When to Use

Social Media Analysis: Facebook, Twitter, Instagram, Weibo posts
Content Marketing: Blog posts, articles, marketing copy analysis
Video Content: YouTube titles, descriptions, comments analysis
Product Reviews: Amazon, e-commerce customer feedback
News Analysis: Article categorization, sentiment tracking
Customer Feedback: Support tickets, surveys, reviews analysis

Key Requirements

Traditional NLP Analysis

pip install pandas numpy matplotlib seaborn nltk scikit-learn wordcloud

LLM-Enhanced Analysis (Optional)

pip install openai dashscope  # For OpenAI and Qwen API access

Setup NLTK Data

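import nltk

# Lexicon used by the VADER sentiment analyzer
nltk.download('vader_lexicon')
# Tokenizer models; recent NLTK releases look for 'punkt_tab' instead of 'punkt'
nltk.download('punkt')
nltk.download('punkt_tab')
# Stopword lists for filtering common words
nltk.download('stopwords')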
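
Once the lexicon is downloaded, VADER scoring could look roughly like the sketch below. The helper names (`label_from_compound`, `vader_sentiment`) are illustrative and not part of this skill; the ±0.05 cut-offs are the thresholds conventionally used with VADER's compound score, and you may want to tune them for your data.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_sentiment(texts):
    """Score each text with NLTK's VADER analyzer (requires 'vader_lexicon')."""
    # Imported lazily so label_from_compound works without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        scores = analyzer.polarity_scores(text)  # keys: neg, neu, pos, compound
        results.append({
            "text": text,
            "compound": scores["compound"],
            "label": label_from_compound(scores["compound"]),
        })
    return results
```

VADER's lexicon is English-only, so non-English text would need a different scorer or the LLM path.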

Core Workflow

1. Data Preparation

Your data should include:

Text Content: Main text to analyze (titles, descriptions, comments, etc.)
Metadata: Optional (author, date, category, engagement metrics)
Language: English, Chinese, and other languages are supported
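
One way to represent this input is a small record type with the text plus free-form metadata. This is a minimal sketch: `ContentItem`, `load_items`, and the `text` column name are assumptions made for illustration, not a format the skill requires.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    text: str                                      # main text to analyze
    metadata: dict = field(default_factory=dict)   # optional author, date, metrics...


def load_items(csv_text: str, text_column: str = "text") -> list[ContentItem]:
    """Parse CSV content into ContentItem records, skipping rows with no text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for row in reader:
        text = (row.pop(text_column, "") or "").strip()
        if not text:
            continue  # nothing to analyze in this row
        items.append(ContentItem(text=text, metadata=dict(row)))
    return items


# Made-up sample rows: one English, one empty, one Chinese.
sample = "text,author,likes\nGreat phone!,ann,12\n,bob,3\n这个产品很好,li,7\n"
items = load_items(sample)
```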

2. Analysis Process

Text Preprocessing: Clean, tokenize, and normalize text
Sentiment Analysis: Traditional VADER + LLM-enhanced analysis
Topic Extraction: TF-IDF keywords + LLM semantic topics
Content Classification: Automated categorization and clustering
Pattern Recognition: Identify viral content characteristics
Insight Generation: Actionable recommendations
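
The preprocessing step above might be sketched as follows, using only the standard library. The regular expressions and the tiny stopword set are assumptions chosen for illustration; in practice you would likely use NLTK's stopword lists, and Chinese text needs a word segmenter rather than this ASCII tokenizer.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")
# Deliberately tiny stopword set, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "it", "this"}


def preprocess(text: str) -> list[str]:
    """Clean, normalize, and tokenize one piece of English text."""
    text = unicodedata.normalize("NFKC", text).lower()  # normalize
    text = URL_RE.sub(" ", text)                        # clean: drop links
    text = MENTION_RE.sub(" ", text)                    # clean: drop @mentions
    tokens = TOKEN_RE.findall(text)                     # tokenize
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


tokens = preprocess("Check THIS out @bob: https://x.co/abc The battery is GREAT!")
# tokens == ['check', 'out', 'battery', 'great']
```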

3. Output Deliverables

Sentiment analysis reports with confidence scores
Topic models and keyword extractions
Content classification results
Viral content pattern analysis
Optimization recommendations

Example Usage Scenarios

Social Media Content Analysis

# Analyze Twitter posts for brand sentiment
# Identify trending topics and hashtags
# Measure engagement patterns
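
As a concrete starting point for the hashtag step, a sketch like the following counts hashtags across posts. The function name and the sample posts are made up for illustration.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(posts, n=3):
    """Return the n most common hashtags as (tag, post_count) pairs."""
    counts = Counter()
    for post in posts:
        # Count each hashtag once per post so one spammy post cannot dominate.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(post)})
    return counts.most_common(n)


posts = [
    "Loving the new #Pixel camera #photography",
    "#pixel battery could be better",
    "Weekend #photography walk #pixel #pixel",
]
trending = top_hashtags(posts, n=2)
# trending == [('pixel', 3), ('photography', 2)]
```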

YouTube Video Analysis

# Analyze video titles and descriptions
# Extract topics from comments
# Identify viral content patterns
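
A very rough way to look for title patterns is to compare median views for titles with and without a given surface feature. The features, field names (`title`, `views`), and data below are all illustrative assumptions, and the comparison is correlational only.

```python
from statistics import median


def title_features(title):
    """Extract a few simple surface features from a video title."""
    return {
        "has_number": any(ch.isdigit() for ch in title),
        "is_question": title.rstrip().endswith("?"),
        "has_caps_word": any(w.isupper() and len(w) > 1 for w in title.split()),
    }


def median_views_by_feature(videos, feature):
    """Return (median views with feature, median views without it)."""
    with_f = [v["views"] for v in videos if title_features(v["title"])[feature]]
    without = [v["views"] for v in videos if not title_features(v["title"])[feature]]
    return (median(with_f) if with_f else None,
            median(without) if without else None)


# Made-up sample data.
videos = [
    {"title": "10 Tips for Better Sleep", "views": 900},
    {"title": "Why do cats purr?", "views": 400},
    {"title": "My morning routine", "views": 100},
    {"title": "5 BEST budget phones", "views": 1500},
]
with_number, without_number = median_views_by_feature(videos, "has_number")
```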

Product Review Analysis

# Analyze customer feedback sentiment
# Extract product feature mentions
# Identify improvement opportunities
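
Feature mentions can be approximated with a hand-written term lexicon, as in this sketch. `FEATURE_TERMS` is a hypothetical lexicon you would replace with terms for your own product; the reviews are made up.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: feature name -> words that signal a mention.
FEATURE_TERMS = {
    "battery": ["battery", "charge", "charging"],
    "screen": ["screen", "display"],
    "price": ["price", "cost", "expensive", "cheap"],
}


def feature_mentions(reviews):
    """Map each feature to the indexes of the reviews that mention it."""
    mentions = defaultdict(list)
    for idx, review in enumerate(reviews):
        words = set(re.findall(r"[a-z]+", review.lower()))
        for feature, terms in FEATURE_TERMS.items():
            if words & set(terms):
                mentions[feature].append(idx)
    return dict(mentions)


reviews = [
    "Battery lasts two days, but the display is dim.",
    "Too expensive for what you get.",
    "Charging is fast and the screen is sharp.",
]
mentions = feature_mentions(reviews)
```

Pairing these indexes with per-review sentiment scores gives a first view of which features draw complaints.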

Key Analysis Methods

Traditional NLP Techniques

VADER Sentiment Analysis: Rule-based sentiment scoring
TF-IDF Keyword Extraction: Statistical term importance
Text Clustering: K-means and hierarchical clustering
Word Frequency Analysis: Term frequency and co-occurrence
Language Detection: Automatic language identification
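
To make the TF-IDF idea concrete, here is a small pure-Python sketch. It uses relative term frequency and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1`, which matches scikit-learn's default IDF; unlike `TfidfVectorizer` it does not L2-normalize, so scores will differ from that library's output. The function name is illustrative.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_n=3):
    """docs: list of token lists. Returns the top_n (term, score) pairs per doc."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc

    results = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_n])
    return results


docs = [
    ["battery", "life", "great", "battery"],
    ["screen", "great"],
    ["battery", "screen", "price"],
]
keywords = tfidf_keywords(docs, top_n=1)
```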

LLM-Enhanced Analysis

Context-Aware Sentiment: Nuanced emotion understanding
Semantic Topic Extraction: Meaning-based topic identification
Content Summarization: Automatic text summarization
Multi-Language Support: Cross-lingual analysis
Zero-Shot Classification: Categorization without training data
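
Zero-shot classification can be sketched independently of any provider by passing in a `complete(prompt) -> str` callable. That callable, the prompt wording, and the JSON reply format are assumptions made for this example; wire `complete` to whichever LLM client you use.

```python
import json


def build_zero_shot_prompt(text, labels):
    """Build a prompt asking the model to pick exactly one label."""
    return (
        "Classify the text into exactly one of these categories: "
        + ", ".join(labels)
        + '.\nRespond with JSON like {"label": "<category>"}.\n\nText: '
        + text
    )


def classify_zero_shot(text, labels, complete):
    """Return the chosen label, or None if the reply is unusable."""
    raw = complete(build_zero_shot_prompt(text, labels))
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        label = None  # reply was not a JSON object
    # Reject labels the model invented outside the allowed set.
    return label if label in labels else None


# Stub standing in for a real LLM call.
def fake_complete(prompt):
    return '{"label": "complaint"}'


labels = ["praise", "complaint", "question"]
result = classify_zero_shot("The app crashes on launch.", labels, fake_complete)
```

Validating the reply against the label list keeps malformed or off-list answers from leaking into downstream counts.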

Advanced Analytics

Time Series Analysis: Content trends over time
Engagement Prediction: Predict viral potential
Competitive Analysis: Compare content performance
Audience Insights: Demographic and preference analysis
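
For the time series case, a minimal sketch is to average sentiment per day and smooth it with a trailing moving average. The input shape, `(date, compound_score)` pairs, is an assumption; the scores below are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean_sentiment(records):
    """records: iterable of (date, score). Returns [(date, mean score)] by date."""
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    return [(day, mean(scores)) for day, scores in sorted(by_day.items())]


def moving_average(values, window=3):
    """Trailing moving average; early points use however many values exist."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


records = [
    (date(2026, 1, 2), 0.4),
    (date(2026, 1, 1), 0.2),
    (date(2026, 1, 1), 0.6),
    (date(2026, 1, 3), -0.2),
]
daily = daily_mean_sentiment(records)
smoothed = moving_average([score for _, score in daily], window=2)
```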

Common Business Questions Answered

What is the overall sentiment toward our brand?
Which topics are trending in our industry?
What makes content go viral?
How does sentiment vary by demographic or region?
What are customers saying about our products?
Which content formats perform best?

Integration Examples

See the examples/ directory for:

basic_content_analysis.py - Traditional NLP analysis
llm_enhanced_analysis.py - LLM-powered analysis
social_media_analysis.py - Social media specific analysis
Sample datasets for testing

LLM Configuration

Supported LLM Providers

OpenAI: GPT-3.5, GPT-4 models
Qwen (通义千问): Chinese-optimized models
Open Source: Local models via HuggingFace

API Setup Examples

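import os

# OpenAI Configuration
# Read the key from the environment instead of hardcoding it in source.
import openai
openai.api_key = os.environ['OPENAI_API_KEY']

# Qwen Configuration
import dashscope
dashscope.api_key = os.environ['DASHSCOPE_API_KEY']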

Best Practices

Data Quality: Ensure clean, consistent text data
Sampling Strategy: Use representative samples for LLM analysis
Cost Management: Balance traditional NLP with LLM calls
Language Handling: Configure appropriate language models
Validation: Cross-validate sentiment analysis results
Privacy: Ensure compliance with data protection regulations
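
One way to cross-validate sentiment results is to measure agreement between two labelers, for example VADER labels against LLM labels on the same sample. The sketch below computes Cohen's kappa, which corrects raw agreement for chance; choosing kappa here is a suggestion, not something the skill prescribes, and the labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each labeler's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (observed - expected) / (1 - expected)


vader_labels = ["pos", "pos", "neg", "neu", "neg", "pos"]
llm_labels = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohen_kappa(vader_labels, llm_labels)
```

A low kappa on a hand-checked sample is a signal to revisit preprocessing or language settings before trusting either method at scale.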

Performance Optimization

For Large Datasets

Use data sampling for LLM analysis
Implement batch processing
Cache LLM responses when possible
Use traditional NLP for initial filtering
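
Caching and batching might be sketched as below. `CachedLLM` wraps any `complete(prompt) -> str` callable (a hypothetical stand-in for your provider client) with an in-memory cache; a persistent store would be needed to keep results across runs.

```python
import hashlib


class CachedLLM:
    """Wrap an LLM callable so identical prompts are only sent once."""

    def __init__(self, complete):
        self._complete = complete
        self._cache = {}
        self.calls = 0  # number of real (uncached) calls made

    def __call__(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._complete(prompt)
        return self._cache[key]


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stub in place of a real API call.
llm = CachedLLM(lambda prompt: prompt.upper())
replies = [llm(p) for p in ["a", "b", "a"]]
batches = list(batched([1, 2, 3, 4, 5], 2))
```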

Cost Management

Prioritize important content for LLM analysis
Use traditional NLP for bulk processing
Implement smart sampling strategies
Monitor API usage and costs
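
A simple prioritization scheme is to let the traditional scorer handle clear-cut items and queue only ambiguous ones for the LLM, up to a call budget. The ambiguity band of 0.3 and the budget are arbitrary assumptions in this sketch, and the scores are made up.

```python
def route_for_llm(scored_items, band=0.3, budget=100):
    """Split (text, compound) pairs into (bulk, llm_queue).

    Items whose |compound| is below `band` count as ambiguous; the most
    ambiguous ones, up to `budget`, go to the LLM queue. Everything else
    stays with the traditional-NLP result.
    """
    ranked = sorted(range(len(scored_items)), key=lambda i: abs(scored_items[i][1]))
    llm_idx = [i for i in ranked if abs(scored_items[i][1]) < band][:budget]
    chosen = set(llm_idx)
    llm_queue = [scored_items[i] for i in llm_idx]
    bulk = [item for i, item in enumerate(scored_items) if i not in chosen]
    return bulk, llm_queue


scored = [
    ("love it", 0.8),
    ("it's fine I guess", 0.1),
    ("meh", -0.05),
    ("terrible", -0.7),
    ("not bad", 0.25),
]
bulk, llm_queue = route_for_llm(scored, band=0.3, budget=2)
```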

Advanced Features

Real-time Analysis: Stream processing for live content
Multi-modal Analysis: Text + image + video content
Custom Models: Fine-tune models for specific domains
Integration APIs: Connect with content management systems
Automated Reporting: Scheduled analysis and reporting

Troubleshooting

Common Issues

Low Sentiment Accuracy: Check language settings and text preprocessing
High API Costs: Optimize sampling and caching strategies
Slow Processing: Implement parallel processing and batching
Language Support: Ensure appropriate models for non-English content
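
For slow processing caused by waiting on API calls, a thread pool is often enough. This sketch assumes `analyze` is any per-text function you supply; threads suit I/O-bound work such as network requests, while CPU-bound analysis would more likely benefit from `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_parallel(texts, analyze, max_workers=4):
    """Apply `analyze` to each text concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, texts))


# `len` stands in for a real analysis function.
lengths = analyze_parallel(["a", "bb", "ccc"], len)
```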

Performance Tips

Pre-process text data effectively
Use appropriate model sizes for tasks
Implement result caching
Monitor resource usage and optimize
