ModernBERT in Action: A Hands-On Guide to Zero-Shot NLP Tasks
ModernBERT is a robust and versatile encoder-only transformer model designed for natural language understanding, delivering state-of-the-art performance across a wide range of NLP applications. In this post, we will walk through several tasks that can be performed with ModernBERT, including masked language modeling, feature extraction, sentence similarity, and next-word prediction. All of these tasks work without any fine-tuning, showcasing ModernBERT’s effectiveness in handling diverse language processing challenges right out of the box.
Install Dependencies
!uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
!uv pip install git+https://github.com/huggingface/transformers.git
!uv pip install scikit-learn
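A quick sanity check confirms the installs and whether the CUDA build of PyTorch can see a GPU (a minimal sketch; version numbers will vary with your environment):
import torch
import transformers
import sklearn

# Verify versions and GPU visibility (falls back to CPU if CUDA is unavailable)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("scikit-learn:", sklearn.__version__)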
Masked Language Modeling (Fill-Mask Task)
This example demonstrates ModernBERT’s contextual understanding capabilities, showing how the model can predict missing words based on surrounding context.
flowchart LR
A["Input"] --> B["ModernBERT<br/>Model"]
B --> C["Contextual<br/>Analysis"]
C --> D["Multiple<br/>Predictions"]
D --> E["Ranked Results:<br/>Paris (92.33%)<br/>Lyon (3.59%)<br/>Nancy (2.31%)"]
style A fill:#e1f5fe
style B fill:#fff3e0
style C fill:#f3e5f5
style D fill:#e8f5e8
style E fill:#fff8e1
The key aspects illustrated include:
- Multi-candidate Prediction: The model provides multiple predictions ranked by confidence scores
- High Accuracy: Achieves 92.33% confidence for the correct answer in context
- Zero-shot Performance: Works without any fine-tuning on the specific task
- Bidirectional Understanding: Leverages both left and right context to make predictions
from transformers import pipeline

# Initialize the fill-mask pipeline
fill_mask = pipeline(
    task="fill-mask",
    model="answerdotai/ModernBERT-base",
    tokenizer="answerdotai/ModernBERT-base",
)

# Example masked text
masked_text = "The capital of France is [MASK]."

# Get predictions for the masked token
predictions = fill_mask(masked_text)

# Display predictions
print("Masked Text:", masked_text)
print("Predictions:")
for pred in predictions:
    print(f" - {pred['sequence']} (score: {pred['score']:.4f})")
Output
Masked Text: The capital of France is [MASK].
Predictions:
The capital of France is Paris. (score: 0.9233)
The capital of France is Lyon. (score: 0.0359)
The capital of France is Nancy. (score: 0.0231)
The capital of France is Nice. (score: 0.0062)
The capital of France is Orleans. (score: 0.0026)
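By default the fill-mask pipeline returns its five highest-scoring candidates. The standard top_k argument adjusts this; a minimal sketch requesting ten:
# Request the ten highest-scoring candidates instead of the default five
predictions = fill_mask(masked_text, top_k=10)
for pred in predictions:
    print(f" - {pred['sequence']} (score: {pred['score']:.4f})")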
Feature Extraction
This example showcases ModernBERT’s ability to encode semantic meaning into numerical representations.
The key aspects demonstrated include:
- Numerical Representation: Converts text into dense numerical vectors (embeddings)
- Dimensionality Transparency: Shows the output shape for understanding data structure
- Versatility: These features serve as input for downstream machine learning tasks
- Sentence-level Encoding: Demonstrates meaningful representation of entire sentences
from transformers import pipeline

# Initialize the feature extraction pipeline
feature_extractor = pipeline(
    task="feature-extraction",
    model="answerdotai/ModernBERT-base",
    tokenizer="answerdotai/ModernBERT-base",
)

# Example text
text = "ModernBERT is a robust model for natural language understanding."

# Extract features
features = feature_extractor(text)

# Display feature dimensions
print(f"Extracted feature shape: {len(features)} x {len(features[0])}")
Output
Extracted feature shape: 1 x 14
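The printed shape covers only the first two axes: one input sequence and fourteen tokens (the sentence’s tokens plus the special tokens the tokenizer adds). Each token is itself a dense vector of the model’s hidden size. A minimal sketch with numpy makes the full structure visible; the 768 assumes ModernBERT-base’s hidden size:
import numpy as np

# The pipeline returns nested lists: batch x tokens x hidden size
features_array = np.array(features)
print(features_array.shape)  # e.g. (1, 14, 768) for ModernBERT-base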
Sentence Similarity
This example illustrates ModernBERT’s capability to measure semantic relationships between different pieces of text.
flowchart LR
A["Sentence 1:<br/>ModernBERT is a great<br/>language model."] --> C["Feature<br/>Extraction"]
B["Sentence 2:<br/>ModernBERT excels in<br/>understanding language."] --> C
C --> D["Embedding 1<br/>(Vector)"]
C --> E["Embedding 2<br/>(Vector)"]
D --> F["Cosine<br/>Similarity"]
E --> F
F --> G["Similarity Score:<br/>0.9572<br/>(95.72% similar)"]
style A fill:#e1f5fe
style B fill:#e1f5fe
style C fill:#fff3e0
style D fill:#f3e5f5
style E fill:#f3e5f5
style F fill:#e8f5e8
style G fill:#fff8e1
The key aspects highlighted include:
- Semantic Relationship Measurement: Uses extracted embeddings to compute similarity between sentences
- Quantitative Analysis: Employs standard similarity metrics (cosine similarity) for measurable results
- High Correlation Detection: Achieves 0.9572 similarity score for semantically related sentences
- Practical Application: Shows real-world usage for similarity tasks and semantic search
from transformers import pipeline
from sklearn.metrics.pairwise import cosine_similarity

# Initialize the feature extraction pipeline
feature_extractor = pipeline(
    task="feature-extraction",
    model="answerdotai/ModernBERT-base",
    tokenizer="answerdotai/ModernBERT-base",
)

# Example sentences
sentence_1 = "ModernBERT is a great language model."
sentence_2 = "ModernBERT excels in understanding language."

# Extract embeddings (the first token's vector serves as the sentence embedding)
embedding_1 = feature_extractor(sentence_1)[0][0]
embedding_2 = feature_extractor(sentence_2)[0][0]

# Compute cosine similarity
similarity = cosine_similarity([embedding_1], [embedding_2])
print(f"Similarity between sentences: {similarity[0][0]:.4f}")
Output
Similarity between sentences: 0.9572
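Note that the indexing [0][0] above represents each sentence by the embedding of its first token. Mean pooling over all token vectors is a common alternative that often yields more stable sentence embeddings; a minimal sketch (scores will differ slightly from the output above):
import numpy as np

def mean_pooled(sentence):
    # Average the token embeddings into a single sentence vector
    token_embeddings = np.array(feature_extractor(sentence)[0])
    return token_embeddings.mean(axis=0)

similarity = cosine_similarity([mean_pooled(sentence_1)], [mean_pooled(sentence_2)])
print(f"Mean-pooled similarity: {similarity[0][0]:.4f}")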
Next Word Prediction
This example demonstrates context-aware text completion. ModernBERT is an encoder-only model rather than a generative one, but placing the [MASK] token at the end of a sentence lets the fill-mask head approximate next-word prediction.
The key aspects shown include:
- Context-aware Generation: Predicts appropriate continuations based on preceding context
- Domain Adaptation: Shows ability to predict domain-specific terms and technical vocabulary
- Sequential Understanding: Demonstrates understanding of sentence structure and logical flow
- Ranking Capabilities: Provides confidence scores for different prediction options
from transformers import pipeline

# Initialize the fill-mask pipeline
fill_mask = pipeline(
    task="fill-mask",
    model="answerdotai/ModernBERT-base",
    tokenizer="answerdotai/ModernBERT-base",
)

# Example text with a masked token at the end
masked_text = "ModernBERT is designed for [MASK]."

# Get predictions
predictions = fill_mask(masked_text)

# Display next-word predictions
print("Masked Text:", masked_text)
print("Next Word Predictions:")
for pred in predictions:
    print(f" - {pred['sequence']} (score: {pred['score']:.4f})")
Conclusion
Four approaches were presented showing how ModernBERT can perform various natural language processing tasks without any fine-tuning. Through masked language modeling, we demonstrated the model’s ability to predict missing words with high accuracy, reaching 92.33% confidence for the correct answer in context. The feature extraction example showed how ModernBERT generates meaningful numerical representations of text, which can serve as input for downstream machine learning tasks. The sentence similarity example illustrated how those extracted features measure semantic relationships between pieces of text, scoring 0.9572 for two semantically related sentences. Finally, the next-word prediction task highlighted the model’s ability to understand context and predict coherent continuations of text.
These examples showcase ModernBERT as a general-purpose language model that can handle diverse NLP tasks with minimal setup, making it accessible for both research and practical applications while maintaining high performance across different domains.