Hashtag Jakarta EE #324

Welcome to issue number three hundred and twenty-four of Hashtag Jakarta EE!

Last week, I was at JavaLand 2026. It was my eleventh time at this conference. This year, it was back in a theme park again, with almost 1500 attendees registered. Next week, I will go to California for JavaOne, which is held for the second time at the Oracle conference center in Redwood Shores. I am not going to be a speaker this year, but in a way I am anyway, since I will be hosting a mentoring session in the JavaOne Mentorship Hub. After that, I will also be present at Voxxed Days Amsterdam, where Jakarta EE will have a booth in the Community Square.

Jakarta EE 12 moves along. The release is planned for Q4 2026, but there is no reason to wait until then to try out some of the new features. Jakarta Persistence 4.0 has released Milestone 1, which is implemented in Hibernate 8.0.0.alpha1. I am also working on a SkillsJar for Jakarta EE. Of course, I will let you know as soon as I have it ready.

If you are interested in attending Open Community eXperience in Brussels this April, please reach out to me for a 30% discount code. This is an opportunity to meet the rest of the quite diverse Eclipse Foundation community.

Ivar Grimstad


WBS Review Process: 1 Missing Task Out of 1000 Ruins the Project

“This… no one made it?”

A comment made during payment system integration testing, three days before launch.

Over 1,000 tasks had been completed,
but just one, payment failure rollback handling, wasn't in the WBS.
That single omission caused a two-week launch delay, and 2 billion won in marketing costs evaporated.

The post-mortem findings were absurd.
There had been three review meetings, but everyone assumed "someone else must have checked it."
Reviews were held, but there was no review process.

The WBS seems perfect at first.
Then, as launch approaches, the missing tasks are discovered,
followed by the regret: "Why did we miss this?"

According to NASA research, projects with systematic WBS reviews have an 89% success rate,
while those without sit at only 34%.
Review isn't optional; it's essential.

Why Do We Miss Important Things?

1. The Trap of Confirmation Bias

We see what’s there, but don’t see what’s not.

If "login feature" is in the WBS, we relax. But what about "password reset", "account unlock", or "social login disconnect"? These peripheral features are easily missed.

2. Expert’s Blind Spot

“Too obvious, didn’t need to write it down.”

Senior developers make this mistake most often. What is obvious to them is unknown territory for juniors.

3. Paradox of Responsibility Diffusion

The more reviewers there are, the more gets missed, because of the bystander effect: "someone else must have seen it."

4-Layer Review System: Verify from Multiple Angles

Intel's chip design team reviews the same WBS four times from different perspectives. It seems tedious, but this method reduced their defect rate to 0.001%.

Layer 1: Completeness Review

Purpose: Confirm that no tasks are missing

Reviewers: Product Manager, Business Analyst

Checklist:

  • Are all requirements mapped to WBS?
  • Are there tasks for each deliverable?
  • Are non-functional requirements (performance, security) included?

Layer 2: Technical Review

Purpose: Verify technical feasibility and dependencies

Reviewers: Tech Lead, Architect

Checklist:

  • Are technical dependencies correct?
  • Are all necessary technical tasks present?
  • Are risk mitigation tasks included?

Layer 3: Feasibility Review

Purpose: Verify that the plan can realistically be executed

Reviewers: Project Manager, Team Lead

Checklist:

  • Is schedule realistic?
  • Is resource allocation appropriate?
  • Are there bottlenecks?

Layer 4: Cross Validation

Purpose: Check conflicts with other projects

Reviewers: PMO, Other Project PMs

Checklist:

  • Are there shared resource conflicts?
  • Are there dependencies on other projects?
  • Does it comply with organizational standards?

Review Techniques: 7 Ways to Find Missing Items

1. Backward Tracing

Start from final deliverable and trace backward:

Deployed System
  ← Deployment Task
    ← Test Complete
      ← Integration Test
        ← Unit Test
          ← Code Implementation
            ← Design
              ← Requirements Analysis

Ask “What’s needed for this?” at each step.

2. Scenario Walkthrough

Follow actual usage scenarios to verify:

“User login → Product search → Add to cart → Payment → Delivery tracking”

Verify all necessary tasks are in WBS at each step.

3. 5W1H Check

Ask questions for all tasks:

  • What: What are we building?
  • Why: Why is it needed?
  • Who: Who does it?
  • When: When should it be done?
  • Where: Where is it deployed?
  • How: How is it implemented?

If there’s a question without an answer, a task is missing.

4. Interface Analysis

Check all connection points:

Frontend ↔ Backend: API definition, authentication, error handling
Backend ↔ Database: Schema, migration, backup
System ↔ External: External API, webhook, callback

Each interface needs at least 3 tasks (definition, implementation, testing).
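The "at least 3 tasks per interface" rule is mechanical enough to script. A minimal Python sketch (the interface list and task names are illustrative, not a prescribed format):

```python
# Illustrative sketch: expand each connection point into the three
# minimum tasks named above. Interface names are examples, not a spec.
INTERFACES = [
    ("Frontend", "Backend"),
    ("Backend", "Database"),
    ("System", "External"),
]

def interface_tasks(interfaces):
    phases = ("definition", "implementation", "testing")
    return [f"{a} <-> {b}: {phase}"
            for a, b in interfaces
            for phase in phases]

for task in interface_tasks(INTERFACES):
    print(task)  # 3 interfaces x 3 phases = 9 checklist entries
```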

5. Negative Testing

Ask “What if this fails?”:

  • If login fails? → Need error handling task
  • If payment fails? → Need rollback task
  • If server goes down? → Need recovery task

6. Checklist Comparison

Compare with standard checklist:

## Essential Web Project Tasks

- [ ] Security (authentication, authorization, encryption)
- [ ] Performance (caching, optimization, CDN)
- [ ] Monitoring (logging, metrics, alerts)
- [ ] Backup/recovery
- [ ] Documentation
- [ ] Deployment automation

7. Delphi Technique

Experts review independently then synthesize results:

  1. Each independently writes missing task list
  2. Collect anonymously
  3. Share entire list
  4. Re-review and reach consensus
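Steps 2 and 3 of the round can be sketched in a few lines of Python: merge the anonymously collected lists, then rank candidates by how many reviewers flagged them (all task names here are illustrative):

```python
# Sketch of steps 2-3: merge the anonymous lists and rank candidate
# missing tasks by how many reviewers flagged each one.
from collections import Counter

def merge_reviews(reviewer_lists):
    counts = Counter(task for tasks in reviewer_lists for task in tasks)
    return counts.most_common()  # most-agreed-upon items first

reviews = [
    ["payment rollback", "password reset"],
    ["payment rollback", "backup plan"],
    ["payment rollback"],
]
print(merge_reviews(reviews))  # "payment rollback" flagged by all three
```

Items flagged independently by several reviewers are the strongest candidates to discuss first in the consensus step.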

Review Tools and Templates

WBS Review Checklist

## WBS Review Checklist v2.0

### Structure Validation

- [ ] Do all levels satisfy 100% rule?
- [ ] Is task breakdown level consistent?
- [ ] Are task IDs systematic?

### Content Validation

- [ ] Are all requirements covered?
- [ ] Are non-functional requirements included?
- [ ] Are risk response tasks present?

### Dependency Validation

- [ ] No circular dependencies?
- [ ] External dependencies specified?
- [ ] Critical path identified?

### Resource Validation

- [ ] Clear assignee per task?
- [ ] No resource conflicts?
- [ ] Backup resources planned?

### Schedule Validation

- [ ] Buffer included?
- [ ] Milestones realistic?
- [ ] Holidays/vacations considered?

Review Record Template

## WBS Review Record

**Project**: [Project Name]
**Review Date**: 2024-XX-XX
**Reviewers**: [Names]

### Issues Found

1. [Issue Description] - Severity: High/Medium/Low
2. ...

### Added Tasks

- WBS ID: Description

### Modified Tasks

- WBS ID: Change Content

### Next Review Schedule

- Date:
- Attendees:

Review Process Automation

1. Auto Validation Script

def validate_wbs(wbs_data):
    errors = []

    # 100% rule validation
    for parent in wbs_data.get_parents():
        if not parent.is_fully_decomposed():
            errors.append(f"Incomplete: {parent.id}")

    # Circular dependency validation
    if has_circular_dependency(wbs_data):
        errors.append("Circular dependency detected")

    # Missing assignee validation
    for task in wbs_data.get_tasks():
        if not task.assignee:
            errors.append(f"No assignee: {task.id}")

    return errors
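The script above calls has_circular_dependency without defining it. A minimal sketch, assuming dependencies are available as a {task_id: [dependency_ids]} map (that shape is an assumption, not part of the script's API):

```python
# Hedged sketch of the has_circular_dependency helper used above,
# assuming dependencies come as a {task_id: [dependency_ids]} map.
def has_circular_dependency(deps):
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    state = {task: WHITE for task in deps}

    def visit(task):
        state[task] = GRAY
        for nxt in deps.get(task, []):
            if state.get(nxt, WHITE) == GRAY:  # back edge: cycle found
                return True
            if state.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        state[task] = BLACK
        return False

    return any(visit(t) for t in deps if state[t] == WHITE)

print(has_circular_dependency({"A": ["B"], "B": ["C"], "C": ["A"]}))  # True
```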

2. Review Dashboard

Monitor WBS status in real time:

  • Completeness: 87%
  • Review Status: 3/4 layers complete
  • Issues Found: 12
  • Issues Resolved: 8

Key Summary

Some PMs say, "There's no time to review."

But one hour of review prevents a hundred hours of rework.
Review isn't wasted time; it's saved time.

There's no perfect WBS.
But systematic review can get you to 99.9% completeness,
and that 0.1% difference determines project success or failure.

One way to improve your starting point is using AI-powered task breakdown. When you input specs or requirements, AI can automatically decompose them into tasks with time estimates — giving reviewers a more complete WBS to start with, rather than building from scratch.

Introduce the 4-layer review system today.
It will seem tedious at first, but you will soon realize its value.

Need systematic WBS management with AI-powered task breakdown? Check out Plexo.

LinguaCam Live: Professional OBS Overlay Suite with AI-Translated Global Captions, Unified Dynamic Chat, and Interactive Stream Widgets.

LinguaCam Live

Experience floating wave danmu and live translated captions in your webcam streams

lingua-cam-live.vercel.app

It started as an overlay with only two features.

We wanted to build a platform that uses bullet chat and live captions to create a streamlined overlay for OBS. The original vision was a simple theme setup for streamers, but as development progressed, the project evolved into a real-time interaction dashboard. I am obsessed with danmu chat because in most streams, if you don't speak the streamer's language or can't keep up with the fast vertical chat, you are basically invisible. I wanted to build a platform where the audience's voice becomes a first-class part of the video feed.

The vision was clear, yet two primary hurdles stood in my way:

The Latency Trap: Captions and bullet chats lose all value if they suffer from delays.

The Clutter Issue: Hundreds of messages flow from right to left; managing them without burying the content was critical.

This project was never about building another video player to show to your Google Meet buddies. It was about building a translation and interaction layer. The world is full of amazing content locked behind language barriers. Many creators provide high value in this economy but remain shadowed by technical difficulties or a lack of time to engage globally. This tool was designed to bridge that gap and open up those walled gardens.

The Ear.

Without this: You wait 10 seconds. You see the chat react. You reply. By then the vibe has moved on and you look like you are reacting to something from the past.

I built a custom React hook called useYouTubeChat. It bypasses heavy polling (asking for data repeatedly) by creating a stream of raw data from the YouTube API. It's an always-on listener that catches messages the millisecond they are posted.

The Translator.

This is our middleware. Before a message hits the UI, it's intercepted by a translator utility that processes the text in flight, ensuring that by the time the data packet reaches the viewer, it has already been localized.

The Fast Lane.

We use Socket.io (WebSockets). It keeps a live wire open between the server and the browser, cutting round-trip time and ensuring that the bullet chat stays perfectly synced with the live action.

When the lane was smooth, it felt like magic.

The page hooks into the Socket.io instance and the YouTube listener simultaneously. It manages a delicate lifecycle: capturing a message, passing it through the translation engine, and then firing it into the WebSocket relay so that every connected client sees the same "bullet" at the exact same millisecond.

What makes this module work is its event-driven nature. Rather than relying on heavy re-renders or global state stores that would lag under the pressure of a fast-moving chat, it uses a localized push mechanism. When useYouTubeChat detects a new entry, the module triggers a chain reaction:

Validation: The module strips away heavy, useless metadata. It keeps only what matters (User, Text and Timestamp). This keeps the packet size tiny so it travels faster across the internet.

Transformation: It invokes the translator utility. This happens asynchronously, meaning the rest of the app doesn't have to freeze while waiting for the translation to finish.

Emission: Once the message is ready, it is emitted via Socket.io as a broadcast to every single person watching the stream: "render this specific message at this exact millisecond."

By centralizing this logic in a single high-level controller, the application maintains a single source of truth. This prevents the "Clutter Issue" mentioned earlier, as the page can throttle or prioritize messages before they ever hit the rendering engine, ensuring the stream remains readable even during peak activity.
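As a language-agnostic illustration of the validation → transformation → emission chain (sketched here in Python with a stubbed translator and in-memory "clients"; the real project uses React and Socket.io, so none of these names are the project's actual API):

```python
# Sketch of the three stages: strip metadata, translate async, broadcast.
# The translator and client list are stubs, not the project's real API.
import asyncio
import time

def validate(raw):
    # Stage 1: strip heavy metadata; keep only user, text, timestamp
    return {"user": raw["user"], "text": raw["text"], "ts": raw["ts"]}

async def transform(msg, translate):
    # Stage 2: translate asynchronously so the app never blocks on it
    msg["text"] = await translate(msg["text"])
    return msg

async def emit(msg, clients):
    # Stage 3: broadcast the finished packet to every connected client
    for inbox in clients:
        inbox.append(msg)

async def pipeline(raw, translate, clients):
    await emit(await transform(validate(raw), translate), clients)

async def fake_translate(text):  # stand-in for the real translator
    return f"[en] {text}"

clients = [[], []]
raw = {"user": "ana", "text": "hola", "ts": time.time(), "avatar": "big.png"}
asyncio.run(pipeline(raw, fake_translate, clients))
print(clients[0][0]["text"])  # [en] hola
```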

The Reality Check.

The initial vision for the dashboard was a clean, reactive masterpiece, but the first time the system encountered a high-traffic stream, the perfect design broke. The clutter issue wasn't just visual; it was computational.

The primary point of failure was state synchronization. In a standard React app, updating state is trivial, but in a live dashboard receiving 20 messages per second, every state update triggers a re-render. Initially, the entire dashboard re-rendered every time a single bullet appeared. This caused the video feed to stutter and the UI to become unresponsive: the exact latency trap I was trying to avoid.

Another reality check came from a hydration race condition. I attempted to use a sophisticated notification library to show top donors or new subscribers, but because the WebSockets were firing before the Next.js client-side hydration was complete, the app would crash or throw mismatch errors.

To reach the finish line, I had to pivot to a functional-over-fancy mindset. I stripped away the heavy state management and moved to a ref-based queue system for the danmu. I also replaced the complex notification components with a simplified showToast utility that used a basic alert or raw DOM injection. It wasn't the polished UI I envisioned on day one, but it was the only way to ensure the platform didn't collapse under the weight of its own data.

Trade-offs:

In the race to build a functional prototype, certain long-term optimizations were sacrificed. The most significant trade-off is client-side translation: currently, every user's browser handles its own translation calls. While this works for a demo, a production-scale app would move this to a server-side worker or an edge function to protect API keys and reduce the CPU load on the viewer's device. Another accepted debt is the memory management of the message queue. Right now the app keeps a running list of chat history in memory. For a short stream this is fine, but for a 24/7 broadcast it would eventually lead to a memory leak. The immediate fix was to focus on the now; the long-term solution requires robust garbage-collection logic to prune old messages.

The Debugging War Story: The Hydration Race Condition

The most frustrating bug involved the WebSocket connecting before Next.js had fully hydrated the page. Because the socket was ready to receive data faster than React was ready to render it, the app would attempt to inject danmu into a DOM that didn't technically exist yet. The fix was a specific, unglamorous check inside a useEffect to ensure the component was fully mounted before the socket listeners were allowed to fire. It's a simple guard clause, but it was the difference between a white screen of death and a working dashboard.

Validation: How to Verify the Flow Locally

To verify this project on your own machine, clone the repository and run npm install. For environment setup, add your YouTube API key and Lingo.dev API key to a .env.local file. Launch the dev server with npm run dev and navigate to the /live route for the live test. Input the video ID of any active YouTube live stream. If the architecture is working, you will see the console log the connection, and translated messages should begin flying across the overlay within seconds.

The Missing Piece: What to Build Next

If someone forks this repository today, the most obvious missing piece is dynamic sentiment styling. The next step for this project is to analyze the mood of the incoming text. If the community is hyped, the danmu bullets should change color to a vibrant red or gold and increase in velocity; if the chat is calm, they should turn blue and slow down. This would turn the overlay from a simple message board into a living pulse of the creator's audience.

To sum up: LinguaCam Live is more than just a dashboard; it is a high-performance interactive bridge between a streamer and a global audience. It is designed to be pulled into OBS as a browser source, turning a basic video feed into a professional AI-powered broadcast.

  1. The Communication Core (Breaking Language Barriers)

AI-Translated Captions: It uses Lingo.dev to listen to the streamer’s voice and turn it into English captions instantly.

Multilingual Support: It doesn’t just transcribe; it translates. This allows a creator speaking any language to be understood by a global English-speaking audience in real-time.

YouTube Sync: It hooks into your YouTube Live chat, pulling comments out of the side window and into the actual video feed.

  2. The Visual Experience (Wave Danmu & FX)

Wave Danmu: Instead of boring, static text, chat messages move in a fluid, “sinus-based” wave across the screen.

Collision-Free Logic: It uses an 8-lane vertical positioning system. This ensures that even if 100 people chat at once, the messages never overlap or hide the streamer’s face.

Cinematic FX: Streamers can apply 20+ filters (like film grain, retro, or vibrant boosts) directly to their camera feed within the browser.

  3. The Interaction Suite (Audience Engagement)

Sticker Reaction Pop: Viewers can trigger emoji explosions (😊❤️🔥) that pop up on the stream overlay.

Voice Sounds: The streamer can trigger specific sound effects (SFX) using custom voice commands, making the broadcast feel like a high-budget TV show.

Quick Chat: A one-tap button system for the audience to engage instantly without typing long sentences.

  4. The Technical Edge (Speed & Control)

Live Pipeline: Everything happens with sub-100ms latency. This “ultra-low latency” is the “Ear” we discussed — it ensures the translation and the chat happen at the exact same time as the video.

Smart Focus: The software uses automatic pan-zoom framing to keep the streamer centered and in focus, mimicking a professional cameraman.

Setup APIs: A simple, secure control center to link your YouTube and Lingo.dev keys without needing to be a coding expert.

Thank you

🚀 Day 16 of My Automation Journey – Installing Java, Eclipse & Setting Up Maven for Selenium

Welcome back to Day 16 of My Automation Journey! ☕💻

In the previous days, I focused on Java fundamentals like:

🔐 Encapsulation
📦 Packages
🧩 Access Modifiers
🔁 Method Overriding

But before writing Selenium automation scripts, we need to prepare our development environment properly.

So today’s goal was simple but important:

⚙️ Install and configure the tools required for Selenium Automation

🧰 Tools Required for Selenium Automation

Before writing our first automation script, we need the following tools.

Tool             Purpose
☕ Java (JDK)   Programming language used for Selenium
💻 Eclipse IDE     Writing and managing automation code
📦 Maven   Dependency management & project structure
🤖 Selenium    Automation library for browser testing

Setting these up correctly helps avoid environment issues later.

☕ Step 1 – Install Java (JDK)

Selenium with Java requires the Java Development Kit (JDK).

📥 Download JDK

Download the latest LTS version such as:

JDK 17

JDK 21

After downloading, run the installer and complete the setup.

Typical installation path:

C:\Program Files\Java\jdk-17

⚙️ Step 2 – Configure JAVA_HOME

To allow the system to access Java globally, we must configure environment variables.

Steps

1️⃣ Open System Properties
2️⃣ Click Environment Variables
3️⃣ Under System Variables, add:

JAVA_HOME = C:\Program Files\Java\jdk-17

Now update the Path variable and add:

%JAVA_HOME%\bin

✅ Step 3 – Verify Java Installation

Open Command Prompt and run:

java -version

Example output:

java version "17.0.x"

Now verify the compiler:

javac -version

If both commands work, Java is installed correctly. 🎉

💻 Step 4 – Install Eclipse IDE

Next, we need an IDE to write and manage our automation code.

One of the most popular IDEs for Java automation is Eclipse.

📥 Download Eclipse

Download:

👉 Eclipse IDE for Java Developers

Installation

1️⃣ Run the Eclipse Installer
2️⃣ Select Eclipse IDE for Java Developers
3️⃣ Choose installation location
4️⃣ Launch Eclipse

The first time Eclipse opens, it will ask for a Workspace location.

Example:

C:\Users\YourName\workspace

📦 Step 5 – Maven (No Separate Installation Needed!)

Here’s something interesting I learned today. 👀

👉 Eclipse already includes Maven support by default.

This is called the m2e (Maven Integration for Eclipse) plugin.

So for most Selenium automation setups:

✅ You DO NOT need to install Maven separately.

Eclipse automatically handles:

  • Maven project creation
  • Dependency management
  • Build lifecycle

This makes setup much simpler for beginners. 🚀

🏗 Step 6 – Create a Maven Project in Eclipse

Now let’s create our automation project.

Inside Eclipse:

1️⃣ Click File → New → Maven Project
2️⃣ Select Create a simple project
3️⃣ Enter:

Group Id → com.automation

Artifact Id → selenium-project

Click Finish.

Eclipse will automatically generate the Maven structure.

📂 Maven Project Structure

After creation, your project will look like this:

src/main/java
src/test/java
pom.xml

Important File

📄 pom.xml

This file manages all project dependencies like:

  • Selenium
  • TestNG
  • WebDriverManager
  • Logging libraries

Instead of manually downloading jars, Maven handles everything automatically.
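As an illustration, a typical dependencies block in pom.xml might look like this. The coordinates are the standard Maven ones for Selenium and TestNG; the version numbers are examples, so pin the versions that match your own setup rather than copying these:

```xml
<!-- Illustrative dependencies block for pom.xml. Versions are examples. -->
<dependencies>
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.25.0</version>
  </dependency>
  <dependency>
    <groupId>org.testng</groupId>
    <artifactId>testng</artifactId>
    <version>7.10.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```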

💡 My Key Learning Today

Today was all about building the right foundation for Selenium automation.

Things I learned today:

✔ How Java powers Selenium automation
✔ Why Eclipse is widely used for automation testing
✔ Maven is already integrated in Eclipse
✔ Dependencies can be managed easily using pom.xml

This setup will help me build clean and scalable automation frameworks.

🤖 A Small Note
I used ChatGPT to help structure and refine this blog while ensuring the concepts remain aligned with my trainer’s explanations.

How to Create the Perfect OG Image (With AI + A Simple Screenshot)

What is an OG Image?

OG (Open Graph) image is the preview image that shows up when you share a link on social media, Slack, Discord, or any platform that unfurls URLs.

You’ve seen it hundreds of times — that card with an image, title, and description that appears when someone drops a link in a chat.

It’s set with a simple meta tag in your HTML:

<meta property="og:image" content="/og-image.png" />

Why Should You Care?

  • Links with OG images get significantly more clicks
  • It makes your site look professional and intentional
  • Without one, platforms show a blank card or a random page element

The Easiest Way to Create One

You don’t need Figma or Photoshop. Ask any AI (ChatGPT, Claude, etc.) to:

“Create an HTML file for an OG image with my name, title, and website URL. Use a dark background, 1200×630 dimensions.”

You’ll get a simple HTML file with a styled card. Open it in your browser, and screenshot it.

Here’s a minimal example:

<!doctype html>
<html>
  <head>
    <style>
      body {
        margin: 0;
        display: flex;
        align-items: center;
        justify-content: center;
        height: 100vh;
        background: #111;
      }
      .card {
        width: 1200px;
        height: 630px;
        background: #1a1a2e;
        display: flex;
        flex-direction: column;
        align-items: center;
        justify-content: center;
        gap: 20px;
        font-family: system-ui, sans-serif;
      }
      .name { font-size: 56px; color: #fff; }
      .title { font-size: 28px; color: #888; }
    </style>
  </head>
  <body>
    <div class="card">
      <div class="name">Your Name</div>
      <div class="title">Your Title</div>
    </div>
  </body>
</html>

Taking a Pixel-Perfect Screenshot

Here’s the trick — you can’t just screenshot the browser window. You need exactly 1200×630 pixels.

Steps (Chrome):

  1. Open the HTML file in Chrome
  2. Open DevTools (F12)
  3. Click the device toolbar icon (or Ctrl+Shift+M)
  4. Set the dimensions to 1200 x 630

Now here’s the part most people miss:

  1. In the device toolbar, click the three dots menu (⋮)
  2. Select “Add device pixel ratio”
  3. Set it to 1.0

Without this step, if your display runs at 2x scaling (most modern laptops), your screenshot will be 2400×1260 instead of 1200×630.

  4. Click the three dots menu (⋮) again → “Capture screenshot”

Done. You now have a pixel-perfect 1200×630 OG image.
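If you want to double-check the result, PNG files store their width and height at fixed byte offsets in the IHDR chunk, so a few lines of stdlib Python can verify the capture without opening an image editor (the file name is a placeholder):

```python
# Sanity check: read the PNG's IHDR chunk (width/height live at fixed
# byte offsets) to confirm the capture is exactly 1200x630.
import struct

def png_size(path):
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])  # (width, height)

# print(png_size("og-image.png"))  # expect (1200, 630)
```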

Adding It to Your Site

Drop the image in your public folder (for Vite/Next.js) or root directory, then add these meta tags to your <head>:

<meta property="og:image" content="/og-image.png" />
<meta property="og:title" content="Your Name - Your Title" />
<meta property="og:description" content="A short description." />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:image" content="/og-image.png" />
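To sanity-check the tags before deploying, a small stdlib Python sketch can parse your page's HTML and report which of the expected meta tags are missing (the required set below simply mirrors the tags shown above):

```python
# Pre-deploy check: parse the page's HTML and report which of the
# expected OG/Twitter meta tags are missing. Stdlib only.
from html.parser import HTMLParser

REQUIRED = {"og:image", "og:title", "og:description", "twitter:card"}

class MetaCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            # OG tags use property=..., Twitter tags use name=...
            key = a.get("property") or a.get("name")
            if key in REQUIRED:
                self.found.add(key)

def missing_meta(html):
    parser = MetaCollector()
    parser.feed(html)
    return REQUIRED - parser.found

print(missing_meta('<meta property="og:image" content="/og-image.png" />'))
# reports every required tag except og:image as missing
```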

Test It

After deploying, paste your URL into these tools to verify:

  • OpenGraph.xyz
  • Twitter Card Validator

That’s it. A 5-minute task that makes every shared link to your site look polished.

How to Scrape Data Directly in Google Sheets (No Code Required)

A practical guide to web scraping inside Google Sheets using built-in functions and add-ons like Unlimited Sheets — no Python, no terminal, no hassle.

You don’t need Python, Selenium, or a terminal window to scrape the web. Google Sheets has built-in functions that can pull data from websites directly into your cells. And when those hit their limits, add-ons like Unlimited Sheets take things much further.

In this post I’ll walk you through both approaches: the native functions you can use right now, and how to level up with dedicated scraping functions when you need more power.

If you want to jump straight to the solution for web scraping with Google Sheets, go here: https://unlimitedsheets.com/scraping

The Built-in Functions

Google Sheets ships with a handful of IMPORT functions that act as lightweight scrapers. No extensions, no setup — just type a formula.

IMPORTXML — The Swiss Army Knife

This is the most versatile native option. It fetches structured data from any URL using XPath queries.

=IMPORTXML("https://quotes.toscrape.com/", "//span[@class='text']/text()")

This pulls every quote from the page. The first argument is the URL; the second is an XPath expression that targets specific HTML elements.

A few practical uses:

  • Extract all <h2> headings from a page: =IMPORTXML(A1, "//h2")
  • Get every link: =IMPORTXML(A1, "//a/@href")
  • Pull meta descriptions: =IMPORTXML(A1, "//meta[@name='description']/@content")

IMPORTHTML — Tables and Lists

If your target data lives in an HTML <table> or <ul>/<ol>, this one’s simpler:

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_highest-grossing_films", "table", 1)

The third parameter is the index (starting at 1) of which table on the page you want. Great for pulling financial data, rankings, sports stats, or anything already organized in a table.

IMPORTDATA — CSV and TSV Files

When a URL points directly to a .csv or .tsv file, this function imports it cleanly:

=IMPORTDATA("https://example.com/data/export.csv")

Useful for government open data portals, public datasets, and API endpoints that return CSV.

IMPORTFEED — RSS and Atom

Need to track blog posts, news headlines, or podcast episodes? This function parses RSS and Atom feeds:

=IMPORTFEED("https://rss.nytimes.com/services/xml/rss/nyt/Technology.xml", "items title", FALSE, 10)

This returns the titles of the latest 10 items from the feed.

The Problem with Native Functions

These built-in functions are great for quick tasks, but they hit a wall fast:

  • No JavaScript rendering. If the page loads content dynamically (React, Vue, SPAs), you’ll get nothing.
  • Rate limits. Google throttles these functions. Too many calls and they start returning errors.
  • No custom headers. You can’t set a user-agent, cookies, or authentication tokens. Many sites will block you.
  • No CSS selectors. You’re stuck with XPath, which has a steeper learning curve than CSS selectors.
  • Fragile at scale. Drag a formula down 500 rows and watch things break.

For anything beyond basic extraction from static pages, you need something more robust.

Leveling Up with Unlimited Sheets

Unlimited Sheets is a Google Sheets add-on that extends your spreadsheet with 30+ functions for web scraping, SEO, and AI — all usable as regular cell formulas.

For scraping specifically, two functions stand out:

=scrapeByCssPath(url, cssSelector)

If you’re comfortable with CSS selectors (and most web developers are), this is significantly more intuitive than writing XPath:

=scrapeByCssPath("https://example.com", "h1.title")

Target elements by class, ID, attribute, or any valid CSS selector — the same syntax you’d use in document.querySelector().

=scrapeByXPath(url, xpathQuery)

Prefer XPath? This function works like IMPORTXML but runs through Unlimited Sheets’ infrastructure, which means better reliability, no Google rate limits, and support for more complex queries.

=scrapeByXPath("https://example.com", "//div[@class='price']/text()")

Why Use an Add-on Over Native Functions?

The key advantages:

  • Reliability. Requests go through dedicated infrastructure instead of Google’s shared servers, so you’re far less likely to get blocked or throttled.
  • CSS selector support. No more wrestling with XPath for simple extractions.
  • Combine scraping with AI. Unlimited Sheets also includes GPT-4 and Claude functions, so you can scrape a page and then process the content with AI in the next column — all as formulas.
  • SEO functions in the same toolkit. Need keyword positions, search volumes, or SERP data alongside your scrapes? It’s all there.

A Practical Workflow Example

Let’s say you want to monitor competitor pricing. Here’s a realistic workflow entirely inside Google Sheets:

Column A          Column B                                  Column C
Competitor URL    =scrapeByCssPath(A2, ".product-price")    =AI("Compare this price to ours: " & B2)
  1. Column A: List your competitor product URLs
  2. Column B: Use scrapeByCssPath to extract the price element
  3. Column C: Feed the scraped data into an AI function for analysis

No scripts. No external tools. No context switching. Everything lives in the spreadsheet.

When to Use What

Scenario                               Best tool
Quick one-off table extraction         IMPORTHTML
Pulling structured data with XPath     IMPORTXML
Importing a public CSV                 IMPORTDATA
Reliable scraping at scale             Unlimited Sheets (scrapeByCssPath / scrapeByXPath)
Scraping + AI processing               Unlimited Sheets (scraping + AI functions combined)
Dynamic/JS-rendered pages              Dedicated scraping API or headless browser

Getting Started

For the native functions, you’re already set — just open a Google Sheet and start typing.

For Unlimited Sheets:

  1. Install it from the Google Workspace Marketplace
  2. Create a free account at unlimitedsheets.com
  3. Start using the functions in any cell

The free tier gives you access to several utility functions, and premium unlocks the scraping, SEO, and AI capabilities.

Web scraping doesn’t have to mean setting up a Python environment or maintaining a codebase. For a huge range of use cases, Google Sheets — especially with the right add-on — is more than enough. Start with the built-in functions, and when you need more power, give Unlimited Sheets a try.

Happy scraping. 🕷️