grok-build/latest/content · 0.2.69

skills/xlsx/scripts/office/unpack.py

Skill·3.9 KB·132 lines

"""Unpack Office files (DOCX, PPTX, XLSX) for editing.

Extracts the ZIP archive, pretty-prints XML files, and optionally:
- Merges adjacent runs with identical formatting (DOCX only)
- Simplifies adjacent tracked changes from the same author (DOCX only)

Usage:
    python unpack.py <office_file> <output_dir> [options]

Examples:
    python unpack.py document.docx unpacked/
    python unpack.py presentation.pptx unpacked/
    python unpack.py document.docx unpacked/ --merge-runs false
"""

import argparse
import sys
import zipfile
from pathlib import Path

import defusedxml.minidom
from helpers.merge_runs import merge_runs as do_merge_runs
from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines

SMART_QUOTE_REPLACEMENTS = {
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
}


def unpack(
    input_file: str,
    output_directory: str,
    merge_runs: bool = True,
    simplify_redlines: bool = True,
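) -> tuple[None, str]:
    """Unpack an Office file into output_directory.

    Returns (None, message). The first element is always None; failures are
    reported through a message that starts with "Error".
    """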
    input_path = Path(input_file)
    output_path = Path(output_directory)
    suffix = input_path.suffix.lower()

    if not input_path.exists():
        return None, f"Error: {input_file} does not exist"

    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file"

    try:
        output_path.mkdir(parents=True, exist_ok=True)

        with zipfile.ZipFile(input_path, "r") as zf:
            zf.extractall(output_path)

        xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels"))
        for xml_file in xml_files:
            _pretty_print_xml(xml_file)

        message = f"Unpacked {input_file} ({len(xml_files)} XML files)"

        if suffix == ".docx":
            if simplify_redlines:
                simplify_count, _ = do_simplify_redlines(str(output_path))
                message += f", simplified {simplify_count} tracked changes"

            if merge_runs:
                merge_count, _ = do_merge_runs(str(output_path))
                message += f", merged {merge_count} runs"

        for xml_file in xml_files:
            _escape_smart_quotes(xml_file)

        return None, message

    except zipfile.BadZipFile:
        return None, f"Error: {input_file} is not a valid Office file"
    except Exception as e:
        return None, f"Error unpacking: {e}"


def _pretty_print_xml(xml_file: Path) -> None:
    """Re-indent one XML file in place."""
    try:
        content = xml_file.read_text(encoding="utf-8")
        dom = defusedxml.minidom.parseString(content)
        xml_file.write_bytes(dom.toprettyxml(indent="  ", encoding="utf-8"))
    except Exception:
        # Best-effort: a file that cannot be read or parsed keeps its extracted form.
        pass


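def _escape_smart_quotes(xml_file: Path) -> None:
    """Replace literal curly quotes in one file with numeric character references."""
    try:
        content = xml_file.read_text(encoding="utf-8")
        for char, entity in SMART_QUOTE_REPLACEMENTS.items():
            content = content.replace(char, entity)
        xml_file.write_text(content, encoding="utf-8")
    except Exception:
        # Best-effort: skip files that cannot be read or written as UTF-8.
        pass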


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Unpack an Office file (DOCX, PPTX, XLSX) for editing"
    )
    parser.add_argument("input_file", help="Office file to unpack")
    parser.add_argument("output_directory", help="Output directory")
    parser.add_argument(
        "--merge-runs",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent runs with identical formatting (DOCX only, default: true)",
    )
    parser.add_argument(
        "--simplify-redlines",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent tracked changes from same author (DOCX only, default: true)",
    )
    args = parser.parse_args()

    _, message = unpack(
        args.input_file,
        args.output_directory,
        merge_runs=args.merge_runs,
        simplify_redlines=args.simplify_redlines,
    )
    print(message)

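    # Every failure message starts with "Error"; a substring test would also
    # match a successful unpack of a file whose name contains "Error".
    if message.startswith("Error"):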
        sys.exit(1)
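
The two per-file passes above (pretty-printing, then smart-quote escaping) can be tried on an in-memory string without touching the filesystem. The following is a minimal sketch, not the script's own code path: it assumes the standard-library `xml.dom.minidom` as a stand-in for `defusedxml.minidom` so that it runs without third-party packages, and the function names `pretty_print` and `escape_smart_quotes` and the `urn:example` namespace are hypothetical.

```python
from xml.dom import minidom  # assumption: stdlib stand-in for defusedxml.minidom

# Same mapping as in unpack.py: curly quotes -> numeric character references.
SMART_QUOTE_REPLACEMENTS = {
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
}


def pretty_print(xml_text: str) -> str:
    """Parse an XML string and return it re-indented with two spaces."""
    dom = minidom.parseString(xml_text)
    # With an encoding argument, toprettyxml returns bytes, so decode them.
    return dom.toprettyxml(indent="  ", encoding="utf-8").decode("utf-8")


def escape_smart_quotes(text: str) -> str:
    """Replace literal curly quotes with their numeric character references."""
    for char, entity in SMART_QUOTE_REPLACEMENTS.items():
        text = text.replace(char, entity)
    return text


# Hypothetical sample; the namespace URI is a placeholder, not the real WordprocessingML one.
sample = '<w:p xmlns:w="urn:example"><w:r><w:t>\u201cHi\u201d</w:t></w:r></w:p>'
result = escape_smart_quotes(pretty_print(sample))
print(result)

# Parsing the escaped text again yields the original characters.
reparsed = minidom.parseString(result)
text_back = reparsed.getElementsByTagName("w:t")[0].firstChild.data
```

Because the escaped form is a numeric character reference, an XML parser reading the file back sees the same curly-quote characters, so the pass changes how the text appears on disk but not the parsed content.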