SA
๋ฉ”์ธ ๋‚ด์šฉ์œผ๋กœ ์ด๋™

Project DANSO ๐Ÿชˆ: Document Abstract Notation for Semantic Operations

DANSO is a specialized markdown parser designed to enhance the performance of Language Models (LLMs) when processing markdown documents. It addresses the limitations of naive document processing approaches and provides a targeted solution to improve processing quality, coherence, and fidelity to the original content.

Key Featuresโ€‹

  • Shallow Parsing โ€” DANSO performs a shallow parsing of markdown elements, categorizing them into frontmatter, MDX/HTML tags, code blocks, and paragraphs. This allows LLMs to better understand the structure and context of the document.
  • Targeted Logic โ€” Each type of parsed entity undergoes corresponding processing logic, ensuring that only the necessary parts are processed while preserving the structure and functionality of the document.
  • Frontmatter Handling โ€” DANSO identifies and processes only the string values in the frontmatter, such as title and description, keeping the frontmatter structure intact.
  • MDX/HTML Tag Preservation โ€” The parser recognizes MDX and HTML tags and processes only the inner strings, preserving the tags and their functionality.
  • Code Block Extraction โ€” Code blocks are extracted as a whole, along with basic metadata. DANSO can optionally process code comments while keeping the code itself unmodified.
  • Paragraph Context โ€” The raw text of each paragraph is provided as a whole, allowing LLMs to understand the context and generate more coherent outputs.

Why DANSO?โ€‹

Traditional document processing approaches often face challenges when dealing with markdown documents, such as:

  • Unintentional processing of MDX tags and modification of code blocks, leading to syntax errors and altered functionality.
  • Limited context due to parsing at the HTML tag level, resulting in suboptimal outputs and difficulty in reordering sentences.
  • Improper handling of proper nouns, causing confusion and altering the intended meaning.

DANSO addresses these challenges by providing a specialized parsing approach that preserves important elements, maintains context, and applies appropriate processing strategies to different parts of the document. This leads to improved processing quality, coherence, and fidelity to the original content.

Usageโ€‹

To use DANSO in your project, follow these steps:

  1. Install the DANSO package:
    npm install danso
  2. Import the DANSO parser in your code:
    import { parse } from 'danso'
  3. Parse your markdown document using DANSO:
    const parsed = parse(markdownString)
  4. Process the parsed entities according to your requirements, such as further analysis or transformation.

Contributingโ€‹

We welcome contributions to DANSO! If you encounter any issues or have suggestions for improvements, please open an issue on our GitHub repository. Feel free to submit pull requests with bug fixes, enhancements, or new features.

Licenseโ€‹

DANSO is open-source software licensed under the MIT License. You are free to use, modify, and distribute the software in accordance with the terms of the license.

Acknowledgementsโ€‹

We would like to thank the contributors and the open-source community for their valuable input and support in developing DANSO.

Contactโ€‹

For any questions or inquiries, please contact us at hey@cho.sh


Happy parsing and processing with DANSO ๐Ÿชˆ!

Inspired & Modified from wobsoriano/bun-lib-starter: My minimal bun library starter

์ด ๋ฌธ์„œ๋ฅผ ์–ธ๊ธ‰ํ•œ ๋ฌธ์„œ๋“ค