2026-06-01 16:31:17 +08:00
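## doc_convert module
A standalone document-conversion service. It consumes conversion tasks from MQ, converts Word documents into paginated HTML, uploads the output to OSS, and builds a ZIP package that the frontend can download.
### Feature overview
- Consumes tasks from MQ (runs as a container for long-running jobs)
- Converts Word documents to HTML, split into parts by page
- Uploads the HTML and its assets to OSS
- Packages the output into a ZIP and uploads it to OSS
- Publishes a result message (MQ callback)
- Handles idempotency, deduplication, and stale versions (Redis)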
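The checks in the last item can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the module's implementation: it assumes a set of processed `taskId`s for idempotency, the highest accepted `version` per `docId` for the stale-version check, and a `fileHash` index for deduplication. `TaskGate`, `Decision`, and the map layout are hypothetical; a `ConcurrentHashMap` stands in for Redis, and TTLs are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pre-conversion gate; in-memory maps stand in for Redis keys. */
class TaskGate {

    public enum Decision { PROCESS, DUPLICATE_TASK, STALE_VERSION, REUSE_BY_HASH }

    // ASSUMPTION: taskId -> marker (in Redis: a key with TTL doc-convert.task-ttl-hours)
    private final Map<String, Boolean> seenTasks = new ConcurrentHashMap<>();
    // ASSUMPTION: docId -> highest version accepted so far
    private final Map<String, Integer> latestVersion = new ConcurrentHashMap<>();
    // ASSUMPTION: fileHash -> manifest key of an earlier conversion (TTL doc-convert.hash-ttl-days)
    private final Map<String, String> hashIndex = new ConcurrentHashMap<>();

    public Decision check(String taskId, String docId, int version, String fileHash) {
        // 1. Idempotency: a given taskId is handled at most once.
        if (seenTasks.putIfAbsent(taskId, Boolean.TRUE) != null) {
            return Decision.DUPLICATE_TASK;
        }
        // 2. Stale version: drop tasks older than the newest version seen for this document.
        int newest = latestVersion.merge(docId, version, Integer::max);
        if (version < newest) {
            return Decision.STALE_VERSION;
        }
        // 3. Deduplication: identical file content was already converted, so reuse its output.
        if (fileHash != null && hashIndex.containsKey(fileHash)) {
            return Decision.REUSE_BY_HASH;
        }
        return Decision.PROCESS;
    }

    /** Record a finished conversion so later tasks with the same hash can reuse it. */
    public void recordResult(String fileHash, String manifestKey) {
        hashIndex.put(fileHash, manifestKey);
    }
}
```

With real Redis, the first check maps naturally onto `SET key value NX EX <seconds>`, which makes check-and-mark a single atomic step across consumer instances.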
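### Prerequisites
- Place `aspose-words-23.1.jar` in `api/doc_convert/lib/` and make sure it is properly licensed.
- Configure Redis (used for the OSS configuration and for idempotency/deduplication).
- Configure RabbitMQ.
### Task message example (JSON)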
```json
{
  "taskId": "t-10001",
  "docId": "doc-888",
  "version": 3,
  "ossKey": "upload/documents/xxx.docx",
  "ossConfigKey": "default",
  "fileHash": "sha256...",
  "splitPages": 10,
  "options": {
    "splitPages": 10,
    "embedImages": false,
    "extractImages": true,
    "cssInline": true,
    "outputPrefix": "doc_convert"
  },
  "callback": {
    "exchange": "doc_convert.result.exchange",
    "routingKey": "doc_convert.result"
  }
}
```
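A typed model of this message might look like the records below (Java 16+). The message carries `splitPages` both at the top level and inside `options`; which one wins is not stated here, so the precedence in `effectiveSplitPages` (options first, then top level, then `doc-convert.default-split-pages`) is an assumption. All record and method names are hypothetical.

```java
/** Hypothetical model of the task message; component names mirror the JSON keys. */
record ConvertOptions(Integer splitPages, boolean embedImages, boolean extractImages,
                      boolean cssInline, String outputPrefix) {}

record Callback(String exchange, String routingKey) {}

record ConvertTask(String taskId, String docId, int version, String ossKey, String ossConfigKey,
                   String fileHash, Integer splitPages, ConvertOptions options, Callback callback) {

    /**
     * Resolve the number of pages per HTML part.
     * ASSUMPTION: options.splitPages beats the top-level splitPages,
     * which beats doc-convert.default-split-pages.
     */
    int effectiveSplitPages(int defaultSplitPages) {
        Integer fromOptions = options == null ? null : options.splitPages();
        int resolved = fromOptions != null ? fromOptions
                : splitPages != null ? splitPages
                : defaultSplitPages;
        if (resolved <= 0) {
            throw new IllegalArgumentException("splitPages must be positive: " + resolved);
        }
        return resolved;
    }
}
```

A JSON library such as Jackson (2.12 or later supports records) could bind the message body to `ConvertTask` directly.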
### Result message example (JSON)
```json
{
  "taskId": "t-10001",
  "docId": "doc-888",
  "version": 3,
  "status": "ready",
  "message": "ok",
  "pageCount": 42,
  "splitPages": 10,
  "zipUrl": "https://oss.xxx/doc_convert/doc-888/3/zip/doc-888_3.zip",
  "manifestUrl": "https://oss.xxx/doc_convert/doc-888/3/manifest.json",
  "parts": [
    {
      "startPage": 1,
      "endPage": 10,
      "htmlKey": "doc_convert/doc-888/3/parts/part_1-10.html",
      "htmlUrl": "https://oss.xxx/doc_convert/doc-888/3/parts/part_1-10.html",
      "assetsPrefix": "doc_convert/doc-888/3/assets/part_1-10"
    }
  ]
}
```
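The example lists only one entry in `parts`. Assuming the document is cut into consecutive ranges of `splitPages` pages, with the last range holding whatever remains, the ranges and OSS keys can be derived as in this sketch. The key layout is copied from the example; `Part`, `PartPlanner`, and the `baseUrl` parameter (standing in for the elided `https://oss.xxx` host) are hypothetical. For the example's `pageCount` of 42 and `splitPages` of 10 it yields five parts, the last covering pages 41 to 42.

```java
import java.util.ArrayList;
import java.util.List;

/** One entry of the "parts" array in the result message. */
record Part(int startPage, int endPage, String htmlKey, String htmlUrl, String assetsPrefix) {}

/** Hypothetical helper deriving page ranges and OSS keys, following the example's key layout. */
final class PartPlanner {

    static List<Part> plan(String outputPrefix, String docId, int version,
                           int pageCount, int splitPages, String baseUrl) {
        if (splitPages <= 0) {
            throw new IllegalArgumentException("splitPages must be positive");
        }
        String root = outputPrefix + "/" + docId + "/" + version;
        List<Part> parts = new ArrayList<>();
        // ASSUMPTION: consecutive ranges of splitPages pages; the last range takes the remainder.
        for (int start = 1; start <= pageCount; start += splitPages) {
            int end = Math.min(start + splitPages - 1, pageCount);
            String name = "part_" + start + "-" + end;
            String htmlKey = root + "/parts/" + name + ".html";
            parts.add(new Part(start, end, htmlKey, baseUrl + "/" + htmlKey, root + "/assets/" + name));
        }
        return parts;
    }

    static String zipKey(String outputPrefix, String docId, int version) {
        return outputPrefix + "/" + docId + "/" + version + "/zip/" + docId + "_" + version + ".zip";
    }

    static String manifestKey(String outputPrefix, String docId, int version) {
        return outputPrefix + "/" + docId + "/" + version + "/manifest.json";
    }
}
```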
### Packaging
Example ZIP layout:
```
manifest.json
parts/
  part_1-10.html
  part_1-10.css
assets/
  part_1-10/
    image1.png
```
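Packaging this layout needs nothing beyond `java.util.zip`. The sketch below, built around a hypothetical `ZipPackager` class, assumes all file contents are already in memory; for large documents, streaming the entries to a temporary file before uploading would keep memory bounded. The OSS upload itself is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical packager: writes (path inside ZIP -> bytes) entries into an in-memory ZIP. */
final class ZipPackager {

    /**
     * Paths use '/' separators, e.g. "parts/part_1-10.html" or "assets/part_1-10/image1.png".
     * Pass a LinkedHashMap to keep entries in a predictable order.
     */
    static byte[] pack(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ZipOutputStream writes the central directory, so read the buffer afterwards.
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }
}
```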
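### Configuration
See `api/doc_convert/src/main/resources/application-*.yml`. Relevant properties:
- `doc-convert.default-split-pages`: default number of pages per HTML part
- `doc-convert.output-prefix`: prefix for OSS output keys
- `doc-convert.task-ttl-hours`: TTL, in hours, of the cached task records used for idempotency
- `doc-convert.hash-ttl-days`: TTL, in days, of the file-hash records used for deduplication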
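These keys could be bound to a small typed holder like the one below. The record is a hypothetical sketch: in a Spring Boot application it could be annotated with `@ConfigurationProperties(prefix = "doc-convert")`, whose relaxed binding maps kebab-case keys such as `default-split-pages` onto `defaultSplitPages`. The annotation is left out so the snippet compiles with the JDK alone, and the validation rules are assumptions.

```java
import java.time.Duration;

/** Hypothetical typed view of the doc-convert.* properties. */
record DocConvertProperties(int defaultSplitPages, String outputPrefix,
                            long taskTtlHours, long hashTtlDays) {

    // ASSUMPTION: reject non-positive values at startup rather than at first use.
    DocConvertProperties {
        if (defaultSplitPages <= 0) {
            throw new IllegalArgumentException("default-split-pages must be > 0");
        }
        if (taskTtlHours <= 0 || hashTtlDays <= 0) {
            throw new IllegalArgumentException("TTL values must be > 0");
        }
    }

    /** TTL for idempotency task records (task-ttl-hours). */
    Duration taskTtl() {
        return Duration.ofHours(taskTtlHours);
    }

    /** TTL for file-hash deduplication records (hash-ttl-days). */
    Duration hashTtl() {
        return Duration.ofDays(hashTtlDays);
    }
}
```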
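### Run
Build the `doc_convert` module, together with the modules it depends on (`-am`), using the `dev` profile: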
```bash
mvn -pl doc_convert -am clean package -P dev
```