Merged
41 changes: 41 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,41 @@
name: CI

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]
    paths:
      - '**/*.py'
      - 'requirements.txt'
      - '.github/workflows/ci.yml'

jobs:
  lint-test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Syntax check
        run: |
          python -m py_compile $(git ls-files '*.py' | tr '\n' ' ')

      - name: Import smoke
        run: |
          python - << 'PY'
          from importlib import import_module
          import_module('main')
          print('Import OK')
          PY

35 changes: 35 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,35 @@
name: Docker Build & Push (Backend)

on:
  push:
    branches: [ main ]
    paths:
      - '**'
      - '!README.md'

jobs:
  docker:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:backend-latest
            ghcr.io/${{ github.repository }}:backend-${{ github.sha }}

20 changes: 20 additions & 0 deletions Dockerfile
@@ -0,0 +1,20 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# System dependencies for pdf2image
RUN apt-get update && apt-get install -y --no-install-recommends \
    poppler-utils \
    && rm -rf /var/lib/apt/lists/*

COPY Backend/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

COPY Backend/ ./

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

55 changes: 55 additions & 0 deletions README.md
@@ -0,0 +1,55 @@
# Noteflow Backend (FastAPI)

## Overview
- FastAPI backend for Noteflow
- OCR pipeline supports images, PDF, DOC/DOCX, HWP (via utilities and system tools)

## Run (local)
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8080 --reload
```

Environment variables (optional):
- `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES`
- Database URLs, if you connect a database (the current code uses the provided models)

## OCR system tools (optional but recommended)
- PyMuPDF (Python) is used by default for PDF text extraction.
- Optional fallbacks/tools:
  - Poppler (`pdftoppm`) for `pdf2image`
  - LibreOffice (`soffice`) for .doc → .pdf
  - `hwp5txt` for .hwp text extraction
- If any of these tools are missing, the API still returns 200 and includes `warnings` explaining the limitations.
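
The degrade-with-warnings behaviour described above could be sketched roughly as follows. This is an illustration only, not the backend's actual code: `OPTIONAL_TOOLS`, `collect_tool_warnings`, and `build_ocr_response` are hypothetical names, and the response shape beyond a `warnings` field is an assumption.

```python
import shutil

# Hypothetical mapping: executable name -> feature that degrades without it.
OPTIONAL_TOOLS = {
    "pdftoppm": "pdf2image rasterization of PDF pages",
    "soffice": ".doc to .pdf conversion",
    "hwp5txt": ".hwp text extraction",
}


def collect_tool_warnings(which=shutil.which):
    """Return one warning string per optional tool that is not on PATH."""
    warnings = []
    for exe, feature in OPTIONAL_TOOLS.items():
        if which(exe) is None:
            warnings.append(f"'{exe}' not found: {feature} is unavailable")
    return warnings


def build_ocr_response(text, which=shutil.which):
    """Assemble a success payload that carries warnings instead of failing."""
    # Assumed payload shape; only the `warnings` key comes from the README.
    return {"text": text, "warnings": collect_tool_warnings(which)}
```

Passing `which` as a parameter keeps the lookup easy to replace in tests; by default it uses `shutil.which`, which returns `None` when an executable is not on `PATH`.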

## API Highlights
- `POST /api/v1/files/ocr` — OCR and create note (accepts file + optional `folder_id`, `langs`, `max_pages`)
- `POST /api/v1/files/upload` — Upload files to folder
- `POST /api/v1/files/audio` — STT from audio, create/append to note

## CI (GitHub Actions)
- This folder includes `.github/workflows/ci.yml`, which runs a syntax check and an import smoke test on pushes and pull requests to `main`.
- The job uses Python 3.11 and installs dependencies with `pip install -r requirements.txt`.
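
The syntax-check step can be approximated locally before pushing. The sketch below assumes you pass it the same file list the workflow uses (the output of `git ls-files '*.py'`); `syntax_check` is a hypothetical helper, not part of this repository.

```python
import py_compile


def syntax_check(paths):
    """Byte-compile each file; return (path, message) for every failure."""
    failures = []
    for path in paths:
        try:
            # doraise=True turns compile errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((path, str(exc)))
    return failures
```

An empty return value means every file compiled. Unlike the import smoke test, this does not execute any module code, so it will not catch missing dependencies.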

## Docker (optional; for later)
- Dockerfile included. Build & run locally:
```
docker build -t noteflow-backend .
docker run --rm -p 8080:8080 noteflow-backend
```
- GitHub Actions container build:
  - `.github/workflows/docker.yml` pushes to GHCR:
    - `ghcr.io/<owner>/<repo>:backend-latest`
    - `ghcr.io/<owner>/<repo>:backend-<sha>`
- Deployment example (SSH) once you’re ready:
```
docker login ghcr.io -u <USER> -p <TOKEN>
docker pull ghcr.io/<owner>/<repo>:backend-latest
docker run -d --name backend --restart=always -p 8080:8080 ghcr.io/<owner>/<repo>:backend-latest
```

## Notes
- If you split this folder into its own repository root, the included `.github/workflows/*.yml` files will work as-is.
- OCR takes a model-first path (EasyOCR + TrOCR) and falls back to Tesseract when it is available.
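
The model-first fallback order could be expressed as a simple chain of engines tried in sequence. This is a sketch under stated assumptions, not the backend's implementation: `run_ocr` is a hypothetical function, each engine is any callable that takes image data and returns text, and "empty output or an exception" as the trigger for falling back is an assumption.

```python
def run_ocr(image, engines):
    """Try each (name, callable) pair in order; return the first non-empty text."""
    warnings = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # an unavailable or failing engine is skipped
            warnings.append(f"{name} failed: {exc}")
            continue
        if text and text.strip():
            return {"text": text, "engine": name, "warnings": warnings}
        warnings.append(f"{name} returned no text")
    # Every engine failed or produced nothing.
    return {"text": "", "engine": None, "warnings": warnings}
```

In this arrangement the model-based engines would be listed first and a Tesseract wrapper last, so the fallback runs only when the earlier engines produce nothing usable.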
22 changes: 16 additions & 6 deletions requirements.txt
@@ -3,21 +3,31 @@ uvicorn
 pydantic
 sqlalchemy
 mysql-connector-python
-dotenv
+python-dotenv
 google-auth
 requests
 python-jose[cryptography]
 bcrypt

-torch==2.3.0+cu121
-torchaudio==2.3.0+cu121
-torchvision==0.18.0+cu121
---extra-index-url https://download.pytorch.org/whl/cu121
+# PyTorch (macOS: the CPU/MPS build is installed automatically)
+torch==2.3.0
+torchvision==0.18.0
+torchaudio==2.3.0

 transformers>=4.40.0
 accelerate
 sentencepiece
 protobuf
 python-multipart
 easyocr
-whisper
+whisper
+pytesseract
+pdf2image
+PyMuPDF
+python-docx
+
+langchain>=0.2.0
+langchain-community
+langchain-core
+langchain-openai
+langchain-ollama