Deploying an LLM model serving service to EC2 with FastAPI

FastAPI is a modern, high-performance Python web framework, available since Python 3.6.

Testing a server that serves the t5-small model

server.py

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import pipeline

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


# Serve the T5 model
pipe = pipeline("text2text-generation", model="t5-small")
model = HuggingFacePipeline(pipeline=pipe)
add_routes(
    app,
    model,
    path="/t5-small",
)


@app.get("/plugin/test")
async def test_plugin():
    return "success"


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=5002)
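Once the server is up, the LangServe route can be exercised over HTTP. A minimal sketch of the request body for the /t5-small/invoke endpoint (the "input" wrapper is LangServe's standard invoke schema; the URL assumes the port from server.py above):

```python
import json


# LangServe exposes POST <path>/invoke; the body wraps the runnable's
# input under an "input" key.
def build_invoke_request(text: str) -> str:
    """Serialize the JSON body for a LangServe /invoke call."""
    return json.dumps({"input": text})


body = build_invoke_request("translate English to German: Hello!")
# POST `body` to http://localhost:5002/t5-small/invoke with
# Content-Type: application/json (e.g. via curl or requests).
```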

Dockerfile

# Base image
FROM python:3.11-slim

# Working directory
WORKDIR /app

# Install required system packages
RUN apt-get clean && apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy the requirements files and the pre-downloaded packages
COPY requirements1.txt .
COPY requirements2.txt .
COPY packages /app/packages

# Install the first requirements file
RUN pip install --no-cache-dir -r requirements1.txt --default-timeout=300 -i https://pypi.tuna.tsinghua.edu.cn/simple

# Install the second requirements file (including the pre-downloaded packages)
RUN pip install --no-cache-dir -r requirements2.txt --find-links=/app/packages --default-timeout=300 --extra-index-url https://pypi.nvidia.com

# Copy the application source code
COPY . .

# Run the FastAPI server
CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "5002"]
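Because the Dockerfile copies the entire build context (COPY . .), a .dockerignore keeps transient files out of the image and speeds up builds; a minimal sketch (the entries are assumptions about a typical project layout):

```
.git/
__pycache__/
*.pyc
.env
```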

requirements1.txt

fastapi
uvicorn
langserve
langchain-community
langchain-openai
langchain-core
transformers
python-dotenv
requests

requirements2.txt

torch
nvidia-cudnn-cu12
sse_starlette

Building the image

% docker build -t lm-test-server:20240724 .   

Running with Docker

% docker run --name lm-test-server -p 5002:5002 lm-test-server:20240724

/usr/local/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:139: LangChainDeprecationWarning: The class `HuggingFacePipeline` was deprecated in LangChain 0.0.37 and will be removed in 0.3. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFacePipeline`.
warn_deprecated(
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5002 (Press CTRL+C to quit)

(LangServe ASCII art banner)

LANGSERVE: Playground for chain "/openai/" is live at:
LANGSERVE: │
LANGSERVE: └──> /openai/playground/
LANGSERVE:
LANGSERVE: Playground for chain "/t5-small/" is live at:
LANGSERVE: │
LANGSERVE: └──> /t5-small/playground/
LANGSERVE:
LANGSERVE: See all available routes at /docs/

LANGSERVE: ⚠️ Using pydantic 2.8.2. OpenAPI docs for invoke, batch, stream, stream_log endpoints will not be generated. API endpoints and playground should work as expected. If you need to see the docs, you can downgrade to pydantic 1. For example, `pip install pydantic==1.10.13`. See https://github.com/tiangolo/fastapi/issues/10360 for details.

Testing in a browser

Open http://localhost:5002/docs in a browser to test.

Deploying the service to EC2

Installing Docker on EC2

# sudo yum update -y
# amazon-linux-extras install docker -y
# service docker start
Redirecting to /bin/systemctl start docker.service
# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
# usermod -aG docker $USER
# newgrp docker

Moving the image

Push the Docker image to ECR.

Docker login

% aws configure
% aws ecr get-login-password --region <your-region> | docker login --username AWS --password-stdin <your-account-id>.dkr.ecr.<your-region>.amazonaws.com
Login Succeeded

The AWS credentials file (~/.aws/credentials) must contain the following:

aws_access_key_id=...
aws_secret_access_key=...
aws_session_token=...
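Note that the keys live under a profile header in that file; a sketch with the values elided ([default] is the profile the AWS CLI uses unless --profile is given):

```
[default]
aws_access_key_id = ...
aws_secret_access_key = ...
aws_session_token = ...
```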

Pushing the Docker image

# Tag the Docker image with the ECR repository URI
docker tag my-project:latest <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/my-project:latest

docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/my-project:latest

Pulling the image on EC2

$ sudo yum install aws-cli -y
$ aws configure
$ export AWS_ACCESS_KEY_ID=[access_key_id]
$ export AWS_SECRET_ACCESS_KEY=[aws_secret_access_key]
$ export AWS_SESSION_TOKEN=[aws_session_token]
$ aws ecr get-login-password --region [region] | docker login --username AWS --password-stdin [id].dkr.ecr.[region].amazonaws.com
$ docker pull [id].dkr.ecr.[region].amazonaws.com/t5-small-fastapi:20240727

Because the EC2 instance used for deployment has no outbound internet access, download the model in advance and bake it into the image.

download_model.py

# download_model.py

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-small"

# Download the model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save the model locally
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")
% pip install transformers
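After download_model.py runs, it is worth sanity-checking that the expected artifacts landed in ./model before baking them into the image. A small sketch, assuming the standard transformers file names (the exact set varies by library version):

```python
from pathlib import Path

# Artifacts we expect save_pretrained to emit
# (assumption: file names as of recent transformers releases).
EXPECTED = {"config.json", "tokenizer_config.json"}


def missing_model_files(model_dir: str) -> set:
    """Return which expected artifacts are absent from model_dir."""
    path = Path(model_dir)
    present = {p.name for p in path.iterdir()} if path.is_dir() else set()
    return EXPECTED - present
```

An empty set means the directory looks complete; otherwise rerun the download before building.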

Notes on deploying with Docker

If the start.sh script runs the Python application in the background, the Docker container can exit on its own, since a container only stays alive as long as its foreground process does.

The script should therefore be modified so the application runs in the foreground.

Revised start.sh

#!/bin/sh

# Install dependencies
poetry install

# Run the application in the foreground; exec replaces the shell so the
# app becomes PID 1 and receives signals (e.g. SIGTERM on docker stop)
exec poetry run python3 src/main.py

"No space left on device" error during a Docker build

A "No space left on device" error during a Docker build means the disk on the host machine running the Docker daemon is full; `docker system df` shows where the space went.

Removing unnecessary Docker images and containers

# Remove all stopped containers
docker container prune -f

# Remove all unused images
docker image prune -a -f

# Remove all unused networks
docker network prune -f

# Remove all unused volumes
docker volume prune -f

# Remove the build cache (often the largest consumer)
docker builder prune -f

IPv6 loopback binding error

Installing dependencies from lock file

No dependencies to install or update

Installing the current project: orchestrator (0.1.0)

/root/.cache/pypoetry/virtualenvs/orchestrator-9TtSrW0h-py3.10/lib/python3.10/site-packages/langchain/chat_models/__init__.py:31: LangChainDeprecationWarning: Importing chat models from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.chat_models import ChatOpenAI`.

To install langchain-community run `pip install -U langchain-community`.
warnings.warn(
2024-07-29 11:42:49,494 - WARNING - WARNING! max_length is not default parameter.
max_length was transferred to model_kwargs.
Please make sure that max_length is what you intended.
2024-07-29 11:42:49,504 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-07-29 11:42:49,738 - DEBUG - https://huggingface.co:443 "GET /api/whoami-v2 HTTP/1.1" 200 757
2024-07-29 11:42:49,740 - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-29 11:42:49,741 - DEBUG - load_verify_locations cafile='/root/.cache/pypoetry/virtualenvs/orchestrator-9TtSrW0h-py3.10/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-29 11:42:49,771 - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-29 11:42:49,772 - DEBUG - load_verify_locations cafile='/root/.cache/pypoetry/virtualenvs/orchestrator-9TtSrW0h-py3.10/lib/python3.10/site-packages/certifi/cacert.pem'
INFO: Started server process [24]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 99] error while attempting to bind on address ('::1', 8002, 0, 0): cannot assign requested address
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful

Fixing main.py

In src/main.py, find the call that launches the server and set the host parameter to 0.0.0.0. Inside the container the app tried to bind the IPv6 loopback ::1, which is not assignable in that network namespace; binding 0.0.0.0 also makes the port reachable through Docker's port mapping.
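A minimal illustration of the failure mode, assuming a typical container network namespace: binding the IPv4 wildcard succeeds, while binding an address that is not assigned to any interface raises an OSError like the [Errno 99] above:

```python
import socket


def can_bind(host: str, family: int = socket.AF_INET) -> bool:
    """Try to bind an ephemeral TCP port on `host`; True if the bind succeeds."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.bind((host, 0))  # port 0 = let the OS pick a free port
        return True
    except OSError:
        return False


# 0.0.0.0 (all IPv4 interfaces) is what uvicorn should bind in a container.
print(can_bind("0.0.0.0"))  # → True
```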

https://hamin7.github.io/2024/07/24/fastAPI/

Author: hamin
Posted on: 2024-07-24
Updated on: 2024-07-29
