# Jina-Serve

☁️ Build multimodal AI applications with a cloud-native stack

<a href="https://pypi.org/project/jina/"><img alt="PyPI" src="https://img.shields.io/pypi/v/jina?label=Release&style=flat-square"></a>
<a href="https://discord.jina.ai"><img src="https://img.shields.io/discord/1106542220112302130?logo=discord&logoColor=white&style=flat-square"></a>
<a href="https://pypistats.org/packages/jina"><img alt="PyPI - Downloads from official pypistats" src="https://img.shields.io/pypi/dm/jina?style=flat-square"></a>
<a href="https://github.com/jina-ai/jina/actions/workflows/cd.yml"><img alt="Github CD status" src="https://github.com/jina-ai/jina/actions/workflows/cd.yml/badge.svg"></a>

Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

## Key Features

- Native support for all major ML frameworks and data types
- High-performance service design with scaling, streaming, and dynamic batching
- LLM serving with streaming output
- Built-in Docker integration and Executor Hub
- One-click deployment to Jina AI Cloud
- Enterprise-ready with Kubernetes and Docker Compose support

<details>
<summary><strong>Comparison with FastAPI</strong></summary>

Key advantages over FastAPI:

- DocArray-based data handling with native gRPC support
- Built-in containerization and service orchestration
- Seamless scaling of microservices
- One-command cloud deployment

</details>

## Install

```bash
pip install jina
```

See guides for [Apple Silicon](https://jina.ai/serve/get-started/install/apple-silicon-m1-m2/) and [Windows](https://jina.ai/serve/get-started/install/windows/).

## Core Concepts

Three main layers:

- **Data**: BaseDoc and DocList for input/output
- **Serving**: Executors process Documents, Gateway connects services
- **Orchestration**: Deployments serve Executors, Flows create pipelines

## Build AI Services

Let's create a gRPC-based AI service using StableLM:

```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # the text-generation pipeline returns a list of dicts per prompt
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations
```

Deploy with Python or YAML:

```python
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
```

```yaml
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
```

Use the client:

```python
from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
print(response[0].text)
```

## Build Pipelines

Chain services into a Flow:

```python
from jina import Flow

# TextToImage is a second Executor, e.g. one pulled from Executor Hub
flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
```
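A Flow exposes the same client interface as a single Deployment: requests enter at the Gateway, pass through each Executor in order, and the last Executor's output is returned. Below is a minimal sketch of querying this pipeline; it assumes the `TextToImage` Executor returns DocArray's built-in `ImageDoc` (the actual return schema depends on how that Executor is written).

```python
from jina import Client
from docarray import DocList
from docarray.documents import ImageDoc

from executor import Prompt

# The Gateway listens on the Flow's port; each request traverses
# StableLM -> TextToImage before the result comes back.
client = Client(port=12345)
images = client.post(
    '/',
    inputs=[Prompt(text='an expressive oil painting of a lighthouse at dusk')],
    return_type=DocList[ImageDoc],
)
print(images[0].url)  # or images[0].tensor, depending on how the Executor stores results
```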
## Scaling and Deployment

### Local Scaling

Boost throughput with built-in features:

- Replicas for parallel processing
- Shards for data partitioning
- Dynamic batching for efficient model inference

Example scaling a Stable Diffusion deployment:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
```

### Cloud Deployment

#### Containerize Services

1. Structure your Executor:

```
TextToImage/
├── executor.py
├── config.yml
├── requirements.txt
```

2. Configure:

```yaml
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
```

3. Push to Hub:

```bash
jina hub push TextToImage
```

#### Deploy to Kubernetes

```bash
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
```

#### Use Docker Compose

```bash
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
```

#### JCloud Deployment

Deploy with a single command:

```bash
jina cloud deploy jcloud-flow.yml
```

## LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

1. Define schemas:

```python
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
```

2. Initialize service:

```python
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
```

3. Implement streaming:

```python
import torch


class TokenStreamingExecutor(Executor):
    ...

    @requests(on='/stream')
    async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
        input = tokenizer(doc.prompt, return_tensors='pt')
        input_len = input['input_ids'].shape[1]
        for _ in range(doc.max_tokens):
            output = self.model.generate(**input, max_new_tokens=1)
            if output[0][-1] == tokenizer.eos_token_id:
                break
            yield ModelOutputDocument(
                token_id=output[0][-1],
                generated_text=tokenizer.decode(
                    output[0][input_len:], skip_special_tokens=True
                ),
            )
            # feed the extended sequence back in to generate the next token
            input = {
                'input_ids': output,
                'attention_mask': torch.ones(1, len(output[0])),
            }
```

4. Serve and use:

```python
# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client
from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)
```
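`stream_doc` yields each `ModelOutputDocument` as soon as the server produces it, so tokens print one by one. The client coroutine above still needs an event loop to run; a minimal runner:

```python
import asyncio

# drive the async client defined above
asyncio.run(main())
```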
", Assign "at most 3 tags" to the expected json: {"id":"9320","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"