Text search in Firebase Firestore - Part 1

March 16, 2021
google cloud platform, cloud endpoints, cloud run, algolia, cloud functions, python, terraform, docker-compose

Goals

Motivation

Users expect Google-like search in their apps these days. For example, you may want to search for recipes containing a certain word or ingredient. Firestore does not support native indexing or search for text fields in documents. Additionally, downloading all recipes only to show a few in the app is not practical and does not scale.

Topics covered

Prerequisite

An app where users can create, update, or delete recipes. Recipes are stored in Firebase Firestore running in Native mode, a popular choice because of its offline support, realtime updates, and multi-platform support.

Architecture

[Architecture diagram]

The search service consists of three components:

Why do you need an Events Listener? Why not listen to Firestore events in the Recipes Query service? At the time of writing, Cloud Run does not support Firestore triggers.

Algolia, a search-as-a-service provider, is used to speed up development. The Recipes Query service is just a thin wrapper around Algolia's API.
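
As an illustration of how thin that wrapper can be, here is a minimal sketch of a search call with the official algoliasearch Python client; the "recipes" index name and the ALGOLIA_APP_ID / ALGOLIA_API_KEY variable names are assumptions for this example, not part of the project yet.

# Minimal sketch of forwarding a search query to Algolia.
import os

from algoliasearch.search_client import SearchClient

client = SearchClient.create(os.environ["ALGOLIA_APP_ID"], os.environ["ALGOLIA_API_KEY"])
index = client.init_index("recipes")        # assumed index name

results = index.search("stew")              # returns a dict with a "hits" list
print(results["hits"])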

Why is Recipes Query needed then?

Search implementation details:

Recipes Query skeleton

It exposes a POST /v1/recipes/query endpoint that is used to search for recipes. Usage example:

curl -X POST \
     -d '{ "params": "query=stew" }' \
     "https://${RECIPE_QUERY_URL}/v1/recipes/query"

Initially it just echoes the query value back, as the focus for now is on the service skeleton only:

{
    "hits": "stew"
}

NGINX is used as an API Gateway placeholder for local testing. It is later replaced by Cloud Endpoints.

Recipes Query skeleton source code:

$ tree
.
├── docker-compose.yml
├── endpoints
│   └── nginx.conf
└── recipes
    ├── Dockerfile
    ├── requirements.txt
    ├── service
    │   ├── api
    │   │   ├── __init__.py
    │   │   ├── ping.py
    │   │   └── recipes.py
    │   ├── config.py
    │   ├── __init__.py
    │   ├── main.py
    │   └── schemas.py
    └── tests
        ├── conftest.py
        ├── test_api_recipes.py
        └── test_schemas.py

5 directories, 14 files

/docker-compose.yml
version: '3.8'

services:

  recipes:
    build: ./recipes
    command: uvicorn service.main:app --reload --workers 1 --host 0.0.0.0 --port 8000
    volumes:
      - ./recipes/:/recipes
    ports:
      - 6758:8000
    environment:
      - ENVIRONMENT=dev
      - TESTING=0

  nginx:
    image: nginx:1.19.8
    ports:
      - 8080:8080
    volumes:
      - ./endpoints/nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - recipes

/endpoints/nginx.conf
server {
  listen 8080;

  location /v1/recipes/ {
    proxy_pass http://recipes:8000/;
  }
}

/recipes/.dockerignore
env
.dockerignore
Dockerfile
*.pyc
*.pyo
*.pyd
__pycache__

/recipes/.flake8
[flake8]
max-line-length = 88

/recipes/Dockerfile
FROM gcr.io/nutimi-build/python:3.8.3-slim-buster

# set working directory
WORKDIR /recipes

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install system dependencies
RUN apt-get update \
  && apt-get -y install netcat gcc \
  && apt-get clean

# install python dependencies
RUN pip install --upgrade pip
COPY ./requirements.txt .
RUN pip install -r requirements.txt

/recipes/requirements.txt
asyncpg==0.20.1
black==19.10b0
fastapi==0.58.1
flake8==3.8.3
gunicorn==20.0.4
isort==5.2.2
pytest-cov==2.10.1
pytest==5.4.2
requests==2.23.0
uvicorn==0.11.5

/recipes/service/__init__.py
import os
import sys

sys.path.append(os.path.dirname(os.path.realpath(__file__)))

/recipes/service/api/__init__.py

/recipes/service/api/ping.py
from fastapi import APIRouter, Depends

from service.config import Settings, get_settings

router = APIRouter()


@router.get("/ping")
async def pong(settings: Settings = Depends(get_settings)):
    return {
        "ping": "pong!",
        "environment": settings.environment,
        "testing": settings.testing,
    }

/recipes/service/api/recipes.py
from fastapi import APIRouter

from schemas import RecipesQueryInSchema, RecipesQueryOutSchema


router = APIRouter()


@router.post("/query", response_model=RecipesQueryOutSchema)
async def recipes_query(payload: RecipesQueryInSchema) -> RecipesQueryOutSchema:
    # Skeleton behaviour: echo the query params back; Algolia integration comes later.
    return {"hits": payload.params}

/recipes/service/config.py
"""Environment specific configuration."""

import logging
import os
from functools import lru_cache

from pydantic import BaseSettings

log = logging.getLogger(__name__)


class Settings(BaseSettings):
    environment: str = os.getenv("ENVIRONMENT", "dev")
    testing: bool = os.getenv("TESTING", 0)


@lru_cache()
def get_settings() -> BaseSettings:
    """Returns environment settings"""
    log.info("Loading config settings from the environment...")
    return Settings()

/recipes/service/main.py
import logging

from fastapi import FastAPI

from api import ping, recipes

log = logging.getLogger(__name__)


def create_application() -> FastAPI:
    # root_path matches the /v1/recipes prefix added by the API gateway
    application = FastAPI(root_path="/v1/recipes")
    application.include_router(ping.router)
    application.include_router(recipes.router, tags=["recipes"])
    return application


app = create_application()


@app.on_event("startup")
async def startup_event():
    log.info("Starting up...")


@app.on_event("shutdown")
async def shutdown_event():
    log.info("Shutting down...")

/recipes/service/schemas.py
from pydantic import BaseModel


class RecipesQueryInSchema(BaseModel):
    params: str


class RecipesQueryOutSchema(BaseModel):
    hits: str

/recipes/tests/conftest.py
import pytest
from starlette.testclient import TestClient

from service.main import create_application


@pytest.fixture(scope="module")
def test_app():
    app = create_application()
    with TestClient(app) as test_client:
        yield test_client

/recipes/tests/test_api_recipes.py
import json


class TestRecipesQuery:
    def test_query(self, test_app):
        query = "hello"
        result = test_app.post("/query", data=json.dumps({"params": query}))
        assert result.status_code == 200
        assert result.json()["hits"] == query

/recipes/tests/test_schemas.py
from service.schemas import (
    RecipesQueryOutSchema
)


class TestQuerySchema:
    def test_success(self):
        schema = RecipesQueryOutSchema(**{"hits": "test"})
        assert schema.hits == "test"

Build and run it:

$ docker-compose up -d --build

Check the status:

$ docker-compose ps
            Name                           Command               State               Ports
-------------------------------------------------------------------------------------------------------
search-in-firestore_nginx_1     /docker-entrypoint.sh ngin ...   Up      80/tcp, 0.0.0.0:8080->8080/tcp
search-in-firestore_recipes_1   uvicorn service.main:app - ...   Up      0.0.0.0:6758->8000/tcp

It is up and running, so test it:

$ docker-compose exec recipes python -m pytest
============================== test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.10.0, pluggy-0.13.1
rootdir: /recipes
plugins: cov-2.10.1
collected 2 items

tests/test_api_recipes.py .                                               [ 50%]
tests/test_schemas.py .                                                   [100%]

=============================== 2 passed in 0.04s ===============================

Manual sanity checks with curl:

$ curl -w "\n" localhost:6758/ping
{"ping":"pong!","environment":"dev","testing":false}
$ curl -w "\n" -d '{"params": "hello"}' localhost:6758/query
{"hits":"hello"}

Similar but through API gateway (NGINX):

$ curl -w "\n" localhost:8080/v1/recipes/ping
{"ping":"pong!","environment":"dev","testing":false}
$ curl -w "\n" -d '{"params": "hello"}' localhost:8080/v1/recipes/query
{"hits":"hello"}

Firestore Event Listener skeleton

The purpose of this service is to listen to changes in Firestore and relay them to Recipes Query so the recipes index can be updated.

The simplest approach to intercepting Firestore events is with Firestore triggers and a Cloud Function.
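
As a rough sketch only (not the final implementation), the function will eventually extract the changed document from the event payload and relay it to Recipes Query for reindexing; the /v1/recipes/index path and the RECIPES_QUERY_URL variable below are made up purely for illustration.

# Sketch of the eventual relay logic, assuming a hypothetical indexing endpoint.
import os

import requests


def firestore_event(data, context):
    # context.resource looks like:
    # projects/<project>/databases/(default)/documents/recipes/<recipe_id>
    recipe_path = context.resource.split("/documents/")[-1]
    new_value = data.get("value") or {}     # empty when the document was deleted
    requests.post(
        f"{os.environ['RECIPES_QUERY_URL']}/v1/recipes/index",
        json={"id": recipe_path, "document": new_value},
    )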

The Firestore Emulator and the Functions Framework assist with testing the function locally. Let's add an Events Listener service placeholder and the Firestore Emulator:

/docker-compose.yml
version: '3.8'

services:

  recipes:
    build: ./recipes
    command: uvicorn service.main:app --reload --workers 1 --host 0.0.0.0 --port 8000
    volumes:
      - ./recipes/:/recipes
    ports:
      - 6758:8000
    environment:
      - ENVIRONMENT=dev
      - TESTING=0

  firestore_emulator:
    build:
      context: ./events_listener/
      dockerfile: Dockerfile-firestore-emulator
    command: >
      /bin/bash -c
        "java -jar /root/.cache/firebase/emulators/cloud-firestore-emulator-*.jar \
          --functions_emulator events_listener:8002 --host 0.0.0.0 --port 8001"
    ports:
      - 6759:8001

  events_listener:
    build:
      context: ./events_listener/
      dockerfile: Dockerfile-functions
    command: >
      functions-framework
        --target=firestore_event
        --signature-type=event
        --port=8002
    environment:
      - FIRESTORE_EMULATOR_HOST=firestore_emulator:8001
    depends_on:
      - firestore_emulator

  nginx:
    image: nginx:1.19.8
    ports:
      - 8080:8080
    volumes:
      - ./endpoints/nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - recipes

/events_listener/Dockerfile-firestore-emulator
FROM openjdk:16-slim-buster

RUN apt-get update; apt-get install -y curl \
    && curl -sL https://deb.nodesource.com/setup_15.x | bash - \
    && apt-get install -y nodejs \
    && curl -L https://www.npmjs.com/install.sh | sh \
    && npm install -g firebase-tools \
    && firebase setup:emulators:firestore

/events_listener/Dockerfile-functions
FROM python:3.8.3-slim-buster

ARG PORT

ENV APP_HOME /app

WORKDIR $APP_HOME

ENV PYTHONUNBUFFERED TRUE

# Copy local code to the container image.
COPY . .

# Install production dependencies.
RUN pip install gunicorn functions-framework
RUN pip install -r requirements.txt

/events_listener/main.py
import json


def firestore_event(data, context):
    """Triggered by a change to a Firestore document.
    Args:
        data (dict): The event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    trigger_resource = context.resource

    print("Function triggered by change to: %s" % trigger_resource)

    print("\nOld value:")
    print(json.dumps(data["oldValue"]))

    print("\nNew value:")
    print(json.dumps(data["value"]))

/events_listener/requirements.txt
# nothing here yet

Structure of the project:

.
├── docker-compose.yml
├── endpoints
│   └── nginx.conf
├── events_listener
│   ├── Dockerfile-firestore-emulator
│   ├── Dockerfile-functions
│   ├── main.py
│   └── requirements.txt
└── recipes
    ├── Dockerfile
    ├── requirements.txt
    ├── service
    │   ├── api
    │   │   ├── __init__.py
    │   │   ├── ping.py
    │   │   └── recipes.py
    │   ├── config.py
    │   ├── __init__.py
    │   ├── main.py
    │   └── schemas.py
    └── tests
        ├── conftest.py
        ├── test_api_recipes.py
        └── test_schemas.py

6 directories, 18 files

Build and run it:

$ docker-compose up -d --build

Check the status:

$ docker-compose ps
                  Name                                Command               State               Ports
------------------------------------------------------------------------------------------------------------------
search-in-firestore_events_listener_1      functions-framework --targ ...   Up
search-in-firestore_firestore_emulator_1   /bin/bash -c java -jar /ro ...   Up      0.0.0.0:6759->8001/tcp
search-in-firestore_nginx_1                /docker-entrypoint.sh ngin ...   Up      80/tcp, 0.0.0.0:8080->8080/tcp
search-in-firestore_recipes_1              uvicorn service.main:app - ...   Up      0.0.0.0:6758->8000/tcp

A manual sanity check with curl that the Firestore Emulator is running:

$ curl localhost:6759
Ok
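
With everything up, a quick way to exercise the local chain end to end is to write a document to the emulator and watch the function output with docker-compose logs events_listener. A minimal sketch, assuming google-cloud-firestore is installed on the host and using a made-up demo-project project ID:

# Write a test document to the Firestore Emulator; the --functions_emulator flag
# should route the resulting change event to the events_listener container.
import os

from google.cloud import firestore

os.environ["FIRESTORE_EMULATOR_HOST"] = "localhost:6759"   # port mapped in docker-compose.yml

db = firestore.Client(project="demo-project")
db.collection("recipes").document("beef-stew").set({"title": "Beef stew"})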

Summary

This post introduces the project structure and the basic building blocks that will serve as the foundation for the next post, where we start wiring them together.
