7. Scheduling

Rather than maintaining several cron jobs to update facts at particular times, you can use the basic scheduling capabilities built into pylytics.

7.1. Defining schedules

Here is an example fact which uses scheduling:

from datetime import timedelta

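from pylytics.library.fact import Fact
from pylytics.library.schedule import Schedule
# Assumed module path for DatabaseSource, which is used below.
from pylytics.library.source import DatabaseSource
from pylytics.library.column import Metric, DimensionKey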

from dimension.store import Manager, Store
from dimension.date import Date
from dimension.time import Time


class Sales(Fact):

    __source__ = DatabaseSource.define(
        database="sales",
        query="SELECT * FROM sales_table"
    )

    __schedule__ = Schedule(repeats=timedelta(hours=1))

    date = DimensionKey('date', Date)
    time = DimensionKey('time', Time)
    store = DimensionKey('store', Store)
    manager = DimensionKey('manager', Manager)
    sales_amount = Metric('sales_amount', int)

This fact will update every hour.

There are four arguments you can pass into Schedule:

  • repeats
  • starts
  • ends
  • timezone

7.1.1. repeats

This is a timedelta object which specifies how frequently the fact updates.

If starts is 3pm, ends is 4pm, and repeats is 30 minutes, then the fact is scheduled to run at 3pm, 3:30pm and 4pm.

This schedule would look like:

# Requires: from datetime import time, timedelta (hours use the 24-hour clock)
__schedule__ = Schedule(repeats=timedelta(minutes=30), starts=time(hour=15),
                        ends=time(hour=16))
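
To make the interaction between starts, ends and repeats concrete, here is a minimal standalone sketch that lists the run times for a given window. The run_times function is hypothetical and is not part of pylytics; it only illustrates the rule described above, assumes ends is inclusive, and ignores timezone.

```python
from datetime import date, datetime, time, timedelta


def run_times(starts, ends, repeats):
    """Return the times of day from starts to ends (inclusive), spaced by repeats."""
    if repeats <= timedelta(0):
        raise ValueError("repeats must be a positive timedelta")
    # time objects do not support arithmetic, so anchor them to an arbitrary date.
    anchor = date(2000, 1, 1)
    current = datetime.combine(anchor, starts)
    last = datetime.combine(anchor, ends)
    times = []
    while current <= last:
        times.append(current.time())
        current += repeats
    return times


# 3pm to 4pm, every 30 minutes.
for run_time in run_times(time(hour=15), time(hour=16), timedelta(minutes=30)):
    print(run_time.strftime("%H:%M"))
```

Note that datetime.time uses a 24-hour clock, so 3pm is time(hour=15).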

The smallest permissible repeats value is 10 minutes. It’s unlikely any fact will need to be updated more frequently than this.
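
The 10 minute limit can be checked before a schedule is defined. The sketch below is a standalone illustration: check_repeats and MINIMUM_REPEATS are hypothetical names, and how pylytics itself reacts to a smaller value is not specified here.

```python
from datetime import timedelta

# The smallest permissible repeats value, as stated above.
MINIMUM_REPEATS = timedelta(minutes=10)


def check_repeats(repeats):
    """Return repeats unchanged, or raise ValueError if it is below the minimum."""
    if repeats < MINIMUM_REPEATS:
        raise ValueError(
            "repeats must be at least %s, got %s" % (MINIMUM_REPEATS, repeats)
        )
    return repeats


check_repeats(timedelta(minutes=30))  # accepted
try:
    check_repeats(timedelta(minutes=5))
except ValueError as error:
    print("Rejected:", error)
```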

7.2. Default schedule

If no Schedule is defined, the fact is scheduled to run once a day, at midnight.
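
As a rough illustration of the default behaviour, the following standalone sketch computes when a fact with no Schedule would next be due. The next_midnight function is hypothetical, not part of pylytics, and assumes naive local datetimes with no timezone handling.

```python
from datetime import datetime, time, timedelta


def next_midnight(now):
    """Return the first midnight strictly after now."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, time(hour=0))


print(next_midnight(datetime.now()))
```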