Querying the Database

More information about `mbpy extract from-db`

To query the database, mbpy needs the data imported first, which can be accomplished with the Importing commands.

Extracting

The extract command has a variety of means to load data, either from the API by querying it directly, or by querying the SQlite database. We can even read in CSV files.

mbpy extract --help

  Usage: mbpy extract [OPTIONS] COMMAND [ARGS]...

 Start ELT sequence.  See mbpy extract --help for extaction commands
 
╭─ Commands ────────────────────────────────────────────────────────────────────────╮
│ from-api           Pass a name like students, classes, academic_info_grades       │
│ from-csv           Define the path to csv file                                    │
│ from-db            Select entities (and associated entities), fields              │
│ from-db-connect    use raw sql                                                    │
│ from-excel         Define the path to csv file                                    │                                                       │
│ from-external-api  Call out to a list endpoint.  Default values work for OA V3    │
│                    students list endpoint. For OA, use --bearer and pass in the   │
│                    bearer via environment variable token after /oauth/token call. │
│                    FIXME: Add this here                                           │
│ from-pipeline      Use pipelines as an extraction tool for more sophisticated     │
│                    data management.                                               │
╰───────────────────────────────────────────────────────────────────────────────────╯

We'll use the from-db command to demonstrate how it works, but as you can see there are some other powerful features.

There is extensive help documentation included with mbpy, available here:

Let's find out how to use this command:

mbpy extract from-db --help

╭─ Additional Options ──────────────────────────────────────────────────────────────╮
│ *  --table   -t  (academic_terms|academic_years|classes|class_attendance_byterm|c │
│                  lass_attendance_bydate|class_timetable|hr_attendance_byterm|hr_a │
│                  ttendance_bydate|memberships|modified_since_dates|parents|parent │
│                  child_relationships|school|school_subjects|students|subject_grou │
│                  p|tasks|task_results|teachers|term_grades|year_groups)           │
│                  [required]                                                       │
│    --fields  -f  Fields of the entity to include                                  │
│                  (TEXT)                                                           │
│    --limit   -l  Define small number for faster dev                               │
│                  (INTEGER)                                                        │
│    --help        Show this message and exit.                                      │
╰───────────────────────────────────────────────────────────────────────────────────╯

We just have to pass it a --table option that includes the name of the entity we are seeking, in this case classes. Since we are starting to build complicated commands, let's use the \ so we can format using line breaks and spaces, making it easier to read:

mbpy \
  EXTRACT from-db \
          --table classes \
  INFO

                              Column info for dataframe
                        ╷         ╷                ╷
                        │ type    │ count non-null │ row 0
╶───────────────────────┼─────────┼────────────────┼────────────────────────────────╴
                     id │ int64   │ 366            │ 11933215
             subject_id │ float64 │ 365            │ 4509.0
            end_term_id │ int64   │ 366            │ 180452
          start_term_id │ int64   │ 366            │ 180451
           program_code │ object  │ 366            │ diploma
           grade_number │ int64   │ 366            │ 13         
       subject_group_id │ float64 │ 365            │ 1709.0
                                 ...
                subject │ object  │ 335            │ {'name': 'English', 'sl':
                        │         │                │ True, 'self_taught': False,
                        │         │                │ 'group_id': 1709, 'group':
                        │         │                │ 'Language Acquisition', 'hl':
                        │         │                │ True, 'program_code':
                        │         │                │ 'diploma', 'levels': ['ab
                        │         │                │ initio', 'B'], 'id': 4509}
         reporting_name │ object  │ 366            │ Happy English Class

Chaining

The above output is the default when plain target INFO is used. It selected all the classes that have been imported and converted it into a pandas dataframe. Since we haven't told it to do anything other than that, the default action of displaying info about the dataframe, such as the column names, types, and count of non-null, is shown. The values for the first selected item is shown in "row 0" column.

There's simply too much information, so let's limiting the columns we are displaying, and filtering for just diploma classes:

mbpy \
  EXTRACT from-db \
          --table classes \
          --field-mode select \
          --fields uniq_id,program_code,grade_number,subject \
          --limit 3 \
  PRINT \
    query \
          --where 'program_code == "diploma"' \

    ╷                     ╷                           ╷              ╷
    │ uniq_id             │ subject                   │ grade_number │ program_code
╶───┼─────────────────────┼───────────────────────────┼──────────────┼──────────────╴
  0 │ happy_english_class │ {'hl': True, 'name':      │ 13           │ diploma
    │                     │ 'English', 'group_id':    │              │
    │                     │ 1709, 'levels': ['ab      │              │
    │                     │ initio', 'B'],            │              │
    │                     │ 'program_code':           │              │
    │                     │ 'diploma', 'sl': True,    │              │
    │                     │ 'id': 4509,               │              │
    │                     │ 'self_taught': False,     │              │
    │                     │ 'group': 'Language        │              │
    │                     │ Acquisition'}             │              │
    ╵                     ╵                           ╵              ╵

Okay, what about querying to find all the high-level classes? The subject column has the key hl, but how do we access to query it?

This is where the power of tranformations comes in. Just like the query command after the chain command, mbpy has a slew of operations that can be performed on the query. There is a spread command that will make a new column for every key, and each column will be called subject.hl and subject.levels etc.

mbpy \
  EXTRACT from-db \
          --table classes \
          --fields uniq_id,program_code,grade_number,subject \
          --limit 3 \
  INFO \
    spread

                              Column info for dataframe
                           ╷          ╷                   ╷
                           │ type     │ count non-null    │ row 0
╶──────────────────────────┼──────────┼───────────────────┼─────────────────────────╴
                subject.id │ int64    │ 3                 │ 4509
       subject.self_taught │ object   │ 1                 │ False
                subject.hl │ bool     │ 3                 │ True
          subject.group_id │ int64    │ 3                 │ 1709
             subject.group │ object   │ 3                 │ Language Acquisition
            subject.levels │ object   │ 1                 │ ['ab initio', 'B']
      subject.program_code │ object   │ 3                 │ diploma
                subject.sl │ bool     │ 3                 │ True
              subject.name │ object   │ 3                 │ English
                   uniq_id │ object   │ 3                 │ happy_english_class
              program_code │ object   │ 3                 │ diploma
              grade_number │ int64    │ 3                 │ 13

So now we can filter for hl classes, and let's arrange the columns in the order we wish to be output, and limit them to just the ones we're interested in:

mbpy \                                                git:main*
  EXTRACT from-db \
          --table classes \
          --fields uniq_id,program_code,grade_number,subject \
          --limit 3 \
  PRINT \
      spread \
      query --where '`subject.hl` == True' \
      keep -c uniq_id -c subject.hl -c program_code -c grade_number \
      to-csv --path /tmp/output.csv

     ╷                          ╷               ╷                 ╷
     │ uniq_id                  │ subject.hl    │ program_code    │ grade_number
╶────┼──────────────────────────┼───────────────┼─────────────────┼─────────────────╴
   0 │ happy_english_class      │ True          │ diploma         │ 13
     ╵                          ╵               ╵                 ╵

Not only did we get this output in the terminal, but it saved to CSV as well.

To learn more about what kinds of commands are available, use the --help feature. The following shows the help files for every command issued on this page:

mbpy extract from-db --help
mbpy extract - chain --help
mbpy extract - chain query --help
mbpy extract - chain print --help
mbpy extract - chain spread --help
mbpy extract - chain keep --help
mbpy extract - to-csv keep --help

The next section will demonstrate how to save this query for easy execution.

pageQuerying via Script

If you are interested in learning more about the extract … chain … formula, skip to this section:

pageQuerying Commands

Last updated