Extracting

About Extracting

After importing, the data can be queried through the command line. For example:

mbpy EXTRACT from-users PRINT

This displays the first 5 and last 5 user records, along with every column in the users table for those records.

         id     role oa_id sb_id account_uid  \
0  10752543    Admin  None  None        None
1  10752544    Admin  None  None        None
2  10752545  Advisor  None  None        None
3  10752546    Admin  None  None        None
4  10752547  Advisor  None  None        None

Scroll down to see the remaining columns for these records. (The \ indicates that the columns wrap around.)

These commands can simply display data, but they become more useful when chained with commands that manipulate it. For example, to find the distribution of users' email domains, we could do the following:

mbpy \
    EXTRACT \
        from-users \
            --fields email \
    INFO \
        --value-count domain \
    extract-on \
        --column email \
        --pattern '^(?P<handle>.+)@(?P<domain>.+)$'

          domain  count    percent
0      eduvo.com    332  97.076023
1      endvo.com      3   0.877193
2      gmail.com      2   0.584795
3        icio.us      1   0.292398
4  mediafire.com      1   0.292398
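
To make the pattern and the value count concrete, here is a minimal Python sketch that uses only the standard library. It illustrates the idea and is not how mbpy is implemented; the `domain_counts` helper and the sample addresses are hypothetical.

```python
import re
from collections import Counter

# Same pattern as in the command above: two named groups split on "@".
EMAIL_PATTERN = re.compile(r'^(?P<handle>.+)@(?P<domain>.+)$')

def domain_counts(emails):
    """Return (domain, count, percent) rows, most common first."""
    domains = []
    for email in emails:
        match = EMAIL_PATTERN.match(email)
        if match:  # skip values that are not shaped like an email address
            domains.append(match.group('domain'))
    total = len(domains)
    return [
        (domain, count, 100 * count / total)
        for domain, count in Counter(domains).most_common()
    ]

# Hypothetical sample data, not taken from a real school.
sample = ['ann@eduvo.com', 'bob@eduvo.com', 'cat@eduvo.com', 'dan@gmail.com']
for domain, count, percent in domain_counts(sample):
    print(f'{domain:>12} {count:>5} {percent:10.6f}')
```

The named group `domain` is what gives the counted column its name, which is why the command can refer to it in `--value-count domain`.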

This school is supposed to have only eduvo.com accounts. Looks like we have a few users with wrong email addresses! Let's find them:

mbpy \
    EXTRACT \
        from-users \
            --fields id,email,role,archived \
    PRINT \
    query \
        --where 'not email.str.endswith("@eduvo.com")'

Let's see what we got:

        id                 email                 role    archived
0    10752543           [email protected]    Admin     True
34   10752652        [email protected]  Student    False
50   10752674            [email protected]    Admin     True
51   10752675         [email protected]  Student    False
145  10947491              [email protected]   Parent     True
166  10947512       [email protected]   Parent    False
167  10947513       [email protected]   Parent    False
174  10947520         [email protected]   Parent     True
175  10947521           [email protected]   Parent     True
530  10947916         [email protected]  Student     True
531  10947917            [email protected]  Student     True
...

Let's limit the results to users who are active (not archived), and give ourselves a link we can click:

mbpy \
    EXTRACT \
        from-users \
            --fields email,profile \
            --where archived b= false \
    PRINT \
    query \
        --where 'not email.str.endswith("@eduvo.com")'

Command-click to load the URL in the browser!

               email                                  profile
17     [email protected]  https://demo.managebac.com/teacher/users/10752652
26      [email protected]  https://demo.managebac.com/teacher/users/10752675
58    [email protected]  https://demo.managebac.com/teacher/users/10947512
59    [email protected]  https://demo.managebac.com/teacher/users/10947513
149               [email protected]  https://demo.managebac.com/teacher/users/10947922
150       [email protected]  https://demo.managebac.com/teacher/users/10947923
151          [email protected]  https://demo.managebac.com/teacher/users/10947924
152          [email protected]  https://demo.managebac.com/teacher/users/10947925
153             [email protected]  https://demo.managebac.com/teacher/users/10947926
199              [email protected]  https://demo.managebac.com/teacher/users/10947975
523       [email protected]  https://demo.managebac.com/teacher/users/10948377
524          [email protected]  https://demo.managebac.com/teacher/users/10948386
525          [email protected]  https://demo.managebac.com/teacher/users/10948389
530  [email protected]  https://demo.managebac.com/teacher/users/10948406
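
The two filters in the command above can be pictured as two stages: one applied while extracting, one applied afterwards as a chain. The Python sketch below mirrors that with plain lists. It is an illustration under assumptions: the records and the `active_outside_domain` helper are hypothetical, and the URL shape is copied from the sample output rather than from mbpy's source.

```python
# Hypothetical records standing in for rows of the users table.
users = [
    {'id': 1, 'email': 'ann@eduvo.com', 'archived': False},
    {'id': 2, 'email': 'bob@gmail.com', 'archived': True},
    {'id': 3, 'email': 'cat@gmail.com', 'archived': False},
]

# URL shape copied from the sample output above; how mbpy derives it is assumed.
PROFILE_URL = 'https://demo.managebac.com/teacher/users/{id}'

def active_outside_domain(records, domain='@eduvo.com'):
    """Keep non-archived users whose email does not end with `domain`."""
    # Stage 1: mirrors the extractor-level filter on `archived`.
    active = [r for r in records if not r['archived']]
    # Stage 2: mirrors the chained query on the email column.
    flagged = [r for r in active if not r['email'].endswith(domain)]
    return [
        {'email': r['email'], 'profile': PROFILE_URL.format(id=r['id'])}
        for r in flagged
    ]

for row in active_outside_domain(users):
    print(row['email'], row['profile'])
```

Note that the `endswith` check is case-sensitive, so an address ending in `@EDUVO.COM` would also be flagged by this sketch.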

The structure of these commands is as follows:

mbpy <MODE> <extractor/streamer> <LOADER> [<chain> <chain> ...]

| Command / subcommand | Explanation | Comment |
| --- | --- | --- |
| <MODE> | Either EXTRACT or STREAM | In EXTRACT mode, all of the data is loaded into memory. In STREAM mode, it is loaded in chunks. |
| <extractor> | from-students, from-classes, etc. | Run mbpy EXTRACT --help for the full list. |
| <streamer> | See the "Streaming" section | Run mbpy STREAM --help for the full list. |
| <LOADER> | print, pprint, or csv | Run mbpy EXTRACT from-students --help for the full list of loaders. |
| <chain> | Perform manipulations after extraction. | Optional. |
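
The difference between loading everything into memory and loading in chunks, as described above for the two modes, can be sketched in Python. This is only an illustration of the general idea: `fetch_records`, `extract_all`, `stream_chunks`, and the chunk size are hypothetical, and mbpy's actual chunking behaviour is not documented here.

```python
from itertools import islice

def fetch_records():
    """Hypothetical source that yields records one at a time."""
    for user_id in range(1, 8):
        yield {'id': user_id}

def extract_all(source):
    """EXTRACT-like: materialise every record in memory before processing."""
    return list(source)

def stream_chunks(source, chunk_size):
    """STREAM-like: yield lists of at most `chunk_size` records at a time."""
    iterator = iter(source)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

print(len(extract_all(fetch_records())))
print([len(chunk) for chunk in stream_chunks(fetch_records(), 3)])
```

In the chunked version, only one chunk needs to be held in memory at a time, which matters once a table is too large to load comfortably in one go.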

To learn more about how to query with extractors, continue to Querying Commands:

Querying Commands
