Extracting

About Extracting

After importing, the data can be queried through the command line. For example:

mbpy EXTRACT from-users PRINT

That produces output showing the first 5 and last 5 user records, along with every column in the users table for those records.

        id      role   oa_id sb_id account_uid
0    10752543    Admin  None  None     None     \
1    10752544    Admin  None  None     None
2    10752545  Advisor  None  None     None
3    10752546    Admin  None  None     None
4    10752547  Advisor  None  None     None      

Scroll down to see more columns for these records. (The trailing \ indicates that the columns wrap around.)

These commands can simply display data, but they become more useful when paired with the commands available for manipulating it. For example, to find the distribution of users' email domains, we could do the following:

mbpy \
    EXTRACT \
        from-users \
            --fields email \
    INFO \
        --value-count domain \
    extract-on \
        --column email \
        --pattern '^(?P<handle>.+)@(?P<domain>.+)$'

The output:

          domain  count    percent
0      eduvo.com    332  97.076023
1      endvo.com      3   0.877193
2      gmail.com      2   0.584795
3        icio.us      1   0.292398
4  mediafire.com      1   0.292398
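
To make the pattern in the command above more concrete, here is a minimal Python sketch of the same idea: a regular expression with the named groups handle and domain splits each address at the @, and the domains are then counted. This is not mbpy's implementation. The function name domain_distribution and the sample addresses are hypothetical, and computing the percentage over matched addresses only is an assumption.

```python
import re
from collections import Counter

# Same pattern as in the command above: two named groups around the "@".
PATTERN = re.compile(r'^(?P<handle>.+)@(?P<domain>.+)$')


def domain_distribution(emails):
    """Return (domain, count, percent) rows, most common domain first."""
    domains = []
    for email in emails:
        match = PATTERN.match(email)
        if match:  # skip values that do not look like an email address
            domains.append(match.group("domain"))
    total = len(domains)
    # Assumption: percent is relative to the addresses that matched.
    return [
        (domain, count, 100 * count / total)
        for domain, count in Counter(domains).most_common()
    ]


# Hypothetical sample data, not taken from the school shown above.
sample = ["a@eduvo.com", "b@eduvo.com", "c@eduvo.com", "d@gmail.com"]
for domain, count, percent in domain_distribution(sample):
    print(f"{domain:<12} {count:>5} {percent:>10.6f}")
```

Because the first group is greedy, an address containing more than one @ is split at the last one.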

This school is supposed to have only eduvo.com accounts. Looks like we have a few users with wrong email addresses! Let's find them:

mbpy \
    EXTRACT \
        from-users \
            --fields id,email,role,archived \
    PRINT \
    query \
            --where 'not email.str.endswith("@eduvo.com")'

Let's see what we got:

        id                 email                 role    archived
0    10752543           angelica@managebac.com    Admin     True
34   10752652        james.z.hayward@gmail.com  Student    False
50   10752674            support@managebac.com    Admin     True
51   10752675         henry.epelbaum@gmail.com  Student    False
145  10947491              sullivan.m@faria.co   Parent     True
166  10947512       joseph.mccoy@openapply.com   Parent    False
167  10947513       janice.mccoy@openapply.com   Parent    False
174  10947520         martin.hudson@openapp.ly   Parent     True
175  10947521           emma.hudson@openapp.ly   Parent     True
530  10947916         cbalducci0@mediafire.com  Student     True
531  10947917            nbutterick1@webmd.com  Student     True
...
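
The --where expression above keeps only the rows whose email does not end with "@eduvo.com". As a rough illustration of that filter (a sketch in plain Python, not how mbpy evaluates the expression), the snippet below applies the same test to a few hypothetical records. It also keeps each row's original position, which is why the left-hand index in the output above is no longer contiguous. The helper name outside_domain and the records are assumptions made for the example.

```python
# Hypothetical records shaped like the fields requested above.
users = [
    {"id": 1, "email": "ann@eduvo.com", "role": "Student", "archived": False},
    {"id": 2, "email": "bob@gmail.com", "role": "Student", "archived": False},
    {"id": 3, "email": "cat@eduvo.com", "role": "Admin", "archived": True},
    {"id": 4, "email": "dan@endvo.com", "role": "Parent", "archived": True},
]


def outside_domain(records, suffix="@eduvo.com"):
    """Keep rows whose email does not end with suffix, with their original position."""
    return [
        (position, row)
        for position, row in enumerate(records)
        if not row["email"].endswith(suffix)
    ]


for position, row in outside_domain(users):
    print(position, row["id"], row["email"], row["role"], row["archived"])
```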

Let's limit to only those that are active (not archived), and let's give ourselves a link we can click:

mbpy \
    EXTRACT \
        from-users \
            --fields email,profile \
            --where archived b= false \
    PRINT \
    query \
            --where 'not email.str.endswith("@eduvo.com")'

Command-click a link to load the URL in the browser!

               email                                  profile
17     james.z.hayward@gmail.com  https://demo.managebac.com/teacher/users/10752652
26      henry.epelbaum@gmail.com  https://demo.managebac.com/teacher/users/10752675
58    joseph.mccoy@openapply.com  https://demo.managebac.com/teacher/users/10947512
59    janice.mccoy@openapply.com  https://demo.managebac.com/teacher/users/10947513
149               kmarc5@ovh.net  https://demo.managebac.com/teacher/users/10947922
150       angela.zheng@endvo.com  https://demo.managebac.com/teacher/users/10947923
151          peter.pan@endvo.com  https://demo.managebac.com/teacher/users/10947924
152          tom.clark@endvo.com  https://demo.managebac.com/teacher/users/10947925
153             janedv@eduvo.org  https://demo.managebac.com/teacher/users/10947926
199              dli@onatlas.com  https://demo.managebac.com/teacher/users/10947975
523       mlangford@haremi.co.uk  https://demo.managebac.com/teacher/users/10948377
524          morrisonjl@live.com  https://demo.managebac.com/teacher/users/10948386
525          choppy1@outlook.com  https://demo.managebac.com/teacher/users/10948389
530  courses@pamojaeducation.com  https://demo.managebac.com/teacher/users/10948406

The structure of these commands is as follows:

mbpy <MODE> <extractor/streamer> <LOADER> [<chain> <chain> ...]

Command / subcommand    Explanation                                Comment
<MODE>                  Either EXTRACT or STREAM                   In EXTRACT mode, all the information is loaded into memory. In STREAM mode, it is loaded in chunks.
<extractor>             from-students, from-classes, etc.          Run mbpy EXTRACT --help for the full list.
<streamer>              See the "Streaming" section                Run mbpy STREAM --help for the full list.
<LOADER>                print, pprint, or csv                      Run mbpy EXTRACT from-students --help for the full list of loaders.
<chain>                 Perform manipulations after extraction.    Optional.
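
The difference between the two modes can be pictured with a small Python sketch. It only illustrates the general idea of loading everything at once versus loading in chunks; it does not show mbpy's internals. The functions fetch_records, extract, and stream, and the chunk size, are hypothetical.

```python
from itertools import islice


def fetch_records():
    """Hypothetical stand-in for a data source that yields one record at a time."""
    for n in range(10):
        yield {"id": n}


def extract(source):
    """EXTRACT-style: materialize every record in memory before processing."""
    return list(source)


def stream(source, chunk_size):
    """STREAM-style: yield lists of at most chunk_size records at a time."""
    iterator = iter(source)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


all_records = extract(fetch_records())
print(len(all_records))

for chunk in stream(fetch_records(), chunk_size=4):
    print([record["id"] for record in chunk])
```

With chunking, only one chunk needs to be held in memory at a time, at the cost of not being able to look at the whole data set in a single step.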

To learn more about how to query with extractors, continue to Querying Commands.