
Data wrangling

Be prepared to embrace the chaos.

Prepare the camera trap data

The directory layout

Your data must be organized according to the file tree below.

A helper function for checking the directories will also be provided by pusilla (coming soon).

Note that deployments may contain subdirectories, which will later be merged into the parent deployment automatically (e.g. /sub_dir/filename_1 becomes /sub_dir-filename_1 after serval align).
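Going by the example above, the merged name appears to be the subdirectory path joined to the file name with a hyphen. The sketch below reproduces that rule so you can predict the resulting names. `flattened_name` is a hypothetical helper, and serval align may handle deeper nesting or name clashes differently.

```python
from pathlib import PurePosixPath


def flattened_name(relative_path: str) -> str:
    """Join the parts of a path (relative to the deployment directory) with '-'.

    Hypothetical helper mirroring the example in the text
    (sub_dir/filename_1 -> sub_dir-filename_1); serval's own rules may differ.
    """
    return "-".join(PurePosixPath(relative_path).parts)


print(flattened_name("sub_dir/filename_1"))
print(flattened_name("filename_2"))
```

Files directly under the deployment keep their names; only files inside subdirectories pick up a prefix.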

|- project_name
|   |- collection_name_1
|       |- deployment_name_1
|           |- sub_dir   
|               |- filename_1 
|           |- filename_1
|           |- filename_2
|           |- filename_3
|       |- deployment_name_2
|           |- filename_1
|           |- filename_2
|           |- filename_3
|       |- ...
|       |- trap_info.csv
|   |- collection_name_2
|       |- deployment_name_1
|           |- ...
|       |- ...
|       |- trap_info.csv
|   |- ...
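Until pusilla's helper is available, a minimal check like the following may catch the most common layout mistakes. It is a hypothetical sketch that only tests two things implied by the tree above: every collection directory contains a trap_info.csv and at least one deployment subdirectory. It does not validate file names or the contents of trap_info.csv.

```python
from pathlib import Path


def check_layout(project_dir: Path) -> list[str]:
    """Return human-readable problems found in a project directory (empty if none).

    Hypothetical checks based on the file tree: each collection directory
    should hold a trap_info.csv and at least one deployment subdirectory.
    """
    problems = []
    collections = sorted(p for p in project_dir.iterdir() if p.is_dir())
    if not collections:
        problems.append(f"{project_dir}: no collection directories found")
    for collection in collections:
        if not (collection / "trap_info.csv").is_file():
            problems.append(f"{collection}: missing trap_info.csv")
        if not any(p.is_dir() for p in collection.iterdir()):
            problems.append(f"{collection}: no deployment directories found")
    return problems
```

For example, `check_layout(Path("/mnt/data/Yunta"))` returns an empty list when neither problem is found.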

File naming convention

It's not mandatory, but we recommend following the file naming convention for naming projects, collections and deployments.

Example of the directory layout in practice

|- Yunta
|   |- 202009-202012
|       |- 4001
|           |- IMAG0001.JPG
|           |- IMAG0002.JPG
|           |- IMAG0003.JPG
|       |- YT_TQL
|           |- IMAA0001.JPG
|           |- IMAA0002.JPG
|       |- ...
|       |- trap_info.csv
|   |- 202101-202004
|       |- 4001
|           |- ...
|       |- ...
|       |- trap_info.csv
|   |- ...

Trap info

The trap_info.csv file should contain #TODO. You can use trap_info.csv as a template.

You can also use pusilla to generate a trap_info template with default values filled in to save manual labor (coming soon).

Data preprocessing

Rename deployments

Use serval rename to rename deployments (from deployment_name to deployment_id); see the deployment section in data wrangling for details.

Tip

Use the --dryrun option first to preview the upcoming changes:

> serval rename /mnt/data/QiLian --dryrun 
Will rename A120 to A120_202305-202312
Will rename A007 to A007_202305-202312
Will rename A70 to A70_202305-202312
...
Will rename A114 to A114_202305-202312
Will rename 0018 to 0018_202305-202312
Will rename 0008 to 0008_202305-202312
Total directories: 41
If the preview looks right, run the same command without the --dryrun option to perform the rename.
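Judging from the dry-run output above, each deployment ID appears to be the deployment name with the collection name appended after an underscore (A120 in collection 202305-202312 becomes A120_202305-202312), assuming /mnt/data/QiLian is a project directory. The sketch below reproduces that inferred scheme, including a dry-run mode. It is not serval's implementation; the naming rule is an assumption drawn only from the sample output.

```python
from pathlib import Path


def plan_renames(project_dir: Path) -> list[tuple[Path, Path]]:
    """List (old, new) paths for every deployment directory in a project.

    Assumed scheme (inferred from sample output): <deployment_name>_<collection_name>.
    Directories already ending with the collection suffix are skipped,
    so applying the plan twice is harmless.
    """
    renames = []
    for collection in sorted(p for p in project_dir.iterdir() if p.is_dir()):
        suffix = f"_{collection.name}"
        for deployment in sorted(p for p in collection.iterdir() if p.is_dir()):
            if not deployment.name.endswith(suffix):
                renames.append((deployment, deployment.with_name(deployment.name + suffix)))
    return renames


def rename_deployments(project_dir: Path, dryrun: bool = True) -> None:
    """Print the planned renames (dry run) or apply them."""
    renames = plan_renames(project_dir)  # plan fully before touching the disk
    for old, new in renames:
        if dryrun:
            print(f"Will rename {old.name} to {new.name}")
        else:
            old.rename(new)
    print(f"Total directories: {len(renames)}")
```

Computing the whole plan before renaming anything keeps the directory listing stable while the renames are applied.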

Generate deployments table

The deployments table can be generated by combining trap_info files using scripts provided by pusilla.
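As a rough illustration of what combining the trap_info files involves, the following sketch concatenates every `<collection>/trap_info.csv` under a project into a single CSV and adds a `collection` column. The function name and the extra column are hypothetical, and since the required trap_info columns are not specified here, the sketch keeps the union of all columns it finds. Use pusilla's scripts for the real deployments table.

```python
import csv
from pathlib import Path


def combine_trap_info(project_dir: Path, output: Path) -> int:
    """Concatenate all <collection>/trap_info.csv files into one CSV.

    Hypothetical sketch: a 'collection' column records where each row came from,
    and columns missing from a file are left empty. Returns the row count.
    """
    rows = []
    fieldnames = ["collection"]
    for trap_info in sorted(project_dir.glob("*/trap_info.csv")):
        with trap_info.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                rows.append({"collection": trap_info.parent.name, **row})
    with output.open("w", newline="", encoding="utf-8") as f:
        # restval fills missing columns; extrasaction drops malformed overflow fields
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```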

Align and rename resources

Once the resources are aligned and renamed, you'll be ready to continue the workflow, with a directory that looks like this:

|- project_name
|   |- collection_name_1
|       |- deployment_id_1
|           |- sub_dir-filename_a
|           |- filename_a
|           |- filename_b
|           |- filename_c
|       |- deployment_id_2
|           |- filename_d
|           |- filename_e
|           |- filename_f
|       |- ...
|   |- collection_name_2
|       |- deployment_id_3
|           |- ...
|       |- ...
|   |- ...
|- deployments.csv

Play around with tags

Retrieve tags from resources (serval observe)

Use serval observe to retrieve tags (i.e. species and individual) from all resources (media files or XMP sidecars) under a given path. The following options control its behaviour:

--output, -o
Sets the output directory, which will be created if it does not exist.
--parallel, -p
Serval runs in serial mode by default because the performance bottleneck is I/O. This option enables parallel mode; it is safe to try, so check whether it improves performance on your setup.
--xmp
Tells serval to read tags only from XMP files (.xmp).
--independent, -i
Runs a temporal independence analysis right after retrieval, so you don't have to run serval capture separately.

Example

serval observe /mnt/data/diqing --output /mnt/data/diqing/tag-info
The command above recursively finds images in /mnt/data/diqing and retrieves their tags (from the digiKam tag list). The output file (e.g. tags.csv) is saved to /mnt/data/diqing/tag-info.
serval observe --xmp --independent /mnt/data/diqing --output /mnt/data/diqing/tag-info 
With these options, serval instead retrieves tags from all XMP files and then runs the temporal independence analysis.
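For reference, digiKam stores its tags in XMP under the `digiKam:TagsList` property, as an `rdf:Seq` of hierarchical tag paths such as `Parent/Child`. The sketch below reads that list with Python's standard library. It only illustrates where the tags live and is not how serval observe is implemented; the tag names in the test data (`Species/...`, `Individual/...`) are hypothetical, as the tag hierarchy serval expects is not described here.

```python
import xml.etree.ElementTree as ET

# Namespaces used in digiKam-written XMP.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "digiKam": "http://www.digikam.org/ns/1.0/",
}


def digikam_tags(xmp_text: str) -> list[str]:
    """Return the hierarchical tag paths stored in digiKam:TagsList."""
    root = ET.fromstring(xmp_text)
    return [
        li.text.strip()
        for li in root.iterfind(".//digiKam:TagsList/rdf:Seq/rdf:li", NS)
        if li.text
    ]


def split_tag(tag: str) -> tuple[str, str]:
    """Split 'Parent/Child' into ('Parent', 'Child'); top-level tags get ''."""
    parent, _, leaf = tag.rpartition("/")
    return parent, leaf
```

For example, `digikam_tags(Path("IMG0001.jpg.xmp").read_text(encoding="utf-8"))` would list the tag paths of one sidecar, and `split_tag("Species/Snow leopard")` gives `("Species", "Snow leopard")`.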

Temporal independence analysis (serval capture)

The temporal independence analysis runs on a CSV file containing time, deployment and species information (e.g. the tags.csv generated by serval observe).

Run:
serval capture --output <OUTPUT_DIR> <CSV_PATH>
and serval will walk you through it:

Example

> serval capture ./test/tags_merge.csv --output ./test/demo
Input the Minimum Time Difference (when considering records as independent) in minutes (e.g. 30): 30  

The Minimum Time Difference should be compared with?
1) Last independent record 2) Last record
Enter a selection (e.g. 1): 1  

Perform analysis on:
1) species 2) individual
Enter a selection: 1  

Here is a sample of the file path (/mnt/data/DiQing/202303-202308/6001/animal/Ere 0029.JPG): 
1): /mnt/data/DiQing/202303-202308/6001/animal
2): /mnt/data/DiQing/202303-202308/6001
3): /mnt/data/DiQing/202303-202308
4): /mnt/data/DiQing
5): /mnt/data
6): /mnt
7): /
Select the path corresponding to the deployment: 2
shape: (2_546, 4)
[snip]
Saved to ./test/demo/Species_temporal_independent.csv
shape: (579, 3)
[snip]
Saved to ./test/demo/Species_temporal_independent_count.csv
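The two choices in the prompt above can be read as follows: a record counts as independent when at least the minimum time difference has passed since a reference record of the same deployment and species, where the reference is either the last independent record (option 1) or simply the last record (option 2). The sketch below is one plain-Python interpretation of that rule, not serval's code; in particular, whether serval uses `>=` or `>` at the threshold is an assumption. `deployment_from_path` mimics the path prompt, where choice 1 is the file's parent directory.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import groupby
from pathlib import PurePosixPath

Record = tuple[str, str, datetime]  # (deployment, species, time)


def deployment_from_path(path: str, choice: int) -> str:
    """Choice 1 is the parent directory, 2 the grandparent, and so on."""
    return PurePosixPath(path).parents[choice - 1].name


def independent_records(records: list[Record], min_gap: timedelta,
                        compare_with_last_independent: bool = True) -> list[Record]:
    """Keep records at least min_gap after the reference record of the same
    deployment and species (option 1: last independent record; option 2: last record)."""
    kept = []
    ordered = sorted(records)  # by deployment, then species, then time
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        reference = None
        for record in group:
            time = record[2]
            if reference is None or time - reference >= min_gap:
                kept.append(record)
                reference = time
            elif not compare_with_last_independent:
                # Option 2: a non-independent record still restarts the clock.
                reference = time
    return kept


def count_independent(kept: list[Record]) -> Counter:
    """Number of independent records per (deployment, species)."""
    return Counter((deployment, species) for deployment, species, _ in kept)
```

With a 30-minute threshold and detections at 10:00, 10:20, 10:40 and 11:05, option 1 keeps 10:00 and 10:40, while option 2 keeps only 10:00, because every detection restarts the clock.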

Extract resources by tag (serval extract)

The serval extract command copies resources tagged with the target species (according to tags.csv) while preserving the directory layout (which may encode deployment information).

Run:
serval extract --species <SPECIES> --output <OUTPUT_DIR> <CSV_PATH>
and follow the interactive setup process to choose your desired output layout.

Example

serval extract ./test/tags_demo.csv --output ./test/snow_leopard --species "Snow leopard"
This finds all resources tagged "Snow leopard" in ./test/tags_demo.csv and copies the corresponding files to ./test/snow_leopard, using the directory layout you choose during the interactive setup.

Note

If tags.csv was generated from XMP files, serval extract automatically locates the corresponding media file (e.g. IMG0001.jpg) and copies it instead of the XMP file itself (e.g. IMG0001.jpg.xmp).
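The behaviour described in the note, copying the media file rather than its sidecar while keeping the directory layout, can be sketched as below. This assumes sidecars are named `<media file name>.xmp` as in the example above, and that tags.csv has columns called `path` and `species`; both column names are hypothetical placeholders. The sketch always keeps the full layout relative to the source root, whereas serval lets you choose a layout interactively.

```python
import csv
import shutil
from pathlib import Path


def media_for(resource: Path) -> Path:
    """Map a sidecar such as IMG0001.jpg.xmp to IMG0001.jpg; other files map to themselves."""
    if resource.suffix.lower() == ".xmp":
        return resource.with_suffix("")
    return resource


def resources_with_species(tags_csv: Path, species: str,
                           path_col: str = "path", species_col: str = "species") -> list[Path]:
    """Return resource paths tagged with the given species (column names are placeholders)."""
    with tags_csv.open(newline="", encoding="utf-8") as f:
        return [Path(row[path_col]) for row in csv.DictReader(f) if row[species_col] == species]


def copy_preserving_layout(resources: list[Path], source_root: Path, output_dir: Path) -> list[Path]:
    """Copy each resource's media file into output_dir, keeping its path relative to source_root."""
    copied = []
    for resource in resources:
        media = media_for(resource)
        target = output_dir / media.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(media, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```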

Prepare identifications for trapper

Patch tags

Generate observations table