Data wrangling
Be prepared to embrace the chaos.
Prepare the camera trap data
The directory layout
Your data must be organized following the file tree below.
pusilla will also provide a helper function for checking the directory layout (not yet implemented).
Note that deployments may contain subdirectories, which are automatically merged into the parent deployment later
(e.g. /sub_dir/filename_1 becomes /sub_dir-filename_1 after serval align).
|- project_name
|   |- collection_name_1
|       |- deployment_name_1
|           |- sub_dir
|               |- filename_1
|           |- filename_1
|           |- filename_2
|           |- filename_3
|       |- deployment_name_2
|           |- filename_1
|           |- filename_2
|           |- filename_3
|       |- ...
|       |- trap_info.csv
|   |- collection_name_2
|       |- deployment_name_1
|           |- ...
|       |- ...
|       |- trap_info.csv
|   |- ...
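Until the pusilla helper exists, the layout rules can be checked with a few lines of Python. The sketch below is not part of pusilla or serval; the function name check_layout and the exact rules it enforces (one trap_info.csv per collection, at least one file per deployment) are assumptions read off the tree above.

```python
from pathlib import Path


def check_layout(project_dir):
    """Return a list of human-readable problems found under project_dir.

    Assumed rules (taken from the tree above, not from pusilla itself):
    every collection directory holds a trap_info.csv and at least one
    deployment directory, and every deployment contains at least one file.
    """
    problems = []
    project = Path(project_dir)
    collections = sorted(p for p in project.iterdir() if p.is_dir())
    if not collections:
        problems.append(f"{project}: no collection directories found")
    for collection in collections:
        if not (collection / "trap_info.csv").is_file():
            problems.append(f"{collection.name}: missing trap_info.csv")
        deployments = sorted(p for p in collection.iterdir() if p.is_dir())
        if not deployments:
            problems.append(f"{collection.name}: no deployment directories")
        for deployment in deployments:
            # rglob also counts files inside sub-directories of a deployment.
            if not any(f.is_file() for f in deployment.rglob("*")):
                problems.append(f"{collection.name}/{deployment.name}: no files")
    return problems
```

An empty list means the directory matches the assumed rules; otherwise each entry names the collection or deployment that needs attention.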
File naming convention
It's not mandatory, but we recommend following the file naming convention for naming projects, collections and deployments.
Example of the directory layout in practice
|- Yunta
|   |- 202009-202012
|       |- 4001
|           |- IMAG0001.JPG
|           |- IMAG0002.JPG
|           |- IMAG0003.JPG
|       |- YT_TQL
|           |- IMAA0001.JPG
|           |- IMAA0002.JPG
|       |- ...
|       |- trap_info.csv
|   |- 202101-202004
|       |- 4001
|           |- ...
|       |- ...
|       |- trap_info.csv
|   |- ...
trap info
The trap_info.csv file should contain #TODO. You can use trap_info.csv as a template.
You can also use pusilla to generate the trap_info template (not yet implemented), which fills in default values to save manual work.
Data preprocessing
Rename deployments
Use serval rename to rename deployments (from deployment_name to deployment_id); see the deployment section in data wrangling for details.
Tip
Use the --dryrun option first to preview the upcoming changes:
> serval rename /mnt/data/QiLian --dryrun
Will rename A120 to A120_202305-202312
Will rename A007 to A007_202305-202312
Will rename A70 to A70_202305-202312
...
Will rename A114 to A114_202305-202312
Will rename 0018 to 0018_202305-202312
Will rename 0008 to 0008_202305-202312
Total directories: 41
Once the preview looks right, remove the --dryrun option from the command to apply the renames.
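The preview-then-apply pattern can be sketched in Python as follows. This is not serval's implementation: the function names are hypothetical, the code works on a single collection directory, and the rule deployment_id = deployment_name + "_" + collection_name is an assumption inferred from the preview output above.

```python
from pathlib import Path


def plan_renames(collection_dir):
    """Return (old_path, new_path) pairs for every deployment directory.

    Assumption: deployment_id = "<deployment_name>_<collection_name>",
    inferred from the preview output above.
    """
    collection = Path(collection_dir)
    suffix = f"_{collection.name}"
    plan = []
    for deployment in sorted(p for p in collection.iterdir() if p.is_dir()):
        if deployment.name.endswith(suffix):
            continue  # already renamed; running twice changes nothing
        plan.append((deployment, deployment.with_name(deployment.name + suffix)))
    return plan


def rename_deployments(collection_dir, dry_run=True):
    """Preview the renames, and apply them only when dry_run is False."""
    plan = plan_renames(collection_dir)
    for old, new in plan:
        if dry_run:
            print(f"Will rename {old.name} to {new.name}")
        else:
            old.rename(new)
    return plan
```

Keeping the planning step separate from the renaming step means the preview and the real run are guaranteed to act on the same list of directories.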
Generate deployments table
The deployments table can be generated by combining trap_info files using scripts provided by pusilla.
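As a rough illustration of what "combining trap_info files" could look like, the sketch below concatenates every collection's trap_info.csv into one CSV using only the standard library. It is not the pusilla script: the function name, the added collection_name column, and the choice of output path are all assumptions, and no particular trap_info columns are required.

```python
import csv
from pathlib import Path


def build_deployments_table(project_dir, output_path):
    """Concatenate every <collection>/trap_info.csv into one CSV file.

    Assumptions: each trap_info.csv has a header row, and a
    "collection_name" column (hypothetical) is added to record its origin.
    """
    project = Path(project_dir)
    rows, fieldnames = [], ["collection_name"]
    for info in sorted(project.glob("*/trap_info.csv")):
        with info.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)  # keep first-seen column order
            for row in reader:
                rows.append({"collection_name": info.parent.name, **row})
    output = Path(output_path)
    with output.open("w", newline="", encoding="utf-8") as handle:
        # restval fills columns that are missing from some trap_info files.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return output
```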
Align and rename resources
Afterward, you'll be ready to continue the workflow, with a directory that appears as follows:
|- project_name
|   |- collection_name_1
|       |- deployment_id_1
|           |- sub_dir-filename_a
|           |- filename_a
|           |- filename_b
|           |- filename_c
|       |- deployment_id_2
|           |- filename_d
|           |- filename_e
|           |- filename_f
|       |- ...
|   |- collection_name_2
|       |- deployment_id_3
|           |- ...
|       |- ...
|   |- ...
|- deployments.csv
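The sub_dir-filename_a entry in the tree above comes from merging a subdirectory into its parent deployment. A minimal Python sketch of that flattening is shown here; it only illustrates the naming rule (path components joined with "-") and is not how serval align is implemented. Both function names are hypothetical.

```python
from pathlib import Path


def flattened_name(deployment_dir, file_path):
    """Join the path components below the deployment with "-"."""
    relative = Path(file_path).relative_to(deployment_dir)
    return "-".join(relative.parts)


def flatten_deployment(deployment_dir, dry_run=True):
    """Move nested files up into the deployment directory itself."""
    deployment = Path(deployment_dir)
    moves = []
    for path in sorted(p for p in deployment.rglob("*") if p.is_file()):
        target = deployment / flattened_name(deployment, path)
        if target == path:
            continue  # already at the top level of the deployment
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        moves.append((path, target))
    if not dry_run:
        for source, target in moves:
            source.rename(target)
    return moves
```

Prefixing the subdirectory name keeps files with identical names (filename_a in both the deployment and sub_dir) from overwriting each other. Emptied subdirectories are left in place by this sketch.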
Play around with tags
Retrieve tags from resources (serval observe)
Use serval observe to retrieve tags (i.e. individual and species) from all resources (media files or XMP sidecars) in a given path. The following options control its behaviour.
--output, -o
The --output (or -o) option sets the output directory, which is created if it does not exist.
--parallel, -p
serval runs in serial mode by default because the performance bottleneck is I/O; use this option to enable parallel mode.
It is safe to try it and see whether it improves performance.
--xmp
This option tells serval to read only XMP files (.xmp) when looking for tags.
--independent, -i
With this option, a temporal independence analysis is performed right after the retrieval, so you don't have to run serval capture separately.
Example
Running serval observe on /mnt/data/diqing with --output /mnt/data/diqing/tag-info recursively finds images in /mnt/data/diqing and retrieves tags (from the digiKam tag list) from them. The output file (e.g. tags.csv) is saved to /mnt/data/diqing/tag-info.
With --xmp and --independent added, serval instead retrieves tags from all XMP files and starts the temporal independence analysis afterwards.
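To make the "digiKam tag list" concrete, here is a small Python sketch that reads it from the text of an XMP sidecar using the standard library. It is not serval's reader. It assumes the tags are stored as an rdf:Seq under digiKam:TagsList with the namespace URIs shown in the code, which is how digiKam sidecars are commonly laid out; check one of your own .xmp files before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed to match what digiKam writes into its sidecars.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "digiKam": "http://www.digikam.org/ns/1.0/",
}


def read_tag_list(xmp_text):
    """Return the entries of digiKam:TagsList found in an XMP document."""
    root = ET.fromstring(xmp_text)
    items = root.findall(".//digiKam:TagsList/rdf:Seq/rdf:li", NAMESPACES)
    return [item.text.strip() for item in items if item.text]
```

The function returns the raw tag strings in document order; splitting them into species and individual is left out because it depends on how the tag hierarchy was set up in digiKam.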
Temporal independence analysis (serval capture)
The temporal independence analysis is performed on a CSV file that contains time, deployment and species information (e.g. tags.csv generated by serval observe).
Run:
serval capture --output <OUTPUT_DIR> <CSV_PATH>
and serval will walk you through it:
Example
> serval capture ./test/tags_merge.csv --output ./test/demo
Input the Minimum Time Difference (when considering records as independent) in minutes (e.g. 30): 30
The Minimum Time Difference should be compared with?
1) Last independent record 2) Last record
Enter a selection (e.g. 1): 1
Perform analysis on:
1) species 2) individual
Enter a selection: 1
Here is a sample of the file path (/mnt/data/DiQing/202303-202308/6001/animal/Ere 0029.JPG):
1): /mnt/data/DiQing/202303-202308/6001/animal
2): /mnt/data/DiQing/202303-202308/6001
3): /mnt/data/DiQing/202303-202308
4): /mnt/data/DiQing
5): /mnt/data
6): /mnt
7): /
Select the path corresponding to the deployment: 2
shape: (2_546, 4)
[snip]
Saved to ./test/demo/Species_temporal_independent.csv
shape: (579, 3)
[snip]
Saved to ./test/demo/Species_temporal_independent_count.csv
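The two comparison modes offered in the prompt above ("Last independent record" versus "Last record") can give different results. The Python sketch below illustrates the difference on plain tuples; it is not serval's code, the function and parameter names are hypothetical, and treating a gap exactly equal to the threshold as independent is an assumption.

```python
from datetime import datetime, timedelta


def independent_records(records, min_minutes=30, compare_to_last_independent=True):
    """Filter (deployment, species, time) tuples down to independent ones.

    A record is kept when at least min_minutes separate it from the
    reference record of the same deployment and species. The reference is
    the last kept record (option 1) or simply the previous record (option 2).
    """
    threshold = timedelta(minutes=min_minutes)
    reference = {}  # (deployment, species) -> time to compare against
    kept = []
    for deployment, species, time in sorted(records, key=lambda r: r[2]):
        key = (deployment, species)
        previous = reference.get(key)
        is_independent = previous is None or time - previous >= threshold
        if is_independent:
            kept.append((deployment, species, time))
        # Option 2 moves the reference forward on every record, kept or not.
        if is_independent or not compare_to_last_independent:
            reference[key] = time
    return kept
```

With records at 0, 20, 40 and 60 minutes and a 30-minute threshold, comparing against the last independent record keeps the records at 0 and 40 minutes, while comparing against the last record keeps only the first, because each record is within 30 minutes of its predecessor.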
Extract resources by tag (serval extract)
The serval extract command copies resources that contain a target species (according to tags.csv) while preserving the directory layout
(which may hold information about the deployment).
Run:
serval extract --species <SPECIES> --output <OUTPUT_DIR> <CSV_PATH>
and follow the interactive setup process to choose your desired output layout.
Example
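With --species "Snow leopard", --output ./test/snow_leopard and ./test/tags_demo.csv as the CSV path, serval finds all resources tagged with "Snow leopard" in ./test/tags_demo.csv
and copies the corresponding files to ./test/snow_leopard,
using the directory layout you choose during the interactive setup.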
Note
If tags.csv was generated from XMP files, serval extract does not copy the XMP file itself (e.g. IMG0001.jpg.xmp);
it automatically finds the corresponding media file (e.g. IMG0001.jpg) and copies that instead.
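The sidecar-to-media mapping and the layout-preserving copy can be sketched in Python like this. It is an illustration only, not the code behind serval extract; the function names and the layout_root parameter (the directory below which the layout is kept) are hypothetical.

```python
import shutil
from pathlib import Path


def media_path_for(resource):
    """Map an XMP sidecar such as IMG0001.jpg.xmp to IMG0001.jpg."""
    resource = Path(resource)
    if resource.suffix.lower() == ".xmp":
        return resource.with_suffix("")  # drops only the trailing ".xmp"
    return resource


def copy_keeping_layout(resource, layout_root, output_dir):
    """Copy a resource into output_dir, keeping its path below layout_root."""
    source = media_path_for(resource)
    target = Path(output_dir) / source.relative_to(layout_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Choosing a layout_root closer to the file yields a shallower output tree; choosing one higher up keeps more of the collection and deployment directories in the copied path.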