Nutrition Facts for Every Edible Emoji
(A script for datamining USDA nutrition facts)
This repository hosts a Python script for extracting selected nutrition data from USDA's dataset, along with a sample of its output that lists nutrition facts for most food emoji, each with a sensible estimate of portion size. Where emoji sets differ in the size or identity of the food they depict, we based our extraction on Twemoji illustrations.
If you just want the data, check out output_nutrients.json. I made this for a game project, so the food data has a few idiosyncrasies related to my work.
Contents:
- fooddata_miner.py: a Python 3 script. Its command line is described below.
- cfg_nutrients.json: specifies a set of nutrients to be extracted, defining their output names and units (gram, milligram or microgram) and mapping them to FDC nutrient identifiers.
- cfg_foods.json: specifies a set of foods to be extracted, mapping each to an "FDC ID" value that identifies an entry in USDA's dataset.
- output_nutrients.json: the output of the script when run with the configuration files listed above and the full set of USDA nutrition data (some branded foods are referenced).
- usda/: folder for USDA nutrition data. The script can scan ZIP files, so there is no need to extract them.
  - We've included snapshots of USDA's Foundation Foods, Survey Foods and Legacy Foods data as ZIP files in this repository. Branded foods comprise the remaining 90% of USDA food data and are not included for reasons of size and runtime.
Using the script fooddata_miner.py
python3 fooddata_miner.py [-f foods_config] [-n nutrients_config] usda_data_files...
Arguments:
- -n nutrients_config refers to a JSON file configuring nutrients in the format described below. Nutrient configuration is required to run, but this argument defaults to cfg_nutrients.json.
- -f foods_config refers to a JSON file configuring food items in the format described below. Food configuration is required to run, but this argument defaults to cfg_foods.json.
- usda_data_files is a list of ZIP or JSON files containing USDA nutrition datasets.
  - Archives with a zip extension will be searched for files with a json extension, and these will be scanned.
  - Non-zip files are assumed to be in JSON format and scanned.
  - All file extensions are case insensitive.
Example: ./fooddata_miner.py -f cfg_foods.json -n cfg_nutrients.json usda/*.zip
Expect the script to take anywhere from 10-30 seconds up to several minutes depending on your computer's single-core speed and whether you're using branded food data (which slows the process down by a factor of 10 or so).
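The file-handling rules listed under Arguments can be sketched roughly as follows. This is a minimal illustration and not the code in fooddata_miner.py: the function name iter_json_documents and the shape of what it yields are assumptions made for the example.

```python
import json
import zipfile
from pathlib import Path


def iter_json_documents(paths):
    """Yield (source_name, parsed_json) for every JSON document in `paths`.

    Hypothetical helper: files with a zip extension are searched for
    members with a json extension; anything else is assumed to be JSON.
    Extensions are compared case-insensitively.
    """
    for path in map(Path, paths):
        if path.suffix.lower() == ".zip":
            with zipfile.ZipFile(path) as archive:
                for member in archive.namelist():
                    if member.lower().endswith(".json"):
                        with archive.open(member) as handle:
                            yield f"{path.name}:{member}", json.load(handle)
        else:
            with open(path, "rb") as handle:
                yield path.name, json.load(handle)
```

Note that this sketch parses each file in full before yielding it, which may be a poor fit for the much larger branded foods data.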
Nutrient Configuration
The nutrient JSON file consists of a single object mapping nutrient keys to a numeric nutrient ID used in FDC's datasets. The nutrient ID is typically a four-digit number.
Nutrient keys must end in .g, .mg, .ug or .µg (the last two are interchangeable). Output data will use those units regardless of the units used in the dataset.
For example, the property "Na.mg": 1093 specifies that Sodium (here we use its element symbol Na) should be calculated in milligrams and is represented by ID 1093 in USDA's JSON files.
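As a rough illustration of the key format, the sketch below splits a nutrient key into its name and unit suffix and converts an amount between the supported units. The names UNIT_MICROGRAMS, split_nutrient_key and convert_amount are hypothetical and are not taken from fooddata_miner.py, whose own handling may differ.

```python
import json

# Micrograms per unit for each suffix the configuration accepts.
# "ug" and "µg" are treated as the same unit.
UNIT_MICROGRAMS = {"g": 1_000_000, "mg": 1_000, "ug": 1, "µg": 1}


def split_nutrient_key(key):
    """Split a key such as "Na.mg" into ("Na", "mg")."""
    name, _, unit = key.rpartition(".")
    if not name or unit not in UNIT_MICROGRAMS:
        raise ValueError(f"nutrient key must end in .g, .mg, .ug or .µg: {key!r}")
    return name, unit


def convert_amount(amount, from_unit, to_unit):
    """Convert an amount between gram, milligram and microgram."""
    return amount * UNIT_MICROGRAMS[from_unit] / UNIT_MICROGRAMS[to_unit]


# A one-entry configuration using the sodium example from this README.
config = json.loads('{"Na.mg": 1093}')
for key, nutrient_id in config.items():
    name, unit = split_nutrient_key(key)
    print(name, unit, nutrient_id)
```

With helpers like these, a dataset value reported in grams could be written to a .mg key by calling convert_amount(value, "g", "mg").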
Food Item Configuration
The food item configuration file is somewhat tailored to my own uses. The file contains a JSON array of objects with the following properties:
- id: the unique identifier of the food item, used as a key in the output file and displayed in the script's log as it works.
- portion.g: portion size in grams. Nutrition data will be scaled to this portion size by the script. This property is copied to the output data.
- Any key from the Nutrient Configuration, such as Na.mg, is added to the item's nutrition after portion size scaling. This is a number in the units implied by the key. The units used here must match the units used in nutrient configuration.
- Any field prefixed with . (a dot) is copied to the output data, with the prefix dot removed. This is useful for project-specific data.
Any other properties in a food item's configuration entry will be ignored.
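The way these properties combine can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: it assumes the dataset amounts have already been converted to the configured units and are expressed per 100 g of food, it treats nutrients missing from the dataset as zero, and the function name build_output_entry and the sample values are hypothetical.

```python
def build_output_entry(item, per_100g, nutrient_keys):
    """Return (id, output_entry) for one food item configuration entry.

    item          -- one object from the food configuration array
    per_100g      -- nutrient key -> amount per 100 g (assumed basis)
    nutrient_keys -- keys defined in the nutrient configuration
    """
    portion = item["portion.g"]
    entry = {"portion.g": portion}  # portion size is copied to the output

    for key in nutrient_keys:
        # Scale the dataset value to the portion, then add any amount
        # configured directly on the item under the same key.
        scaled = per_100g.get(key, 0.0) * portion / 100.0
        entry[key] = scaled + item.get(key, 0.0)

    for key, value in item.items():
        if key.startswith("."):
            entry[key[1:]] = value  # copy with the prefix dot removed

    # Any other properties of the item are ignored.
    return item["id"], entry


# Hypothetical item: 200 g portion with 5 mg of sodium added on top.
item = {"id": "apple", "portion.g": 200, "Na.mg": 5, ".emoji": "x", "note": "ignored"}
food_id, entry = build_output_entry(item, {"Na.mg": 1.0}, ["Na.mg"])
```

Here the dataset's 1.0 mg per 100 g scales to 2.0 mg for the 200 g portion, and the configured 5 mg is added afterwards, giving 7.0 mg.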