Syft
Getting started
In order to test and develop in the Syft repo you will need the following dependencies installed:
- Golang
- Docker
- Python (>= 3.9)
- make
Initial setup
Run once after cloning to install development tools:
make bootstrap
Make sure you’ve updated your Docker settings so the default Docker socket path is available: go to Docker → Settings → Advanced and ensure “Allow the default Docker socket to be used” is checked. Then switch to the default Docker context:
docker context use default
Useful commands
Common commands for ongoing development:
- `make help` - List all available commands
- `make lint` - Check code formatting and linting
- `make lint-fix` - Auto-fix formatting issues
- `make unit` - Run unit tests
- `make integration` - Run integration tests
- `make cli` - Run CLI tests
- `make snapshot` - Build release snapshot with all binaries and packages
Testing
Levels of testing
- unit (`make unit`): The default level of test, distributed throughout the repo. Any `_test.go` file that does not reside somewhere within the `/test` directory is a unit test; other forms of testing should be organized in the `/test` directory. These tests should focus on the correctness of functionality in depth. Test coverage metrics only consider unit tests and no other forms of testing.
- integration (`make integration`): Located within `cmd/syft/internal/test/integration`, these tests focus on the behavior surfaced by the common library entrypoints from the `syft` package and make light assertions about the results surfaced. Additionally, these tests tend to make diversity assertions for enum-like objects, ensuring that as enum values are added to a definition, integration tests will automatically fail if no test attempts to use that enum value. For more details see the “Data diversity and freshness assertions” section below.
- cli (`make cli`): Located within `test/cli`, these tests check the correctness of application behavior from a snapshot build. Use this level when a unit or integration test will not do, or when you need in-depth testing of code in the `cmd/` package (such as the proper behavior of application configuration, CLI switches, and glue code before syft library calls).
- acceptance (`make install-test`): Located within `test/compare` and `test/install`, these are smoke-like tests that ensure application packaging and installation work as expected. For example, during release we provide RPM packages as a download artifact; an accompanying RPM acceptance test installs the RPM from a snapshot build and ensures the output of a syft invocation matches canned expected output. New acceptance tests should be added for each release artifact and architecture supported (when possible).
Data diversity and freshness assertions
It is important that tests against the codebase are flexible enough to begin failing when they do not cover “enough” of the objects under test. “Cover” in this case does not mean that some percentage of the code has been executed during testing, but instead that there is enough diversity of data input reflected in testing relative to the definitions available.
For instance, consider an enum-like value like so:
```go
type Language string

const (
	Java       Language = "java"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Ruby       Language = "ruby"
	Go         Language = "go"
)
```
Say we have a test that exercises all the languages defined today:
```go
func TestCatalogPackages(t *testing.T) {
	testTable := []struct {
		// ... the set of test cases that test all languages
	}{}

	for _, test := range testTable {
		t.Run(test.name, func(t *testing.T) {
			// use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
			// ...
		})
	}
}
```
Each test case has an `inputFixturePath` that results in packages from one of the languages. This test is brittle since it does not directly assert that all languages were exercised, and future modifications (such as adding a new language) won’t be covered by any test case.
To address this, the enum-like object should have a definition of all objects that can be used in testing:
```go
type Language string

// const ( Java Language = ..., ... )

var AllLanguages = []Language{
	Java,
	JavaScript,
	Python,
	Ruby,
	Go,
	Rust,
}
```
Allowing testing to automatically fail when adding a new language:
```go
func TestCatalogPackages(t *testing.T) {
	testTable := []struct {
		// ... the set of test cases that (hopefully) covers all languages
	}{}

	// new stuff...
	observedLanguages := strset.New()

	for _, test := range testTable {
		t.Run(test.name, func(t *testing.T) {
			// use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
			// ...
			// new stuff...
			for _, actualPkg := range actual {
				observedLanguages.Add(string(actualPkg.Language))
			}
		})
	}

	// new stuff...
	for _, expectedLanguage := range pkg.AllLanguages {
		if !observedLanguages.Contains(string(expectedLanguage)) {
			t.Errorf("failed to test language=%q", expectedLanguage)
		}
	}
}
```
This is a better test since it will fail when someone adds a new language but fails to write a test case that should exercise that new language. This method is ideal for integration-level testing, where testing correctness in depth is not needed (that is what unit tests are for) but instead testing in breadth to ensure that units are well integrated.
A similar case can be made for data freshness; if the quality of the results will be diminished if the input data is not kept up to date then a test should be written (when possible) to assert any input data is not stale.
An example of this is the static list of licenses stored in `internal/spdxlicense` for use by the SPDX presenters. This list is updated and published periodically by an external group, and syft can grab and update this list by running `go generate ./...` from the root of the repo.
An integration test has been written that grabs the latest license list version externally and compares that version with the version generated in the codebase. If they differ, the test fails, indicating that action is needed to update it.
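A freshness assertion of this kind can be sketched as follows. This is a simplified illustration, not the actual integration test: `generatedVersion` and `fetchLatestVersion` are hypothetical stand-ins for the version baked in by `go generate` and the network call that asks the external source for the current version.

```go
package main

import "fmt"

// generatedVersion is a hypothetical stand-in for the license list version
// committed to the codebase by `go generate ./...`.
const generatedVersion = "3.23"

// fetchLatestVersion is a hypothetical stand-in for an HTTP call that asks
// the external project for its currently published version.
func fetchLatestVersion() string {
	return "3.23" // pretend this came from the network
}

// isStale reports whether the generated data lags behind the published data.
func isStale(generated, latest string) bool {
	return generated != latest
}

func main() {
	latest := fetchLatestVersion()
	if isStale(generatedVersion, latest) {
		fmt.Printf("license list is stale: have %s, want %s (run `go generate ./...`)\n", generatedVersion, latest)
	} else {
		fmt.Println("license list is up to date")
	}
}
```

The real test fails loudly on mismatch, so a stale generated file blocks CI rather than silently degrading output quality.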
Key Takeaway
Try to write tests that fail when data assumptions change, not just when code changes.
Snapshot tests
The format objects make heavy use of “snapshot” testing, where the expected output bytes from a call are saved in the git repository and, during testing, the actual bytes from the subject under test are compared with the golden copy saved in the repo. The “golden” files are stored in the `test-fixtures/snapshot` directory relative to the Go package under test and should always be updated by invoking `go test` on the specific test file with a specific CLI update flag provided.
Many of the Format tests make use of this approach, where the raw SBOM report is saved in the repo and the test
compares that SBOM with what is generated from the latest presenter code. The following command can be used to
update the golden files for the various snapshot tests:
make update-format-golden-files
These flags are defined at the top of the test files that have tests that use the snapshot files.
Snapshot testing is only as good as the manual verification of the golden snapshot file saved to the repo! Be careful and diligent when updating these files.
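The update-flag pattern looks roughly like this. This is a sketch, not syft's actual helper: the flag name and helper function are hypothetical (each test file defines its own flag), but the shape — rewrite the golden file when the flag is set, otherwise compare — is the general technique.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// updateSnapshot mirrors the kind of flag defined at the top of snapshot
// test files; the exact flag name varies per file.
var updateSnapshot = flag.Bool("update", false, "update the golden files")

// assertAgainstGolden compares actual bytes with the stored golden copy,
// rewriting the golden file first when the update flag is set.
func assertAgainstGolden(name string, actual []byte) error {
	golden := filepath.Join("test-fixtures", "snapshot", name)
	if *updateSnapshot {
		if err := os.MkdirAll(filepath.Dir(golden), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(golden, actual, 0o644); err != nil {
			return err
		}
	}
	want, err := os.ReadFile(golden)
	if err != nil {
		return err
	}
	if !bytes.Equal(want, actual) {
		return fmt.Errorf("snapshot mismatch for %s (verify output, then re-run with the update flag)", name)
	}
	return nil
}

func main() {
	flag.Parse()
	*updateSnapshot = true // simulate passing the update flag to `go test`
	if err := assertAgainstGolden("example.json", []byte(`{"ok":true}`)); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("snapshot matches golden file")
}
```

Note that the update path writes the file before comparing, which is exactly why manual review of the resulting diff matters: an update run will always pass.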
Test fixtures
Syft uses a sophisticated test fixture caching system to speed up test execution. Test fixtures include pre-built test images, language-specific package manifests, and other test data. Rather than rebuilding fixtures on every checkout, Syft can download a pre-built cache from GitHub Container Registry.
Common fixture commands:
- `make fixtures` - Intelligently download or rebuild fixtures as needed
- `make build-fixtures` - Manually build all fixtures from scratch
- `make clean-cache` - Remove all cached test fixtures
- `make check-docker-cache` - Verify docker cache size is within limits
When to use each command:
- First time setup: Run `make fixtures` after cloning the repository. This will download the latest fixture cache.
- Tests failing unexpectedly: Try `make clean-cache` followed by `make fixtures` to ensure you have fresh fixtures.
- Working offline: Set `DOWNLOAD_TEST_FIXTURE_CACHE=false` and run `make build-fixtures` to build fixtures locally without downloading.
- Modifying test fixtures: After changing fixture source files, run `make build-fixtures` to rebuild affected fixtures.
The fixture system tracks input fingerprints and only rebuilds fixtures when their source files change. This makes the development cycle faster while ensuring tests always run against the correct fixture data.
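The fingerprint idea can be sketched as a content digest over the fixture's input files. This is a simplified illustration of the concept, not syft's actual implementation: `fingerprint` and `needsRebuild` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// fingerprint derives a stable digest from fixture input files. Names are
// sorted first so the digest is deterministic regardless of map order.
func fingerprint(inputs map[string][]byte) string {
	names := make([]string, 0, len(inputs))
	for name := range inputs {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write(inputs[name])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// needsRebuild compares the fingerprint recorded at the last build with a
// fingerprint of the current source files.
func needsRebuild(cached string, inputs map[string][]byte) bool {
	return cached != fingerprint(inputs)
}

func main() {
	inputs := map[string][]byte{"Dockerfile": []byte("FROM alpine")}
	cached := fingerprint(inputs) // recorded when the fixture was last built
	inputs["Dockerfile"] = []byte("FROM alpine:3.19") // a source file changed
	fmt.Println("rebuild needed:", needsRebuild(cached, inputs))
}
```

Because unchanged inputs hash to the same value, `make fixtures` can skip rebuilding anything whose fingerprint still matches the cache.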
Code generation
Syft generates several types of code and data files that need to be kept in sync with external sources or internal structures:
What gets generated:
- JSON Schema - Generated from Go structs to define the Syft JSON output format
- SPDX License List - Up-to-date list of license identifiers from the SPDX project
- CPE Dictionary Index - Index of Common Platform Enumeration identifiers for vulnerability matching
When to regenerate:
Run code generation after:
- Modifying the `pkg.Package` struct or related types (requires JSON schema regeneration)
- SPDX releases a new license list
- CPE dictionary updates are available
Generation commands:
- `make generate` - Run all generation tasks
- `make generate-json-schema` - Generate JSON schema from Go types
- `make generate-license-list` - Download and generate latest SPDX license list
- `make generate-cpe-dictionary-index` - Generate CPE dictionary index
After running generation commands, review the changes carefully and commit them as part of your pull request. The CI pipeline will verify that generated files are up to date.
Adding a new cataloger
Catalogers must fulfill the pkg.Cataloger interface in order to add packages to the SBOM.
All catalogers are registered as tasks in Syft’s task-based cataloging system:
- Add your cataloger to `DefaultPackageTaskFactories()` using `newSimplePackageTaskFactory` or `newPackageTaskFactory`
- Tag the task appropriately to indicate when it should run:
  - `pkgcataloging.InstalledTag` - for packages positively installed
  - `pkgcataloging.DeclaredTag` - for packages described in manifests (places where we intend to install software, but which do not describe installed software)
  - `pkgcataloging.ImageTag` - should run when scanning container images
  - `pkgcataloging.DirectoryTag` - should run when scanning directories/filesystems
  - `pkgcataloging.LanguageTag` - for language-specific packages
  - `pkgcataloging.OSTag` - for OS-specific packages
  - Ecosystem tags like `"java"`, `"python"`, `"alpine"`, etc.
- If your cataloger needs configuration, add it to `pkgcataloging.Config`
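Registration pairs a cataloger constructor with its tags, which can be sketched with simplified stand-in types (the real factory machinery lives in syft's internal task package; the cataloger name and tags below are hypothetical):

```go
package main

import "fmt"

// Cataloger is a simplified stand-in for syft's pkg.Cataloger interface.
type Cataloger struct{ Name string }

// TaskFactory pairs a cataloger constructor with the tags that control
// when the resulting task runs.
type TaskFactory struct {
	New  func() Cataloger
	Tags []string
}

// newSimplePackageTaskFactory mirrors the shape of syft's helper of the
// same name: constructor plus selection tags.
func newSimplePackageTaskFactory(newCataloger func() Cataloger, tags ...string) TaskFactory {
	return TaskFactory{New: newCataloger, Tags: tags}
}

// defaultPackageTaskFactories mirrors DefaultPackageTaskFactories(): the
// single registry that all package catalogers are added to.
func defaultPackageTaskFactories() []TaskFactory {
	return []TaskFactory{
		// register a hypothetical lockfile cataloger that should run for
		// both image and directory scans, tagged with its ecosystem
		newSimplePackageTaskFactory(
			func() Cataloger { return Cataloger{Name: "example-lockfile-cataloger"} },
			"declared", "image", "directory", "example-ecosystem",
		),
	}
}

func main() {
	for _, f := range defaultPackageTaskFactories() {
		fmt.Println(f.New().Name, f.Tags)
	}
}
```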
The task system orchestrates all catalogers through `CreateSBOMConfig`, which manages task execution, parallelism, and configuration.
`generic.NewCataloger` is an abstraction syft uses to make writing common components easier (see the alpine cataloger for example usage).
It takes the following information as input:
- A `catalogerName` to identify the cataloger uniquely among all other catalogers.
- Pairs of file globs and parser functions to parse those files. These parser functions return a slice of `pkg.Package` as well as a slice of `artifact.Relationship` to describe how the returned packages are related. See the alpine cataloger parser function as an example.
Identified packages share a common `pkg.Package` struct, so be sure that when the new cataloger constructs a new package it uses the `pkg.Package` struct.
If you want to return more information than what is available on the `pkg.Package` struct, you can do so via the `pkg.Package.Metadata` field, which accepts any type.
Metadata types tend to be unique for each `pkg.Type`, but this is not required.
See the `pkg` package for examples of the different metadata types that are supported today.
When encoding to JSON, metadata type names are determined by reflection and mapped according to `internal/packagemetadata/names.go`.
Finally, see the alpine cataloger for an example of where package construction is done.
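The overall shape of a parser function and its package construction can be sketched with simplified stand-in types. This is not the actual alpine cataloger code: `Package`, `Relationship`, and `ApkDBEntry` below are local stand-ins for the real types in syft's `pkg` and `artifact` packages, and the parsing is deliberately trivial.

```go
package main

import "fmt"

// Package and Relationship are simplified stand-ins for pkg.Package and
// artifact.Relationship.
type Package struct {
	Name     string
	Version  string
	Type     string
	Metadata any
}

type Relationship struct {
	From, To, Type string
}

// ApkDBEntry is a stand-in for the cataloger-specific metadata type that
// would be attached via the Metadata field.
type ApkDBEntry struct {
	Package string
	Version string
}

// parseInstalledDB mimics the shape of a generic cataloger parser function:
// raw file content in, packages and relationships out. A real parser would
// read the apk installed database format; here the content is pretended to
// be a single "name version" line.
func parseInstalledDB(content string) ([]Package, []Relationship, error) {
	var name, version string
	if _, err := fmt.Sscanf(content, "%s %s", &name, &version); err != nil {
		return nil, nil, err
	}
	p := Package{
		Name:    name,
		Version: version,
		Type:    "apk",
		// attach the parser-specific details through the Metadata field
		Metadata: ApkDBEntry{Package: name, Version: version},
	}
	return []Package{p}, nil, nil
}

func main() {
	pkgs, _, err := parseInstalledDB("musl 1.2.4-r2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s@%s (%s)\n", pkgs[0].Name, pkgs[0].Version, pkgs[0].Type)
}
```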
Interested in building a new cataloger?
Check out the list of issues with the `new-cataloger` label!
If you have questions about implementing a cataloger, feel free to file an issue or reach out to us on discourse!
Troubleshooting
Cannot build test fixtures with Artifactory repositories
Some companies have Artifactory set up internally as a solution for sourcing secure dependencies. If you’re seeing an issue where the unit tests won’t run because of the below error, then this section might be relevant to your use case.
[ERROR] [ERROR] Some problems were encountered while processing the POMs
If you’re dealing with an issue where the unit tests will not pull/build certain Java fixtures, check some of these settings:
- A `settings.xml` file should be available to help you communicate with your internal Artifactory deployment.
- This file can be moved to `syft/pkg/cataloger/java/test-fixtures/java-builds/example-jenkins-plugin/` to help build the unit test fixtures.
- You’ll also want to modify `build-example-jenkins-plugin.sh` to use `settings.xml`.
For more information on this setup and troubleshooting see issue 1895
Next Steps
Understanding the Codebase
- Architecture - Learn about package structure, core library flow, cataloger design patterns, and file searching
- API Reference - Explore the public Go API, type definitions, and function signatures
Contributing Your Work
- Pull Requests - Guidelines for submitting PRs and working with reviewers
- Issues and Discussions - Where to get help and report issues
Finding Work
- New Cataloger Issues - Great first contributions for adding ecosystem support
- Good First Issues - Beginner-friendly issues
Getting Help
- Anchore Discourse - Community discussions and questions