pax_global_header 0000666 0000000 0000000 00000000064 14766323153 0014524 g ustar 00root root 0000000 0000000 52 comment=e6c2c2fd06d44240baa225cfd73a9116e9ea0027
ciftools-java-ciftools-java-7.0.1/ 0000775 0000000 0000000 00000000000 14766323153 0017031 5 ustar 00root root 0000000 0000000 ciftools-java-ciftools-java-7.0.1/.github/ 0000775 0000000 0000000 00000000000 14766323153 0020371 5 ustar 00root root 0000000 0000000 ciftools-java-ciftools-java-7.0.1/.github/workflows/ 0000775 0000000 0000000 00000000000 14766323153 0022426 5 ustar 00root root 0000000 0000000 ciftools-java-ciftools-java-7.0.1/.github/workflows/build.yml 0000664 0000000 0000000 00000002340 14766323153 0024247 0 ustar 00root root 0000000 0000000 name: SonarCloud
on:
  push:
    branches:
      - master
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  build:
    name: Build and analyze
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0 # Shallow clones should be disabled for a better relevancy of analysis
      - name: Set up JDK 17
        uses: actions/setup-java@v3
        with:
          java-version: 17
          distribution: 'zulu' # Alternative distribution options are available.
      - name: Cache SonarCloud packages
        uses: actions/cache@v3
        with:
          path: ~/.sonar/cache
          key: ${{ runner.os }}-sonar
          restore-keys: ${{ runner.os }}-sonar
      - name: Cache Maven packages
        uses: actions/cache@v3
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-m2
      - name: Build and analyze
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Needed to get PR information, if any
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        run: mvn -B verify org.sonarsource.scanner.maven:sonar-maven-plugin:sonar -Dsonar.projectKey=rcsb_ciftools-java
ciftools-java-ciftools-java-7.0.1/.gitignore 0000664 0000000 0000000 00000000232 14766323153 0021016 0 ustar 00root root 0000000 0000000 .idea/
target/
*.iml
.DS_Store
site/
/bin/
.classpath
.project
.metadata
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.settings/
.loadpath
.recommenders ciftools-java-ciftools-java-7.0.1/CHANGELOG.md 0000664 0000000 0000000 00000022665 14766323153 0020655 0 ustar 00root root 0000000 0000000 CIFTools Changelog
=============
This project uses semantic versioning. Furthermore, this project provides code that was generated from schemata. Any schema change that introduces a breaking change in the generated code is considered breaking for the whole project. Additional information is provided below when this occurs (under `Breaking schema changes`). Most of these occur in experimental categories and are unlikely to affect your code. `Breaking API changes` will be avoided starting with version 1.0.0.
ciftools-java 7.0.1 - March 2025
-------------
* cache results of `DelegatingColumn#getArray()` to avoid performance penalty if schema type and actual data type differ (fixes #13)
ciftools-java 7.0.0 - March 2025
-------------
### Breaking schema changes
* cif-core:
* removal of `Atom`, `CifCore`, `Diffraction`, `DiffrnOrient`, `Display`, `Model`, `Publication`, `Structure`, and `Valence` categories
### General
* dependency and schema updates
ciftools-java 6.0.0 - March 2024
-------------
### Breaking schema changes
* cif-core:
* renaming of `atom_site` tensors
* `atom_type_scat_versus_stol_list` from float to String
* drops `citation_author_key`
* drops `citation_editor_id`
* `journal_index_id` from int to String
* `refln_f_complex_su` from float to String
* mmCIF/ihm-extension:
* drops `ihm_entry_collection_mapping.id`
ciftools-java 5.0.2 - October 2023
-------------
### Bug fixes
* treat numbers that exceed `Integer.MAX_VALUE` as String
ciftools-java 5.0.1 - May 2023
-------------
### Bug fixes
* harden detection of scientific notation in number type logic
ciftools-java 5.0.0 - January 2023
-------------
### Breaking schema changes
* cif-core:
* dropped `diffrn_standard` (duplicate of `diffrn_standards`) and renaming/retyping of several diffraction-related categories
ciftools-java 4.0.5 - January 2023
-------------
### Bug fixes
* fix text writing when non-English number formats are used on the platform
ciftools-java 4.0.4 - November 2022
-------------
### Bug fixes
* fix test failures on Java 17 (subtle gzip differences, #12)
ciftools-java 4.0.3 - October 2022
-------------
### General
* schema update (mainly description in EM sub-schema)
ciftools-java 4.0.2 - June 2022
-------------
### Bug fixes
* write `null` instead of empty map if all values are present and no mask is needed - otherwise other software might refuse to load files written by ciftools-java
ciftools-java 4.0.1 - June 2022
-------------
### Bug fixes
* fix encoding classification when converting text to binary without schema
ciftools-java 4.0.0 - May 2022
-------------
### Bug fixes
* update gson dependency to 2.8.9
### Breaking schema changes
* mmCIF/modelCIF:
* `ma_protocol_step.method_type_other_details` -> `ma_protocol_step.details`
* cif-core:
* case changes for many column names, this affects Java access methods unless explicitly aliased by the dictionary
* changes to handling of value ranges and standard uncertainty values (e.g. for melting points & temperature values in `chemical` category)
* `citation_journal_issue` changed from int to String type
* `citation_year` changes from String to int type
ciftools-java 3.0.1 - November 2021
-------------
### Bug fixes
* proper handling of strings such as: `''cytochrome P450`
ciftools-java 3.0.0 - September 2021
-------------
### New features
* add support for the CIF model extension (https://raw.githubusercontent.com/ihmwg/MA-dictionary/master/mmcif_ma.dic), relevant for AlphaFold models and other predicted structures
### Bug fixes
* names in cifcore implementation now follow spec and are case-insensitive
### Breaking schema changes
* mmCIF:
* `em_focused_ion_beam.duration` changed from int to float type
* `em_map.symmetry_space_group` changed from String to int type
* `pdbx_struct_ncs_virus_gen.oper_id` changed from String to int type
* `struct_ncs_ens_gen.oper_id` changed from String to int type
* `struct_ncs_oper.id` changed from String to int type
* cif-core:
* case changes for many column names, this affects Java access methods unless explicitly aliased by the dictionary
* `atom_type_scat_versus_stol_list` changed from String to float type
* `model_site_adp_eigen_system` changed from String to `model_site_adp_eigenvalues` and `model_site_adp_eigenvectors` of float type
ciftools-java 2.0.2
-------------
### General
* expose #getColumnNames for categories
* minimized overhead by schema validation that implicitly happens when files are requested in a certain schema
(previously validation would trigger decoding of all columns)
ciftools-java 2.0.1
-------------
### Bug fixes
* overflow could result in allocation of arrays with negative size
ciftools-java 2.0.0
-------------
### Bug fixes
* avoid enigmatic `NullPointerException` for `#values()` of empty columns - the returned `Stream` is now empty
### Breaking schema changes
* mmCIF: changes to IHM, EM, and branched entities (see https://github.com/rcsb/ciftools-java/commit/caf1bd678dc89d73291e344e2c8ec999735ffc87)
ciftools-java 1.0.0
-------------
### General
* stable release that targets Java 11
ciftools-java 0.10.1
-------------
### New features
* reintroduce Java 8 support
ciftools-java 0.10.0
-------------
### New features
* schema now validates that it is compatible with the provided `CifFile` instance
### Breaking API changes
* added `SchemaProvider#validate(CifFile)` that allows providers to set up hooks for validation
* introduces custom exceptions
* accessing an empty column throws `EmptyColumnException`
* trying to apply an incorrect schema to a file throws `SchemaMismatchException`
ciftools-java 0.9.1
-------------
### Bug fixes
* adds missing cifcore categories/columns
ciftools-java 0.9.0
-------------
### New features
* access to (primitive) data array for all columns
### Breaking API changes
* renames #getBinaryDataUnsafe to #getArray
ciftools-java 0.8.0
-------------
### New features
* adds support for arbitrary schemata
* clean mmCIF support
* core-CIF support for CCDC files
* schema support also during CifFile building
### Breaking API changes
* not compatible with Java 8 anymore
* detaches CIF model from any schema - type-safe access now requires specifying a `SchemaProvider`
* several package and class names changed
ciftools-java 0.7.1
-------------
### New features
* adds experimental support for CCDC files
ciftools-java 0.7.0
-------------
### New features
* support for case insensitive handling of category and column names
* `ProxyCategory` to delay class lookup for as long as possible
* generic parsing option (`new CifOptions.CifOptionsBuilder().generic(true).build()`) that completely bypasses the
schema
* employs lazy loading of the class map used to instantiate categories and columns
### Breaking API changes
* internal: use `Deque` to handle encoding chain - make @cleberecht proud
* removes exposure of `LinkedHashMap`
* removes UTF-8 support, CIF is assumed to be plain ASCII
### Bug fixes
* updates fetch URL to RCSB
ciftools-java 0.6.3
-------------
### General
* change BinaryCIF URL to RCSB resources
ciftools-java 0.6.2
-------------
### Bug fixes
* avoids construction of `Gson` instance in `CifOptions` - thanks @BobHanson
ciftools-java 0.6.1
-------------
### Bug fixes
* file format specification during reading is now honored correctly
ciftools-java 0.6.0
-------------
### New features
* binaryCIF reading is now no-copy (i.e. the `InputStream` is directly consumed by readers/decoders rather than copied
into a `byte[]`)
### Breaking API changes
* changes (internal) reader classes to work on an `InputStream` rather than on `byte[]`
* removes single-row encoding capabilities (performance was same but code complexity increased)
ciftools-java 0.5.4
-------------
### Bug fixes
* writing of text CIF is now thread-safe
ciftools-java 0.5.3
-------------
### General
* moving to Java 11 for development - build is still targeting Java 8
ciftools-java 0.5.2
-------------
### Bug fixes
* category builder keeps order of registered columns
ciftools-java 0.5.1
-------------
### New features
* tweaks to builder
* no explicit call to `leaveColumn()` required any more when `Column` was created via `enterColumn()`
* binaryCIF now retains types for non-standard columns - text data still handles them as `StrColumn` in any case
### Bug fixes
* stops leaking of GSON dependency to dependents
ciftools-java 0.5.0
-------------
### Breaking API changes
* addresses flaw in API definition where invoking `build()` on `IntColumnBuilder`, `FloatColumnBuilder`, or
`StrColumnBuilder` returned a generic `Column` rather than the concrete implementation
ciftools-java 0.4.1
-------------
### Bug fixes
* fixed bug in IntervalQuantizationCodec
ciftools-java 0.4.0
-------------
### New features
* GZIP support
* automatic file type detection during reading: gzipped or plain, binary or text
* several convenience methods provided by `CifIO` and `CifOptions`
* support for other dictionary extensions: `chem_comp`, `entity_branch`, `ihm`
* category and column filtering during writing of files
### Breaking API changes
* merged `CifReader` and `CifWriter` into `CifIO` - e.g. `CifReader.readText(inputStream)` ->
`CifIO.readFromInputStream(inputStream)`
* drop single row behavior due to difficult detection and the risk of misinterpretation: there are rare cases where
categories only having a single row in the dictionary contain multiple values in reality
### Bug fixes
* removed instances of duplicated code
* typos in documentation
ciftools-java 0.3.0
-------------
### General
* initial release
ciftools-java-ciftools-java-7.0.1/LICENSE.md 0000664 0000000 0000000 00000002101 14766323153 0020427 0 ustar 00root root 0000000 0000000 The MIT License
Copyright (c) 2019 - now, Sebastian Bittrich
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE. ciftools-java-ciftools-java-7.0.1/README.md 0000664 0000000 0000000 00000021500 14766323153 0020306 0 ustar 00root root 0000000 0000000 [](https://maven-badges.herokuapp.com/maven-central/org.rcsb/ciftools-java)
[](https://github.com/rcsb/ciftools-java/blob/master/CHANGELOG.md)
[](https://doi.org/10.5281/zenodo.3948501)
# CIFTools
CIFTools implements reading and writing of CIF files ([specification](http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax))
as well as their efficiently encoded counterpart, called BinaryCIF. The idea is to have a robust, type-safe
implementation for the handling of CIF files which does not care about the origin of the data: both conventional
text-based and binary files should be handled the same way.
## Getting Started
CIFTools is distributed via Maven. To get started, add the following dependency to your `pom.xml`:
```xml
<dependency>
    <groupId>org.rcsb</groupId>
    <artifactId>ciftools-java</artifactId>
    <version>7.0.0</version>
</dependency>
```
Requires Java 11.
## File Parsing Example
```Java
import java.io.IOException;
import java.net.URL;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.OptionalInt;

import org.rcsb.cif.CifIO;
import org.rcsb.cif.model.CifFile;
import org.rcsb.cif.model.FloatColumn;
import org.rcsb.cif.schema.StandardSchemata;
import org.rcsb.cif.schema.mm.AtomSite;
import org.rcsb.cif.schema.mm.MmCifBlock;
import org.rcsb.cif.schema.mm.MmCifFile;

class Demo {
    public static void main(String[] args) throws IOException {
        String pdbId = "1acj";
        boolean parseBinary = true;

        // CIF and BinaryCIF are stored in the same data structure
        // to access the data, it does not matter where and in which format the data came from
        // all relevant IO operations are exposed by the CifIO class
        CifFile cifFile;
        if (parseBinary) {
            // parse binary CIF from RCSB PDB
            cifFile = CifIO.readFromURL(new URL("https://models.rcsb.org/" + pdbId + ".bcif"));
        } else {
            // parse CIF from RCSB PDB
            cifFile = CifIO.readFromURL(new URL("https://files.rcsb.org/download/" + pdbId + ".cif"));
        }

        // fine-grained options are available in the CifOptions class

        // access can be generic or using a specified schema - currently supports MMCIF and CIF_CORE
        // you can even use a custom dictionary
        MmCifFile mmCifFile = cifFile.as(StandardSchemata.MMCIF);

        // get first block of CIF
        MmCifBlock data = mmCifFile.getFirstBlock();

        // get category with name '_atom_site' from first block - access is type-safe, all categories
        // are inferred from the CIF schema
        AtomSite atomSite = data.getAtomSite();
        FloatColumn cartnX = atomSite.getCartnX();

        // obtain entry id
        String entryId = data.getEntry().getId().get(0);
        System.out.println(entryId);

        // calculate the average x-coordinate - #values() returns a DoubleStream as defined by the
        // schema for column 'Cartn_x'
        OptionalDouble averageCartnX = cartnX.values().average();
        averageCartnX.ifPresent(System.out::println);

        // print the last residue sequence id - this time #values() returns an IntStream
        OptionalInt lastLabelSeqId = atomSite.getLabelSeqId().values().max();
        lastLabelSeqId.ifPresent(System.out::println);

        // print record type - or #values() may be text
        Optional<String> groupPdb = data.getAtomSite().getGroupPDB().values().findFirst();
        groupPdb.ifPresent(System.out::println);
    }
}
```
The API is identical for text-based and binary CIF files. CIF files organize data in blocks, which contain
categories (e.g. `AtomSite`), which contain columns (e.g. `CartnX`), which contain values of a particular type (e.g.
`double` values representing x-coordinates of atoms). The correct names and types for all categories and columns
defined in the CIF dictionary are provided.
Just as in the Mol* implementation, all parsing and decoding is done as lazily as possible. This makes it cheap to acquire
the data structure and hardly wastes any time on preparing information you will never access. In contrast to
[MMTF](https://doi.org/10.1371/journal.pcbi.1005575), all data can be accessed if needed.
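
The lazy decoding described above can be sketched in a few lines. This is a minimal illustration and not the code used by CIFTools: `LazyColumn`, `decodeCount`, and `LazyDemo` are hypothetical names, and the supplier stands in for an expensive decoding step. Values are decoded on first access and the result is cached for later calls.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a column whose values are decoded on first access. */
final class LazyColumn {
    private final Supplier<double[]> decoder; // stands in for the expensive decoding step
    private double[] decoded;                 // cached result, null until first access
    int decodeCount = 0;                      // demonstration only: counts actual decoding runs

    LazyColumn(Supplier<double[]> decoder) {
        this.decoder = decoder;
    }

    /** Decodes on the first call and returns the cached array afterwards. */
    double[] getArray() {
        if (decoded == null) {
            decoded = decoder.get();
            decodeCount++;
        }
        return decoded;
    }
}

class LazyDemo {
    public static void main(String[] args) {
        LazyColumn cartnX = new LazyColumn(() -> new double[] {1.0, -2.4, 4.5});

        // creating the column did not trigger any decoding
        System.out.println(cartnX.decodeCount); // prints 0

        cartnX.getArray(); // first access decodes
        cartnX.getArray(); // second access reuses the cached array
        System.out.println(cartnX.decodeCount); // prints 1
    }
}
```

The sketch is not thread-safe. If columns were shared between threads, the cached field would need to be guarded.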
## Model Creation Example
```Java
import java.io.IOException;

import org.rcsb.cif.CifBuilder;
import org.rcsb.cif.CifIO;
import org.rcsb.cif.schema.StandardSchemata;
import org.rcsb.cif.schema.mm.MmCifFile;

class Demo {
    public static void main(String[] args) throws IOException {
        // all builder functionality is exposed by the CifBuilder class
        // again access can be generic or following a given schema
        MmCifFile cifFile = CifBuilder.enterFile(StandardSchemata.MMCIF)
                // create a block
                .enterBlock("1EXP")
                // create a category with name 'entry'
                .enterEntry()
                // set value of column 'id'
                .enterId()
                // to '1EXP'
                .add("1EXP")
                // leave current column
                .leaveColumn()
                // and category
                .leaveCategory()

                // create atom site category
                .enterAtomSite()
                // and specify some x-coordinates
                .enterCartnX()
                .add(1.0, -2.4, 4.5)
                // values can be unknown or not specified
                .markNextUnknown()
                .add(-3.14, 5.0)
                .leaveColumn()

                // after leaving, the builder is in AtomSite again and provides column names
                .enterCartnY()
                .add(0.0, -1.0, 2.72)
                .markNextNotPresent()
                .add(42, 100)
                .leaveColumn()

                // leaving the builder will release the CifFile instance
                .leaveCategory()
                .leaveBlock()
                .leaveFile();

        // the created CifFile instance behaves like a parsed file and can be processed or written as needed
        System.out.println(new String(CifIO.writeText(cifFile)));

        System.out.println(cifFile.getFirstBlock().getEntry().getId().get(0));
        cifFile.getFirstBlock()
                .getAtomSite()
                .getCartnX()
                .values()
                .forEach(System.out::println);
    }
}
```
A step-wise builder is provided for the creation of `CifFile` instances. If a schema is provided, the builder is aware
of category and column names and of the value type of each column: the `add` function called above is
not overloaded, but rather accepts only `String` values while in `entry.id` and only `double` values while in
`atom_site.Cartn_x`.
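
How a step-wise builder can restrict `add` to the type of the current column may be easier to see in a stripped-down model. The following sketch is an assumption about the general design, not the CIFTools implementation: all `Toy...` classes are hypothetical, and a single toy category holds both an `id` and a `cartnX` column for brevity. Each `enter...` method returns a builder dedicated to one column type, and `leaveColumn()` hands control back to the category.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical category builder; names are illustrative only. */
final class ToyCategoryBuilder {
    final List<String> ids = new ArrayList<>();
    final List<Double> cartnX = new ArrayList<>();

    ToyStrColumnBuilder enterId() {
        return new ToyStrColumnBuilder(this, ids);
    }

    ToyFloatColumnBuilder enterCartnX() {
        return new ToyFloatColumnBuilder(this, cartnX);
    }
}

/** Accepts String values only. */
final class ToyStrColumnBuilder {
    private final ToyCategoryBuilder parent;
    private final List<String> target;

    ToyStrColumnBuilder(ToyCategoryBuilder parent, List<String> target) {
        this.parent = parent;
        this.target = target;
    }

    ToyStrColumnBuilder add(String... values) {
        for (String value : values) {
            target.add(value);
        }
        return this;
    }

    ToyCategoryBuilder leaveColumn() {
        return parent;
    }
}

/** Accepts double values only. */
final class ToyFloatColumnBuilder {
    private final ToyCategoryBuilder parent;
    private final List<Double> target;

    ToyFloatColumnBuilder(ToyCategoryBuilder parent, List<Double> target) {
        this.parent = parent;
        this.target = target;
    }

    ToyFloatColumnBuilder add(double... values) {
        for (double value : values) {
            target.add(value);
        }
        return this;
    }

    ToyCategoryBuilder leaveColumn() {
        return parent;
    }
}

class ToyBuilderDemo {
    public static void main(String[] args) {
        ToyCategoryBuilder category = new ToyCategoryBuilder();
        category.enterId().add("1EXP").leaveColumn()
                .enterCartnX().add(1.0, -2.4, 4.5).leaveColumn();
        // category.enterCartnX().add("1EXP"); // would not compile: add only takes double values here
        System.out.println(category.ids + " " + category.cartnX);
    }
}
```

Because the column type is encoded in the return type of each `enter...` method, a type mismatch is reported by the compiler rather than at runtime.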
## Read AlphaFold Model & Convert to BinaryCIF
```Java
import java.io.IOException;
import java.net.URL;

import org.rcsb.cif.CifIO;
import org.rcsb.cif.model.CifFile;
import org.rcsb.cif.schema.StandardSchemata;
import org.rcsb.cif.schema.mm.MmCifFile;

class Demo {
    public static void main(String[] args) throws IOException {
        String id = "AF-Q76EI6-F1-model_v4";
        CifFile cifFile = CifIO.readFromURL(new URL("https://alphafold.ebi.ac.uk/files/" + id + ".cif"));
        MmCifFile mmCifFile = cifFile.as(StandardSchemata.MMCIF);

        // access to properties from the model-extension is provided
        // print average per-residue confidence score provided by AlphaFold
        System.out.println(mmCifFile.getFirstBlock()
                .getMaQaMetricLocal()
                .getMetricValue()
                .values()
                .average()
                .orElseThrow());

        // convert to BinaryCIF representation
        byte[] output = CifIO.writeBinary(mmCifFile);
    }
}
```
Computed structure models, e.g. from [AlphaFold](https://alphafold.ebi.ac.uk/), are supported. Access to categories and
columns defined by the mmCIF model extension is provided. This includes e.g. quality/confidence scores of the prediction.
Structure data can be converted to BinaryCIF files for more efficient storage & parsing of millions of files.
## Performance
The implementation can read the full PDB archive (154,015 files) in a little over 2 minutes. This is achieved by lazy decoding and
parsing - each column is decoded only when it is first requested. Thus, the parsing overhead is kept
minimal. ciftools-java combines the compression and read performance of MMTF with the convenience of the CIF format.

Handling gzipped files slows down parsing in most cases. The reduced files are either native MMTF files or contain a similar selection of
CIF categories (i.e. they provide primarily atomic coordinates).
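
Gzipped and plain input can be told apart before parsing by inspecting the first two bytes of the stream. The sketch below shows one common way to do this with the standard library only. It is an assumption about the general technique and not a description of how CIFTools detects file types; `GzipSniffer`, `openPossiblyGzipped`, and `gzip` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that transparently handles gzipped and plain input. */
final class GzipSniffer {
    private GzipSniffer() {
    }

    /** Wraps the stream in a GZIPInputStream if it starts with the gzip magic bytes 0x1f 0x8b. */
    static InputStream openPossiblyGzipped(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);          // remember the start of the stream
        int first = in.read();
        int second = in.read();
        in.reset();          // rewind so that no bytes are lost
        boolean gzipped = first == 0x1f && second == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    /** Compresses a byte array, used here only to create test input. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(plain);
        }
        return bytes.toByteArray();
    }
}

class GzipDemo {
    public static void main(String[] args) throws IOException {
        byte[] plain = "data_1EXP\n".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = GzipSniffer.gzip(plain);

        // both inputs yield the same decoded text
        for (byte[] input : new byte[][] {plain, packed}) {
            try (InputStream in = GzipSniffer.openPossiblyGzipped(new ByteArrayInputStream(input))) {
                System.out.print(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
            }
        }
    }
}
```

The extra decompression layer is one plausible reason why gzipped input takes longer to read than plain input of the same content.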
## Contributions & Related Projects
- [molstar/ciftools](https://github.com/molstar/ciftools): a TypeScript/JavaScript implementation
- [molstar/BinaryCIF](https://github.com/molstar/BinaryCIF): BinaryCIF format specification
- [rcsb/py-mmcif](https://github.com/rcsb/py-mmcif): Python mmCIF Core Access Library
The implementation is based on a number of other projects, namely:
- [CIFtools.js](https://github.com/dsehnal/CIFTools.js) by David Sehnal
- [Mol*](https://molstar.github.io) by Alexander Rose and David Sehnal
- [MMTF](https://doi.org/10.1371/journal.pcbi.1005575) by RCSB
## References
- Sehnal D, Bittrich S, Velankar S, Koča J, Svobodová R, Burley SK, Rose AS (2020) BinaryCIF and CIFTools—Lightweight, efficient and extensible macromolecular data management. PLoS Comput Biol 16(10): e1008247. https://doi.org/10.1371/journal.pcbi.1008247
ciftools-java-ciftools-java-7.0.1/performance.png 0000664 0000000 0000000 00000357777 14766323153 0022072 0 ustar 00root root 0000000 0000000 PNG