Commit e6a377e

Author: Release Manager
gh-37399: Developer Guide: Document practices for data files

Topics here:
- ext_data is legacy location only, see #33037
- separate repos for large data files
- importlib_resources

Follow-ups:
- use of Features

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

URL: #37399
Reported by: Matthias Köppe
Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe, Sebastian Oehms
2 parents 3ad892a + d3a4313 commit e6a377e

2 files changed: 61 additions, 11 deletions
src/doc/en/developer/coding_basics.rst

Lines changed: 60 additions & 11 deletions
@@ -89,9 +89,9 @@ In particular,
 Files and directory structure
 =============================
 
-Roughly, the Sage directory tree is layout like this. Note that we use
-``SAGE_ROOT`` in the following as a shortcut for the (arbitrary) name
-of the directory containing the Sage sources:
+Roughly, the Sage directory tree is laid out like this. Note that we
+use ``SAGE_ROOT`` in the following as a shortcut for the name of the
+directory containing the Sage sources:
 
 .. CODE-BLOCK:: text
 
@@ -104,7 +104,7 @@ of the directory containing the Sage sources:
             setup.py
             ...
             sage/            # Sage library
-                ext_data/    # extra Sage resources (formerly src/ext)
+                ext_data/    # extra Sage resources (legacy)
             bin/             # the scripts in local/bin that are tracked
         upstream/            # tarballs of upstream sources
         local/               # installed binaries
@@ -149,15 +149,36 @@ Adding new top-level packages below :mod:`sage` should be done
 sparingly. It is often better to create subpackages of existing
 packages.
 
-Non-Python Sage source code and supporting files can be included in one
-of the following places:
+Non-Python Sage source code and small supporting files can be
+included in one of the following places:
 
 - In the directory of the Python code that uses that file. When the
   Sage library is installed, the file will be installed in the same
-  location as the Python code. For example,
-  ``SAGE_ROOT/src/sage/interfaces/maxima.py`` needs to use the file
-  ``SAGE_ROOT/src/sage/interfaces/maxima.lisp`` at runtime, so it refers
-  to it as ::
+  location as the Python code. This is referred to as "package data".
+
+  The preferred way to access the data from Python is using the
+  `importlib.resources API
+  <https://importlib-resources.readthedocs.io/en/latest/using.html>`_,
+  in particular the function :func:`importlib.resources.files`.
+  Using it, you can:
+
+  - open a resource for text reading: ``fd = files(package).joinpath(resource).open('rt')``
+  - open a resource for binary reading: ``fd = files(package).joinpath(resource).open('rb')``
+  - read a resource as text: ``text = files(package).joinpath(resource).read_text()``
+  - read a resource as bytes: ``bytes = files(package).joinpath(resource).read_bytes()``
+  - open an xz-compressed resource for text reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rt')``
+  - open an xz-compressed resource for binary reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rb')``
+
+  If the file needs to be used outside of Python, then the
+  preferred way is using the context manager
+  :func:`importlib.resources.as_file`. It should be imported in the
+  same way as shown above.
+
+- Older code in the Sage library accesses
+  the package data in more direct ways. For example,
+  ``SAGE_ROOT/src/sage/interfaces/maxima.py`` uses the file
+  ``SAGE_ROOT/src/sage/interfaces/maxima.lisp`` at runtime, so it
+  refers to it as::
 
     os.path.join(os.path.dirname(__file__), 'sage-maxima.lisp')
 
@@ -169,11 +190,39 @@ of the following places:
     from sage.env import SAGE_EXTCODE
     file = os.path.join(SAGE_EXTCODE, 'directory', 'file')
 
-In both cases, the files must be listed (explicitly or via wildcards) in
+  This practice is deprecated, see :issue:`33037`.
+
+In all cases, the files must be listed (explicitly or via wildcards) in
 the section ``options.package_data`` of the file
 ``SAGE_ROOT/pkgs/sagemath-standard/setup.cfg.m4`` (or the corresponding
 file of another distribution).
 
+Large data files should not be added to the Sage source tree. Instead, it
+is proposed to do the following:
+
+- create a separate git repository and upload them there [2]_,
+
+- add metadata to the repository that make it a pip-installable
+  package (distribution package), as explained for example in the
+  `Python Packaging User Guide
+  <https://packaging.python.org/en/latest/tutorials/packaging-projects/>`_,
+
+- `upload it to PyPI
+  <https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives>`_,
+
+- create metadata in ``SAGE_ROOT/build/pkgs`` that make your new
+  pip-installable package known to Sage; see :ref:`chapter-packaging`.
+
+For guiding examples of external repositories that host large data
+files, see https://github.com/sagemath/conway-polynomials, and
+https://github.com/gmou3/matroid-database.
+
+.. [2]
+
+    It is also suggested that the files are compressed, e.g., through
+    the command ``xz -e``. They can then be read via a command such as
+    ``lzma.open(file, 'rt')``.
+
 
 Learn by copy/paste
 ===================
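Reviewer note: the footnote's suggestion about xz-compressed data can be illustrated with a small round trip through the standard ``lzma`` module. This is only a sketch under assumptions: the file name ``table.txt.xz`` and its contents are made up, and the file is written from Python with ``PRESET_EXTREME`` as a stand-in for running ``xz -e`` on the command line.

```python
import lzma
import tempfile
from pathlib import Path

# Hypothetical data file; in practice it would be produced with ``xz -e``.
path = Path(tempfile.mkdtemp()) / "table.txt.xz"

# Write compressed text. Preset 6 with PRESET_EXTREME mirrors ``xz -e``
# at the default compression level.
with lzma.open(path, "wt", preset=6 | lzma.PRESET_EXTREME, encoding="utf-8") as fd:
    fd.write("1 2 3\n4 5 6\n")

# Read it back as text by passing the path directly, as in the footnote.
with lzma.open(path, "rt", encoding="utf-8") as fd:
    rows = [[int(x) for x in line.split()] for line in fd]

# lzma.open also accepts an already opened binary file object, which is the
# form used together with files(package).joinpath(resource).open('rb').
with path.open("rb") as raw, lzma.open(raw, "rt", encoding="utf-8") as fd:
    first_line = fd.readline()
```

Passing an open binary file object rather than a path is what lets the same call work for package data that is not stored as a plain file on disk.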

src/doc/en/developer/coding_in_python.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Coding in Python for Sage
 This chapter discusses some issues with, and advice for, coding in
 Sage.
 
+.. _section-python-language-standard:
 
 Python language standard
 ========================
