Skip to content

Commit 635dee1

Browse files
committed
Align privacy and security with writing assistance APIs
See webmachinelearning/writing-assistance-apis#47. Closes #3. Closes #10.
1 parent 95cb82e commit 635dee1

File tree

3 files changed

+29
-28
lines changed

3 files changed

+29
-28
lines changed

README.md

Lines changed: 2 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -287,23 +287,9 @@ After a developer has a `Translator` or `LanguageDetector` object, further calls
287287

288288
This design means that the implementation must have all information about the capabilities of its translation and language detection models available beforehand, i.e. "shipped with the browser". (Either as part of the browser binary, or through some out-of-band update mechanism that eagerly pushes updates.)
289289

290-
## Privacy considerations
290+
## Privacy and security considerations
291291

292-
This proposal as-is has privacy issues, which we are actively thinking about how to address. They are all centered around how sites that use this API might be able to uniquely fingerprint the user.
293-
294-
The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"unavailable"`, `"downloadable"`, `"downloading"`, and `"available"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/kb/website-translation#w_configure-installed-languages). With a naive implementation, this gives 9 bits of identifying information, which various sites can all correlate.
295-
296-
Some sort of mitigation may be necessary here. We believe this is adjacent to other areas that have seen similar mitigation, such as the [Local Font Access API](https://github.com/WICG/local-font-access/blob/main/README.md). Possible techniques are:
297-
298-
* Grouping language packs to reduce the number of bits, so that downloading one language also downloads others in its group.
299-
* Partitioning download status by top-level site, introducing a fake download (which takes time but does not actually download anything) for the second-onward site to download a language pack.
300-
* Only exposing a fixed set of languages to this API, e.g. based on the user's locale or the document's main language.
301-
302-
As a first step, we require that detecting the availability of translation/detection be done via individual calls to `Translator.availability()` and `LanguageDetector.availability()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"unavailable"`.
303-
304-
Another way in which this API might enhance the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this.
305-
306-
Finally, we intend to prohibit (in the specification) any use of user-specific information in producing the results. For example, it would not be permissible to fine-tune the translation model based on information the user has entered into the browser in the past.
292+
Please see [the Writing Assistance APIs specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), where we have centralized the normative privacy and security considerations that apply to these APIs as well as the writing assistance APIs.
307293

308294
### Permissions policy, iframes, and workers
309295

index.bs

Lines changed: 20 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ This specification depends on the Infra Standard. [[!INFRA]]
4646

4747
As with the rest of the web platform, human languages are identified in these APIs by BCP 47 language tags, such as "`ja`", "`en-US`", "`sr-Cyrl`", or "`de-CH-1901-x-phonebk-extended`". The specific algorithms used for validation, canonicalization, and language tag matching are those from the <cite>ECMAScript Internationalization API Specification</cite>, which in turn defers some of its processing to <cite>Unicode Locale Data Markup Language (LDML)</cite>. [[BCP47]] [[!ECMA-402]] [[UTS35]].
4848

49-
These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]]. Implementing these APIs requires implementing that shared infrastructure, but does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
49+
These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]], and the common privacy and security considerations are discussed in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Implementing these APIs requires implementing that shared infrastructure, and conforming to those privacy and security considerations. But it does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
5050

5151
<h2 id="translator-api">The translator API</h2>
5252

@@ -209,15 +209,15 @@ A <dfn>language arc</dfn> is a [=tuple=] of two strings, a <dfn for="language ar
209209

210210
1. [=Assert=]: this algorithm is running [=in parallel=].
211211

212-
1. If there is some error attempting to determine what language arcs the user agent supports translating text between, which the user agent believes to be transient (such that re-querying the [=translator language arc availabilities=] could stop producing such an error), then return null.
212+
1. If there is some error attempting to determine what language arcs the user agent [=model availability/can support=] translating text between, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
213213

214214
1. Return a [=map=] from [=language arcs=] to {{Availability}} values, where each key is a [=language arc=] that the user agent supports translating text between, filled according to the following constraints:
215215

216-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=] without performing any downloading operations, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/available}}".
216+
* If the user agent [=model availability/currently supports=] translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/available}}".
217217

218-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after finishing a currently-ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloading}}".
218+
* If the user agent believes it will be able to [=model availability/support=] translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after finishing a download that is already ongoing, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloading}}".
219219

220-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after performing a not-currently ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloadable}}".
220+
* If the user agent believes it will be able to [=model availability/support=] translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after performing a not-currently ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloadable}}".
221221

222222
* The [=map/keys=] must not include any [=language arcs=] that [=language arc/overlap=] with the other [=map/keys=].
223223
</div>
@@ -375,6 +375,8 @@ The <dfn attribute for="Translator">inputQuota</dfn> getter steps are to return
375375

376376
If (|sourceLanguage|, |targetLanguage|) [=language arc/can be fulfilled by the identity translation=], then the resulting translation should be |input|.
377377

378+
The translation process must conform to the guidance given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]], notably including (but not limited to) [[WRITING-ASSISTANCE-APIS#privacy-user-input]] and [[WRITING-ASSISTANCE-APIS#security-runtime]].
379+
378380
1. While true:
379381

380382
1. Wait for the next chunk of translated text to be produced, for the translation process to finish, or for the result of calling |stopProducing| to become true.
@@ -456,7 +458,7 @@ When translation fails, the following possible reasons may be surfaced to the we
456458
<tr>
457459
<td>"{{UnknownError}}"
458460
<td>
459-
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
461+
<p>All other scenarios, including if the user agent believes it cannot translate and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
460462
</table>
461463

462464
<p class="note">This table does not give the complete list of exceptions that can be surfaced by the translator API. It only contains those which can come from certain [=implementation-defined=] steps.
@@ -584,7 +586,7 @@ dictionary LanguageDetectionResult {
584586

585587
1. [=Assert=]: this algorithm is running [=in parallel=].
586588

587-
1. If there is some error attempting to determine what languages the user agent supports detecting, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
589+
1. If there is some error attempting to determine what language detection capabilities the user agent [=model availability/can support=], which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
588590

589591
1. Let |partition| be the result of [=getting the language availabilities partition=] given the purpose of detecting text written in that language.
590592

@@ -710,6 +712,8 @@ The <dfn attribute for="LanguageDetector">inputQuota</dfn> getter steps are to r
710712

711713
If an error occurred during language detection, then return an [=error information=] according to the guidance in [[#language-detector-errors]].
712714

715+
The detection process must conform to the guidance given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]], notably including (but not limited to) [[WRITING-ASSISTANCE-APIS#privacy-user-input]] and [[WRITING-ASSISTANCE-APIS#security-runtime]].
716+
713717
1. [=map/Sort in descending order=] |rawResult| with a less than algorithm which given [=map/entries=] |a| and |b|, returns true if |a|'s [=map/value=] is less than |b|'s [=map/value=].
714718

715719
1. Let |results| be an empty [=list=].
@@ -784,11 +788,19 @@ When language detection fails, the following possible reasons may be surfaced to
784788
<tr>
785789
<td>"{{UnknownError}}"
786790
<td>
787-
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
791+
<p>All other scenarios, including if the user agent believes it cannot detect and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
788792
</table>
789793

790794
<p class="note">This table does not give the complete list of exceptions that can be surfaced by the language detector API. It only contains those which can come from certain [=implementation-defined=] steps.
791795

792796
<h3 id="language-detector-permissions-policy">Permissions policy integration</h3>
793797

794798
Access to the language detector API is gated behind the [=policy-controlled feature=] "<dfn permission>language-detector</dfn>", which has a [=policy-controlled feature/default allowlist=] of <code>[=default allowlist/'self'=]</code>.
799+
800+
<h2 id="privacy">Privacy considerations</h2>
801+
802+
Please see [[WRITING-ASSISTANCE-APIS#privacy]] for a discussion of privacy considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].
803+
804+
<h2 id="security">Security considerations</h2>
805+
806+
Please see [[WRITING-ASSISTANCE-APIS#security]] for a discussion of security considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].

security-privacy-questionnaire.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,12 @@ This feature exposes two main pieces of information:
1111

1212
- The actual results of translations and language detections, which can be dependent on the AI models in use.
1313

14+
The privacy implications of both of these are discussed, in general terms, [in the _Writing Assistance APIs_ specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), which was written to cover all APIs with similar concerns.
15+
1416
> 02. Do features in your specification expose the minimum amount of information
1517
> necessary to implement the intended functionality?
1618
17-
We believe so. It's possible that we could remove the exposure of the availability information. However, it would almost certainly be inferrable via timing side-channels. (I.e., if downloading a language pack is required, then the web developer can observe the first translation taking longer.)
19+
We believe so. It's possible that we could remove the exposure of the download status information. However, it would almost certainly be inferrable via timing side-channels. (I.e., if downloading a language pack is required, then the web developer can observe the first translation taking longer.)
1820

1921
> 03. Do the features in your specification expose personal information,
2022
> personally-identifiable information (PII), or information derived from
@@ -69,7 +71,7 @@ None.
6971
7072
We use permissions policy to disallow the usage of these features by default in third-party (cross-origin) contexts. However, the top-level site can delegate to cross-origin iframes.
7173

72-
It's also possible that the [anti-fingerprinting considerations](./README.md#privacy-considerations) will require some sort of distinction between first- and third-party contexts. For example, partitioning download status, or only using the top-level site's detected language, or similar.
74+
Otherwise, some of the possible [anti-fingerprinting mitigations](https://webmachinelearning.github.io/writing-assistance-apis/#privacy-availability) involve partitioning information across sites, which is kind of like distinguishing between first- and third-party contexts.
7375

7476
> 14. How do the features in this specification work in the context of a browser’s
7577
> Private Browsing or Incognito mode?
@@ -81,9 +83,10 @@ Another possible area of discussion here is whether cloud-based translation APIs
8183
> 15. Does this specification have both "Security Considerations" and "Privacy
8284
> Considerations" sections?
8385
84-
There is no specification yet, but there is a [privacy considerations](./README.md#privacy-considerations) section in the explainer.
86+
Yes:
8587

86-
We do not anticipate significant security risks for this feature at this time.
88+
* [Privacy considerations](https://webmachinelearning.github.io/translation-api/#privacy) (delegates to [the corresponding section in _Writing Assistance APIs_](https://webmachinelearning.github.io/writing-assistance-apis/#privacy))
89+
* [Security considerations](https://webmachinelearning.github.io/translation-api/#security) (delegates to [the corresponding section in _Writing Assistance APIs_](https://webmachinelearning.github.io/writing-assistance-apis/#security))
8790

8891
> 16. Do features in your specification enable origins to downgrade default
8992
> security protections?

0 commit comments

Comments
 (0)