You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+2-16Lines changed: 2 additions & 16 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -287,23 +287,9 @@ After a developer has a `Translator` or `LanguageDetector` object, further calls
287
287
288
288
This design means that the implementation must have all information about the capabilities of its translation and language detection models available beforehand, i.e. "shipped with the browser". (Either as part of the browser binary, or through some out-of-band update mechanism that eagerly pushes updates.)
289
289
290
-
## Privacy considerations
290
+
## Privacy and security considerations
291
291
292
-
This proposal as-is has privacy issues, which we are actively thinking about how to address. They are all centered around how sites that use this API might be able to uniquely fingerprint the user.
293
-
294
-
The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"unavailable"`, `"downloadable"`, `"downloading"`, and `"available"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/kb/website-translation#w_configure-installed-languages). With a naive implementation, this gives 9 bits of identifying information, which various sites can all correlate.
295
-
296
-
Some sort of mitigation may be necessary here. We believe this is adjacent to other areas that have seen similar mitigation, such as the [Local Font Access API](https://github.com/WICG/local-font-access/blob/main/README.md). Possible techniques are:
297
-
298
-
* Grouping language packs to reduce the number of bits, so that downloading one language also downloads others in its group.
299
-
* Partitioning download status by top-level site, introducing a fake download (which takes time but does not actually download anything) for the second-onward site to download a language pack.
300
-
* Only exposing a fixed set of languages to this API, e.g. based on the user's locale or the document's main language.
301
-
302
-
As a first step, we require that detecting the availability of translation/detection be done via individual calls to `Translator.availability()` and `LanguageDetector.availability()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"unavailable"`.
303
-
304
-
Another way in which this API might enhance the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this.
305
-
306
-
Finally, we intend to prohibit (in the specification) any use of user-specific information in producing the results. For example, it would not be permissible to fine-tune the translation model based on information the user has entered into the browser in the past.
292
+
Please see [the Writing Assistance APIs specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), where we have centralized the normative privacy and security considerations that apply to these APIs as well as the writing assistance APIs.
Copy file name to clipboardExpand all lines: index.bs
+20-8Lines changed: 20 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -46,7 +46,7 @@ This specification depends on the Infra Standard. [[!INFRA]]
46
46
47
47
As with the rest of the web platform, human languages are identified in these APIs by BCP 47 language tags, such as "`ja`", "`en-US`", "`sr-Cyrl`", or "`de-CH-1901-x-phonebk-extended`". The specific algorithms used for validation, canonicalization, and language tag matching are those from the <cite>ECMAScript Internationalization API Specification</cite>, which in turn defers some of its processing to <cite>Unicode Locale Data Markup Language (LDML)</cite>. [[BCP47]][[!ECMA-402]][[UTS35]].
48
48
49
-
These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]]. Implementing these APIs requires implementing that shared infrastructure, but does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
49
+
These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]], and the common privacy and security considerations are discussed in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Implementing these APIs requires implementing that shared infrastructure, and conforming to those privacy and security considerations. But it does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
50
50
51
51
<h2 id="translator-api">The translator API</h2>
52
52
@@ -209,15 +209,15 @@ A <dfn>language arc</dfn> is a [=tuple=] of two strings, a <dfn for="language ar
209
209
210
210
1. [=Assert=]: this algorithm is running [=in parallel=].
211
211
212
-
1. If there is some error attempting to determine what language arcs the user agent supports translating text between, which the user agent believes to be transient (such that re-querying the [=translator language arc availabilities=] could stop producing such an error), then return null.
212
+
1. If there is some error attempting to determine what language arcs the user agent [=model availability/can support=]translating text between, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
213
213
214
214
1. Return a [=map=] from [=language arcs=] to {{Availability}} values, where each key is a [=language arc=] that the user agent supports translating text between, filled according to the following constraints:
215
215
216
-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=] without performing any downloading operations, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/available}}".
216
+
* If the user agent [=model availability/currently supports=] translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/available}}".
217
217
218
-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after finishing a currently-ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloading}}".
218
+
* If the user agent believes it will be able to [=model availability/support=]translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after finishing a download that is already ongoing, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloading}}".
219
219
220
-
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after performing a not-currently ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloadable}}".
220
+
* If the user agent believes it will be able to [=model availability/support=] translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after performing a not-currently ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{Availability/downloadable}}".
221
221
222
222
* The [=map/keys=] must not include any [=language arcs=] that [=language arc/overlap=] with the other [=map/keys=].
223
223
</div>
@@ -375,6 +375,8 @@ The <dfn attribute for="Translator">inputQuota</dfn> getter steps are to return
375
375
376
376
If (|sourceLanguage|, |targetLanguage|) [=language arc/can be fulfilled by the identity translation=], then the resulting translation should be |input|.
377
377
378
+
The translation process must conform to the guidance given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]], notably including (but not limited to) [[WRITING-ASSISTANCE-APIS#privacy-user-input]] and [[WRITING-ASSISTANCE-APIS#security-runtime]].
379
+
378
380
1. While true:
379
381
380
382
1. Wait for the next chunk of translated text to be produced, for the translation process to finish, or for the result of calling |stopProducing| to become true.
@@ -456,7 +458,7 @@ When translation fails, the following possible reasons may be surfaced to the we
456
458
<tr>
457
459
<td>"{{UnknownError}}"
458
460
<td>
459
-
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
461
+
<p>All other scenarios, including if the user agent believes it cannot translate and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
460
462
</table>
461
463
462
464
<p class="note">This table does not give the complete list of exceptions that can be surfaced by the translator API. It only contains those which can come from certain [=implementation-defined=] steps.
1. [=Assert=]: this algorithm is running [=in parallel=].
586
588
587
-
1. If there is some error attempting to determine what languages the user agent supports detecting, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
589
+
1. If there is some error attempting to determine what language detection capabilities the user agent [=model availability/can support=], which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
588
590
589
591
1. Let |partition| be the result of [=getting the language availabilities partition=] given the purpose of detecting text written in that language.
590
592
@@ -710,6 +712,8 @@ The <dfn attribute for="LanguageDetector">inputQuota</dfn> getter steps are to r
710
712
711
713
If an error occurred during language detection, then return an [=error information=] according to the guidance in [[#language-detector-errors]].
712
714
715
+
The detection process must conform to the guidance given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]], notably including (but not limited to) [[WRITING-ASSISTANCE-APIS#privacy-user-input]] and [[WRITING-ASSISTANCE-APIS#security-runtime]].
716
+
713
717
1. [=map/Sort in descending order=] |rawResult| with a less than algorithm which given [=map/entries=] |a| and |b|, returns true if |a|'s [=map/value=] is less than |b|'s [=map/value=].
714
718
715
719
1. Let |results| be an empty [=list=].
@@ -784,11 +788,19 @@ When language detection fails, the following possible reasons may be surfaced to
784
788
<tr>
785
789
<td>"{{UnknownError}}"
786
790
<td>
787
-
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
791
+
<p>All other scenarios, including if the user agent believes it cannot detect and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
788
792
</table>
789
793
790
794
<p class="note">This table does not give the complete list of exceptions that can be surfaced by the language detector API. It only contains those which can come from certain [=implementation-defined=] steps.
Access to the language detector API is gated behind the [=policy-controlled feature=] "<dfn permission>language-detector</dfn>", which has a [=policy-controlled feature/default allowlist=] of <code>[=default allowlist/'self'=]</code>.
799
+
800
+
<h2 id="privacy">Privacy considerations</h2>
801
+
802
+
Please see [[WRITING-ASSISTANCE-APIS#privacy]] for a discussion of privacy considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].
803
+
804
+
<h2 id="security">Security considerations</h2>
805
+
806
+
Please see [[WRITING-ASSISTANCE-APIS#security]] for a discussion of security considerations for the translator and language detector APIs. That text was written to apply to all APIs sharing the same infrastructure, as noted in [[#dependencies]].
Copy file name to clipboardExpand all lines: security-privacy-questionnaire.md
+7-4Lines changed: 7 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -11,10 +11,12 @@ This feature exposes two main pieces of information:
11
11
12
12
- The actual results of translations and language detections, which can be dependent on the AI models in use.
13
13
14
+
The privacy implications of both of these are discussed, in general terms, [in the _Writing Assistance APIs_ specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), which was written to cover all APIs with similar concerns.
15
+
14
16
> 02. Do features in your specification expose the minimum amount of information
15
17
> necessary to implement the intended functionality?
16
18
17
-
We believe so. It's possible that we could remove the exposure of the availability information. However, it would almost certainly be inferrable via timing side-channels. (I.e., if downloading a language pack is required, then the web developer can observe the first translation taking longer.)
19
+
We believe so. It's possible that we could remove the exposure of the download status information. However, it would almost certainly be inferrable via timing side-channels. (I.e., if downloading a language pack is required, then the web developer can observe the first translation taking longer.)
18
20
19
21
> 03. Do the features in your specification expose personal information,
20
22
> personally-identifiable information (PII), or information derived from
@@ -69,7 +71,7 @@ None.
69
71
70
72
We use permissions policy to disallow the usage of these features by default in third-party (cross-origin) contexts. However, the top-level site can delegate to cross-origin iframes.
71
73
72
-
It's also possible that the [anti-fingerprinting considerations](./README.md#privacy-considerations) will require some sort of distinction between first- and third-party contexts. For example, partitioning download status, or only using the top-level site's detected language, or similar.
74
+
Otherwise, some of the possible [anti-fingerprinting mitigations](https://webmachinelearning.github.io/writing-assistance-apis/#privacy-availability) involve partitioning information across sites, which is kind of like distinguishing between first- and third-party contexts.
73
75
74
76
> 14. How do the features in this specification work in the context of a browser’s
75
77
> Private Browsing or Incognito mode?
@@ -81,9 +83,10 @@ Another possible area of discussion here is whether cloud-based translation APIs
81
83
> 15. Does this specification have both "Security Considerations" and "Privacy
82
84
> Considerations" sections?
83
85
84
-
There is no specification yet, but there is a [privacy considerations](./README.md#privacy-considerations) section in the explainer.
86
+
Yes:
85
87
86
-
We do not anticipate significant security risks for this feature at this time.
88
+
*[Privacy considerations](https://webmachinelearning.github.io/translation-api/#privacy) (delegates to [the corresponding section in _Writing Assistance APIs_](https://webmachinelearning.github.io/writing-assistance-apis/#privacy))
89
+
*[Security considerations](https://webmachinelearning.github.io/translation-api/#security) (delegates to [the corresponding section in _Writing Assistance APIs_](https://webmachinelearning.github.io/writing-assistance-apis/#security))
87
90
88
91
> 16. Do features in your specification enable origins to downgrade default
0 commit comments