Fix merge conflicts #37
Merged
ynimmaga merged 585 commits into ynimmaga:openvino_backend on Mar 24, 2025
Conversation
Differential Revision: D71001041 Pull Request resolved: pytorch#9168
Adds TOSA support for logical not in Arm backend. cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 Signed-off-by: Måns Nilsson <mans.nilsson@arm.com> Co-authored-by: Yufeng Shi <yufeng.shi@arm.com>
Summary: Fix a typo in executorch documentation https://pytorch.org/executorch/main/backends-xnnpack.html Reviewed By: cccclai Differential Revision: D70645356 Co-authored-by: Frank Yu <frankyu@meta.com>
### Summary Just to make it consistent with its `linux` counterpart, let's update the reference from M1 to just macOS. ### Test plan CI cc @larryliu0820 @lucylq
The old value of 4 min was too restrictive when running bigger models on some machines and sometimes caused GitHub runners to fail. cc @digantdesai @freddan80 @per @oscarandersson8218 Signed-off-by: Zingo Andersen <zingo.andersen@arm.com>
…torch#9083) This makes it easier to run the scripts from various tools, like your editor. Signed-off-by: Zingo Andersen <zingo.andersen@arm.com>
- Remove xfails
- Refactor test_mm to use testing pipelines

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
…t-buck) (pytorch#9159)" (pytorch#9187) This reverts commit 70d4427.
…g builds (pytorch#9044)" (pytorch#9188) This reverts commit 8f7bc8d.
This reverts commit 2889483.
### Summary - add SXR2330P ### Test plan ```bash python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator -s $SERIAL_NO -m SM8650 -b build-android ```
Needed after extension llm third-party deps moved into tokenizers subdir
### Summary Update XNNPACK backend doc page to cover quantization schemes and generally clean up the format. Update backend template doc with additional detail. ### Test plan Built docs locally and verified contents. cc @mergennachin @byjlw
### Summary We don't need to duplicate the deps source. ### Test plan CI
…ytorch#9190) This reverts commit 05a160e. Revert "Revert "Make serial parallel_for "polyfill" iterate backwards in debug builds (pytorch#9044)"" This reverts commit 815eaff. Revert "Revert "Unbreak optimized kernels buck build (and check it in unittest-buck) (pytorch#9159)"" This reverts commit 10bb615.
Needed to efficiently use parallel_for with BroadcastIndexesRange.
…ch#9058) Now all the apply functions share a common implementation, which means further changes (e.g., parallel_for, generating specialized dtypes for the case where all inputs have the same type) don't need to be repeated 3 times. (Interestingly, this seems to increase the effectiveness of the following parallelization change. Not entirely sure why, but I checked the generated code for optimized op_where and it seems to have improved, which is surprising.)
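To illustrate the idea of sharing one apply implementation across ops (a minimal Python sketch only — the actual ExecuTorch code is C++ with dtype dispatch and broadcasting, which are elided here; all names are hypothetical):

```python
def apply_elementwise(fn, *inputs):
    """One shared elementwise apply; each op only supplies its scalar function.

    Broadcasting and dtype dispatch are elided for brevity -- inputs are
    assumed to be equal-length lists (a stand-in for same-shape tensors).
    """
    return [fn(*vals) for vals in zip(*inputs)]

# Three ops expressed through the single shared implementation, so a change
# such as adding parallel_for only needs to be made once:
def op_where(cond, a, b):
    return apply_elementwise(lambda c, x, y: x if c else y, cond, a, b)

def op_add(a, b):
    return apply_elementwise(lambda x, y: x + y, a, b)

def op_mul(a, b):
    return apply_elementwise(lambda x, y: x * y, a, b)
```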
Internal model got a 5.7% latency improvement (313.8 ms before, 296.0 ms after).
Not sure why the threadpool extension isn't mentioned or built in here; we should follow up on that.
… args (pytorch#9203) This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#9173 by @SS-JIA ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/orig @diff-train-skip-merge Co-authored-by: Stephen Jia <ssjia@meta.com>
CI is assigning Python 3.9 machines, but ET requires 3.10.
Use the old one; no need to use the new API.
Differential Revision: D71023514 Pull Request resolved: pytorch#9196
This is step (5) of pytorch#8932. At this exact moment, this rebuild is inefficient because it rebuilds the whole portable op library, but ops don't support optional parallelization just yet. This will become less true when we roll out parallel_for support across portable ops immediately following this PR.
I attempted to port `at::parallel_reduce` to ExecuTorch and use that in reduce_util.h, but it turned out to be much trickier than expected. (In brief: parallel reduction requires two steps: 1) split the input range into chunks and reduce over them (easily done like parallel_for), and then 2) combine the sub-results from chunks. The reduction function accepted by reduce_over_dim is not well-suited to step (2).) Instead, I ported the parallelization strategy used by binary_kernel_reduce_lastdim: just parallelize over the *non*-reduced dimensions of the tensor. I don't understand why this strategy isn't generally applicable and we aren't otherwise capable of parallelizing reductions, so I haven't gated it to the case where we are reducing over a contiguous last dimension. I will send a follow-up that packages up this strategy nicely and uses it in our reduction portable ops.
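A minimal sketch of that strategy under the same assumptions (hypothetical names, not the actual reduce_util.h code): reduce over the last dimension while parallelizing across the non-reduced rows, so no step-(2) combining of sub-results is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_reduce_lastdim(data, reduce_fn, init):
    """Reduce each row of `data` independently, one task per row.

    Because each output element depends only on its own row, the rows
    (the non-reduced dimensions) can be processed in parallel without
    the sub-result-combining step a general parallel_reduce would need.
    """
    def reduce_row(row):
        return reduce(reduce_fn, row, init)

    with ThreadPoolExecutor() as pool:
        return list(pool.map(reduce_row, data))
```

For example, summing over the last dimension of a 2x3 input reduces the two rows concurrently: `parallel_reduce_lastdim([[1, 2, 3], [4, 5, 6]], lambda a, b: a + b, 0)` yields `[6, 15]`.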
No need to re-calculate numel() here.
…rallelization PoC (pytorch#9139)
Everything but the Python test (which depends on //caffe2:torch) is fine.
Differential Revision: D71073675 Pull Request resolved: pytorch#9206
Differential Revision: D70184325 Pull Request resolved: pytorch#8488
Differential Revision: D71404805 Pull Request resolved: pytorch#9456
Differential Revision: D71634148 Pull Request resolved: pytorch#9522
### Summary - QC backend changes for adopting LPBQ - test case: conv2d 16a4w - refactor a bit ### Test plan ```bash python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_conv2d_block -s $SERIAL_NO -m SM8650 -b build-android ```
Differential Revision: D71699118 Pull Request resolved: pytorch#9529
Differential Revision: D71698626 Pull Request resolved: pytorch#9528
Differential Revision: D71591385 Pull Request resolved: pytorch#9493
…ytorch#9489) Note that this includes ops that are not currently implemented. These ops are added for completeness. Signed-off-by: Erik Lundell <erik.lundell@arm.com>
- Change constant test data to random generators.
- Tests on Ethos-U55 are meant to xfail, as int16 tables are currently not supported.
- For other tests, add a flaky marker. Remove the increased qtol, since the inaccuracies only show up sporadically.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
We've been seeing flakes due to missing torchgen that have become more common. After some investigation, it appears that pytorch#8688 was probably overzealous: installing pytorch was probably also installing torchgen, so let's ~~install pytorch~~ just run the macOS setup script to avoid proliferating configurations. Test Plan: unittest-buck / macos on this PR; monitor to see if the failures go away.
Differential Revision: D71699919 Pull Request resolved: pytorch#9530
…9434) ### Summary The final diff as part of pytorch#9117. This is the big one that affects users — we finally move the core build scripts into `scripts/` ### Test plan CI cc @larryliu0820 @lucylq
### Summary TSIA after pytorch#9117. ### Test plan N/A
Differential Revision: D71667998 Pull Request resolved: pytorch#9523
Arm backend: support for CEIL op
- Update unary operator factory with CEIL op
- Rename and refactor test_floor to handle similar ops

Signed-off-by: Madeleine Dunn <madeleine.dunn@arm.com>
Differential Revision: D71713725 Pull Request resolved: pytorch#9546
…9548) Updates torchao pin to enable shared embedding quantization.
Fix the Maven deps; should use fbjni. Add AndroidManifest.xml for the test.
Add pytorch#9354 by [Inklingdq](https://github.com/Inklingdq), which was accidentally merged to `viable/strict` instead of `main`. Co-authored-by: Inkling <64665980+Inklingdq@users.noreply.github.com>
One of the current drawbacks of using a pinned PyTorch commit on CI is that we need to build the PyTorch wheel on all macOS jobs, because macOS has no Docker image. Building the PyTorch wheel is usually not too bad because we have sccache in place to make compilation faster. However, it's still slower than using a prebuilt wheel, and sccache is also not available on the GitHub macOS runner `macos-latest-xlarge` (no access to S3). As all macOS jobs build exactly the same PyTorch wheel, the proposal here is to cache the wheel in the S3 `gha-artifacts` bucket, which is publicly readable, i.e. https://gha-artifacts.s3.us-east-1.amazonaws.com/cached_artifacts/pytorch/executorch/pytorch_wheels/Darwin/311/torch-2.7.0a0%2Bgit295f2ed-cp311-cp311-macosx_14_0_arm64.whl. Each job can check S3 for a matching wheel and use it instead. If there is no such wheel, it will continue building PyTorch normally. Once a new wheel is built, and if the runner has write access to S3, it will upload the wheel so that other jobs can pick it up going forward. ### Testing All CI jobs pass (failures are pre-existing on trunk). Here are some quick numbers on how this helps reduce the durations of different macOS jobs.
* Apple workflow: * build-benchmark-app: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210715922) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390843) ~44m * build-frameworks-ios: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210732212) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214394644) ~44m * build-demo-ios: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14003433493/job/39213882743) ~55m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390955) ~23m * Apple perf workflow: * build-benchmark-app: [BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39208203350) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401072) ~48m * export model (llama): [BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39150917351) ~30m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401617) ~13m * All macOS jobs in pull and trunk: * BEFORE ~417m on commit b195ed9 → AFTER ~268m Overall, I'm seeing the duration of all macOS jobs reduced by close to 2x. This is very useful for reducing the cost of running macOS jobs (remember the budget request to the OSS team because of the $$$ GitHub macOS runners)
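The lookup scheme described above can be sketched as follows. The key layout mirrors the example URL in the description; `wheel_cache_key` and `cached_wheel_exists` are hypothetical names for illustration, not the actual CI code:

```python
import urllib.error
import urllib.request

BUCKET = "https://gha-artifacts.s3.us-east-1.amazonaws.com"

def wheel_cache_key(os_name, py_version, wheel_name):
    """Build the S3 key under which a prebuilt PyTorch wheel would be cached,
    e.g. .../pytorch_wheels/Darwin/311/<wheel>.whl as in the example URL."""
    return (f"cached_artifacts/pytorch/executorch/pytorch_wheels/"
            f"{os_name}/{py_version}/{wheel_name}")

def cached_wheel_exists(key):
    """HEAD the publicly readable bucket; if the wheel is missing, the job
    falls back to building PyTorch normally (and uploads the result when it
    has write access)."""
    req = urllib.request.Request(f"{BUCKET}/{key}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False
```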
Has no targets and currently isn't being run.
LGTM
Merged executorch main branch into openvino_backend and resolved conflicts.