
Fix merge conflicts #37

Merged
ynimmaga merged 585 commits into ynimmaga:openvino_backend from cavusmustafa:fix_merge_conflicts
Mar 24, 2025
Conversation

@cavusmustafa
Collaborator

Merged executorch main branch into openvino_backend and resolved conflicts.

dpalmasan and others added 30 commits March 11, 2025 20:28
Differential Revision: D71001041

Pull Request resolved: pytorch#9168
Adds TOSA support for logical not in Arm backend.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

Signed-off-by: Måns Nilsson <mans.nilsson@arm.com>
Co-authored-by: Yufeng Shi <yufeng.shi@arm.com>
Summary:
Fix a typo in executorch documentation

https://pytorch.org/executorch/main/backends-xnnpack.html

Reviewed By: cccclai

Differential Revision: D70645356

Co-authored-by: Frank Yu <frankyu@meta.com>
### Summary
Just to make it consistent with its `linux` counterpart, let's update
the reference from M1 to just macOS.

### Test plan

CI

cc @larryliu0820 @lucylq
The old value of 4 min was too restrictive when running bigger models on
some machines and sometimes caused GitHub runners to fail.


cc @digantdesai @freddan80 @per @oscarandersson8218

Signed-off-by: Zingo Andersen <zingo.andersen@arm.com>
…torch#9083)

This makes it easier to run the scripts from various tools, such as your
editor.

Signed-off-by: Zingo Andersen <zingo.andersen@arm.com>
- Remove xfails
- Refactor test_mm to use testing pipelines

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
### Summary
- add SXR2330P

### Test plan
```bash
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator -s $SERIAL_NO -m SM8650 -b build-android
```
Needed after extension llm third-party deps moved into tokenizers subdir
### Summary
Update XNNPACK backend doc page to cover quantization schemes and
generally clean up the format. Update backend template doc with
additional detail.

### Test plan
Built docs locally and verified contents.

cc @mergennachin @byjlw
### Summary
We don't need to duplicate the deps source.

### Test plan

CI
…ytorch#9190)

This reverts commit 05a160e.

Revert "Revert "Make serial parallel_for "polyfill" iterate backwards in
debug builds (pytorch#9044)""

This reverts commit 815eaff.

Revert "Revert "Unbreak optimized kernels buck build (and check it in
unittest-buck) (pytorch#9159)""

This reverts commit 10bb615.
Needed to efficiently use parallel_for with BroadcastIndexesRange.
…ch#9058)

Now all the apply functions share a common implementation, which means
further changes (e.g., parallel_for, generating specialized dtypes for
the case where all inputs have the same type) don't need to be repeated
3 times.

(Interestingly, this seems to increase the effectiveness of the
following parallelization change. Not entirely sure why, but I checked
the generated code for optimized op_where and it seems to have improved,
which is surprising.)
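The commonization described above can be sketched conceptually; the names below are hypothetical and the actual ExecuTorch apply helpers in C++ handle dtypes and broadcasting, but the idea of one shared loop serving every elementwise op is the same:

```python
def apply_elementwise(fn, *inputs):
    # Single shared apply loop: each elementwise op supplies only its
    # scalar function; the iteration logic lives in one place.
    # Inputs are plain equal-length lists in this sketch; the real
    # implementation also handles dtypes and broadcasting.
    assert all(len(x) == len(inputs[0]) for x in inputs)
    return [fn(*vals) for vals in zip(*inputs)]

# Ops defined in terms of the shared helper:
def op_where(cond, a, b):
    return apply_elementwise(lambda c, x, y: x if c else y, cond, a, b)

def op_add(a, b):
    return apply_elementwise(lambda x, y: x + y, a, b)
```

With one implementation to maintain, later changes such as parallelization only need to be made once.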
Internal model got a 5.7% latency improvement (313.8 ms before, 296.0 ms
after).
Not sure why the threadpool extension isn't mentioned or built in here;
we should follow up on that.
… args (pytorch#9203)

This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#9173 by
@SS-JIA
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/196/orig
@diff-train-skip-merge

Co-authored-by: Stephen Jia <ssjia@meta.com>
CI is assigning Python 3.9 machines, but ET requires 3.10.
Use the old one; no need to use the new API.
Differential Revision: D71023514

Pull Request resolved: pytorch#9196
This is step (5) of pytorch#8932.

At this exact moment, this rebuild is inefficient because it rebuilds
the whole portable op library, but ops don't support optional
parallelization just yet. This will become less true when we roll out
parallel_for support across portable ops immediately following this PR.
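A minimal sketch of the `parallel_for` idea (in Python for brevity; the actual ExecuTorch API is C++ and its signature differs): split the index range into grain-sized chunks and hand each chunk a `body(start, stop)` callback, falling back to a serial call for small ranges.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(begin, end, grain_size, body):
    # Split [begin, end) into chunks of at most grain_size elements
    # and run body(start, stop) on each; small ranges stay serial.
    if end - begin <= grain_size:
        body(begin, end)
        return
    chunks = []
    start = begin
    while start < end:
        stop = min(start + grain_size, end)
        chunks.append((start, stop))
        start = stop
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda chunk: body(*chunk), chunks))

# Each chunk writes a disjoint slice of the output, so no locking is needed.
out = [0] * 8
parallel_for(0, 8, 2,
             lambda s, e: out.__setitem__(slice(s, e),
                                          [i * i for i in range(s, e)]))
```

Because the callback only sees its own sub-range, an op written against this interface works unchanged whether the backend runs chunks serially or on a thread pool.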
I attempted to port `at::parallel_reduce` to ExecuTorch and use that
in reduce_util.h, but it turned out to be much trickier than expected.

(In brief: parallel reduction requires two steps: 1) split the input
range into chunks and reduce over them (easily done like
parallel_for), and then 2) combine the sub-results from chunks. The
reduction function accepted by reduce_over_dim is not well-suited to
step (2).)

Instead, I ported the parallelization strategy used by
binary_kernel_reduce_lastdim: just parallelize over the *non*-reduced
dimensions of the tensor. I don't understand why this strategy isn't
generally applicable and we aren't otherwise capable of parallelizing
reductions, so I haven't gated it to the case where we are reducing
over a contiguous last dimension.

I will send a follow-up that packages up this strategy nicely and uses
it in our reduction portable ops.
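The strategy described above (parallelizing over the non-reduced dimensions rather than splitting and recombining the reduced range) can be sketched conceptually; the names are hypothetical and the real code in reduce_util.h is C++:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def reduce_over_last_dim(op, identity, rows):
    # Parallelize over the non-reduced (outer) dimension: each row's
    # reduction is independent, so no cross-thread combine step is
    # needed, unlike a two-step chunked parallel_reduce.
    def reduce_row(row):
        return reduce(op, row, identity)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(reduce_row, rows))
```

This sidesteps step (2) entirely: the reduction function only ever combines elements within one row, which is exactly what the existing `reduce_over_dim`-style callbacks already know how to do.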
Everything but the Python test (which depends on //caffe2:torch) is
fine.
Differential Revision: D71073675

Pull Request resolved: pytorch#9206
jackzhxng and others added 26 commits March 22, 2025 01:31
Differential Revision: D70184325

Pull Request resolved: pytorch#8488
Differential Revision: D71404805

Pull Request resolved: pytorch#9456
Differential Revision: D71634148

Pull Request resolved: pytorch#9522
### Summary
- QC backend changes for adopting LPBQ
- test case: conv2d 16a4w
- refactor a bit

### Test plan
```bash
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_conv2d_block -s $SERIAL_NO -m SM8650 -b build-android
```
Differential Revision: D71699118

Pull Request resolved: pytorch#9529
Differential Revision: D71698626

Pull Request resolved: pytorch#9528
Differential Revision: D71591385

Pull Request resolved: pytorch#9493
…ytorch#9489)

Note that this includes ops that are not currently implemented. These
ops are added for completeness.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
- Change constant test data to random generators.
- Tests on Ethos-U55 are meant to xfail as int16 tables are currently
not supported.
- For other tests, add flaky marker. Remove increased qtol, since the
inaccuracies only show up sporadically.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
We've been seeing flakes due to missing torchgen that have become more
common. After some investigation, it appears that pytorch#8688 was probably
overzealous: installing pytorch was probably also installing torchgen,
so let's ~~install pytorch~~ just run the macos setup script to avoid
proliferating configurations.

Test Plan: unittest-buck / macos on this PR, monitor to see if failures
go away.
Differential Revision: D71699919

Pull Request resolved: pytorch#9530
…9434)

### Summary
The final diff as part of
pytorch#9117. This is the big one
that affects users — we finally move the core build scripts into
`scripts/`

### Test plan

CI

cc @larryliu0820 @lucylq
Differential Revision: D71667998

Pull Request resolved: pytorch#9523
Arm backend: support for CEIL op
- Update unary operator factory with CEIL op
- Rename and refactor test_floor to handle similar ops

Signed-off-by: Madeleine Dunn <madeleine.dunn@arm.com>
Differential Revision: D71713725

Pull Request resolved: pytorch#9546
…9548)

Updates torchao pin to enable shared embedding quantization.
Fix the maven deps. Should use fbjni.
Add AndroidManifest.xml for test.
Add pytorch#9354 by
[Inklingdq](https://github.com/Inklingdq) which was accidentally merged
to `viable/strict` to `main`.

Co-authored-by: Inkling <64665980+Inklingdq@users.noreply.github.com>
One of the current drawbacks of using a pinned PyTorch commit on CI is that
we need to build the PyTorch wheel on all macOS jobs because macOS has no
Docker image. Building the PyTorch wheel is usually not too bad because we
have sccache in place to make compilation faster. However, it's
still slower than using a prebuilt wheel, and sccache is also not
available on the GitHub macOS runner `macos-latest-xlarge` (no access to
S3).

As all MacOS jobs are building exactly the same PyTorch wheel, the
proposal here is to cache the wheel on S3 `gha-artifacts` bucket which
is publicly readable, i.e.
https://gha-artifacts.s3.us-east-1.amazonaws.com/cached_artifacts/pytorch/executorch/pytorch_wheels/Darwin/311/torch-2.7.0a0%2Bgit295f2ed-cp311-cp311-macosx_14_0_arm64.whl.
The job can check for a matching wheel on S3 and use it instead. If
there is no such wheel, it will continue building PyTorch normally. Once
a new wheel is built, and if the runner has write access to S3, it will
upload the wheel so that other jobs can pick it up going forward.
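The check-then-build-then-upload flow reads as follows (a minimal sketch with hypothetical names; the real job talks to the S3 bucket rather than a dict):

```python
def get_or_build_wheel(key, cache_fetch, build, cache_store, can_write):
    # Check the artifact cache for a prebuilt wheel; fall back to
    # building, and upload the result when the runner has write access.
    wheel = cache_fetch(key)
    if wheel is not None:
        return wheel  # cache hit: skip the expensive build
    wheel = build()
    if can_write:
        cache_store(key, wheel)
    return wheel

# Dict-backed stand-in for the S3 bucket:
cache = {}
builds = []

def build_wheel():
    builds.append(1)  # track how many times we actually build
    return "torch-wheel"

first = get_or_build_wheel("Darwin/311", cache.get, build_wheel,
                           cache.__setitem__, True)
second = get_or_build_wheel("Darwin/311", cache.get, build_wheel,
                            cache.__setitem__, True)
```

Since all macOS jobs build the identical wheel for a given pinned commit, the first job to finish populates the cache and every later job takes the fast path.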

### Testing

All CI jobs pass (failures are pre-existing from trunk). Here are some
quick numbers on how this helps reduce the durations of different macOS
jobs.

* Apple workflow:
* build-benchmark-app:
[BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210715922)
~80m →
[AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390843)
~44m
* build-frameworks-ios:
[BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210732212)
~80m →
[AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214394644)
~ 44m
* build-demo-ios:
[BEFORE](https://github.com/pytorch/executorch/actions/runs/14003433493/job/39213882743)
~ 55m →
[AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390955)
~23m
* Apple perf workflow:
* build-benchmark-app:
[BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39208203350)
~80m →
[AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401072)
~48m
* export model (llama):
[BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39150917351)
~30m →
[AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401617)
~13m
* All macOS jobs in pull and trunk:
* BEFORE ~417m on commit b195ed9 → AFTER
~268m

Overall, I'm seeing the duration of all macOS jobs reduced by close to
2x. This is very useful for reducing the cost of running macOS jobs (remember
the budget request to the OSS team because of the $$$ GitHub macOS runners)
No targets and currently isn't being run
@ynimmaga
Owner

LGTM

@ynimmaga ynimmaga merged commit 0030fb9 into ynimmaga:openvino_backend Mar 24, 2025
13 of 261 checks passed
@cavusmustafa cavusmustafa had a problem deploying to upload-benchmark-results March 26, 2025 22:33 — with GitHub Actions Error