Conversation
keryell
left a comment
There was a problem hiding this comment.
There are some ideas but think generic & functional programming! :-)
|
The |
| constexpr float infinity = std::numeric_limits<float>::infinity(); | ||
| constexpr float pi = 3.1415926535897932385f; | ||
| constexpr real_t infinity = std::numeric_limits<real_t>::infinity(); | ||
| constexpr real_t pi = 3.1415926535897932385f; |
There was a problem hiding this comment.
literals are still assuming float which means that if we change real_t to something else we will get some conversions. maybe make our own user-defined literal could help abstract this.
real_t operator "" _r(long double d) { return d; }
this would force conversion to be on the literals directly such that they can be constant folded
There was a problem hiding this comment.
Just use constinit.
I guess a lot of constexpr should be constinit for FPGA. To mention in the poster...
keryell
left a comment
There was a problem hiding this comment.
Great work!
But I am not convinced by the new design.
I cannot see a reason to use this explicit visitor pattern from a pre-modern C++ or Java world when there was no generic lambdas...
You have created a new explicit visitor with all the object-dependent implementation in it breaking the encapsulation by in-lining here all what was in the hit function before.
After looking at the code now with the increased complexity with new kitchen-sink visitor requiring direct access to public object members or über-friendship, I am even less convinced...
Having a distributed hit like before does prevent to add some hierarchy if you have 2 different implementations of sphere for example, by having a hit implemented in some sphere specific class which is inherited by the various spheres.
To avoid blocking this PR I suggest to split it in 2, by moving this complex refactoring in a new one where we can think more about it.
Thanks for this experimentation! It would be interesting if there is any impact on performance and QoR now we have it...
| } | ||
| }; | ||
| } // namespace raytracer::visitor | ||
| #endif No newline at end of file |
There was a problem hiding this comment.
Missing end-of-line.
Yes, this is good to have this in its own file.
There was a problem hiding this comment.
Curious... An IDE configuration bug?
| point p; // hit point | ||
| vec normal; // normal at hit point | ||
| bool front_face; // to check if hit point is on the outer surface | ||
| /*local coordinates for rectangles | ||
| and mercator coordinates for spheres */ | ||
| real_t u; | ||
| real_t v; | ||
|
|
||
| // To set if the hit point is on the front face |
There was a problem hiding this comment.
While you are here, add more Doxygen
| point p; // hit point | |
| vec normal; // normal at hit point | |
| bool front_face; // to check if hit point is on the outer surface | |
| /*local coordinates for rectangles | |
| and mercator coordinates for spheres */ | |
| real_t u; | |
| real_t v; | |
| // To set if the hit point is on the front face | |
| point p; ///< Hit point | |
| vec normal; ///< Normal at hit point | |
| bool front_face; ///< To check if hit point is on the outer surface | |
| /// Local coordinates for rectangles and mercator coordinates for spheres | |
| real_t u; | |
| real_t v; | |
| /// To set if the hit point is on the front face |
|
Discussion about the refactoring can be continued on #43 |
keryell
left a comment
There was a problem hiding this comment.
Functional + generic C++ = Zen++ :-)
keryell
left a comment
There was a problem hiding this comment.
By looking at the code I wonder whether we cannot merge dev_visit(monostate_dispatch into a simpler monostate_visit including the visit in the implementation.
|
Where are we about this? |
|
@lforg37 what's up with this? |
|
The current version is based on this branch, commits have been added. |
| bool hit(const ray& r, real_t min, real_t max, hit_record& rec, | ||
| material_t& hit_material_type) const { | ||
| hit_record temp_rec; | ||
| material_t temp_material_type; |
There was a problem hiding this comment.
I have the feeling that if we remove the std::monostate from the material it will always create a sphere here even when there is no intersection.
| /// of the ray hitting a smoke particle | ||
| const auto distance_inside_boundary = (rec2.t - rec1.t) * ray_length; | ||
| const auto hit_distance = neg_inv_density * sycl::log(rng.float_t()); | ||
| auto rng = LocalPseudoRNG { toseed(r.direction()) }; |
There was a problem hiding this comment.
Ah, I realize you have removed the context passed everywhere and replaced basically the random generator with some local computation.
What was the rationale?
There was a problem hiding this comment.
As getting a value from the RNG update its internal state, having a shared RNG creates a very long chain of read after write dependency that prevent the HLS compiler to parallelize the otherwise independent computation steps.
There was a problem hiding this comment.
Perhaps we could have this context with some special HLS decorations to ignore some dependencies.
There was a problem hiding this comment.
There was a problem hiding this comment.
Otherwise, perhaps something like this pseudo HLS-SYCL code:
auto rng(auto local_seed = std::source_location::current()) {
return [=] {
static auto state = local_seed.line();
#pragma HLS dependence variable=state inter false
state = crunch(state);
return mix(state);
};
}or
auto rng(auto local_seed = std::source_location::current()) {
return [state = local_seed.line()] mutable {
#pragma HLS dependence variable=state inter false
state = crunch(state);
return mix(state);
};
}Perhaps we need extension intel/llvm#3746 to have the big picture working.
There was a problem hiding this comment.
Above is basically a good idea, if I understand the intention, to separate the state into multiple independent RNGs. I'm not sure what you're trying to achieve with the false dependence pragma, however.
There was a problem hiding this comment.
@stephenneuendorffer yes this is the idea to distribute the state here in a lazy way to even initialize the seed with something depending from the source location (here with the line number, we could also use the file name, etc.).
I has put probably too much here with the
#pragma HLS dependence variable=state inter falseas a way to have an II=1 without consuming too much latency at the price of a semantics change in the code and hoping we still have enough randomness...
But I agree this is a second order optimization if we are still in trouble...
Perhaps this code will not work with SYCL-HLS because it uses some function pointer. If so, we can try with a macro to avoid returning the lambda, or perhaps just have a macro which inline a random generator wherever we want one... Ahem... :-)
There was a problem hiding this comment.
I made some comparisons between this version and the main one.
You have not update the README about how to run the program.
It looks like time ./sycl-rt 800 480 50 100 is a way to get the same behaviour as before.
It is a little bit faster than before, I guess because there are some parts of the variant which have been removed...
Also I wonder how it is possible to have the motion blur working without a random generator... by having the time or motion parameter included in the hash? It starts being painful...
| uint32_t shifted4 = (y2 & 63) << 10; | ||
| uint32_t shifted5 = (z1 & 63) << 5; | ||
| uint32_t shifted6 = (z2 & 63); | ||
| return shifted1 ^ shifted2 ^ shifted3 ^ shifted4 ^ shifted5 ^ shifted6; |
There was a problem hiding this comment.
Ah I read too quickly and was thinking to 63 for 5 bits...
Actually that seems like a good idea then to have an xor. Would more than 6 bits better than? 127 or 255?
keryell
left a comment
There was a problem hiding this comment.
There are still some super old comments and a merge conflict
|
@lforg37 where are we on this? |
| } | ||
| }; | ||
| } // namespace raytracer::visitor | ||
| #endif No newline at end of file |
There was a problem hiding this comment.
Curious... An IDE configuration bug?
| auto& rng = ctx.rng; | ||
| bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation, | ||
| ray& scattered) const { | ||
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
| auto& rng = ctx.rng; | ||
| bool scatter(auto&, const ray& r_in, const hit_record& rec, color& attenuation, | ||
| ray& scattered) const { | ||
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
| // Attenuation of the ray hitting the object is modified based on the color | ||
| // at hit point | ||
| auto& rng = ctx.rng; | ||
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
| auto& rng = ctx.rng; | ||
| bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation, | ||
| ray& scattered) const { | ||
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
| LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
I do not know why I am getting bored here...
| std::memcpy(&x2, &val2.x(), sizeof(uint32_t)); | ||
| std::memcpy(&y2, &val2.y(), sizeof(uint32_t)); | ||
| std::memcpy(&z2, &val2.z(), sizeof(uint32_t)); | ||
| uint32_t shifted1 = x1 << 26; |
| uint32_t toseed(vec const& val1, vec const& val2) { | ||
| uint32_t x1, y1, z1, x2, y2, z2; | ||
| std::memcpy(&x1, &val1.x(), sizeof(uint32_t)); | ||
| std::memcpy(&y1, &val1.y(), sizeof(uint32_t)); |
There was a problem hiding this comment.
| std::memcpy(&y1, &val1.y(), sizeof(uint32_t)); | |
| std::memcpy(&y1, &val1.y(), sizeof y1); |
everywhere.
This is clearer to use the type of the destination. Also it is shorter since you can avoid () with sizeof expression. :-)
| #include <algorithm> | ||
| #include <chrono> | ||
| #include <cstdint> | ||
| #include <ctime> |
| @return uint32_t | ||
| */ | ||
| uint32_t toseed(vec const& val1, vec const& val2) { | ||
| uint32_t x1, y1, z1, x2, y2, z2; |
There was a problem hiding this comment.
You can use the C++ version everywhere instead of the C version : std::uint32_t
| int main(int argc, char* argv[]) { | ||
| if (argc < 5 || argc > 7) { | ||
| std::cerr << "Usage: sycl-rt OUT_WIDTH OUT_HEIGHT DEPTH SAMPLES " | ||
| "[SPHERE_INC [RAND_SEED]]" |
There was a problem hiding this comment.
Update the README to describe this new API
Add monostate in front of each variant (#41), replace remaining
floatwithreal_t